
Studies in Computational Intelligence 845

Roger Lee Editor

Software Engineering Research, Management and Applications

Studies in Computational Intelligence Volume 845

Series Editor Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland

The series “Studies in Computational Intelligence” (SCI) publishes new developments and advances in the various areas of computational intelligence—quickly and with a high quality. The intent is to cover the theory, applications, and design methods of computational intelligence, as embedded in the fields of engineering, computer science, physics and life sciences, as well as the methodologies behind them. The series contains monographs, lecture notes and edited volumes in computational intelligence spanning the areas of neural networks, connectionist systems, genetic algorithms, evolutionary computation, artificial intelligence, cellular automata, self-organizing systems, soft computing, fuzzy systems, and hybrid intelligent systems. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution, which enable both wide and rapid dissemination of research output. The books of this series are submitted to indexing to Web of Science, EI-Compendex, DBLP, SCOPUS, Google Scholar and Springerlink.

More information about this series at http://www.springer.com/series/7092

Roger Lee Editor

Software Engineering Research, Management and Applications


Editor
Roger Lee
Software Engineering and Information Technology Institute
Central Michigan University
Mount Pleasant, MI, USA

ISSN 1860-949X  ISSN 1860-9503 (electronic)
Studies in Computational Intelligence
ISBN 978-3-030-24343-2  ISBN 978-3-030-24344-9 (eBook)
https://doi.org/10.1007/978-3-030-24344-9

© Springer Nature Switzerland AG 2020

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Foreword

The 17th International Conference on Software Engineering, Artificial Intelligence Research, Management and Applications (SERA 2019), held on May 29–31, 2019 in Honolulu, Hawaii, aimed to bring together scientists, engineers, computer users, and students to share their experiences and exchange new ideas and research results about all aspects (theory, applications, and tools) of Software Engineering Research, Management and Applications, and to discuss the practical challenges encountered along the way and the solutions adopted to solve them. The conference organizers selected the best 13 papers from those accepted for presentation at the conference in order to publish them in this volume. The papers were chosen based on review scores submitted by members of the program committee and underwent further rigorous rounds of review.

In chapter “A QoS Aware Uplink Scheduler for IoT in Emergency Over LTE/LTE-A Networks”, Jin Zhang, Yalong Wu, Wei Yu, and Chao Lu propose a Quality of Service (QoS)-aware Normal Round Robin Uplink Scheduler (QNRR-US) over Long-Term Evolution (LTE)/LTE-Advanced (LTE-A) networks to efficiently allocate network resources to the rapidly growing number of connected IoT devices.

In chapter “Emulation-Based Performance Evaluation of the Delay Tolerant Networking (DTN) in Dynamic Network Topologies”, Weichao Gao, Hengshuo Liang, James Nguyen, Fan Liang, Wei Yu, Chao Lu, and Mont Orpilla address the performance of DTN in dynamic networks by conducting a series of emulation-based experiments. Based on the experimental results, they provide general guidelines for evaluating an application for the potential benefits of DTN and for optimizing the configuration for different applications in dynamic networks.

In chapter “Teaching Distributed Software Architecture by Building an Industrial Level E-Commerce Application”, Bingyang Wei, Yihao Li, Lin Deng, and Nicholas Visalli propose a project-based learning experience that brings an open-source, full-fledged system to the classroom in order to effectively teach distributed software architecture.


In chapter “KNN-Based Overlapping Samples Filter Approach for Classification of Imbalanced Data”, Mar Mar Nwe and Khin Thidar Lynn propose an effective under-sampling method for the classification of imbalanced and overlapping data using a KNN-based overlapping samples filter approach. The paper summarizes the performance analysis of three ensemble-based learning classifiers for the proposed method.

In chapter “Spectrum-Based Bug Localization of Real-World Java Bugs”, Cherry Oo and Hnin Min Oo propose an automated bug localization technique that guides a programmer to the location of an error with little human arbitration. They used the real-world Apache Commons Math and Apache Commons Lang Java projects to examine its accuracy using a spectrum-based bug localization metric.

In chapter “The Role of Unconscious Bias in Software Project Failures”, Chris Macnab and Sam Doctolero address the alarming rate of software project failures and propose a quantitative model of personalities that makes simple, testable predictions about biases. They analyze the main biases of software managers in particular and how these biases contribute to software failures.

In chapter “Analysis of Missing Data Using Matrix-Characterized Approximations”, Thin Thin Soe and Myat Myat Min discuss issues of veracity related to data quality, such as incomplete, inconsistent, vague, or noisy data, which pose a major challenge to data mining and data analysis. They present rough set-based, matrix-represented approximations to compute lower and upper approximations. The experimental results show that the system performs better when missing data are characterized as “do not care” conditions than when they are represented as lost values.

In chapter “Expansion of Cyber Attack Data from Unbalanced Datasets Using Generative Adversarial Networks”, Tim Merino, Matt Stillwell, Mark Steele, Max Coplan, Jon Patton, Alexander Stoyanov, and Lin Deng explore using Generative Adversarial Networks (GANs) to improve the training and, ultimately, the performance of cyberattack detection systems. They determine the feasibility of generating cyberattack data from existing cyberattack datasets with the goal of balancing those datasets with generated data.

In chapter “Fusion of Log-Mel Spectrogram and GLCM Feature in Acoustic Scene Classification”, Mie Mie Oo and Lwin Lwin Oo address the problem of Acoustic Scene Classification (ASC). The purpose of the paper is to extract an effective feature from the combination of a signal processing approach and an image processing approach, and to use this feature to reduce the computational time for classification.

In chapter “Improvement on Security of SMS Verification Codes”, Shushan Zhao proposes that the SMS verification code be sent not only to the exact phone number but also to the exact phone of the registered user, and be used to generate a One-Time Passcode (OTP). The author proposes a possession-based SMS verification framework and implementation algorithms for it, and analyzes their security and performance features.


In chapter “An Alternative Development for RCANE Platform”, Toan Van Nguyen and Geunwoong Ryu present the RCANE platform, which is designed to solve the challenges of previous platforms and to comply with the preconceived perspectives of current political, social, and economic systems in order to build up an ecosystem for the economy, society, and politics.

In chapter “Structural Relationship Data Analysis Between Relational Variables and Benefit Sharing: Moderating Effect of Transaction-Specific Investment”, Hae-Soo Pyun analyzes the structural relationship between relational variables and benefit sharing in the automobile industry and analyzes the moderating effect of transaction-specific investment in order to suggest theoretical and practical implications. Based on the collected data, reliability analysis, validity analysis, correlation analysis, and regression analysis were conducted.

In chapter “Beyond the Hawthorne Research: Relationship Between IT Company Employees’ Perceived Physical Work Environment and Creative Behavior”, Jin-Hua Zhang and Jun-Ho Lee examine the relationship between IT company employees’ perception of the organizational physical work environment, psychological well-being as a psychological factor, and creative behavior. Their study demonstrates that the perceived physical work environment has a significant effect on creative behavior, with psychological well-being acting as a psychological factor.

It is our sincere hope that this volume provides stimulation and inspiration, and that it will be used as a foundation for works to come.

May 2019

Program Chairs
Subrata Acharya
Towson University, Towson, USA

Contents

A QoS Aware Uplink Scheduler for IoT in Emergency Over LTE/LTE-A Networks .......... 1
Jin Zhang, Yalong Wu, Wei Yu and Chao Lu

Emulation-Based Performance Evaluation of the Delay Tolerant Networking (DTN) in Dynamic Network Topologies .......... 23
Weichao Gao, Hengshuo Liang, James Nguyen, Fan Liang, Wei Yu, Chao Lu and Mont Orpilla

Teaching Distributed Software Architecture by Building an Industrial Level E-Commerce Application .......... 43
Bingyang Wei, Yihao Li, Lin Deng and Nicholas Visalli

KNN-Based Overlapping Samples Filter Approach for Classification of Imbalanced Data .......... 55
Mar Mar Nwe and Khin Thidar Lynn

Spectrum-Based Bug Localization of Real-World Java Bugs .......... 75
Cherry Oo and Hnin Min Oo

The Role of Unconscious Bias in Software Project Failures .......... 91
C. J. B. Macnab and Sam Doctolero

Analysis of Missing Data Using Matrix-Characterized Approximations .......... 117
Thin Thin Soe and Myat Myat Min

Expansion of Cyber Attack Data from Unbalanced Datasets Using Generative Adversarial Networks .......... 131
Tim Merino, Matt Stillwell, Mark Steele, Max Coplan, Jon Patton, Alexander Stoyanov and Lin Deng

Beyond the Hawthorne Research: Relationship Between IT Company Employees’ Perceived Physical Work Environment and Creative Behavior .......... 147
Jin-Hua Zhang and Jun-Ho Lee

Structural Relationship Data Analysis Between Relational Variables and Benefit Sharing: Moderating Effect of Transaction-Specific Investment .......... 161
Hae-Soo Pyun

Fusion of Log-Mel Spectrogram and GLCM Feature in Acoustic Scene Classification .......... 175
Mie Mie Oo and Lwin Lwin Oo

Improvement on Security of SMS Verification Codes .......... 189
Shushan Zhao

An Alternative Development for RCANE Platform .......... 205
Toan Van Nguyen and Geunwoong Ryu

Author Index .......... 221

Contributors

Max Coplan  Department of Computer and Information Sciences, Department of Physics, Astronomy and Geosciences, Towson University, Towson, MD, USA
Lin Deng  Department of Computer and Information Sciences, Towson University, Towson, MD, USA
Sam Doctolero  Schulich School of Engineering, University of Calgary, Calgary, Canada
Weichao Gao  Department of Computer and Information Sciences, Towson University, Towson, MD, USA
Jun-Ho Lee  Department of Business Administration, Hoseo University, Cheonan, Chungnam, Korea
Yihao Li  Institute for Software Technology, Graz University of Technology, Graz, Austria
Fan Liang  Department of Computer and Information Sciences, Towson University, Towson, MD, USA
Hengshuo Liang  Department of Computer and Information Sciences, Towson University, Towson, MD, USA
Chao Lu  Department of Computer and Information Systems, Towson University, Towson, MD, USA
Khin Thidar Lynn  University of Computer Studies Mandalay, Mandalay, Myanmar
C. J. B. Macnab  Schulich School of Engineering, University of Calgary, Calgary, Canada
Tim Merino  Department of Computer and Information Sciences, Department of Mathematics, Towson University, Towson, MD, USA
Myat Myat Min  Faculty of Computer Science, University of Computer Studies, Mandalay, Myanmar
James Nguyen  US Army Command, Control, Computers, Communications, Cyber, Intelligence, Surveillance and Reconnaissance Center, Aberdeen, MD, USA
Toan Van Nguyen  RCANE Lab, Seoul, Korea
Mar Mar Nwe  University of Computer Studies Mandalay, Mandalay, Myanmar
Cherry Oo  Software Engineering Lab, University of Computer Studies, Mandalay, Myanmar
Hnin Min Oo  Software Engineering Lab, University of Computer Studies, Mandalay, Myanmar
Lwin Lwin Oo  University of Computer Studies, Mandalay (UCSM), Mandalay, Myanmar
Mie Mie Oo  University of Computer Studies, Mandalay (UCSM), Mandalay, Myanmar
Mont Orpilla  US Army Command, Control, Computers, Communications, Cyber, Intelligence, Surveillance and Reconnaissance Center, Aberdeen, MD, USA
Jon Patton  Department of Computer and Information Sciences, Department of Mathematics, Towson University, Towson, MD, USA
Hae-Soo Pyun  Department of Business Administration, Namseoul University, Cheonan-Si, South Korea
Geunwoong Ryu  RCANE Lab, Seoul, Korea
Thin Thin Soe  Web Mining Lab, University of Computer Studies, Mandalay, Myanmar
Mark Steele  Department of Computer and Information Sciences, Department of Mathematics, Towson University, Towson, MD, USA
Matt Stillwell  Department of Computer and Information Sciences, Towson University, Towson, MD, USA
Alexander Stoyanov  Department of Computer and Information Sciences, Department of Mathematics, Towson University, Towson, MD, USA
Nicholas Visalli  Department of Computer and Information Sciences, Towson University, Towson, MD, USA
Bingyang Wei  Department of Computer Science, Texas Christian University, Fort Worth, TX, USA
Yalong Wu  Department of Computer and Information Systems, Towson University, Towson, MD, USA
Wei Yu  Department of Computer and Information Systems, Towson University, Towson, MD, USA
Jin-Hua Zhang  Department of Business Administration, Hoseo University, Cheonan, Chungnam, Korea
Jin Zhang  Department of Computer and Information Systems, Towson University, Towson, MD, USA
Shushan Zhao  University of Pittsburgh at Bradford, Bradford, PA, USA

A QoS Aware Uplink Scheduler for IoT in Emergency Over LTE/LTE-A Networks Jin Zhang, Yalong Wu, Wei Yu and Chao Lu

Abstract A massive number of Internet-of-Things (IoT) devices are deployed to monitor and control a variety of physical objects as well as support a body of smart-world applications. How to efficiently allocate network resources becomes a challenging issue with the rapidly growing number of connected IoT devices. Depending on the application, a burst of IoT traffic could lead to bandwidth deficiency within a short period of time and further deteriorate network performance. To tackle this issue, in this paper we first propose a Quality of Service (QoS)-aware Normal Round Robin Uplink Scheduler (QNRR-US) over Long-Term Evolution (LTE)/LTE-Advanced (LTE-A) networks. QNRR-US assigns a higher priority to IoT data that requires urgent treatment than to normal IoT data, and then builds the IoT devices’ scheduling queues based on the priorities of the data traffic. Thus, QNRR-US guarantees high-priority data transmission. To provide fairness to IoT data, QNRR-US reserves some bandwidth for low-priority data traffic. Based on QNRR-US, we then propose the QoS-aware Bound Round Robin Uplink Scheduler (QBRR-US), which separates the numerous IoT devices with burst data traffic into a service queue and a waiting queue. The IoT devices in the service queue take part in round robin resource allocation until the transmission of their urgent data is complete, and new IoT devices then enter the service queue from the waiting queue for the next turn of resource allocation. Through simulations in NS-3, our experimental results show that QBRR-US outperforms the traditional proportional fair (PF) scheduler and QNRR-US with respect to throughput, packet loss ratio, and packet delay.

J. Zhang · Y. Wu · W. Yu (B) · C. Lu Department of Computer and Information Systems, Towson University, Towson, MD 21252, USA e-mail: [email protected] J. Zhang e-mail: [email protected] Y. Wu e-mail: [email protected] C. Lu e-mail: [email protected] © Springer Nature Switzerland AG 2020 R. Lee (ed.), Software Engineering Research, Management and Applications, Studies in Computational Intelligence 845, https://doi.org/10.1007/978-3-030-24344-9_1


Keywords IoT · Uplink scheduling · QoS aware · LTE networks · Bandwidth distribution · Performance evaluation · Emergency

1 Introduction

The Internet of Things (IoT) is a promising technology which aims to connect the global world via numerous smart sensors and actuating devices [1–4]. IoT devices are deployed to support a variety of smart-world systems including smart home, smart healthcare, smart transportation, smart grid, smart manufacturing, and smart agriculture, among others [5–15]. The network connection of IoT can be wired through Ethernet cable or wireless through Bluetooth, ZigBee, Wi-Fi, or a cellular network such as Long-Term Evolution (LTE), LTE-Advanced (LTE-A), or the fifth generation (5G) of mobile communication [16, 17]. Due to their capability of covering complex territory, wireless networks, specifically LTE/LTE-A networks, have become the common infrastructure for IoT. Nonetheless, with the rapidly increasing number of IoT devices around the world (forecast to reach 20.35 billion by 2019 [18]), the shortage of bandwidth resources proves to be a challenging issue in an LTE/LTE-A IoT network.

To address the problem, a number of network resource allocation and scheduling schemes have been proposed [19–21]. The 3rd Generation Partnership Project (3GPP) also integrates high-frequency transmission and mini-slots into the 5G New Radio standard to close the gap between the bandwidth shortage and the explosive growth of IoT devices. Nonetheless, some IoT applications have not been fully considered in these solutions. For instance, depending on the application, IoT devices could generate a large bulk of data within a short time period, for example during emergencies, failure detection, and response [22]. The amount of data goes far beyond the bandwidth capacity, leading to rapid deterioration of data transmission performance with respect to throughput, packet loss ratio (PLR), and packet delay, especially for transmissions from IoT devices that deal with emergencies and with failure detection and response. Given the importance of such information, mechanisms are needed for effective data transmission from those IoT devices in LTE/LTE-A networks.

In this paper, we make the following contributions.

• We propose a quality of service (QoS)-aware Normal Round Robin uplink traffic scheduling (QNRR-US) scheme to handle data traffic from IoT devices that require urgent treatment in the LTE/LTE-A network. The QoS requirement in LTE is recorded by the QoS Class Identifier (QCI), and the QoS-based scheduler in the eNodeB (eNB) allocates bandwidth resources according to the QCI value. QNRR-US creates a new critical emergency IoT data type with a higher priority level than normal data types in the QCI table. Generally speaking, QNRR-US grants a minimum data rate to IoT devices (called UEs in LTE/LTE-A) in the sequence of their priority. When the data rate requested by IoT devices exceeds the channel capacity, the data traffic with low priority is dropped.


Thus, the IoT devices with high-priority data obtain more bandwidth to transmit data. Furthermore, considering fairness, QNRR-US reserves a limited amount of resources for IoT devices with lower-priority data traffic.

• Considering that numerous IoT devices (which have the same priority) may report urgent data at the same time, the data traffic may still exceed the bandwidth capacity even after lower-priority data traffic is dropped as QNRR-US does. To address this issue, we propose the QoS-aware Bound Round Robin uplink traffic scheduling (QBRR-US) scheme, which is an enhancement of QNRR-US. The core of QBRR-US is to adopt the Bound Round Robin (BRR) algorithm and to add priority into scheduling as QNRR-US does. The BRR algorithm separates emergency IoT devices into service and waiting states and pushes them into service and waiting queues. IoT devices in the service queue are allocated resources until all available resources have run out. Emergency IoT devices in the waiting queue have to wait until IoT devices in the service queue have sent out their emergency information and are dequeued from the service queue. Then, the devices in the waiting queue have an opportunity to enter the service queue and join the BRR resource allocation process.

• To validate the effectiveness of QBRR-US, we design a simulation experiment using NS-3 [23], an open source and widely used simulation platform in networking research and education. It allows the designer to create and configure network nodes, network channels, network devices, and network applications separately, and thus provides ways to simulate the IoT application and infrastructure. In particular, NS-3 includes sophisticated LTE/LTE-A network modules and corresponding modules for network performance data collection, which helps us build our simulated IoT network based on LTE/LTE-A. Our evaluated IoT network consists of one eNB and a number of UEs acting as IoT devices, which are randomly deployed in the communication coverage area of the eNB. Since collecting data from IoT devices is the main task in IoT networks, we mainly evaluate the uplink transmissions from IoT devices to the eNB. The first part of our experiment evaluates the performance of an IoT network with increasing bandwidth and an increasing number of IoT devices in the emergency state. The results demonstrate that IoT uplink performance deteriorates rapidly as the number of emergency IoT devices increases. In the second part, we implement three uplink transmission scenarios using the traditional proportional fair (PF) scheduler [24], QNRR-US, and the proposed QBRR-US. The simulation results show that our QBRR-US outperforms the other two baseline uplink schedulers with respect to the throughput, PLR, and packet delay of emergency traffic.

The remainder of this paper is organized as follows: In Sect. 2, we conduct a literature review. In Sect. 3, we introduce a system model and an existing uplink scheduler over LTE/LTE-A. In Sect. 4, we present our proposed QNRR-US and QBRR-US scheduling schemes. In Sect. 5, we introduce the experimental design and configuration and analyze the experimental results of QBRR-US and two baseline schedulers with respect to a set of performance metrics. In Sect. 6, we conclude the paper and give final remarks.


2 Related Works

In view of the expanding number of connected IoT devices and limited network resources, resource allocation and scheduling play a significant role in the performance of IoT networks. Considering that most data transmission in IoT originates from end devices, a body of research efforts has investigated uplink scheduling for IoT applications [8, 11, 25–36]. Existing uplink scheduling schemes focus on diverse perspectives, including specific IoT applications, heterogeneous networks, and new features introduced in 5G.

A number of research efforts have been devoted to improving uplink scheduling for specific applications or network structures [26, 30, 31, 34, 35]. For example, Wang et al. [32] proposed a scheduling algorithm for a camera surveillance system in cellular networks to meet the coverage requirement by minimizing the number of RBs allocated to each camera. Amarasekara et al. [35] designed two schedulers for smart grid periodic and emergency situations and introduced a random delay for smart meter packet flows to ensure data transmission in an emergency.

Regarding resource management and performance assessment in IoT networks, there have been a number of existing efforts [21, 26, 31, 33, 37]. For example, Ghavimi et al. [26] designed a scheduling algorithm based on group-based M2M communication, which clusters M2M devices according to not only their QoS features but also the network transmission protocol; the network throughput is optimized via Lagrange duality theory on resource allocation. Carlesso et al. and He et al. presented scheduling algorithms for networks where smart grid and human-to-human (H2H) traffic coexist [31, 33]. Specifically, Carlesso et al. [31] proposed a scheduling policy for a network in which smart grid and real-time applications coexist. The policy reduces the negative impact of periodic smart grid traffic on real-time traffic by scheduling RBs based on the combined characteristics of smart grid traffic, including channel quality and traffic priority. He et al. [33] formulated the uplink resource allocation problem as a sum-throughput optimization problem for an M2M and H2H co-existence network, and then solved the optimization problem using the Lagrange dual algorithm.

Related to the development of 5G, some research efforts have designed uplink scheduling algorithms involving new 5G features such as transmission repetition. For example, Yu et al. [28] proposed an uplink link adaptation scheme involving transmission repetition, in which an inner-loop link adaptation reduces the block error ratio by adjusting the repetition number, while an outer-loop link adaptation coordinates the Modulation and Coding Scheme (MCS) value and the repetition number. Also, Liu et al. [29] developed a quality-of-control-driven uplink scheduler to guarantee the stability of a control loop in 5G networks by handling the control loop as a network application.

Unlike existing research efforts, in this paper we propose QBRR-US, a QoS-aware uplink scheduler for IoT in emergency situations in which a large burst of data is generated within a short interval. The objective of QBRR-US is not to optimize the whole network throughput, as some studies do.


QBRR-US attempts to sustain the quality of emergency data transmission over an LTE/LTE-A network, to guarantee the throughput of data transmission from emergency IoT devices, and to reduce the packet loss ratio and packet delay on the uplink.

3 System Model

In this section, we first present the scheduling process in an LTE/LTE-A network and then describe the traditional PF scheduler used as the baseline.

3.1 Resource Allocation

LTE adopts the Orthogonal Frequency Division Multiple Access (OFDMA) modulation scheme for the downlink and Single Carrier FDMA (SC-FDMA) for the uplink. In both modulation schemes, to reduce processing overhead, the channel resource is grouped into resource blocks (RBs). An RB has two dimensions: (i) the frequency domain and (ii) the time domain. In the frequency domain, the bandwidth is divided into subcarriers with a spacing of 15 kHz. In the time domain, time is split into slots of 0.5 ms. One RB consists of 12 subcarriers and one time slot. Radio resource allocation is executed every 1 ms, which is defined as the Transmission Time Interval (TTI). A pair of time slots (consisting of two consecutive RBs, also referred to as one subframe) is therefore the minimum unit of resource distribution. One subframe contains 12 or 14 symbols (Fig. 1). Generally speaking, the Media Access Control (MAC) layer scheduler allocates RBs according to the Channel Quality Indicator (CQI) and QCI reports. In uplink scheduling, RBs are allocated per UE, not per bearer. The UE measures the radio channel quality between the UE and the eNB, and then reports it to the eNB through the CQI.

Fig. 1 Subframe and resource blocks


The eNB selects a Modulation and Coding Scheme (MCS) value based on the CQI report. Every MCS value corresponds to a transport block size (TBS), which tells how many bits can be transmitted per TTI. The QCI describes the transmission requirements of data traffic or applications, and every QCI is associated with a priority level. Commonly, the scheduler in the MAC layer discards the lowest-priority traffic when congestion occurs.
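For illustration only, the short Python sketch below summarizes the resource-grid numbers stated above (12 subcarriers of 15 kHz per RB, 0.5 ms slots, a 1 ms TTI) together with the bandwidth-to-RB mapping that is also used later in Table 2; it is a convenience sketch, not a reproduction of the 3GPP specification tables.

# Resource-grid constants taken from the text: 12 subcarriers of 15 kHz per RB,
# 0.5 ms slots, 1 ms TTI, allocation in RB pairs (one subframe).
SUBCARRIERS_PER_RB = 12
SUBCARRIER_SPACING_KHZ = 15
SLOT_MS = 0.5
TTI_MS = 1.0

# Bandwidth (MHz) -> number of uplink RBs, as used later in Table 2.
RBS_PER_BANDWIDTH = {1.4: 6, 3: 15, 5: 25, 10: 50, 15: 75, 20: 100}

def rb_grid_summary(bandwidth_mhz):
    """Per-TTI allocation units and occupied spectrum for a given bandwidth."""
    n_rbs = RBS_PER_BANDWIDTH[bandwidth_mhz]
    occupied_khz = n_rbs * SUBCARRIERS_PER_RB * SUBCARRIER_SPACING_KHZ
    return {
        "rb_pairs_per_tti": n_rbs,                 # scheduler allocates RB pairs each TTI
        "occupied_spectrum_mhz": occupied_khz / 1000.0,
        "slots_per_tti": int(TTI_MS / SLOT_MS),
    }

print(rb_grid_summary(10))   # e.g. 50 RB pairs per 1 ms TTI, 9 MHz of occupied subcarriers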

3.2 Proportional Fair (PF) Scheduler

We consider the PF scheduler as the baseline scheme, in comparison with the QNRR and QBRR uplink schedulers that we propose in this paper. The PF scheduler aims to obtain the highest throughput in an LTE/LTE-A network. It does not consider the QoS of individual traffic in scheduling. An RB pair is allocated to a user when its instantaneous channel quality is higher than its own average channel condition over time, which maintains a good trade-off between spectral efficiency and fairness. The data rate is the indicator of channel quality. We now describe how to obtain the data rate of one RB. Denote by U_i a user with index i ∈ N, by S_j a symbol with index j ∈ N, and by RB_k a resource block with index k ∈ N. Also, denote by N_s the number of OFDM symbols in one RB, by M_{i,j} the number of MCS constellation states used by user U_i on symbol S_j, and by C_{i,j} the code rate associated with M_{i,j}. The bit rate R_{i,k} for user U_i on RB_k is defined as

R_{i,k} = \frac{C_{i,j} \log_2(M_{i,j}) \, N_s}{\tau},    (1)

where τ is the time duration of one RB, set to 0.5 ms. At the start of every TTI, RBs are assigned to users. The mapping between user and RB is determined by R_{i,k}:

\hat{i} = \arg\max_{i = 1, 2, \ldots, N} \frac{R_{i,k}}{R_i}.    (2)

Here, R_i is the historical throughput of user U_i, and \hat{i} is the index of the user with the maximum ratio of current data rate to throughput history. RB_k is then allocated to user U_{\hat{i}}.
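To make the selection rule in Eqs. (1)–(2) concrete, the following minimal Python sketch (not from the paper) assumes per-user arrays of instantaneous RB rates R_{i,k} and historical throughputs R_i, picks for each RB the user with the largest rate-to-history ratio, and then updates the history with a standard exponential moving average; the averaging constant alpha is an assumption, not a value given in the chapter.

# Minimal sketch of proportional fair (PF) RB assignment following Eqs. (1)-(2).
# Assumed inputs: r[i][k] is the instantaneous rate (bps) of user i on RB k,
# hist[i] is the historical throughput (bps) of user i.

def pf_assign(r, hist, alpha=0.1):
    """Assign each RB to the user maximizing r[i][k] / hist[i] (Eq. (2)),
    then update the exponential moving average of per-user throughput."""
    num_users = len(r)
    num_rbs = len(r[0])
    allocation = {}                      # RB index -> chosen user index
    served = [0.0] * num_users           # rate served to each user this TTI
    for k in range(num_rbs):
        best = max(range(num_users), key=lambda i: r[i][k] / hist[i])
        allocation[k] = best
        served[best] += r[best][k]
    # Exponentially averaged throughput history (standard PF bookkeeping).
    new_hist = [(1 - alpha) * hist[i] + alpha * served[i] for i in range(num_users)]
    return allocation, new_hist

# Tiny usage example with 3 users and 4 RBs (made-up numbers).
rates = [[1.2e6, 0.8e6, 1.0e6, 0.5e6],
         [0.6e6, 1.5e6, 0.7e6, 0.9e6],
         [0.9e6, 0.4e6, 1.1e6, 1.3e6]]
history = [1.0e6, 0.8e6, 1.2e6]
alloc, history = pf_assign(rates, history)
print(alloc)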

4 Our Approach

In this section, we first present the challenging issue of data transmission from emergency IoT devices and then propose two uplink schedulers to address it.


4.1 Problem Statement

Generally speaking, an IoT network consists of a massive number of connected low-power devices that send small volumes of data traffic. In some situations, such as emergencies or failure detection and recovery, IoT devices could generate a large amount of data within a short period of time. For example, a phasor measurement unit (PMU) in the smart grid reports measurements at a low rate of 10 to 60 samples per second in normal cases. When an emergency occurs, the report rate can reach 15,000 samples per second [38]. The data pattern of a PMU in an emergency is illustrated in Fig. 2.

IoT devices generate bulk data when an emergency occurs, and numerous emergency IoT devices request connections at approximately the same time. In LTE/LTE-A, these connection requests activate a large number of random access procedures (used by UEs to initiate connections with the eNB in LTE/LTE-A), which results in heavy control signal collisions. Furthermore, the shortage of bandwidth prevents a number of IoT devices from being granted resources in a timely manner; they have to hold the data and wait for resources in a future TTI. In view of the characteristics of IoT devices (normally low cost and transmitting small data loads), emergency IoT devices have a limited ability to buffer data, so the emergency data, or fragments of it, could be lost before the emergency IoT device has the opportunity to send it out. Nonetheless, the integrity and timeliness of emergency data are crucial during emergency processing. Maintaining quality and stable communication from emergency IoT devices to the eNB is more critical than achieving a high overall throughput when an emergency situation occurs.

In summary, there are two key issues for IoT in an emergency: one is resource shortage, and the other is data integrity. For the first issue, increasing bandwidth is an obvious solution. Nonetheless, given that an IoT device usually supports limited uplink bandwidth, raising the uplink bandwidth makes a limited improvement on the bandwidth shortage for emergency data traffic. Reasonable resource allocation is more effective for increasing emergency data throughput and reducing the packet loss rate. In this paper, our QNRR-US introduces QoS into resource scheduling and attaches a higher priority to emergency data than to normal IoT data. The QoS-based resource allocation means that high-priority emergency data traffic can preempt resources from low-priority data flows when the channel resource is not sufficient for data transmission. Our QBRR-US is a QoS-aware scheduler like QNRR-US. Different from QNRR-US, QBRR-US builds particular scheduling queues based on priority.

Fig. 2 Emergency data pattern


QBRR-US confines the number of emergency IoT devices competing for resources through queue management. Only a subset of the emergency IoT devices take part in the resource allocation in each TTI. Furthermore, for those emergency IoT devices that obtain transmission resources, QBRR-US adopts a non-preemptive policy: once an emergency device starts data transmission, it is entitled to send the whole bulk of emergency data generated. QBRR-US alleviates transmission conflicts by reducing the number of emergency links connected at the same time. Moreover, the non-preemptive policy ensures that the bulk emergency data will be sent with minimal delay.

4.2 QNRR Uplink Scheduler

The QNRR uplink scheduler (QNRR-US) aims to promote the transmission performance of emergency information. Considering the different levels of importance of emergency IoT data and normal IoT data, QNRR-US introduces a QoS requirement into resource processing. It attaches a higher priority to the emergency data type than to normal data and records it in the QCI table. The QCI is a mechanism that 3GPP released in the LTE/LTE-A standard to ensure appropriate QoS for every bearer [39]. A QCI index has four main properties: resource type, priority level, packet delay budget, and packet loss rate. The scheduler can determine the volume of resources and the allocation order for bearers based on their QCI. To support emergency communication, 3GPP has successively added new QCI indexes for emergency situations, including QCI index 65 for Mission Critical Push-to-Talk (MCPTT) voice and index 69 for MCPTT signaling in Release 12. QNRR-US adds an emergency IoT data type with high priority. We set the emergency priority just below QCI indexes 65 and 69, which are usually used for public safety or first response. The high priority of emergency IoT data means that it is allowed to preempt RBs from almost all data types except those of QCI indexes 65 and 69.

After locating the priority level of emergency data, QNRR-US processes and forwards data traffic based on priority. In the normal situation, when the data traffic from all flows does not reach the bandwidth capacity, QNRR-US acts as a traditional QoS-aware uplink scheduler. There are three steps in scheduling for all IoT devices that request connection: (i) allocating the minimum data rate to all devices in the sequence of data traffic priority, (ii) allocating the remaining resources to the UE that currently has the highest priority, and (iii) if there are still resources left, repeating step (ii) for the next-highest-priority UE until all RBs are assigned.

Once an emergency occurs, the data generated by the emergency IoT devices usually overwhelms the bandwidth capacity. In this situation, QNRR-US still distributes the resources in priority order of the data traffic as in step (i), but without steps (ii) and (iii), since the eNB does not have enough resources for all devices. Then, in the next TTI, step (i) is repeated.


Since all emergency devices have the same priority, the eNB adopts a round robin algorithm to ensure that all of them have the opportunity to send data. To avoid starvation, QNRR-US reserves a fixed amount of bandwidth for low-priority data traffic. By introducing QoS into scheduling, emergency devices receive more resources than normal devices due to their higher priority.

We separate data flows into three categories: (i) data traffic of high priority level 1 (HD-L1), which is for public safety or first response, (ii) data traffic of high priority level 2 (HD-L2), which is emergency IoT data traffic (ED), and (iii) low-priority data traffic (LD), covering all normal data traffic. Notice that HD-L1 has the highest priority; HD-L2 is the second-highest priority, below only the traffic for crisis situations; and LD represents all other data flows with lower priority than ED traffic flows. The QNRR-US scheduling process is represented in Fig. 3. Three queues are built for devices with HD-L1, HD-L2, and LD traffic separately. From Fig. 3, we can see that HD-L1 data is allocated resources first. Then, all emergency devices take part in the round robin scheduling. In TTI 1, after RB allocation for devices with HD-L1, only emergency devices 1–5 (in HD-L2) get RBs because of the shortage of resources. In TTI 2, the eNB moves forward to distribute resources to the next group of emergency devices, 6–9, as the round robin algorithm demands. QNRR-US applies the round robin algorithm to all emergency devices. The communication performance between emergency devices and the eNB is improved compared to a non-QoS-aware scheduler because more resources are allocated to the high-priority ED. The devices with LD can only get resources from the RBs reserved for them.

Fig. 3 Queues and RBs allocation in QNRR-US
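For illustration only, the Python sketch below mirrors the per-TTI allocation order just described (HD-L1 first, then ED in round robin, LD confined to a reserved share). It is not the paper's implementation; it assumes a simplified model in which each device simply requests a fixed number of RBs per TTI.

from collections import deque

def serve_round_robin(queue, budget, alloc):
    """Serve devices from the front of `queue` until the RB budget runs out.
    Served devices move to the back of the queue, so the next TTI starts with
    the devices that were not served this time (round robin across TTIs)."""
    used = 0
    served = []
    while queue and used + queue[0][1] <= budget:
        dev, demand = queue.popleft()
        alloc[dev] = demand
        used += demand
        served.append((dev, demand))
    queue.extend(served)
    return used

def qnrr_allocate_tti(hd_l1, ed, ld, total_rbs, reserved_ld_rbs):
    """hd_l1, ed, ld: deques of (device_id, rb_demand), one entry per device."""
    alloc = {}
    budget = total_rbs - reserved_ld_rbs                 # open to HD-L1 and ED
    budget -= serve_round_robin(hd_l1, budget, alloc)    # public-safety traffic first
    serve_round_robin(ed, budget, alloc)                 # then emergency IoT data
    serve_round_robin(ld, reserved_ld_rbs, alloc)        # LD uses only its reserve
    return alloc

# Example: 50 RBs with 10 reserved for LD; 9 emergency devices needing 8 RBs each.
ed_queue = deque((f"ed{i}", 8) for i in range(1, 10))
ld_queue = deque([("ld1", 5), ("ld2", 5)])
print(qnrr_allocate_tti(deque(), ed_queue, ld_queue, total_rbs=50, reserved_ld_rbs=10))
print(qnrr_allocate_tti(deque(), ed_queue, ld_queue, total_rbs=50, reserved_ld_rbs=10))

In the example, emergency devices 1 to 5 are served in the first TTI and the remaining emergency devices move to the front for the next TTI, matching the behavior described for Fig. 3.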


4.3 QBRR Uplink Scheduler

QBRR-US adopts a QoS-aware strategy like QNRR-US, and the resource allocation steps for the normal situation in QBRR-US are the same as in QNRR-US. Nonetheless, when an emergency occurs, the eNB applies the round robin algorithm only to a subset of the emergency devices in QBRR-US, instead of to all emergency devices as in QNRR-US. The scheduling process of QBRR-US is illustrated in Fig. 4. In QBRR-US, the ED queue is separated into two parts: a service queue and a waiting queue. The eNB allocates resources to the devices in the HD-L1 queue, the HD-L2 (ED) service queue, and the LD queue. The devices in the ED waiting queue have to wait until some devices dequeue from the ED service queue after their data transmission is complete. For example, from Fig. 4, we can see that devices 1–5 get the RBs in TTI 1. In TTI 2, the same devices 1–5 obtain the RBs, and this continues until they have sent out their bulk emergency data. Then, those devices are dequeued from the ED service queue, and the devices in the waiting queue move into the service queue and join the resource allocation. At the beginning, when an emergency appears, the ED service queue is empty and all emergency devices enter the waiting queue under a first-come-first-served policy. The first device in the waiting queue enters the service queue and the eNB allocates RBs to it. Afterwards, the next device moves from the waiting queue to the service queue for RBs, until the available bandwidth capacity has run out. Then, QBRR-US handles resource allocation as illustrated in Fig. 4.

The RB allocation algorithm and the queue maintenance algorithm in QBRR-US are described in Algorithms 1 and 2, respectively. We use Q_HD, Q_LD, Q_SED, and Q_WED to denote the HD-L1 queue, the LD queue, the ED service queue, and the ED waiting queue, respectively. Let D represent a UE. The eNB holds an allocation map of RBs, defined as M(RB, D). Denote by N_RB the number of RBs of the network bandwidth; the item (RB_i, D_s) in map M indicates that RB_i is allocated to UE_s, where 1 ≤ i ≤ N_RB and s ∈ N.

Fig. 4 Queues and RBs allocation in QBRR-US

Table 1 Notations for Algorithms 1 and 2

RB_i: RB i
D_s: UE s
N_RB: Number of RBs
R_min: Minimum data rate
R_RB_i: Data rate of RB_i
M(RB, D): Allocation map of RBs
(RB_i, D_s): RB_i is assigned to UE_s
Q_HD: Queue of UEs with high-priority data
Q_SED: Service queue of UEs with emergency data
Q_WED: Waiting queue of UEs with emergency data
Q_LD: Queue of UEs with low-priority data

If a D_s is in Q_HD, we record it as D_s ∈ Q_HD, and similarly for D_s ∈ Q_LD, D_s ∈ Q_SED, and D_s ∈ Q_WED. All notation for Algorithms 1 and 2 is listed in Table 1.

Algorithm 1 RBs allocation in QBRR-US
Require: RB map M(RB_i, D_s), Q_HD, Q_SED, Q_LD
Ensure: Updated M(RB_i, D_s) for next TTI
1: while there exists a non-allocated RB_i do
2:   if Q_HD is not empty then
3:     D_s ⇐ dequeue element from Q_HD
4:     for current R_min of D_s > 0 do
5:       allocate RB_i to D_s
6:       update current R_min by subtracting R_RB_i
7:     end for
8:   else if Q_SED is not empty and (N_RB left) > (N_RB for LD) then
9:     D_s ⇐ dequeue element from Q_SED
10:    for current R_min of D_s > 0 do
11:      allocate RB_i to D_s
12:      update current R_min by subtracting R_RB_i
13:    end for
14:  else if Q_LD is not empty then
15:    D_s ⇐ dequeue element from Q_LD
16:    for current R_min of D_s > 0 do
17:      allocate RB_i to D_s
18:      update current R_min by subtracting R_RB_i
19:    end for
20:  end if
21: end while

Algorithm 1 indicates the RB allocation process of QBRR-US in one TTI when an emergency occurs. As a precondition, the devices requesting resources should have entered the scheduling queues, including the HD queue, the ED service queue, and the LD queue. At the beginning of a new TTI, the eNB starts allocating RBs one by one. If there is an available RB, the eNB checks the scheduling queues in priority order (i.e., HD, ED, and LD), and the RB is allocated to the first device in the scheduling queues. Given the shortage of resources, QBRR-US allocates a minimum data rate to each device. Thus, after each RB allocation, the eNB checks whether the allocated data rate meets the device's minimum data rate requirement. If so, it moves on to the next device; if not, it allocates the next RB to the same device. Algorithm 1 checks every RB and binds it to a device; thus, the time complexity of Algorithm 1 is linear in the number of RBs, denoted O(N_RB).

Algorithm 2 Queues maintenance in QBRR-US
Require: Q_HD, Q_SED, Q_WED, Q_LD
Ensure: Updated queues of UEs for next TTI
1: for each new UE requesting connection do
2:   enter Q_HD, Q_WED, or Q_LD based on priority level
3: end for
4: for all UE in Q_HD do
5:   if UE has data to send then
6:     distribute RBs to UE through Algorithm 1
7:     update remaining data volume
8:   else
9:     dequeue UE from Q_HD
10:  end if
11: end for
12: for all UE in Q_SED do
13:   if UE has data to send then
14:     distribute RBs to UE through Algorithm 1
15:     update remaining data volume
16:   else
17:     dequeue UE from Q_SED
18:     move an element from Q_WED to Q_SED
19:   end if
20: end for
21: for all UE in Q_LD do
22:   if UE has data to send then
23:     distribute RBs to UE through Algorithm 1
24:     update remaining data volume
25:   else
26:     dequeue UE from Q_LD
27:   end if
28: end for

Algorithm 2 illustrates the queue maintenance of QBRR-US in an emergency situation. QBRR-US maintains four queues: the HD, LD, ED service, and ED waiting queues. Besides the UE index, each element in a scheduling queue also records the volume of data that remains to be transmitted. All devices enter the corresponding queues based on the priority level of their data flows. After the eNB allocates RBs to a device in a TTI, the remaining data volume of the device is reduced.


The eNB adjusts the amount of remaining data of the devices in the queues. If there is no data left, which means the device has sent out its whole bulk of emergency data, it is dequeued from the scheduling queue. In particular, when a device is dequeued from the ED service queue, the first element in the waiting queue moves forward to the service queue. Algorithm 2 checks the elements in the queues and allocates resources for them; thus, the time complexity of Algorithm 2 is linear in the number of activated devices, denoted O(N_UE).
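As an illustration only, the Python sketch below mirrors the service/waiting-queue discipline of Algorithms 1 and 2 under simplifying assumptions (demand is counted in whole RBs, one RB carries a fixed number of bytes, the minimum-rate grant is a fixed RB count, and HD-L1 traffic is omitted); all the constants are assumptions for the example, and this is not the NS-3 implementation used in the paper.

from collections import deque

RBS_PER_TTI = 50          # e.g. a 10 MHz uplink, as in the evaluation (assumption)
BYTES_PER_RB = 100        # simplified fixed payload per RB (assumption)
RESERVED_LD_RBS = 10      # RBs reserved for low-priority traffic (assumption)
MIN_RBS_PER_UE = 8        # grant realizing the minimum data rate (assumption)

def qbrr_tti(service, waiting, ld, max_served):
    """One TTI of QBRR-US (sketch of Algorithms 1 and 2).
    service, waiting: deques of [device_id, remaining_bytes] for emergency UEs.
    ld: deque of [device_id, remaining_bytes] for low-priority UEs.
    max_served: how many emergency UEs the ED service queue may hold."""
    # Queue maintenance (Algorithm 2): refill the service queue from the
    # waiting queue, first come first served, up to max_served devices.
    while waiting and len(service) < max_served:
        service.append(waiting.popleft())

    # RB allocation (Algorithm 1): serve the ED service queue first, granting
    # each device up to its minimum-rate share and keeping the LD reserve.
    budget = RBS_PER_TTI - RESERVED_LD_RBS
    for dev in service:
        if budget == 0:
            break
        grant = min(budget, MIN_RBS_PER_UE, -(-dev[1] // BYTES_PER_RB))  # ceil division
        dev[1] -= min(dev[1], grant * BYTES_PER_RB)
        budget -= grant
    # Non-preemptive service: a device leaves only once its bulk data is done.
    for dev in [d for d in service if d[1] == 0]:
        service.remove(dev)

    # LD devices may only use the reserved share.
    ld_budget = RESERVED_LD_RBS
    for dev in ld:
        if ld_budget == 0:
            break
        grant = min(ld_budget, MIN_RBS_PER_UE, -(-dev[1] // BYTES_PER_RB))
        dev[1] -= min(dev[1], grant * BYTES_PER_RB)
        ld_budget -= grant

# Example: 9 emergency UEs with 4000 bytes of bulk data each, 5 served at a time.
waiting = deque([f"ed{i}", 4000] for i in range(1, 10))
service, ld = deque(), deque([["ld1", 2000]])
for tti in range(6):
    qbrr_tti(service, waiting, ld, max_served=5)
    print(f"TTI {tti + 1}: in service {[d[0] for d in service]}")

In this sketch the first five emergency devices stay in the service queue until their bulk data is finished, after which the waiting devices move in, which is the behavior depicted in Fig. 4.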

5 Performance Evaluation

In this section, we first describe the experiment designed to validate the proposed QBRR-US, along with the network configuration and the parameters of the simulation scenarios. Then, we demonstrate the uplink communication performance in terms of three key metrics (i.e., throughput, PLR, and delay) in an emergency as the number of IoT devices increases on distinct bandwidths. Finally, we compare the uplink transmission performance of the PF, QNRR, and QBRR uplink schedulers. The experimental results validate that the proposed QBRR-US effectively improves the transmission performance with respect to emergency throughput, PLR, and packet delay.

5.1 Simulation Scenarios

We design three scenarios in our experiment: the PF, QNRR, and QBRR scenarios. Specifically, the PF scenario adopts the PF uplink scheduler for uplink transmission. The PF scheduler aims to achieve the highest network throughput by distributing RBs according to the quality of the transmission channel; devices in a good channel state get more resources for data transmission. The QNRR and QBRR scenarios adopt the QNRR and QBRR uplink schedulers, respectively. Both are QoS-aware schedulers: they maintain scheduling queues of IoT devices according to the priority of their data traffic, and the eNB allocates RBs based on these scheduling queues. In the QNRR scenario, all UEs enter the scheduling queues and compete for resources, whether in normal or emergency situations. In the QBRR scenario, when the requested data rate exceeds the bandwidth capacity, as in an emergency, some devices are excluded from resource allocation and held in the waiting state until other devices have sent out their continuous bulk data and released RBs.

5.2 Scope of Experiment

In our experiment, the following key metrics are defined and employed to evaluate the performance of IoT networks over LTE/LTE-A networks:


(i) Throughput. It is defined as the number of bits per second (bps) that the eNB receives from the devices that report emergency information. As uploading is the main process of IoT applications, we evaluate the uplink performance of the LTE/LTE-A network. The devices in an emergency send out a burst of data packets to the eNB within a short period of time. This metric reflects the volume of data that the eNB receives from emergency devices per unit time at the PDCP layer.
(ii) Packet Loss Ratio. It is the ratio of the data bytes lost (i.e., sent from the IoT devices but not received by the eNB) to the total data bytes transmitted per second. It is also collected at the PDCP layer.
(iii) Delay. It is measured as the average transmission time of packets from the IoT devices to the eNB. Delay is collected at the PDCP layer, which means that every delay value is the time interval from the moment when a packet is sent out from the device's PDCP layer to the moment when the eNB's PDCP layer receives the packet.
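For concreteness, the small Python sketch below (not from the paper) computes the three metrics from per-packet send/receive records as they might be traced at the PDCP layer; the record layout is an assumption made for the example.

# Sketch: compute throughput, packet loss ratio, and average delay from
# assumed per-packet PDCP-layer trace records.

def uplink_metrics(sent, received, duration_s):
    """sent: {packet_id: (send_time_s, size_bytes)} from emergency devices.
    received: {packet_id: receive_time_s} at the eNB.
    duration_s: length of the observation window in seconds."""
    received_bytes = sum(size for pid, (_, size) in sent.items() if pid in received)
    sent_bytes = sum(size for _, size in sent.values())
    throughput_bps = received_bytes * 8 / duration_s
    loss_ratio = 1.0 - received_bytes / sent_bytes if sent_bytes else 0.0
    delays = [received[pid] - t for pid, (t, _) in sent.items() if pid in received]
    avg_delay_s = sum(delays) / len(delays) if delays else float("nan")
    return throughput_bps, loss_ratio, avg_delay_s

# Tiny example: 3 packets of 38 bytes sent, 2 received.
sent = {1: (0.000, 38), 2: (0.001, 38), 3: (0.002, 38)}
received = {1: 0.015, 3: 0.020}
print(uplink_metrics(sent, received, duration_s=1.0))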

5.3 Network Infrastructure and Parameters

In the following, we describe the network structure and configuration details of our experiment. The simulation in our paper focuses on the wireless communication part, i.e., the Evolved Universal Terrestrial Radio Access (E-UTRA) section of LTE. E-UTRA consists of eNBs and UEs. We build a single-cell wireless network in NS-3. NS-3 is an open source network platform for research and education [23]. It provides a detailed LTE module and trace methods to simulate the LTE network and measure network performance, and it allows the designer to create nodes, protocols, and applications separately. The flexibility of NS-3 assists researchers in building complicated networks as required. In our experiment, we position a macro cell (also called eNB in LTE), and all UEs are dispersed around it following a uniform distribution within its coverage. For the three scenarios described above, we set 1/6 of the UEs as emergency IoT devices that generate large amounts of emergency data, while the other UEs remain in the normal situation, with lower priority and periodically generated data. We ignored HD-L1 data traffic in the experiment setting because this type of data has no impact on our scheduling performance evaluation: HD-L1 traffic gets the same volume of resources in the QNRR and QBRR scenarios all the time, and in the PF scenario all UEs are processed in the same way regardless of type. All parameters of our IoT simulation over the LTE/LTE-A network in NS-3 are listed in Table 2.

Data Pattern. We adopt the data load profile of a phasor measurement unit (PMU) in the smart grid as the data pattern [40], as an example to demonstrate our approaches. The smart grid belongs to critical energy IoT systems. A PMU is a smart device used to measure AC waveforms (voltages and currents) on an electricity grid. It reports the measurement results periodically, with a reporting rate that depends on the actual application. Commonly, the reporting rate ranges from 10 to 120 samples per second in a smart grid [40]. We choose a PMU reporting rate of 48 samples per second for the normal situation. In an emergency situation, a PMU generates approximately 5000–15,000 samples per second [38].

Table 2 NS-3 simulation parameters

UE Tx Power: 23 dBm
UE Height: 1 m
UE Category: CAT-0
eNB Tx Power: 46 dBm
eNB Height: 30 m
eNB Sensitivity Level: −93.5 dBm
UlEarfcn: 18,100
Uplink Central Band Frequency: 1930.0 MHz
DlEarfcn: 100
Downlink Central Band Frequency: 2120.0 MHz
Antennas: SISO (Single Input Single Output)
Uplink Data Rate: 1 Mbps
Propagation Loss Module: COST231
Transport Layer Protocol: UDP (User Datagram Protocol)
Uplink Bandwidth (MHz): 1.4, 3, 5, 10, 15, 20
Number of Uplink Resource Blocks: 6, 15, 25, 50, 75, 100
Simulation Time: 30 s

We set the reporting rate to 7000–7500 samples per second to simulate the emergency situation. One PMU sample is a packet of 38 bytes. The total sampling data rate of an emergency UE therefore ranges from 7000 samples/s × 38 bytes × 8 bits ≈ 2.128 Mbps to 7500 samples/s × 38 bytes × 8 bits = 2.28 Mbps.

Network Environment. We position all UEs within the eNB coverage in a suburban environment. The coverage distance is determined by the transmitter power, the receiver sensitivity, and the transmission environment. First, the uplink transmitters are the UEs. We adopted the CAT-0 category on the UE end, a new UE category designed for IoT and M2M networks in LTE Release 12; according to 3GPP TS 36.101, its transmission power was set to 23 dBm. Second, following the 3GPP LTE standard TS 36.104, the eNB radio reception sensitivity level is −98.8 dBm for 1.4 MHz, −95 dBm for 3 MHz, and −93.5 dBm for the other bandwidths of 5, 10, 15, and 20 MHz [41]. We consider −93.5 dBm as the eNB radio sensitivity level to guarantee the quality on all bandwidths. Finally, we assumed the wireless network was located in a suburban environment; NS-3 provides the COST231 propagation loss model to mimic such an environment. Given the settings of the UE power, the eNB sensitivity level, and the propagation model above, the transmission coverage for the suburban environment in our experiment is about 327 m. We set it to 300 m to rule out the impact of radio attenuation on the simulation results.
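As a quick sanity check on these numbers (a back-of-the-envelope Python sketch, not part of the paper's simulation), the emergency offered load of a single PMU-like device already exceeds the 1 Mbps uplink data rate of a CAT-0 UE, and a few tens of emergency devices produce an aggregate load the cell cannot carry, which is why the scheduler rather than extra bandwidth has to protect the emergency traffic:

# Back-of-the-envelope check of the emergency offered load, using the values
# from the text: 38-byte samples, 48 samples/s normal, 7000-7500 samples/s emergency.

SAMPLE_BYTES = 38

def pmu_rate_bps(samples_per_s):
    return samples_per_s * SAMPLE_BYTES * 8

normal = pmu_rate_bps(48)
emergency_low = pmu_rate_bps(7000)      # ~2.128 Mbps
emergency_high = pmu_rate_bps(7500)     # ~2.28 Mbps

print(f"normal UE load:    {normal / 1e6:.3f} Mbps")
print(f"emergency UE load: {emergency_low / 1e6:.3f} to {emergency_high / 1e6:.3f} Mbps")
print(f"50 emergency UEs:  {50 * emergency_high / 1e6:.1f} Mbps aggregate offered load")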


Uplink Bandwidth Setting. According to 3GPP Specification 36.306, a CAT-0 UE supports uplink bandwidths of 1.4, 3, 5, 10, 15, and 20 MHz; the corresponding numbers of RBs are 6, 15, 25, 50, 75, and 100. A CAT-0 UE has only one antenna, and its uplink and downlink data rates are reduced to 1 Mbps compared to CAT-1. In the experiment, we first evaluated the uplink communication performance with heavy data traffic on all six common bandwidths that CAT-0 supports. Since the simulation results show similar tendencies across the six bandwidths, we selected the middle bandwidth of 10 MHz (50 RBs) for our three evaluation scenarios (i.e., the PF, QNRR, and QBRR scenarios).

5.4 Network Performance in Emergency

Figures 5, 6 and 7 show the uplink transmission performance (i.e., throughput, PLR, and delay) with an increasing number of emergency UEs on all the distinct uplink bandwidths that CAT-0 supports. As a result of the heavy traffic load in an emergency, the total throughput approximately equals the bandwidth capacity and therefore does not reflect changes in communication performance; the throughput in Fig. 5 is the average throughput of individual UEs. From Fig. 5, we can see that the average throughput falls quickly as the number of UEs grows from 5 to 50, regardless of the bandwidth. This shows that extending the bandwidth is not an effective way to maintain uplink throughput in this heavy-load situation. Furthermore, the packet loss ratio also grows when more UEs start to transmit emergency information. From Fig. 6, we can see that the PLR increases from around 20% to over 80% when the number of activated UEs grows from 5 to 50 on the 10 MHz bandwidth. The same tendency appears for the average packet delay shown in Fig. 7: an increase in active emergency UEs results in a rise in the average delay of packet transmission from the UEs to the eNB.

Fig. 5 Average throughput in emergency (average throughput per UE versus the number of UEs, from 5 to 50, for each CAT-0 bandwidth from 1.4 to 20 MHz)


Fig. 6 Packet loss ratio in emergency (PLR versus the number of UEs, from 5 to 50, for each CAT-0 bandwidth from 1.4 to 20 MHz)

Fig. 7 Average delay in emergency (average delay in seconds versus the number of UEs, from 5 to 50, for each CAT-0 bandwidth from 1.4 to 20 MHz)

5.5 Uplink Schedulers Analysis in Emergency Based on the experimental results in Figs. 5, 6 and 7, the transmission performance deteriorates quickly in an emergency IoT network. To improve the uplink performance of IoT in emergency situations, we propose two uplink scheduling algorithms, QNRR-US and QBRR-US. The three scenarios (i.e., PF, QNRR, and QBRR) stated in Sect. 5.1 are considered and implemented in NS-3. We assume that 1/6 of the IoT devices are activated to report a chunk of continuous emergency data when an emergency occurs. That means that among 30 UEs, there are 5 emergency UEs and 25 normal UEs. Similarly, the largest number of UEs (300) is composed of 50 emergency UEs and 250 normal UEs. The


Fig. 8 Throughput of emergency UEs (throughput versus the number of UEs, from 30 to 300, for the PF, QNRR, and QBRR schedulers)

Fig. 9 PLR of emergency UEs (packet loss ratio versus the number of UEs, from 30 to 300, for the PF, QNRR, and QBRR schedulers)

communication performance (e.g., throughput, PLR, and delay) of the emergency UEs in the PF, QNRR, and QBRR scenarios is displayed in Figs. 8, 9 and 10. Notice that the improvement of emergency data transmission is our main focus; we therefore display the emergency throughput in Fig. 8. We can see that the emergency throughput in the QBRR scenario is much higher than in the PF and QNRR scenarios, especially when more IoT devices are activated as emergency devices. For example, when the number of UEs reaches 300, which means there are 50 emergency UEs and 250 normal UEs, the throughput of emergency UEs under the QBRR scheduler is over 7 times that under the PF scheduler, and still almost 4 times that under the QNRR scheduler. The PF scheduler distributes the RBs according to channel quality and does not distinguish between normal UEs and emergency UEs; all UEs compete for the resources at the same time. Thus, emergency UEs do not get

Fig. 10 Average delay of emergency UEs (average delay in seconds versus the number of UEs, from 30 to 300, for the PF, QNRR, and QBRR schedulers)

more resources even if they carry important and large amounts of data to transmit. Both QBRR-US and QNRR-US allocate most RBs to emergency UEs through QoS-aware scheduling; they effectively sacrifice the communication performance of normal UEs to ensure transmission for emergency UEs. Furthermore, QBRR confines the number of simultaneous connections between emergency UEs and the eNB to reduce traffic congestion. The QBRR-US that we propose thus achieves an effective improvement in emergency throughput. Figure 9 demonstrates the packet loss ratio in the PF, QNRR, and QBRR scenarios. From the figure, we can see that the QBRR scenario also achieves the best performance over the PF and QNRR scenarios; the PLR of the QBRR scheduler remains the lowest of the three. When the number of UEs increases above 120, which means more than 20 UEs are in the emergency state, the minimum PLR is approximately 60% for QNRR and 80% for PF. In the same situation, however, the PLR for the QBRR scheduler stays around 30%. Figure 10 illustrates the packet delay in the PF, QNRR, and QBRR scenarios. The packet delay increases as the number of emergency UEs grows. Among the three scenarios, the packet delay in the PF scenario rises most rapidly, and the QNRR scenario outperforms the PF scenario. In the QBRR scenario, the delay of packet transmission between emergency UEs and the eNB remains the smallest (below 0.1 s). Through the analysis of the simulation results, we conclude that the QBRR uplink scheduler effectively improves the uplink transmission performance with respect to emergency throughput, PLR, and packet delay in an IoT emergency situation. Compared with the traditional PF scheduler and the QNRR scheduler (another QoS-aware scheduler we investigated), QBRR is more practical for scheduling uplink traffic in an emergency situation.
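To make the queueing idea behind QBRR-US more concrete, the following Python sketch illustrates, under our own simplifying assumptions (unit-rate service, one promotion step per round), how a scheduler could confine the number of simultaneously served emergency UEs with a service queue and a waiting queue. It is a conceptual illustration only, not the NS-3 scheduler implementation evaluated above, and all names in it are ours.

```python
from collections import deque

def qbrr_rounds(emergency_backlog, max_served=5, units_per_round=1):
    """Toy model of QBRR-style scheduling: at most `max_served` emergency UEs
    occupy the service queue at a time while the rest wait. Each round, every
    served UE drains `units_per_round` of its backlog; a UE that finishes
    frees its slot for the head of the waiting queue. Returns the number of
    rounds needed to drain all emergency data."""
    waiting = deque(emergency_backlog.items())   # (ue_id, remaining_units)
    service = {}
    rounds = 0
    while waiting or service:
        while waiting and len(service) < max_served:   # promote waiting UEs
            ue, backlog = waiting.popleft()
            service[ue] = backlog
        for ue in list(service):                       # serve one round
            service[ue] -= units_per_round
            if service[ue] <= 0:
                del service[ue]                        # done: release the slot
        rounds += 1
    return rounds

# Example: 10 emergency UEs, each with 3 units of backlog, at most 5 served at once.
print(qbrr_rounds({f"ue{i}": 3 for i in range(10)}))   # -> 6
```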


6 Final Remarks In this paper, to address the severe deterioration of data transmission performance in emergency situations with bandwidth shortages, we propose the QNRR and QBRR uplink schedulers. In particular, QNRR-US introduces a high priority level for emergency data and allocates resources according to the priority of the service. Based on QNRR-US, QBRR-US further confines the number of activated emergency IoT devices that simultaneously communicate with the eNB. QBRR-US splits the emergency IoT devices into service and waiting devices: the service devices (placed in a service queue) are activated to send data until they have sent out the entire bulk of emergency data, after which devices in the waiting queue are promoted to the service queue to obtain resources. After simulating the three IoT scenarios with the PF, QBRR, and QNRR uplink schedulers in NS-3, the experimental results validate that our QBRR-US can effectively improve emergency throughput, PLR, and packet delay between emergency IoT devices and the eNB. Acknowledgements The work was supported in part by the US National Science Foundation (NSF) under grant CNS 1350145. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agency.

References 1. Lin, J., Yu, W., Zhang, N., Yang, X., Zhang, H., & Zhao, W. (2017, October). A survey on Internet of Things: architecture, enabling technologies, security and privacy, and applications. IEEE Internet of Things Journal, 4(5), 1125–1142. 2. Gao, W., Nguyen, J. H., Yu, W., Lu, C., Ku, D. T., & Hatcher, W. G. (2017, October). Toward emulation-based performance assessment of constrained application protocol in dynamic networks. IEEE Internet of Things Journal, 4(5), 1597–1610. 3. Yu, W., Liang, F., He, X., Hatcher, W. G., Lu, C., Lin, J., et al. (2018). A survey on the edge computing for the Internet of Things. IEEE Access, 6, 6900–6919. 4. Xu, H., Yu, W., Griffith, D., & Golmie, N. (2018). A survey on industrial Internet of Things: A cyber-physical systems perspective. IEEE Access, 6, 78238–78259. 5. Mallapuram, S., Ngwum, N., Yuan, F., Lu, C., & Yu, W. (2017, May). Smart city: The state of the art, datasets, and evaluation platforms. In 2017 IEEE/ACIS 16th International Conference on Computer and Information Science (ICIS) (pp. 447–452). 6. Lin, J., Yu, W., Yang, X., Yang, Q., Fu, X., & Zhao, W. (2015, June). A novel dynamic en-route decision real-time route guidance scheme in intelligent transportation systems. In 2015 IEEE 35th International Conference on Distributed Computing Systems (pp. 61–72). 7. Yang, Q., An, D., Min, R., Yu, W., Yang, X., & Zhao, W. (2017, July). On optimal pmu placement-based defense against data integrity attacks in smart grid. IEEE Transactions on Information Forensics and Security, 12(7), 1735–1750. 8. Wang, W., Wang, Q., & Sohraby, K. (2017). Multimedia sensing as a service (MSaaS): Exploring resource saving potentials of at cloud-edge IoT and fogs. IEEE Internet of Things Journal, 4(2), 487–495. 9. Xu, J., Guo, H., & Wu, S. (2018, October). Indoor multi-sensory self-supervised autonomous mobile robotic navigation. In 2018 IEEE International Conference on Industrial Internet (ICII) (pp. 119–128).


10. Wu, S., Rendall, J. B., Smith, M. J., Zhu, S., Xu, J., Wang, H., et al. (2017, June). Survey on prediction algorithms in smart homes. IEEE Internet of Things Journal, 4(3), 636–644. 11. Yao, R., Wang, W., Farrokh-Baroughi, M., Wang, H., & Qian, Y. (2013). Quality-driven energyneutralized power and relay selection for smart grid wireless multimedia sensor based IoTs. IEEE Sensors Journal, 13(10), 3637–3644. 12. Xu, G., Yu, W., Griffith, D., Golmie, N., & Moulema, P. (2017, February). Toward integrating distributed energy resources and storage devices in smart grid. IEEE Internet of Things Journal, 4(1), 192–204. 13. Yang, Q., Yang, J., Yu, W., An, D., Zhang, N., & Zhao, W. (2014, March). On false datainjection attacks against power system state estimation: Modeling and countermeasures. IEEE Transactions on Parallel and Distributed Systems, 25(3), 717–729. 14. Lin, J., Yu, W., Yang, X., Xu, G., & Zhao, W. (2012, April). On false data injection attacks against distributed energy routing in smart grid. In 2012 IEEE/ACM Third International Conference on Cyber-Physical Systems (pp. 183–192). 15. Lin, J., Yu, W., & Yang, X. (2016, January). Towards multistep electricity prices in smart grid electricity markets. IEEE Transactions on Parallel and Distributed Systems, 27(1), 286–302. 16. Akpakwu, G. A., Silva, B. J., Hancke, G. P., & Abu-Mahfouz, A. M. (2018). A survey on 5G networks for the Internet of Things: Communication technologies and challenges. IEEE Access, 6, 3619–3647. 17. Yu, W., Xu, H., Zhang, H., Griffith, D., & Golmie, N. (2016, August). Ultra-dense networks: Survey of state of the art and future directions. In 2016 25th International Conference on Computer Communication and Networks (ICCCN) (pp. 1–10). 18. Global internet of things market size 2009–2019. https://www.statista.com/statistics/485136/ global-internet-of-things-market-size/ (Online). Accessed May 2, 2018. 19. Al-Zihad, M., Akash, S. A., Adhikary, T., & Razzaque, M. A. (2017, December). Bandwidth allocation and computation offloading for service specific IoT edge devices. In 2017 IEEE Region 10 Humanitarian Technology Conference (R10-HTC) (pp. 516–519). 20. Wang, L., Zhang, X., Wang, S., & Yang, J. (2017). An online strategy of adaptive traffic offloading and bandwidth allocation for green M2M communications. IEEE Access, 5, 6444– 6453. 21. Wu, Y., Yu, W., Griffith, D.W., & Golmie, N. (2018). Modeling and performance assessment of dynamic rate adaptation for M2M communications. IEEE Transactions on Network Science and Engineering, 1. 22. Yu, W., Xu, H., Nguyen, J., Blasch, E., Hematian, A., & Gao, W. (2018). Survey of public safety communications: User-side and network-side solutions and future directions. IEEE Access, 6, 70397–70425. 23. nsnam. ns-3 network simulator. https://www.nsnam.org/. Accessed November 4, 2018. 24. Sesia, S., Baker, M., & Toufik, I. (2011). LTE-the UMTS long term evolution: From theory to practice. Chichester: Wiley. 25. Abu-Ali, N., Taha, A. M., Salah, M., & Hassanein, H. (2014). Uplink scheduling in LTE and LTE-advanced: Tutorial, survey and evaluation framework. IEEE Communications Surveys & Tutorials, 16(3), 1239–1265. 26. Ghavimi, F., Lu, Y., & Chen, H. (2017, July). Uplink scheduling and power allocation for M2M communications in sc-fdma-based LTE-A networks with QoS guarantees. IEEE Transactions on Vehicular Technology, 66(7), 6160–6170. 27. Ragaleux, A., Baey, S., & Karaca, M. (2017, August). Standard-compliant LTE—A uplink scheduling scheme with quality of service. 
IEEE Transactions on Vehicular Technology, 66(8), 7207–7222. 28. Yu, C., Yu, L., Wu, Y., He, Y., & Lu, Q. (2017). Uplink scheduling and link adaptation for narrowband Internet of Things systems. IEEE Access, 5, 1724–1734. 29. Liu, Q., Zoppi, S., Tan, G., Kellerer, W., & Steinbach, E. (2017, October). Quality-of-controldriven uplink scheduling for networked control systems running over 5G communication networks. In 2017 IEEE International Symposium on Haptic, Audio and Visual Environments and Games (HAVE) (pp. 1–6).


30. Elhamy, A., & Gadallah, Y. (2015). BAT: A balanced alternating technique for M2M uplink scheduling over LTE. In 2015 IEEE 81st Vehicular Technology Conference (VTC Spring) (pp. 1–6). IEEE. 31. Carlesso, M., Antonopoulos, A., Granelli, F., & Verikoukis, C. (2015). Uplink scheduling for smart metering and real-time traffic coexistence in LTE networks. In 2015 IEEE International Conference on Communications (ICC) (pp. 820–825). IEEE. 32. Wang, C., Kuo, J., Yang, D., & Chen, W. (2018, December). Surveillance-aware uplink scheduling for cellular networks. IEEE Transactions on Mobile Computing, 17(12), 2939–2952. 33. He, Y., Li, N., Xie, W., & Wang, C. (2017, October). Uplink scheduling and power allocation with M2M/H2H co-existence in LTE—A cellular networks. In 2017 IEEE 17th International Conference on Communication Technology (ICCT) (pp. 528–533). 34. Chuang, T., Tsai, M., & Chuang, C. (2015, March). Group-based uplink scheduling for machine-type communications in LTE-advanced networks. In 2015 IEEE 29th International Conference on Advanced Information Networking and Applications Workshops (pp. 652–657). 35. Amarasekara, B., Ranaweera, C., Evans, R., & Nirmalathas, A. (2017). Dynamic scheduling algorithm for lte uplink with smart-metering traffic. Transactions on Emerging Telecommunications Technologies, 28(10), e3163. 36. Wu, Y., Yu, W., Zhang, J., Griffith, D., Golmie, N., & Lu, C. (2018). A 3D topology optimization scheme for M2M communications. In 2018 19th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD) (pp. 15–20). IEEE. 37. Wu, Y., Cui, Y., Yu, W., Lu, C., & Zhao, W. (2019). Modeling and forecasting of timescale network traffic dynamics in m2m communications. In Proceedings of 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS). IEEE. 38. Zhao, X., Laverty, D. M., McKernan, A., Morrow, D. J., McLaughlin, K., & Sezer, S. (2017, September). GPS-disciplined analog-to-digital converter for phasor measurement applications. IEEE Transactions on Instrumentation and Measurement, 66(9), 2349–2357. 39. Policy and charging control architecture. 3GPP Standard, (TS 23.203 release 15), 2017. 40. Ieee standard for synchrophasors for power systems c37.118. (IEEE C37.118.2), 2011. 41. Evolved universal terrestrial radio access (e-utra); base station (bs) radio transmission and reception. 3GPP Standard, (TS36.104 release 12), 2015.

Emulation-Based Performance Evaluation of the Delay Tolerant Networking (DTN) in Dynamic Network Topologies

Weichao Gao, Hengshuo Liang, James Nguyen, Fan Liang, Wei Yu, Chao Lu and Mont Orpilla

Abstract Delay Tolerant Networking (DTN) is designed to achieve reliable data transmission in resource-constrained networks. When it is applied to dynamic networks, there is a lack of research evaluating its performance with thorough consideration of the impacts of the key factors of network components, as well as of the configuration of the DTN implementation. In this paper, we address the performance of DTN in dynamic networks by conducting a series of emulation-based experiments. In particular, we first design an emulation platform based on the Common Open Research Emulator (CORE), which can be used to emulate and evaluate the network performance of DTNs and other protocols in dynamic networks. We then design a set of scenario groups and conduct a thorough quantitative evaluation to understand the impacts of individual dynamic network factors on the performance of DTNs. The experimental results validate the performance and reliability of the DTN in dynamic networks. Based on the experimental results, we are able to provide general guidelines for evaluating an application for the potential benefits of DTN, and directions for optimizing the configuration for different applications in dynamic networks.

W. Gao · H. Liang · F. Liang · W. Yu (B) · C. Lu Department of Computer and Information Sciences, Towson University, Towson, MD 21252, USA e-mail: [email protected] W. Gao e-mail: [email protected] H. Liang e-mail: [email protected] F. Liang e-mail: [email protected] C. Lu e-mail: [email protected] J. Nguyen · M. Orpilla US Army Command, Control, Computers, Communications, Cyber, Intelligence, Surveillance and Reconnaissance Center, Aberdeen, MD, USA e-mail: [email protected]

© Springer Nature Switzerland AG 2020 R. Lee (ed.), Software Engineering Research, Management and Applications, Studies in Computational Intelligence 845, https://doi.org/10.1007/978-3-030-24344-9_2


Keywords Delay Tolerant Networking · Dynamic networks · Performance evaluation · Emulation · Internet of Things

1 Introduction Dynamic networks, as networks that facilitate communication between mobile devices and networking infrastructure, have been deployed to support numerous applications, such as public safety [1] and Internet-of-Things (IoT) driven smart-world systems [2–10]. On one hand, the flexible structure of dynamic networks enables the mobility of networked devices, facilitating communication and data transmission in applications and scenarios with multiple moving objects. On the other hand, mobility inherently increases the difficulty of both data transmission and management in the network, due to changes in the network structure during data transmission. A number of applications are required to operate under dynamic networks, in which network resources are limited and the network structure changes over time, including public safety, smart-world systems, and more [1, 2, 11, 12]. To address the issues of data transmission in dynamic networks with constrained resources, existing protocols (the Constrained Application Protocol (CoAP) [13, 14] and IPv6 over Low-Power Wireless Personal Area Networks (6LoWPAN), among others) have been designed to efficiently transmit data through network devices with limited network resources. For example, Gao et al. [14] carried out a performance evaluation of CoAP in constrained IoT scenarios with both fixed and dynamic topologies in comparison with traditional protocols such as HTTP (HyperText Transfer Protocol). In networks with dynamic topologies, these protocols are able to increase the reliability of data transmission by reducing packet overhead and rearranging retransmission schedules to adapt to unstable connections and moving nodes in the network. Nonetheless, these protocols require the connection to be available all the way from the source node to the destination node during the data transmission time, and any disconnection between any pair of nodes along the path could cause the transmission to fail. This poses a significant challenge when the number of hops between the source and destination in the dynamic network increases, raising the likelihood of data transmission failure. To ensure reliable data delivery, Delay Tolerant Networking (DTN) [15–17] has been proposed. Generally speaking, the DTN is designed with a store-and-forward mechanism to handle intermittent connectivity, in which data is forwarded to the next hop and stored locally whenever the connection is not available. When such hop-by-hop transmission with storage is enabled at forwarding nodes in the routing path, it is no longer necessary to maintain the connections continuously and simultaneously from source to destination. By being able to endure long and variable delays, as well as disruptive connections, the DTN becomes a viable technique to provide reliable data transmission in a number of extreme network deployment scenarios and shows enormous potential in supporting applications that require dynamic networks.


To date, research and development efforts have been conducted toward evaluating the DTN with respect to routing algorithms, scalability, optimization, applications, security, and platforms [18–34]. Nonetheless, there remains a lack of research on the practical performance of DTN in dynamic networks with thorough consideration of the impacts of the key factors of network components, as well as of the configuration of the DTN implementation. In addition, little research has been conducted on developing guidelines for configuring the DTN to meet different performance requirements from diverse applications. To fill this gap, in this paper we conduct a quantitative evaluation of data transmission of the DTN in dynamic network scenarios using emulation, and compare the results with TCP (Transmission Control Protocol) as the baseline scheme for providing reliable data transmission. Our experimental results demonstrate the positive benefits of the DTN with respect to both performance and reliability in dynamic networks, especially those with disruptive network connections. In this work, we offer the following three key contributions.
• We establish an emulation-based platform using the Common Open Research Emulator (CORE). The designed platform is generic and can be used to evaluate the performance of the DTN, as well as various other protocols.
• We conduct a thorough and quantitative evaluation that considers the impact of each individual factor of the dynamic network on the performance of the DTN. The key factors include the distance, disruption, and mobility of the network components. For our evaluation, we have designed four scenario groups, where each represents one type of basic scenario and the comparison is conducted by adjusting the values of one key factor at a time.
• Based on the results of our evaluation, we provide guidelines to determine how to deploy DTN to support applications with the desired performance by optimizing the configuration of parameters.
The remainder of this paper is organized as follows: In Sect. 2, we review the specifications and protocols of the DTN and the implementation used in our evaluation. In Sect. 3, we introduce the design rationale and our approach for the quantitative evaluation based on an analysis of both protocols and scenarios. In Sect. 4, we present the detailed design of the experiments, the general configuration of the evaluation environment, and the scenario groups. In Sect. 5, we report and summarize the results of the evaluation. In Sect. 6, we conduct a brief literature review of DTN. Finally, we conclude the paper in Sect. 7.

2 Preliminaries In this section, we briefly introduce DTN, Bundle Protocol, and DTN implementation.


2.1 DTN As the dynamic topologies of MANETs affect end-to-end data transmission, the objective of DTN is to provide reliable transport between two endpoints in constrained or stressed networking environments. DTN development was initially driven by space communications (e.g., communication between satellites and with ground stations). As designed, DTN is deployed above the traditional transport layer of TCP or UDP and below the application layer of the TCP/IP stack. DTN is implemented based on the store-and-forward mechanism, in which the payload of a message sent by a sending endpoint is divided into a series of data blocks called a bundle, stored and kept for later use, and sent to next-hop neighbors. The next-hop neighbors then store and relay the bundle to their own next-hop neighbors. The process continues until the receiving endpoint receives all the data blocks of the bundle. A routing protocol for neighbor discovery and bundle forwarding mechanisms that determine which data blocks need to be forwarded to the next-hop DTN neighbor can also be applied to reduce the traffic of bundle forwarding. The data transmission is hop-by-hop and does not require the connection of the entire path to be continuously and simultaneously available, so that reliable transmission can be realized in the dynamic network. Generally speaking, a network supported by DTN commonly has the following features (a simple retransmission model illustrating the last point is sketched after this list): • Intermittent Connectivity. While TCP must maintain end-to-end connections between source and destination, DTN can support communications in which the full end-to-end path may not be available but some partial path exists. • Long or Variable Delay. To support intermittent connectivity, the total propagation delay along all forwarding nodes, combined with the variable queuing delay of each node, is significantly more than what TCP can handle. In contrast, DTN is required to endure long delays in data transmission. • High Error Rates. To overcome bit errors in data transmission links, data correction or the retransmission of packets is necessary, but this could lead to additional data traffic over the network. Nonetheless, for a given bit error rate, fewer retransmissions are needed when the data transmission is hop-by-hop compared with end-to-end, because the data packets confirmed by the intermediate nodes do not have to be resent from all the way back at the originating source node.
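The last bullet can be illustrated with a rough back-of-the-envelope model. Assuming, purely for illustration, independent packet loss with probability p on each of h links and retransmission until success, the expected number of link transmissions per delivered packet can be compared between end-to-end and hop-by-hop recovery as follows (the functions below are our own sketch, not part of any DTN implementation):

```python
def expected_tx_end_to_end(p: float, hops: int) -> float:
    """Expected link transmissions per delivered packet when only the source
    retransmits (end-to-end recovery), with independent loss probability p
    per link. Illustrative model only."""
    success = 1.0 - p
    per_attempt = sum(success ** i for i in range(hops))  # links reached per attempt
    return per_attempt / (success ** hops)                # divided by success prob.

def expected_tx_hop_by_hop(p: float, hops: int) -> float:
    """Expected link transmissions when every hop stores and retransmits locally."""
    return hops / (1.0 - p)

for hops in (1, 2, 3, 4):
    print(f"{hops} hop(s), 20% loss: "
          f"end-to-end {expected_tx_end_to_end(0.2, hops):.2f} vs "
          f"hop-by-hop {expected_tx_hop_by_hop(0.2, hops):.2f}")
```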

2.2 Bundle Protocol and ibr-dtn As stated earlier, the Bundle Protocol is the main component of the DTN. It specifies that a series of contiguous data blocks be packaged into a bundle, in which each bundle contains payload blocks as well as the information to enable the communication for the application. Bundles are routed in a store-and-forward mechanism between participating nodes over different network transport technologies (e.g., IP-based and


non-IP-based transports). The transport layers that carry the bundles across their local links are called bundle convergence layers. In this study, we use ibr-dtn as the DTN implementation for carrying out the performance evaluation. Our evaluation environment can also be used to evaluate the performance of other DTN implementations, which is part of our ongoing research. Notice that ibr-dtn is an open-source Bundle Protocol framework [35]; the detailed design can be found in RFC 5050 [36]. The features of ibr-dtn include the implementation of the Bundle Security Protocol (RFC 6257) [37], a socket-based Application Program Interface (API), AgeBlock support and bundle age tracking, support for compressed bundle payloads, support for bundle-in-bundle, support for IPv6, and support for several routing modules, such as the Probabilistic Routing Protocol using History of Encounters and Transitivity (PRoPHET). It is worth mentioning that PRoPHET is the routing module that we use in our evaluation.
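As a rough mental model of what a bundle carries, the sketch below defines a toy bundle record with a few of the fields described in RFC 5050 (source and destination endpoint identifiers, creation timestamp, lifetime, and payload). It is only an illustration of the concept; it does not reproduce the ibr-dtn data structures, and the class and field names are ours.

```python
from dataclasses import dataclass, field
from typing import Optional
import time

@dataclass
class ToyBundle:
    """Minimal stand-in for a DTN bundle: a payload plus the metadata needed
    to route and expire it (illustrative only, not the ibr-dtn data model)."""
    source_eid: str                     # e.g. "dtn://node-a/app"
    destination_eid: str                # e.g. "dtn://node-b/app"
    payload: bytes
    lifetime_s: int = 3600              # bundle is discarded after this age
    created_at: float = field(default_factory=time.time)

    def expired(self, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        return now - self.created_at > self.lifetime_s

# A forwarding node would store ToyBundle objects and relay the unexpired
# ones whenever a next-hop neighbor becomes reachable (store-and-forward).
bundle = ToyBundle("dtn://source/app", "dtn://sink/app", b"sensor data")
print(bundle.expired())   # False right after creation
```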

3 Our Approach To evaluate the performance of DTN in dynamic network topologies, it is critical to analyze the key features of the dynamic network environment and the particular DTN application to determine the scope of service, as well as relevant performance metrics. In the following, we first introduce the design rationale of the experiment in detail, and then analyze the features of both the DTN protocol and dynamic networks. Based on the analysis, we then identify the objectives and performance indicators of the subsequent experiments.

3.1 Design Rationale The objective of this study is to assess the performance of DTN in dynamic networks. Thus, our approach is to understand the key factors of the dynamic network itself, as well as the features and parameters of the DTN implementation. In our prior research [14, 38], we characterized the key factors of dynamic networks as distance, disruption, and mobility. By leveraging combinations of these key factors in a set of scenarios, we can validate the performance of the DTN (with different parameter values) with respect to the key performance metrics of data delivery status, delivery time, and system overhead. While the DTN is implemented over TCP in most cases, we set up a control group with TCP only, which we define as the Bare-TCP Group, to serve as the baseline for validating the effectiveness and efficiency of the DTN-enabled groups in the different scenarios.

3.2 DTN Features Recall that the DTN is designed for networks in which the network topology changes frequently. A store-and-forward mechanism, which is a hop-by-hop


transmission, is designed to tolerate unstable networks. A set of parameters in the DTN implementation is exposed through its configuration, including the storage size limit, the maximum lifetime of bundles, neighborhood discovery, the routing strategy, and the timeout between handshakes, among others. Of all these parameters, two are considered the most important to the performance of data transmission: (i) the limit of bundle fragments in transit, and (ii) the limit of payload in the fragments. In the ibr-dtn implementation, the number of fragments in transit is limited. This means that any fragment that would exceed the limit instead waits for the next handshake, triggered either manually (e.g., a new dtnsend command) or by the timeout. Since the timeout is set to a ten-minute interval by default, the system would be idle for ten minutes each time the number of fragments to send exceeds the limit, leading to quite a long latency (a simple model of this waiting behavior is sketched at the end of this subsection). In real-world practice, there are three ways to deal with such a latency. First, the limit of payload in the fragments can be increased so that the original data is split into a smaller number of total fragments. Second, by splitting the original data into multiple bundles and sending them asynchronously, the sending would be triggered more quickly. Third, by manipulating the limit of bundle fragments in transit, the number of fragments allowed in transit simultaneously can be manually increased. In this paper, as we focus on the availability and reliability of DTN in dynamic networks, we do not consider the wait time between handshakes, and the limit of bundle fragments is set high enough to allow all fragments to be sent without waiting. Notice that in Sect. 4.2, below, we describe the detailed setup of a Preliminary Group scenario for our experiment to verify the two aforementioned parameters (i.e., bundle fragments in transit and payload in fragments). The results of the Preliminary Group are presented in Sect. 5.1, which shows the impacts of these two parameters and helps us select appropriate values for the other scenario groups in the performance evaluation.
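The idle time described above can be approximated with a simple model of our own (it is not part of ibr-dtn): fragments beyond the in-transit limit wait for the next handshake, so a transfer of n fragments incurs roughly ceil(n / limit) − 1 waiting periods of up to the handshake timeout.

```python
import math

def waiting_periods(file_size_kib: float, payload_limit_kib: float,
                    fragment_limit: int) -> int:
    """Number of handshake waits for one bundle under the simplified model
    that at most `fragment_limit` fragments are in transit per handshake."""
    fragments = math.ceil(file_size_kib / payload_limit_kib)
    return max(math.ceil(fragments / fragment_limit) - 1, 0)

# Example: a 5 MB file split into 500 KiB payload fragments.
for limit in (5, 10, 15, 20, 25):
    waits = waiting_periods(5 * 1024, 500, limit)
    print(f"fragment limit {limit}: {waits} waiting period(s) of up to 10 minutes")
```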

3.3 Features of Dynamic Networks In our prior research [14, 38], we characterize the factors of dynamic networks as distance, disruption, and mobility, elaborating upon each as follows: (i) Distance. In this paper, distance is represented by the number of hops from the source to the destination. It is worth noting that the physical distance between adjacent nodes does not cause as high a latency as does the data processing in each node, and can be ignored in our experiments. Thus, we use the number of hops as the control variable for the distance. (ii) Disruption. It represents the instability of the connection and the frequency with which packets are lost during transmission. In our experiments, we can control the packet loss rate to represent different levels of disruption in the network. (iii) Mobility. The movement of nodes in the network could result in disconnections and reconnections, as well as obvious topology changes in the network. We use the simplest case of a node moving in a linear fashion, disconnecting from


the network by moving away and reconnecting by moving back, which greatly simplifies the mobility model. The time intervals between disconnection and reconnection are controlled variables in the experiment. To thoroughly evaluate the performance of the DTN and the impacts of the many factors of the dynamic network, we design a set of scenario groups for our evaluation, in which each group is designed specifically to assess one particular factor. The configuration details are described in Sect. 4.2.

3.4 Scope of Experiments According to the feature analysis of both the DTN protocol and the dynamic network, we define the evaluation objective and performance metrics. The objective of the experiment is to evaluate the performance of the DTN (with the ibr-dtn implementation) in dynamic networks. Using bare-TCP as the baseline, we compare data transmission performance to demonstrate the effectiveness and efficiency of DTN in the dynamic network. We use the following key performance metrics for the evaluation: • Delivery Status: A transfer is defined as successful if the original data is delivered to the destination with its integrity intact, and as failed otherwise. In our experiment, failure occurs when packets are not delivered, indicating the effectiveness or ineffectiveness of the scheme. • Delivery Time: It is defined as the time taken from the original data being sent from the source to its delivery at the destination, indicating the efficiency of the scheme. • Overhead Ratio: It is defined as the ratio of the overhead in the network to the size of the original payload data. The overhead is computed by subtracting the payload size from the sum of all outbound bytes of all nodes in the network during the transmission; it includes packet headers, retransmitted bytes, and any other bytes needed to make the transmission successful (see the short computation sketch after this list). The number of packets is recorded so that the packet loss rate can be computed.
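For clarity, the overhead ratio defined above can be computed as in the following small helper, which reflects our reading of the definition; the function and argument names are ours, not code from the measurement scripts.

```python
def overhead_ratio(outbound_bytes_per_node, payload_bytes):
    """Overhead ratio (%) as defined above: all outbound bytes observed at all
    nodes during the transfer, minus the original payload, over the payload."""
    total_outbound = sum(outbound_bytes_per_node)
    return 100.0 * (total_outbound - payload_bytes) / payload_bytes

# Example: a 1 MB payload that produced 1.2 MB, 1.1 MB, and 1.1 MB of outbound
# traffic at the source and two forwarding nodes, respectively.
print(f"{overhead_ratio([1_200_000, 1_100_000, 1_100_000], 1_000_000):.1f}%")  # 240.0%
```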

4 Experiment Design In this section, we introduce the detailed experimental design, including the emulation-based evaluation environment, data traffic and collection, and scenarios.

4.1 Generic Configuration

4.1.1 Emulation-Based Testing Environment

As mentioned above, our experiment is carried out through emulation of network devices and communication. We utilized the emulation tool Common Open


Research Emulator (CORE, v5.1) [39] to realize the emulation environment, running on Ubuntu Linux (v16.04 LTS) operating system. While the emulated nodes are wirelessly connected in the dynamic network, we use OLSRv2 for the required routing protocol in the baseline group (Bare-TCP). To be specific, we use the olsrd2 [40] implementation, which can be acquired from the OLSR.org Network Framework (OONF). In addition, an NS-3 [41] compatible script is used to control the mobility, and a random mobility pattern can be generated by BonnMotion (v2.13). As stated in Sect. 2, we use ibr-dtn [35] as the DTN implementation, and PRoPHET [42] for the DTN routing. It is worth mentioning that PRoPHET is designed for DTN routing in wireless networks and it can support DTN neighborhood discovery as well as communication. Thus, it does not rely on other protocols for routing (such as OLSRv2) in mobile wireless networks.

4.1.2 Data Traffic and Collection

To present a typical use case, we consider file transfer as the data traffic in our evaluation. In the Bare-TCP Group described in Sect. 3.1, we implement a simple Python program for FTP (File Transfer Protocol) between the server and client pair deployed to the source and destination nodes in the emulated network. This program records the sending and receiving times in the format of a timestamp. In the DTN group, the file transfer is performed by embedded commands dtnsend and dtnrecv, with timestamps recorded as well. To track the related traffic in the network, tcpdump is running on all the active nodes, filtering TCP packets for both inbound and outbound traffic. These records are easily processed with the awk scripts that we have developed.
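As a rough illustration of the kind of measurement harness described here, the sketch below shows a minimal timestamped TCP file receiver of the sort one might pair with tcpdump. It is our own simplification for illustration; it is not the actual FTP program or awk scripts used in the experiments, and the port and file name in the usage comment are hypothetical.

```python
import socket
import time

def receive_file(listen_port: int, out_path: str) -> float:
    """Accept one TCP connection, write the received bytes to out_path, and
    return the elapsed receive time in seconds (receiver-side timestamping)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(("", listen_port))
        srv.listen(1)
        conn, _addr = srv.accept()
        start = time.time()                    # timestamp when the transfer begins
        with conn, open(out_path, "wb") as out:
            while True:
                chunk = conn.recv(65536)
                if not chunk:                  # sender closed the connection
                    break
                out.write(chunk)
        return time.time() - start

# Usage (hypothetical): elapsed = receive_file(5000, "received.bin")
```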

4.2 Scenarios According to the objectives and performance indicators that we defined in Sect. 3.4, we have designed the following four scenario groups; for reference, the parameter combinations are also restated in a short summary after the list. As the experiment targets dynamic networks, all connections described below are wireless connections by default. • Preliminary Group: In this group, we intend to verify the impacts of two key parameters of the DTN and PRoPHET configuration: the first is the limit of bundle fragments in transit and the second is the limit of payload in the fragments. We set up a five-node topology in which all nodes are fixed in position with stable connections. In the ibr-dtn implementation, the default values of the limit of bundle fragments and the limit of payload are 5 and 500 KiB, respectively. In our assessment, however, we vary the limit of fragments over {5, 10, 15, 20, 25} and the payload limit over {250 KiB, 500 KiB}. Meanwhile, the number of hops is varied as {1, 2, 3, 4} and the file size is fixed at 5 MB to create the scenarios.


• Group 1: In this group, all nodes are fixed in position and all connections are considered as stable. The distance (i.e., the number of hops from the source node to the destination node) is controlled as the key variable of the network topology. Another important variable is the file size, as we use file transfer for the data traffic. In this group, we combine assessments of the number of hops {1, 2, 3, 4} and the file size {1 MB, 10 MB, 100 MB} to create the scenarios. In addition, the fragment size in the DTN group is controlled as the third variable, which is varied by the set {25 KiB, 50 KiB, 200 KiB, 800 KiB}. • Group 2: In this group, the topology is the same fixed topology as in Group 1, but the wireless network connection is now disruptive. We keep the distance parameter (i.e., the number of hops) in the same range of {1, 2, 3, 4} as the key variable, and fixing the file size to 1 MB to better adapt the disruptive network. The variable of packet loss rate is varied as the set {10%, 20%, 30%} to evaluate the tolerance to disruption. In addition, the fragment size in the DTN group is varied again as {25 KiB, 100 KiB}. • Group 3: In this group, the topology becomes dynamic. A three-node topology is emulated with source and destination nodes being fixed and a third mobile intermediate node added in between. While the number of hops is fixed to 2, the intermediate node moves out of range of both source and destination immediately after the file transfer starts. The intermediate node returns and recovers the connections after a controlled time interval, which is set as the key variable by the set {1 s, 10 s, 30 s, 60 s, 120 s, 300 s, 600 s, 1200 s, 1800 s}. Meanwhile, we keep the file transfer size to 2 MB and vary the fragment size variable as {25 KiB, 250 KiB} in the DTN group. • Group 4: In this group, the topology is dynamic and remains the same as in Group 3 (i.e., a three-node topology consisting of source, mobile intermediate, and destination). In this group, however, the mobility pattern is distinct from that of Group 3. In this case, the Group 4 intermediate node never connects to both source and destination nodes simultaneously. Instead, the mobile intermediate moves back and forth between the source and destination as a “Data Mule”. The moving follows a pattern of “2 s, 2 s, 2 s, 2 s,” which means that it connects to the source or destination node for 2 s, then disconnects (out of the range for both ends) for 2 s, and reconnects to the other node for 2 s, and then disconnects again for another 2 s. Thus, it takes 8 s for each entire round. Meanwhile, we vary the file transfer size by {1 MB, 10 MB, 50 MB} and the fragment size variable is varied by the set {25 KiB, 100 KiB, 400 KiB}.
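For reference, the parameter combinations of the scenario groups above can be restated compactly; the following dictionary simply re-encodes the values listed in the bullets, and the key names are ours.

```python
# Scenario parameters re-encoded from the group descriptions above.
SCENARIO_GROUPS = {
    "Preliminary": {"hops": [1, 2, 3, 4], "file_size_MB": [5],
                    "fragment_limit": [5, 10, 15, 20, 25],
                    "payload_limit_KiB": [250, 500]},
    "Group 1": {"hops": [1, 2, 3, 4], "file_size_MB": [1, 10, 100],
                "fragment_size_KiB": [25, 50, 200, 800]},
    "Group 2": {"hops": [1, 2, 3, 4], "file_size_MB": [1],
                "packet_loss_pct": [10, 20, 30],
                "fragment_size_KiB": [25, 100]},
    "Group 3": {"hops": [2], "file_size_MB": [2],
                "leaving_time_s": [1, 10, 30, 60, 120, 300, 600, 1200, 1800],
                "fragment_size_KiB": [25, 250]},
    "Group 4": {"topology": "data mule, 8 s per round",
                "file_size_MB": [1, 10, 50],
                "fragment_size_KiB": [25, 100, 400]},
}

for name, params in SCENARIO_GROUPS.items():
    print(name, params)
```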

5 Performance Evaluation Based on the experimental design outlined in Sect. 4, we now detail the evaluation results. In the following, we first explain the result of the Preliminary Group to assess the impacts of the DTN configuration parameters. The detailed configurations of remaining scenario Groups 1–4 are derived from the results of the Preliminary


Group. The results of Groups 1–4, representing the impacts of the factors defined in Sect. 3.3 of the dynamic network, are also presented.

5.1 Impact of DTN Parameters As we mentioned in Sect. 3.2, the limit of bundle fragments in transit and the limit of payload in the fragments are two key parameters that affect the performance of data transfer in DTN. Figure 1 illustrates that the delivery time of a fixed-size file (5 MB in the experiment) is inversely proportional to the number of allowed fragments in transit; that is, as the allowed fragments increase, the delivery time decreases. In addition, Fig. 2 illustrates the accumulated data sent from the source node over time,

Fig. 1 Impact of fragment size and allowed fragments in transit (delivery time versus the limit of fragments in transit, for ibr-dtn@250k and ibr-dtn@125k)

Fig. 2 Accumulated data sent from source node (accumulated bytes versus time, for fragment limits of 5, 10, 15, 20, and 25)


where waiting periods are indicated by increases in time (parallel to x-axis). As we can see from the figure, the incremental delivery times are caused by the fragments that wait for the next handshakes. We can also observe that, with the increase of the limit of fragments, more data is sent at each handshake, which further reduces the occurrence of waiting periods.

5.2 Impact of Distance in Fixed Networks In Group 1, we compare the performance of ibr-dtn with different values of fragment size limit against the traditional TCP (non-DTN) network for file transfer in fixed network topologies. Table 1 illustrates the round trip times (RTTs) in milliseconds and demonstrates that TCP incurs much shorter RTTs than ibr-dtn, which is expected, as the latter requires additional bundle processing. Meanwhile, Fig. 3 shows the delivery times for different sizes of data transfers in routes varied by number of hops, representing the different distances between sources and destinations. As we can see from the figure, TCP consistently achieves the best performance in fixed topologies with stable connections. In addition, the delivery times increase approximately linearly with the increase in number of hops between source and destination for all groups. Moreover, for ibr-dtn, a higher limit of fragment size results in faster delivery, since it reduces the processing time needed to split and reassemble bundles. Finally, in Fig. 4, we show the evaluation results of overhead. It is worth noting that all ibr-dtn groups and the bare-TCP group require similar overhead for all distances, especially

Table 1 RTT of TCP versus ibr-dtn in stable networks

Hops     TCP (ms)   ibr-dtn (ms)
1 hop    0.067      5.721
2 hops   0.070      9.094
3 hops   0.092      11.627
4 hops   0.094      16.290

Fig. 3 Delivery time influenced by distance (delivery time versus the number of hops for data sizes of 1 MB, 10 MB, and 100 MB; TCP compared with ibr-dtn at 25k, 50k, 200k, and 800k fragment size limits)

Fig. 4 Overhead ratio impacted by distance (overhead ratio versus data size for 1, 2, 3, and 4 hops; TCP compared with ibr-dtn at 25k, 50k, 200k, and 800k fragment size limits)

when the data size is small. When the data size increases, the ibr-dtn groups with smaller fragment size limits require more overhead, as more fragments are generated for the same data. To conclude, each incremental hop increases the delivery time and overhead linearly under stable connections. Additionally, it is clear that there is no significant advantage to using ibr-dtn instead of bare-TCP in the fixed network with stable connections. Nonetheless, a larger limit of fragment size could lead to better performance of DTN.

5.3 Impact of Disruption in Fixed Networks When the connection becomes disruptive, data packets are lost during transmission. This packet loss triggers retransmission according to the design of TCP, resulting in more overhead and longer data delivery times. Table 2 illustrates the delivery status of the schemes (i.e., ibr-dtn and TCP) in our Group 2 experiments. Notice from the table that TCP becomes much less reliable as disruption increases, ultimately failing to deliver the file in the majority of scenarios. In contrast, ibr-dtn is clearly much more reliable in a disruptive network environment, successfully delivering the file in all scenarios. In addition, Figs. 5 and 6 indicate that TCP takes more time and requires more overhead than ibr-dtn to deliver the file in the cases where it even succeeded at completing the delivery. This is because TCP requires the entire path, from source to destination, to be available during the transmission, and packet loss is amplified exponentially with the number of hops, reducing the delivery rate dramatically. In comparison, ibr-dtn delivers the data fragments whenever the path to the next hop is available. The completed fragments are stored at every hop of the path so that failed fragments do not have to start over from the source node. In the meantime, ibr-dtn nodes detect and connect to neighbors by following the PRoPHET routing protocol, keeping the retransmission active much longer than the TCP timeout. Figure 5 also indicates that data delivery is faster for ibr-dtn in disruptive networks when the limit of fragment size is smaller. This is because a fragment with

Table 2 Delivery status in disrupted networks

Hops     Scheme          Packet loss: 10%   20%         30%
1-hop    TCP             Delivered          Failed      Failed
1-hop    ibr-dtn@25k     Delivered          Delivered   Delivered
1-hop    ibr-dtn@100k    Delivered          Delivered   Delivered
2-hops   TCP             Delivered          Failed      Failed
2-hops   ibr-dtn@25k     Delivered          Delivered   Delivered
2-hops   ibr-dtn@100k    Delivered          Delivered   Delivered
3-hops   TCP             Failed             Failed      Failed
3-hops   ibr-dtn@25k     Delivered          Delivered   Delivered
3-hops   ibr-dtn@100k    Delivered          Delivered   Delivered
4-hops   TCP             Failed             Failed      Failed
4-hops   ibr-dtn@25k     Delivered          Delivered   Delivered
4-hops   ibr-dtn@100k    Delivered          Delivered   Delivered

Fig. 5 Delivery time influenced by disruptive connection (delivery time versus the number of hops for packet loss rates of 10%, 20%, and 30%; TCP, ibr-dtn@25k, and ibr-dtn@100k)

Fig. 6 Overhead ratio influenced by disruptive connection (overhead ratio versus packet loss rate for 1, 2, 3, and 4 hops; TCP, ibr-dtn@25k, and ibr-dtn@100k)

a smaller payload contains fewer data packets and is thus more likely to be confirmed as complete (i.e., all data packets of the fragment are received) at the next hop, reducing the chance of retransmitting the entire fragment. To conclude, TCP yields worse performance and reliability in networks with disruptive connections, as it requires all packets to successfully traverse the entire


path of transmission without fail, end-to-end. As DTN transmission is hop-by-hop, the disruption is less amplified by the distance, resulting in better reliability. Meanwhile, smaller fragments in ibr-dtn are more likely to be delivered, leading to lower retransmission and overhead.
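The intuition that smaller fragments are more likely to arrive intact can be illustrated with a rough calculation of our own, assuming independent per-packet loss and a nominal packet payload size (the 1400-byte default below is our assumption, not a measured value): the probability that every packet of a fragment crosses one hop without loss is (1 − p) raised to the number of packets in the fragment.

```python
import math

def fragment_complete_prob(fragment_kib: float, loss_rate: float,
                           packet_payload_bytes: int = 1400) -> float:
    """Probability that every packet of one fragment crosses a single hop
    without loss, assuming independent losses (illustrative model only)."""
    packets = math.ceil(fragment_kib * 1024 / packet_payload_bytes)
    return (1.0 - loss_rate) ** packets

for size_kib in (25, 100):
    prob = fragment_complete_prob(size_kib, 0.10)
    print(f"{size_kib} KiB fragment at 10% packet loss: {prob:.3f}")
```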

5.4 Impact of Mobility in Dynamic Networks While Groups 1 and 2 evaluated the impacts of distance and disruption in the fixed topology, Group 3 represents the impact of mobile nodes. Table 3 illustrates the delivery status of the candidates transferring a 2 MB file while the intermediate node moves. While TCP could only tolerate about 10 min of disconnection from the path before the session times out, ibr-dtn is able to complete the data transfer even after 30 min of intermediate node disconnection. We can also observe from Fig. 7 that the curve of delivery time over leaving time becomes straight after 10 min of waiting, indicating that ibr-dtn does not require retransmission as TCP does, but instead recovers the transmission as soon as the leaving node is reconnected. Moreover, the sizes of the fragments do not greatly affect the retransmission, as disconnection occurs only once during the transmission. To conclude, when nodes are in motion, TCP performs worse overall, and eventually fails entirely with the increase in disconnection duration. Packet retransmission scheduled as per the TCP configuration is soon terminated when the timeout is reached. In contrast, the PRoPHET protocol in ibr-dtn will detect and connect neighbors and recover the fragment transfer as soon as the link becomes available, resulting in better performance in tolerating disconnections.

Table 3 Delivery status in dynamic networks

Leaving time   TCP         ibr-dtn@25k   ibr-dtn@250k
0 s            Delivered   Delivered     Delivered
1 s            Delivered   Delivered     Delivered
10 s           Delivered   Delivered     Delivered
30 s           Delivered   Delivered     Delivered
60 s           Delivered   Delivered     Delivered
120 s          Delivered   Delivered     Delivered
300 s          Delivered   Delivered     Delivered
600 s          Delivered   Delivered     Delivered
1200 s         Failed      Delivered     Delivered
1800 s         Failed      Delivered     Delivered

Fig. 7 Delivery time impacted by mobile nodes (delivery time versus leaving time for TCP, ibr-dtn@25k, and ibr-dtn@250k)

5.5 Data Mule Pattern In the Data Mule scenarios (Group 4), data transfer using traditional TCP would never succeed, since there is never a complete and uninterrupted path from source to destination at any time. Thus, only DTN can make the data transfer possible in this scenario. Figure 8 validates the reliability of ibr-dtn in the data mule scenarios and demonstrates the delivery time of the ibr-dtn with different values of data size and fragment size, with delivery time given in number of rounds (recall that each round is 8 s in our design). We can also see that when the data size is small, the size of fragment limit does not have a significant impact. Nonetheless, as the data size increases, transmission with larger fragments is much faster, since it takes more time

Fig. 8 Delivery time in data mule pattern (delivery time in number of rounds versus data size for ibr-dtn@25k, ibr-dtn@100k, and ibr-dtn@400k)


to split the data into smaller fragments. Although frequent disconnection favors small fragments, a non-disruptive network favors the opposite. To conclude, TCP transmission is not feasible in the data mule scenario and DTN is the only choice. To optimize the performance, the parameters of DTN should be carefully chosen.

6 Related Works Recent efforts to progress the state of DTN have been aimed at making improvements, including through optimizations [20, 30], routing algorithms [18, 19, 21, 24, 33], security [29, 34], reliability [27], and verifiability [22, 25, 26, 32]. For example, Bulut et al. [43] proposed a cost-effective multi-period spraying algorithm for routing in DTN. Other works in DTN have sought to apply the protocol to new systems and applications, including satellite communications, social networks [28], disaster recovery [31], and resource constrained networks [23]. In addition to the DTN implementation outlined in this work, several other DTN implementations have been developed [35, 44]. In addition, several others exist that have been designed for simulating DTN environment to research particular features, such as movement models and routing messages. These simulation-based implementations include the ONE simulator [45], which is a Java-based DTN simulation environment developed by Nokia Research Center. In contrast, in this paper, we have focused our study on ibr-dtn [35], introduced in Sect. 2, since it has complete support for PRoPHET routing protocol.

7 Final Remarks In this paper, we have conducted a performance assessment of DTN in dynamic networks, leveraging an emulation-based testing platform and designing a series of scenario groups as representative dynamic network environments. Using this platform, we deployed the ibr-dtn implementation and conducted a quantitative evaluation of the DTN to validate its performance under dynamic network topologies in comparison with TCP. The experimental results show that, in unstable network environments, DTN not only performs better than TCP in data transmission with respect to delivery time and overhead, but also achieves high reliability of data delivery. Our evaluation and experimentation show how each key factor in dynamic networks affects the performance of DTN, which can be used to provide guidelines for optimized DTN configuration for particular applications under unique network conditions. Our evaluation results also show that DTN is quite adept at sending small amounts of data in scenarios in which the network is disrupted, constrained, or unsustained. These scenarios can represent real-world cases and applications such as disaster recovery, environment monitoring, and endpoint control in IoT.


Future directions of this work will focus on the optimization of DTN parameters, cross-domain communications (DTN and non-DTN domains), and other applications in heterogeneous networks and IoT, among others.


Teaching Distributed Software Architecture by Building an Industrial Level E-Commerce Application Bingyang Wei, Yihao Li, Lin Deng and Nicholas Visalli

Abstract Software architecture design is a key component of software engineering. Teaching it effectively requires that instructors come up with challenging real-world projects in which students must recognize architectural problems, figure out solutions, compare alternatives and make decisions. Keeping students motivated and interested in the topic is equally critical. However, truly understanding and appreciating a software architecture is not easy for most undergraduate students. Most architecture courses either focus too heavily on theoretical concepts or fail to immerse students in complex industrial-level projects, unless the instructors happen to be practicing software engineers. In this paper, we propose a Project-Based Learning experience that brings an open-source, full-fledged system into the classroom in order to teach distributed software architecture effectively. Students are introduced to best practices that are widely used in industry to solve some of today's common architectural problems. Our ultimate goal is to establish a public code repository of realistic projects from popular industrial sectors that instructors can draw on to improve the learning experience for software architecture, even if they are not practicing software engineers.

Keywords Distributed architecture · Microservices · DevOps · Software engineering education


1 Introduction

A software architecture course teaches students system-thinking skills. In a typical architecture course, students are taught numerous theoretical concepts and principles; simple example software systems are often introduced to motivate students, provide some context, and help students link what they learned to the "real world" [1]. Unfortunately, due to the page limitations of architecture textbooks, most example systems are toy systems, too simple to expose enough architectural issues [2]. This conflicts with the requirement that training software architects means contending with the problem of how to make the learning realistic. For students without any industrial experience, the result is a frustrating learning experience in which everything is dry and abstract. From the instructors' perspective, many young faculty who just spent the last five years working on their Ph.D. may have little experience in industrial software development; industry relevance and alignment need to be improved. Our concern is backed by feedback from graduates of the authors' universities. Graduates reported that many popular architectural solutions that have already become de facto standards in industry are never taught thoroughly at school, and that the teaching materials are seriously outdated. In other words, the gap between undergraduate software engineering education and industry is growing. As educators, we need to align students' academic knowledge with industrial settings so that they can transition to their first job smoothly. In this paper, we present a Project-Based Learning (PBL) approach [3] to teach concepts in distributed software architecture. Using this approach, students learn a step-by-step process of developing a real industrial-level E-commerce system as each architectural concept is taught. The E-commerce project chosen was originally designed by www.itcast.cn for an intensive 3-week software architects training camp; we modified and extended it to fit a one-semester undergraduate architecture course. Our goal is that every theoretical concept and principle students learn from the lectures is supported by a concrete application scenario in this project. By doing so, our students know why, when and where to apply the learned concepts and principles. Furthermore, the course materials will be reviewed by our Strategic Industrial Partners/Industrial Advisory Board every year to make sure that all the techniques and solutions used in this project are state of the art and used ubiquitously in industry. We hope this paper can help and inspire instructors during their preparation of software architecture related courses. The rest of the paper is structured as follows. Section 2 discusses the rationale behind the selection of an E-commerce system as the example system. The architectures of the E-commerce system are described in detail in Sect. 3. In Sect. 4, common problems that a future software architect would encounter and the state-of-the-art solutions are presented. Section 5 presents the deployment and testing aspects of this project. Coverage of various architectural issues, a possible course schedule and future work are discussed in Sect. 6. In Sect. 7, we conclude the paper.


2 The Rationale Behind E-Commerce Systems as a Pedagogical Tool

During the preparation of this course, problem domains familiar to students were given higher priority in the selection of example projects. E-commerce is one of the domains that fits this criterion. It is defined as the selling and buying of products (including services) through the Internet. A closer look at E-commerce reveals three main branches: business to consumer (B2C), business to business (B2B) and consumer to consumer (C2C). We chose B2C since it is the most well-known type of E-commerce; students can easily understand many requirements and problems based on their everyday online shopping experience on Amazon or Best Buy. The second reason to bring E-commerce into the classroom is the remarkable growth of this field. Based on the prediction by Statista.com, the annual growth rate of E-commerce over the next four years is around 14% [4], and students are motivated to learn because of the huge job market for E-commerce software developers. The third reason for using an E-commerce project is that it is a perfect example for exposing issues around various quality attributes of a software system, such as availability, security and scalability. As a matter of fact, modern E-commerce systems are characterized by high concurrency and high availability. At the beginning of this course, we used www.Taobao.com, a Chinese E-commerce giant, as an example: since 2009, www.Taobao.com has held the Single's Day Online Shopping Event on every November 11th. Because the date is written with four ones, it is called Single's Day, and in practice everyone, not just single people, takes advantage of this one-day sale and purchases online. Here are some remarkable statistics about this 24-hour sale in 2017 that we share with students (see Fig. 1). Based on the statistics released by www.Taobao.com [5], the Gross Merchandise Volume (GMV) reached 1.47 billion dollars in the first three minutes of November 11th, 2017; by the end of the day, the GMV was over 26 billion dollars. The total number of payment transactions was 1.48 billion, and the number of orders placed and processed was 812 million. In the first 5 min of the sale, the payment software (AliPay) used by Taobao was handling 256,000 transactions per second. The most remarkable part of this example is that the system functioned perfectly on that day, without any downtime. Based on the feedback after class, students were amazed that a system could remain so available and reliable under so much traffic, and they were highly motivated to keep learning the concepts and principles behind this success.

Fig. 1 Timeline of the 2017 Single’s Day Sales in million US dollars


In summary, we believe that an industrial level E-commerce system can both motivate and interest students in terms of its promising job market and various technical aspects.

3 Architectures of E-Commerce Systems

This course is designed for senior computer science students as an elective. They should already have some experience in developing small-scale web or mobile applications. The course starts by introducing the traditional architecture for building a web application, then discusses its drawbacks and possible improvements. The distributed architecture is introduced afterwards.

3.1 Traditional Architecture

The traditional architecture of a web application packages every module into one monolithic project (see Fig. 2). For example, in our E-commerce system, one project is developed for online shoppers to browse, search, add to cart, log in and check out; those functionalities are included as modules (i.e., Java packages) in a project called the "Online Shopping System." Another project is designed for administrators to maintain the website, for example the addition, removal and modification of products and the creation of advertisements and promotions. These two monolithic projects are then deployed to a single web application server. For small traffic, one server might be enough, but when the application server receives a lot of traffic during the holiday season, the projects need to be duplicated onto more servers (see Fig. 3). This clustering approach is easy to understand, develop and deploy, and students in our class already have some experience with this kind of architecture. The drawbacks of the traditional architecture are then explained to students. First, not every module in the "Online Shopping System" receives the same amount of concurrency or pressure. For example, there will always be more people browsing and searching than actually buying a product. The Administration System's load is generally orders of magnitude smaller than that of the "Online Shopping System," since its users are just a dozen administrators and IT operations personnel. It is therefore wasteful to deploy everything on multiple application servers at the same time; this leads to poor scalability, when what we really should do is add more servers only for the services that are under the most pressure. The second drawback is that such an architecture makes team development hard. Different teams' work needs to be integrated into one project in order to build and run. For example, if the frontend UI team only needs to modify the page of a product (perhaps to fix a typo), it has to repackage and redeploy the entire application (during which the entire website is down). Once the pros and cons of the traditional architecture have been explained, we can move on to the distributed architecture.


Fig. 2 Monolithic E-commerce software system architecture

Fig. 3 Traditional network topology for E-commerce systems


Fig. 4 Major components of the system

3.2 Distributed Architecture

Based on the analysis of the traditional architecture, students are encouraged to figure out ways to correct the architectural issues. Every module from the traditional architecture is taken out of the monolithic system, and an independent system is developed for it (see Fig. 4). Each module runs in a separate Docker container and communicates with other modules through RESTful web services. Separating out these modules also facilitates better team collaboration and project management.

4 Key Solutions for Achieving High Availability and Concurrency

One important trait of a software architect is to stay current with mature solutions and to know how to make tradeoffs. Based on our research into current technology trends and conversations with industrial partners, a dozen mature solutions for achieving system availability and concurrency were identified and are taught to students (Table 1). We want to warn the reader that this is an "opinionated" technology stack; it represents typical, contemporary industrial distributed software development challenges. The frameworks used for our E-commerce system are Spring MVC, Spring Boot, Spring Data and Spring Cloud.


Table 1 Solutions for common problems in distributed system development
Problems | Solutions adopted
Dev, test, staging and production environment switches | Spring profile
Coordination of distributed systems | Spring Cloud
Distributed ID generation | Twitter IDs (Snowflake)
Authentication and authorization | Spring Security and JSON Web Token (JWT)
Caching | Redis
Job scheduling | Quartz
Search engine | Solr
Load balancing | Nginx
Data access | Spring Data
Message broker | RabbitMQ
Distributed file system | FastDFS
Deployment | Docker and Rancher

The solutions listed in Table 1 are not introduced at the very beginning of the project development. For example, the best practices for problems such as distributed ID generation, authentication, single sign-on, microservices management, database clustering, template engines and distributed file systems are not revealed in the first increment of the E-commerce system. Instead, students are encouraged to solve those problems in plain Java code. This is a valuable experience, since the first increment becomes the reference point for future iterations. In the following iterations, mature solutions are introduced to students one by one, and refactoring takes place. Besides the solutions "adopted" by this course, comparing alternatives is assigned as homework: for the search engine problem, students also try Elasticsearch; for the message broker, Kafka is also tried and compared. The architecture of the E-commerce system keeps evolving throughout the entire course. Figure 5 shows the final version of the architecture at the end of the course. Due to space limitations, not all the techniques are shown in the figure.

5 Continuous Integration, Deployment and Testing

Deployment and operations are also designed to be part of this course. During the first half of the course, the VMware virtualization environment is used to simulate deployment; we then switch to Amazon AWS to make the deployment real. We also introduce the DevOps approach. Following Fig. 6, the students are taught to provision Amazon AWS servers to host Jenkins and Artifactory, respectively. Jenkins is one of the automation tools for facilitating continuous integration and continuous delivery, while Artifactory is a popular enterprise artifact management tool. Jenkins and


Fig. 5 Distributed E-commerce software system

Fig. 6 Continuous integration technology stack

GitHub are connected through a webhook, so that every time there is a commit to GitHub, Jenkins can pull the artifacts from both GitHub and Artifactory, then build and test them. During this course, every service runs in a Docker container. The development of the E-commerce system follows the test-driven development (TDD) approach. Since the project is designed around the MVC layered architecture, unit tests for the service layer and controller layer are written first, and code for the service and controller layers is added in order to pass the unit tests. Integration tests that involve the database and other external resources are written to make sure the modules work well together. For now, JUnit, Mockito and Spring MVC Test are used in the project.


6 Discussion

In this section, the mapping between desired virtues of a qualified software architect and our proposed course, quality attributes covered in the course, a potential schedule and future work are discussed.

6.1 Coverage of the Desired Virtues for a Software Architect in This Course

Bass and Clements [6] list the traits of a qualified software architect in their architecture book. Table 2 describes how our teaching approach checks off each of them.

Table 2 Developing the desired virtues for a software architect
Traits of software architects | How our proposed course is related to training the traits
Artistic skill to make seemingly simple designs that are easy to grasp and yet which solve complex problems | We bring many common problems (see Table 1) and ask students to use different techniques to solve them
Analytical skills, such as an ability to find root causes for high-level problems in existing designs, like why a system runs too slowly or is not secure | Deep analysis of the drawbacks of the traditional architecture for the E-commerce system and comparison of different alternatives for solving the same problem
Understanding of the business, social, and operational environment in which a system needs to operate | Students are taught E-commerce domain knowledge along the way, and are trained to operate under development, testing and production environments
Ability to relate to a wide range of stakeholders, including the client, users, system administrators, and suppliers | During design, students are taught to think from different perspectives to make design decisions
Communication skills, such as listening to and working out issues with those implementing a design | Students work in a team to solve problems together
Knowledge of multiple technologies, from which to choose the right ones for the next job | A dozen current technologies are taught to students; comparisons are required in assignments


6.2 Coverage of Quality Attributes in Our Proposed Course

The Key Considerations in Teaching Software Architecture workshop at CSEE&T 2004 pointed out that the emphasis of a software architecture course needs to be on quality attributes (QAs). We list a set of QAs and how our course is designed to cover them in the E-commerce project (Table 3).

6.3 Related Work

Using large open-source systems as a tool for students to work on and make architectural decisions about is discussed in the work of Costa-Soria and Pérez [7]. In their course, students are asked to reverse-engineer open-source projects and write reports about the architecture used. Our approach, however, is to train students to build a large, complex application from scratch using the latest technology. This process is guided by an instructor who is very familiar with the domain and current technology. In fact, the development of this realistic E-commerce system requires extensive work by both students and instructors. We describe a possible schedule in the next section.

6.4 Suggested Course Schedule

The content in this paper can be taught as a standalone undergraduate senior course. Table 4 shows a possible schedule for a one-semester hands-on course. This schedule is based on the assumption that students already have experience in web application development. Every week, there are dedicated lab hours for students to finish the milestone so that students can build the project incrementally and no one is left behind. Additional assignments for comparing solution alternatives are given every week.

Table 3 Quality attributes considered by students in this course
Quality attributes | How our proposed course covers them
Availability | Clustering, master/backup servers
Performance | Additional layer of cache between application and database (e.g., Redis and Solr), and database sharding
Security | Spring Security and JWT
Scalability | Microservice approach, e.g., Spring Cloud
Interoperability | RESTful web services, JSON-based communication
Maintainability | Independent microservices and use of DevOps


Table 4 Schedule of the E-commerce course
Weeks | Contents
Week 1 | Background of E-commerce; introduction of distributed architecture
Week 2 | Integration of Spring, Spring MVC and Hibernate for the administration system and online shopping system
Week 3 | Implementation of the administration system
Week 4 | Implementation of the online shopping system
Week 5 | Implementation of the CMS (Content Management System)
Week 6 | Adding cache using Redis
Week 7 | Implementation of searching using Solr
Week 8 | Implementation of the single sign-on subsystem with session sharing and Spring Security
Week 9 | Implementation of the shopping cart and order subsystems
Week 10 | Amazon AWS, Docker and Artifactory
Week 11 | Deployment in a distributed environment (Nginx and reverse proxy)
Week 12 | Set up Redis cluster and Solr cluster
Week 13 | Refactoring using the Spring family (Spring Boot, Spring Cloud, Spring, Spring MVC and Spring Data)

6.5 Future Work

An important piece of future work is to adapt more industrial-level projects that we have already collected from our partners in domains such as online education, online social networks, online health and online tourism. We firmly believe that bringing realistic and interesting domain knowledge into the classroom creates good context for students. A repository of large, industrial-level open-source projects with detailed instructions will be created on GitHub.

7 Conclusion

In this paper, we report the design of a project-based learning approach that makes use of an example from industry in an undergraduate software architecture course. By using a concrete and realistic case study from a familiar area, students are given better context in which to apply the architectural principles learned in lectures. Students can learn about new technologies by systematically evolving their own systems. The primary focus of this course is on quality attributes such as high availability and high scalability. Many current mature solutions are introduced so that students will be in a better position during job interviews. We hope this paper can help instructors who want to bring realistic projects into architecture classes.


References
1. Van Deursen, A., Aniche, M., Aué, J., Slag, R., De Jong, M., Nederlof, A., & Bouwers, E. (2017, March). A collaborative approach to teaching software architecture. In Proceedings of the 2017 ACM SIGCSE Technical Symposium on Computer Science Education (pp. 591–596). New York: ACM.
2. Rupakheti, C. R., & Chenoweth, S. (2015, May). Teaching software architecture to undergraduate students: An experience report. In Proceedings of the 37th International Conference on Software Engineering (Vol. 2, pp. 445–454). New York: IEEE Press.
3. Bender, W. N. (2012). Project-based learning: Differentiating instruction for the 21st century. Corwin Press.
4. Retail e-commerce sales in the United States from 2016 to 2022. https://www.statista.com/statistics/272391/us-retail-e-commerce-sales-forecast/.
5. Alibaba smashes its Single's Day record once again as sales cross $25 billion. https://techcrunch.com/2017/11/11/alibaba-smashes-its-singles-day-record/.
6. Bass, L., Clements, P., & Kazman, R. (2003). Software architecture in practice. Addison-Wesley Professional.
7. Costa-Soria, C., & Pérez, J. (2009, July). Teaching software architectures and aspect-oriented software development using open-source projects. In ACM SIGCSE Bulletin (Vol. 41, No. 3, pp. 385–385). New York: ACM.

KNN-Based Overlapping Samples Filter Approach for Classification of Imbalanced Data Mar Mar Nwe and Khin Thidar Lynn

Abstract Imbalanced data classification is one of the most interesting problems arising in many real-world data sets. The class distribution of an imbalanced data set strongly affects the classification rate of learning classifiers. If the class distribution problem is not solved before applying the learning algorithms, the predictions of learning classifiers tend to favor the class with a large number of samples (the majority class) and ignore the other samples (the minority class). In addition, the class overlapping problem can increase the difficulty of classifying the minority class samples correctly. In this paper, we propose an effective under-sampling method for the classification of imbalanced and overlapping data using a KNN-based overlapping samples filter approach. This paper also summarizes a performance analysis of three ensemble-based learning classifiers for the proposed method. Experimental results on fifteen imbalanced data sets indicate that the proposed under-sampling method can effectively improve on five representative algorithms in terms of three popular metrics: area under the curve (AUC), G-mean and F-measure.

Keywords Class imbalance problem · Class overlapping problem · Under-sampling · Over-sampling · Ensemble learning

1 Introduction

As one of the most challenging and interesting problems in pattern recognition and machine learning, the classification of imbalanced data has received great attention from researchers. The class imbalance problem refers to a data set in which one class (the majority class) contains many more samples than the other (the minority class) [10, 11]. If the problems of class distribution are not


resolved before applying the learning algorithms, the predictions of a learning classifier tend to favor the majority class and incorrectly classify the minority class [8, 20, 26]. However, the correct prediction of the minority class is what matters in many real-world applications. In medical diagnosis, for instance, most patients have common illnesses while only a few suffer from cancer, so correctly predicting the cancer patients is difficult, yet an effective way to identify them is critically important. Nowadays, class-imbalanced data sets can be found in many practical applications such as email filtering, natural disaster prediction, fraudulent credit card transactions, telecommunication fraud, bankruptcy prediction and the medical diagnosis of rare diseases [11, 19]. When classifying imbalanced data, the correctness of the predictions is influenced by the class distribution of the data points [19, 29]. The characteristics of the class distribution are:
• small sample size,
• small disjuncts, and
• class overlapping.
In analyzing the class distribution of an imbalanced data set, small sample size and small disjuncts are the sources of the between-class and within-class imbalance problems. In addition, if the class overlapping problem is present in the imbalanced data set, most learning classifiers incorrectly classify minority class samples as majority class samples [9, 19, 27]. The classification performance therefore depends on two main problems: class imbalance and class overlapping. Many methods for dealing with the class imbalance and overlapping problems have been proposed; data-level solutions and algorithmic-level solutions are the most commonly used [7, 22]. The purpose of a data-level solution is to balance the imbalanced data set. An algorithmic-level solution is based on the modification of existing learning models without changing the distribution of classes [21] (e.g., by assigning different weight values to training samples). The main focus of both kinds of solution is to improve the performance of the learning algorithm on the minority class. In this paper, an improved under-sampling method is proposed for use with ensemble-based learning classifiers (i.e., AdaBoost-C4.5, AdaBoost-Naive Bayes and AdaBoost-KNN). The aim of the proposed method is to solve the class imbalance and overlapping problems while reducing the loss of valuable information. The proposed KNN-based method is used to under-sample the majority class samples that lie in the overlapping regions. First, the overlapping rate of each majority class sample is determined; then, samples with a high overlapping rate are removed in order to under-sample the overlapping samples and alleviate the class imbalance problem. The remainder of this paper is organized as follows. Section 2 discusses the characteristics of the class distribution and strategies for handling it. Section 3 introduces the proposed under-sampling method. Section 4 describes the experimental setup for the proposed method. Section 5 discusses the results obtained in the experiments, and Sect. 6 concludes this work.


2 Related Work

Although numerous techniques have been developed to handle imbalanced data sets, the data-level solution and the algorithmic-level solution are the two basic approaches. Moreover, the correctness of a learning classifier's predictions is influenced by the characteristics of the class distribution.

2.1 Characteristics of Class Distribution

The characteristics of the class distribution are [9, 19, 26]:
Small sample size: the small sample size problem arises when the number of minority class samples in the training data set is insufficient.
Small disjuncts: the small disjuncts problem arises when the small number of minority samples is scattered over many regions of the feature space.
Class overlapping: the class overlapping problem occurs when data samples belonging to different classes share similar feature values.
In class-imbalanced learning, small sample size and small disjuncts are the sources of the between-class and within-class imbalance problems. One solution for the class imbalance problem is to balance the class ratios of the training sets in order to reduce the misclassification error. The class overlapping problem is also an important factor in imbalanced data sets, since it causes most learning classifiers to misclassify minority class samples as majority class samples [3, 6, 16].

2.2 Data Level Solution

The data-level solution is a classifier-independent strategy that modifies the class distribution of the original training data set. Modifying the class distribution is often more practical than modifying the learning classifiers, because it can easily be combined with different learning classifiers [19]. In general, data-level solutions can be divided into two groups:
• under-sampling and
• over-sampling techniques.
To create a balanced data set, the instances in the original imbalanced data set can be re-sampled by under-sampling the majority class and over-sampling the minority class [23].
Under-sampling techniques: The Random Under-Sampling (RUS) method is a popular under-sampling approach that randomly eliminates majority class samples to balance the class distribution [2, 26]. Although the under-sampling technique creates a balanced class distribution, it suffers from information loss. To overcome this


drawback, overlapping samples filter methods are used. The Tomek Link (TL) [25] and Edited Nearest Neighbors (ENN) [28] methods are popular under-sampling methods for handling class overlapping problems. The TL method [25] is a modification of Condensed Nearest Neighbors (CNN) [13]: where CNN attempts to remove the redundant samples of the training set, TL eliminates noisy and borderline samples of the majority class. The Edited Nearest Neighbors (ENN) method [28] removes majority class samples based on predictions made with the KNN method: if the neighbors of a majority class sample are dominated by the minority class, that majority class sample is removed as a noisy or borderline sample.
Over-sampling techniques: The most popular over-sampling approach is the Random Over-Sampling (ROS) method, which randomly duplicates minority class samples to balance the class distribution [2, 26]. Although the over-sampling technique creates a balanced class distribution, it suffers from over-fitting. The Synthetic Minority Over-sampling Technique (SMOTE) can mitigate this drawback because it creates new minority class samples by linear interpolation between closely located minority samples [5]. When over-sampling methods are used, the computational cost of the learning classifier increases; on the other hand, when under-sampling methods are used, training is faster but useful information may be lost. Over-sampling techniques avoid the information loss problem but should not be applied to very large training sets, given the resulting training time [11, 18].
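For readers who want to reproduce these baseline re-sampling methods, they are available off the shelf in the Python package imbalanced-learn. The snippet below is a minimal sketch, not the authors' implementation; it assumes that imbalanced-learn and scikit-learn are installed, and only the class names TomekLinks, EditedNearestNeighbours, RandomUnderSampler, RandomOverSampler and SMOTE come from that library, everything else (data, parameters) is illustrative.

```python
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.under_sampling import TomekLinks, EditedNearestNeighbours, RandomUnderSampler
from imblearn.over_sampling import RandomOverSampler, SMOTE

# A synthetic imbalanced binary data set (roughly 9:1 majority to minority).
X, y = make_classification(n_samples=1000, n_features=8, weights=[0.9, 0.1], random_state=0)

samplers = {
    "TL": TomekLinks(),
    "ENN": EditedNearestNeighbours(),
    "RUS": RandomUnderSampler(random_state=0),
    "ROS": RandomOverSampler(random_state=0),
    "SMOTE": SMOTE(random_state=0),
}

for name, sampler in samplers.items():
    X_res, y_res = sampler.fit_resample(X, y)
    print(name, Counter(y_res))   # class sizes after re-sampling
```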

2.3 Algorithmic Level Solution

The algorithmic-level approach modifies existing machine learning algorithms to overcome the class imbalance between the majority and minority classes. These techniques require special knowledge of the chosen classifier and the application domain, as well as an understanding of why the learning classifier fails when the class distribution is uneven [9].

3 Proposed Methodology

The main purpose of this paper is to find an effective model for classifying imbalanced and overlapping data. The prediction model for imbalanced data has two stages:
• the data pre-processing stage and
• the classification stage.


3.1 Data Pre-processing Stage

Both the class imbalance and the class overlapping problem are harmful to obtaining correct predictions [24]. The class imbalance problem concerns the case where the number of minority class samples differs greatly from the number of majority class samples, while the class overlapping problem concerns the case where majority and minority class samples are scattered across the same regions. In this paper, a KNN-based under-sampling method (denoted K-US) is proposed to tackle the class imbalance and overlapping problems. K-US under-samples the majority class samples located in the heterogeneous (overlapping) regions, so that the sizes of the two classes become closely balanced. The three computation steps of the proposed K-US under-sampling method are:
1. Each minority class sample finds its Kth nearest majority class neighbors, with K = IR.
2. For each neighbor (majority) sample, the number of times it is associated with a minority class sample is counted.
3. The desired number of majority class samples is selected according to the association count.
The training set is assumed to be the binary-class data set T = {Dmin, Dmaj}, where Dmaj and Dmin are the sets of majority and minority class samples. The algorithm is as follows:

Inputs: training data set T = {Dmaj, Dmin}; K = IR, the number of nearest neighbors used by KNN; index_nk = {}, the association index over the majority class; Rmaj = {}; N, the new training data set.
1. For each sample x ∈ Dmin:
   a. nk = KNN(x, Dmaj, K)
   b. for each neighbor in nk: index_nk(neighbor) = index_nk(neighbor) + 1
   c. end for
2. End for
3. i-value = 0
4. Do:
   a. Rmaj = Rmaj ∪ Dmaj-index(Overlap-Count(i-value)), i.e., the majority samples whose overlap count equals i-value
   b. i-value = i-value + 1
5. While (|Dmin| > |Rmaj|)
6. End do
Output: new training data set N = {Dmin ∪ Rmaj}

Tables 1, 2 and 3 illustrate the three steps of the proposed under-sampling method for handling class-imbalanced and overlapping data, with Dmaj = {Y1, Y2, Y3, …, Ym} and Dmin = {X1, X2, X3, …, Xn}. nk is the set of Kth nearest majority class neighbor

60

M. M. Nwe and K. T. Lynn

Table 1 First step: find the Kth nearest majority class neighbors
Minority samples | Kth nearest majority class neighbors (nk)
X1 | Y2, Y8, …
X2 | Y2, Y4, …
… | …
Xn | Y6, Y8, …

Table 2 Second step: determine the number of associations for each majority class sample
Majority samples | Overlap-Count (index_nk)
Y1 | 0
Y2 | 4
… | …
Y6 | 1
… | …
Ym | 0

Table 3 Third step: determine the association count and related majority class samples
Overlap-Count (i-value) | Majority class samples (Dmaj-index)
Overlap-Count (0) | Y1, Y3, Y5, Y7, Y9, Ym
Overlap-Count (1) | Y6
Overlap-Count (2) | Y8
… | …

samples of each minority class sample, index_nk is the association index kept for every majority class sample, and Overlap-Count(i-value) groups the majority samples by their number of associations with the minority class. A majority class sample with a high association count is treated as an overlapping sample, so we under-sample the majority class by inspecting Overlap-Count(i-value). Because the proposed under-sampling method is based on KNN, the value of K is set to the IR ratio of the training set. The proposed K-US method under-samples the majority class samples to alleviate both the class imbalance and the class overlapping problem, and it thereby also avoids the over-fitting problem of random over-sampling and the loss of useful information caused by random under-sampling. A minimal code sketch of this procedure is given below.
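The following Python sketch illustrates the K-US idea under stated assumptions: it is not the authors' code, it uses scikit-learn's NearestNeighbors for step 1, and the function name k_us_undersample and all variable names are hypothetical.

```python
import numpy as np
from collections import Counter
from sklearn.neighbors import NearestNeighbors

def k_us_undersample(X_maj, X_min):
    """Sketch of K-US: keep the majority samples with the fewest associations to
    minority samples until the retained majority set is at least as large as the
    minority set (assumes len(X_maj) >= len(X_min) >= 1)."""
    k = max(1, round(len(X_maj) / len(X_min)))        # K is set to the imbalance ratio IR

    # Step 1: the K nearest majority neighbours of every minority sample.
    nn = NearestNeighbors(n_neighbors=k).fit(X_maj)
    _, neigh_idx = nn.kneighbors(X_min)

    # Step 2: association (overlap) count of each majority sample.
    counts = Counter(neigh_idx.ravel().tolist())

    # Step 3: keep majority samples group by group, lowest overlap count first.
    kept, c = [], 0
    while len(kept) < len(X_min):
        kept.extend(i for i in range(len(X_maj)) if counts.get(i, 0) == c)
        c += 1
    return X_maj[np.asarray(kept)], X_min
```

Given the training set split into majority and minority arrays, `X_maj_res, X_min_res = k_us_undersample(X_maj, X_min)` returns the reduced majority set together with the untouched minority set; their union plays the role of the new training set N.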


3.2 Classification Stage

After pre-processing, a learning model is needed to classify unseen input data. For classification, we used the Weka implementations of three learning classifiers, namely J48, IBk (nearest neighbor) and Naive Bayes; J48 is the Weka implementation of C4.5 and IBk is the Weka implementation of KNN [4]. For all learning classifiers, the default parameter values were used. Furthermore, the individual classifiers are combined with the AdaBoost mechanism to build the final prediction models (AdaBoost-C4.5, AdaBoost-KNN and AdaBoost-NB). The AdaBoost mechanism is implemented with AdaBoostM1 in Weka. All of the learning classifiers are run with the default parameter settings of the Weka machine learning tool.
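For readers working outside Weka, a rough scikit-learn analogue of this boosting setup is sketched below. It assumes scikit-learn version 1.2 or later (where the base learner parameter is named estimator rather than base_estimator); the synthetic data and parameter values are illustrative only, and a boosted KNN is omitted because scikit-learn's KNeighborsClassifier does not accept sample weights in fit.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
y = (X[:, 0] + 0.3 * rng.normal(size=400) > 1.2).astype(int)   # roughly 9:1 imbalance

# Boosted decision tree: an approximate stand-in for Weka's AdaBoostM1 + J48.
ada_tree = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=3), n_estimators=50)
# Boosted Gaussian Naive Bayes (GaussianNB supports sample weights, so it can be boosted).
ada_nb = AdaBoostClassifier(estimator=GaussianNB(), n_estimators=50)

for name, model in [("AdaBoost-Tree", ada_tree), ("AdaBoost-NB", ada_nb)]:
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(name, round(auc, 3))
```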

4 Experimental Setup

In this section, we present the experimental details used to test the proposed K-US method. First, we introduce the data sets and the evaluation metrics; we then describe the methods used for comparison.

4.1 Data Sets

This research uses 15 imbalanced data sets from the KEEL data set repository [1] with different imbalance ratios and overlapping rates. Table 4 lists the data sets with their important characteristics: for each one, the number of attributes (#Att), the number of majority samples (#Maj), the number of minority samples (#Min), the total number of samples (#Total) and the imbalance ratio (IR) are shown. The IR is the ratio of the number of samples in the majority class to the number of samples in the minority class [14]:

IR = \frac{\text{no. of majority class samples}}{\text{no. of minority class samples}}    (1)

To evaluate the overlapping rate between the two classes, Fisher's discriminant ratio is used; a small Fisher's discriminant ratio indicates a high level of overlap. The maximum Fisher's discriminant ratio over all features (MaxF) is used to quantify the overlapping rate. Fisher's discriminant ratio f of each attribute is defined as

f = \frac{(\mu_{maj} - \mu_{min})^2}{\sigma_{maj}^2 + \sigma_{min}^2}    (2)


Table 4 Characteristics of experimental data sets
Data set | #Att | #Maj | #Min | #Total | IR
glass1 | 9 | 138 | 76 | 214 | 1.82
pima | 8 | 500 | 268 | 768 | 1.87
yeast1 | 8 | 1055 | 429 | 1484 | 2.46
haberman | 3 | 225 | 81 | 306 | 2.78
vehicle1 | 18 | 629 | 217 | 846 | 2.9
vehicle3 | 18 | 634 | 212 | 846 | 2.99
glass-0-1-2-3 versus 4-5-6 | 9 | 163 | 51 | 214 | 3.2
yeast-0-3-5-9 versus 7-8 | 8 | 456 | 50 | 506 | 9.12
yeast-0-2-5-6 versus 3-7-8-9 | 8 | 905 | 99 | 1004 | 9.14
ecoli-0-2-3-4 versus 5 | 7 | 182 | 20 | 202 | 9.1
glass-0-1-5 versus 2 | 9 | 155 | 17 | 172 | 9.12
ecoli-0-1 versus 5 | 6 | 220 | 20 | 240 | 11
glass-0-1-6 versus 2 | 9 | 175 | 17 | 192 | 10.3
ecoli4 | 7 | 316 | 20 | 336 | 15.8
yeast-2 versus 8 | 8 | 462 | 20 | 482 | 23.1

where f is Fisher's discriminant ratio [23] for attribute i, μ_maj and μ_min are the mean values of the majority and minority class for attribute i, and σ_maj^2 and σ_min^2 are the corresponding variances.
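As a quick illustration (a sketch, not the authors' code), the per-attribute Fisher ratio and MaxF can be computed with a few lines of NumPy; the function name and the small epsilon guarding against zero variance are assumptions.

```python
import numpy as np

def max_fisher_ratio(X_maj, X_min):
    """Fisher's discriminant ratio per attribute (Eq. 2); MaxF is its maximum.
    Small values indicate heavily overlapping classes."""
    mu_diff = X_maj.mean(axis=0) - X_min.mean(axis=0)
    var_sum = X_maj.var(axis=0) + X_min.var(axis=0)
    f = mu_diff ** 2 / (var_sum + 1e-12)   # epsilon avoids division by zero
    return f.max()
```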

4.2 Evaluation Metrics

In this paper, we study the two-class problem, where the minority class and the majority class are referred to as the positive class and the negative class, respectively. In Table 5, TP and TN denote the numbers of positive and negative samples that are correctly classified, while FN and FP denote the numbers of misclassified samples [24]. When the class distribution is not uniform, the traditional performance indicators are not suitable [8, 17]: in an imbalanced data set, the performance evaluation must reflect the results on both classes, and single-class measures are not adequate by themselves [9]. Therefore, AUC (Eq. 3), G-mean (Eq. 4) and F-measure (Eq. 5) are used as the performance evaluation metrics for imbalanced data classification; all three reflect the performance on the minority and majority classes simultaneously. To compute these measures, the true positive rate (TP rate), the true negative rate (TN rate), the false positive rate (FP rate) and the precision (Pr) must first be defined. TP rate and TN rate are the proportions of minority and majority class samples, respectively, that the learning model classifies correctly. FP rate refers to the case where the learning model incorrectly

Table 5 Confusion matrix
Class | Positive predicted | Negative predicted
Actual positive class | True positive (TP) | False negative (FN)
Actual negative class | False positive (FP) | True negative (TN)

classifies a sample from the majority class as the minority class. Pr (precision) is defined as TP divided by (TP + FP) [12].

AUC = \frac{1 + \text{TP rate} - \text{FP rate}}{2}    (3)

G\text{-}mean = \sqrt{\text{TP rate} \times \text{TN rate}}    (4)

F\text{-}measure = \frac{2 \times Pr \times \text{TP rate}}{Pr + \text{TP rate}}    (5)
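A small sketch of Eqs. (3)-(5) in Python (illustrative only; the function name and the example counts are made up):

```python
import math

def imbalance_metrics(tp, fn, fp, tn):
    """AUC (the single-point form of Eq. 3), G-mean and F-measure computed
    from the confusion-matrix counts of Table 5."""
    tp_rate = tp / (tp + fn)          # minority-class recall
    tn_rate = tn / (tn + fp)
    fp_rate = fp / (fp + tn)
    precision = tp / (tp + fp)
    auc = (1 + tp_rate - fp_rate) / 2
    g_mean = math.sqrt(tp_rate * tn_rate)
    f_measure = 2 * precision * tp_rate / (precision + tp_rate)
    return auc, g_mean, f_measure

print(imbalance_metrics(tp=40, fn=10, fp=30, tn=420))   # approximately (0.867, 0.864, 0.667)
```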

Subsequently, Wilcoxon's test is used to support the statistical analysis of the comparative results, computed with the SPSS software. Tables 11, 12 and 13 report the Wilcoxon test results for the comparisons of the state-of-the-art re-sampling methods versus the K-US method in terms of AUC, G-mean and F-measure. The Win-Tie-Loss column counts the comparison cases between the K-US method and each state-of-the-art re-sampling method, M+ and M- are the average ranks of the wins and losses, and P_Wilcoxon is the p-value of Wilcoxon's test. If P_Wilcoxon < 0.05, the difference is considered significant. In this paper, all significant differences are highlighted in bold and underlined.

4.3 Comparative Method

The proposed K-US under-sampling method was tested against several state-of-the-art re-sampling methods: TL and ENN are among the most popular under-sampling methods for handling the class overlapping problem, while RUS, ROS and SMOTE are popular re-sampling methods for imbalanced learning.
Tomek Link (TL): Tomek Link [25] is a popular method for the class overlapping problem that acts as a data cleaning method. Let x_maj ∈ Dmaj and x_min ∈ Dmin, and let d(x_maj, x_min) be the distance between them. The pair (x_maj, x_min) is called a Tomek link if there is no sample E in the training set such that d(x_maj, E) < d(x_maj, x_min) or d(x_min, E) < d(x_maj, x_min).
Edited Nearest Neighbor (ENN): The ENN method [28] removes majority class samples based on the predictions made by the KNN method: if the neighbors of a majority class sample are dominated by the minority class, that majority class sample is removed as a noisy or borderline sample.


Random Under-sampling (RUS): RUS [2] is a simple under-sampling approach that balances the class distribution by randomly eliminating samples from the majority class set Dmaj.
Random Over-sampling (ROS): ROS [2] is a simple over-sampling approach that balances the class distribution by randomly duplicating samples from the minority class set Dmin.
Synthetic Minority Over-sampling Technique (SMOTE): SMOTE [5] is an improvement on ROS in which a synthetic minority class sample is generated from a minority class instance and its kth nearest neighbor. SMOTE produces a synthetic sample in three steps: first, a minority sample u is randomly chosen; second, one of its kth nearest minority class neighbors a is randomly selected; finally, the new sample e is generated as

e = u + w \times (a - u)    (6)

where w is a random weight between 0 and 1.
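A minimal sketch of Eq. (6) in Python, assuming scikit-learn's NearestNeighbors is available; the function name smote_sample and the default k are illustrative, and this is not the reference SMOTE implementation.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_sample(X_min, k=5, rng=None):
    """Generate one synthetic minority sample: pick a random minority sample u,
    one of its k nearest minority neighbours a, and interpolate with w in [0, 1].
    Assumes len(X_min) > k."""
    if rng is None:
        rng = np.random.default_rng(0)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)   # +1 because u is its own neighbour
    u = X_min[rng.integers(len(X_min))]
    _, idx = nn.kneighbors(u[None, :])
    a = X_min[rng.choice(idx[0][1:])]                      # skip u itself
    w = rng.random()
    return u + w * (a - u)
```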

5 Findings and Results

To examine the proposed under-sampling method (K-US) experimentally, we used 15 imbalanced data sets and compared the results with the state-of-the-art re-sampling methods (TL, ENN, RUS, ROS and SMOTE). For all experiments, each data set is divided into 80% for training and 20% for testing and then validated with the five-fold cross-validation approach [15]. The data partitions can be obtained from the KEEL data set repository,1 so any interested researcher may use them for experimental studies [1]. The findings of this paper are discussed under three main topics:
1. the impact of the proposed K-US method on the performance of the AdaBoost-C4.5 classifier for all data sets;
2. a general comparison of the performance of the three learning classifiers (AdaBoost-C4.5, AdaBoost-NB and AdaBoost-KNN) and the six re-sampling methods on all data sets; and
3. a comparison of the performance of the three learning classifiers (AdaBoost-C4.5, AdaBoost-NB and AdaBoost-KNN) on the proposed K-US method.

1 http://sci2s.ugr.es/keel/imbalanced.php#subA.


5.1 Impact of K-US Method on AdaBoost-C4.5 Classification Performance

In this section, we analyze the impact of the overlapping samples filter approach on the 15 imbalanced data sets using AUC, G-mean and F-measure. Tables 6 and 7 show the imbalance ratio and overlapping rate of all methods on the experimental data sets. In Table 6, RUS, ROS and SMOTE are exactly balanced while TL and ENN are not; K-US is also imbalanced, but its imbalance ratio is smaller than that of TL and ENN. As Table 7 shows, the overlap is reduced the most by the K-US method for the majority of data sets. Looking at all methods in Table 8, the best AUC results are achieved by the K-US method on all 15 data sets except glass1, pima and haberman, and K-US significantly improves on the other methods according to the Wilcoxon test results in Table 11. Similarly, Table 9 summarizes the G-mean values of the six re-sampling methods: K-US obtains better G-mean results on 12 out of 15 data sets and thus outperforms the other re-sampling methods; according to the Wilcoxon test results in Table 12, K-US significantly improves the G-mean results over all methods. As for F-measure, K-US performs better than all other methods on eight data sets and, according to the Wilcoxon test results in Table 13, significantly improves the F-measure results over all methods.

Table 6 Imbalanced ratio of all data sets
Data set | Normal | TL | ENN | RUS | ROS | SMOTE | K-US
glass1 | 1.8 | 1.6 | 1.5 | 1 | 1 | 1 | 1.1
pima | 1.87 | 1.5 | 1.3 | 1 | 1 | 1 | 1.2
yeast1 | 2.46 | 2.1 | 1.8 | 1 | 1 | 1 | 1.5
haberman | 2.77 | 2.4 | 2 | 1 | 1 | 1 | 1.1
vehicle1 | 2.89 | 2.5 | 2.1 | 1 | 1 | 1 | 1.6
vehicle3 | 2.98 | 2.6 | 2.3 | 1 | 1 | 1 | 1.8
glass-0-1-2-3 versus 4-5-6 | 3.17 | 3 | 3 | 1 | 1 | 1 | 2.4
yeast-0-3-5-9 versus 7-8 | 9.13 | 8.8 | 8.2 | 1 | 1 | 1 | 5
yeast-0-2-5-6 versus 3-7-8-9 | 9.16 | 8.9 | 8.6 | 1 | 1 | 1 | 5.9
ecoli-0-2-3-4 versus 5 | 9.13 | 9.1 | 8.8 | 1 | 1 | 1 | 6.1
glass-0-1-5 versus 2 | 10 | 9.8 | 9 | 1 | 1 | 1 | 5
ecoli-0-1 versus 5 | 11 | 11 | 10.7 | 1 | 1 | 1 | 7.4
glass-0-1-6 versus 2 | 10 | 9.8 | 9 | 1 | 1 | 1 | 5
ecoli4 | 15.81 | 15.8 | 15.6 | 1 | 1 | 1 | 12.4
yeast-2 versus 8 | 23.13 | 23.1 | 22.6 | 1 | 1 | 1 | 10.5


Table 7 Overlapping rate of all data sets
Data set | Normal | TL | ENN | RUS | ROS | SMOTE | K-US
glass1 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.4
pima | 0.6 | 1.0 | 0.8 | 0.6 | 0.6 | 0.6 | 1.0
yeast1 | 0.2 | 0.4 | 0.3 | 0.2 | 0.3 | 0.2 | 0.4
haberman | 0.2 | 0.3 | 0.3 | 0.2 | 0.2 | 0.2 | 0.3
vehicle1 | 0.2 | 0.3 | 0.2 | 0.2 | 0.2 | 0.2 | 0.3
vehicle3 | 0.2 | 0.3 | 0.3 | 0.2 | 0.2 | 0.2 | 0.4
glass-0-1-2-3 versus 4-5-6 | 3.3 | 4.1 | 4.3 | 2.8 | 3.6 | 3.6 | 4.3
yeast-0-3-5-9 versus 7-8 | 0.3 | 0.4 | 0.3 | 0.4 | 0.4 | 0.4 | 0.7
yeast-0-2-5-6 versus 3-7-8-9 | 0.7 | 0.8 | 0.7 | 0.6 | 0.8 | 0.8 | 1.0
ecoli-0-2-3-4 versus 5 | 1.7 | 1.8 | 1.8 | 2.3 | 1.9 | 2.4 | 2.0
glass-0-1-5 versus 2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.1 | 0.4
ecoli-0-1 versus 5 | 2.2 | 2.2 | 2.2 | 2.2 | 2.5 | 2.5 | 2.8
glass-0-1-6 versus 2 | 0.3 | 0.3 | 0.3 | 0.4 | 0.4 | 0.3 | 0.9
ecoli4 | 3.2 | 3.2 | 3.2 | 3.8 | 3.6 | 3.9 | 3.2
yeast-2 versus 8 | 1.1 | 1.1 | 1.1 | 1.1 | 1.3 | 1.2 | 1.1

Table 8 AUC results for six pre-processing methods with AdaBoost-C4.5
Data set | TL | ENN | RUS | ROS | SMOTE | K-US
glass1 | 0.774 | 0.747 | 0.776 | 0.815 | 0.805 | 0.794
pima | 0.725 | 0.745 | 0.709 | 0.706 | 0.728 | 0.726
yeast1 | 0.699 | 0.704 | 0.679 | 0.672 | 0.697 | 0.705
haberman | 0.611 | 0.605 | 0.638 | 0.522 | 0.66 | 0.64
vehicle1 | 0.736 | 0.739 | 0.752 | 0.686 | 0.735 | 0.758
vehicle3 | 0.726 | 0.73 | 0.76 | 0.705 | 0.69 | 0.795
glass-0-1-2-3 versus 4-5-6 | 0.929 | 0.905 | 0.897 | 0.89 | 0.887 | 0.966
yeast-0-3-5-9 versus 7-8 | 0.679 | 0.645 | 0.666 | 0.648 | 0.662 | 0.724
yeast-0-2-5-6 versus 3-7-8-9 | 0.757 | 0.778 | 0.77 | 0.744 | 0.765 | 0.821
ecoli-0-2-3-4 versus 5 | 0.889 | 0.867 | 0.884 | 0.886 | 0.93 | 0.93
glass-0-1-5 versus 2 | 0.608 | 0.591 | 0.508 | 0.563 | 0.607 | 0.658
ecoli-0-1 versus 5 | 0.859 | 0.793 | 0.898 | 0.843 | 0.864 | 0.964
glass-0-1-6 versus 2 | 0.56 | 0.527 | 0.621 | 0.635 | 0.58 | 0.653
ecoli4 | 0.89 | 0.869 | 0.823 | 0.889 | 0.867 | 0.914
yeast-2 versus 8 | 0.773 | 0.774 | 0.718 | 0.721 | 0.727 | 0.816
Avg | 0.748 | 0.735 | 0.74 | 0.728 | 0.747 | 0.791


Table 9 G-mean results for six pre-processing methods with AdaBoost-C4.5
Data set | TL | ENN | RUS | ROS | SMOTE | K-US
glass1 | 0.772 | 0.742 | 0.775 | 0.811 | 0.803 | 0.791
pima | 0.723 | 0.743 | 0.706 | 0.697 | 0.722 | 0.723
yeast1 | 0.696 | 0.698 | 0.678 | 0.651 | 0.69 | 0.705
haberman | 0.584 | 0.556 | 0.613 | 0.484 | 0.656 | 0.626
vehicle1 | 0.734 | 0.735 | 0.748 | 0.655 | 0.727 | 0.756
vehicle3 | 0.722 | 0.723 | 0.759 | 0.679 | 0.678 | 0.791
glass-0-1-2-3 versus 4-5-6 | 0.927 | 0.901 | 0.895 | 0.887 | 0.884 | 0.965
yeast-0-3-5-9 versus 7-8 | 0.624 | 0.559 | 0.66 | 0.558 | 0.613 | 0.71
yeast-0-2-5-6 versus 3-7-8-9 | 0.727 | 0.755 | 0.768 | 0.704 | 0.747 | 0.811
ecoli-0-2-3-4 versus 5 | 0.879 | 0.85 | 0.87 | 0.877 | 0.922 | 0.928
glass-0-1-5 versus 2 | 0.383 | 0.355 | 0.382 | 0.253 | 0.39 | 0.626
ecoli-0-1 versus 5 | 0.847 | 0.682 | 0.892 | 0.824 | 0.846 | 0.962
glass-0-1-6 versus 2 | 0.32 | 0.211 | 0.503 | 0.486 | 0.418 | 0.607
ecoli4 | 0.884 | 0.851 | 0.811 | 0.877 | 0.853 | 0.909
yeast-2 versus 8 | 0.727 | 0.728 | 0.711 | 0.596 | 0.612 | 0.795
Avg | 0.703 | 0.673 | 0.718 | 0.669 | 0.704 | 0.78

In summary, Tables 8, 9 and 10 report the performance measures for the five baseline approaches and the proposed K-US method applied with the AdaBoost-C4.5 algorithm: across the 15 data sets, the best results are obtained with the proposed K-US method. The Wilcoxon p-values in Tables 11, 12 and 13 indicate that the proposed K-US method is a significant improvement over the other state-of-the-art re-sampling methods. In general, RUS, ROS and SMOTE can handle the class imbalance problem, while TL and ENN can handle the class overlapping problem; however, the correctness of predictions on an imbalanced data set is strongly affected by both the imbalance ratio and the overlapping rate. The K-US method therefore outperforms the other re-sampling methods because it handles both the overlapping rate and the imbalance ratio between the two classes.

5.2 General Comparison for Performance of Three Learning Classifiers and Six Re-Sampling Methods

In this part, we analyze the performance of each ensemble-based learning classifier for each pre-processing method. Tables 14, 15 and 16 summarize the AUC, G-mean and F-measure values of each pre-processing method, respectively, implemented with the AdaBoost ensemble learning classifiers (C4.5, NB and KNN). Moreover, we identify the classifier with which the proposed K-US method achieves the best results.
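For orientation, the following sketch shows how such an AdaBoost-based evaluation could be set up with scikit-learn; DecisionTreeClassifier stands in for C4.5, and the data variables, parameter values and cross-validation scheme are illustrative assumptions rather than the authors' configuration.

```python
# Hedged sketch of the evaluation setup described above, using scikit-learn stand-ins:
# DecisionTreeClassifier approximates C4.5 and GaussianNB is the NB base learner.
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

def evaluate(base_estimator, X, y):
    # AUC averaged over stratified cross-validation folds
    model = AdaBoostClassifier(estimator=base_estimator, n_estimators=50)
    return cross_val_score(model, X, y, scoring="roc_auc", cv=5).mean()

# usage sketch: X_res, y_res would be a data set already re-sampled by K-US or a baseline
# auc_c45 = evaluate(DecisionTreeClassifier(max_depth=3), X_res, y_res)
# auc_nb  = evaluate(GaussianNB(), X_res, y_res)
```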


Table 10 F-measure results for six pre-processing methods with AdaBoost-C4.5
Data Set | TL | ENN | RUS | ROS | SMOTE | K-US
glass1 | 0.707 | 0.671 | 0.712 | 0.764 | 0.749 | 0.73
pima | 0.65 | 0.669 | 0.631 | 0.615 | 0.643 | 0.651
yeast1 | 0.573 | 0.577 | 0.55 | 0.531 | 0.57 | 0.58
haberman | 0.433 | 0.415 | 0.468 | 0.316 | 0.502 | 0.49
vehicle1 | 0.594 | 0.602 | 0.604 | 0.532 | 0.603 | 0.609
vehicle3 | 0.578 | 0.589 | 0.611 | 0.563 | 0.532 | 0.644
glass-0-1-2-3 versus 4-5-6 | 0.87 | 0.849 | 0.828 | 0.84 | 0.832 | 0.919
yeast-0-3-5-9 versus 7-8 | 0.422 | 0.37 | 0.284 | 0.415 | 0.366 | 0.403
yeast-0-2-5-6 versus 3-7-8-9 | 0.578 | 0.614 | 0.399 | 0.578 | 0.53 | 0.596
ecoli-0-2-3-4 versus 5 | 0.805 | 0.786 | 0.626 | 0.791 | 0.808 | 0.808
glass-0-1-5 versus 2 | 0.25 | 0.247 | 0.158 | 0.205 | 0.258 | 0.341
ecoli-0-1 versus 5 | 0.723 | 0.767 | 0.556 | 0.755 | 0.741 | 0.869
glass-0-1-6 versus 2 | 0.202 | 0.137 | 0.222 | 0.37 | 0.254 | 0.283
ecoli4 | 0.781 | 0.765 | 0.472 | 0.745 | 0.76 | 0.788
yeast-2 versus 8 | 0.649 | 0.668 | 0.166 | 0.534 | 0.378 | 0.645
Avg | 0.588 | 0.582 | 0.486 | 0.57 | 0.568 | 0.624

Table 11 Wilcoxon's test results for comparisons of K-US method (M+) versus compared methods (M−) considering AUC results
Method | Win-Tie-Loss | M+ | M− | P_Wilcoxon
K-US versus TL | 15-0-0 | 8 | 0 | 0.0007
K-US versus ENN | 14-0-1 | 8.39 | 2.5 | 0.0011
K-US versus RUS | 15-0-0 | 8 | 0 | 0.0007
K-US versus ROS | 14-0-1 | 8.36 | 3 | 0.0012
K-US versus SMOTE | 11-1-3 | 8.82 | 2.67 | 0.0052

Table 12 Wilcoxon's test results for comparisons of K-US method (M+) versus compared methods (M−) considering G-mean results
Method | Win-Tie-Loss | M+ | M− | P_Wilcoxon
K-US versus TL | 15-0-0 | 7.5 | 0 | 0.0010
K-US versus ENN | 14-0-1 | 8.43 | 2 | 0.0010
K-US versus RUS | 15-0-0 | 8 | 0 | 0.0007
K-US versus ROS | 14-0-1 | 8.5 | 1 | 0.0008
K-US versus SMOTE | 13-0-2 | 8.54 | 4.5 | 0.0038


Table 13 Wilcoxon's test results for comparisons of K-US method (M+) versus compared methods (M−) considering F-measure results
Method | Win-Tie-Loss | M+ | M− | P_Wilcoxon
K-US versus TL | 13-0-2 | 8.38 | 5.5 | 0.0054
K-US versus ENN | 12-0-3 | 8.88 | 4.5 | 0.0082
K-US versus RUS | 15-0-0 | 8 | 0 | 0.0007
K-US versus ROS | 12-0-3 | 8.67 | 5.33 | 0.0125
K-US versus SMOTE | 12-1-2 | 8 | 4.5 | 0.0063

Table 14 Average AUC results
Methods | TL | ENN | RUS | ROS | SMOTE | K-US
AdaBoost C4.5 | 0.748 | 0.735 | 0.74 | 0.728 | 0.747 | 0.791
AdaBoost NB | 0.735 | 0.73 | 0.737 | 0.742 | 0.742 | 0.743
AdaBoost KNN | 0.75 | 0.749 | 0.748 | 0.726 | 0.738 | 0.784
Avg | 0.744 | 0.738 | 0.742 | 0.732 | 0.742 | 0.773

Table 15 Average G-mean results
Methods | TL | ENN | RUS | ROS | SMOTE | K-US
AdaBoost C4.5 | 0.703 | 0.673 | 0.718 | 0.669 | 0.704 | 0.78
AdaBoost NB | 0.708 | 0.699 | 0.723 | 0.732 | 0.727 | 0.729
AdaBoost KNN | 0.715 | 0.711 | 0.742 | 0.678 | 0.712 | 0.777
Avg | 0.709 | 0.694 | 0.728 | 0.693 | 0.714 | 0.762

Table 16 Average F-measure results
Methods | TL | ENN | RUS | ROS | SMOTE | K-US
AdaBoost C4.5 | 0.588 | 0.582 | 0.486 | 0.57 | 0.568 | 0.624
AdaBoost NB | 0.568 | 0.567 | 0.505 | 0.55 | 0.555 | 0.541
AdaBoost KNN | 0.59 | 0.599 | 0.498 | 0.567 | 0.534 | 0.561
Avg | 0.582 | 0.583 | 0.496 | 0.562 | 0.552 | 0.575

Based on the average results shown in Table 14, the highest AUC for each classifier is obtained with our proposed K-US method, and Fig. 1 shows the average results of each re-sampling method over all classifiers. Similarly, Table 15 presents the average G-mean results of all pre-processing methods. The best results of K-US are obtained with AdaBoost-C4.5 and AdaBoost-KNN, whereas AdaBoost-NB is more appropriate for the ROS method. As Fig. 2 shows, K-US outperforms the other methods.

Fig. 1 Average AUC results for all pre-processing methods

Fig. 2 Average G-mean results for all pre-processing methods

Fig. 3 Average F-measure results for all pre-processing methods



Table 17 Average results of K-US with three classifiers
Methods | AdaBoost-C4.5 | AdaBoost-NB | AdaBoost-KNN
Average AUC result of K-US | 0.791 | 0.743 | 0.784
Average G-mean result of K-US | 0.78 | 0.729 | 0.777
Average F-measure result of K-US | 0.624 | 0.541 | 0.561

Fig. 4 Average results of K-US with three classifiers

As for F-measure, Table 16 and Fig. 3 show the average results of all pre-processing methods with the three learning classifiers applied. According to the results from Tables 14, 15 and 16, the proposed K-US method is most suitable with the AdaBoost-C4.5 learning classifier.

5.3 Performance Comparison of Three Learning Classifiers on K-US Method

According to the average results shown in Table 17 and Fig. 4, the proposed K-US method combined with the AdaBoost-C4.5 classifier has the best performance, whereas AdaBoost-KNN takes second place.

6 Conclusion

This paper proposed the KNN-based under-sampling method (K-US) to handle the class imbalance and overlapping problems in the training set. The proposed K-US method first finds the k nearest majority class neighbors of each minority class sample and then determines the association count of each neighboring majority class sample. Finally, the desired number of majority class samples is selected according to these association counts. In the experimental analysis, all of the experimental data sets were pre-processed with the proposed K-US method and with several state-of-the-art re-sampling methods, and three ensemble-based learning classifiers (AdaBoost-C4.5, AdaBoost-NB and AdaBoost-KNN) were then tested over these pre-processed data sets. Based on the results of our experimental study and the significance analysis over 15 data sets, we observed that the proposed K-US method outperformed the state-of-the-art re-sampling methods in terms of AUC, G-mean, and F-measure. Moreover, the performance of the proposed K-US method is best with the AdaBoost-C4.5 classifier. In future work, we plan to extend the proposed method to handle higher dimensions and multi-class classification.
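A minimal sketch of the under-sampling idea summarized above follows; the neighbour count k, the balancing target, and in particular the rule for selecting majority samples from the association counts are assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def k_us_undersample(X_maj, X_min, k=5, n_keep=None):
    """Sketch of KNN-based under-sampling (K-US) as outlined in the conclusion:
    count how often each majority sample appears among the k nearest majority-class
    neighbours of a minority sample, then select majority samples by that count."""
    if n_keep is None:
        n_keep = len(X_min)                      # assumed: balance to the minority size
    nn = NearestNeighbors(n_neighbors=k).fit(X_maj)
    _, idx = nn.kneighbors(X_min)                # k nearest majority neighbours per minority sample
    counts = np.bincount(idx.ravel(), minlength=len(X_maj))
    keep = np.argsort(counts)[::-1][:n_keep]     # selection rule here is an assumption
    return X_maj[keep]
```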

References 1. Alcala-Fdez, J., Fernandez, A., Luengo, J., Derrac, J., García, S., Sánchez, L., & Herrera, F. (2011). Keel data-mining software tool: Data set repository, integration of algorithms and experimental analysis framework. Journal of Multiple-Valued Logic & Soft Computing, 17. 2. Batista, G. E., Prati, R. C., & Monard, M. C. (2004). A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explorations Newsletter, 6(1), 20–29. 3. Beckmann, M., Ebecken, N. F., & de Lima, B. S. P. (2015). A KNN undersampling approach for data balancing. Journal of Intelligent Learning Systems and Applications, 7(04), 104. 4. Bilal, M., Israr, H., Shahid, M., & Khan, A. (2016). Sentiment classification of Roman-Urdu opinions using Naïve Bayesian, Decision Tree and KNN classification techniques. Journal of King Saud University-Computer and Information Sciences, 28(3), 330–344. 5. Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321–357. 6. Chen, L., Fang, B., Shang, Z., & Tang, Y. (2018). Tackling class overlap and imbalance problems in software defect prediction. Software Quality Journal, 26(1), 97–125. 7. Devi, D., & Purkayastha, B. (2017). Redundancy-driven modified Tomek-link based undersampling: A solution to class imbalance. Pattern Recognition Letters, 93, 3–12. 8. Elhassan, T., & Aljurf, M. (2016). Classification of Imbalance Data using Tomek Link (T-Link) Combined with Random Under-sampling (RUS) as a Data Reduction Method. 9. Galar, M., Fernandez, A., Barrenechea, E., Bustince, H., & Herrera, F. (2012). A review on ensembles for the class imbalance problem: Bagging-, boosting-, and hybrid-based approaches. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 42(4), 463–484. 10. Guo, H., Diao, X., & Liu, H. Improving undersampling-based ensemble with rotation forest for imbalanced. 11. Haixiang, G., Yijing, L., Shang, J., Mingyun, G., Yuanyue, H., & Bing, G. (2017). Learning from class-imbalanced data: Review of methods and applications. Expert Systems with Applications, 73, 220–239. 12. Han, J., Pei, J., & Kamber, M. (2011). Data mining: Concepts and techniques. Amsterdam: Elsevier. 13. Hart, P. (1968). The condensed nearest neighbor rule (Corresp.). IEEE Transactions on Information Theory, 14(3), 515–516. 14. Kang, Q., Chen, X., Li, S., & Zhou, M. (2017). A noise-filtered under-sampling scheme for imbalanced classification. IEEE Transactions on Cybernetics, 47(12), 4263–4274. 15. Kohavi, R. (1995, August). A study of cross-validation and bootstrap for accuracy estimation and model selection. In IJCAI (Vol. 14, No. 2, pp. 1137–1145). 16. Lee, H. K., & Kim, S. B. (2018). An overlap-sensitive margin classifier for imbalanced and overlapping data. Expert Systems with Applications, 98, 72–83.


17. Li, J., Fong, S., Hu, S., Chu, V. W., Wong, R. K., Mohammed, S., & Dey, N. (2017, November). Rare event prediction using similarity majority under-sampling technique. In International Conference on Soft Computing in Data Science (pp. 23–39). Singapore: Springer. 18. Li, J., Fong, S., Hu, S., Wong, R. K., & Mohammed, S. (2017, August). Similarity majority under-sampling technique for easing imbalanced classification problem. In Australasian Conference on Data Mining (pp. 3–23). Singapore: Springer. 19. Lin, W. C., Tsai, C. F., Hu, Y. H., & Jhang, J. S. (2017). Clustering-based undersampling in class-imbalanced data. Information Sciences, 409, 17–26. 20. Ofek, N., Rokach, L., Stern, R., & Shabtai, A. (2017). Fast-CBUS: A fast clustering-based undersampling method for addressing the class imbalance problem. Neurocomputing, 243, 88–102. 21. Saryazdi, S., Nikpour, B., & Nezamabadi-Pour, H. (2017, December). NPC: Neighbors’ progressive competition algorithm for classification of imbalanced data sets. In 2017 3rd Iranian Conference on Intelligent Systems and Signal Processing (ICSPIS) (pp. 28–33). New York: IEEE. 22. Saez, J. A., Luengo, J., Stefanowski, J., & Herrera, F. (2015). SMOTE-IPF: Addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering. Information Sciences, 291, 184–203. 23. Seiffert, C., Khoshgoftaar, T. M., Van Hulse, J., & Napolitano, A. (2010). RUSBoost: A hybrid approach to alleviating class imbalance. IEEE Transactions on Systems, Man, and CyberneticsPart A: Systems and Humans, 40(1), 185–197. 24. Song, J., Huang, X., Qin, S., & Song, Q. (2016, June). A bi-directional sampling based on K-means method for imbalance text classification. In 2016 IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS) (pp. 1–5). New York: IEEE. 25. Tomek, I. (1976). Two modifications of CNN. IEEE Transactions on Systems, Man and Cybernetics, 6, 769–772. 26. Tsai, C. F., Lin, W. C., Hu, Y. H., & Yao, G. T. (2019). Under-sampling class imbalanced datasets by combining clustering analysis and instance selection. Information Sciences, 477, 47–54. 27. Vorraboot, P., Rasmequan, S., Chinnasarn, K., & Lursinsap, C. (2015). Improving classification rate constrained to imbalanced data between overlapped and non-overlapped regions by hybrid algorithms. Neurocomputing, 152, 429–443. 28. Wilson, D. L. (1972). Asymptotic properties of nearest neighbor rules using edited data. IEEE Transactions on Systems, Man, and Cybernetics, 3, 408–421. 29. Yen, S. J., & Lee, Y. S. (2009). Cluster-based under-sampling approaches for imbalanced data distributions. Expert Systems with Applications, 36(3), 5718–5727.

Spectrum-Based Bug Localization of Real-World Java Bugs Cherry Oo and Hnin Min Oo

Abstract The localization of software bugs is one of the most expensive tasks in program repair technology. Hence, there is a great demand for automated bug localization techniques that guide a programmer to the location of the error with little human intervention. Spectrum-based bug localization helps software developers quickly discover errors by investigating a program's trace summary and producing a ranked list of the modules most likely to be in error. We used the real-world Apache Commons Math and Apache Commons Lang Java projects to examine localization accuracy using a spectrum-based bug localization metric. Our findings show that the specific similarity coefficient used to examine the spectra information strongly affects how effectively individual bugs are located.

Keywords Software testing · Bugs · Program spectra · Bug localization

1 Introduction

Bug identification, localization, and repair are vital software development activities. While software testing is the main activity for identifying program bugs, software repair is the process of finding and correcting the buggy portions of the program. Bug localization refers to the problem of detecting the portions of a buggy program responsible for test execution failures. It has been recognized as one of the most expensive parts of the repair process, which justifies the substantial research effort on automated bug localization [1]. Buggy statements in software code may lead to program failures such as crashes or incorrect results and outcomes in the software development lifecycle. The task of deciding and discovering the buggy statements is called bug localization. In a software


system, locating the buggy statements is very time consuming for the software developer because the system may contain thousands of lines of code. Researchers have therefore designed effective ways to find buggy statements through bug localization approaches [1]. One popular approach in software repair is Spectrum-based Bug Localization (SBBL). In SBBL, the statement execution records (program spectra) of passing and failing test cases are examined to help program developers locate the buggy statements. SBBL metrics are designed to rank the statements in the program code according to their suspiciousness scores. The statement with the highest score computed by the SBBL metric is ranked first, as it is the most suspicious and therefore the most likely to be buggy. Conversely, the statement with the lowest score is the safest, as it is the least likely to be buggy. Through this ranking, the software developer can examine the top-ranked statements first rather than checking the program statement by statement from beginning to end. The performance of an SBBL metric is determined by how high it ranks the buggy statement based on the suspiciousness scores it computes. In this paper, we analyze individual bugs in the Apache Commons Math and Apache Commons Lang library projects using our spectrum-based metric. In particular, the paper makes the following contributions:
• The first study compares the bug-localization ability of our approach with Ochiai, Tarantula, and Jaccard. For the subjects studied, our approach consistently outperforms these techniques, making it the best of the techniques examined for bug localization on these subjects.
• The second study describes our approach in terms of the suspiciousness ranking it produces, which provides a way to compare it with the Ochiai, DStar, Tarantula and Jaccard techniques, as well as with future techniques [11].
The remainder of this paper is organized as follows: Sect. 2 outlines the background of Spectrum-based Bug Localization (SBBL), followed by the methodology of our approach in Sect. 3. We discuss our experimental results in Sect. 4 and related work in Sect. 5, and we conclude the paper in Sect. 6.

2 Background

As input, a bug localization technique takes a buggy program and a test suite containing at least one failed test; as output, it produces a ranked list of suspicious program locations, such as blocks or statements. In this paper, we use program statements as the locations [13]. Given a bug localization technique and a buggy program with an individual buggy statement, the quality of the technique can be calculated numerically as follows: (1) execute the bug localization technique to compute the list of suspicious program


statements; (2) use the metric proposed in the literature to assess the effectiveness of the technique [13].

2.1 Failures, Errors, and Bugs

A program bug is a failure, error, fault, or flaw in software that produces an unexpected or incorrect outcome. The bug repairing process regularly uses dedicated tools or techniques to identify bugs, and computer systems have found or repaired various bugs during operation since the 1950s. Most bugs arise from errors and mistakes made in the program source code or its components; a small number of bugs are caused by compilers that produce incorrect code. A buggy program can contain a large number of bugs that seriously interfere with its functionality. Bugs can cause errors that may have ripple effects; they may have subtle effects or cause the program to freeze or the computer system to crash.

2.2 Program Spectra

Program spectra are collected at run-time as records that provide an exact observation of the dynamic behavior of a program; for the different parts of a program, a spectrum typically consists of a number of flags or counters. In this paper, we work with statement hit spectra [2]. A statement hit spectrum consists of a counter for every statement of the program source code that indicates whether or not that statement was executed in a particular run [2].

2.3 Spectrum-Based Bug Localization

Two types of information are employed by the SBBL technique, both gathered during program testing: the testing outcomes and the program spectra. While a program spectrum is a collection of coverage data, the testing outcome records whether each test case failed or passed [18]. Consider a buggy program P = S1, S2, …, Sj with j statements, executed by i test cases T = T1, T2, …, Ti. The testing outcomes of all test cases are recorded together with the spectra information of the program in the form of a matrix. The component in the ith row and jth column of the matrix denotes the spectral information of statement Sj under test case Ti, with 1 indicating that Sj is executed and 0 otherwise [18]. An example is shown in Fig. 1. It shows a buggy program that contains six statements S1, S2, S3, S4, S5, S6, but we do not consider the statements that contain only opening or closing curly brackets.


Fig. 1 Buggy program example

Table 1 Test suite of example buggy program
Test case | Input (a, b) | Expected output | Actual output
Test 1 | (3, 5) | −2 | −2
Test 2 | (5, 3) | −2 | −2
Test 3 | (4, 0) | 4 | −4
Test 4 | (0, 4) | −4 | −4
Test 5 | (1, 1) | 0 | 0

Table 2 Coverage information
Test case | Statement (S1, S2, S3, S5) | Error status
T1 | (1, 1, 0, 1) | 0
T2 | (1, 1, 0, 1) | 0
T3 | (1, 1, 1, 0) | 1
T4 | (1, 1, 1, 0) | 0
T5 | (1, 1, 0, 1) | 0

Table 1 shows five test cases T1, T2, T3, T4, T5 used to test the buggy program. Specifically, four of the test cases pass and T3 gives rise to a failed run. The coverage information for each statement is recorded as a matrix; a coverage information matrix is then generated by means of the gathered information, as listed in Table 2. In Table 2, Sj represents a statement of the buggy program and Ti represents a test case. (Sj, Ti) = 1 means that Sj is covered by test case Ti; conversely, (Sj, Ti) = 0 means that Sj is not covered by test case Ti [17]. ErrorStatus denotes the program execution result of a test case: (ErrorStatus, Ti) = 1 means the execution of Ti fails, whereas (ErrorStatus, Ti) = 0 means it passes. In addition, a11, a10, a01, and a00 are the coverage statistics of a statement over the program executions. For the execution of a statement with one test case, only one of these four symbols can take the value 1; e.g., a11 = 1 means this statement is covered by this test case and the result is a failure. Accordingly, a11, a10, a01, and a00 are summed over all test cases for each statement [14]. For example, (S1, a10) = 4 means


S1 is covered 4 times in total by the test case set T = T1, T2, T3, T4, T5. For gathering the aforementioned information accurately and rapidly, our study utilizes a program instrumentation technique to obtain the execution data. Furthermore, one of the most popular unit testing tools, JUnit (www.junit.org), is used to run the input test cases. Previous studies have identified some coefficients, such as Ochiai and Tarantula, as the best metrics to be used for SBBL. For example, two popular bug localization techniques, Ochiai [2] and DStar [16], are defined as follows:

S_j = \frac{a_{11}(j)}{\sqrt{(a_{11}(j) + a_{01}(j)) \cdot (a_{11}(j) + a_{10}(j))}}    (1)

S_j = \frac{a_{11}(j)^{*}}{a_{10}(j) + a_{01}(j)}    (2)

A program statement with a higher suspiciousness value has a higher likelihood of being buggy. Therefore, after the suspiciousness values are assigned to all program statements, the statements are sorted by suspiciousness in descending order [10]. Repairing techniques then proceed from the top of the ranking list to the bottom, so an effective method should place the buggy statement as high in the ranking list as possible [18].
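To make the preceding definitions concrete, the following illustrative sketch (not the authors' implementation) computes the spectrum counts for the running example of Tables 1 and 2, scores each statement with Ochiai and DStar (using * = 2, as in Sect. 4), and sorts the statements by suspiciousness.

```python
import math

coverage = {"S1": [1, 1, 1, 1, 1], "S2": [1, 1, 1, 1, 1],
            "S3": [0, 0, 1, 1, 0], "S5": [1, 1, 0, 0, 1]}
failed = [0, 0, 1, 0, 0]            # error status of T1..T5 (Table 2)

def counts(col):
    a11 = sum(c and f for c, f in zip(col, failed))        # executed and failed
    a10 = sum(c and not f for c, f in zip(col, failed))    # executed and passed
    a01 = sum((not c) and f for c, f in zip(col, failed))  # not executed and failed
    return a11, a10, a01

def ochiai(a11, a10, a01):
    d = math.sqrt((a11 + a01) * (a11 + a10))
    return a11 / d if d else 0.0

def dstar(a11, a10, a01, star=2):
    d = a10 + a01
    return (a11 ** star) / d if d else float("inf")   # maximal suspiciousness when denominator is 0

ochiai_scores = {s: ochiai(*counts(c)) for s, c in coverage.items()}
dstar_scores = {s: dstar(*counts(c)) for s, c in coverage.items()}
ranking = sorted(ochiai_scores, key=ochiai_scores.get, reverse=True)   # most suspicious first
print(ranking)   # S3 comes out on top for this example
```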

3 Methodology

SBBL techniques point to buggy statements only after examining a large number of lines or code elements. This section presents our approach to locating the buggy statements in the Apache Commons Math and Apache Commons Lang projects. There are two steps in our approach: 1. Program spectrum information gathering: the statement coverage information and its execution result associated with a certain test case set are gathered. 2. Calculating the suspiciousness value: for each program statement, the suspiciousness value is computed by the proposed bug localization method.

3.1 Real-World Java Projects

The Apache Commons is a project of the Apache Software Foundation whose purpose is to provide reusable, open-source Java software [8]. Commons Math is distributed under the terms of the Apache License, Version 2.0. Apache Commons Math (http://commons.apache.org/proper/commons-math/) consists of mathematical functions, structures representing mathematical concepts (like


Table 3 Summary of subject projects
Project | KLOC (a) | Test KLOC (a) | Number of test cases
Commons Math | 85 | 19 | 3602
Commons Lang | 22 | 6 | 2245
(a) KLOC for the most recent version

for the most recent version

complex numbers, polynomials, vectors, etc.), and algorithms that we can apply to these structures (root finding, optimization, curve fitting, computation of intersections of geometrical figures, etc.). Apache Commons Math project consists of 106 bugs in total. Among them, 26 bugs are individual bugs. The standard Java library does not provide enough methods to manipulate its core classes. Apache Commons Lang3 provides these additional methods. Lang provides lots of helper utilities, especially basic numeric methods, string manipulation methods, object reflection, creation, serialization and concurrency, and system properties for the java.lang API. In addition, it includes a set of dedicated utilities that aid in the basic extension of java.util.Date and construct methods such as hashCode, toString, equals. Apache Commons Lang Library consists of 13 individual bugs. Table 3 demonstrates the summary of subject projects. We apply our spectrumbased ranking metric to localize 26 bugs in Apache Commons Math and 3 bugs in Apache Commons Lang. In the debugging process, we can repair individual bugs with some patterns such as one line removal, one line addition, or one line replacement.

3.2 Program Spectrum Information Gathering Program spectrum or coverage information reflect a certain face of program execution. More specifically, coverage information shows whether a program unit is executed during execution with a certain test case. This information has been widely used in software testing, and it also can be used for bug localization. While program unit can be defined variously, such as statement, basic block, predicate, method, and path, etc. In our study, statement coverage information is utilized since it is simple to calculate, and most important of all, the benefit of statement coverage is its ability to be used for statement-level bug localization. In addition, the corresponding execution result is also collected. Our approach collected spectral information such as a11 , a10 , a01 , and p j for each buggy statement while other SBBL techniques collected spectral information such as a11 , a10 , a01 , and a00 . We use p j value instead of a00 because the number of test cases is required that it is important whether each buggy statement is passed or failed by test cases [19]. So, we find that the p j values depends on which: pj =



Sj.

3 https://commons.apache.org/proper/commons-lang/.

(3)



For the p_j value, we find the total number of test cases that pass on each statement S_j, and then calculate the suspiciousness of each statement using the collected spectra information.

3.3 Calculating the Suspiciousness

Spectrum-based bug localization methods generally calculate the suspiciousness value from collected information such as a11, a10, a01, and a00 (the last of which we do not use). Researchers have proposed many formulas for calculating the suspiciousness value, and program units are ranked by this value to predict the probability that they contain the bug. Our SBBL metric is:

S_j = \frac{a_{11}(j)}{a_{11}(j) + a_{10}(j) + a_{01}(j) + p_j} .    (4)

In the above equation, a11 counts the test cases that execute the statement and reveal the bug (fail), a10 counts the test cases that execute the statement without revealing the bug (pass), a01 counts the test cases that reveal the bug without executing the statement, and p_j is the number of test cases that pass on each statement.
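A small sketch of Eq. (4) follows; in line with Sect. 3.2, p_j is computed here as the number of passing test cases that execute the statement, and the function names are ours.

```python
# Sketch of the proposed metric (Eq. 4); the reading of p_j is an assumption based on Sect. 3.2.
def p_j(column, failed):
    # number of test cases that execute the statement and pass
    return sum(c and not f for c, f in zip(column, failed))

def proposed_score(a11, a10, a01, pj):
    denom = a11 + a10 + a01 + pj
    return a11 / denom if denom else 0.0

# example for statement S3 of Table 2: a11 = 1, a10 = 1, a01 = 0, p_j = 1
print(proposed_score(1, 1, 0, 1))   # 0.333...
```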

4 Empirical Evaluation

All our experiments were performed on an Intel(R) Core(TM) i3-6100U CPU @ 2.30 GHz machine with 4.00 GB of RAM. The effectiveness of an SBBL technique depends on the sets of failed and passed test cases; using the two sets of test cases may not, by itself, be the most efficient approach to locating bugs [14]. We explore the following research questions:
RQ1: How effective are existing bug localization methods on the same program with the associated test case set?
RQ2: How effective is our proposed method compared with existing automatic bug localization methods?
We apply our SBBL technique to the Apache Commons Math and Apache Commons Lang projects, and check the output ranked list of individual bugs identified as likely bug locations. Figure 2 shows the results of our method on individual bugs from the Apache Commons Math library project. We localized 26 individual bugs with our method, 10 of which were ranked first.


Fig. 2 The ranking list of individual bugs using our proposed metric




4.1 Evaluation Metrics

For evaluating a bug localization technique, one important principle is to measure its effectiveness, such as the number of statements a programmer must inspect to locate the bugs. A test set may also result in two different sets of test cases when run against the same program in two different environments [14]. We considered the following two metrics to assess the effectiveness of our approach: mean reciprocal rank (MRR) and Top-N rank. MRR and Top-N are widely used to evaluate bug localization techniques [7, 20].
Top-N: This metric counts the number of ranked results within the top N positions (N = 1, 3, 5, 10). If multiple statements are assigned the same score by a bug localization technique, we use the average position to report the location of the bug. A higher Top-N value indicates more effective bug localization [21].
Mean Reciprocal Rank (MRR): The reciprocal rank of a query is the reciprocal of the position at which the first buggy statement appears in the suspiciousness ranking. MRR is the mean of the reciprocal ranks over a set of queries Q, and it can be calculated as follows:

MRR = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{rank_i} .    (5)

MRR captures the overall quality of the ranked suspicious statements. Larger values of both metrics indicate better accuracy [20].
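The two metrics can be computed directly from the rank assigned to each bug, as in the sketch below; the ranks shown are illustrative values, not the paper's data, and the average-position tie handling is omitted.

```python
# Sketch: Top-N counts and MRR (Eq. 5) from the rank of the first buggy statement per bug.
def top_n(ranks, n):
    return sum(1 for r in ranks if r <= n)

def mrr(ranks):
    return sum(1.0 / r for r in ranks) / len(ranks)

ranks = [1, 1, 2, 3, 6, 12, 480]        # illustrative ranks only
print(top_n(ranks, 10), round(mrr(ranks), 3))
```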

4.2 Experimental Results

For RQ1, Fig. 3 shows the results of our proposed method and the Ochiai coefficient. Both our metric and the Ochiai metric place 13 bugs at the top position, but Ochiai localized 16 bugs within the top three and 20 bugs within the top ten of the ranking list, while our metric localized 17 bugs within the top three and 22 bugs within the top ten. Figure 4 shows the comparison of our metric and the DStar metric on the bugs of the Commons Math project: BugID 75 is ranked at position 480 by the DStar metric, while our metric localized it at position 1. Figure 5 shows the number of ranks for our proposed metric and the DStar metric; we calculate DStar using the value 2 for its power. The comparisons of the proposed metric with other metrics, such as Tarantula and Jaccard, are shown in Figs. 6 and 7.
For RQ2, Tables 4 and 5 show the results of our proposed method and the other methods, i.e., Ochiai [2], DStar [16], Tarantula, and Jaccard [2, 3, 5, 9, 16]. Both our metric and the Ochiai metric place 13 bugs at Top-1, but Ochiai achieves only a 61.5% rate at Top-3 and 76.9% at Top-10 with an MRR of 0.53, while our proposed metric achieves 65.4% at Top-3 and 84.6% at Top-10 and got



Fig. 3 Rank level comparison of Ochiai and our proposed metric

0.54 in MRR. According to the MRR and Top-N results, our proposed metric outperforms the other metrics (Ochiai, DStar, Tarantula, and Jaccard), so our approach is more effective in localizing individual bugs.

5 Related Work

Spectrum-based bug localization is the representative family of bug localization approaches [12]. These approaches estimate the relationship between the information about passed and failed test cases and the hit-spectra information of statements [6]. If a failed test case occurs at runtime, the statements it executes may contain the bug. Spectrum-based bug localization discovers the bug location by using the coverage information a11, a10, a01, and a00. For example, assume that five test cases hit the third statement three times: if one of them fails, this statement is relatively likely to contain a bug, whereas if all of them pass, the statement is unlikely to contain a bug. There are representative algorithms such as Ochiai and Tarantula, each of which calculates its own suspiciousness ratio [17]. Abreu et al. [3] studied the Ochiai metric to obtain better effectiveness for bug localization techniques and then, to enhance its diagnostic quality, proposed a framework that combines SBBL with model-based debugging; the model-based approach is used to refine the ranking obtained from the spectrum-based method. Furthermore, Abreu et al. [4] also proposed a fault localization method to address the multiple-fault problem. For root cause analysis on the J2EE platform,



Fig. 4 Comparison of DStar and our proposed metric

Chen et al. [5] proposed a framework and it is targeted at large, dynamic Internet services, such as search engines and webmail services. Jones et al. developed the Tarantula tool for the C language and works with spectra information [9]. Wong et al. [15, 16] proposed a technique called DStar that could automatically suggest suspicious locations for fault location without requiring prior program structure or semantic information. It is demonstrated to be more effective than some coefficient - based similarity techniques, such as Tarantula. In addition, DStar’s superior effectiveness is not limited to programs that contain only one fault; it extends equally well to programs with multiple faults based on their investigation.



Fig. 5 Number of ranks for DStar and our proposed metric

Fig. 6 Comparison of Tarantula and our proposed metric

In terms of early spectrum-based methods, only failed information is utilized for locating bugs. Based on these methods, the later studies obtain better results by means of using both the passing and failing test cases. SBBL method uses a different metric to evaluate the probability of containing a bug of a unit of the program, and a ranking list is produced to highlight program units which strongly correlate with failures [18]. At present, many formulas of SBBL have already been proposed, typical SBBL methods include Ochiai, DStar, Jaccard, Tarantula, and so on.



Fig. 7 Comparison of Jaccard and our proposed metric

Table 4 Ranking list of Ochiai and our proposed metric
Rank | Ochiai | Our metric
1 | 13 | 13
2 | 2 | 2
3 | 1 | 2
6 | 3 | 4
7 | 1 | 1
12 | 2 | 2
18 | 1 | 1
19 | 1 | 0
72 | 1 | 0
84 | 0 | 1
92 | 1 | 1
105 | 1 | 0
108 | 1 | 1
Out | 1 | 1



Table 5 Top-N and MRR comparison with other approaches
Approach | Top-1 | Top-3 | Top-5 | Top-10 | MRR
Our metric | 44.8 | 65.4 | 65.4 | 84.6 | 0.54
Ochiai | 44.8 | 61.5 | 61.5 | 76.9 | 0.53
DStar | 41.4 | 48.3 | 48.3 | 58.6 | 0.48
Tarantula | 0.00 | 3.85 | 3.85 | 7.69 | 0.04
Jaccard | 0.00 | 0.00 | 0.00 | 11.54 | 0.03


6 Conclusion

We conclude that our metric achieves superior performance in localizing individual bugs in the Apache Commons Math and Apache Commons Lang library projects. In this paper, we proposed an effective spectrum-based bug localization approach for individual real-world Java bugs and evaluated it on real bugs from two real-world projects. In our approach, we mainly need to study the SBBL metric through two counts (i.e., a11 and a10); a statement executed by more failed test cases has a higher possibility of being buggy, which is observed to have the most significant effect on the effectiveness of a metric. The experimental results showed that our approach significantly outperforms the three existing SBBL techniques for Java programs with low overhead. In future work, we plan to study the localization of other individual and multi-location bugs in large-scale real-world Java programs.

References 1. Abreu, R., Zoeteweij, P., Golsteijn, R., & Van Gemund, A. J. (2009). A practical evaluation of spectrum-based fault localization. Journal of Systems and Software, 82(11), 1780–1792. 2. Abreu, R., Zoeteweij, P., & Van Gemund, A. J. (2006, December). An evaluation of similarity coefficients for software fault localization. In 12th Pacific Rim International Symposium on Dependable Computing, 2006. PRDC’06 (pp. 39–46). IEEE. 3. Abre, R., Zoeteweij, P., & Van Gemund, A. J. (2007, September). On the accuracy of spectrumbased fault localization. In Testing: Academic and Industrial Conference Practice and Research Techniques-MUTATION (TAICPART-MUTATION 2007) (pp. 89–98). IEEE. 4. Abreu, R., Zoeteweij, P., & Van Gemund, A. J. (2009, November). Spectrum-based multiple fault localization. In Proceedings of the 2009 IEEE/ACM International Conference on Automated Software Engineering (pp. 88–99). IEEE Computer Society. 5. Chen, M. Y., Kiciman, E., Fratkin, E., Fox, A., & Brewer, E. (2002, June). Pinpoint: Problem determination in large, dynamic internet services. In null (p. 595). IEEE. 6. Fu, W., Yu, H., Fan, G., Ji, X., & Pei, X. (2017, November). A test suite reduction approach to improving the effectiveness of fault localization. In 2017 International Conference on Software Analysis, Testing and Evolution (SATE) (pp. 10–19). IEEE. 7. Gharibi, R., Rasekh, A. H., & Sadreddini, M. H. (2017, October). Locating relevant source files for bug reports using textual analysis. In 2017 International Symposium on Computer Science and Software Engineering Conference (CSSE) (pp. 67–72). IEEE.



8. Hall, T., Zhang, M., Bowes, D., & Sun, Y. (2014). Some code smells have a significant but small effect on faults. ACM Transactions on Software Engineering and Methodology (TOSEM), 23(4), 33. 9. Jones, J. A., & Harrold, M. J. (2005, November). Empirical evaluation of the tarantula automatic fault-localization technique. In Proceedings of the 20th IEEE/ACM International Conference on Automated software engineering (pp. 273–282). ACM. 10. Laghari, G., Murgia, A., & Demeyer, S. (2016, August). Fine-tuning spectrum based fault localisation with frequent method item sets. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering (pp. 274–285). ACM. 11. Le, T. D. B., Lo, D., & Li, M. (2015, September). Constrained feature selection for localizing faults. In 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME) (pp. 501–505). IEEE. 12. Le, T. D. B., Oentaryo, R. J., & Lo, D. (2015, August). Information retrieval and spectrum based bug localization: Better together. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (pp. 579–590). ACM. 13. Pearson, S., Campos, J., Just, R., Fraser, G., Abreu, R., Ernst, M.D., et al. (2017, May). Evaluating and improving fault localization. In Proceedings of the 39th International Conference on Software Engineering (pp. 609–620). IEEE Press. 14. Schneidewind, N., Montrose, M., Feinberg, A., Ghazarian, A., McLinn, J., Hansen, C., et al. (2010). IEEE reliability society technical operations annual technical report for 2010. IEEE Transactions on Reliability, 59(3), 449–482. 15. Wong, W. E., Debroy, V., Gao, R., & Li, Y. (2014). The DStar method for effective software fault localization. IEEE Transactions on Reliability, 63(1), 290–308. 16. Wong, W. E., Debroy, V., Li, Y., & Gao, R. (2012, June). Software fault localization using DStar (D*). In 2012 IEEE Sixth International Conference on Software Security and Reliability (pp. 21–30). IEEE. 17. Wong, W. E., Qi, Y., Zhao, L., & Cai, K. Y. (2007, July). Effective fault localization using code coverage. In 31st Annual International Computer Software and Applications Conference, 2007. COMPSAC 2007 (Vol. 1, pp. 449–456). IEEE. 18. Xie, X., Chen, T. Y., Kuo, F. C., & Xu, B. (2013). A theoretical analysis of the risk evaluation formulas for spectrum-based fault localization. ACM Transactions on Software Engineering and Methodology (TOSEM), 22(4), 31. 19. Xu, Y., Yin, B., Zheng, Z., Zhang, X., Li, C., & Yang, S. (2019). Robustness of spectrumbased fault localisation in environments with labelling perturbations. Journal of Systems and Software, 147, 172–214. 20. Youm, K. C., Ahn, J., & Lee, E. (2017). Improved bug localization based on code change histories and bug reports. Information and Software Technology, 82, 177–192. 21. Zhang, M., Li, X., Zhang, L., & Khurshid, S. (2017, July). Boosting spectrum-based fault localization using PageRank. In Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis (pp. 261–272). ACM.

The Role of Unconscious Bias in Software Project Failures C. J. B. Macnab and Sam Doctolero

Abstract Failures of large-scale software projects continue at an alarming rate; moreover processes and frameworks known to address the (well-documented) problematic issues remain under-utilized. In this paper we posit that unconscious bias, rather than incompetence or unprofessionalism, explains these trends. We present a quantitative model of personalities that makes simple, testable predictions about biases. We analyze the main biases of, in particular, software managers in this paper - and how their biases contribute to software failures. We demonstrate biases in the process of looking for software solutions through text-analysis. We analyze a popular development framework, SAFe, for how well it addresses these biases and offer modifications to the framework that should better avoid the effects from these biases. Keywords Software failure · Project failure · Software management · Software dysfunction · Project management · Failure factors · Risk management · Project outcomes · Development processes

1 Introduction

The software industry continues to inadvertently create a vast number of software project disappointments each year. Projects usually get described as challenged if the project gets completed and becomes operational, but over budget, over the time estimate, or with fewer features and functions than initially specified. People may refer to project failure only as the "total abandonment of a project before or shortly after it is delivered" [1], but here we use the generic term "failure" to refer to both situations. The many articles, processes, and methodologies that point out ways to avoid problems and catastrophes have only moderately improved success rates; companies still do not deliver over half of projects successfully [2], and poorly-





designed systems that end up wasting billions of dollars or destroying companies regularly make the news, for example [3, 4]. Why do people ignore good advice to such an extent? We posit that only human factors can explain the insistence on maintaining dysfunctional processes that often cause failure; since the software industry consists of dedicated and highly-trained individuals only unconscious bias offers a reasonable explanation. This paper intends to reveal the main sources of bias that cause software failures, and show the reader how to avoid these common pitfalls. This paper probably represents a difficult read for some; in general we like our biases because they make us feel comfortable, often look consistent with cultural values, and get reinforced (or enforced) through social pressures. This paper describes exactly where the biases come from, using a quantified predictive model of human personalities. We use this model to precisely diagnose the problems in software development and outline simple solutions. We do not claim this paper represent dramatic breakthroughs in theory; rather we talk about things that seem fairly obvious to everyone, regularly acknowledged in popular media for instance, yet somehow still seem forbidden to talk about in the work place (much less use as a reason for changing workplace behaviours and processes). Would we rather risk spectacular failure than face discomfort in confronting our own biases and going against social norms? Too often the answer is “yes” according to those who have examined these issues: The biggest tragedy is that software failure is for the most part predictable and avoidable. Unfortunately, most organizations don’t see preventing failure as an urgent matter, even though that view risks harming the organization and maybe even destroying it. Understanding why this attitude persists is not just an academic exercise; it has tremendous implications for business and society. [1] Yet failures, near-failures, and plain old bad software continue to plague us, while practices known to avert mistakes are shunned. It appears that getting quality software is not an urgent priority in most organizations. The tragedy of software engineering is not that we don’t know how to plan and conduct software rigorous and elaborate test, but that we know how and just don’t want to do it. [5]

We recommend companies start formulating and/or modifying development frameworks and processes to address biases specifically, and we give an example of how to do so in this paper. This solution seems succinctly “simple”, but the powerful influence of social norms can make the approach quite difficult to “sell” and implement.

2 A Model of Personalities

Here we briefly describe an objective evaluation method for personalities based on our idea of developing a scientific model that can make testable predictions; we use different assumptions than many about how one may construct academic models of human thinking, combining ideas of personality from the world of clinical psychology with those found in the self-help section of the bookstore. In our graphing

Fig. 1 Personalities lie inside the circle and personality disorders lie further out on the graph. Note that unconscious thinking, feeling, and motivations will be the opposite to the conscious self-image i.e. the terms are conscious self-images, not "truths"

system, we put self-images of emotional and logical thinking on the vertical axis of a graph, and self-images of how much we use empowerment and manipulation on the other axis; we then look at the personality self-images that one finds in the quadrants (Fig. 1). Thus, the vertical axis determines how we see ourselves pondering and reflecting on things in an effort to make decisions, while the horizontal axis determines how we see ourselves influencing people. Since everyone has a bias against (at least) one of logic, emotion, empowerment, or manipulation (depending on personality type) this model generally makes people feel uncomfortable when they first see it; but since it produces quantifiable, testable predictions it constitutes a valid scientific model. We notice that people who lie very far from the center typically get a diagnosis of personality disorder [6]. In the graph the visible characteristic (always the axis on the counter-clockwise side of a quadrant) shows the obvious characteristic of the personality disorder apparent to outside observers, and quite apparent in written descriptions of the disorders. Narcissistic always comes across as manipulative and Antisocial as controlling, so we have labelled the right horizontal axis with visible on the graph. Schizoid and Schizotypal come across as emotionless, and we argue they see themselves as logical since they are obsessed with creating theories and so we have labelled the bottom vertical axis with visible on the graph. Avoidant gives power away by leaving people alone, while Dependent allows someone to have all the power over them, so we have labelled the left horizontal axis with visible on the graph. People always describe Histrionic and Borderline as extremely or overly emotional, so we have labelled the top vertical axis with visible on the graph. The personality disorders furthest from the center sound like the worst ones (with the most suffering) from current written descriptions. The hidden characteristic does not appear obviously significant to outsiders, but comes across to friends and family. The hidden characteristics show consistency with descriptions of the personality disorders from the literature, in that no one describes them as having the opposite characteristic: Narcissistic and Antisocial never get described as emotional, Schizoid and Schizotypal never sound manipulative, Avoidant and Dependent do not



Fig. 2 Personalities inside the circle: the Modified-Lowry Colors personality self-image theory

get labelled as logical/rational, and no one describes Histrionic and Borderline as empowering. People who lie away from the center, yet do not actually have a personality disorder, will start to display some erratic behaviour. We have no polite names for such people; with apologies we point out that people often describe such upper-right (UR) people as fitting a stereotype of being like a “hysterical woman”, while such lower-right (LR) people might seem like a “sociopath”, such lower-left (LL) people might come across like a “zany professor”, and a “psychology student studying to solve their own problems” gives an offensive picture for an upper-left (UL) person. In the graph the circle represents the point where people generally start to have serious problems in life; inside the circle indicates where people feel integrated and balanced (everyone would have their own individual-size circle) (Fig. 2). Inside the circle, people in different quadrants simply come across as having personalities. Personality stems from the amount of imbalance in the four characteristics: a perfectly certain person would have no personality. One of the four-color personality theories closely maps to our quadrant, called Lowry True Colors [7] (Fig. 2). Our proposed wording for a Modified-Lowry Colors that makes it exactly match our system (with our own visible quadrant characteristics in brackets to make the mapping clear) is: Green personality types feel primarily thought-oriented (with logic) and like to argue ideas, Gold personality types feel primarily mission-oriented (for empowering others) and like to plan, Orange personality types feel primarily action-oriented (using manipulation) and like to get things done, Blue personality types feel primarily people-oriented (using emotions) and like relationships. In our community of the university engineering faculty, we find many Orange research professors, many Green students, many Gold teaching professors and administrators, and many people choosing Blues as romantic partners. Many claim that personality temperament theories like Lowry Colors are not empirically valid because they cannot predict an individual. However, since the dis-



tribution of such personalities in a given population is constant, probability models that predict macroscopic behaviour should be used (e.g. similar to the Schrödinger equation in quantum mechanics or the Hardy-Weinberg principle in genetics) and we developed successful personality-type computer models using this idea in [8]. Thus, our model of bias in this paper describes populations like a group of Orange managers or a group of Green programmers, but does not necessarily hold on an individual level. Our key idea of biases identified by the quadrant system: the visible and hidden characteristics of the personality disorders reveals the unconscious biases of all the people in that quadrant. Specifically the visible characteristic predicts a projected bias, often felt as (an irrational) hope in the conscious mind, and the hidden characteristic predicts a reflected bias, often felt as (an irrational) fear in the conscious mind. We use the word “reflected” to indicate that it does not cause a bias or fear when one engages in this characteristic, but rather when ones see this characteristic reflected in the behaviour of other people i.e. other people’s behaviour triggers one’s bias and irrational fear. For example, if you think of yourself as Green (lower-left) you probably felt an uncomfortable reaction when we said we were going to use theories usually found in the self-help section of the bookstore (reflected bias around empowerment). If Orange, you probably felt uncomfortable with our statement about using logic based on a different set of assumptions (reflected bias around logic). If so, this gives the reader a sense of the magnitude of bias we face in trying to create software frameworks designed to work rather than to make us feel good about ourselves. Each type also has two internal biases: the two characteristics not associated with their quadrant. People are generally not interested in these other two, i.e. they don’t see value, rather than feeling triggered by them or worried about other people displaying them. The Enneagram personality theory, as people use it today, was created in the 1960s using one person’s intuitive understanding of human motivations [10], whose methods were later popularized through mainstream books [11, 12]. Many leaders in the religious world have put significant effort into refining it, based on insights into human motivations gained during pastoral conversations with “their people” (conversations involving spiritual counselling, support, and guidance). Many people in North America now use the Enneagram as their primary tool for self-reflection, selfdiscovery, and self-growth. The typical bookstore carries many titles, many websites describe the various aspects, and Amazon.com lists over 1000 books when searching for “Enneagram.” We do not modify the descriptions of the Enneagram personality types in any way to map it to our quadrant; rather we strictly use the descriptions from [9, 13]. The descriptions in these books achieve very high consistency, but emphasize different aspects. The Enneagram describes nine personality types. The numbers define the types, but every book gives different names. We use the names from [9] in Table 1 where we also list five key internal understandings of motivations and reflections on self that give us insight into the different types. Although one finds many aspects to explore in the Enneagram, we highlight the direction of integration as a key feature of interest (Fig. 3). 
The direction of integration shows the healthy direction of growth, and also indicates what one might feel like



Table 1 The Enneagram personality: internal motivations and reflections [9]
Type | Name | I have the need to | I am... (self-image) | Avoidance: I don't like to | Pitfall: I get trapped | Temptation: I feel good | Passion (sin)
1 | Reformer | Be perfect | I am right | Feel angry | Criticizing | Perfection | Anger
2 | Helper | Be needed | I help | Suppress needs | Flattering | Helping | Pride
3 | Achiever | Succeed | I am successful | Fail | Grooming | Efficiency | Deceit
4 | Individualist | Be special | I am different | Be ordinary | Mourning | Authenticity | Envy
5 | Investigator | Perceive | I see through | Feel empty | Wanting | Knowledge | Greed
6 | Loyalist | Be secure | I do my duty | Doubt | Fearing | Security | Fear
7 | Enthusiast | Avoid pain | I am happy | Feel pain | Scheming | Idealism | Gluttony
8 | Challenger | Be against | I am strong | Feel weak | Taking revenge | Justice | Lust (for power)
9 | Peacemaker | Avoid | I am content | See or be in conflict | Being comfortable | Self-deprecation | Sloth

Fig. 3 The original Enneagram with directions of integration (mental health)


when really relaxed (like on vacation). One takes on some characteristics of the type in one’s direction of disintegration (the opposite direction) when one becomes less mentally healthy, or temporarily when one encounters a lot of life stress. What we call the direction of mental comfort, not shown in the original configuration, indicates the preferred personality type for each personality i.e. what each type wants to be like in the conscious mind. For the 6-3-9 triad the direction of mental comfort goes the opposite way to the direction of integration, but lies in the same direction for the other types [13]. By redrawing the Enneagram as a continuous circle of counter-clockwise directions of mental comfort, we can map it to our personality grid (Fig. 4). Note this configuration simply changes the original configuration by having counter-clockwise integrations for the 8-2-4-1-7-5 types. The new configuration simply explains the Enneagram types in terms of one’s self-image of one’s relative use of emotion, logic, empowerment, and manipulation. We fully describe the reasons for the mapping on the proposed quadrant system in terms of standard Enneagram theory in Appendix A. Some data from Google-searches appears in Appendix B which shows consistency with the mapping between Enneagram and the personality disorders on the quadrant.

Fig. 4 Our proposed new Enneagram configuration: spiral for 8-2-4-1-7-5 and near-triangle for 3-6-9 indicate integration/disintegration directions - closer to the middle is healthier. Circle with arrows indicates mental comfort (desirability). Type 3's make stereotypical middle managers while Type 5's make stereotypical computer programmers. Axis labels: emotional (top), logical (bottom), manipulative and empowering (sides); types 1-9 are positioned on the grid, with 3 marked "manager" and 5 marked "programmer"

Note that the hypothesized position for the Type 9 Enneagram personality does not lie in a quadrant and does not have a Lowry color; we do not discuss the Type 9 in this paper. Note when one starts to take on a personality disorder, one does not necessarily move from a personality in a quadrant out to a personality disorder in the same quadrant [12, 14].

3 The Key Relationship Dynamic in Software Development

A stereotypical computer programmer has a Type 5 self-image in the Enneagram:

5's are discoverers of new ideas, researchers and inventors, objective, questioning, and interested in exploring things in detail. They can be original minds, provocative, surprising, unorthodox and profound. They are good listeners, because they pay close attention. ... 5's link their knowledge to a search for wisdom and strive for a sympathetic knowledge of the heart... The primary experience of many 5's is a sort of emptiness.... 5's need a closed-off and protected sphere. They long for a fortress in which they won't be watched and where they can think... most 5's are introverts... by nature they are monks, hermits, ascetics, bookworms, librarians and technical sticklers. [9]

A stereotypical manager feels like a Type 3:

The special talent of 3's often cause them to radiate an ease and assurance that inspire confidence. This allows them to spread a good atmosphere around them. They have an easy time getting jobs done efficiently and competently, aiming for and achieving personal goals, as well as inspiring and motivating other people and making it possible for them to get ahead too ... Through their convincing charisma and the force of their arguments 3's can gain great influence and bring the projects they believe in to success .... 3's want to be winners and for that reason they go far .... Occupations in which 3's go far are agents, salespeople, and managers. [9]



Our basic assumption in this paper posits that these stereotypes have become so strong in the IT field that they tend to become the very definition of the expected social norms - which get made fun of in the popular Dilbert cartoon strip, for instance, with an extreme Green Type 5 Dilbert frustrated by an extreme Orange Type 3 pointy-haired boss [15]. Although people inside the software industry will generally view their colleagues as a diverse group of people, outside observers generally perceive computer programmers as a particular personality type (especially as introverted, thoughtful) and office middle-managers as a particular personality type (especially as charming, success-driven). Consider how the following description of a typical intimate relationship between 3's and 5's captures the typical experience of technical-workplace dynamics:

...emphasis on work and competency can also lead 3's and 5's into conflicts and tensions with each other. A great deal of their self-esteem is also derived from their work and how it is regarded by others. 3's and 5's can get into more or less open contentiousness over who was the original source of ideas and work. There can be elements of comparing one's work and contributions, claims about who is responsible for which ideas or breakthroughs, and other forms of competitiveness coming not only from 3's but from 5's. 3's also tend to want to get on with the project, or with whatever they feel needs to be done, while 5's tend to take a long time fine tuning and tinkering until they feel that they are adequately complete. Conflicts can erupt over use of time, resources, and priorities as the more practical minded 3 becomes increasingly impatient with the 5's lengthy preparations but lack of action. 5's may also begin to lose respect for the ethical standards of 3's who they feel are ready to cut corners or exaggerate claims in order to accomplish goals or to stay ahead professionally ... Both types also tend to not speak directly about their feelings or misgivings about the relationship... 5's can be too blunt and argumentative for 3 ... Both types can be arrogant and impatient with the other ... 3's can seem shallow and dishonest to 5's, while 5's can seem weird and repulsive to 3's. [16]

The classic portrayal of a manager-engineer relationship in the workplace occurs in the original Star Trek television series. Captain James T. Kirk often works with Chief Engineer Scott to sort out the latest technical glitch in the engine room. The typical conversation has Scotty telling Kirk how long it will take to fix the problem. Then Kirk yells at Scotty to fix it faster and/or provides a key insight into a technological solution (demands a deadline and a deliverable), but has little or no interest in the details of the technical challenge. Scotty always manages to fix it within the imposed time frame or achieve the desired outcome, after which he gets praised as a miracle worker by Kirk. Consider the following dialogues, the first from episode 7, "The Naked Time":

SCOTT: Captain.
KIRK: What is it?
SCOTT: He's turned the engines off. Completely cold. It will take thirty minutes to regenerate them.
UHURA [OC]: Ship's outer skin is beginning to heat, Captain. Orbit plot shows we have about eight minutes left.
KIRK: Scotty!
SCOTT: I can't change the laws of physics. I've got to have thirty minutes.
SCOTT: Maybe twenty two, twenty three minutes.
KIRK: Scotty, we've got six.



SCOTT: Captain, you can’t mix matter and antimatter cold. We’d go up in the biggest explosion since... KIRK: We can balance our engines into a controlled implosion. SCOTT: That’s only a theory. It’s never been done. KIRK: Scotty, help. SPOCK: Stand by to intermix. I’ll call the formulae in from the Bridge. [17]

The second comes from episode 35, "The Doomsday Machine":

KIRK: Well, we just can't stand around while our ship is being attacked. Scotty, you've got to get me some manoeuvering power.
SCOTT: I can't repair warp drive without a spacedock.
KIRK: Then get me impulse power.
SCOTT: Captain, the impulse engines' control circuits are fused solid.
KIRK [OC]: What about the warp drive control circuits?
SCOTT: Aye, we can cross-connect the controls, but it'll make the ship almost impossible for one man to handle.
KIRK: You worry about your miracles, Scotty, I'll worry about mine. Get to work.
SCOTT: Aye, sir. [17]

We find it rather depressing how well the TV show captures the common pitfall fantasy of many IT managers. Thus, in one sense software project failures are not a mystery - everyone has seen the dynamic at work, as fantasy, on a popular television series - the real mystery lies in how to make people acknowledge their bias and change the social norms around this issue. The key idea of biases from the quadrant system described in Sect. 2 predicts two main sources of bias in software development: the bias of IT developers with the Type 5 personality and the bias of Type 3 managers. This paper only examines the Type 3 managers, and we leave analysis of the role of Type 5 biases in software failures for future work. Since managers hold the positions of power, our current social norms do not require them to provide evidence that they make unbiased decisions - on the contrary, many will argue that a paper like this must provide overwhelming proof of bias before anyone in authority should self-reflect. We do not agree; this paper simply aims to provide much stronger evidence that bias exists than the evidence offered by those who argue the opposite.

4 Results

In this section we assume validity of the quadrant model and use it to investigate the bias of people searching for solutions to software project failures. In order to measure bias we perform text-analysis on suitable Google pages, assuming that more popular websites appear higher up in Google searches, i.e. the first page of Google measures what people like to read the most. We also provide text-analysis



of Bill Gates and Steve Jobs for comparison. The text analysis consists of coding each sentence of text into one of the four categories of manipulation, empowerment, logic, or emotion using the following definitions (all raw data with codings available for download at [18]). The particular wording of the definitions provides consistency with the mappings of the Enneagram, modified Lowry Colors, and personality disorders.

Definition of Emotion: Anything someone would name as an emotion or feeling, subjectively felt below the brain in the body. Includes use of extreme or colorful language obviously meant to evoke or convey emotional reactions and not meant to be taken literally.

Definition of Logic: Deductive logic, i.e. logic based on a framework of consistent beliefs (assumptions or axioms) whose validity has nothing to do with the truth of the underlying beliefs, in the tradition of formal Aristotelian logic and Boolean logic. Thus, two people could reach two different valid conclusions based on two different beliefs; if a belief was proved wrong, this would not imply the person was being illogical. Thus, it really includes any way of analytic thought that does not take emotions into account and (in our coding) everything else that is not clearly emotion, empowerment, or manipulation, i.e. seemingly "neutral phrases."

Definition of Manipulation: Gaining power or exercising power: anytime someone tries to get someone else to do something, or tries to evoke an emotional reaction. This could take the form of an outright command, request, suggestion, or making a decision for someone else - or could involve subtle manipulation techniques. This could involve encouraging someone to collaborate/cooperate with other people. This includes creating rules for someone to follow and establishing limits/boundaries for them that do not necessarily apply to other people; but it also involves trying to make someone follow the (known) rules or established norms, i.e. rules not applying in any kind of empowering way. In our coding, we also count it as manipulation anytime someone analyzes or complains about the manipulations of others, i.e. manipulation seems to be the one of the four on their mind, even if they are not doing it themselves.

Definition of Empowerment: Giving away power or avoiding power dynamics: anytime someone tries to give something to a person that will allow them to operate independently, tries to give them tools for living, or wants to give someone else their space. It can involve giving up personal power and even letting others control them. This could include encouraging someone to do it on their own, giving someone something useful, facilitating someone's thinking process to help them come up with a solution to their own problem, teaching someone useful information, following orders, or being by oneself. This can also be creating an empowering system of rules, procedures, and rights that apply to everyone and would allow someone to operate independently and safely. This includes criticizing someone for not being a good enough person or not living up to an arbitrary expectation. We also code empowerment anytime someone analyzes the empowerment of others, i.e. it is on their mind.

Using our coding on Captain Kirk from the two Star Trek episodes mentioned previously, we see the character lies between Type 8 and Type 3 (Fig. 5).
Although the coding indicates he might be an average Type 8 or healthy Type 3 (in the direction of the Type 6), the dialogue content makes Kirk come across as a very human-like (relatable) Type 3 and, moreover, his leadership style seems appreciated by others



Fig. 5 Results: People Googling "Why software projects fail" are biased toward manipulation and tend to ignore the pages heavy on logic, as evidenced by popular first-page results ('o') and unpopular tenth page results ('*'). They do not appreciate pages written in the personality style of Bill Gates ('g') and Steve Jobs ('j') which one would assume would be ideal in this industry, although they do appreciate pages written in the style of Stephen Wolfram ('w'). Note that dialogues between Captain Kirk ('k') and Chief Engineer Scott ('s') illustrate our bias model because they are Orange and Green Types, respectively

as is typical with 3's. The writers portray Scotty as a Type 5, but not very human-like because he does not express emotion in these two episodes. We now use these tools to investigate leadership bias; many think managers are responsible for the largest number of failures, for example:

Bad decisions by project managers are probably the single greatest cause of software failures today. Poor technical management, by contrast, can lead to technical errors, but those can generally be isolated and fixed. However, a bad project management decision can wreak havoc. [1]

We first look at exemplars of software success, Bill Gates, Steve Jobs, and Stephen Wolfram; then we examine whether people look for advice about preventing failure from people like them. We coded a speech and interview by Gates and Jobs [19–22], and an article by Wolfram [23]. To give the reader an example we offer the first few paragraphs from Jobs' speech (with our codings in brackets).

Again, you can't connect the dots looking forward; you can only connect them looking backwards. (logic) So you have to trust that the dots will somehow connect in your future. (empowerment) You have to trust in something - your gut, destiny, life, karma, whatever. (empowerment) This approach has never let me down, and it has made all the difference in my life. (empowerment) My second story is about love and loss. (emotion) I was lucky - I found what I loved to do early in life. (emotion) Woz and I started Apple in my parents garage when I was 20. (logic) We worked hard, and in 10 years Apple had grown from just the two of us in a garage into a $2 billion company with over 4000 employees. (manipulation) We had just released our finest creation - the Macintosh - a year earlier, and I had just turned 30. (logic) And then I got fired. (manipulation) [21]



Both Gates and Jobs turn out to have Type 5 self-images in the Enneagram (Fig. 5). Bill Gates gets described as a Type 5 on many websites including [24, 25]. However, our results disagree with previous webpages that describe Steve Jobs as a Type 7, although one Enneagram expert argues forcefully that Jobs was not a 7 [26]. Since Type 7 defines the direction of disintegration for a Type 5, we wonder if Jobs may simply have lived as a highly stressed individual and thus often came across to others as a Type 7. In the interview, he completely lacks the enthusiasm and idealism one would expect from a successful Type 7; rather he talks exactly like a typical Type 5 interested in computers:

If you could interpret all those instructions 100 times faster than any other person in this cafe, you would appear to be a magician: You could run over and grab a milk shake and bring it back and set it on the table and snap your fingers, and I'd think you made the milk shake appear, because it was so fast relative to my perception. That's exactly what a computer does. It takes these very, very simple-minded instructions-'Go fetch a number, add it to this number, put the result there, perceive if it's greater than this other number'–but executes them at a rate of, let's say, 1,000,000 per second. That's a simple explanation, and the point is that people really don't have to understand how computers work. You don't have to study physics to understand the laws of motion to drive a car. You don't have to understand any of this stuff to use Macintosh–but you asked. [22]

He also displays almost no emotion in the interview, typical of a 5 but not of a 7. We feel confident in claiming he had a Type 5 self-image. Both Gates and Jobs lie slightly counter-clockwise to the Type 5 line, indicating they reached the level of healthy Type 5's who have moved in their direction of integration toward the 8 - as we would expect of two highly successful people who would require great maturity and a broad range of people-skills to run their companies. Wolfram gets coded as a Type 8. This provides consistency with the reviews of his self-published book, for example [27], which overwhelmingly commented on his perceived arrogance rather than the actual content - typical of people's reaction toward an 8.

We did a Google search of "Why software projects fail" and coded the sentences in the web pages on the first page of Google results (the most popular pages) and the tenth page of results (the least popular, since by the 11th page of results the pages no longer have this as their main topic). We assume Google ranks search results by popularity, rather than by any objective evaluation of the best advice, i.e. it measures bias. We find the first-page results have a lot less logic bias (they either have less logic or include more emotion) than the tenth-page results; moreover, the number one hit on Google falls almost right on the Type 3 personality (Fig. 5). This shows consistency with the idea that IT managers searching on Google have a bias against reading (overly) logical solutions - and prefer writing with more balance, including manipulation and emotion. Notably, they do not seek out writings in the style of (arguably) the two most successful IT people in history, Jobs and Gates. Ignoring the personality style of Jobs and Gates reveals a rather glaring unconscious bias, one particularly unhelpful in software development.
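To make the coding and plotting workflow concrete, the following minimal sketch (in Python) tallies per-sentence category labels and maps them onto a point on the quadrant. The coordinate convention used here - empowerment minus manipulation on one axis and emotion minus logic on the other, normalized by the number of coded sentences - is an illustrative assumption, not necessarily the exact convention used to produce Fig. 5, and the example data are hypothetical.

```python
from collections import Counter

CATEGORIES = ("emotion", "logic", "empowerment", "manipulation")

def quadrant_position(codings, scale=20.0):
    """Map per-sentence category labels onto a point on the quadrant.

    codings is a list with one label per coded sentence.  Under the assumed
    convention, positive x leans empowering and negative x manipulative,
    while positive y leans emotional and negative y logical.
    """
    counts = Counter(codings)
    total = sum(counts[c] for c in CATEGORIES)
    if total == 0:
        return (0.0, 0.0)
    x = scale * (counts["empowerment"] - counts["manipulation"]) / total
    y = scale * (counts["emotion"] - counts["logic"]) / total
    return (x, y)

# Hypothetical coding of one web page: 10 logic, 4 empowerment,
# 2 manipulation and 1 emotion sentences.
page = ["logic"] * 10 + ["empowerment"] * 4 + ["manipulation"] * 2 + ["emotion"]
print(quadrant_position(page))  # lands in the lower (logic-heavy) half
```

Pages coded in this way can then be compared simply by where their points fall relative to the Type lines on the quadrant.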



4.1 Diagnosing Manager Bias in Software Failure

In a search through the literature we found only a few papers addressing human factors in software-project failure [1, 28–31]. All of the modes of failure listed in these papers classify into one of four different categories: Category (A) Not listening to technical information from developers, Category (B) The belief in technology to solve problems, Category (C) Deceit, and Category (D) Belief in a standard development process; these appear linked to our posited unconscious bias in the thinking of software managers in the lower-right section of the quadrant.

Category (A) Not listening to technical information

In the literature on software failures one often reads conclusions like "Relatively inexperienced and/or powerless IT staff lacking clout among corporate decision makers" [28]. Our theory explains this dysfunctional dynamic with the idea that managers do not listen (or do not like to listen) to their IT people explain details that they need to know to make good decisions. We call this bias an irrational fear of logic. Type 3's view themselves as logical people, but logic defines their hidden characteristic in our quadrant system. This means that when they see this behaviour reflected back to them from other people, they feel uncomfortable. In situations where the logic follows from a different set of assumptions or beliefs than the Type 3 holds themselves, the Type 3 will often have a "bad gut feeling" and judge the ideas as (a bit) crazy. Thus, the bias results in avoiding technical details, and trying to manage by giving orders:

Seventy six percent of projects suffered from the two top failure factors: both a delivery date that the developers felt was impacted by the development process, and a project that was underestimated. [30]

Category (B) The belief in technology to solve problems

In software failures, managers often thought they could succeed by giving orders (avoiding true two-way communication and collaboration) because they truly believed that IT people would solve all the problems. This fits in well with our idea of the Type 3 manager's irrational hope in manipulation, manipulation being the visible characteristic in the lower-right quadrant:

The second reason lies in FoxMeyer's culture, which did not invite open communication and therefore information sharing. Even when IS employees believed that incorrect decisions were being made, they were not forthcoming. Management's belief, as evidenced in statements from the project champion/sponsor that technology could solve any problem, may have deterred employees from presenting a realistic view to management, which, in turn, may have precluded FoxMeyer from addressing implementation issues more effectively. [31]

Essentially, managers feel that they give orders well and that developers program well - and if everyone does what they do well all will work out. “You worry about your miracles, Scotty, I’ll worry about mine.”



Category (C) Deceit

In software failures managers often hide the true status of the project:

Frequently, IT project managers eager to get funded resort to a form of liar's poker, overpromising what their project will do, how much it will cost, and when it will be completed. [1]

This may very well exist as part of the culture of the organization:

...the political climate in some organizations was such that no project manager could admit to a failed project. Even if none of the project benefits were realized project managers time and again assured us that their project was a success. The developers who were involved with many of these projects were much more likely to believe the project was a failure. [30]

This fits in well with the theory of the Enneagram, where Type 3's experience deceit as their passion (or root sin [9]):

The pressure to succeed leads to the root sin of the 3, which is deceit. While they don't generally go around telling lies, they do embellish the truth and put the best face on everything. They create an image that looks good, can be sold, and can win. The person they deceive the most is their own self. They have often been so spoiled by success that in the end they believe everything they do is good and great. [32]

Category (D) Belief in a standard development process

In software failures managers often believed that software development can work in a generic, formulaic, replicable way. One often sees "a tendency to undervalue and under-employ critical organizational management capabilities to enable and support software projects in favour of formularized and often generic planning and process methodologies" [29]. This view has become highly cited in the literature, for instance:

Why write your own application for word processing or e-mail or, for that matter, supply-chain management when you can buy a ready-made, state-of-the-art application for a fraction of the cost? ... When companies buy a generic application, they buy a generic process as well. [33]

This view does not get supported by those who investigate software failures:

... information systems are by no means standard and are indeed complex undertakings that require good governance. Articles that suggest that management complacency about IT and IS need not be a concern but are a big mistake. These case studies reflect systems that have been catastrophic or near catastrophic to their organizations as a result (largely) of management complacency. [28]

The IT debacle that brought down FoxMeyer Drug a year earlier also stemmed from adopting a state-of-the-art resource-planning system and then pushing it beyond what it could feasibly do. [1]

The case illustrates that this capability dimension of risk management is essential to handle unforeseen threats that suddenly arise and cannot be responded to through planning-based responses... consistent with a narrow definition of risk, they provide no generic response options for unforeseen threats. [29]



Our theory explains the mechanism for this as the exact same one as in Category (B); in this case the "technology" now consists of processes and procedures [34] rather than computer hardware and software, and now the mid-level managers take the role of the presumed miracle workers. Top-level managers think they can just do what they feel good at (giving broad orders like delivery deadlines and deliverables) and mid-level managers will just perform their skilled function (implementing and overseeing standard software development processes) and all will work out well. "You worry about your miracles, Scotty, I'll worry about mine." Thus, the top-level managers demonstrate the bias of an irrational hope in manipulation (applied to their subordinate managers).

5 Recommendations

5.1 Analysis and Modification of SAFe

One method of addressing bias involves creating or modifying frameworks and processes so that sources of bias get specifically addressed. We give an example here of how to do so with a popular framework. Scaled Agile, Inc. suggests a development framework for overcoming many of the traditional problems found in the software industry; the Scaled Agile Framework (SAFe) incorporates lean and agile development with systems thinking [35]. It is one of the most integrated processes in the software industry [36] and has been widely adopted. Here we examine SAFe in light of the biases revealed in this paper, and suggest modifications based on our insights. As for nomenclature, on the vendor side we talk about developers and managers, and on the client side we talk about client-managers (purchasing the software) and client-IT (whether in-house or consultants, expected to have continuous on-site communications/relationships with client-managers and end-users).

SAFe divides the project into discrete units, typically 2-week or month-long sprints. At the transition between sprints, client-management and developers meet up (and management only if needed). The meeting starts with demos and retrospectives. Then planning occurs for the next sprint. The development team reports on the project status to management after the meeting, and starts the next sprint. During the sprint the development team holds daily (stand-up) meetings and the client-manager may not participate, other than to answer questions. Furthermore, since the development team follows self-organizing and self-managing strategies, managers are not supposed to alter the flow of the sprint. At the end of a sprint, people can discard or adjust any tasks that do not align with the client-manager's request (in later sprints). Larger and longer meetings occur when developers, management, and client-managers all come together in one room for a day or two to engage in program increment (PI) planning, which looks 4–6 sprints ahead. This approach should ensure continuous, open communication among the three parties (developers, management, and client-managers), with the ability to quickly get off a non-productive path.



SAFe tries to remedy the main problem of imposing both a delivery deadline and a too-large project scope using the PI meetings. The PI planning event attempts to match demand with capacity so that, depending on what the current capacity of the development team is, some of the demand by the client-manager will not get done in the next increment and will have to be either dropped or pushed over to a later increment, i.e. a decision on whether deadline or scope defines the main objective for each item. SAFe tries to remedy the problem of top-level managers trusting in processes (for their mid-level managers to follow) by creating a framework rather than a process. The retrospectives at the end of each sprint and at each PI reflect on the suitability of the current process, and can modify the process, change processes, or even develop a new process considered more suitable for the current project. Having managers and developers in the same room during extensive PI meetings addresses three modes of failure: the Type 3 stereotype will have difficulty ignoring technical information in such an intense and inter-personal situation (Category A), the managers get an appreciation of what the development team can feasibly do (Category B), and the Type 3 stereotype will find it hard to engage in deceit with the Type 5 stereotypes present (Category C).

In light of the results of our paper, we offer suggestions for how to improve SAFe in order to better address unconscious bias. Firstly, a good framework needs to better address the biases that can occur on the client side. Specifically, we need to define clear roles for both client-manager and client-IT, with the client signing on to the arrangement. Thus, the client-managers will then have to listen to their own IT people, much as the managers have to listen to developers in the SAFe framework. This should better prevent client-managers asking for unsuitable deliverables which the vendor does not know are unsuitable, and gives managers a better idea of realistic deadlines. The second improvement can come from better addressing the reluctance of managers to listen to developers. We suggest replacing the report on the status of the project (at the end of each sprint) with a request for resources made by the developers for upcoming sprints - using a software analogy, we would call this active publishing rather than polling. In this way, the manager never reads technical information for the sake of technical information; rather, the manager receives requests, i.e. a language they can relate to and understand, and even enjoy responding to. The requests could include things like modifying the process, extending deadlines, reducing scope, increasing staff, increasing/decreasing hours, changing working conditions, etc. The requests address any and all items not going according to plan - any item not addressed by a request should be assumed to be exactly on track according to the plan, and thus just requires a "checkmark." The necessary technical information will still appear in the email or document, but qualifies as supporting information for the request rather than standing alone, i.e. it is presented in a way that Type 3 culture managers would like to read.



5.2 Seeking Balance

The quadrant graph strongly suggests that one can reduce bias by seeking a better balance of the four characteristics; this means trying to include at least some of each characteristic, not literally trying to balance them equally. In the hiring process this means making sure not all managers are Type 3's and that there are some non-Type-5 developers. For better balance within each individual, companies can promote progress policies (mental-health programs, work-life balance, fun culture, creative benefits, etc.) that enable or encourage self-growth; yet technology companies still often have strong social norms against expressing emotions in particular:

In a world that values facts, science, and measuring progress through objective means, we have cut off a large part of our natural wisdom in the form of emotions. Emotions are hard. They are messy. Some of them do not feel good and there is a value placed on happiness, engaging us in an internal struggle with the deeper darker feelings and whether they are acceptable to even have, nevermind sit with or share with others. Emotions are the wisdom of the body. [37]

Many people use the creative process to assist in emotional processing, for instance through the medium of art therapy [38]. Such techniques can be very simple and straightforward, yet highly effective. For example, the theory of transformative learning for leadership using creative ritual from [39] (as taught in workshops [40]) involves:

• Creatively exploring vulnerability when feeling fear, in an effort to build the leadership capacity of courage,
• Creatively exploring mourning when feeling grief, in an effort to build the leadership capacity of compassion,
• Creatively exploring accountability when feeling shame, in an effort to build the leadership capacities of humility and dignity,
• Creatively exploring conflict when feeling anger, in an effort to build the leadership capacity of fierceness.

Since the process of creatively exploring emotions can simply involve doing a doodle at one's desk for ten minutes once a day without anyone having to know about it, this technique has no risk and no drawbacks. One difficulty stems from the fact that many "tech" workers have cultural backgrounds that pressure them to suppress their emotions and never do anything about them (e.g. Northern-European cultures and their influence on the United States, as well as many East-Asian societies). Cultural effects get reinforced by the internal bias against (disinterest in) emotions for Green and Orange, as well as the reflected bias against other people showing emotions for Gold personalities (and these three quadrants describe the vast majority of people working in the "tech" sector).



6 Conclusions

This paper uses a predictive model of human personalities to identify the biases of software managers, and how those biases contribute to the continuing trend of software-project failures in spite of the high competency level of said managers. Our analysis of the personalities of Bill Gates and Steve Jobs shows consistency with our theory, as do the biases revealed by a Google search of "why software projects fail." We show how these themes also appear in popular-cultural references, implying we all "know" these biases exist even if we do not like to talk about them or address them in the workplace. Does our desire to stay in our comfort zone justify putting up with high rates of project disappointments and failures? We suggest, instead, naming the known biases explicitly when designing development frameworks - and we give an example of how to do so with the popular SAFe method. We have been rather fortunate so far in that the largest software failures have "merely" bankrupted companies and cost governments billions, but failure to address this issue may inevitably lead to great human tragedy when a critical piece of software infrastructure fails.

Acknowledgements The first author wishes to thank the following people for engaging in discussions, giving insights, and providing support: Catherine Ann Lewis, Kelly Osgood, Monica Tomlinson, Robyn Mae Paul, Alexandra Tabler, Doug Schroeder, Martin Blackwell, Rachael L'Orsa, Carrie Scherzer, Lia Serpentini, and Katherine Kuchenbecker.

Appendix A: Enneagram Mapping

Here we use the descriptions of the Enneagram types as described in [13], and to make the quadrant mapping we posit that people feel proud when motivated by the visible characteristic, and ashamed when motivated by the hidden characteristic in their quadrant. In the Orange (driving) quadrant:

• Enneagram Type 3 (see footnote 1) - I am successful because I am (proudly) manipulative,
• Enneagram Type 8 - I am strong because I am (ashamed to be) logical.

(Footnote 1 - alternate names: (1) Advocate, (2) Nurturer, (3) Producer, (4) Elitist, (5) Sage, (6) Rebel, (7) Optimist, (8) Intimidator, (9) Abdicator (https://enneagramawakeningschool.com).)

Type 3's are proudly manipulative because they take pride in being charming and wooing others using (what they see as) their fake personality. Type 8's are ashamed of being logical because they don't think logic will sway anyone; they come up with logical plans in their head but convince people to follow them using other methods and keep the original logic hidden, because they would much rather people thought they were just making courageous decisions from the gut and the heart. Type 8's manipulate by giving out-and-out commands, orders, instructions, etc. and they do not seem subtle at all. In the Green (thinking) quadrant:

• Enneagram Type 5 - I see through because I am (proudly) logical,



• Enneagram Type 6 - I do my duty because I am (ashamed to be) empowering.

Type 5's are the self-proclaimed intellectuals of the world and love to use logic to explain everything and tell everyone all about it. Note 6's don't like to admit that they do their duty only because they really like someone telling them what to do - they would rather people think they are a 'good' loyal person. Types 5 and 6 empower by giving away their personal power and following orders, being easy-going, etc. Type 5 loves to brag about their logic, but Type 6 also thinks logically, as evidenced by the fact they love to argue. In the Gold (planning) quadrant:

• Enneagram Type 7 - I am happy because I (proudly) empower others,
• Enneagram Type 1 - I am right because I am (ashamed to be) angry.

Type 7 empowers others by inspiring, encouraging, and motivating with energy and enthusiasm. Type 1 feels shame to be angry because anger invokes both their guilt/shame and their avoidance - they would much rather people think they are right because they have perfected what they are doing and have the perfect system in place. Type 1 loves to plan and Type 7 loves to scheme, both trying to come up with beautiful systems that make other people feel empowered - i.e. creating an ideal, utopian, perfect system or community. In the Blue (relationship) quadrant:

• Enneagram Type 4 - I am different because I am (proudly) emotional,
• Enneagram Type 2 - I help because I am (ashamed to be) manipulative.

Type 4 feels proud to have emotions because that makes them the true Artist. Type 2 would rather everyone think they help because they are such a 'good' person and would feel mortified if people saw it as a manipulation strategy - but they hope to get a positive response back from the person being helped (2's are not anonymous donors on a website). Type 4 often manipulates people in very indirect ways, like suffering to get sympathy, cutting off communication to invite pursuit, etc.

Now we provide an additional mapping technique, justifying the order of the types in the new configuration. We notice that the visible characteristic of the quadrant lies in the direction one wants to go according to the descriptions in [13], i.e. what we call the direction of mental comfort. Thus, everyone wants to go counter-clockwise in our new Enneagram configuration. Below, we rephrase the explanations from [13] for the sake of brevity as to why each type wants to go in this direction, and does not want to go in the opposite direction. The types 5-8-2-4-1-7 want to go counter-clockwise because:

• The thinker 5 (see footnote 2) wants to stop being trapped thinking in their heads and learn to truly help people by implementing their plans as an action-oriented 8,
• The justice-seeking 8 wants to stop angering people during their crusades and learn to help in a way they get appreciated, like a 2,
• The helper 2 wants to get out of the trap of helping people who don't want to be helped and learn to truly benefit people by just being present, sharing emotions, and authenticity, like a 4,

(Footnote 2 - alternate names: (1) Critic, (2) Caretaker, (3) Motivator, (4) Artist, (5) Thinker, (6) Guardian, (7) Entertainer, (8) Maverick, (9) Accommodator (https://enneagramawakeningschool.com).)



• The artist 4 wants to stop being trapped in emotional self-reflection and learn to truly help people by implementing their emotions in helpful plans for the world as a reformer 1,
• The reformer 1 wants to stop being trapped in an angry, righteous state about the world and learn to reform it in a fun way as an enthusiast 7,
• The fun-loving 7 wants to stop being trapped in endless scheming and learn to think things through logically as a 5.

These are the same as the direction of integration for the 5-8-2-4-1-7, so they feel encouraged to become healthy. The types 5-8-2-4-1-7 do not want to go clockwise because:

• The dignified 5 does not want to become an outrageous 7,
• The action-oriented 8 does not want to become an "egghead" 5,
• The helper 2 does not want to become a blunt and scary 8,
• The respectful give-you-your-space 4 does not want to become an annoying over-helper 2,
• The planner 1 does not want to become a scattered 4,
• The fun 7 does not want to become the serious 1.

These are the same as the direction of disintegration for the 5-8-2-4-1-7, so they feel discouraged to become unhealthy. The 6-3 wants to go counter-clockwise (we do not analyze Type 9) because:

• The successful, high-intensity 3 would just like to relax and leave the stress of work behind as a coddled 9,
• The loser 6 (who always gives away their power to other people) would like to become a winner 3. Note 'loser' and 'winner' are self-images, not truths.

These go against the direction of integration for the 9-6-3, so they feel discouraged to become healthy. We omit a full discussion of why the healthy and desired directions oppose each other for this triad and how it makes their journey through mental health different from the other numbers, but the authors discuss this at some length in [13]. The 6-3 does not want to go clockwise because:

• The Protestant-work-ethic 6 does not want to become a lazy 9,
• The winner 3 does not want to become a loser 6.

Although the above descriptions of the Enneagram are known, here we offer an explanation by pointing out that the visible characteristic of the personality disorder in the same quadrant points in the direction the person with that personality wants to go, i.e. the direction of mental comfort (Fig. 4). The hidden characteristic in the quadrant lies in the direction the person does not want to go. Although the person cannot (usually) become the personality type in their desired direction, they can take on more of the visible characteristic and feel like they get closer to the desired type. Likewise, the person can try to reject the hidden characteristic and try to move away



from the undesired type. In terms of personality disorders, we hypothesize that the more suffering they experience the more they want to become the desired type and avoid the undesired type; thus the characteristic to the counter-clockwise direction becomes very visible as they try to embrace it, and the clockwise one becomes hidden to the outside world as they try to suppress it.

Appendix B: Data Consistent with the Model

In an attempt to provide the reader with some validation of our quadrant theory for describing personality disorders and Enneagram types, within the limited space of this paper, we looked at top-10 search results in Google to examine people's attitudes and biases; the top-10 results indicate the most popular websites, providing us with the attitudes that people resonate with. We call this the "responder's" attitude when the person responds to an Enneagram personality type or personality disorder, i.e. the personality that is most popular on the internet for giving advice in a given situation. We counted the number of sentences that inferred dealing with emotions was important, or inferred that everyone else thinks that dealing with an emotion is important, and tallied them under "Emotions" in Tables 2 and 3 - and did the same for the other three characteristics as per the definitions in Sect. 4 (all raw data with codings available

Table 2 Google’s advice for “Dealing with a ...” personality disorders Google search Emo. Log. Emp. Man. Emo. term Avoidant Histrionic Narcissist Schizoid

0 0 14 0

6 7 2 3

6 8 11 5

7 5 13 12

Dependent Bordeline Antisocial Schizotypal

2 13 5 0

Log.

Emp.

Man.

16 9 1 3

14 20 1 3

15 11 6 10

Table 3 Google’s advice for dealing with extreme Enneagram types (Type) Google search term

Emotions

Logics

Empowerments Manipulations

(7) “Dealing with a drama queen”

0

14

6

(1) “Dealing with a perfectionist”

1

14

7

6

(4) “Living with an artist”

0

1

11

0

(2) “Dealing with over-helpers”

3

0

2

7

0

(3) “Dealing with the boss”

13

9

16

4

(8) “Living with a type-A”

13

3

11

5

(5) “Dealing with a geek”

0

5

7

13

(6) “Dealing with a bureaucrat”

0

5

1

13



for download at [18]). Unlike with the Bill Gates and Steve Jobs texts, only pieces of advice were coded rather than literally every sentence on the web page. We found that dealing with the personality disorders had a very large manipulation bias, so we subtracted the minimum amount of manipulation from each manipulation score, for each of the personality disorder tables, before graphing the data. Note that the responders usually consist of professionals on the websites that address personality disorders, whereas the large majority of websites responding to Enneagram types come from untrained or not-highly-trained people. If the personality disorders lie in the same quadrants as the corresponding Enneagram personality types, as we have claimed, then the responders' average personality should be clustered for each quadrant. The personalities of the average responder indeed cluster in the expected direction for the Enneagram types in Figs. 6, 7, 8, 9, 10, 11, 12, 13 and 14. To see this, first locate the asterisk '*' on a graph; that indicates how people on the internet like to treat someone with this Enneagram type, i.e. their personality as they interact with the Enneagram type. Then notice the circle 'o' and the 'x', which indicate how people on the internet like to treat the inner and outer personality disorders, respectively, that we have predicted to be in that quadrant, i.e. their personality as they interact with the other person with a personality disorder. If all these symbols lie in the same direction on the graph, it means people treat the Enneagram type and the personality disorders similarly, which is consistent with our theory. Note that Paranoid personality disorder lies close to Type 9, meaning it should be placed above Type 9 in the quadrant graph. Some further evidence comes in the form of developing computer models of human personalities based on our proposed quadrant that successfully match human data, specifically predicting the distribution of Myers-Briggs conflict types for each Enneagram type [8]. Mapping Merrill's C.A.P.S. theory of work-place personalities [41] also provides supporting evidence for the placement of personalities in the quadrant system.
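The manipulation adjustment described above amounts to a one-line normalization; the sketch below illustrates it with hypothetical tallies (the names and numbers are placeholders, not the values from Table 2).

```python
def adjust_manipulation(rows):
    """Subtract the minimum manipulation tally from every manipulation score.

    rows maps a search term to its (emotions, logics, empowerments,
    manipulations) tallies; names and numbers here are hypothetical.
    """
    min_man = min(man for (_, _, _, man) in rows.values())
    return {term: (emo, log, emp, man - min_man)
            for term, (emo, log, emp, man) in rows.items()}

# Hypothetical tallies, not the actual Table 2 values
raw = {"disorder A": (1, 5, 4, 9), "disorder B": (3, 2, 6, 7)}
print(adjust_manipulation(raw))
# {'disorder A': (1, 5, 4, 2), 'disorder B': (3, 2, 6, 0)}
```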

Fig. 6 Type 7 '*', Avoidant 'o', Dependent 'x'
Fig. 7 Type 1 '*', Avoidant 'o', Dependent 'x'
Fig. 8 Type 9 '*', Paranoid 'o' (no outer type)
Fig. 9 Type 4 '*', Histrionic 'o', Borderline 'x'
Fig. 10 Type 2 '*', Histrionic 'o', Borderline 'x'
Fig. 11 Type 3 '*', Narcissist 'o', Antisocial 'x'
Fig. 12 Type 8 '*', Narcissist 'o', Antisocial 'x'
Fig. 13 Type 5 '*', Schizoid 'o', Schizotypal 'x'
Fig. 14 Type 6 '*', Schizoid 'o', Schizotypal 'x'
(Each figure plots the average responder on the quadrant, with axes labelled emotions, logics, empowerments, and manipulations.)

C.A.P.S. also maps to three categories in the Five Factor Model (FFM), the most widely cited personality descriptor in the psychology literature [42]. However, the FFM uses judgemental categories, making it completely inappropriate for self-reflection or for use in industry; in contrast the four Merrill types use neutral language and (with our suggestions in brackets) they are: Controllers (Crusaders, Types 1 and 8), Analyzers (Dreamers, Types 5 and 4), Promoters (Inspirers, Types 7 and 3), and Supporters (Doers, Types 6 and 2). We suggest the descriptions influencing and reflective in place of Merrill's dominating and easy-going, respectively. We emphasize that the Merrill type descriptions below indicate how the person is perceived at work. Since the two Enneagram types in each Merrill type are opposites on the quadrant, we assume it is the unconscious that makes the two types get perceived similarly, i.e. the unconscious is opposite to the conscious self-image.

Controllers (Crusaders) Types 1 and 8:
• perceived as dominating and influencing; encourage action a lot; often seen as leaders,
• low in agreeableness and low in openness in the FFM,
• often perceived as formal; often under-appreciated.

Analyzers (Dreamers) Types 5 and 4:
• perceived as easy-going and reflective; seen as thoughtful and creative; often seen as followers,
• high in agreeableness and high in openness in the FFM,
• often perceived as formal; often under-appreciated.

Promoters (Inspirers) Types 7 and 3:
• perceived as dominating and influencing; seen as building and visionary,
• low in conscientiousness and low in openness in the FFM,



• often seen as leaders,
• usually perceived as informal; often over-appreciated.

Supporters (Doers) Types 6 and 2:
• perceived as easy-going and reflective; seen as dutiful and helpful; often seen as followers,
• high in conscientiousness and high in openness in the FFM,
• usually perceived as informal; often over-appreciated.

References

1. Charette, R. N. (2005). Why software fails. IEEE Spectrum, 42(9), 42–49.
2. Hughes, D. L., Rana, N. P., & Simintiras, A. C. (2017). The changing landscape of is project failure: An examination of the key factors. Journal of Enterprise Information Management, 30(1), 142–165.
3. Consulting, T. (2016). What can Target Canada's ERP failure teach us? http://www.tgo.ca/what-can-target-canadas-sap-failure-teach-us/. Accessed January 5, 2019 (Online).
4. CBC News. (2018). Price tag for fixing phoenix pay system now tops original cost. https://www.cbc.ca/news/canada/ottawa/phoenix-pay-update-may-24-1.4129049. Accessed January 5, 2019 (Online).
5. Ogheneovo, E. E. (2014). Software dysfunction: Why do software fail? Journal of Computer and Communications, 2(06), 25.
6. Fox, D. (2014). The clinician's guide to the diagnosis and treatment of personality disorders. PESI Publishing and Media.
7. Lowry, D. (1989). True colors: Keys to successful teaching. True Colors.
8. Macnab, C. (2019). Developing a computer model of human personalities suitable for robotics and control applications. In IEEE International Conference on Systems, Man, and Cybernetics. Bari, Italy: IEEE (Submitted).
9. Rohr, R., & Ebert, A. (1992). Discovering the Enneagram: An ancient tool for a new spiritual journey. Crossroad.
10. Ichazo, O. (1982). Interviews with Oscar Ichazo. Oscar Ichazo Fndt.
11. Naranjo, C. (1990). Ennea-type structures: Self-analysis for the seeker. Gateways/IDHHB Incorporated.
12. Naranjo, C. (1994). Character and neurosis: An integrative view. Gateways/IDHHB Nevada City.
13. Riso, D. R., & Hudson, R. (1996). Personality types: Using the Enneagram for self-discovery. Mariner Books.
14. Talbott, J. A comparison of the unhealthy expressions of the Enneagram types and the personality disorders of the DSM-IV-TR. Liberty University. https://portfolio.du.edu/downloadItem/287388.
15. Adams, S. (2018). Cubicles that make you envy the dead. Kansas City: Andrews McMeel Publishing.
16. The Enneagram Institute. (2017). Enneagram type three (the achiever) with Enneagram type five (the investigator). https://www.enneagraminstitute.com/relationship-type-3-with-type-5/. Accessed December 20, 2018 (Online).
17. Spinrad, N. (1967). The doomsday machine. http://www.chakoteya.net/StarTrek/35.htm. Accessed January 4, 2019 (Online).
18. Macnab, C., & Doctolero, S. The role of unconscious bias in software project failures (dataset). Mendeley Data, V1. https://doi.org/10.17632/55tk4zksrv.1.



19. Allison, D. (1993). Transcript of a video history interview with Mr. William "Bill" Gates. http://americanhistory.si.edu/comphist/gates.htm. Accessed December 20, 2018 (Online).
20. Gates, B. The turning point; our first trip to Africa. https://www.gatesnotes.com/About-Bill-Gates/The-Turning-Point-Our-First-Trip-to-Africa. Accessed December 2, 2018 (Online).
21. Jobs, S. (2005). Steve Jobs' Stanford commencement address. http://www.applematters.com/article/steve_jobs_standford_commencement_address/. Accessed December 13, 2018 (Online).
22. Sheff, D. (1985). Playboy interview: Steven Jobs. https://allaboutstevejobs.com/verbatim/interviews/playboy_1985. Accessed December 10, 2018 (Online).
23. Wolfram, S. (2017, August 2). High-school summer camp: A two-week path to computational thinking. https://blog.stephenwolfram.com/2017/08/high-school-summer-camp-atwo-week-path-to-computation-al-think-ing/. Accessed December 19, 2018 (Online).
24. The Enneagram in Business. (2010). Famous enneagram fives: Georgia O'Keefe and Bill Gates. http://theenneagraminbusiness.com/9-types/famous-enneagram-fives-georgia-okeefeand-bill-gates/. Accessed January 2, 2019 (Online).
25. Condon, T. Enneagram styles of famous people. https://www.thechangeworks.com/images/Famous.pdf. Accessed December 3, 2018 (Online).
26. The Enneagram in Business. (2010). Was Steve Jobs really a seven? (Part 1). http://theenneagraminbusiness.com/special-people/was-steve-jobs-really-a-seven-part-1/. Accessed January 2, 2019 (Online).
27. Lavers, C. (2002, August 3). How the cheetah got his spots: Chris Lavers weighs Stephen Wolfram's explanation of nature's complexity, A new kind of science. https://www.theguardian.com/books/2002/aug/03/featuresreviews.guardianreview2. Accessed February 4, 2019 (Online).
28. Avison, D., Gregor, S., & Wilson, D. (2006). Managerial IT unconsciousness. Communications of the ACM, 49(7), 88–93.
29. Bannerman, P. L. (2008). Risk and risk management in software projects: A reassessment. Journal of Systems and Software, 81(12), 2118–2133.
30. Cerpa, N., & Verner, J. M. (2009). Why did your project fail? Communications of the ACM, 52(12), 130–134.
31. Scott, J. E., & Vessey, I. (2002). Managing risks in enterprise systems implementations. Communications of the ACM, 45(4), 74–81.
32. Rohr, R. (2019). Type THREE: The need to succeed. https://cac.org/type-three-need-succeed-2016-04-29/. Accessed January 4, 2018 (Online).
33. Carr, N. G. (2003). IT doesn't matter. Educause Review, 38, 24–38.
34. Franklin, U. (1989). Real world of technology. New York: Dover.
35. Knaster, R. (2019). SAFe 4.5 distilled: Applying the Scaled Agile Framework for Lean enterprises.
36. VersionOne Inc. (2018). 12th annual state of Agile report. https://explore.versionone.com/state-of-agile/versionone-12th-annual-state-of-agile-report. Accessed January 16, 2019 (Online).
37. Lewis, C. A. (2018). Understanding and freeing the self (Master's thesis). Naropa University (Unpublished).
38. Stuckey, H. L., & Nobel, J. (2010). The connection between art, healing, and public health: A review of current literature. American Journal of Public Health, 100(2), 254–263.
39. Omer, A. (2005). The spacious center: Leadership and the creative transformation of culture. Shift: At the Frontiers of Consciousness, 6, 30–33.
40. Nicolson, M. (2018). The edge of courage. http://t12n.com/. Accessed December 13, 2018 (Online).
41. Merrill, D. W., & Reid, R. H. (1981). Personal styles & effective performance. Boca Raton: CRC Press.
42. Digman, J. M. (1990). Personality structure: Emergence of the five-factor model. Annual Review of Psychology, 41(1), 417–440.

Analysis of Missing Data Using Matrix-Characterized Approximations

Thin Thin Soe and Myat Myat Min

Abstract Nowadays, the veracity dimension of data quality - incomplete, inconsistent, vague or noisy data - creates a major challenge for data mining and data analysis. Rough set theory presents a special tool for handling incomplete and imprecise data in information systems. In this paper, rough set based matrix-represented approximations are presented to compute lower and upper approximations. The induced approximations are used as inputs for the data analysis method LERS (Learning from Examples based on Rough Set), applied with the LEM2 (Learning from Examples Module, Version 2) rule induction algorithm. Analyses are performed on missing datasets with "do not care" conditions and on missing datasets with lost values. In addition, experiments on missing datasets with different missing percentages using different thresholds are also provided. The experimental results show that the system performs better when missing data are characterized as "do not care" conditions than when they are represented as lost values.

Keywords Rough set · Incomplete data · Missing values · Matrix-represented approximations · "Do not care" conditions · Lost values

T. T. Soe, Web Mining Lab, University of Computer Studies, Mandalay, Myanmar, e-mail: [email protected]
M. M. Min, Faculty of Computer Science, University of Computer Studies, Mandalay, Myanmar, e-mail: [email protected]

1 Introduction

Available knowledge about the real world is inherently uncertain, and decisions are usually made based on incomplete and partially imprecise data. Incomplete data, in which some feature values are missing from the training set, affects the learning quality of classifiers. Since rough set theory (RST) is a special tool for handling imprecise and incomplete data in information systems, many


researchers have presented rough set based methods for handling incomplete data [1–6]. A sequential matrix-based algorithm (SMA) that calculates lower and upper approximations of incomplete information systems is introduced in [7]. However, that work does not address missing datasets with different missing percentages. A system for speeding up incomplete data analysis using matrix-represented approximations is proposed in [8]. It handles missing data within an acceptable time and is faster than the traditional method. In that work, missing datasets with different missing percentages are examined with different thresholds, and the performance of the system is compared with the traditional rough set approach. The two types of missing attribute values, "do not care" conditions and lost values, are also considered. However, the performance comparison between the two types of missing values was not examined specifically. Therefore, the contributions of this paper are:
• A set of experiments, using different thresholds, on five missing datasets in which missing values are represented as "do not care" conditions
• A set of analyses, using different thresholds, on five missing datasets in which missing values are represented as lost values
• A performance comparison between the two characterizations of missing values: "do not care" conditions and lost values.
The rest of the paper is arranged as follows. Section 2 reviews existing methods. The basic concepts of incomplete data analysis with matrix-represented approximations and a case study of the system are presented in Sect. 3. Experimental results for the two types of missing values and the performance comparison are discussed in Sect. 4. Finally, the paper ends with conclusions in Sect. 5.

2 Related Works

The rough set based characteristic relation for incomplete decision tables was introduced by Grzymala-Busse [1]. According to [2–4, 6], there are three main characterizations of missing attribute values: "do not care" conditions, lost values, and attribute-concept values. A rough set approach to missing attribute values with "do not care" conditions was proposed in [3, 4]; in this approach, each missing attribute value is replaced by all possible values of that attribute. Missing attribute values represented as lost values (i.e., the original value was erased) were treated in [6]. Another approach, based on attribute-concept values, was described in [2]. Matrix characterizations of the lower and upper approximations in set-valued information systems, together with two incremental approaches for updating the relation matrix, were introduced in [9]. Set-valued ordered information systems are generalized models of single-valued information systems and can be divided into two types: disjunctive and conjunctive systems [9, 10]. The authors constructed a relation matrix based on the tolerance relation. Using the relation matrix, they expressed a basic vector H(X) and four cut matrices of H(X) to acquire the approximations and the positive,


boundary and negative regions. They then presented approaches for updating the approximations incrementally through variation of the relation matrix. A sequential matrix-based algorithm (SMA) and three parallel methods based on MapReduce were proposed to calculate approximations in incomplete decision tables [7]; SMA computes the approximations of the decision concepts together with the positive, negative, and boundary regions. In [8], a system for speeding up incomplete data analysis using matrix-represented approximations was proposed. Using different thresholds, a set of experiments on datasets with different missing percentages was carried out, and the performance of the system was compared with the traditional rough set approach.

3 Missing Data Analysis with Matrix-Represented Approximations

The characteristic sets for the incomplete decision table are calculated first. Based on the resulting characteristic sets, matrix-represented lower and upper approximations are generated. Detailed descriptions of these steps are presented in this section. The induced lower and upper approximations are used as inputs for the data analysis method LERS (Learning from Examples based on Rough Sets) with the LEM2 (Learning from Examples Module, Version 2) rule induction algorithm [11, 12].

3.1 Missing Data

In RST, a decision table is used to describe an information system [1, 2, 13]. Each row of the decision table corresponds to a case, and the columns stand for attributes (a finite set of condition attributes and a decision attribute). The set of all cases and the set of all attributes are denoted by U and A, respectively, and the value of an attribute a for a case c is written a(c). A decision table is incomplete when some attribute values are missing. In this paper, the two main types of missing values are considered: "do not care" conditions '*' and lost values '?'. Table 1 shows a sample missing dataset in which all missing values are marked as lost values '?'; the same missing values can alternatively be represented as "do not care" conditions '*'. The attribute information for Table 1 is given in Table 2, and the complete description of the attribute values can be found in [14].

Table 1 Sample missing dataset with lost values '?' (ten cases from the mushroom data described by the 22 condition attributes and the decision attribute listed in Table 2; missing attribute values are marked '?', or '*' under the "do not care" interpretation)

Table 2 Attribute information of Table 1
1. Cap-shape: bell = b, convex = x
2. Cap-surface: fibrous = f, smooth = s, scaly = y
3. Cap-color: brown = n, white = w, yellow = y
4. Bruises: no = f, bruises = t
5. Odor: almond = a, anise = l, pungent = p
6. Gill-attachment: free = f
7. Gill-spacing: close = c, crowded = w
8. Gill-size: broad = b, narrow = n
9. Gill-color: gray = g, brown = n, white = w
10. Stalk-shape: enlarging = e, tapering = t
11. Stalk-root: club = c, equal = e
12. Stalk-surface-above-ring: smooth = s
13. Stalk-surface-below-ring: fibrous = f, smooth = s
14. Stalk-color-above-ring: white = w
15. Stalk-color-below-ring: white = w
16. Veil-type: partial = p
17. Veil-color: white = w
18. Ring-number: one = o
19. Ring-type: evanescent = e, pendant = p
20. Spore-print-color: black = k, brown = n
21. Population: abundant = a, numerous = n, scattered = s, several = v
22. Habitat: grasses = g, meadows = m, urban = u
23. Decision: poisonous = p, edible = e

3.2 Characteristic Relation

In RST, the indiscernibility relation is generally used to describe completely specified tables. However, in many real-life applications, datasets have some missing attribute values, i.e., the corresponding decision tables are incompletely specified [1, 2, 13]. The characteristic relation, a generalization of the indiscernibility relation, is used to describe incompletely specified tables. The characteristic set K_A(c) of a case c is the set of all cases in U that are indistinguishable from c on all attributes in A [1, 7]; the corresponding set of pairs is

K_A = {(c1, c2) | ∀ a ∈ A, a(c1) = '?' ∨ a(c1) = a(c2) ∨ a(c1) = '*' ∨ a(c2) = '*'}   (1)

Based on the characteristic sets, the characteristic relation R(A) is defined as follows:

(c1, c2) ∈ R(A) ⇔ c2 ∈ K_A(c1)   (2)

First, the characteristic sets for Table 1 with "do not care" conditions '*' are calculated using Eq. (1):

K_A(1) = {1, 9}, K_A(2) = {2}, K_A(3) = {3, 9}, K_A(4) = {4}, K_A(5) = {5, 9}, K_A(6) = {6, 9}, K_A(7) = {7}, K_A(8) = {8, 9}, K_A(9) = {1, 3, 5, 6, 8, 9, 10}, K_A(10) = {9, 10}.

Then, the characteristic sets for Table 1 with lost values '?' are computed as follows:

K_A(1) = {1}, K_A(2) = {2}, K_A(3) = {3}, K_A(4) = {4}, K_A(5) = {5}, K_A(6) = {6}, K_A(7) = {7}, K_A(8) = {8}, K_A(9) = {1, 5, 6, 9, 10}, K_A(10) = {10}.
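To make the relation in Eq. (1) concrete, the following is a minimal sketch (not the authors' implementation, which is written in Java) of how characteristic sets can be computed for a small incomplete table; the toy cases and attribute values below are invented purely for illustration.

```python
# Minimal sketch of the characteristic sets in Eq. (1).
# Each case is a list of attribute values; '?' marks a lost value and '*' a
# "do not care" condition.

def characteristic_set(cases, i):
    """1-based indices j such that (c_i, c_j) belongs to the characteristic relation."""
    related = []
    for j, other in enumerate(cases):
        ok = all(
            a == '?' or a == b or a == '*' or b == '*'
            for a, b in zip(cases[i], other)
        )
        if ok:
            related.append(j + 1)
    return related

# Three toy cases over two attributes (values chosen only for illustration).
cases = [['x', 's'], ['x', '?'], ['b', 's']]
print({i + 1: characteristic_set(cases, i) for i in range(len(cases))})
# lost-value reading: K_A(1) = [1], K_A(2) = [1, 2], K_A(3) = [3]
```

As in the example above, a case containing a lost value '?' still relates to every case that agrees with it on its specified attributes, while its lost attribute places no constraint.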

3.3 Matrix-Represented Approximations

The calculation of lower and upper approximations is an essential part of rough set-based knowledge acquisition systems. Among the definitions of approximations [13], concept lower and upper approximations are used in this paper. A concept X is the set of all cases (examples) with the same decision value. The lower approximation is the set of all cases that are certainly classified as members of the concept X, and the upper approximation contains the set of cases that are possible members of the concept X. The two concepts of Table 1 are X_1 = {1, 2, 3, 4, 5} and X_2 = {6, 7, 8, 9, 10}.

First, the relation matrix RM of the incomplete decision table is generated based on the characteristic relation [8]. The relation matrix RM is an n×n matrix representing K_A:

RM^{K_A}_{n×n} = (m_ij)_{n×n}, where m_ij = 1 if (c_i, c_j) ∈ K_A and m_ij = 0 if (c_i, c_j) ∉ K_A   (3)

with n the number of cases, 1 ≤ i, j ≤ n, and m_ii = 1.

Then, the induced diagonal matrix IDM is constructed from the relation matrix:

IDM^{K_A}_{n×n} = diag(1/Σ_{j=1}^{n} m_{1j}, 1/Σ_{j=1}^{n} m_{2j}, …, 1/Σ_{j=1}^{n} m_{nj})   (4)

where n is the number of cases and 1 ≤ j ≤ n.

The decision matrix DM is computed according to the concepts X:

DM^{X}_{n×r} = (G(X_1), G(X_2), …, G(X_r))   (5)

where r is the number of concepts, n is the number of cases, G(X_l) = (g_1, g_2, …, g_n)^T, and g_i = 1 if c_i ∈ X_l, g_i = 0 if c_i ∉ X_l.

The basic matrix BM is calculated as the matrix product of the induced diagonal matrix, the relation matrix, and the decision matrix:

BM(X) = IDM^{K_A}_{n×n} · RM^{K_A}_{n×n} · DM^{X}_{n×r}   (6)

Let the resulting basic matrix BM(X) for a concept X be (b_1, b_2, …, b_n)^T. Then, the matrix-represented lower and upper approximations are computed as follows:

\underline{A}X = BM^{[α,1]}(X)   (7)

\overline{A}X = BM^{(β,1]}(X)   (8)

where BM^{[α,1]}(X) = (b'_i)_{n×1} with b'_i = 1 if α ≤ b_i ≤ 1 and b'_i = 0 otherwise; BM^{(β,1]}(X) = (b''_i)_{n×1} with b''_i = 1 if β < b_i ≤ 1 and b''_i = 0 otherwise; n is the number of cases, 1 ≤ i ≤ n, 0 ≤ α ≤ 1, and 0 ≤ β ≤ 1.
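As an illustration of Eqs. (3)–(8), the following sketch (again not the authors' system; NumPy is used here purely for convenience) builds RM, IDM, DM and BM from the "do not care" characteristic sets computed in Sect. 3.2 and thresholds the basic matrix with α = 1 and β = 0.

```python
# Sketch of Eqs. (3)-(8) with NumPy, using the characteristic sets listed above
# for the "do not care" interpretation of Table 1.
import numpy as np

n = 10
K = {1: {1, 9}, 2: {2}, 3: {3, 9}, 4: {4}, 5: {5, 9}, 6: {6, 9},
     7: {7}, 8: {8, 9}, 9: {1, 3, 5, 6, 8, 9, 10}, 10: {9, 10}}
concepts = [{1, 2, 3, 4, 5}, {6, 7, 8, 9, 10}]           # X1 and X2

RM = np.array([[1 if j + 1 in K[i + 1] else 0 for j in range(n)] for i in range(n)])
IDM = np.diag(1.0 / RM.sum(axis=1))                       # Eq. (4)
DM = np.array([[1 if i + 1 in X else 0 for X in concepts] for i in range(n)])
BM = IDM @ RM @ DM                                        # Eq. (6)

alpha, beta = 1.0, 0.0
lower = (BM >= alpha).astype(int)                         # Eq. (7): BM^[alpha,1]
upper = (BM > beta).astype(int)                           # Eq. (8): BM^(beta,1]

print(np.round(BM, 2))
```

The printed basic matrix reproduces the values reported later in Table 5, e.g., (0.43, 0.57) for case 9.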

Table 3 Relation matrix RM for Table 1 with "do not care" conditions
1 0 0 0 0 0 0 0 1 0
0 1 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 1 0
0 0 0 1 0 0 0 0 0 0
0 0 0 0 1 0 0 0 1 0
0 0 0 0 0 1 0 0 1 0
0 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 1 1 0
1 0 1 0 1 1 0 1 1 1
0 0 0 0 0 0 0 0 1 1

In this paper, different values of α and β are exploited instead of using only α = 1 and β = 0 [7]. The choice of threshold does not affect the performance of the system on missing datasets with a smaller amount of missing attribute values, whereas the accuracy is lower than that of the traditional rough set approach when analyzing missing datasets with more missing values. These results and the related discussion were described in our previous work [8]. In this paper, we emphasize the performance comparison between the two representations of missing values, "do not care" conditions and lost values; the results are reported in Sect. 4.

For the sample missing dataset illustrated in Table 1, the matrix-represented lower and upper approximations for both types of missing values are calculated as follows. First, the relation matrix RM is constructed according to the characteristic relation; the relation matrices for the two types of missing values are shown in Tables 3 and 4, respectively. Based on the resulting relation matrix RM, the induced diagonal matrix IDM is computed. For the relation matrix presented in Table 3, IDM = diag(1/2, 1/1, 1/2, 1/1, 1/2, 1/2, 1/1, 1/2, 1/7, 1/2). For the relation matrix depicted in Table 4, IDM = diag(1/1, 1/1, 1/1, 1/1, 1/1, 1/1, 1/1, 1/1, 1/5, 1/1). Then, the decision matrix DM is calculated from the concepts X, and the basic matrix BM is obtained as the matrix product of IDM, RM, and DM. The lower and upper approximations for Table 1 interpreted as "do not care" conditions are computed from the basic matrix BM with α = 1 and β = 0; the result is shown in Table 5. Table 6 gives the decision matrix, the basic matrix, and the matrix-represented lower and upper approximations for the sample missing dataset with lost values.
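As a quick worked check of Eq. (6) against Table 5, consider case 9 under the "do not care" interpretation. Since $K_A(9) = \{1, 3, 5, 6, 8, 9, 10\}$, the ninth diagonal entry of $IDM$ is $1/7$, and row 9 of $RM \cdot DM$ counts how many members of $K_A(9)$ fall in each concept, so

$$BM(9) = \tfrac{1}{7}\big(\,|K_A(9)\cap X_1|,\; |K_A(9)\cap X_2|\,\big) = \tfrac{1}{7}(3,\,4) \approx (0.43,\ 0.57).$$

With α = 1 and β = 0, case 9 therefore belongs to neither lower approximation but to both upper approximations, exactly as shown in Table 5.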

4 Experimental Results and Discussions

In this experiment, the mushroom dataset from the "UCI Machine Learning Repository" [14] is exploited. Five missing datasets are generated by assigning different amounts (10, 15, 20, 25, and 30%) of missing attribute values to this dataset.

Table 4 Relation matrix RM for Table 1 with lost values
1 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 1 0 0
1 0 0 0 1 1 0 0 1 1
0 0 0 0 0 0 0 0 0 1

Table 5 Matrix-represented approximations for Table 1 with "do not care" conditions
Case | Decision matrix DM (X1, X2) | Basic matrix BM (X1, X2) | Lower approximation BM^[1] (X1, X2) | Upper approximation BM^(0,1] (X1, X2)
1 | 1 0 | 0.50 0.50 | 0 0 | 1 1
2 | 1 0 | 1.00 0.00 | 1 0 | 1 0
3 | 1 0 | 0.50 0.50 | 0 0 | 1 1
4 | 1 0 | 1.00 0.00 | 1 0 | 1 0
5 | 1 0 | 0.50 0.50 | 0 0 | 1 1
6 | 0 1 | 0.00 1.00 | 0 1 | 0 1
7 | 0 1 | 0.00 1.00 | 0 1 | 0 1
8 | 0 1 | 0.00 1.00 | 0 1 | 0 1
9 | 0 1 | 0.43 0.57 | 0 0 | 1 1
10 | 0 1 | 0.00 1.00 | 0 1 | 0 1

This experiment is coded in Java and performed on an Intel Core i5 processor with Windows 7, 2 GB RAM, and a 500 GB hard disk. The performance of the system is assessed in terms of accuracy and execution time. The resulting five missing datasets are initially interpreted as missing datasets with "do not care" conditions '*'. First, the 10% missing dataset is examined with different thresholds: (α = 0.8, β = 0.1), (α = 0.8, β = 0.2), (α = 0.9, β = 0.1), (α = 0.9, β = 0.2) and (α = 1, β = 0). In this experiment, the accuracy remains the same for all thresholds. Then, the 15, 20, 25 and 30% missing datasets are analyzed with the same thresholds. For the 15 and 20% missing datasets, the accuracy is the same as for the 10% missing dataset. For the 25% missing dataset, the accuracy decreases when (α = 1, β = 0). For the 30% missing dataset, the accuracy decreases when (α = 0.9, β = 0.1), (α = 0.9, β = 0.2) and (α = 1, β = 0). The experimental results are depicted in Fig. 1.

Table 6 Matrix-represented approximations for Table 1 with lost values
Case | Decision matrix DM (X1, X2) | Basic matrix BM (X1, X2) | Lower approximation BM^[1] (X1, X2) | Upper approximation BM^(0,1] (X1, X2)
1 | 1 0 | 1.00 0.00 | 1 0 | 1 0
2 | 1 0 | 1.00 0.00 | 1 0 | 1 0
3 | 1 0 | 1.00 0.00 | 1 0 | 1 0
4 | 1 0 | 1.00 0.00 | 1 0 | 1 0
5 | 1 0 | 1.00 0.00 | 1 0 | 1 0
6 | 0 1 | 0.00 1.00 | 0 1 | 0 1
7 | 0 1 | 0.00 1.00 | 0 1 | 0 1
8 | 0 1 | 0.00 1.00 | 0 1 | 0 1
9 | 0 1 | 0.40 0.60 | 0 0 | 1 1
10 | 0 1 | 0.00 1.00 | 0 1 | 0 1

Fig. 1 Experimental results for missing datasets with “do not care” conditions using different thresholds

Then, the five missing datasets are interpreted as missing datasets with lost values '?'. Each of these missing datasets is analyzed with different thresholds. For the 10, 15, 20 and 25% missing datasets, the accuracy remains the same for all thresholds. For the 30% missing dataset, the accuracy decreases for all thresholds. The comparison between the accuracy of these missing datasets with different thresholds is illustrated in Fig. 2.

Fig. 2 Experimental results for missing datasets with lost values using different thresholds

After examining both types of missing values, we found that the accuracy remains the same for all thresholds when using missing datasets with smaller missing percentages. For the 25% missing dataset with the threshold (α = 1, β = 0), the system performs better when missing values are characterized as lost values than when they are represented as "do not care" conditions. However, for the 30% missing dataset with the threshold (α = 1, β = 0), the accuracy remains the same for both interpretations of missing values. For both datasets with larger missing percentages, the system performs better when missing values are represented as "do not care" conditions than as lost values when using (α = 0.8, β = 0.1) and (α = 0.8, β = 0.2). The computational time on missing datasets with "do not care" conditions and the execution time on missing datasets with lost values are compared in Fig. 3. In this experiment, different data sizes (100, 2000, 5000) with less than 10 percent missing values are used, and the threshold (α = 1, β = 0) is applied to both types of missing values. As shown in Fig. 3, the execution time for the two interpretations is not significantly different up to 2000 records. For the missing dataset with 5000 records, the lost-value interpretation is slightly faster than the "do not care" representation. In addition, experiments on data with different missing percentages show that the execution time on missing datasets may change according to the amount of missing values.

Fig. 3 Comparison of execution time between the two interpretations of missing data on different data sizes

5 Conclusion

Currently, data in many real-life applications are incomplete, inconsistent, vague or noisy due to inherent measurement inaccuracies or intentional blurring of data. Handling incomplete data is an important issue, because incomplete data in either the training set or the testing set affects the learning quality of classifiers. In this paper, using different thresholds, we first presented a set of experiments on missing datasets with "do not care" conditions '*'. Then, an evaluation on missing datasets with lost values '?' was provided. According to the experimental results, the accuracy on datasets with smaller missing percentages remains the same for all thresholds and for both types of missing values. With the thresholds (α = 0.8, β = 0.1) and (α = 0.8, β = 0.2), the system performs better when missing values are represented as "do not care" conditions '*' than when they are represented as lost values '?'. The lost-value interpretation is slightly faster than the "do not care" representation for larger datasets, while the execution time for the two interpretations is not significantly different for smaller datasets.

References
1. Grzymala-Busse, J. W. (2005). Characteristic relations for incomplete data: A generalization of the indiscernibility relation. Transactions on Rough Sets IV (pp. 58–68). Berlin, Heidelberg: Springer.
2. Grzymala-Busse, J. W. (2008). Three approaches to missing attribute values: A rough set perspective. Data Mining: Foundations and Practice (pp. 139–152). Berlin, Heidelberg: Springer.
3. Kryszkiewicz, M. (1998). Rough set approach to incomplete information systems. Information Sciences, 112(1–4), 39–49.
4. Kryszkiewicz, M. (1999). Rules in incomplete information systems. Information Sciences, 113(3–4), 271–292.
5. Pawlak, Z. (1991). Rough sets: Theoretical aspects of reasoning about data. Boston, London, Dordrecht: Kluwer Academic Publishers.
6. Stefanowski, J., & Tsoukiàs, A. (1999, November). On the extension of rough sets under incomplete information. In International Workshop on Rough Sets, Fuzzy Sets, Data Mining, and Granular-Soft Computing (pp. 73–81). Berlin, Heidelberg: Springer.
7. Zhang, J., Wong, J. S., Pan, Y., & Li, T. (2015). A parallel matrix-based method for computing approximations in incomplete information systems. IEEE Transactions on Knowledge and Data Engineering, 27(2), 326–339.
8. Soe, T. T., & Min, M. M. (2018, June). Speeding up incomplete data analysis using matrix-represented approximations. In 2018 19th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD) (pp. 206–211). IEEE.
9. Zhang, J., Li, T., Ruan, D., & Liu, D. (2012). Rough sets based matrix approaches with dynamic attribute variation in set-valued information systems. International Journal of Approximate Reasoning, 53(4), 620–635.
10. Qian, Y., Dang, C., Liang, J., & Tang, D. (2009). Set-valued ordered information systems. Information Sciences, 179(16), 2809–2832.
11. Grzymala-Busse, J. W., & Wang, C. P. B. (1996, June). Classification methods in rule induction. In Proceedings of the 5th Intelligent Information Systems Workshop (pp. 120–126).
12. Grzymala-Busse, J. W. (1992). LERS—A system for learning from examples based on rough sets. Intelligent Decision Support (pp. 3–18). Dordrecht: Springer.
13. Grzymala-Busse, J. W. (2006). Rough set strategies to data with missing attribute values. Foundations and Novel Approaches in Data Mining (pp. 197–212). Berlin, Heidelberg: Springer.
14. UCI Machine Learning Repository: Mushroom Data Set. https://archive.ics.uci.edu/ml/datasets/Mushroom

Expansion of Cyber Attack Data from Unbalanced Datasets Using Generative Adversarial Networks Tim Merino, Matt Stillwell, Mark Steele, Max Coplan, Jon Patton, Alexander Stoyanov and Lin Deng

Abstract Machine learning is commonly used for both research and operational purposes in detecting cyber attacks. However, publicly available datasets are often highly imbalanced between attack and non-attack data. Training attack detection systems on unbalanced datasets leads to inaccurate and biased algorithms. Here, we explore using Generative Adversarial Networks (GANs) to improve the training and, ultimately, performance of cyber attack detection systems. We determine the feasibility of generating cyber attack data from existing cyber attack datasets with the goal of balancing those datasets with generated data. Our findings suggest that GANs are a viable approach to improving cyber attack intrusion detection systems. Our model generates data that closely mimics the data distribution of various attack types, and could be used to balance previously unbalanced datasets.

T. Merino, M. Stillwell, M. Steele, M. Coplan, J. Patton, A. Stoyanov, L. Deng: Department of Computer and Information Sciences, Towson University, Towson, MD, USA
M. Coplan: Department of Physics, Astronomy and Geosciences, Towson University, Towson, MD, USA
T. Merino, M. Steele, J. Patton, A. Stoyanov: Department of Mathematics, Towson University, Towson, MD, USA


Keywords Generative adversarial network (GAN) · Discriminator · Generator · Machine learning · Deep learning · Neural networks · Cyber attacks · Balancing datasets · Network security

1 Introduction

When creating a neural network, the quality of the dataset used to train the system is crucial to the accuracy of the results. Even small irregularities in the dataset can completely change the system. However, many datasets are unbalanced in the classes represented. For example, there may be thousands or even millions of data points for some classifications but only a few examples of others. This disproportionate representation is significant because many machine learning algorithms become more accurate with more available data points. Machine learning models are used for purposes as diverse as object detection, visual art creation, and spam e-mail classification. Since machine learning is so important to modern society, being able to increase accuracy by balancing existing datasets is powerful and will have far-reaching benefits. We chose to apply our efforts to cyber security first because cyber attack datasets are often unbalanced [1].

Machine learning algorithms generally require large datasets to effectively learn to perform a task. An intrusion detection system (IDS) is a predictive machine learning model capable of distinguishing between attacks and normal connections [2]. An IDS approach, like most machine learning algorithms, relies on available datasets of cyber attacks. From these datasets, the algorithm learns to classify certain attack types based on features of their network flow data. However, publicly available datasets are outdated, narrowly scoped, and highly unbalanced. Up-to-date datasets are needed to cope with the ever-increasing variations in cyber attacks. Current IDS models are high in false positives and false negatives because it is difficult to train and calibrate learning algorithms on unbalanced datasets.

Unbalanced datasets are a common neural network training problem. A promising solution is generating realistic data points for underrepresented classes. We propose a new GAN framework for generating cyber attack data. This framework is aimed at generating attack data for underrepresented cyber attacks in existing datasets. For our initial test, we chose the KDD99 dataset, a cyber attack dataset created for an intrusion detection system design competition. This framework could theoretically be applied to any cyber attack dataset using network traffic captures similar to KDD99 [2]. By generating data to balance out underrepresented attacks, our GAN framework aims to improve existing IDS performance metrics by giving IDS models more data to learn from. Using our proposed framework, our GAN model was able to accurately mimic select attack types from within the KDD99 dataset. By evaluating our data using a binary classification neural network, we determined the feasibility of this framework for real-world application. We found that our GAN was capable of generating moderately under-represented attack type data, of which more than 90% were correctly identified.


Our proposed approach shows promise and potentially can be applied continuously as cyber attacks evolve and adapt to new detection systems. The GAN designed in this research will benefit computer science and cyber security professionals and everyday users alike. If network administrators and providers are able to implement meaningful and effective classifiers into their detection methods, then the Internet will become a much safer place for everyone. By increasing the data available to professionals, we can increase the stability and safety of all Internet-connected systems.

This paper is organized as follows. Section 2 introduces background information on GANs. Section 3 describes the general structure of the GAN designed in our research. Section 4 presents and analyzes the experimental results. Section 5 gives an overview of related research, Section 6 discusses threats to validity, and the paper concludes and suggests future work in Sect. 7.

2 Background

This section provides background on Generative Adversarial Networks and cyber attack data.

2.1 Generative Adversarial Networks

Goodfellow et al. first introduced the concept of GANs as two neural networks, a Generator model and a Discriminator model, dueling each other [3]. Figure 1 illustrates the general structure of a GAN. The Discriminator trains on both real and generated data points, with the goal of determining whether a given input is "real" (that is, coming from the data distribution) or "fake" (that is, created by the Generator). A small number of generated data points are interspersed among the real training data points; the Discriminator then trains on a combined set comprising mostly real data points. The networks are in a feedback loop, such that upon repeated training sessions the Generator learns to better mimic the data distribution, and the Discriminator learns to better classify data points as real or fake. Training is complete when the Discriminator can discern at a rate no better than guessing that a generated data point is, indeed, fake. The GAN model is an excellent approach to generating data within a data distribution.

A common analogy used to describe GANs is a scenario involving a wine forger and a wine taster. The wine forger (Generator) is tasked with creating fake wines that can fool the wine taster into believing they are real. The wine taster (Discriminator) is tasked with identifying the fake wines. They continuously compete in this way, learning their craft with each round of tasting. Eventually, the wine forger becomes so adept at creating fake wines that the (now expert) wine taster can only guess which wines are real or fake. At this point, the game is stopped, and we are left with a wine forger that can create extremely convincing fake wines.


Fig. 1 The general structure of a GAN

In our work, the Generator tries to produce realistic cyber attack data, and the Discriminator must determine whether data passed to it is real or fake. During training, the Discriminator improves its ability to distinguish real and fake attacks, and the Generator improves its ability to generate data that realistically mimics an attack.

Formally, according to Goodfellow et al., the Generator's distribution p_g over data x is learned for input noise p_z(z) and a differentiable multilayer perceptron G(z, Θ_g) with parameters Θ_g. The Discriminator is a multilayer perceptron D(x, Θ_d) with parameters Θ_d that outputs a scalar, such that D(x) is the probability that a data point came from the original data rather than from the Generator. The minimax game of the Generator and Discriminator may then be represented as

min_G max_D V(D, G) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))]   (1)
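To make the minimax game in Eq. (1) concrete, the following is a minimal Keras sketch (not the authors' code) of the alternating updates: the Discriminator is trained to label real data 1 and generated data 0, while the Generator is trained through a frozen copy of the Discriminator to push D(G(z)) toward 1. The toy 2-D data, tiny layer sizes, optimizer, and batch size are placeholders; the architecture actually used in this work is described in Sect. 3.

```python
# Minimal sketch of the alternating GAN updates implied by Eq. (1).
import numpy as np
from tensorflow.keras import layers, models

data_dim, noise_dim = 2, 2
G = models.Sequential([layers.Dense(8, activation="relu", input_shape=(noise_dim,)),
                       layers.Dense(data_dim)])
D = models.Sequential([layers.Dense(8, activation="relu", input_shape=(data_dim,)),
                       layers.Dense(1, activation="sigmoid")])   # D(x) = P(x is real)
D.compile(optimizer="adam", loss="binary_crossentropy")

D.trainable = False                           # freeze D inside the combined model
gan = models.Sequential([G, D])
gan.compile(optimizer="adam", loss="binary_crossentropy")

real = np.random.normal(loc=3.0, size=(64, data_dim))   # stand-in "real" samples
for _ in range(200):
    z = np.random.normal(size=(64, noise_dim))
    fake = G.predict(z, verbose=0)
    # Discriminator step: maximize log D(x) + log(1 - D(G(z)))
    D.train_on_batch(real, np.ones((64, 1)))
    D.train_on_batch(fake, np.zeros((64, 1)))
    # Generator step: push D(G(z)) toward 1 (the usual non-saturating variant)
    gan.train_on_batch(z, np.ones((64, 1)))
```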

2.2 Cyber Attack Data Background

In a world where everything is digital, protecting servers and data is vital. Cyber security is very important so that private data is not lost and systems are not shut down. Every major organization invests time and money into protecting against


cyber attacks, to protect themselves and those they serve. There are many different types of cyber attacks that may occur on any given system exposed to the Internet. A common example is a Distributed Denial of Service (DDoS) attack, which involves overloading a server's ability to hand out data to many users at once. This crashes the server, and time and resources are lost. Another common cyber attack is a man-in-the-middle attack, wherein one user intercepts the packets sent from one system before forwarding them to the intended system. This allows hackers to analyze private data such as user names and passwords. These cyber attacks can devastate essential systems, and new types of attacks constantly emerge with new technologies. The details of these cyber attacks can be captured and logged in the form of network flow data. This data logs the characteristics of any connection made on a network, which can later be analyzed and used to improve security. This data is sometimes made publicly available and presented to the public in the interest of improving overall cyber security. The dataset that we used to train our model is mainly comprised of Neptune, Smurf, and IP Sweep cyber attacks but contains many other, lesser known attacks. Some of these attacks have only a handful of data points, whereas others have millions. One problem with implementing machine learning for cyber attacks is the lack of data for certain attack types.

2.3 Description of Features and Attack Types

The KDD99 dataset comprises 41 features representing 22 different attack types. The features include the duration of the attack, the number of attempted connections in the past two seconds, and whether or not the attacker was able to successfully log in (see [4] and Table 1). Our model has generated Neptune, Smurf, and IP Sweep attacks. A Neptune attack sends large amounts of (usually) spoofed Synchronize sequence number (SYN) packets to the target to try to exhaust a server's resources [5]. A Smurf attack is a type of distributed denial of service attack in which an attacker spoofs Internet Control Message Protocol (ICMP) echo requests to a network broadcast address, and, unless the network forbids IP-directed broadcasts, the attack will overwhelm the target and result in disruption of service (see [5, 6]). An IP or Ping sweep attempts to identify all the live hosts within a certain IP range, which can later be used for an attack [5]. This can be done in a LAN by using the router's subnet mask to get the range of all IPs that can be addressed by the router, then sending packets to every IP in that range and waiting for a response. An IP sweep is normally followed by a Port sweep, which lists all the open ports on a host, giving insight into the possible services running on the machine. A port sweep is then followed by service-specific attacks. The results of an IP sweep may also be used in a Smurf attack, giving the Smurf attack the list of IPs to send the spoofed packets to.


Table 1 Feature names and data types of attack data in the KDD99 dataset [2]. No. is the data point feature number. Continuous indicates that the data is numeric. For symbolic data, the symbols found in the dataset are provided in the right-hand column. Because there are 70 different symbols for the service feature, and 11 different symbols for the flag feature, only three example symbols are given for brevity. This symbolic data is encoded to integer values by an autoencoder in our model
No. | Feature name | Data type | Symbols
1 | duration | continuous | –
2 | protocol_type | symbolic | tcp, udp, icmp
3 | service | symbolic | http, smtp, domain_u, etc.
4 | flag | symbolic | SF, S0, REJ, etc.
5 | src_bytes | continuous | –
6 | dst_bytes | continuous | –
7 | land | symbolic | 0, 1
8 | wrong_fragment | continuous | –
9 | urgent | continuous | –
10 | hot | continuous | –
11 | num_failed_logins | continuous | –
12 | logged_in | symbolic | 0, 1
13 | num_compromised | continuous | –
14 | root_shell | continuous | –
15 | su_attempted | continuous | –
16 | num_root | continuous | –
17 | num_file_creations | continuous | –
18 | num_shells | continuous | –
19 | num_access_files | continuous | –
20 | num_outbound_cmds | continuous | –
21 | is_host_login | symbolic | 0, 1
22 | is_guest_login | symbolic | 0, 1
23 | count | continuous | –
24 | srv_count | continuous | –
25 | serror_rate | continuous | –
26 | srv_serror_rate | continuous | –
27 | rerror_rate | continuous | –
28 | srv_rerror_rate | continuous | –
29 | same_srv_rate | continuous | –
30 | diff_srv_rate | continuous | –
31 | srv_diff_host_rate | continuous | –
32 | dst_host_count | continuous | –
33 | dst_host_srv_count | continuous | –
34 | dst_host_same_srv_rate | continuous | –
35 | dst_host_diff_srv_rate | continuous | –
36 | dst_host_same_src_port_rate | continuous | –
37 | dst_host_srv_diff_host_rate | continuous | –
38 | dst_host_serror_rate | continuous | –
39 | dst_host_srv_serror_rate | continuous | –
40 | dst_host_rerror_rate | continuous | –
41 | dst_host_srv_rerror_rate | continuous | –

3 Approach

In this section, we introduce the general approach of our research, including the GAN we constructed, the dataset used in our evaluation, and the hardware/software used in our development and experiments.

3.1 GAN Design

Our goal was to determine whether GANs are a viable approach to balancing datasets. GANs have proved effective in generating realistic-looking samples given a data distribution. Whereas many GAN applications involve image generation, we sought to generate cyber attack network flow data like that found in the KDD99 dataset. In image generation, images are converted into numeric arrays, where each number determines the pixel color, contrast, depth, etc. For our purposes, data comprises symbolic and numeric fields. Symbolic fields present in the original attack data are converted into numeric data, similar to the way images are processed. The GAN is thus presented only with numbers, each of which represents some characteristic of a cyber attack. This process is discussed in more detail later in this section.

We evaluated the potential effectiveness of generating cyber attack data using a GAN by combining a GAN model with a neural network classifier, called the Evaluation model, that would give us a measure of the success of our approach. We engineered a GAN, or the Combined model, as seen in the top rectangle of Fig. 2, that learns to generate data based on certain attack types found within our dataset. Our GAN comprises a Discriminator neural network and a Generator neural network. The Discriminator is a binary classifier, which classifies the input data as a real attack (coming from the dataset) or a fake attack (created by the Generator). The input data consists of 41-dimensional vectors of numbers (e.g., Table 2) that were produced by encoding the original attack data. The names of the 41 parameters can be found in Table 1. The Discriminator model consisted of three hidden layers of sizes 32, 16, and 8 nodes, respectively. Each hidden layer was followed by a LeakyReLU activation layer as well as a Dropout layer to prevent overfitting of the Discriminator.
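The symbolic-to-numeric encoding step mentioned above is not fully specified in the text (the caption of Table 1 refers to an autoencoder), so the following sketch simply uses scikit-learn's LabelEncoder as a stand-in to show the idea of mapping KDD99's symbolic columns to integers; the column names follow Table 1, and the example rows are invented.

```python
# Sketch of turning KDD99's symbolic columns into integers before GAN training.
# The paper's exact encoder is not spelled out here, so LabelEncoder is a stand-in.
import pandas as pd
from sklearn.preprocessing import LabelEncoder

SYMBOLIC_COLS = ["protocol_type", "service", "flag", "land", "logged_in",
                 "is_host_login", "is_guest_login"]        # symbolic per Table 1

def encode_symbolic(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with every symbolic column mapped to integer codes."""
    out = df.copy()
    for col in SYMBOLIC_COLS:
        if col in out.columns:
            out[col] = LabelEncoder().fit_transform(out[col].astype(str))
    return out

# Tiny made-up example rows (not real KDD99 records):
df = pd.DataFrame({"duration": [0, 5], "protocol_type": ["tcp", "icmp"],
                   "service": ["http", "ecr_i"], "flag": ["SF", "SF"]})
print(encode_symbolic(df))
```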

Fig. 2 The GAN system graph designed in this research

Table 2 Shown are generated (first group, upper) and real (second group, upper) Portsweep attacks and generated (first group, lower) and real (second group, lower) Neptune attacks (classified 98% accurate by the evaluation model). [The table lists encoded 41-feature records: generated and real Portsweep attacks in the upper group and generated and real Neptune attacks in the lower group.]

The Generator model is a neural network that takes a 41-dimensional noise vector as input, passes it through the set of hidden layers, and produces an output vector of the same dimension, which represents generated attack data. The node counts in each hidden layer varied throughout our experimentation. We initially took the approach of randomizing the node counts in three hidden layers and logging the results. However, this approach did not yield to consistent results, and the generated attacks often ended up being random values. Matching the Discriminator model shape, we later chose


three hidden layers with 8, 16, and 32 nodes, respectively. Furthermore, each hidden layer of the Generator is followed by the BatchNormalization layer, which reduces the amount of shift of nodes in hidden layers, and the LeakyRelu layer, which serves as an activation layer for the next hidden layer in the sequence. The output is a 41dimensional vector that mimics the original attack data. This configuration of the Generator leads to meaningful generated attack data that consistently imitates the structure and sequence of the original attack data. The GAN model was initially trained on the Neptune attack data because this attack type is abundant in the KDD99 dataset. The Neptune attack is an attack that makes a server reject any new connection from an authorized TCP client by: (a) Flooding a server with connection request packets; (b) Receiving acknowledgement packet; and (c) never responding to acknowledgement packets and keeping connection half-open, thus ignoring any new connections. Hyperparameters in the GAN model, including the dropout rate, learning rate, and alpha for this neural network, were determined through experimentation on different attack types within the original dataset. These hyperparameters remained constant throughout our testing. The purpose of the Combined model is accomplished when the Discriminator is unable to accurately differentiate between real and generated attacks. This condition was measured by the Generator and the Discriminator loss. Once this goal is achieved, the simulated attack data produced by our Generator closely mimics their attack type, and can potentially be used to balance the dataset. However, we further evaluated these generated attacks to ensure their quality, using another neural network called the Evaluation model. The Evaluation model, as seen in the lower left rectangle of Fig. 2, is a neural network binary classifier. It uses a Dropout layer as an input layer, where the input is yet again a number of 41-dimensional vectors. Because this model is a binary classifier, we only use one hidden layer with a node count of 32 and the Relu activation function. The hidden layer is followed by the Dropout layer to prevent overfitting. The output layer uses a sigmoid function, which provides an output in the range of 0 to 1. For our purposes, 0 represented a Neptune attack, and 1 represented non-Neptune attack data. The Evaluation model was trained on one million instances of equally balanced data that was comprised of Neptune attacks as well as normal network data. The Evaluation model was then validated on more than 50,000 instances of the attack and normal data from the training set to ensure that the model was not overfitting. At the end of training, the Evaluation model was tested on a test dataset of 50,000 instances of the attack and the normal data, which the model had never seen before. This yielded a consistent result of 99.99% accuracy of predictions. This approach gave us an Evaluation model that was an expert in differentiating between real attack data and normal network data, which we would later use in the classification of generated attack data. By training the Evaluation model to accurately distinguish between normal network flow data and cyber attack data, we obtained a metric of our GAN’s success


for a particular training iteration. This success was measured as the percentage of generated Neptune attacks that were classified as Neptune by the Evaluation model. Figure 2 shows the proposed final framework for our GAN system. The work covered in this paper focuses on the first two segments of the framework: the GAN model and the Evaluation of Generated Data.
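The following sketch assembles the three networks as they are described above: a Discriminator with hidden layers of 32, 16 and 8 nodes, each followed by LeakyReLU and Dropout; a Generator with hidden layers of 8, 16 and 32 nodes, each followed by BatchNormalization and LeakyReLU, producing a 41-dimensional output; and a binary Evaluation model with a Dropout input layer, one 32-node ReLU hidden layer, another Dropout layer, and a sigmoid output. The dropout rates, LeakyReLU slopes, optimizer and loss settings are assumptions, since the text does not give the exact hyperparameter values.

```python
# Sketch of the three models described in the text; rates, slopes and optimizers
# are placeholders, as only layer counts, sizes and layer types are stated.
from tensorflow.keras import layers, models

DATA_DIM = 41  # encoded KDD99 feature vector; the noise vector has the same size

discriminator = models.Sequential([
    layers.Dense(32, input_shape=(DATA_DIM,)), layers.LeakyReLU(0.2), layers.Dropout(0.3),
    layers.Dense(16), layers.LeakyReLU(0.2), layers.Dropout(0.3),
    layers.Dense(8),  layers.LeakyReLU(0.2), layers.Dropout(0.3),
    layers.Dense(1, activation="sigmoid"),        # real (from the dataset) vs. generated
], name="discriminator")

generator = models.Sequential([
    layers.Dense(8, input_shape=(DATA_DIM,)), layers.BatchNormalization(), layers.LeakyReLU(0.2),
    layers.Dense(16), layers.BatchNormalization(), layers.LeakyReLU(0.2),
    layers.Dense(32), layers.BatchNormalization(), layers.LeakyReLU(0.2),
    layers.Dense(DATA_DIM),                       # one synthetic 41-feature attack record
], name="generator")

evaluation_model = models.Sequential([
    layers.Dropout(0.2, input_shape=(DATA_DIM,)), # Dropout used as the input layer
    layers.Dense(32, activation="relu"),          # single hidden layer
    layers.Dropout(0.2),
    layers.Dense(1, activation="sigmoid"),        # 0 = Neptune attack, 1 = non-Neptune
], name="evaluation")
evaluation_model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

The Combined model would then chain the generator and discriminator in the same way as the generic training sketch in Sect. 2.1.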

3.2 Dataset Used in Our Research

We chose the KDD99 dataset for its size and demonstrated use in machine learning applications. The KDD99 dataset was used for the Third International Knowledge Discovery and Data Mining Tools Competition. The competition task was to build a network intrusion detector, a predictive model capable of distinguishing between "bad" connections, called intrusions or attacks, and "good" normal connections. This database contains a wide variety of intrusions simulated in a military network environment [4]. The attack types represented in the KDD99 dataset fall into 4 main categories [4] (Table 3):
1. DoS: Denial of Service attacks
2. R2L: unauthorized access from a remote machine
3. U2R: unauthorized access to local superuser (root) privileges
4. Probing: surveillance and other probing.

It contains more than 5 million data points, including normal network traffic data and attacks such as Neptune and Smurf. The KDD99 dataset contains 24 different attack types, each of which is defined by a combination of 41 symbolic and numeric variables, plus an attack label. Other cyber attack datasets, such as the Spanish URG dataset, were considered as candidates for our research. However, these other datasets were inconsistent in how the data was presented, which would require manual data preprocessing before they could be used for GAN training. The KDD99 dataset is already well established as a common cyber attack dataset for machine learning, and it allows for more specific testing of each of the attack categories against each other than, say, URG, which can only compare abnormal traffic against normal traffic.

Table 3 Attack category and child attack types in KDD99 [2]. Category is the attack category of the 4 present in KDD99. Attack names are the attack types that belong to that category. Count is the total number of attack samples within that category
Category | Attack names | Count
DoS | back, land, neptune, pod, smurf | 3,882,391
R2L | ftp_write, guess_passwd, imap, multihop, phf, spy, warezclient, warezmaster | 1126
U2R | buffer_overflow, loadmodule, perl, rootkit | 52
Probing | ipsweep, nmap, portsweep, satan | 41,102


Fig. 3 KDD99 dataset

The KDD99 dataset can be broken down into four primary attack type categories: DoS, R2L, U2R, and Probing [7]. DoS attacks involve denying a service from being accessed or used, R2L involves unauthorized access from a remote machine, U2R involves unauthorized access to local super-user (root) privileges, and Probing involves any way of monitoring or learning more information about a system. In addition, the KDD99 dataset is highly imbalanced, and would greatly benefit from being supplemented with realistic-looking generated data. Figure 3 shows the distribution of different attack types in the KDD99 dataset. The main purpose of our research is to determine whether a GAN can be used to balance a dataset. Although the KDD99 dataset is older and may not reflect the current state of network attacks, generating realistic-looking attack data and balancing the dataset may be applied to more modern cyber attack datasets. This work benefits all areas of machine learning that struggle with unbalanced datasets, including cyber security machine learning research.


3.3 Hardware and Software

We used an NVIDIA TITAN Xp server running Ubuntu 18.04.1 LTS. A majority of the code was written in Python 3.6 using the Keras [8], TensorFlow (which breaks when using Python 3.7) [9], and scikit-learn [10] libraries. We used MySQL [11] to partition the dataset for training, testing, and querying; store GAN model hyperparameters; and store generated attack data.

4 Results

As seen in Table 2, we were able to train the Combined GAN model to the point that the Generator produced consistent instances of the Neptune attack. According to Table 2, indices of zeros in generated attack instances match indices of zeros in the original data in only 38 of 41 columns. The Evaluation model evaluated 100% of generated attack instances in a batch of size 5000 as an actual attack, so we can say with confidence that our GAN model, the Generator in particular, produced reliable and accurate attack data. Some features present in the dataset have symbolic representation, such as "service," which can be "TCP" or "UDP." Symbolic data is encoded into integer values before being presented to the GAN. Because of the activation functions used in the GAN model, generated attacks have floating point values for all features. These Generator-produced floating point values are presented as-is to the binary classifier, but are later rounded to integers so the data is easier to read and analyze (Table 2). Generated values for these symbolic features do not always lie within the range defined by the integer encoding, so the symbolic representation of these generated attacks is not well defined.

5 Related Work

In 2015, Hu et al. [12] used back propagation, radial basis function, and random-weight artificial neural networks to classify KDD99 data. Their classifiers trained on a subset of only 10,074 of the 494,021 KDD99 data points, using a selection of 10 of the 41 features of KDD99 data points, determined by Fisher feature selection. The researchers were able to classify with >93% accuracy whether the data was an attack; however, the classification is binary and only indicates whether the data is an attack in general, rather than determining whether the data was representative of any particular attack. Yin et al. [13] used a framework based on generative adversarial networks to augment botnet detection models (Bot-GAN). The Bot-GAN augmented the original detection model, which mainly studied the anomaly characteristics of botnets based


on network flows. The application of a GAN served to train the Discriminator (the augmented detection model) by using a Generator that is constantly improving in order to avoid detection. As a result, the researchers found that the Generator helped improve the precision, accuracy, and other performance indicators of the detection model, accompanied by a 3.6% decrease in the false positive rate. In 2018, Xie et al. [14] used a Wasserstein GAN (WGAN) to generate attack data based on the KDD99 dataset. Similar to the work by Hu et al. [12] described above, the authors’ preprocessing eliminated many of the 41 features of KDD99 data points, in particular the discrete (symbolic) features, except in the case where a symbolic feature was represented by a 0 or 1 (as shown in Table 1). The authors trained the model on the 5 million attack data points only from the KDD99 dataset. The authors noted the comparative ease of training a WGAN, and the speed with which their model can generate large quantities of attack data. However, we note that the authors generate only attacks in general, and no indication is given as to whether the generated attack data represents the range of attacks found in the KDD99 dataset as a whole. Hence it is unknown whether such a model would be able to generate the data that could balance the KDD99 dataset, or whether it simply generates the most populous attacks. Aiming to improve intrusion detection systems, Lin et al. [15] designed a framework based on GAN, called IDSGAN. It can generate adversarial attacks and exploit the intrusion detection system (black-box IDS model). Using the dataset NSLKDD [12], the experimental results indicate that IDSGAN is robust and it attacks many intrusion detection systems with different attacks. However, once again, the researchers’ model generates attack data in general, rather than specific attacks. With the goal of using GAN to improve supervised learning, Lee et al. [16] create a technique that can alternately train both classifier and Generator models. Using the CIFAR datasets [17], their experimental results show that the method they propose can lower the error of the network. Arjovsky and Bottou [18] discuss how to train a GAN system in a theoretical sense instead of delving into implementation or specific ways of going about training a GAN. Santhanam and Grnarova [19] proposed an approach called cowboy, which can detect and defend against adversarial attacks with a GAN.

6 Threats to Validity

The binary classifier that was used to determine the validity of the generated data is the only measure of the accuracy of that data. Unfortunately, the Generator cannot always be trusted to produce consistent results, especially with small datasets. We anticipate that it would be beneficial to convert the binary classifier into a multi-class classifier to distinguish different generated attacks while under test, as if the classifier were deployed in the real world. We are looking forward to making a better and more consistent version of the Generator, which would increase the validity of the results.


The dataset we chose may also threaten the validity of this project. The KDD99 dataset has been criticized as being too outdated for modern cyber security research. Cyber attacks constantly evolve to overcome new advances in cyber security. The many technological advances in the 20 years since KDD99 was published would necessitate changes in the structure and approach of cyber attacks. Despite these concerns, we believe our approach remains valid. Network flow data, the type of data captured in KDD99, is still used to capture network connection data and to construct new datasets. Although the structure of attacks may change, this information is stored in a similar format and can be tested using this framework. Our goal is that a new dataset created using a format similar to the data in KDD99 can utilize our framework to generate new attack samples for any purpose.

7 Conclusion and Future Work

In our continued work on this project, we aim to optimize the GAN and generate more attack types in KDD99 and other cyber attack datasets. Our GAN struggled to train on attacks with very little data, such as "spy" or "loadmodule" attacks. One technique we hope to implement in our model to train on these attacks is few-shot learning. Few-shot learning is a technique where a classifier learns to categorize data points based on few, or even single, examples. The technique leverages transfer learning [20], where additional information about the samples may be available. One hope is that we may discover features of attacks in general that may be used for few-shot learning. Being able to generate attacks with very little available data would be immensely beneficial to current cyber attack detection systems.

We also hope to improve the performance of the overall system by using more advanced evaluation methods. Currently, we only evaluate our generated data using a binary classifier. In the future, more evaluation methods could be implemented and work in parallel with our current model. Such methods include clustering models, novelty detection models, and statistical analysis of the generated attacks. Once our optimization is complete, we hope to test the real-world effectiveness of this framework by training new cyber attack detection systems on a GAN-balanced dataset. This is pictured in the "Comparative Analysis" section of Fig. 2. By comparing the performance on an unbalanced dataset versus a balanced dataset, we can precisely measure the improvement our system can provide to current systems.

Acknowledgements This undergraduate student research project is supported via the Information Security Research and Education (INSuRE) project [21]. Lin Deng is supported by the Faculty Development and Research Committee (FDRC) award at Towson University. We thank our technical director, Dr. Benjamin Blakely at Argonne National Laboratory, for his research mentorship and contribution to this project. We also thank Ksenia Tepliakova and Long Chen for their contribution to this research during the fall 2019 semester. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the TITAN Xp GPU used for this research.



References

1. Cieslak, D., Chawla, N., & Striegel, A. (2006). Combating imbalance in network intrusion datasets (pp. 732–737). www3.nd.edu/~dial/publications/cieslak2006combating.pdf
2. UCI Machine Learning Repository. (1999). KDD Cup 1999 Data. https://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
3. Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., et al. (2014, June). Generative adversarial networks. arXiv. https://arxiv.org/abs/1406.2661
4. University of California. (1999). KDD Cup 1999 Data. The UCI KDD Archive.
5. Labib, K., & Vemuri, V. R. (2004). Detecting denial-of-service and network probe attacks using principal component analysis. In 3rd Conference on Security and Network Architectures. http://web.cs.ucdavis.edu/~vemuri/papers/Detecting%20DoS%20and%20Probe%20Attacks%20using%20PCA.pdf
6. Anonymous. (2002). Maximum security: A hacker's guide to protecting your computer systems and network (4th ed.). Que Publishing.
7. Tavallaee, M., Bagheri, E., Lu, W., & Ghorbani, A. A. (2009). A detailed analysis of the KDD CUP 99 data set. In IEEE Symposium on Computational Intelligence in Security and Defence Applications (CISDA).
8. Home - Keras Documentation. https://keras.io/
9. TensorFlow. www.tensorflow.org
10. scikit-learn: machine learning in Python, scikit-learn 0.20.3 documentation. https://scikit-learn.org/stable/
11. MySQL. www.mysql.com
12. Hu, L., Zhang, Z., Tang, H., & Xie, N. (2015, August). An improved intrusion detection framework based on artificial neural networks. In 2015 11th International Conference on Natural Computation (ICNC) (pp. 1115–1120). IEEE. https://ieeexplore.ieee.org/abstract/document/7378148
13. Yin, C., Zhu, Y., Liu, S., Fei, J., & Zhang, H. (2018). An enhancing framework for botnet detection using generative adversarial networks. In 2018 International Conference on Artificial Intelligence and Big Data (ICAIBD) (pp. 228–234).
14. Xie, H., Lv, K., & Hu, C. (2018, August). An effective method to generate simulated attack data based on generative adversarial nets. https://ieeexplore.ieee.org/abstract/document/8456136
15. Lin, Z., Shi, Y., & Xue, Z. (2018, September). IDSGAN: Generative adversarial networks for attack generation against intrusion detection. arXiv Comput. Sci.
16. Lee, H., Han, S., & Lee, J. (2017, May). Generative adversarial trainer: Defense to adversarial perturbations with GAN. arxiv.org/abs/1705.03387
17. Krizhevsky, A. (2009). Learning multiple layers of features from tiny images. Tech. rep. www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf
18. Arjovsky, M., & Bottou, L. (2017, January). Towards principled methods for training generative adversarial networks. arxiv.org/abs/1701.04862
19. Santhanam, G. K., & Grnarova, P. (2018, May). Defending against adversarial attacks by leveraging an entire GAN. arxiv.org/abs/1805.10652
20. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press. http://www.deeplearningbook.org
21. Sherman, A., Dark, M., Chan, A., Chong, R., Morris, T., Oliva, L., et al. (2017). INSuRE: Collaborating centers of academic excellence engage students in cybersecurity research. IEEE Security and Privacy, 15(4), 72–78.

Beyond the Hawthorne Research: Relationship Between IT Company Employees’ Perceived Physical Work Environment and Creative Behavior Jin-Hua Zhang and Jun-Ho Lee

Abstract This study examines the relationship between IT company employees' perception of the organizational physical work environment, psychological well-being as a psychological factor, and creative behavior. Unlike the original Hawthorne research, this study does not treat the physical work environment and psychological factors separately, but verifies their combined effects. Through this work, the study demonstrates that the perceived physical work environment has a significant effect on creative behavior and that psychological well-being, a psychological factor that has usually been discussed separately from the physical work environment, mediates between the physical work environment and creative behavior. The study also discusses the implications of these findings.

Keywords IT industry · Knowledge worker · Physical work environment · Creative behavior · Psychological well-being

1 Introduction

The Hawthorne research has shown that psycho-social factors, rather than the physical work environment or conditions, are important to performance. This has stimulated related research from the point of view of human relations. Still, does the physical work environment, including lighting, really do little to boost performance? This study suggests that the Hawthorne research has four problems. First, although it is true that psychological and social factors have a great impact on performance, that does not necessarily mean that the physical work environment has no effect whatsoever on performance. Besides, it is questionable whether the physical work environment and psycho-social factors can be separated completely. Second, the physical work environment was manipulated with overly simple elements like lighting. It would be reasonable to look at the actual physical work environment from a
complicated configuration perspective. For example, even if lighting is appropriate, if the other physical work environment factors are not, the overall effect would be less impressive. Third, as the study subjects became aware of the Hawthorne research being carried out, a sort of social desirability might have come into play that would lead them to become conscious of the observers, rather than the physical work environment, thus distorting and even reinforcing the effect of psycho-social factors. Fourth, the Hawthorne research was carried out in a setting such as a factory, which has specific and clear work procedures and product standards. Thus, performance could have been created in a relatively clear way, regardless of the physical work environment. This study suggests that these four problems may have led to an underestimation of the impact of the physical work environment.

Since the Hawthorne study, the influence of human factors on performance has been emphasized. Although the physical work environment is a valuable resource for the organization, there are relatively few studies on the effect of the working environment [1]; the emphasis on human factors seems to have led to this lack of research on the physical work environment. This study, therefore, intends to re-examine the effect of the physical work environment on performance. By focusing on the four problems mentioned above, this study looks at the relationship between the physical work environment and psycho-social factors, especially the impact the physical work environment has on psychological well-being, from the viewpoint of positive psychological capital, a concept which has gained traction recently. Also, the study views the physical work environment as a configuration of various elements rather than a single one, and measures members' perception thereof. The study also considers creativity a performance variable that can be affected by the physical work environment and verifies this empirically. Unlike factory work with specific, clearly defined procedures and product standards, creative behavior may be especially susceptible to the influence of the physical work environment. This study focuses on IT company employees because the IT industry most clearly represents an industrial paradigm shift as compared with the production-oriented era in which the Hawthorne research was carried out. This research intends to substantiate the impact of IT company employees' perception of the physical work environment on creative behavior, a core performance element today, through psychological well-being as a positive internal factor that has been discussed separately.

2 Literature Review and Hypotheses

2.1 Physical Work Environment and Creative Behavior

A majority of workers spend most of their time at the office; in many ways the office is their second home. Also, when faced with a new working environment, individuals can adapt to it in innovative ways. So, the effect of the physical work environment on the individual is self-evident. In other words, that
means that the physical work environment can have a direct impact on job satisfaction, job efficiency, and other outcomes. For example, in a survey of over 1000 office workers, about 40% of them responded that physical environmental factors were an important factor in evaluating their job satisfaction [2]. Given that the physical work environment affects each individual's psychological perception, and given the relationship between psychological perception and performance, it can be inferred that the physical work environment will also have an impact on work performance [3]. In particular, members' creativity, which has recently been the focus of attention as a performance variable, may also be influenced by the physical work environment, and in fact the effect may be greater than the effect on productivity, which emphasizes physical rather than psychological factors. Though creativity is generally referred to as 'the ability of individuals to propose novel and appropriate ideas or products within a specific period' [4], and it is a concept adopted in various studies, its definition is ambiguous [5, 6], and in many cases it is used in various senses such as creative individual, creative performance, creative environment, and creative process [7]. Creativity, as the concept is most commonly used in management-related research, can be defined as creating new and useful ideas and products in the process of performing tasks individually or within an organization [8]. Looking at researchers' definitions of creative behavior, Shalley suggested that creative behavior is a new and suitable solution to an organizational problem [9]. Woodman et al. defined creative behavior as new products, services, ideas, programs, and systems developed by individuals in a complex social system [10]. George and Zhou defined it as the production by individuals of innovative and useful ideas, a definition of great value in research on innovation [11]. In relation to the organizational physical work environment that can promote creative behavior, Amabile et al. explored the environmental factors that can serve to either facilitate or undermine creativity, while emphasizing the importance of socio-environmental contexts [12]. Also, previous research on innovation, which is preceded by creativity, has shown that the physical work environment has a positive impact on creativity [13, 14]. Therefore, this study believes that in IT companies, where the physical work environment may have more significance, the relationship between the physical work environment and creative behavior will be positive.

Hypothesis 1: There is a positive correlation between physical work environment and creative behavior.

2.2 Physical Work Environment and Psychological Well-Being

The physical work environment affects performance through various psychological states such as arousal, stress, confusion, and motivation [3]. An individual's perception of the physical work environment is reflected not only in perceptive-cognitive aspects but also in emotional aspects [15]. This study explores the effects of psychological well-being, from among the psychological states which can mediate between the physical work environment and creative behavior. Psychological well-being is related to the happiness felt by the members of an organization [16]. Psychological well-being is one of the subjects of thousands of years of inquiry into happiness conducted by many philosophers, social scientists, and psychologists, from a long-term and macro perspective [17]; and it is also part of research on positive organizational behavior (POB) inspired by the positive psychology pioneered by Seligman [18], from a short-term and micro perspective [19]. According to one study, the level of job satisfaction can explain 20–25% of an adult's overall satisfaction with his or her life [20]. One of the major factors which determine such job satisfaction is the physical work environment. Although it may not be strong enough to push a person to find another job, it plays a critical role in making the person proud of his or her job. This points to a close relationship between the physical work environment and psychological well-being. Psychological well-being is not limited to an individual's characteristics or tastes, but includes his or her perceptive and emotional relationships with the outside world arising from interactions in daily and professional lives [21–24]. Ryff and Keyes found that one's psychological well-being is determined by (i) positive evaluation of his or her past and present based on the fulfillment of positive functions and potential (self-acceptance), (ii) pursuit of continued growth and development (personal development), (iii) belief in the purpose and meaning of life (purpose in life), (iv) positive relationships with other people (positive relationship), (v) the ability to efficiently manage his or her life and external environment (environmental mastery), and (vi) the ability to make his or her own decisions regarding values and behaviors [25]. Baron also suggested that diverse environmental conditions have a positive or negative impact and that the physical aspect of the work environment, in particular, can strongly influence people's emotional state [26]. In view of the discussions above, the physical work environment is believed to affect an individual's psychological well-being. Thus, we have developed the following hypothesis.

Hypothesis 2: There is a positive correlation between physical work environment and psychological well-being.

2.3 The Mediating Effect of Psychological Well-Being

This study focuses on environmental mastery, from among the determining factors of psychological well-being. If the members of an organization are satisfied with their work environment and can effectively manage their surroundings, they will have a higher level of psychological well-being. Individuals with high psychological well-being are more active in presenting their opinions and take the initiative in developing better approaches or creative ideas. The psychological well-being of the individual showed a complete mediating effect between perceived physical work environment and creative behavior. Since an earlier study by Amabile [27], a motivation attribution model (attributional model of motivation [28]) has been utilized to understand the mechanism of creativity. The members of an organization with high psychological well-being attribute a failure to a short-term event or external factors. Thus, they are relatively less concerned about a failure and show a higher level of creativity. Schuldberg [29] and Wright and Walton [30] reported a positive relationship between physical work environment and creativity. In light of the above discussions, psychological well-being is believed to mediate between physical work environment and creative behavior, with the former influencing the latter. This leads to our next hypothesis below.

Hypothesis 3: Psychological well-being serves as a mediator in the positive relationship between physical work environment and creative behavior.

Through the above description, the model and hypotheses of this study can be illustrated as follows (Fig. 1).

Fig. 1 Theoretical model to be tested

3 Methodology

3.1 Sample and Data Collection

A survey was conducted on the employees of Korean IT companies. A questionnaire website was created on Google to ask the questions and collect the data, and a total of 98 responses were received. Excluding poor-quality data from seven respondents, we analyzed the answers from 91 respondents. When broken down by demographic characteristics, 64 respondents were male (70.3%) and 27 were female (29.7%). Of the 91 respondents, 7.7% (7) had earned a college certificate degree, 73.6% (67) a bachelor's degree, and 18.7% (17) a master's degree. As for the length of service in an IT company, those with five to less than ten years of experience were the largest group (29 persons, 31.9%), followed by more than one year to less than three years (25 persons, 27.5%), more than ten years to less than twenty years (19 persons, 20.9%), more than three years to less than five years (14 persons, 15.4%), and more than twenty years (4 persons, 4.4%). The demographic characteristics of the respondents are shown in Table 1.

Table 1 Demographic characteristics of respondents
Construct          Category                                    Frequency   Percent (%)
Gender             Male                                        64          70.3
                   Female                                      27          29.7
Education          College certificate degree                  7           7.7
                   Bachelor's degree                           67          73.6
                   Master's degree                             17          18.7
Work experience    More than 1 year–less than 3 years          25          27.5
                   More than 3 years–less than 5 years         14          15.4
                   More than 5 years–less than 10 years        29          31.9
                   More than 10 years–less than 20 years       19          20.9
                   More than 20 years                          4           4.4

3.2 Measurement Scales

Independent, mediating, dependent, and control variables were used in this study. All variables other than the control variables were measured on a 7-point Likert scale (1 = totally disagree, 4 = neither agree nor disagree, 7 = totally agree). The respondents answered the questions in a self-report manner, which is considered fitting when the 'self' is the most appropriate measure for the purposes of this study. Each variable was measured as follows.

Physical Work Environment
To measure the physical work environment, 46 survey items developed by Min-Jeong Oh were employed [3]. We measured an individual's subjective perception of the physical work environment, rather than the objective physical work environment. This is because it is difficult to standardize and integrate the objective approach in understanding the overall configuration of the physical work environment, and because the internal effect of an individual's perception is significant in terms of the variables in this study. The survey items include 'It is easy to change the structure inside the office,' 'Privacy is not protected in the office,' 'The ornaments and drawings are placed in the right position in the office,' and 'We have facilities and space for sports and exercise.'

Psychological Well-Being
In this study, psychological well-being focuses on the degree of confidence in effectively managing one's own work environment. To measure psychological well-being, 14 survey items from the Warwick-Edinburgh Mental Well-Being Scale (WEMWBS) were used [31]. They include 'I have been feeling optimistic about the future,' 'I have been feeling interested in other people,' 'I have been able to make up my own mind about things,' and 'I have been feeling confident.'

Creative Behavior
Creative behavior means an attempt to provide novel, innovative, and useful ideas, products and services [11, 32]. Reflecting this definition, we adopted 13 survey items used by Zhou and George to measure creative behavior [11]. They include 'I try to find out new technologies, processes, techniques, and/or product ideas,' 'I come up with new and practical ideas to improve performance,' 'I develop adequate plans and schedules for the implementation of new ideas,' and 'I promote and champion ideas to others.'

Control Variables
Demographic variables were controlled in this study so as to clarify the relationship between the variables of major interest. To be specific, respondents' gender, educational background, and length of service in an IT company were used as control variables after a review of previous studies and the characteristics of the sample.

3.3 Analysis Method

SPSS 23.0 was employed to analyze the data. For the reliability test, Cronbach's α was checked for each variable, and the results were confirmed by factor analysis. We adopted simple regression analysis to confirm the main effects and the hierarchical regression procedure of Baron and Kenny to confirm the mediating effects [33].
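Outside SPSS, the same reliability check can be reproduced directly from the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the summed scale), where k is the number of items. A minimal sketch in Python (the response matrix is a randomly generated stand-in, not the study's survey data):

    import numpy as np

    def cronbach_alpha(items: np.ndarray) -> float:
        """Cronbach's alpha for a respondents-by-items matrix of Likert scores."""
        k = items.shape[1]
        item_variances = items.var(axis=0, ddof=1)
        total_variance = items.sum(axis=1).var(ddof=1)
        return k / (k - 1) * (1 - item_variances.sum() / total_variance)

    # Stand-in: 91 respondents answering 13 items on a 7-point scale.
    rng = np.random.default_rng(1)
    responses = rng.integers(1, 8, size=(91, 13)).astype(float)
    print(round(cronbach_alpha(responses), 3))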

4 Results

4.1 Validity and Reliability of Measurement Tools and Correlation Between Variables

To verify the validity of the variables, an exploratory factor analysis was conducted with varimax rotation. The factor analysis took eigenvalues of 1 or more and factor loadings of .50 or more as the criteria of significance [34]. After excluding the 10 questions related to the physical work environment—the independent variable—the exploratory factor analysis was reconducted, which showed that the factor loadings of the items measuring each factor were over .50, exceeding the criterion of significance. All other variables showed an eigenvalue of 1 or more. This indicates that there was no problem with the validity of the questionnaire used to measure the variables included in the research model. The variables used in this study showed a high level of reliability: physical work environment (.86), psychological well-being (.79), and creative behavior (.81) [35]. Looking at the correlations between variables, the physical work environment had a significant positive correlation with psychological well-being and with creative behavior (r = .681, p < .01; r = .536, p < .01). This is consistent with the study's hypotheses suggesting a positive relation between the physical work environment and both psychological well-being and creative behavior. There was also a significant positive correlation between psychological well-being and creative behavior (r = .761, p < .01). The means, standard deviations, reliability coefficients, and correlations between the variables included in the research model are shown in Table 2.

Table 2 Means, standard deviations, correlations, and reliability coefficients for variables
Variables       Mean    S.D.    1          2         3       4         5         6
1. Gender       1.30    .45     –
2. Education    2.11    .50     −.094      –
3. W.E.         2.59    1.22    −.298**    .001      –
4. P.W.E.       3.83    .95     .048       .280**    .021    (.86)
5. P.W.         4.60    1.15    −.072      .327**    .037    .681**    (.79)
6. C.B.         4.55    1.10    −.086      .174      .122    .536**    .761**    (.81)
Notes: n = 91; *p < .05; **p < .01; ***p < .001. W.E. = work experience (length of service in an IT company); P.W.E. = physical work environment; P.W. = psychological well-being; C.B. = creative behavior. Internal consistency reliabilities appear in parentheses along the diagonal. Gender: 1 = male, 2 = female. Education: 1 = college certificate degree, 2 = bachelor's degree, 3 = master's degree, 4 = other. Work experience: 1 = 1–3 years, 2 = 3–5 years, 3 = 5–10 years, 4 = 10–20 years, 5 = 20 and more years.
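For readers who want to replicate this kind of exploratory factor analysis outside SPSS, the sketch below (assuming a recent version of scikit-learn with varimax rotation support; the responses are random stand-ins, not the study's questionnaire data) applies the eigenvalue ≥ 1 rule to the item correlation matrix and then extracts varimax-rotated loadings, which would be screened against the .50 cutoff as above.

    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    # Stand-in: 91 respondents by 20 standardized survey items.
    rng = np.random.default_rng(2)
    X = rng.normal(size=(91, 20))

    # Eigenvalues of the item correlation matrix drive the "eigenvalue >= 1" criterion.
    eigenvalues = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    n_factors = int((eigenvalues >= 1).sum())

    fa = FactorAnalysis(n_components=n_factors, rotation="varimax")
    fa.fit(X)
    loadings = fa.components_.T   # items x factors; keep items loading >= .50 on their factor
    print(n_factors, np.round(loadings[:3], 2))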

4.2 Verification of Hypotheses

A regression analysis was conducted to examine the positive relation between the physical work environment and creative behavior and the mediating effect of psychological well-being, based on the survey results of workers in the IT industry in Korea. The analysis results are shown in Table 3.

Table 3 Regression analysis
                                      P.W.                     C.B.
Construct                             Model 1    Model 2       Model 3    Model 4     Model 5
Control variables     Gender          −.034      −.091         −.037      −.084       −.015
                      Education       .324       .137          .170       .016        .089
                      W.E.            .027       −.003         .111       .086        −.087
Independent variable  P.W.E.                     .647***                  .534***     .045
Mediator variable     P.W.                                                            .755***
R-squared                             .109       .492          .046       .307        .596
Adjusted R-squared                    .079       .469          .013       .274        .572
F                                     3.559      20.848***     1.407      9.508***    25.046***
Notes: n = 91; *p < .05; **p < .01; ***p < .001. Dependent variables: P.W. (Models 1–2), C.B. (Models 3–5).

First, hypothesis 1 suggests that there is a positive correlation between the physical work environment and creative behavior. In Model 4 of Table 3, the simple regression analysis shows a positive relation between the physical work environment and creative behavior (β = .534, p < .001), supporting this hypothesis. Hypothesis 2 suggests that there is a positive correlation between the physical work environment and psychological well-being. In Model 2, the simple regression analysis shows that the more favorably the physical work environment is perceived, the higher the psychological well-being of employees (β = .647, p < .001), once again supporting the study's hypothesis. Hypothesis 3 supposes a mediating effect of psychological well-being in the relationship between the physical work environment and creative behavior. According to Baron and Kenny [33], the mediating effect can be confirmed through three steps. First, the independent variable should have a significant effect on the dependent variable without taking into account the mediator variable. This has been confirmed, as hypothesis 1 was verified, showing that the physical work environment has a significant effect on creative behavior. Second, the independent variable should have a significant effect on the mediator variable. This has also been confirmed, since hypothesis 2 was verified, showing that the higher the physical work environment score, the higher the psychological well-being of employees. Third, there should be a significant relationship between the mediator variable and the dependent variable when the independent variable is controlled. In this case, when the significance of the effect of the independent variable on the dependent variable disappears, it is full mediation, whereas when the diminished effect of the independent variable is still significant, it is partial mediation. As can be seen from Model 5 of Table 3, there was a significant positive relationship between psychological well-being and creative behavior (β = .755, p < .001) when the independent variable—the physical work environment—was controlled, and the significance of the physical work environment disappeared, showing a complete mediating effect. In the Sobel test, the indirect effect of the physical work environment through psychological well-being was also found to be significant (z = 4.640, p < .001). These findings support hypothesis 3, which supposes the mediating effect of psychological well-being in the relationship between the physical work environment and the creative behavior of employees.
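The Sobel statistic quoted above can be recomputed from two coefficients: a, the effect of the independent variable on the mediator, and b, the effect of the mediator on the dependent variable with the independent variable controlled, together with their standard errors s_a and s_b, via z = ab / sqrt(b^2 s_a^2 + a^2 s_b^2). A minimal sketch (Python; the coefficient values are placeholders, not the unstandardized estimates behind Table 3):

    from math import sqrt
    from scipy.stats import norm

    def sobel_z(a: float, s_a: float, b: float, s_b: float) -> float:
        """Sobel test statistic for the indirect effect a*b."""
        return (a * b) / sqrt(b**2 * s_a**2 + a**2 * s_b**2)

    # Placeholder values for illustration only.
    z = sobel_z(a=0.65, s_a=0.07, b=0.72, s_b=0.09)
    p_value = 2 * (1 - norm.cdf(abs(z)))
    print(round(z, 3), round(p_value, 4))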



5 Conclusion

The purpose of this study was to investigate the effect of the physical work environment on psychological well-being and creative behavior, and the mediating effect of psychological well-being. The results and meaning of the study are as follows.

First, the study demonstrated that the perceived physical work environment has a significant effect on creative behavior and on psychological well-being as a psychological factor. Based on this conclusion, the more favorable the perception of the physical work environment, the higher the psychological well-being and creative behavior of employees. Early studies tended to assume that creativity is an innate quality of individuals, but the results of our research show that small changes in everyday surroundings can also improve creativity, and the physical work environment is no exception. Therefore, in order to improve individual and organizational performance or creativity, the question of what working environment is best for employees becomes an inevitable and necessary subject for the organization.

Second, the psychological well-being of the individual showed a complete mediating effect between the perceived physical work environment and creative behavior. This indicates that the physical work environment affects the individual's psychology and thereby increases creative behavior. Thus, in order to improve the creativity of individuals, organizations need to think about how to change these psychological factors. To improve psychological well-being, an organization can consider changes in its policy or leadership, in the working or job environment and conditions, and in some individual factors. These findings expand the context of previous studies on the physical work environment by linking it with psychological factors and with creative behavior as a performance variable, and they provide meaningful implications.

Regarding the meaning of this study, first, the study revealed a close relationship between the physical work environment and psychological factors—something that has not been addressed much in studies since the Hawthorne research—especially psychological well-being as positive psychological capital, a concept that has recently been gaining attention. In addition, the physical work environment was shown to have a clear effect on creativity (creative behavior), which has recently emerged as an important performance variable, whereas in the days of the Hawthorne research, when manufacturing formed the basis of the economy, productivity was the main performance variable. The physical work environment not only had a direct effect on creative behavior, but also an indirect effect, improving creative behavior through the mediation of psychological well-being.

Second, IT is a leading industry in the 21st century, and for IT company employees creativity is an important asset, as their work involves solving problems of existing systems and programs or developing new ones. Therefore, the fact that the physical work environment and psychological well-being have a significant impact means that the management of these factors is important in maximizing the creativity of employees. Especially since, due to the nature of the job, many people working in this field tend to be introverted, the physical work environment may take on the significance and effect of social support. On the other hand, in the context of the fourth industrial revolution, artificial intelligence has become a main driving force for the development of technology, and creativity plays an important role in that technology. In the future, our co-workers will not necessarily be human; robots or artificial intelligence machines are quite possible. Since robots and artificial intelligence machines, as tools for work, are included in the physical work environment, the question of what kind of working environment to provide will likewise become an unavoidable theme for the organization.

Lastly, the limitations of this study and suggestions for future research are as follows. First, the subjects of the study were IT company employees, so it may be difficult to generalize the results to other industries; it is therefore necessary to study other industries as well. Second, the study only considered the subjective physical work environment. Existing studies on the physical work environment have mainly examined the objective work environment or the subjective work environment separately. Strictly speaking, the two can have different meanings: the objective work environment is the actual work environment as reported by the organization or the management, while the subjective work environment is what is perceived by the employees. Therefore, it may be necessary to conduct future studies connecting the two. Third, this study used a cross-sectional design. In the future, longitudinal research will be necessary to examine how individuals change as the working environment is improved, which would further confirm the importance of the physical work environment. Finally, the study only considered psychological well-being as a psychological factor mediating the relationship between the physical work environment and creative behavior. In the future, it will be necessary to examine the effects of other psychological factors in evaluating the physical work environment.

References 1. Becker, F. D., & Steele, F. (1995). Workplace by design: Mapping the high-performance workspace. San Francisco: Jossey-Bass. 2. Harris, E. M. (1978). Work, places: The psychology of the physical environment in offices and factories, 1986. London: Cambridge University Press. 3. Oh, M. J. (2003). The relationship of physical environment on psychological states and organizational and job attitudes. A thesis for the degree of Master of Arts/ Psychology, graduate school, the catholic university of Korea Seoul, Korea. 4. Kim, H. J., & Seol, H. D. (2016). The effect of trust, conflict, and knowledge sharing on individual creativity. Korean Journal of Business Administration, 29(5). 5. Zhou, J., & Shalley, C. E. (2008). Expanding the scope and impact of organizational creativity research. In Handbook of organizational creativity (vol. 28, pp. 125–147) 6. Hennessey, B. A., & Amabile, T. M. (2010). Creativity. Annual Review of Psychology, 61, 569–598.



7. Woodman, R. W., & Schoenfeldt, L. F. (1990). An interactionist model of creative behavior. Journal of Creative Behavior, 24(4), 279–290. 8. Amabile, T. M., Conti, R., Coon, T., Lazenby, F., & Herron, M. (1996). Assessing the work environment for creativity. Academy of Management Journal, 39, 1154–1184. 9. Shalley, C. E. (1991). Effect of productivity goals, creativity goals, and personal discretion on individual creativity. Journal of Applied Psychology, 76(2), 179–185. 10. Woodman, R. W., Sawyer, J. E., & Griffin, R. W. (1993). Toward a theory of organizational creativity. Academy of Management Review, 18, 293–332. 11. Zhou, J., & George, J. M. (2001). When job dissatisfaction leads to creativity: Encouraging the expression of voice. Academy of Management Journal, 44(4), 682–696. 12. Amabile, T. M., Burnside, R. M., & Gryskiewicz, S. S. (1999). User’s manual for KEYS: Assessing the climate for creativity. A survey from the Center for creative leadership. Center for Creative Leadership, Greensboro, NC. 13. McCoy, J. M., & Evans, G. W. (2002). The potential role of the physical environment in fostering creativity. Creativity Research Journal, 14(3), 409–426. 14. Oksanen, K., & Ståhle, P. (2013). Physical environment as a source for innovation: Investigating the attributes of innovative space[J]. Journal of Knowledge Management, 17(6), 815–827. 15. Ward, L. M. (1981). Introduction to organizational behavior. Scott, IL: Gleview, Foresman and Company. 16. Diener, E. (2000). Subjective well-being: The science of happiness and a proposal for a national index. American Psychologist, 55, 34–43. 17. Van Dierendonck, D., Diaz, D., Rodriguez-Carvajal, R., Blanco, A., & Moreno-Jimenez, B. (2007). Ryff’s six-factor model of psychological well-being: A Spanish exploration. Social Indicators Research, 87(3), 473–479. 18. Seligman, M. E. P. (2002). Authentic happiness. New York, NY: Free Press. 19. Kim, C. Y., & Park, W. W. (2017). Psychological well-being in workplace: A review and meta-analysis. Korean Academy of Management, 21(2), 15–76. 20. Harter, J. K., Schmidt, F. L., & Keyes, C. L. M. (2002). Wellbeing the workplace and its relationship to business outcomes: A review of the Gallup studies. 21. Kahn, W. A. (1990). Psychological conditions of personal engagement and disengagement at work. Academy of Management Journal, 33(4), 692–724. 22. Danna, K., & Griffin, R. W. (1999). Health and well-being in the workplace: A review and synthesis of the literature. Journal of Management, 25(3), 357–384. 23. Wright, T. A., & Hobfoll, S. E. (2004). Commitment, psychological well-being and job performance: An examination of conservation of resources (COR) theory ad job burnout. Journal of Business and Management, 9(4), 389–406. 24. Fisher, C. D. (2010). Happiness at work. International Journal of Management Reviews, 12(4), 384–412. 25. Ryff, C. D., & Keyes, C. L. M. (1995). The structure of psychological well-being revisited. Journal of Personality and Social Psychology, 69(4), 719–727. 26. Baron, R. A. (1994). The physical environment of work settings: Effects on task performance, interpersonal relations, and job satisfaction. Research in Organizational Behavior, 16, 1–46. 27. Amabile, T. M. (1983). The social psychology of creativity. New York: Springer-Verlag. 28. Weiner, B. (1985). An attributional theory of achievement motivation and emotion. Psychological Review, 92(4), 548–573. 29. Schuldberg, D. (1999). Chaos theory and creativity. In M. Runco & S. Pritzker (Eds.), Encyclopedia of creativity (Vol. 
1, pp. 259–272). New York, NY: Wiley. 30. Wright, T. A., & Walton, A. P. (2003). Affect, psychological well-being and creativity: Results of a field study. Journal of Business and Management, 9(1), 21–32. 31. Tennant, R., Hiller, L., Fishwick, R., Platt, S., Joseph, S., Weich, S., et al. (2007). The WarwickEdinburgh mental well-being scale (WEMWBS): Development and UK validation. Health Qual Life Outcomes, 5(1), 63–75. 32. Csikszentmihalyi, M. (1996). Creativity: Flow and psychology of discovery and invention. New York: Harper Perennial.



33. Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173–1182. 34. Hair, J. F., Jr., Black, W. C., Babin, B. J., Anderson, R. E., & Tatham, R. L. (2006). Multivariate data analysis (6th ed.). Upper Saddle River, NJ: Pearson Education, Inc. 35. Nunnally, J. (1978). Psychometric methods.

Structural Relationship Data Analysis Between Relational Variables and Benefit Sharing: Moderating Effect of Transaction-Specific Investment Hae-Soo Pyun

Abstract As information exchange becomes more important in business-tobusiness relationships, it is necessary to analyze and manage various data and information about trading partners in depth. Recently, the characteristics of the competition have changed. There is a tendency to shift from competition between companies to competition between networks. Particularly, mutual cooperation and long-term relationship between suppliers-buyers are essential in the automobile industry, so the partnership relationship between companies is more important. Importance of strategic customers is increasing in the relationship between seller and buyer. However, theoretical and empirical studies are relatively inadequate in comparison with practical importance. Relational variables are becoming increasingly important in supplier-buyer transaction relationships. Especially, relational commitment and relational norms have a profound effect on cooperation activities such as benefit sharing and performance among companies. Although the importance of relationships with strategic customers is increasing, research on the effects of relationship characteristics with strategic customers on benefit sharing is relatively insufficient. Therefore, this research aims to analyze the structural relationship data analysis between relational variables and benefit sharing in the automobile industry, and to analyze the moderating effect of transaction-specific investment, and to suggest the theoretical and practical implications. For this purpose, data were collected for suppliers of Korea’s automobile parts. Based on the collected data, reliability analysis, validity analysis, correlation analysis and regression analysis were conducted. As a result of analysis, three hypotheses were supported. Finally, implications of the research are presented, and limitations and directions for future research are described. Keywords Structural relationship data analysis · Relational commitment · Relational norms · Transaction-specific investment · Benefit sharing · Strategic account





1 Introduction As information exchange becomes more important in business-to-business relationships, it is necessary to analyze and manage various data and information about trading partners in depth. In recent years, the characteristics of the competition have changed as the business environment of the company changes. In other words, there is a tendency to shift from competition between companies to competition between networks. Therefore, the relationship between companies does not stay in a simple supplier-purchaser relationship but develops into a mutually cooperative relationship. Particularly, mutual cooperation and long-term relationship between parts suppliers and purchasing companies are essential in the automobile industry, so the partnership relationship between companies is more important. Moreover, maintaining and developing relationships with strategic customers with large sales or profits among a large number of customers has a significant impact on corporate performance. Given this importance, maintaining and strengthening relationships with strategic customers from suppliers is a very important research topic. There has been an increasing interest in relational exchange, which emphasizes the various benefits of long-term relationships between suppliers and buyers. And the researchers are examining the relational variables, which are detailed dimensions of business relations. The effects of these factors on various corporate performances are needed in various aspects [1, 2]. Strategic customer relationship management can be divided into three stages: effectiveness, market performance, and profitability [3]. Generally, effectiveness means the extent to which an organization’s objectives are achieved [4]. The effectiveness of strategic customer relationship management means that suppliers achieve good performance in relation to strategic customers versus general customers. Market performance means the extent to which market-related goals such as favorable market growth rates, market share, and customer satisfaction are achieved. Strategic customer relationship management effectiveness and market performance eventually result in profitability [3]. In the private and public sector, various joint growth programs are operated to achieve corporate profitability and reduce unfair trade practices. One of the most prominent growth programs is benefit sharing. Benefit sharing activities include cost reduction, quality improvement, productivity improvement, management innovation, human resource development, localization of parts, co-marketing, joint overseas contracting, new technology development, process development, new product development etc. Companies share the results of benefit sharing activities such as cash compensation, unit price compensation, long-term contract, volume expansion, prototype purchase compensation, joint patents, and sales revenue sharing. Relationship variables such as relational commitment and relational norms are becoming increasingly important in supplier-buyer transaction relationships. Relational commitment refers to the expectation of maintaining a business relationship with other companies [2], and the relational norms is a complex concept that includes cohesion, mutuality, flexibility, and information exchange. Relational commitment

and relational norms have a profound effect on cooperation activities such as benefit sharing and various performances among companies. Although the importance of relationships with strategic customers is increasing, research on the effects of relationship characteristics with strategic customers on benefit sharing is relatively insufficient. Therefore, this research aims to analyze the structural relationship data analysis between relational variables and benefit sharing in the automobile, and to analyze the moderating effect of transaction-specific investment, and to suggest the theoretical and practical implications. The composition of this study is as follows. First, we review existing theories about strategic customers, relational variables and benefit sharing. Based on this theoretical background, four hypotheses were drawn. In order to verify the hypothesis, we surveyed the suppliers of Korea’s automobile parts. Reliability analysis, validity analysis, correlation analysis, and regression analysis were performed on the collected questionnaires. Finally, the theoretical and empirical implications of the research results are presented, and the limitations of the research and future research directions are presented.

2 Theory and Method 2.1 Theory The relationship between sellers and buyers has been studied for a long time. Sellers are increasingly dealing with strategically important companies as they conduct transactions with a variety of purchasing companies. The sale of products to strategically important customers can be explained on the basis of the “Pareto principle”. This principle explains the fact that a significant portion of business results, such as sales and profits, come from a small number of important customers or products. Managing such strategically important clients has received considerable attention from researchers and managers [5]. Strategic customer management has been partly attempted to conceptualize, and strategic customer management is conceptualized by integrating various existing studies. In particular, it provides a conceptual framework of strategic customer management by taking into account various aspects such as purpose, process, structure, resource, and evaluation in terms of forming a dedicated team for strategic customers [5]. There have been various names that refer to major customers of sales companies. These various names do not all have the same meaning. In the existing research, strategic customers, major customers, key customers, large customers, and global customers are categorized according to the direction, background, and perspective of research. The term is slightly different, but it is often seen as referring to a customer who is important to a company in many ways, such as sales or profit [3, 5–10].


Several attempts have been made to analyze existing research on strategic customer management. Most researches present and demonstrate theories and variables that can explain specific research subjects and situations. Among these studies, researches have been conducted to present and prove a new analysis framework to analyze the existing studies comprehensively. The most representative studies include research that summarizes approaches to strategic customer management [7], research that provides a comprehensive framework for strategic customer management [11], and research that analyze the differences of perception among different research subjects [12]. Many researchers summarized the approach to strategic customer management in three broad categories: sales approach, relationship marketing approach, and supply chain management approach. In addition, they analyze the study of existing strategic customer management by studying strategic customer manager, studying strategic customer relationship. Based on the existing research, they present and demonstrate the analytical framework for strategic customer management in four dimensions: activity, subject, resource, and formalization. And they present several areas of difference in perceptions between researchers and practitioners on strategic customer management and attempt to evaluate relationships with strategic customers from the perspective of the customer, not the supplier [7, 11–13]. For the first time in the world, Toyota Motors has introduced a benefit-sharing system to motivate innovation and enhance competitiveness based on mutual trust between supplies and buyers. They devised a way to mutually share innovation performance in order to systematically discover and promote various process improvement ideas from suppliers. In addition, they have made a task force for the systematic management of proposals, and have made a proposal evaluation committee to ensure the fairness of performance measurement and the prompt processing of proposals. Toyota’s benefit-sharing activities enabled them to become a world-class automobile company and contributed to the growth of Japan as a global economic powerhouse. Since then, global automobile companies have been pushing for a benefit sharing system. It is the relationship characteristic that has a decisive impact on the performance creation in the relationship between supplier and buyer [8, 10]. In particular, when supplier manage strategic customer relationships, a variety of performance are created [13]. Benefit sharing activities play an important role in creating performance in relationship with the strategic customers. Benefit sharing activities include cost reduction, quality improvement, productivity improvement, management innovation, human resource development, localization of parts, co-marketing, joint overseas contracting, new technology development, new process development, new product development. Suppliers-buyers share the results of benefit sharing activities such as cash compensation, unit price compensation, long-term contract, volume expansion, prototype purchase compensation, joint patents, and sales revenue sharing. Relational commitment is the willingness and expectation to maintain the relationship with the trading partner [2] and can be defined as the sacrifice to the other company [14]. In addition, relational commitment means that, in addition to assessing current customers’ benefits and costs, they are more likely to go beyond assessing

their counterparties and to be confident of sustained relationships, even with short-term losses, in order to develop a stable relationship [15]. Relational commitment is defined with respect to the organization, strategy, job, person, etc. [1]. Prior research has classified relational commitment into persistent, normative, and emotional commitment, and also into attitudinal, continuous, economic, and social commitment. Relational commitment has been studied as an essential factor in maintaining long-term relationships, mainly in social psychology, organizational behavior, and marketing. Relational commitment encourages corporations to engage in long-term, mutually beneficial relationship building and development rather than pursuing short-term, one-sided interests. In particular, relational commitment with strategic customers has a positive impact not only on relational performance but also on economic performance. Thus, relational commitment positively affects benefit sharing with strategic customers. A transaction-specific investment is one that is difficult to redeploy if the firm moves away from an existing transaction relationship to a new relationship. So, transaction-specific investment will moderate the relationship between relational commitment and benefit sharing.

H1-1: Relational commitment will have a positive impact on benefit sharing.
H1-2: Transaction-specific investment will moderate the relationship between relational commitment and benefit sharing.

Relational norms have become central concepts in various fields of social science such as social psychology [16], political science [17], law [18] and economics [19]. The concept of norms is introduced in various research fields and literature, but shares basic meanings [20]. Norms can be viewed from various perspectives. First, norms can be applied to different levels: society, specific industries, individual firms, groups, and so on. Second, norms can be divided into discrete exchange rules and relational exchange rules according to the degree to which collective goals are pursued. Third, norms include multidimensional meanings [20]. Based on a variety of existing studies, relational norms are defined as the rules of behavior accepted by trading partners or the expectations of partially shared behavior among decision makers [16, 21]. Relational norms are a multidimensional concept consisting of several sub-concepts: cohesion, role preservation, reciprocity, flexibility, and information exchange. Flexibility implies mutual expectations of a willingness to adapt to changes in the environment [20]; it also means recognizing changes in the environment and adjusting the trading conditions accordingly [22]. Information exchange means the mutual expectation that transaction parties actively provide useful information to the other party. In addition, solidarity means mutual expectations that place great value on the relationship between trading partners. Thus, flexibility, information exchange, and cohesion have their own distinct meanings and elements, but they constitute sub-dimensions of the relational norms [23]. Thus, relational norms have a positive impact on benefit sharing with strategic customers. And, as transaction-specific investment is difficult to redeploy if the firm moves away from an existing transaction relationship to a new relationship, transaction-specific investment will moderate the relationship between relational norms and benefit sharing.

H2-1: Relational norms will have a positive impact on benefit sharing.
H2-2: Transaction-specific investment will moderate the relationship between relational norms and benefit sharing.

2.2 Method In this research, the measurement items used in the previous studies were modified to this research situation and measured by the 5-point Likert scales (1 = “strongly disagree”, 5 = “strongly agree”). The relational commitment was modified according to this research situation based on the existing research and measured by 3 items (The relationship that our company has with the purchasing company is a worthwhile; We believe it is important to maintain long-term relationships with buyers; We want to keep our relationship with the buyer constantly). The relational norms was modified according to this research situation based on the existing research and measured by 3 items (Buyers offer convenience and cooperation as much as we strive; Buyers are flexible when dealing with us; When an unexpected situation occurs, the purchaser will respond to the situation rather than sticking to the regulations). The transaction-specific investment was modified according to this research situation based on the existing research and measured by 3 items (The purchaser invests in facility investment and employee training for our relationship with our company; Buyers do a lot of work for our company to develop; Buyers are doing good to improve our business). The benefit sharing was modified according to this research situation based on the existing research and measured by 4 items (If the costs are reduced by the joint activities with the purchasing company, the purchasing company distributes the profit to our company; The buyer compensates for the contribution of our company when the benefit comes from our suggestion; In principle, the patents created by our company and the purchasing company jointly are owned jointly by the two companies; Our company and the purchaser bear the risk arising from the cooperation process fairly). This research is limited to the automobile parts suppliers who produce and deliver automobile related parts in Korea’s automobile industry. Especially, data were collected only for 1st and 2nd vendors. We used survey methods such as face-to-face contact, telephone, and e-mail for data collection. A total of 196 samples were used for the final analysis. The characteristics of the survey subjects are as follows. First of all, 54.1% of the respondents were over the senior-level, and almost 50% of the employees were working within 10 years. Based on the collected data, reliability analysis, validity analysis, correlation analysis and regression analysis were conducted.


Table 1 Reliability analysis results
                     Relational     Relational   Transaction-specific   Benefit
                     commitment     norms        investment             sharing
Alpha coefficient    .867           .817         .913                   .744

3 Results and Discussion

3.1 Research Results

The reliability and validity of the measurement items in the collected data were verified based on a measurement validation process. This study first verified the reliability of the measurement tools and then verified the validity of the measurement model through factor analysis. Cronbach's alpha coefficient was used to verify reliability; a coefficient of more than .70 is expected in preliminary studies, .80 is suggested for basic studies and .90 for applied studies, while .60 or more is acceptable for exploratory studies. The results of the reliability test for each variable are shown in Table 1. As a result of the analysis, the reliability of the measured variables was at a reliable level of more than .70, so the reliability coefficients can be evaluated as showing high internal consistency. This study then conducted an exploratory factor analysis (EFA) to confirm the presence of distinct factors; the results showed that all measurement items satisfied the eigenvalue > 1 criterion (see Tables 2 and 3). The results of the correlation analysis between the variables are shown in Table 4; relational commitment, relational norms, and transaction-specific investment all show a significant correlation with benefit sharing. Regression analysis was then performed with relational commitment and relational norms as independent variables and benefit sharing as the dependent variable. In the model summary, the R-square value is .244, i.e., 24.4% explanatory power (Table 5). The significance probability of the model in the ANOVA table is at the p < .000 level, indicating a statistically significant model (see Table 6). Regression analysis was conducted to verify the hypotheses, and Table 7 shows the results. As can be seen, relational commitment and relational norms both have a positive impact on benefit sharing, so Hypotheses 1-1 and 2-1 were supported (see Table 7). This study also analyzed the moderating effect of transaction-specific investment on the relationship between relational commitment and benefit sharing; as a result of the analysis, hypothesis 1-2 was not supported (see Table 8).
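The moderating-effect tests reported in Tables 8 and 9 correspond to regressions that add the product of the predictor and the moderator as an interaction term. A minimal sketch with statsmodels (the data are randomly generated stand-ins and the variable names simply mirror the tables' abbreviations; this is not the survey data or the exact SPSS procedure):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    n = 196
    df = pd.DataFrame({
        "RC": rng.normal(size=n),   # relational commitment (mean-centered stand-in)
        "TS": rng.normal(size=n),   # transaction-specific investment
    })
    df["RC_x_TS"] = df["RC"] * df["TS"]                                # interaction (moderation) term
    df["BS"] = 0.15 * df["RC"] + 0.30 * df["TS"] + rng.normal(size=n)  # synthetic outcome

    X = sm.add_constant(df[["RC", "TS", "RC_x_TS"]])
    result = sm.OLS(df["BS"], X).fit()
    print(result.summary())   # the RC_x_TS row is the moderating-effect test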

Table 2 Total variance explained of exploratory factor analysis results

Component   Initial eigenvalues               Extraction sums of squared loadings   Rotation sums of squared loadings
            Total    % of variance  Cum. %    Total    % of variance  Cum. %        Total    % of variance  Cum. %
1           5.132    42.770         42.770    5.132    42.770         42.770        2.824    23.534         23.534
2           2.021    16.842         59.613    2.021    16.842         59.613        2.452    20.432         43.966
3           1.209    10.078         69.691    1.209    10.078         69.691        2.292    19.101         63.067
4           1.036     8.634         78.325    1.036     8.634         78.325        1.831    15.258         78.325
5            .606     5.050         83.375
6            .520     4.336         87.711
7            .427     3.558         91.269
8            .260     2.164         93.433
9            .230     1.914         95.347
10           .225     1.874         97.221
11           .200     1.665         98.886
12           .134     1.114        100.000

Notes Extraction Method: Principal Component Analysis; Rotation Method: Varimax with Kaiser Normalization


Table 3 Rotated component matrix of exploratory factor analysis results

                                    Component 1   Component 2   Component 3   Component 4
Relational Commitment 1             .134          .815          .183          .119
Relational Commitment 2             .077          .898          .102          .030
Relational Commitment 3             .026          .886          .124          .129
Relational Norms 1                  .267          .203          .809          .191
Relational Norms 2                  .244          .167          .807          .156
Relational Norms 3                  .176          .126          .771          .032
Transaction-specific investment 1   .859          .093          .251          .134
Transaction-specific investment 2   .872          .148          .247          .168
Transaction-specific investment 3   .873          .125          .148          .170
Benefit sharing 1                   .284          .080          .163          .877
Benefit sharing 2                   .166          .197          .133          .898
Benefit sharing 3                   .293          −.138         .377          .518

Notes Extraction Method: Principal Component Analysis; Rotation Method: Varimax with Kaiser Normalization

Table 4 Correlations analysis results (Pearson correlations, N = 196; all 2-tailed significance values are .000)

       RC        RN        TSI       BS
RC     1         .357**    .252**    .290**
RN     .357**    1         .529**    .477**
TSI    .252**    .529**    1         .527**
BS     .290**    .477**    .527**    1

**Correlation is significant at the 0.01 level (2-tailed)

Table 5 Model summary

Model   R        R2      Adjusted R2   Standard error of the estimate
1       .494a    .244    .236          .47645

a Predictors: (Constant), RN, RC


Table 6 ANOVA results

Model 1       Sum of squares   df    Mean square   F        Sig.
Regression    14.122           2     7.061         31.104   .000
Residual      43.813           193   .227
Total         57.935           195

Predictors: (Constant), RN, RC; Dependent variable: BS

Table 7 Regression analysis results (B and Std. error are unstandardized coefficients; Beta is the standardized coefficient)

Model 1        B       Std. error   Beta    t       Sig.
(Constant)     1.146   .306                 3.739   .000
RC             .149    .073         .137    2.047   .042
RN             .411    .064         .428    6.384   .000

Dependent variable: BS

Table 8 Moderating effect analysis results (1)

Model 1        B       Std. error   Beta    t       Sig.
(Constant)     1.857   1.062                1.749   .082
RC             .060    .259         .056    .233    .816
TS             .188    .337         .259    .558    .578
RC*TS          .040    .081         .277    .488    .626

Dependent variable: BS

Also, this study analyzed the moderating effect of transaction-specific investment on the relationship between relational norms and benefit sharing. As a result of the analysis, hypothesis 2-2 was supported (see Table 9).

3.2 Discussion This research aimed to analyze the structural relationships between relational variables, such as relational commitment and relational norms, and benefit sharing in the automobile industry, and to analyze the moderating effect of transaction-specific investment. In this research, the measurement items used in previous studies were adapted to this research setting and measured.


Table 9 Moderating effect analysis results (2)

Model 1        B       Std. error   Beta     t        Sig.
(Constant)     3.264   .807                  4.046    .000
RN             −.267   .229         −.278    −1.169   .244
TS             −.335   .257         −.462    −1.304   .194
RN*TS          .168    .069         1.237    2.428    .016

Dependent variable: BS

This research is limited to the automobile parts suppliers that produce and deliver automobile-related parts in Korea’s automobile industry; in particular, data were collected only from 1st- and 2nd-tier vendors. We used survey methods such as face-to-face contact, telephone, and e-mail for data collection, and a total of 196 samples were used for the final analysis. Based on the collected data, reliability analysis, validity analysis, correlation analysis and regression analysis were conducted. As a result of the analysis, it can be seen that relational commitment and relational norms have a positive impact on benefit sharing. These findings confirm and extend previous studies showing that relational commitment and relational norms have a positive impact on benefit sharing in the automotive industry. This study also analyzed the moderating effect of transaction-specific investment on the relationship between relational commitment and benefit sharing, and between relational norms and benefit sharing. As a result of the analysis, transaction-specific investment moderated the relationship between relational norms and benefit sharing. These findings confirm and extend previous studies showing that transaction-specific investment moderates the relationship between relational norms and benefit sharing in the automotive industry. The results of this study provide theoretical implications. Research on the antecedents affecting benefit sharing was relatively scarce in previous work; this study extends the study of the antecedents of benefit sharing by demonstrating the influence of relational commitment and relational norms on benefit sharing. In addition, a theoretical basis for the moderating effect of transaction-specific investment is presented, in that transaction-specific investment plays a moderating role in the relationship between relational norms and benefit sharing. This study also provides practical implications. Suppliers and buyers need to manage transactions with a focus on relationship characteristics such as relational commitment and relational norms. It also suggests that benefit-sharing activities can become more active by increasing transaction-specific investments among trading companies. Limitations of this study and directions for future research are as follows. First, focusing on relational commitment and relational norms in the automobile industry did not comprehensively consider other factors of relationship characteristics. In future studies, it is necessary to carry out research including other factors of


relationship characteristics. Second, the relationship between suppliers and strategic customers was examined only in the automobile industry; to generalize the research, it is necessary to expand it to various industries. Third, this study focused on the vendor perspective, and it is necessary to complement the research results by also reflecting the buyer perspective; in the future, both the supplier perspective and the strategic customer perspective should be analyzed at the same time. Finally, this study conducted a cross-sectional study on the relationship between suppliers and strategic customers; future studies should fill this gap by conducting longitudinal research. Acknowledgements Funding for this paper was provided by Namseoul University.


Fusion of Log-Mel Spectrogram and GLCM Feature in Acoustic Scene Classification Mie Mie Oo and Lwin Lwin Oo

Abstract Acoustic scene classification (ASC) is an important problem of computational auditory scene analysis. The proposed feature is extracted from the fusion of the Log-Mel Spectrogram (LMS) and the Gray Level Co-occurrence Matrix (GLCM) for the acoustic scene classification. LMS of the input audio file is calculated and then GLCM feature is extracted from LMS to detect the changes of audio signal in time and frequency domain. Multi-class Support Vector Machine (SVM) trains this feature in order to categorize the type of environment for audio input files. The main contribution of this paper is to extract the effective feature from the combination of signal processing approach and image processing approach. The purpose of this feature is to reduce computational time for classification. This system uses Detection and Classification of Acoustic Scenes and Events (DCASE 2016) challenges to show the robustness of the proposed feature. Keywords Log-Mel Spectrogram · Gray Level Co-occurrence Matrix · Acoustic Scene Classification · Support Vector Machine

1 Introduction Acoustic scene classification (ASC) is an essential task for recognizing audio that comes from areas where CCTV is unsuitable. In an acoustic scene classification system, the label of an audio file identifies the environment it originates from, such as a library, a car horn, or a station. ASC is a useful technique for continuously monitoring audio devices in the surrounding environment, and an ASC system can carry out such tasks without involving people. Log-Mel Spectrogram features are extracted from the input audio file. The input audio clip is pre-processed at the full sampling frequency of 44,100 Hz.


After getting the LMS, the Gray Level Co-occurrence Matrix (GLCM) is extracted from the LMS and then statistics are calculated from the GLCM. The Detection and Classification of Acoustic Scenes and Events (DCASE 2016) dataset is used to present the properties of the proposed feature. This system evaluates the classification accuracy under k-fold cross validation, with and without Principal Component Analysis (PCA) feature selection; classifier evaluation is performed with twofold, fivefold, and tenfold cross validation. The classification accuracy of the proposed feature reaches an acceptable level: most acoustic scenes are correctly classified, although audio from some scenes is incorrectly labeled under some conditions. This paper consists of six sections. The introduction and related works are presented in Sects. 1 and 2. The proposed methodology is presented in Sect. 3. Experimental results, discussion and conclusion are presented in Sects. 4, 5 and 6 respectively.

2 Related Works Acoustic scene classification (ASC) is applicable to many systems, such as care systems for elderly people and security systems for areas where CCTV is unsuitable, such as bedrooms and washrooms. Most security systems need to acquire audio information and automatically recognize what is happening in these areas. The DCASE dataset still poses many challenges and problems for acoustic scene classification. MFCC is one of the best features and the Convolutional Neural Network is one of the best classification methods for the challenges of the DCASE dataset, but it has a long classification time. Most existing methods focus on signal processing and its features, whereas our proposed feature focuses on the combination of digital signal processing and digital image processing methods. Acoustic Scene Classification (ASC) is the automatic recognition of audio contexts from online streams or from recordings. A context or scene is a concept commonly used by humans to identify the background noise and sound events associated with a specific audio scenario, such as a residential area or a park. Han, Y. and Lee, K. proposed a total of eleven algorithms to solve the challenge; static and multi-width frequency-delta (MWFD) data augmentation features are fed to a network, which significantly improves the classification accuracy and reduces the error rate [4]. Waldekar, S. and Saha, G. proposed spectral and temporal features for acoustic scene classification. Non-overlap block transform coefficients and sub-band centroid frequency coefficients are used in ASC tasks to exploit specific spectral information of the audio signal in a scene, and Short-Term (ST) time-frequency and Constant-Q Cepstral Coefficients features with discriminative classifiers are used to boost the ability to discriminate among different types of sounds [11]. In audio sound classification for medical surveillance, scattering features and support vector machines are used to classify indoor sounds. Hence ASC is


important for monitoring, security applications, and sounds produced in home, business and outdoor environments. Souli, S. and Lachiri, Z. proposed a feature extraction approach for environmental sound classification based on the scattering transform and principal component analysis; the Gaussian kernel is used in SVM classifiers to handle the problem of separating high-dimensional data [9]. Salamon, J. and Bello, J. P. proposed discriminative spectro-temporal patterns and deep convolutional neural networks to classify environmental sound. In their experiments, four different audio data augmentations are used, resulting in five augmentation sets; each augmentation is applied directly to the audio signal before converting it into the input representation used to train the network. The augmentations are Time Stretching, Pitch Shifting (two variants), Background Noise and Dynamic Range Compression. The combination of a deep, high-capacity model and an augmented training set improved the classification accuracy [8]. Jleed, H. and Bouchard, M. proposed the discrete Hartley transform as the source of spectral features for the ASC task; spectral features including centroid, spectral flux, entropy and temporal sparsity are used, and a Hidden Markov Model classifier is used for classification. The DCASE 2013 and DCASE 2016 databases are used for acoustic environment classification, and the experimental results showed that the presented approach could boost classification accuracy. In ASC, FFT calculations are replaced by discrete Hartley transform calculations to improve performance and reduce storage space [5]. Matrix factorization techniques based on Principal Component Analysis (PCA) and Nonnegative Matrix Factorization (NMF) have been explored with different variants and tuning methodologies to improve the classification accuracy. Bisot, V., Serizel, R., Essid, S., and Richard, G. demonstrated that a general representation of the data can be learned through matrix factorization, which has the benefit of automatically adapting to the data at hand; a nonnegative task-driven dictionary learning approach was ranked among the best performing systems on the DCASE 2016 dataset [2]. Classification of acoustic scenes has also combined spectral and temporal features. Abidin, S., Togneri, R., and Sohel, F. showed that the Variable-Q transform (VQT) provides higher resolution control than the constant-Q transform (CQT) to capture the relevant audio information; their system adopts a variant of the local binary pattern (LBP), the adjacent evaluation completed LBP (AECLBP), which is more appropriate for extracting features from time-frequency images [1].

3 Proposed Methodology Acoustic scene classification is one of the challenging tasks in digital signal processing. The main steps of acoustic scene classification are audio pre-processing, feature extraction and classification. In the pre-processing step, the input audio file is sampled and windowed to obtain the modified audio signal.


Fig. 1 Overview of acoustic scene classification (Audio Signal → Pre-processing → Feature Extraction → Classification → Label)

In feature extraction, the Log-Mel Spectrogram is extracted from the pre-processed audio, and then Gray Level Co-occurrence Matrices (GLCM) are calculated from the Log-Mel Spectrogram according to the distance (d = 1) and eight orientation angle values (θ = 0°, 45°, 90°, 135°, 180°, 225°, 270°, 315°). After that, 14 statistical values are calculated from each of the eight extracted GLCMs, so the length of the proposed feature is 112. The overview of the acoustic scene classification system is shown in Fig. 1.

3.1 Audio Preprocessing The average value of the left and right channels is calculated to convert the audio signal from stereo to mono. The sampling frequency 44,100 Hz is used with no down sampling because it seemed meaningful spectral characteristics observed in a high-frequency range from the visual inspection on the spectrogram. The size of the window used for Fast Fourier Transform conversion is 2048 samples with a jump size of 1024, approximately 40 and 20 ms.

3.2 Feature Extraction After pre-processing the audio signal, the proposed feature is extracted to represent the information of the signal with a feature of small length. In feature extraction, the Log-Mel Spectrogram of the pre-processed audio signal is calculated as a time-frequency representation of the audio signal. The changes of frequency over time are detected by constructing a GLCM from the calculated Log-Mel Spectrogram, and 14 statistical values describing these changes are calculated from the GLCM. The flow of the proposed feature extraction is shown in Fig. 2.


Fig. 2 Flow of proposed feature extraction

3.2.1 Log-Mel Spectrogram

When the input audio signal pre-processing is finished, the feature representation step is performed. The Log-Mel Spectrogram is an acoustic time-frequency representation of a sound. In the calculation of the Log-Mel Spectrogram, firstly the Fast Fourier Transform is calculated over the pre-processed audio signal. The Fast Fourier Transform equation is:

S_i(p) = \sum_{n=1}^{N} s_i(n) h(n) e^{-j 2\pi p n / N},  p = 0, \ldots, N - 1    (1)

where h(n) is the N-sample-long analysis window, s_i(n) are the time domain samples, S_i(p) are the frequency domain samples, and N is the Fast Fourier Transform size. The filter bank is used to map the spectral amplitude to the mel scale of perceptual excitation, and the mel filter bank converts the spectrum to the mel spectrum. The mel scale is based on the perception of frequencies by human hearing [7]; thus, the mel scale is used to measure the subjective frequency or pitch of a tone. The mel-frequency scale is given by the following equation:

mel(frequency) = scale \times \ln(1 + frequency / 700)    (2)

180

M. M. Oo and L. L. Oo

Input Audio Signal

Frame Blocking

Frame

Hamming Window

FFT Spectrum Log Mel Spectrogram Feature

Log

Mel Filter Bank

Fig. 3 Flow of Log-Mel spectrogram extraction

where mel(frequency) is the mel frequency for the linear frequency and scale is 1125. The filter bank energy is obtained after mel filtering. Finally, the logarithmic conversion of the mel energy is calculated and then the log mel spectrum is generated from the filter bank. The flow of Log-Mel spectrogram extraction is shown in Fig. 3.

3.2.2 Gray Level Co-occurrence Matrix (GLCM)

After getting the Log-Mel Spectrogram of the input audio signal, GLCM matrix is also calculated from the extracted LMS to detect the changes of the spectrogram. Grey Level Co-occurrence Matrices (GLCM) is one of the earliest methods for texture feature extraction proposed by Haralick et al. back in 1973. GLCM gives the texture feature of an image as a global feature. Texture is one of the important features used to detect the significant changes in an image. A GLCM of an image is created by calculating co-occurrence times of pixel pairs with respect to their gray level value. GLCM represents the intensity distributions and information about the relative positions of an image’s neighboring pixels. The local Haralick features: contrast, entropy, variance and correlation are achieved from normalized gray level co-occurrence matrices [6]. Since GLCM has been successfully applied for texture feature in many image applications and still famous for texture analysis in image processing and texture analysis such as texture image classification and clustering. Fourteen statistics values were extracted from the GLCMs to represent texture [10]. In this study, proposed system also used 14 statistics from the GLCM matrix with 8 different orientation angle values as shown in Fig. 4. In this figure, the 14 statistical values are: Contrast, Angular Second Moment (Energy), Correlation, Inverse Difference Moment (Homogeneity), Variance, Sum Average, Sum Variance, Sum Entropy, Difference Variance, Entropy, Difference


Fig. 4 Flow of GLCM and its statistics value extracted from Log-Mel spectrogram

Entropy, Maximal Correlation Coefficient, Information Measure of Correlation I, and Information Measure of Correlation II. These 14 statistical values are calculated from each of the GLCM matrices and then horizontally concatenated to form the proposed feature. The typical information of an audio signal can thus be characterized in a small number of dimensions, since the length of our proposed feature is 112.
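A minimal sketch of the GLCM step, assuming the log-mel spectrogram is first quantized to 8-bit gray levels and using scikit-image; graycoprops exposes only a subset of the 14 Haralick statistics, so the remaining ones would have to be computed from the normalized GLCM (or with another library such as mahotas) to obtain the full 112-dimensional feature.

import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(log_mel, levels=256):
    # Quantize the spectrogram (in dB) to integer gray levels 0..levels-1
    img = np.interp(log_mel, (log_mel.min(), log_mel.max()), (0, levels - 1)).astype(np.uint8)
    # Eight orientation angles (0..315 degrees) at distance d = 1, as in the paper
    angles = np.deg2rad([0, 45, 90, 135, 180, 225, 270, 315])
    glcm = graycomatrix(img, distances=[1], angles=angles, levels=levels, normed=True)
    # A subset of the Haralick statistics per angle; each call returns one value per angle
    props = ["contrast", "energy", "correlation", "homogeneity", "dissimilarity", "ASM"]
    feats = [graycoprops(glcm, p).ravel() for p in props]
    return np.concatenate(feats)  # the paper uses 14 statistics x 8 angles = 112 values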

3.3 Support Vector Machine (SVM) The choice of classifier is one of the dominant factors in acoustic scene classification. The Support Vector Machine (SVM) is a supervised machine learning algorithm that is widely used in classification. Support vector machines based on the Gaussian kernel have been used to classify such datasets because of their ability to handle high-dimensional data [3]. The SVM-based multi-class classification method appears to be very suitable for real-world recognition tasks. SVM is also used in regression for some kinds of applications and problems, but only rarely.


There are two kinds of SVM: binary-class SVM and multi-class SVM. The main purpose of an SVM is to find the separating boundary that classifies the labels of the data. The two ways to separate the data according to their labels are the hyperplane-based approach and the kernel-based approach. According to the type of kernel, there are many types of SVM classifiers, such as Linear SVM, Quadratic SVM, Cubic SVM and Gaussian SVM. In this system, a multi-class Quadratic SVM classifier is used for acoustic scene classification, solving the problem with one single optimal plane.
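A minimal sketch of this classifier with scikit-learn, expressing the quadratic kernel as a polynomial kernel of degree 2; X_train, X_test, y_train and y_test denote the 112-dimensional proposed features and scene labels and are assumed to be prepared as described above.

from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Quadratic-kernel multi-class SVM (scikit-learn uses one-vs-one internally)
quadratic_svm = make_pipeline(
    StandardScaler(),
    SVC(kernel="poly", degree=2, C=1.0),
)
quadratic_svm.fit(X_train, y_train)         # X_train: (n_clips, 112), y_train: scene labels
print(quadratic_svm.score(X_test, y_test))  # mean accuracy on the held-out clips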

3.4 DCASE 2016 Dataset The DCASE dataset contains recordings of various acoustic scenes. The recordings come from different environments. The actual length of each recording is three to five minutes, but the original recordings were divided into 30-second segments for the challenge. The DCASE 2016 dataset is a challenging dataset for the ASC task. The 15 labels of the DCASE dataset are:
• Tram
• Bus
• City-center
• Lakeside-beach
• Café-Restaurant
• Park
• Car
• Metro-station
• Forest-path
• Residential-area
• Train
• Grocery-store
• Home
• Library
• Office

3.5 K-Fold Cross Validation In the experiments, the evaluation of the proposed feature is performed by measuring the average classification accuracy under k-fold cross validation. In k-fold cross validation, the validation is performed k times. The dataset is divided into k subsets; for each validation, k−1 subsets are used for training and the remaining one is used for testing. The average classification accuracy is calculated over these k validations as shown in Fig. 5.

[Figure 5 depicts the partition of the dataset into subsets D1–D5 over five validation rounds, each round yielding an accuracy Acc1–Acc5, with Average classification accuracy = (Acc1 + Acc2 + Acc3 + Acc4 + Acc5)/5.]

Fig. 5 Visual representation for k-fold cross validation (k = 5)

Table 1 Classification accuracy according to different classifiers on DCASE 2016 dataset

Classifier name   Classification accuracy (%)
LDA               72.3
KNN               67.8
SVM               72.4

Bold indicates SVM has the highest classification accuracy

In this figure, the value of k is 5; the blue color represents the training subsets and the red color represents the testing subset. The average classification accuracy is calculated over the fivefold cross validation. D1, D2, …, D5 are the randomly divided subsets of the dataset.
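A minimal sketch of this evaluation protocol with scikit-learn, using the same quadratic-kernel SVM as above; X and y denote the full feature matrix and the scene labels and are assumed to be prepared beforehand.

from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

model = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=2))
for k in (2, 5, 10):
    folds = KFold(n_splits=k, shuffle=True, random_state=0)   # random partition into D1..Dk
    scores = cross_val_score(model, X, y, cv=folds)           # Acc1..Acck, one per fold
    print(f"{k}-fold average accuracy: {scores.mean():.3f}")  # (Acc1 + ... + Acck) / k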

4 Experimental Results There are three parts in the experimental results section: classifier selection, feature selection, and k-fold cross validation. The experiments show the properties and advantages of the proposed feature on the DCASE 2016 dataset.

4.1 Classifier Selection Support Vector Machine (SVM), K-nearest neighbors (KNN) and Linear Discriminant Analysis (LDA) are compared for classifier selection as shown in Table 1. In this comparison, the kernel function of the SVM is the linear function. Among these classifiers, SVM has the highest classification accuracy, and kernel selection for the SVM classifier is then performed.

Table 2 Classification accuracy according to different kernel functions of SVM classifier on DCASE 2016 dataset

Kernel function     Classification accuracy (%)
Linear              72.4
Quadratic           73.0
Cubic               70.5
Fine Gaussian       57.5
Medium Gaussian     59.2
Coarse Gaussian     44.5

Bold indicates the Quadratic kernel SVM has the highest classification accuracy

The linear, quadratic, cubic and Gaussian kernels are compared as shown in Table 2. Among these kernels, the quadratic kernel has the highest classification accuracy, so the quadratic SVM is used with a box constraint level of 1 and the kernel scale set to auto. For the classifier selection and kernel selection experiments, 75% of the dataset is used for training and 25% for testing, and the length of the proposed feature is 112.

4.2 Feature Selection Using the Principal Component Analysis (PCA) Algorithm Feature selection is performed using the Principal Component Analysis (PCA) algorithm. Principal component analysis is a quantitatively rigorous method for achieving simplification. A modified set of features is generated by PCA as linear combinations of the original variables. All of the components in this modified feature set are orthogonal to each other and there are no redundant features; hence, the principal components as a whole form an orthogonal basis for the space of the data. The classification accuracy for different numbers of selected components is shown in Table 3. In this table, 38 selected features give the best classification accuracy on the DCASE 2016 dataset, and feature lengths greater than 38 do not increase the classification accuracy. Hence, around 35–40 selected features give better results for acoustic scene classification on the DCASE 2016 dataset. According to the classification results of Tables 2 and 3, the combination of PCA and the Quadratic SVM leads to the best classification accuracy.
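A minimal sketch of this selection step with scikit-learn, combining PCA and the quadratic-kernel SVM in one pipeline; 38 components corresponds to the best-performing setting in Table 3, and the box constraint value follows Sect. 4.3.

from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

pca_svm = make_pipeline(
    StandardScaler(),
    PCA(n_components=38),               # keep the 38 leading principal components
    SVC(kernel="poly", degree=2, C=4),  # quadratic kernel with box constraint level 4
)
print(cross_val_score(pca_svm, X, y, cv=10).mean())  # tenfold cross-validated accuracy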


Table 3 Classification accuracy of SVM classifier with PCA on DCASE 2016 dataset according to different numbers of selected features

Number of features   Classification accuracy (%)   Training time (s)
10                   70.2                          2.855
20                   75.0                          2.708
28                   82.8                          4.082
38                   83.2                          3.007
40                   80.8                          4.236
50                   79.1                          3.414
70                   77.1                          3.754

Bold indicates the feature length of 38 has the highest classification accuracy

Table 4 Classification accuracy according to different k-fold cross validation on DCASE 2016 dataset (k = 2, 5, 10) and quadratic SVM classifier with PCA

K-fold   Classification accuracy (%)   Training time (s)   Prediction speed (obs/s)
2        74.8                          4.308               3900
5        79.3                          8.149               1900
10       81.5                          19.315              730

Bold indicates 10-fold cross validation has the highest classification accuracy

4.3 Experimental Results for K-Fold Cross Validation In this experiment, the quadratic multi-class SVM classifier is used with two parameters, the box constraint level and the kernel scale, set to 4 and 1 respectively. The system performs twofold, fivefold and tenfold cross validation to show that the proposed feature does not suffer from data bias. The average classification accuracy, training time and prediction speed of these three cross validations are calculated using the quadratic SVM classifier with PCA (38 selected features), as shown in Table 4. The confusion matrix of tenfold cross validation is presented in Fig. 6. According to Fig. 6, all of the labels have good classification accuracy except park and residential area: park is misclassified as library and residential area, and residential area is misclassified as park. According to Table 4, the average classification accuracy of twofold cross validation is the smallest and that of tenfold cross validation is the largest, but the classification accuracies of all these validations are acceptable.


Fig. 6 Confusion matrix for tenfold cross validation with all labels of DCASE 2016 dataset

5 Discussion This paper proposed a feature to represent the audio information in a small number of dimensions by combining digital image processing techniques and digital signal processing techniques. According to the experimental results, the combination of these two techniques achieves reasonable classification accuracy. Moreover, PCA increased the acoustic scene classification accuracy up to 83.2%, as shown in Table 3. The classification results in Table 4 indicate that the amount of training data can affect the acoustic scene classification accuracy: with tenfold cross validation, the average classification accuracy reached 81.5%.

6 Conclusion The fusion of the Log-Mel spectrogram and the Gray Level Co-occurrence Matrix (GLCM) is used to build the proposed feature for acoustic scene classification. The average classification accuracy reached 83.2% with PCA feature selection, and in k-fold cross validation the classification accuracies of the proposed feature are acceptable. By combining the Log-Mel Spectrogram and GLCM, the proposed feature successfully


classified the acoustic scene labels except under some conditions. Although the proposed feature has acceptable classification accuracy in acoustic scene classification, we will try to obtain higher classification accuracy by considering other image and signal processing methods.

References 1. Abidin, S., Togneri, R., & Sohel, F. (2018). Spectrotemporal analysis using local binary pattern variants for acoustic scene classification. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(11), 2112–2121. 2. Bisot, V., Serizel, R., Essid, S., & Richard, G. (2017). Feature learning with matrix factorization applied to acoustic scene classification. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(6), 16–29. 3. Ghodasara, V., Waldekar, S., Paul, D., & Saha, G. (2016). Acoustic scene classification using block based MFCC Features. In Detection and classification of acoustic scenes and events. 4. Han, Y., & Lee, K. (2016). Convolutional neural network with multiple-width frequency-delta data augmentation for acoustic scene classification. In IEEE AASP challenge on detection and classification of acoustic scenes and events. 5. Jleed, H., & Bouchard, M. (2017). Acoustic environment classification using discrete hartley transform features. In IEEE 30th Canadian Conference on Electrical and Computer Engineering (CCECE) (pp. 1–4). 6. Latha, Y. L. M., & Prasad, M. V. N. K. (2015). GLCM based texture features for palm print identification system. In Computational intelligence in data mining (Vol. 1, pp. 155–163). New Delhi: Springer. 7. Majeed, S. A., Husain, H., & Samad, S. A. (2015). Mel frequency Cepstral coefficients (MFCC) feature extraction enhancement in the application of speech recognition: A comparison study. Journal of Theoretical & Applied Information Technology, 79(1). 8. Salamon, J., & Bello, J. P. (2017). Deep convolutional neural networks and data augmentation for environmental sound classification. IEEE Signal Processing Letters, 24(3), 279–283. 9. Souli, S., & Lachiri, Z. (2018). Audio sounds classification using scattering features and support vectors machines for medical surveillance. Applied Acoustics, 270–282. 10. Unnikrishnan, A., Balakrishnan, K., & Sebastian, V. (2012). Gray level co-occurrence matrices: Generalisation and some new features. arXiv preprint arXiv, 1205.4831. 11. Waldekar, S., & Saha, G. (2018). Classification of audio scenes with novel features in a fused system framework. Digital Signal Processing, 71–82.

Improvement on Security of SMS Verification Codes Shushan Zhao

Abstract Nowadays many online service providers use SMS verification codes as a major or supplementary authentication method to verify identity of the user. However, we notice that there are several ways to intercept the SMS verification codes so that the attacker can impersonate the actual user. To counter this type of attacks, we bring up the idea that requires the SMS verification code be sent to not only the exact phone number but also the exact phone of the registered user and be used to generate a one time passcode (OTP). We propose a possession-based SMS verification framework and implementation algorithms in it, and analyze the security and performance features of them. The solution is generic to all platforms and operating systems, and our analysis demonstrates that even in the case that the attacker manages to intercept the original SMS verification code successfully by any technical or social engineering means, derivation of a valid OTP is computationally infeasible. Keywords SMS verification code · One-time passcode · Authentication

1 Introduction Today we are living in a world full of online services—online shopping, online banking, online chatting etc.—and recently we have upgraded versions of the above mentioned online services—mobile services. All these services rely on a basic security service—authentication—and it is facing more and more challenges nowadays. The traditional and most popularly used method—a password known only to its creator and owner—has the inherent weakness that users must balance between security and convenience: a secure enough password is not easy to remember, and an easy-to-remember password is not secure enough. Password attacking has been a main target of hackers for years, and there are many free online tools for this purpose. Most of the attacks use a password dictionary or a hashcode database


(a so-called rainbow table) and search for a match of a password or hashed password. There are many tools to launch the attack, such as “John-the-Ripper”, “medusa”, “hydra” etc., and there are many tools to generate and customize the password or hashcode dictionaries, such as “crunch”, “cewl”, “rtgen” etc. With the development of computational power and storage resources, attacks that were not possible or feasible before are now becoming possible and feasible. For example, on a DECstation 3100 in 1989, even if only words of 3–5 lower case characters are considered, testing 3000 passwords would require over 25 CPU hours [9]; while on a modern computer, testing this number of words takes less than 1 s. For another example, anyone can buy a 15 GB password cracking dictionary containing 1,493,677,782 popularly used passwords for $5 [2]. The financial and technical cost is now affordable by most individuals. This leads to the fact that passwords are getting easier and easier to break, and less and less secure. Many online service providers have realized the trend of decreasing security of passwords and resort to other supplementary methods. The SMS verification code is a popularly used supplementary method for user identity verification. Unfortunately, in recent years (especially since 2014), SMS interception techniques have become known to more and more people, and less and less expensive to carry out. As far as we are aware, there are at least four methods in the literature to intercept SMS verification codes:

1. Through SIM Swapping Attacks: SIM swapping attacks mean that the attackers manage to replace the SIM card of a known cellphone number with one they have obtained. The easiest way is using social engineering skills—after gathering enough information on a target, they just call the cellular service provider, claim that the existing SIM card is lost or damaged, and ask for a new one. According to the U.S. Fair Trade Commission, there were 1038 reported incidents of SIM swap identity theft in January 2013, representing 3.2% of identity theft cases that month. By January 2016, that number had ballooned to 2658 [5, 17]. With the swapped card, the attacker can receive messages destined to the target cellphone number.

2. Through SIM Cloning Attacks: There are several toolkits available in the market with which one can clone a SIM card to a new empty SIM card [11, 12]. Such a toolkit does not require any authentication or matching of the PIN to clone the SIM card. What the attacker needs to do is buy the toolkit, a SIM card reader/writer and an empty or rewritable SIM card, then steal the target’s SIM card, clone it and put it back intact. With the cloned SIM card, the attacker can choose a time, suppress the target user from connecting to the network by some means, and impersonate this user instead.

3. Through Malicious Apps: Malicious apps can be installed in users’ cell phones by one means or another (which is not the focus of this paper), with access to SMS messages. Then a malicious app can send an interesting SMS message to the remote interceptor through the Internet. Furthermore, the app can delay or block the reception of the native SMS app and its notification to the owner of the phone, if the malicious app grabs the root


privilege. No matter whether or not the actual owner of the phone receives the verification code, if the interceptor uses the code first, the code becomes invalid—that is how a one-time verification code works. In [8], the authors have studied this type of SMS interception attack. By modifying the Android SMS reception framework and the native SMS app, they demonstrate how to prevent a malicious app from intercepting an SMS message. In addition, to protect rooted environments, they also suggest an idea to verify the app with the public key of its certificate on the framework layer and get users notified of the compromised app. However, their solution only targets malicious apps and only works in the Android framework.

4. Using Fake Base Station and Fake Operator Workstation: In 2014, two research groups found independently that SMS traffic in GSM networks can be intercepted, and they published how to achieve this using the Signaling System No. 7 (SS7) protocol in [15, 16, 19]. In 3G, 4G and upcoming 5G networks, SS7 is replaced by the Diameter protocol that improves overall interconnection security, hoping that SMS interception would be prevented. In [1], the authors suggest that by using a jammer to jam the 3G/4G band, the cell phone would fall back to GSM operation where it can be intercepted using the fake base station method. In the paper, they also demonstrate the hardware and software tools used in the process. In [7], the authors show that 4G/5G SMS can still be intercepted using Diameter based networks independently of device or OS type. The first step in this attack is to get the International Mobile Subscriber Identity (IMSI). A false base station can request the IMSI from the phone, or a false WLAN station can be used to trigger the phone to make an “EAP-AKA” run and reveal the IMSI. Then, the attacker acts as a Visited Mobility Management Entity (V-MME) and sends a Location Update to the user’s home network saying that the user is roaming to another network, so that messages would be forwarded to another network in the future. The attacker then acts as a Virtual Short Message Service Center (V-SMSC) and tells the SMSC of the user to send all messages to it. With this attack, the authors successfully obtained SMS verification codes and reset passwords of the target’s Twitter account, Facebook account etc. The testing attack was conducted on a 4G LTE network in Finland in 2017.

In light of these viable attacks, we find an imminent demand for a countermeasure to address these security issues of SMS verification codes. In this paper, we propose a method to improve the security of SMS verification codes in which we ensure that the SMS verification code is sent to not only the exact phone number but also the exact phone of the registered user, and by which the derivation of a valid SMS verification code is computationally infeasible. Compared to existing solutions in the literature, our solution is generic to all platforms and operating systems, and is secure even in the case that the attacker manages to intercept the original SMS verification code by any technical or social engineering means.


The rest of this paper is organized as follows: Sect. 2 overviews the solution— a possession-based SMS verification framework. Section 3 presents implementation of OTP generation algorithms used in the framework. Section 4 discusses the security features and compares the OTP generation algorithms in several aspects. Section 5 demonstrates experimental results and performance evaluation of the scheme. Section 6 concludes the paper.

2 A Possession-Based SMS Verification Framework The basic idea of our improved SMS verification solution is a possession-based SMS verification framework in which the SMS verification code (we call it the raw verification code in the following sections) is processed by some algorithm using a shared secret between the authentic parties, and the output of the algorithm is used as a one-time password/passcode (OTP). In this way, even if somebody else intercepts the raw verification code, without the shared secret she/he cannot generate the correct output expected by the other party. And the shared secret is tied to the exact phone the user registered. The current message flow between the server and client is shown in Fig. 1a; the proposed message flow is shown in Fig. 1b. Let’s name the program that implements this framework on the client side App Poss_auth. The following steps are the initialization process to register a client to a server and the server to the client’s App Poss_auth:

1. On the smart phone that is to be registered for the user and the server, the user authenticates him/her-self to the server using other authentication methods, e.g. password, or face recognition.
2. The user runs App Poss_auth on the same smart phone.
3. App Poss_auth calculates s = H(hardware_signature || timestamp || server_name).
4. The user sends s_client = s to the server for registration.
5. (Optional) The server and the user agree on and save other parameters needed in the future (e.g. a large prime number to be used as modulus, the algorithm to be used).
6. The server saves s_server = s (and other parameters) for this user in its database.
7. App Poss_auth adds an entry for this server and key (and other parameters) in its database.

hardware_signature is something that is unique to the device running App Poss_auth. For a smart phone, the hardware signature can be the International Mobile Equipment Identity (IMEI) or a unique installation ID (UIID). The following steps are the authentication process used to authenticate the user to a registered server:

1. The user sends a login request to the server with her/his account information (on any device).


Fig. 1 SMS verification message flow between the server and client: (a) the current message flow; (b) the message flow with improved security


2. The server retrieves the secret s_server for this user, determines a random SMS verification code r, calculates p_server = A_i(s_server, r), where A_i() is an OTP generation algorithm that takes this r and the stored secret s_server for this user (there might be extra requirements for r, to be explained in the next section), sends r to the user, and asks for the OTP.
3. The user selects the server from the App Poss_auth interface (on the registered smart phone).
4. The user inputs the raw verification code r in text to App Poss_auth (the App can input the code r automatically if assigned the access right to read SMS messages).
5. App Poss_auth calculates the OTP p_client = A_i(s_client, r) for this verification code, where A_i() is the same OTP generation algorithm that takes this r and the stored secret s_client for this server.
6. The user inputs the OTP p_client in the login page.
7. The server verifies p_server ?= p_client. It approves the login request if the two passcodes match, and rejects the login request if they do not match.
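A minimal Python sketch of the registration secret and the server-side check described above; the identifiers are illustrative placeholders, and otp_algorithm stands for any of the OTP generation algorithms A_i presented in the next section.

import hashlib

def registration_secret(hardware_signature: str, timestamp: str, server_name: str) -> bytes:
    # Step 3 of registration: s = H(hardware_signature || timestamp || server_name)
    return hashlib.sha512((hardware_signature + timestamp + server_name).encode()).digest()

def verify_login(server_secret: bytes, client_secret: bytes, raw_code: str, otp_algorithm) -> bool:
    # Steps 2-7 of authentication: both sides run the same algorithm A_i on (secret, r)
    p_server = otp_algorithm(server_secret, raw_code)
    p_client = otp_algorithm(client_secret, raw_code)  # computed on the registered phone
    return p_server == p_client

Since s_client equals s_server only on the phone that performed registration, an attacker who intercepts the raw code r but lacks the secret cannot produce a matching p_client.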

3 OTP Generation Algorithms Used in the Framework We here show some examples of OTP generation algorithms based on known one-way (trapdoor) functions such as HMAC, the Discrete Logarithm (DL) problem, the Integer Factoring (IF) problem, and block ciphers (BC). From the shared secret between a client and a server, we can derive a key k shared between them. The trapdoor functions make use of the shared key to generate an OTP. The purpose of this section is to evaluate and compare the security level and time-wise performance of the different options (to be presented in Sects. 4 and 5), so that users can choose one when needed based on their specific requirements and the pros and cons of these options.

3.1 Randomized HMAC OTP Generation Algorithm Assume k is the key shared between the client and server, r is the raw verification code sent from the server to the client, and H is a hash function that converts an integer of any number of digits (with padding if needed) to at least 512 bits, e.g. SHA-512. A one-time password is generated with this algorithm:

int HMAC_OTP(int k, int r) {
    int k := H(k ⊕ r);
    int OTP := HMAC(k, r);
    return n least significant bits of OTP as OTP.
}


We name this algorithm Randomized HMAC-OTP (R-HMAC-OTP) in the rest of this paper.
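A minimal Python sketch of R-HMAC-OTP under the stated assumptions (SHA-512 as H); hashing r before the XOR and truncating the output to decimal digits are illustrative adaptations rather than the paper’s exact encoding.

import hashlib
import hmac

def r_hmac_otp(k: bytes, r: int, n_digits: int = 8) -> str:
    # One-time key k' = H(k XOR r); r is expanded to the key length by hashing it first
    r_block = hashlib.sha512(str(r).encode()).digest()
    one_time_key = hashlib.sha512(bytes(a ^ b for a, b in zip(k, r_block))).digest()
    # OTP = HMAC(k', r), truncated here to n decimal digits for readability
    mac = hmac.new(one_time_key, str(r).encode(), hashlib.sha512).digest()
    return str(int.from_bytes(mac, "big") % 10 ** n_digits).zfill(n_digits)

shared_key = hashlib.sha512(b"example shared secret").digest()
print(r_hmac_otp(shared_key, 493027))  # identical on client and server for the same k and r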

3.2 DL-Based OTP Generation Algorithm We can use the random raw verification code and the shared key as the base and exponent of a discrete logarithm in integer groups, and use the resulting power as the source of the OTP. In the following algorithm, r is the raw verification code sent from the server to the client; H is a hash function that converts an integer of any number of digits (with padding if needed) to at least 512 bits, e.g. SHA-512; p is an at least 1024-bit prime number shared between the client and server; k is the key shared between the client and server. We are going to use H(r) as the base and H(k ⊕ r) as the exponent. Because a^k × b^k ≡ (a × b)^k (which means that after getting the outputs for a and b, there is a shortcut to calculate the output for a × b), we can only choose a prime number as the base. In this case, each base generates a cyclic finite group in which the DL problem holds. For this algorithm, we have an extra requirement on the raw verification code to be sent from the server to the client: if H(r) is not prime, we need to increase r by 1 and retry until H(r) is prime.

int DL_OTP(int k, int & r, int p) {
    int power := 1;
    int base := H(r);
    while (!isPrime(base)) {
        r := r+1;   // update the value of input r
        base := H(r);
    }
    int exponent := H(k ⊕ r);
    while (exponent > 0) {
        if (exponent & 1 == 1) {
            power := (power*base) mod p;
        }
        exponent := exponent/2;
        base := (base*base) mod p;   // square the base at each step (square-and-multiply)
    }
    return n least significant bits of power as OTP.
}
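A minimal Python sketch of the same procedure, assuming sympy for the primality test and Python’s built-in three-argument pow for the square-and-multiply modular exponentiation; hashing the concatenation of k and r stands in for H(k ⊕ r), and the decimal truncation is an illustrative choice.

import hashlib
from sympy import isprime

def dl_otp(k: bytes, r: int, p: int, n_digits: int = 8):
    h = lambda x: int.from_bytes(hashlib.sha512(str(x).encode()).digest(), "big")
    base = h(r)
    while not isprime(base):        # extra requirement: H(r) must be prime
        r += 1
        base = h(r)
    exponent = h(k.hex() + str(r))  # stands in for H(k XOR r)
    power = pow(base, exponent, p)  # modular exponentiation (square-and-multiply)
    # r may have been incremented, so the caller must send the updated r to the client
    return str(power % 10 ** n_digits).zfill(n_digits), r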


3.3 IF-Based OTP Generation Algorithm We can use the random raw verification code and the shared key to generate two large prime numbers as factors and calculate their product as the source of the OTP. In the following algorithm, r is the raw verification code sent from the server to the client; H is a hash function that converts an integer of any number of digits (with padding if needed) to at least 512 bits, e.g. SHA-512; p is an at least 1024-bit prime number shared between the client and server; k is the key shared between the client and server. We are going to generate two 512-bit prime numbers f1 and f2, and calculate their 1024-bit product as the OTP. As we want to keep a 1-to-1 mapping from r to the OTP (more explanation in Sect. 4), we need to update the raw verification code r to be sent from the server instead of using nextPrime(f1), because nextPrime(f1) might lead to the same f1 from different r’s. The following algorithm makes sure f1 and f2 are 1-to-1 mapped from r, so that the OTP is 1-to-1 mapped from r.

int IF_OTP(int k, int & r, int p) {
    int f1 := HMAC(k, r);
    while (!isPrime(f1)) {
        r := r+1;   // update the value of input r
        f1 := HMAC(k, r);
    }
    int f2 := H(f1);
    if (!isPrime(f2)) f2 := nextPrime(f2);
    int product := f1*f2 mod p;
    return n least significant bits of product as OTP.
}

3.4 BC-Based OTP Generation Algorithm We can use a block cipher to encrypt the random raw verification code with the shared key, and use the ciphertext as source of OTP. For better security, we xor the random raw verification code with the key to get a random key for each encryption operation, and we use a hash function that converts an integer of any digits (with padding if needed) to at least 512 bits, e.g. SHA-512, so that we can use Cipher Block Chaining (CBC) mode. For example, if using AES cipher, the 8-digit decimal number (up to 27 bits) with padding is used as message, and the shared key truncated or hashed to 128 bits is used as key, the output 128 bits ciphertext is used as source of OTP.


int BC_OTP(int k, int r) {
    int k := k ⊕ r;
    r := H(r || padding);
    int OTP := E_CBC(k, r);
    return n least significant bits of OTP as OTP.
}

4 Features and Analysis of the Framework 4.1 Complexity and Success Rate of Algorithms The HMAC-based algorithm and the BC-based algorithm are simply deterministic algorithms and definitely converge to the solution; their time complexity and space complexity are constant, and the success rate is 1. The DL-based algorithm and the IF-based algorithm are nondeterministic algorithms: their time complexity and success rate depend on the probability of the H or HMAC output being a prime number. According to [4], the count of prime numbers less than an integer i is given by the function π(i) ≈ i/(log i − 1). The larger a number is, the lower the probability that a number less than it is prime. Therefore, the time complexity of the DL-based algorithm and the IF-based algorithm is O(n/log n), where n is the hash function output block size. When SHA-512 is employed, the probability of its output being a prime number is about 1%. This means the algorithm will restart with an incremented input about 100 times, on average, before finding a valid output and generating a valid OTP.

4.2 Security Analysis We notice that SHA-1 is already considered not secure any more [6], so in this paper SHA-256 or SHA-512 is used. We analyze the security on the basis of the assumption that a hash function is a secure Pseudo Random Function (PRF) and a block cipher is a Pseudo Random Permutation (PRP), meaning that they are computationally indistinguishable from their truly random counterparts. To be computationally indistinguishable, we need to eliminate all known factors that lead to biases from truly random counterparts. It is not difficult to see that if codes of small length are used as raw verification codes and OTPs for convenience (e.g. less than an 8-digit decimal number), the length of the codes is the bottleneck of security and all algorithms are on the same level. In


order to compare them, we need to neglect the lengths of input raw verification codes and output OTP’s of the algorithms, and assume the input and output lengths are only limited to block size of the hash function or block cipher. Because in practice the allowed time window for an OTP is short and security requirement is thus not high in most cases, all OTP algorithms should be already secure enough; the comparison provides only theoretical guidance in special cases where OTP’s time window is long enough to concern about attacks on them. In the context of an OTP, the input message (the raw verification code) is considered public; the output ciphertext or message digest (OTP) is a short-term one-time secret; and the key is a long-term secret. The security concerns are key security and output collision resistance which will be discussed in the following.

4.2.1 Strong Collision Resistance

Strong collision resistance refers to the difficulty of finding any pair of input random verification codes x and y such that A(x) = A(y) for an algorithm A, i.e. two inputs leading to the same output OTP (assuming the key is secure). We consider the number of Chosen Plaintext Attack (CPA) queries needed to reach a 50% probability of collision. When the input space is greater than or equal to the output space, the number of queries is determined by the output space. By the birthday paradox, the maximum number of queries is q_max = √(2^s_o), where s_o is the output block size. This holds for the HMAC-based OTP and BC-based OTP algorithms. For the DL-based OTP and IF-based OTP algorithms, the input values must be prime numbers and the input-to-output mapping is one-to-one. Since the number of primes less than an integer i is π(i) ≈ i/(log i − 1) [4], the maximum number of queries is

    q_max = π(2^s_i) ≈ 2^s_i / (log 2^s_i − 1)        (1)

where s_i is the input block size.
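As a rough numerical illustration (assuming the logarithm in π(i) is the natural logarithm, which the text does not state explicitly), for SHA-512 with s_o = s_i = 512 the two bounds become:

    q_max = √(2^512) = 2^256                                (HMAC-based and BC-based OTP)
    q_max = π(2^512) ≈ 2^512 / (512·ln 2 − 1) ≈ 2^503.5     (DL-based and IF-based OTP)

so the DL- and IF-based variants retain almost the full input space despite the restriction to prime inputs.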

4.2.2 Weak Collision Resistance

Weak collision resistance refers to the difficulty of finding another input raw verification code r′ ≠ r0 such that A(r′) = p0, given a pair of input raw verification code r0 and output OTP p0 with A(r0) = p0 (again assuming the key is secure). For the R-HMAC-OTP algorithm and the BC-based OTP algorithm, "the analysis demonstrates that the best possible attack against the HOTP function is the brute force attack" [13, 14]. Under a brute force attack, if the input block size of the hash function is s_i, the maximum number of CPA queries to find a collision is q_max = 2^s_i.


For the DL-based OTP and IF-based OTP algorithms, the input values must be prime numbers, so the maximum number of CPA queries to find a collision is

    q_max = π(2^s_i) ≈ 2^s_i / (log 2^s_i − 1)        (2)

4.2.3 Key Security

Key security refers to the difficulty for an adversary to recover the key, given the power to send arbitrary raw verification codes, receive the corresponding output OTPs, and test candidate keys; for example, the adversary may eavesdrop on random raw verification codes and corresponding OTPs in the traffic and re-run the algorithm in his/her lab, trying potential keys. For the R-HMAC-OTP and BC-based OTP algorithms, given an (r, OTP) pair the adversary cannot test the key directly, but only the hashcode of a one-time key, k_OT = H(k ⊕ r). After q_max = 2^s_k queries, the adversary only obtains k_OT. For each k_OT, the adversary must find the pre-image of the hash H(k ⊕ r) to obtain the many-times key k; because of the one-way property of the hash function, there is no better approach than the 2^s_k tries determined by the key space size. The overall effort to obtain k is therefore 2^s_k × 2^s_k = 2^(2s_k).

For the DL-based OTP and IF-based OTP algorithms, key security rests on the hardness of the DL and IF problems. The DL problem underlies ElGamal encryption, Diffie-Hellman key exchange, and the Digital Signature Algorithm; the IF problem underlies RSA and Public Key Infrastructure (PKI) algorithms. A detailed treatment of their hardness is out of the scope of this paper. From the literature, the most efficient algorithm for solving the DL and IF problems in certain forms is the General Number Field Sieve (GNFS), which takes O(exp ∛n) for an n-bit number [10]; n is determined by the preset modulus p and is twice the input block size s_i. Even if the DL or IF problem is solved, the algorithms show that the adversary only obtains the hashcode of a one-time key, H(k ⊕ r). To obtain the many-times key, the adversary must break the hash function to find the pre-image of the hashcode, the difficulty of which is determined by the key space size 2^s_k.

In Table 1 we compare the security levels of the different OTP algorithms.

Table 1 Comparison of security levels of OTP algorithms

OTP algorithm | Security basis                             | Strong collision resistance | Weak collision resistance   | Key security
R-HMAC OTP    | HMAC with random keys and random messages  | √(2^s_o)                    | 2^s_i                       | 2^(2s_k)
DL-based OTP  | Discrete logarithm                         | 2^s_i/(log 2^s_i − 1)       | 2^s_i/(log 2^s_i − 1)       | 2^s_k · O(exp ∛(2s_i))
IF-based OTP  | Integer factorization                      | 2^s_i/(log 2^s_i − 1)       | 2^s_i/(log 2^s_i − 1)       | 2^s_k · O(exp ∛(2s_i))
BC-based OTP  | Block cipher                               | √(2^s_BC)                   | 2^s_BC                      | 2^(2s_k)

For a higher level of security, a user can either increase the HMAC key size and hash function block size, or combine different algorithms into one OTP implementation. For example, a user can create common random number tables on both the server and the client side, and use a created OTP as an index into the tables to choose a random key value, as suggested in [18]. As future work, we would consider the Discrete Logarithm over elliptic curves instead of the natural numbers: Elliptic Curve Cryptography offers smaller key sizes and more efficient implementations at the same security level as other widely deployed schemes [3].

As a caveat, we want to explain the difference between PRP and PRF for the IF-based OTP algorithm. In the description in Sect. 3, we intend to keep the algorithm a PRP, meaning that input and output are mapped one-to-one, and we sift out the random codes that could result in the same OTP as another input code. This lowers the performance on the server side. If we skip this step, the algorithm becomes a PRF, meaning that multiple input codes may map to the same OTP. What is the impact on security in practice? It lowers the security level of strong and weak collision resistance. As mentioned in Sect. 4.1, if SHA-512 is used, about p/100 primes divide the numbers below p into p/100 segments, and the probability of two random numbers less than p falling into the same segment is 100/p. The security level is therefore reduced from 2^s_i/(log 2^s_i − 1) to (1 − 100/p) · 2^s_i/(log 2^s_i − 1), which is negligible when p is large enough. We sacrifice performance for security in Sect. 3; this is not always needed by all future users of the system, who may make the opposite choice if they do not mind slightly reduced security. We also remind readers and future users that security parameters between a server and a user, e.g. the modulus p, need to be updated after some time. The time span is determined by how many raw verification codes have been consumed between them; the maximum number of raw verification codes under the same parameters is determined by the strong collision resistance level, e.g. for the R-HMAC-based OTP algorithm the number is √(2^s_o).

5 Experimental Results

We implemented the above algorithms with Java JDK 9.0 on a Windows Server 2016 64-bit platform with an Intel i7-6700 3.40 GHz CPU and 16 GB RAM, and measured the average run time of each of them. All algorithms were implemented with SHA-512 as the core hash function; in addition, we also tested SHA-256 in the IF-based algorithm. The raw verification codes were 8-digit random decimal numbers, and the OTPs generated were also 8-digit decimal numbers. The experimental results are presented as a table and chart in Fig. 2.
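As a sketch of how such average run times might be collected (the paper does not give its measurement harness; the class name and the placeholder generators below are assumptions), one could time each generator over many random 8-digit raw codes:

    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.Random;
    import java.util.function.IntUnaryOperator;

    public class OtpBenchmark {
        // Measure the average run time of each OTP generator over many random 8-digit raw codes.
        // The generators are plugged in as functions from raw code to OTP; the two entries
        // below are placeholders standing in for the real R-HMAC and BC-based algorithms.
        public static void main(String[] args) {
            Map<String, IntUnaryOperator> algorithms = new LinkedHashMap<>();
            algorithms.put("R-HMAC OTP", raw -> raw);      // placeholder
            algorithms.put("BC-based OTP", raw -> raw);    // placeholder
            Random rnd = new Random(42);
            int runs = 10_000;
            for (Map.Entry<String, IntUnaryOperator> e : algorithms.entrySet()) {
                long start = System.nanoTime();
                for (int i = 0; i < runs; i++) {
                    int rawCode = 10_000_000 + rnd.nextInt(90_000_000);   // random 8-digit raw code
                    e.getValue().applyAsInt(rawCode);
                }
                double avgMicros = (System.nanoTime() - start) / (runs * 1_000.0);
                System.out.printf("%s: average %.2f us per OTP%n", e.getKey(), avgMicros);
            }
        }
    }

A careful measurement would also add warm-up iterations and repeated trials to account for JIT compilation.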


Fig. 2 Comparison of the time performance of different OTP algorithms

From the experimental results, we see that there is not much performance difference between the R-HMAC-based OTP and DL-based OTP algorithms. AES-based OTP is by far the fastest, because the AES instruction set is embedded in most x86-architecture microprocessors. We notice that the IF-based OTP algorithm is much slower than the others; the reason is that the nextPrime() function accounts for 80–90% of the time consumption. If we use SHA-256 instead of SHA-512, its time performance becomes comparable to the other algorithms. We also notice a larger fluctuation in the IF-based OTP time consumption. This is because nextPrime() is a probabilistic algorithm whose running time depends heavily on its input, which is a random number and differs every time. Combining the security level comparison in the previous section with the time performance differences in this section, a user can determine his/her best choice of OTP algorithm. For example, the R-HMAC-based algorithm is on the same security level as the BC-based algorithm, but if the CPU supports hardware AES operations, the BC-based algorithm is much faster and is the better choice.


6 Conclusion

Authentication is a paramount component of an online service, and password-based authentication by itself is no longer secure enough. SMS verification codes are used in many online services as a major or supplementary authentication method. However, in recent years SMS interception has been shown to be possible, which breaks the existing authentication systems of many online services. We propose a possession-based authentication framework to fortify the SMS verification system, in which only the registered and authentic smartphone can receive the SMS verification code and generate the valid OTP, while an intercepted verification code is of no use even if it is somehow received on an unregistered device. The security and performance concerns are also analyzed and discussed. We believe this is a novel, feasible, and promising solution for online service authentication in the current situation and the near future.

References

1. Androulidakis, I. I. (2016). Mobile phone security and forensics—A practical approach (2nd ed.). Springer International Publishing.
2. CrackStation. (2016). CrackStation's password cracking dictionary. Available online at https://crackstation.net/buy-crackstation-wordlist-password-cracking-dictionary.htm. Accessed October 12, 2018.
3. Bernstein, D. J., & Lange, T. (Eds.). (2017). eBACS: ECRYPT benchmarking of cryptographic systems. https://bench.cr.yp.to.
4. Dickson, L. E. (1919). History of the theory of numbers, vol. I: Divisibility and primality. Carnegie Institute of Washington, Publication No. 256. Reprinted by Chelsea, New York, 1971.
5. Digitaltrends. (2018). Here is how to stop SIM fraudsters from draining your bank account. Available online at https://www.digitaltrends.com/mobile/sim-swap-fraud-explained/. Accessed October 12, 2018.
6. Google. (2017). Google Security Blog: Announcing the first SHA-1 collision. Available online at https://security.googleblog.com/2017/02/announcing-first-sha1-collision. Accessed October 12, 2018.
7. Holtmanns, S., & Oliver, I. (2017). SMS and one-time-password interception in LTE networks. In 2017 IEEE International Conference on Communications (ICC) (pp. 1–6). https://doi.org/10.1109/ICC.2017.7997246.
8. Kim, D., & Ryou, J. (2014). SecureSMS: Prevention of SMS interception on Android platform (pp. 1–6). https://doi.org/10.1145/2557977.2557979.
9. Klein, D. (1992). Foiling the cracker: A survey of, and improvements to, password security. Programming and Computer Software, 17, 1–10.
10. Kleinjung, T. (2006). On polynomial selection for the general number field sieve. Mathematics of Computation, 75(256), 2037–2047.
11. MagicSIM. (2018). MagicSIM. Available online at http://download.cnet.com/MagicSIM/3000-2094_4-10601728.html. Accessed October 12, 2018.
12. Mobiledit. (2018). Mobiledit. Available online at http://www.mobiledit.com/sim-cloning/. Accessed October 12, 2018.
13. M'Raihi, D., Bellare, M., Hoornaert, F., Naccache, D., & Ranen, O. (2005). RFC 4226—HOTP: An HMAC-based one-time password algorithm. IETF RFC 4226.
14. M'Raihi, D., Machani, S., Pei, M., & Rydell, J. (2011). TOTP: Time-based one-time password algorithm. IETF RFC 6238.
15. Mulliner, C., Borgaonkar, R., et al. (2013). SMS-based one-time passwords: Attacks and defense. In 10th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment.
16. Nohl, K. (2014). Mobile self-defense. In 2014 Chaos Computer Security Conference (pp. 1–6).
17. Sedicii. (2017). Preventing mobile phone SIM swap fraud. Available online at https://www.sedicii.com/. Accessed October 12, 2018.
18. Shaker, S. H. (2014). HMAC modification using new random key generator. Iraqi Journal of Computers, Communication and Control and Systems Engineering, 14(1), 72–82.
19. Positive Technologies. (2014). SS7 security report. Available online at https://www.ptsecurity.com/ww-en/analytics/ss7-vulnerabilities/. Accessed October 12, 2018.

An Alternative Development for RCANE Platform

Toan Van Nguyen and Geunwoong Ryu

Abstract The RCANE platform was proposed in our previous work as a semi-centralized network of parallel blockchain and Authorized Proof of Stakes (APoS). In that paper, the RCANE platform was presented to solve the challenges of previous platforms and to comply with our preconceived perspectives of the current political, social, and economic systems, building up an ecosystem for the economy, society, and politics. In addition, the RCANE platform aims to pioneer the 'content market', where the cryptographic currency has an intrinsic value and can cover the blind spots of existing currencies, create new profitability by organizing a system that gives value to contents without monetary value, and establish a cryptocurrency ecosystem that creates a sustainable business model based on social and economic value to lead the future huge content market. However, the network security and the performance of the RCANE platform are still open issues that must be improved. In this paper, an alternative development of the security and performance of the RCANE platform is presented to show its advantages over previous platforms more clearly. The performance of the RCANE platform in terms of transaction confirmation time, the number of transactions per second, and latency is emphasized.

Keywords Semi-centralized network · Parallel blockchain · APoS · Adaptable platform

1 Introduction

Blockchain technology has played an important role in overcoming challenges in numerous areas. In particular, it has brought efficient solutions for transactional systems, since third parties and central instances are removed.


Unfortunately, decentralization has several challenges that cannot be accommodated by our current political, economic, and social systems. These problems were analyzed in detail in [1]. The fundamental technical problem of current blockchain algorithms is that their 'performance inefficiency' compared with central server systems has not been solved. In addition, although each instance of the client software maintains a copy of the blockchain and updates it based on the network consensus, the client software itself does not construct consensus for making decisions about the future direction of the cryptocurrency, such as whether individuals or firms should be rewarded for taking actions that support the cryptocurrency [2–6]. Furthermore, users' complete anonymity and the absence of a responsible person are also open problems of decentralization [3, 4]. To date, a great number of blockchain projects have been launched. However, almost all current blockchain projects concentrate on finance, using existing platforms to generate tokens without developing the technological features [6–21], while the intrinsic values should be the technological cores. To this end, a whole new platform dubbed RCANE was proposed in [1] to solve the above challenges of current blockchains. In that paper, the structure of the RCANE platform was introduced as a semi-centralized network that combines the advantages of both centralized and decentralized networks. By dint of the semi-centralized network, the RCANE platform adapts to our preconceived perspectives of the current political, social, and economic systems to build up an ecosystem for the economy, society, and politics. The components of the RCANE platform, the parallel blockchain and APoS, were also presented to describe its working mechanism. Moreover, an important feature of RCANE is to reward coins to individuals who have promoted the cryptocurrency by adopting the currency on the Internet, providing liquidity, or coding. In addition, the RCANE Project aims to pioneer the 'content market', where the cryptographic currency has an intrinsic value and can cover the blind spots of existing currencies, create new profitability by organizing a system that gives value to contents without monetary value, and establish a cryptocurrency ecosystem that creates a sustainable business model based on social and economic value to lead the future huge content market. However, the network security and the performance of the RCANE network are still issues that must be developed further to show its advantages over previous platforms, as well as its executability in reality. As a continuation of our previous work in [1], this paper presents an alternative development of the network security and performance of the RCANE platform. The transaction confirmation time, the number of transactions per second (TpS), and the latency are emphasized to verify the RCANE performance. The transaction process is also adjusted to eliminate some meaningless steps, and the algorithms and the transaction sequence are presented more clearly. A brief review of previous platforms is also conducted to compare them with the RCANE network and to demonstrate the necessity of the semi-centralized RCANE for our current economy, society, and politics. The rest of this paper is organized as follows. In Sect. 2, existing platforms such as Ethereum, EOS, and Ripple are introduced.
In the next section, the RCANE network is presented, including the structure of the proposed parallel blockchain, the semi-centralized network of parallel blockchain, and APoS, followed by the network security and the transaction sequence of the RCANE network.


In Sect. 4, an experimental set-up is conducted and the obtained results are shown. The last section presents the conclusions of the research work.

2 Existing Platforms

2.1 Ethereum

Ethereum was launched in 2015 and is often referred to as Blockchain 2.0; it works in many ways similarly to the Bitcoin blockchain. Unlike Bitcoin, however, Ethereum blocks contain a copy of both the transaction list and the most recent state. The project is a great milestone in blockchain technology and improved the mechanism of the 'Smart Contract', which has opened up unforeseen possibilities for Decentralized Applications (DAPPs) [22]. Smart Contracts are mostly applied to financial derivatives, where the main challenge is that a reference to an external price ticker is required. An Ethereum transaction takes about 17 s [23], and roughly 15 transactions per second is the maximum that Ethereum can process. Besides, ETH takes several or more seconds to propagate the blocks themselves. The GAS value is another feature of Ethereum; it acts like a bridge used to jump to the front of the line: you can pay miners more to do your work first, and if you set your price to 0, you will be stuck forever. To date, many issues still require discussion among the Ethereum developers, especially the Proof-of-Stake consensus [24, 25].

2.2 EOS

EOS utilizes a decentralized consensus algorithm called Delegated Proof of Stake (DPoS), which is proven capable of meeting the performance requirements of applications on the blockchain [16]. By dint of the DPoS consensus algorithm, an EOS transaction is much faster than an Ethereum one, taking on average 1.5 s. Theoretically, EOS may process approximately 13,190 transactions per second. As mentioned in [16], those who hold tokens on a blockchain adopting the EOS software may select block producers through a continuous approval voting system. Anyone may choose to participate in block production and will be given an opportunity to produce blocks, provided they can persuade token holders to vote for them. Typically, a transaction can be considered confirmed on average 0.25 s from the time of broadcast if the DPoS blockchain has one hundred percent block producer participation. In addition, EOS adds asynchronous Byzantine Fault Tolerance for faster achievement of irreversibility, which can bring transaction settlement and propagation times under a second.


2.3 Ripple

Ripple is a decentralized ledger that uses a completely different protocol to manage consensus; there are no 'blocks', so Ripple is not a blockchain. The peer-to-peer Ripple Ledger network consists of many distributed servers, called nodes, that accept and process transactions. Client applications sign and send transactions to nodes, which relay these candidate transactions throughout the network for processing. The nodes that receive, relay, and process transactions may be either tracking nodes or validating nodes [26]. A Ripple transaction confirmation takes about 3.5 s, and the network can process about 1,500 transactions per second. The Ripple consensus process takes roughly 3–5 s to complete. This might seem like an eternity for traders accustomed to measuring transaction latency in milliseconds or microseconds; however, once the speed advantage that others might use to engage in predatory trading strategies is removed, much of the latency aversion that people harbor disappears.

3 RCANE Network

3.1 Background of the RCANE Network

In [1], a whole new platform dubbed RCANE was introduced as a semi-centralized network, as shown in Fig. 1, consisting of stations and nodes. This network takes the advantages of both decentralization and centralization to build an efficient network for blockchains, combining the fast processing speed of centralization with the distributed system of decentralization.

Fig. 1 Semi-centralized network, including stations and normal nodes


Fig. 2 The structure of the parallel Blockchain, including node blocks and history blocks

The working mechanism of the RCANE network and the structure of the parallel blockchain were presented in detail in [1]. The parallel blockchain consists of node blocks and history blocks, as illustrated in Fig. 2. A node block contains the information of a permitted user and the public key, while a history block contains the information of confirmed transactions. Only the information of the desired node needs to be read in the parallel blockchain; by doing this, the efficiency of the blockchain is improved considerably, since the load of information is reduced during the transaction process. In the parallel blockchain, a 'real name' should be recorded in the node block so that only permitted users can access it. The vote is the unique way to establish conventions for making decisions about the future direction of RCANE, or to reward coins to individuals who have promoted the cryptocurrency. However, decisions, including transactions, are only possible with a Station node having power in the semi-centralized system. Stations have stakes, and their influence is based on the amount of their stake; this verification is named Authorized Proof of Stakes (APoS). Only stations can generate blocks and broadcast them. The super broadcasts are exchanged continuously among stations, so the blockchain stays synchronized. Unlike previous platforms, in the RCANE network the blockchain is managed by Stations, and therefore a personal node cannot issue blocks or broadcast to others, so hash connections are not essential.
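As a rough illustration of this layout (field names are assumptions; the actual block format is defined by the RCANE Core implementation), each node block can carry the permitted user's real name and public key together with its own list of history blocks, so reading one node's history does not require scanning a single global chain:

    import java.security.PublicKey;
    import java.util.ArrayList;
    import java.util.List;

    // Sketch of the parallel blockchain structure: each node block owns its own history blocks.
    class HistoryBlock {
        final String what;          // e.g. "transaction"
        final String details;       // confirmed transaction information
        HistoryBlock(String what, String details) { this.what = what; this.details = details; }
    }

    class NodeBlock {
        final String realName;                                 // permitted user, recorded by real name
        final PublicKey publicKey;                             // key used to verify the user's packets
        final List<HistoryBlock> history = new ArrayList<>();  // this node's confirmed transactions
        NodeBlock(String realName, PublicKey publicKey) {
            this.realName = realName;
            this.publicKey = publicKey;
        }
    }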


Fig. 3 The procedure of voting for transactions

As presented in [1], when a transaction needs to be confirmed, the vote takes place as in Fig. 3, in which node "Toan" wants to send an amount of RCANE coin to node "Nung". To start, "Toan" sends a request message to a Station; the Station then issues a new history block containing the transaction information (what = transaction) and broadcasts it. The transaction history is thereby newly issued, and the other stations should vote for this history block to make it available. If the total vote (TV) is equal to or greater than the required vote (RV), the history block becomes available. The vote weight of each Station ('How' in Fig. 3) depends on its stake. If n Stations vote for the history block, TV is calculated as:

    TV = How_1 + · · · + How_n        (1)

The RV of a history block is calculated as:

    RV = k · S        (2)

where:
    RV  required vote
    k   a multiplier in [0, 1], decided by voting
    S   the sum of stakes.


If stations have voted for transactions, they are responsible for those transactions in the case of hacked transactions. In the case of hacking, the stations can cancel the hacking transaction by voting, and if someone spoils the network, the node may be deactivated; in other words, the wallet is locked to prevent hacking. The cancelled vote weight is calculated by:

    CV_i = (1 − RV/S) · OS_i        (3)

where:
    CV_i  cancelled vote weight of the i-th Station
    OS_i  own stake of the i-th Station.

If k Stations vote for cancellation, the total cancelled vote weight is:

    TC = CV_1 + · · · + CV_k        (4)

If TV − TC < RV, the history block is cancelled; in other words, the history block becomes unavailable. By virtue of (4), only stations that hold a sufficient amount of voted stake can derive enough weight to invalidate a block through cancellation requests. This prevents both the dictatorship of stations that hold a great deal of stake and the malicious cancellation of transactions.
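A compact sketch of this vote arithmetic is given below (class and method names are illustrative assumptions; in particular, each station's vote weight is assumed to equal its own stake, which matches the experiments in Sect. 4 where all stations hold the same stake):

    import java.util.List;

    // Illustrative sketch of the APoS vote arithmetic in Eqs. (1)-(4).
    class Station {
        final String name;
        final double ownStake;
        Station(String name, double ownStake) { this.name = name; this.ownStake = ownStake; }
    }

    class ApoSVoting {
        // RV = k * S, where k in [0, 1] is decided by voting and S is the sum of stakes.
        static double requiredVote(double k, double totalStake) {
            return k * totalStake;
        }

        // TV = How_1 + ... + How_n; here each station's vote weight is assumed to be its own stake.
        static double totalVote(List<Station> voters) {
            double tv = 0;
            for (Station s : voters) tv += s.ownStake;
            return tv;
        }

        // TC = CV_1 + ... + CV_k, with CV_i = (1 - RV/S) * OS_i.
        static double totalCancelledVote(List<Station> cancellers, double rv, double totalStake) {
            double tc = 0;
            for (Station s : cancellers) tc += (1.0 - rv / totalStake) * s.ownStake;
            return tc;
        }

        // The history block stays available while TV - TC >= RV; it is cancelled otherwise.
        static boolean isAvailable(double tv, double tc, double rv) {
            return tv - tc >= rv;
        }
    }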

3.2 Transaction Sequence

The transaction sequence in the RCANE network includes five main steps.

Step 1: Query. A query is a request that a node sends to a station. The query packet is encrypted with a hash, a session key, and a private key, as shown in Fig. 4, so the packet cannot be read or created by anyone else. After a station has received the query packet, it is decrypted by reversing the encryption process, as shown in Fig. 5. The station ignores a packet if it has the same value as a previous one, or if the time recorded in the packet is too far in the past or in the future.

Step 2: Transaction request. A node sends this query to the station when it sends a coin or token to another node. The station checks whether the amount of coins or tokens currently held by the node is valid and whether it is a negative value. If the content of the request is correct, a history block recording the transaction details is created for the two corresponding nodes and broadcast. The station sends the unexpected train packet for verification to the other stations, as shown in Fig. 6.


Fig. 4 Encryption of a query packet

Fig. 5 Decryption of a query packet


Fig. 6 The method of request for a transaction

Step 3: Acknowledgement request. The node informs the other stations when a transaction request is made, as shown in Fig. 7. Each station stores an encrypted acknowledgement packet in its cache. As shown in Fig. 6, when the station broadcasts the transaction block that requests verification via an unexpected train packet, the other stations compare it with the acknowledgement request packet stored in their caches. They check whether the expected amount of coins or tokens remains valid after the blocks currently held by the transacting node and currently waiting for acknowledgement are applied. If the content of the request is correct, a voting block is created for the previously created transaction block and announced by broadcast.

Step 4: Broadcast and receive. A broadcast propagates messages from stations to all nodes. Before sending a broadcast, stations encrypt the packets as shown in Fig. 8. The nodes then receive the packets and decrypt them using the station's address repeatedly and randomly, as shown in Fig. 9. After the transaction blocks and transaction voting blocks have been broadcast, stations insert them into the pending history block manager.

Step 5: Pending history block managing. The pending history block manager synchronizes the history blocks received from broadcasts. A history block is inserted into the blockchain after 30 s. The number of pending history block queues is equal to the number of node blocks; therefore, in the RCANE network, even if block creation requests are excessive, bottlenecks rarely occur. By dint of the pending history queues, the scalability problem is solved.
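A minimal sketch of such a per-node pending queue is given below (class and method names such as HistoryBlock and Blockchain.append are assumptions for illustration; the real manager also has to honour cancellation votes before committing a block):

    import java.util.Map;
    import java.util.Queue;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Hypothetical sketch of a pending history block manager: one queue per node block,
    // with each broadcast history block committed to the blockchain 30 s after it arrives.
    class PendingHistoryBlockManager {
        interface Blockchain { void append(String nodeId, Object historyBlock); }

        private final Map<String, Queue<Object>> pendingPerNode = new ConcurrentHashMap<>();
        private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        private final Blockchain chain;

        PendingHistoryBlockManager(Blockchain chain) { this.chain = chain; }

        // Called for every history block received from a broadcast.
        void onBroadcast(String nodeId, Object historyBlock) {
            pendingPerNode.computeIfAbsent(nodeId, id -> new ConcurrentLinkedQueue<>()).add(historyBlock);
            scheduler.schedule(() -> {
                Queue<Object> q = pendingPerNode.get(nodeId);
                if (q != null && q.remove(historyBlock)) {
                    // Still pending after 30 s (i.e. not cancelled by a vote), so commit it.
                    chain.append(nodeId, historyBlock);
                }
            }, 30, TimeUnit.SECONDS);
        }
    }

Because every node block has its own queue, a burst of block creation requests for one node does not delay the queues of the other nodes.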


Fig. 7 The method of approval for a transaction

Fig. 8 Encryption of a broadcast packet

Fig. 9 Decryption of a broadcast packet


3.3 Network Security

Packets in the RCANE network are protected by three algorithms.

SHA-256 hash: The SHA-256 hash algorithm produces irreversible data fragments. These fragments can be used to verify packet tampering.

AES symmetric key encryption: The AES symmetric key algorithm produces reversible data fragments. Because symmetric keys are vulnerable to malicious hacking, AES is not used on its own for an important packet.

Asymmetric key encryption: Encryption uses the public and private key pairs recorded in the wallet file. Since there is no process for transmitting the public key, a node cannot join the network if its public key is not recorded in a node block. Packets created with this encryption method cannot be tampered with or falsified unless the wallet file and its password are leaked.

The transaction sequence scheme above contributes to security through the query packet and the broadcast packet. For security purposes, the train packet, the signature, and the wallet file also need to be explained.

Train packet: Train packets are encrypted and decrypted in the same way as broadcasts. A station ignores a packet if it has the same value as a previous one, or if the time recorded in the packet is too far in the past or in the future.

Signature: A signature means encrypting certain text with a private key. Some history blocks require a signature to become active, and therefore such blocks require a signature that cannot be freely issued by a station or forged.

Wallet file: Wallet files are stored with the rwl extension and are encrypted with a symmetric key. A username and password must be entered to open a wallet, which can join the network if its public key is recorded as a node block in the blockchain of the RCANE network. Wallet files should never be uploaded online or shared with others.
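Purely as a hypothetical illustration of how these three primitives could be combined on one packet (the real RCANE packet layout is defined by Figs. 4, 5, 8 and 9; the field order, the RSA signature scheme, and the key handling below are assumptions, not the platform's actual format):

    import java.security.KeyPair;
    import java.security.MessageDigest;
    import java.security.SecureRandom;
    import java.security.Signature;
    import javax.crypto.Cipher;
    import javax.crypto.spec.IvParameterSpec;
    import javax.crypto.spec.SecretKeySpec;

    // Sketch: SHA-256 digest for tamper detection, a signature with the wallet's private key,
    // and AES/CBC encryption of the payload under a session key.
    class PacketCrypto {
        static byte[] protect(byte[] payload, byte[] sessionKey, KeyPair walletKeys) throws Exception {
            // 1. SHA-256 hash: irreversible fragment used to verify packet tampering.
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(payload);

            // 2. Asymmetric signature: only the holder of the wallet's private key can produce it.
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(walletKeys.getPrivate());
            signer.update(payload);
            byte[] signature = signer.sign();

            // 3. AES symmetric encryption of payload || digest || signature under the session key.
            byte[] iv = new byte[16];
            new SecureRandom().nextBytes(iv);
            Cipher aes = Cipher.getInstance("AES/CBC/PKCS5Padding");
            aes.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(sessionKey, "AES"), new IvParameterSpec(iv));
            byte[] body = new byte[payload.length + digest.length + signature.length];
            System.arraycopy(payload, 0, body, 0, payload.length);
            System.arraycopy(digest, 0, body, payload.length, digest.length);
            System.arraycopy(signature, 0, body, payload.length + digest.length, signature.length);
            byte[] ciphertext = aes.doFinal(body);

            // Prepend the IV so the receiver can decrypt; the receiver then reverses steps 3 to 1.
            byte[] packet = new byte[iv.length + ciphertext.length];
            System.arraycopy(iv, 0, packet, 0, iv.length);
            System.arraycopy(ciphertext, 0, packet, iv.length, ciphertext.length);
            return packet;
        }
    }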

4 Evaluation

4.1 Setup

This section performs experiments to verify the developments of the RCANE Project. The experiments use a console version of RCANE Core, running both on AWS (Amazon Web Services, US) and on a server computer provided by DevStack, South Korea. The DevStack configuration is: no GPU, 16 vCPUs, 32 GB RAM, and Ubuntu 14 as the operating system. The purpose of using an ordinary server computer is to show that the RCANE network can work well even on modestly configured servers. Ten transactions each are conducted in three different scenarios of one station, three stations, and five stations, in which the performance of the RCANE network is verified in terms of the transaction confirmation time, the number of transactions per second, and the latency of setting up the transaction.


A transaction request is sent whenever the previous one has completely finished. All stations have the same stake, so their voting weights are equal. A transaction is confirmed when more than half of the acknowledgement votes have been broadcast. Blocks are added to the blockchain database once 30 s have passed after block creation.

4.2 Results

The results show that the performance of the RCANE network has increased dramatically compared with our previous work. It is clear that the performance of the RCANE network would be even better if high-configuration computers were used. The occupied capacity and block cancellation were presented in detail in our previous work and are almost the same in this paper, so this section emphasizes transaction confirmation time, TpS, and latency. The TpS of the RCANE network is shown in Fig. 10, where the TpS increases as the number of stations joining the RCANE network increases. In theory, there is no limit on the TpS, because it keeps increasing as new stations join the network; this is a great advantage of the RCANE network. The transaction confirmation times in the three scenarios are shown in Tables 1, 2, and 3, where No. is the transaction number and Time (s) is the confirmation time. These results show that the transaction confirmation time is reduced considerably compared with our previous work. They also show that the transaction speed is improved meaningfully compared with conventional blockchains, since little time is spent on verification. RCANE has an independent pending queue for each node to synchronize the broadcast blocks. Therefore, it does not need to pack many transactions into one block like conventional blockchains, and can instead attach each history block to its corresponding node block immediately. This is only possible in the semi-centralized network.

Fig. 10 The relationship between TpS and the number of stations in RCANE network

Table 1 Transaction confirmation time in the case of one station

No.       1      2      3      4      5      6      7      8      9      10
Time (s)  0.915  3.506  3.496  3.759  3.981  1.021  0.823  2.516  2.818  3.214

Table 2 Transaction confirmation time in the case of three stations

No.       1      2      3      4      5      6      7      8      9      10
Time (s)  4.638  6.546  6.009  7.054  6.35   6.587  7.483  6.391  7.2    8.051

Table 3 Transaction confirmation time in the case of five stations

No.       1      2      3      4      5      6      7      8      9      10
Time (s)  7.28   4.552  7.516  9.501  7.037  9.243  9.232  12.542 8.335  11.666

Fig. 11 The relationship between transaction confirmation time and the number of stations in RCANE network

Verification is not required at all, because the stations are already verified through the credible APoS voting process. If the latency of the underlying Internet connection is ignored, the latency of the RCANE network may approach milliseconds or microseconds; in other words, transaction requests are handled almost immediately. The average transaction confirmation time is shown in Fig. 11, where the confirmation time increases as the number of joined stations increases. The values are 2.8749 s for one station, 6.6309 s for three stations, and 8.6 s for five stations, whereas in our previous work they were 4.257 s for one station and 34.339 s for five stations. The gradient of the transaction confirmation time grows as the number of joined stations increases; this is a currently existing problem that must be solved in the next research.


The method proposed for future research to solve the above problem of transaction confirmation time and to increase the TpS is first presented below:
– Only two random stations are asked to authenticate and vote for a transaction block (the higher the computing power, the higher the probability of selection).
– The node that requested the transaction has to vote for itself.
– These three entities can each cast the same 10 votes.
– The required vote is 21/30, so all three entities must vote for the block to activate the transaction block, and two entities must vote against it to deactivate the transaction block.

5 Conclusion

This paper presented alternative developments of our previous work in terms of the network security and the performance of the RCANE platform. The background of the RCANE platform was briefly introduced, and the security algorithms and the transaction sequence of the RCANE blockchain were presented in detail. Another important advantage of the RCANE network is that the scalability problem can be solved by using the pending history block manager. It was clearly shown that the RCANE mechanism enables fast transaction rates that were not possible in decentralized networks, and a new way to invalidate illegally generated blocks was proposed through a reasonable block cancellation weight formula. This alternative development of the RCANE platform helps prove its practicability for creating the 'content market' by adapting to our preconceived perspectives of the current political, social, and economic systems. The performance of the RCANE network would be much better still if high-powered computers were used.

Acknowledgements This research was supported by the RCANE Project, supervised by the RCANE LAB.

References

1. Van Toan, N., Park, U., & Ryu, G. (2018). RCANE: Semi-centralized network of parallel Blockchain and APoS. In The 24th International Conference on Parallel and Distributed Systems (IEEE ICPADS 2018), Singapore, December 11–13, 2018.
2. Abramowicz, M. (2016). Autonocoin: A proof-of-belief cryptocurrency. LEDGER, 1, 119–133.
3. Sward, A., Vecna, I., & Stonedahl, F. (2018). Data insertion in Bitcoin's Blockchain. LEDGER, 3, 1–23.
4. Muftic, S. (2016). BIX certificates: Cryptographic tokens for anonymous transactions based on certificates public Ledger. LEDGER, 1, 19–37.
5. Muftic, S., bin Abbdullah, N., & Kounelis, I. (2016). Business information exchange system with security, privacy, and anonymity. Journal of Electrical and Computer Engineering, 1–10. https://doi.org/10.1155/2016/7093642.
6. Origami Network Team. (2018). A protocol for building decentralized marketplaces using the Ethereum Blockchain. White Paper. http://ori.network, February 26, 2018.
7. WaBi Team. (2018). WaBi—Crypto token for safe consumer products. White Paper. www.wacoin.io.
8. Buterin, V. (2014). Secret sharing and erasure coding: A guide for the aspiring dropbox decentralizer. Ethereum Blog. https://blog.ethereum.org/2014/08/16/secretsharing-erasure-coding-guide-aspiring-dropbox-decentralizer/, August 16, 2014.
9. Lehner, E., Hunzeker, D., & Ziegler, J. R. (2017). Funding science with science: Cryptocurrency and independent academic research funding. LEDGER, 2, 65–76.
10. Kraft, D. (2016). Game channels for trustless off-chain interactions in decentralized virtual worlds. LEDGER, 1, 84–98.
11. Piasecki, P. J. (2016). Gaming self-contained provably fair smart contract casinos. LEDGER, 1, 99–110.
12. Biryukov, A., & Khovratovich, D. (2017). Equihash: Asymmetric proof-of-work based on the generalized birthday problem. LEDGER, 2, 1–30.
13. Rizun, P. R. (2016). Subchains: A technique to scale Bitcoin and improve the user experience. LEDGER, 1, 38–52.
14. ICON Foundation. (2017). ICON hyperconnect the world. White Paper. www.icon.foundation, August 15, 2017.
15. Bitdegree Project. (2017). Revolutionizing global education with Blockchain. White Paper. www.bitdgree.org, October 9, 2017.
16. Grigg, I. (2017). EOS—An introduction. www.eos.io, July 5, 2017.
17. SwissBorg Project. (2017). The Blockchain era of Swiss wealth management. Technical White Paper. www.swissborg.com, November 16, 2017.
18. Open Source University Project. (2018). The world's academic & career development Ledger. White Paper. www.os.university, April 16, 2018.
19. The Gamy Tech Team. (2018). Game protocol: A decentralized economy for great games. White Paper. www.gameprotocol.io.
20. Yantis, J., Quigley, W., & CasSelle, M. (2018). Global decentralized marketplace for video game virtual assets. Worldwide Asset eXchange (WAX) Project, March 29, 2018.
21. The ZilliQa Team. (2017). The Zilliqa technical whitepaper. www.zilliqa.com, August 10, 2017.
22. Buterin, V. (2016). A next-generation smart contract and decentralized application platform. White Paper.
23. Buterin, V. (2015). On slow and fast block times. Ethereum Blog. https://blog.ethereum.org/2015/09/14/on-slow-and-fast-block-times, September 13, 2015.
24. Scalability. (2015). Bitcoin Wiki. https://en.bitcoin.it/wiki/Scalability, December 13, 2015.
25. Ethereum Frontier. (2015). https://www.ethereum.org/. Accessed November 30, 2015.
26. Schwartz, D., Youngs, N., & Britto, A. (2014). The ripple protocol consensus algorithm. White Paper. Ripple Labs Inc.

Author Index

C
Coplan, Max, 131

D
Deng, Lin, 43, 131
Doctolero, Sam, 91

G
Gao, Weichao, 23

L
Lee, Jun-Ho, 147
Liang, Fan, 23
Liang, Hengshuo, 23
Li, Yihao, 43
Lu, Chao, 1, 23
Lynn, Khin Thidar, 55

M
Macnab, C.J.B., 91
Merino, Tim, 131
Min, Myat Myat, 117

N
Nguyen, James, 23
Nguyen, Toan Van, 205
Nwe, Mar Mar, 55

O
Oo, Cherry, 75
Oo, Hnin Min, 75
Oo, Lwin Lwin, 175
Oo, Mie Mie, 175
Orpilla, Mont, 23

P
Patton, Jon, 131
Pyun, Hae-Soo, 161

R
Ryu, Geunwoong, 205

S
Soe, Thin Thin, 117
Steele, Mark, 131
Stillwell, Matt, 131
Stoyanov, Alexander, 131

V
Visalli, Nicholas, 43

W
Wei, Bingyang, 43
Wu, Yalong, 1

Y
Yu, Wei, 1, 23

Z
Zhang, Jin, 1
Zhang, Jin-Hua, 147
Zhao, Shushan, 189
