Intelligent Computing: Proceedings of the 2020 Computing Conference, Volume 2 [1st ed.] 9783030522452, 9783030522469

This book focuses on the core areas of computing and their applications in the real world. Presenting papers from the Co


English Pages XI, 717 [728] Year 2020


Table of contents:
Front Matter ....Pages i-xi
Urban Mobility Swarms: A Scalable Implementation (Alex Berke, Jason Nawyn, Thomas Sanchez Lengeling, Kent Larson)....Pages 1-18
Using AI Simulations to Dynamically Model Multi-agent Multi-team Energy Systems (D. Michael Franklin, Philip Irminger, Heather Buckberry, Mahabir Bhandari)....Pages 19-32
Prediction of Cumulative Grade Point Average: A Case Study (Anan Sarah, Mohammed Iqbal Hossain Rabbi, Mahpara Sayema Siddiqua, Shipra Banik, Mahady Hasan)....Pages 33-42
Warehouse Setup Problem in Logistics: A Truck Transportation Cost Model (Rohit Kumar Sachan, Dharmender Singh Kushwaha)....Pages 43-62
WARDS: Modelling the Worth of Vision in MOBA’s (Alan Pedrassoli Chitayat, Athanasios Kokkinakis, Sagarika Patra, Simon Demediuk, Justus Robertson, Oluseji Olarewaju et al.)....Pages 63-81
Decomposition Based Multi-objectives Evolutionary Algorithms Challenges and Circumvention (Sherin M. Omran, Wessam H. El-Behaidy, Aliaa A. A. Youssif)....Pages 82-93
Learning the Satisfiability of Ł-clausal Forms (Mohamed El Halaby, Areeg Abdalla)....Pages 94-102
A Teaching-Learning-Based Optimization with Modified Learning Phases for Continuous Optimization (Onn Ting Chong, Wei Hong Lim, Nor Ashidi Mat Isa, Koon Meng Ang, Sew Sun Tiang, Chun Kit Ang)....Pages 103-124
Use of Artificial Intelligence and Machine Learning for Personalization Improvement in Developed e-Material Formatting Application (Kristine Mackare, Anita Jansone, Raivo Mackars)....Pages 125-132
Probabilistic Inference Using Generators: The Statues Algorithm (Pierre Denis)....Pages 133-154
A Q-Learning Based Maximum Power Point Tracking for PV Array Under Partial Shading Condition (Roy Chaoming Hsu, Wen-Yen Chen, Yu-Pi Lin)....Pages 155-168
A Logic-Based Agent Modelling Paradigm for Investment in Derivatives Markets (Jonathan Waller, Tarun Goel)....Pages 169-180
An Adaptive Genetic Algorithm Approach for Optimizing Feature Weights in Multimodal Clustering (Manar Hosny, Sawsan Al-Malak)....Pages 181-197
Extending CNN Classification Capabilities Using a Novel Feature to Image Transformation (FIT) Algorithm (Ammar S. Salman, Odai S. Salman, Garrett E. Katz)....Pages 198-213
MESRS: Models Ensemble Speech Recognition System (Ben Zagagy, Maya Herman)....Pages 214-231
DeepConAD: Deep and Confidence Prediction for Unsupervised Anomaly Detection in Time Series (Ahmad Idris Tambuwal, Aliyu Muhammad Bello)....Pages 232-244
Reduced Order Modeling Assisted by Convolutional Neural Network for Thermal Problems with Nonparametrized Geometrical Variability (Fabien Casenave, Nissrine Akkari, David Ryckelynck)....Pages 245-263
Deep Convolutional Generative Adversarial Networks Applied to 2D Incompressible and Unsteady Fluid Flows (Nissrine Akkari, Fabien Casenave, Marc-Eric Perrin, David Ryckelynck)....Pages 264-276
Improving Gate Decision Making Rationality with Machine Learning (Mark van der Pas, Niels van der Pas)....Pages 277-290
End-to-End Memory Networks: A Survey (Raheleh Jafari, Sina Razvarz, Alexander Gegov)....Pages 291-300
Enhancing Credit Card Fraud Detection Using Deep Neural Network (Souad Larabi Marie-Sainte, Mashael Bin Alamir, Deem Alsaleh, Ghazal Albakri, Jalila Zouhair)....Pages 301-313
Non-linear Aggregation of Filters to Improve Image Denoising (Benjamin Guedj, Juliette Rengot)....Pages 314-327
Comparative Study of Classifiers for Blurred Images (Ratiba Gueraichi, Amina Serir)....Pages 328-336
A Raspberry Pi-Based Identity Verification Through Face Recognition Using Constrained Images (Alvin Jason A. Virata, Enrique D. Festijo)....Pages 337-349
An Improved Omega-K SAR Imaging Algorithm Based on Sparse Signal Recovery (Shuang Wang, Huaping Xu, Jiawei Zhang, Boyu Wang)....Pages 350-357
A-Type Phased Array Ultrasonic Imaging Testing Method Based on FRI Sampling (Dai GuangZhi, Wen XiaoJun)....Pages 358-366
A Neural Markovian Multiresolution Image Labeling Algorithm (John Mashford, Brad Lane, Vic Ciesielski, Felix Lipkin)....Pages 367-379
Development of a Hardware-Software System for the Assembled Helicopter-Type UAV Prototype by Applying Optimal Classification and Pattern Recognition Methods (Askar Boranbayev, Seilkhan Boranbayev, Askar Nurbekov)....Pages 380-394
Skin Capacitive Imaging Analysis Using Deep Learning GoogLeNet (Xu Zhang, Wei Pan, Christos Bontozoglou, Elena Chirikhina, Daqing Chen, Perry Xiao)....Pages 395-404
IoT Based Cloud-Integrated Smart Parking with e-Payment Service (Ja Lin Yu, Kwan Hoong Ng, Yu Ling Liong, Effariza Hanafi)....Pages 405-414
Addressing Copycat Attacks in IPv6-Based Low Power and Lossy Networks (Abhishek Verma, Virender Ranga)....Pages 415-426
On the Analysis of Semantic Denial-of-Service Attacks Affecting Smart Living Devices (Joseph Bugeja, Andreas Jacobsson, Romina Spalazzese)....Pages 427-444
Energy Efficient Channel Coding Technique for Narrowband Internet of Things (Emmanuel Migabo, Karim Djouani, Anish Kurien)....Pages 445-466
An Internet of Things and Blockchain Based Smart Campus Architecture (Manal Alkhammash, Natalia Beloff, Martin White)....Pages 467-486
Towards a Scalable IOTA Tangle-Based Distributed Intelligence Approach for the Internet of Things (Tariq Alsboui, Yongrui Qin, Richard Hill, Hussain Al-Aqrabi)....Pages 487-501
An Architecture for Dynamic Contextual Personalization of Multimedia Narratives in IoT Environments (Ricardo R. M. do Carmo, Marco A. Casanova)....Pages 502-521
Emotional Effect of Multimodal Sense Interaction in a Virtual Reality Space Using Wearable Technology (Jiyoung Kang)....Pages 522-530
Genetic Algorithms as a Feature Selection Tool in Heart Failure Disease (Asmaa Alabed, Chandrasekhar Kambhampati, Neil Gordon)....Pages 531-543
Application of Additional Argument Method to Burgers Type Equation with Integral Term (Talaibek Imanaliev, Elena Burova)....Pages 544-553
Comparison of Dimensionality Reduction Methods for Road Surface Identification System (Gonzalo Safont, Addisson Salazar, Alberto Rodríguez, Luis Vergara)....Pages 554-563
A Machine Learning Platform in Healthcare with Actor Model Approach (Mauro Mazzei)....Pages 564-571
Boundary Detection of Point Clouds on the Images of Low-Resolution Cameras for the Autonomous Car Problem (Istvan Elek)....Pages 572-581
Identification and Classification of Botrytis Disease in Pomegranate with Machine Learning (M. G. Sánchez, Veronica Miramontes-Varo, J. Abel Chocoteco, V. Vidal)....Pages 582-598
Rethinking Our Assumptions About Language Model Evaluation (Nancy Fulda)....Pages 599-609
Women in ISIS Propaganda: A Natural Language Processing Analysis of Topics and Emotions in a Comparison with a Mainstream Religious Group (Mojtaba Heidarysafa, Kamran Kowsari, Tolu Odukoya, Philip Potter, Laura E. Barnes, Donald E. Brown)....Pages 610-624
Improvement of Automatic Extraction of Inventive Information with Patent Claims Structure Recognition (Daria Berduygina, Denis Cavallucci)....Pages 625-637
Translate Japanese into Formal Languages with an Enhanced Generalization Algorithm (Kazuaki Kashihara)....Pages 638-655
Authorship Identification for Arabic Texts Using Logistic Model Tree Classification (Safaa Hriez, Arafat Awajan)....Pages 656-666
The Method of Analysis of Data from Social Networks Using Rapidminer (Askar Boranbayev, Gabit Shuitenov, Seilkhan Boranbayev)....Pages 667-673
The Emergence, Advancement and Future of Textual Answer Triggering (Kingsley Nketia Acheampong, Wenhong Tian, Emmanuel Boateng Sifah, Kwame Obour-Agyekum Opuni-Boachie)....Pages 674-693
OCR Post Processing Using Support Vector Machines (Jorge Ramón Fonseca Cacho, Kazem Taghva)....Pages 694-713
Back Matter ....Pages 715-717


Advances in Intelligent Systems and Computing 1229

Kohei Arai · Supriya Kapoor · Rahul Bhatia
Editors

Intelligent Computing Proceedings of the 2020 Computing Conference, Volume 2

Advances in Intelligent Systems and Computing Volume 1229

Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Advisory Editors
Nikhil R. Pal, Indian Statistical Institute, Kolkata, India
Rafael Bello Perez, Faculty of Mathematics, Physics and Computing, Universidad Central de Las Villas, Santa Clara, Cuba
Emilio S. Corchado, University of Salamanca, Salamanca, Spain
Hani Hagras, School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK
László T. Kóczy, Department of Automation, Széchenyi István University, Gyor, Hungary
Vladik Kreinovich, Department of Computer Science, University of Texas at El Paso, El Paso, TX, USA
Chin-Teng Lin, Department of Electrical Engineering, National Chiao Tung University, Hsinchu, Taiwan
Jie Lu, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW, Australia
Patricia Melin, Graduate Program of Computer Science, Tijuana Institute of Technology, Tijuana, Mexico
Nadia Nedjah, Department of Electronics Engineering, University of Rio de Janeiro, Rio de Janeiro, Brazil
Ngoc Thanh Nguyen, Faculty of Computer Science and Management, Wrocław University of Technology, Wrocław, Poland
Jun Wang, Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong

The series “Advances in Intelligent Systems and Computing” contains publications on theory, applications, and design methods of Intelligent Systems and Intelligent Computing. Virtually all disciplines such as engineering, natural sciences, computer and information science, ICT, economics, business, e-commerce, environment, healthcare, life science are covered. The list of topics spans all the areas of modern intelligent systems and computing such as: computational intelligence, soft computing including neural networks, fuzzy systems, evolutionary computing and the fusion of these paradigms, social intelligence, ambient intelligence, computational neuroscience, artificial life, virtual worlds and society, cognitive science and systems, perception and vision, DNA and immune based systems, self-organizing and adaptive systems, e-learning and teaching, human-centered and human-centric computing, recommender systems, intelligent control, robotics and mechatronics including human-machine teaming, knowledge-based paradigms, learning paradigms, machine ethics, intelligent data analysis, knowledge management, intelligent agents, intelligent decision making and support, intelligent network security, trust management, interactive entertainment, Web intelligence and multimedia.

The publications within “Advances in Intelligent Systems and Computing” are primarily proceedings of important conferences, symposia and congresses. They cover significant recent developments in the field, both of a foundational and applicable character. An important characteristic feature of the series is the short publication time and world-wide distribution. This permits a rapid and broad dissemination of research results.

** Indexing: The books of this series are submitted to ISI Proceedings, EI-Compendex, DBLP, SCOPUS, Google Scholar and Springerlink **

More information about this series at http://www.springer.com/series/11156

Kohei Arai · Supriya Kapoor · Rahul Bhatia



Editors

Intelligent Computing Proceedings of the 2020 Computing Conference, Volume 2


Editors

Kohei Arai
Faculty of Science and Engineering
Saga University
Saga, Japan

Supriya Kapoor
The Science and Information (SAI) Organization
Bradford, West Yorkshire, UK

Rahul Bhatia
The Science and Information (SAI) Organization
Bradford, West Yorkshire, UK

ISSN 2194-5357  ISSN 2194-5365 (electronic)
Advances in Intelligent Systems and Computing
ISBN 978-3-030-52245-2  ISBN 978-3-030-52246-9 (eBook)
https://doi.org/10.1007/978-3-030-52246-9

© Springer Nature Switzerland AG 2020

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

Editor’s Preface

On behalf of the Committee, we welcome you to the Computing Conference 2020. The aim of this conference is to give a platform to researchers with fundamental contributions and to be a premier venue for industry practitioners to share and report on up-to-the-minute innovations and developments, to summarize the state of the art, and to exchange ideas and advances in all aspects of computer sciences and their applications.

For this edition of the conference, we received 514 submissions from more than 50 countries around the world. These submissions underwent a double-blind peer review process. Of those 514 submissions, 160 (including 15 posters) were selected for inclusion in these proceedings. The proceedings are divided into three volumes covering a wide range of conference tracks, such as technology trends, computing, intelligent systems, machine vision, security, communication, electronics and e-learning, to name a few.

In addition to the contributed papers, the conference program included inspiring keynote talks, streamed live during the conference, whose thought-provoking claims engaged the entire computing audience. The authors presented their research papers very professionally to a large international audience online. All of this digital content prompted significant contemplation and discussion among participants.

Our deep appreciation goes to the keynote speakers for sharing their knowledge and expertise with us, and to all the authors who have spent the time and effort to contribute significantly to this conference. We are also indebted to the Organizing Committee for their great efforts in ensuring the successful implementation of the conference. In particular, we would like to thank the Technical Committee for their constructive and enlightening reviews of the manuscripts within the limited timescale.

We hope that all the participants and interested readers benefit scientifically from this book and find it stimulating. We are pleased to present the proceedings of this conference as its published record.


We hope to see you in 2021, at our next Computing Conference, with the same amplitude, focus and determination.

Kohei Arai


Urban Mobility Swarms: A Scalable Implementation Alex Berke(B) , Jason Nawyn, Thomas Sanchez Lengeling, and Kent Larson MIT Media Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA {aberke,nawyn,thomassl,kll}@media.mit.edu https://www.media.mit.edu/groups/city-science/overview/

Abstract. We present a system to coordinate “urban mobility swarms” in order to promote the use and safety of lightweight, sustainable transit, while enhancing the vibrancy and community fabric of cities. This work draws from behavior exhibited by swarms of nocturnal insects, such as crickets and fireflies, whereby synchrony unifies individuals in a decentralized network. Coordination naturally emerges in these cases and provides a compelling demonstration of “strength in numbers”. Our work is applied to coordinating lightweight vehicles, such as bicycles, which are automatically inducted into ad-hoc “swarms”, united by the synchronous pulsation of light. We model individual riders as nodes in a decentralized network and synchronize their behavior via a peer-to-peer message protocol and algorithm, which preserves individual privacy. Nodes broadcast over radio with a transmission range tuned to localize swarm membership. Nodes then join or disconnect from others based on proximity, accommodating the dynamically changing topology of urban mobility networks. This paper provides a technical description of our system, including the protocol and algorithm to coordinate the swarming behavior that emerges from it. We also demonstrate its implementation in code, circuitry, and hardware, with a system prototype tested on a city bike-share. In doing so, we evince the scalability of our system. Our prototype uses low-cost components, and bike-share programs, which manage bicycle fleets distributed across cities, could deploy the system at city-scale. Our flexible, decentralized design allows additional bikes to then connect with the network, enhancing its scale and impact.

Keywords: Cities · Mobility · Swarm behavior · Decentralization · Distributed network · Peer-to-peer protocol · Synchronization · Algorithms · Privacy

1 Introduction

Cities comprise a variety of mobility networks, from streets and bicycle lanes, to rail and highways. Increasing the use of the lightweight transit options that navigate these networks, such as bicycles and scooters, can increase the sustainability of cities and public health [1–3]. However, infrastructure to promote and protect lightweight transit, such as bicycle lanes, is limited, and riders are vulnerable on streets designed to prioritize the efficient movement of heavier vehicles, such as cars and trucks.

In this paper we present our design and implementation of a system that synchronizes the lights of nearby bicycles, automatically inducting riders into unified groups (swarms), to increase their presence and collective safety. Ad-hoc swarms emerge from our system, in a distributed network that is superimposed on the physical infrastructure of existing mobility networks. We designed and tested our system with bicycles, but our work can be extended to unify swarms of the other lightweight and sustainable transit alternatives present in cities.

As bicycles navigate dark city streets, they are often equipped with lights. The lights make their presence known to cars or other bikers, and make the hazards of traffic less dangerous. As solitary bikes equipped with our system come together, their lights begin to softly pulsate, at the same cadence. The cyclists may not know each other, or may only pass each other briefly, but for the moments they are together, their lights synchronize. The effect is a visually united presence, as swarms of bikes illuminate themselves with a gently breathing, collective light source. As swarms grow, their visual effect and ability to attract more cyclists are enhanced. The swarming behavior that results is coordinated by our system technology without effort from cyclists, as they collaboratively improve their aggregate presence and safety.

We provide a technical description of our light system that includes the design of a peer-to-peer message protocol, algorithm, and low-cost hardware. We also present our prototypes that were tested on a city bike-share network. The system's low cost, and the opportunity for bike-share programs to deploy it city-wide, allows the network of swarms to quickly scale.

© Springer Nature Switzerland AG 2020
K. Arai et al. (Eds.): SAI 2020, AISC 1229, pp. 1–18, 2020. https://doi.org/10.1007/978-3-030-52246-9_1
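The "gently breathing" light behavior described above can be sketched as a simple mapping from an oscillator's phase to LED amplitude. The function below is only our own illustration, not the paper's implementation: the raised-cosine shape and the `floor` parameter (which keeps the bike visible at the dim point of each pulse) are assumptions we introduce for clarity.

```python
import math

def led_brightness(phase, in_swarm, floor=0.2):
    """Map an oscillator phase (radians) to LED amplitude in [floor, 1].

    Alone, the light stays steadily on; in a swarm it "breathes",
    oscillating smoothly between a dim floor and full brightness.
    """
    if not in_swarm:
        return 1.0
    # Raised cosine: 1.0 at phase 0, `floor` at phase pi, back to 1.0.
    return floor + (1.0 - floor) * (1.0 + math.cos(phase)) / 2.0

# A solo bike stays steadily on; a swarm member dims mid-cycle.
solo = led_brightness(0.0, in_swarm=False)
mid_pulse = led_brightness(math.pi, in_swarm=True)
```

On hardware, a value like this would typically be scaled to a PWM duty cycle; synchrony then follows automatically once the phases of nearby bikes are aligned.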
In addition, the decentralized and flexible nature of our design allows new bikes to join a network, immediately coordinate with other bikes, and further grow a network of swarms. Our system is designed for deployment in a city, yet draws inspiration from nature. Swarms of insects provide rich examples of synchrony unifying groups of individuals in a decentralized network. We focus on examples particular to the night. The sound of crickets in the night is the sound of many individual insects, chirping in synchrony. A single cricket’s sound is amplified when it joins the collective whole. The spectacle of thousands of male fireflies gathering in trees in southeast Asia to flash in unison has long been recorded and studied by biologists [4,5]. These examples of synchrony emerging via peer-to-peer coordination within a decentralized network are of interest in our design for urban swarms. They have also interested biologists, who have studied the coordination mechanisms of these organisms [6]. Applied mathematicians and physicists have also analyzed these systems and attempted to model the dynamics of their synchronized behavior [7,8]. We draw from these prior technical descriptions in order to describe the coordination of our decentralized bike light system.
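The coupled-oscillator dynamics referenced here can be illustrated with a minimal simulation. The sketch below is our own illustration of a classic Kuramoto-style model (not code from the paper): identical oscillators start at random phases, each is pulled toward the others, and the population converges to synchrony, as measured by the order parameter r.

```python
import cmath
import math
import random

def kuramoto_step(phases, omega, coupling, dt):
    """One Euler step of the Kuramoto model: each oscillator is pulled
    toward the phases of all others, weighted by the coupling strength."""
    n = len(phases)
    new = []
    for th_i in phases:
        pull = sum(math.sin(th_j - th_i) for th_j in phases) / n
        new.append((th_i + (omega + coupling * pull) * dt) % (2 * math.pi))
    return new

def order_parameter(phases):
    """r = 1 means perfect synchrony; r near 0 means incoherence."""
    return abs(sum(cmath.exp(1j * th) for th in phases)) / len(phases)

random.seed(0)
phases = [random.uniform(0, 2 * math.pi) for _ in range(10)]
for _ in range(2000):
    phases = kuramoto_step(phases, omega=1.0, coupling=2.0, dt=0.01)
print("order parameter after coupling:", order_parameter(phases))
```

With identical frequencies and positive coupling, r approaches 1, mirroring how fireflies flashing at the same natural cadence fall into step once they can perceive one another.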

Urban Mobility Swarms
A. Berke et al.
In doing so, we describe the individual bikes that create and join swarms as nodes in a distributed network. These nodes are programmed to behave as oscillators, and their synchronization is coordinated by aligning their phases of oscillation via the exchange of peer-to-peer messages. Our message protocol and underlying algorithm accommodate the dynamically changing nature of urban mobility networks. New nodes can join the network, and nodes can drop out, yet our system maintains its mechanisms of synchrony. Moreover, our system is scalable due to its simplicity, flexibility, and features that allow nodes to enter the swarm network with minimal information and hardware. Namely:

– There is no global clock
– Nodes communicate peer-to-peer via simple radio messages
– Nodes need not be predetermined nor share metadata about their identities
– Nodes can immediately synchronize

Before we provide our technical description and implementation, we first describe the bicycles, their lights, and their swarming behavior. We then describe them as nodes in a dynamic, decentralized network of swarms, before presenting our protocol and algorithm that coordinates their behavior. Lastly, we show how we prototyped and tested our system with bicycles from a city bike-share program.

2 Swarm Behavior and Bicycle Lights

Similar to our swarms of bicycles, swarms of nocturnal insects, such as crickets and fireflies, display synchronous behavior within decentralized networks. In these cases, the recruitment and coordination of individuals in close proximity emerges from natural processes and provides a compelling demonstration of "strength in numbers". This concept, demonstrated in natural environments, can be extended to the concept of "safety in numbers" for urban environments. "Safety in numbers" is the hypothesis that individuals within groups are less likely to fall victim to traffic mishaps, and its effect has been well studied and documented in the bicycle safety literature [9,10]. The cyclists within swarms coordinated by our system are safer due to their surrounding numbers, but also because their presence is pronounced by the visual effect swarms produce with their synchronized lights. Unlike insects, the coordination of bike swarms is due to peer-to-peer radio messages and software, yet swarms can still form organically when cyclists are in proximity.

The visual display of synchronization is due to the oscillating amplitude of LED lights. Lights line both sides of the bicycle frame, and a front light illuminates the path forward (Fig. 1). The lights stay steadily on when a bike is alone. When a bike is joined by another bike that is equipped with the system, a swarm of two is formed, and the lights on both bikes begin to gently pulsate. The amplitude of the lights oscillates from high to low and back to high, in synchrony. As other bikes come in proximity, their lights begin to pulsate synchronously as well, further growing the swarm and amplifying its visual effect.

Fig. 1. Bicycle with lights.

The system synchronizes swarms as they merge, as well as bicycles that pass each other only momentarily. When any bike leaves the proximity of others, its lights return to their steady state. The effect is a unified pulsation of light, illuminating swarms of bikes as they move through the darkness. This visual effect enhances their safety as well as their ability to attract more members to further grow the swarm. Additionally, as a swarm grows and its perimeter expands, the reach of its radio messages expands as well, further enhancing its potential for growth.

While this paper focuses on the technical system that enables these swarms, we note that the swarming behavior that emerges can also be social. Members of swarms may not know each other, but by riding in proximity, they collaboratively enhance the swarm's effects.

3 Technical Description

3.1 A Decentralized Network of Swarms

In order to model swarms, we describe individual bikes as nodes in a decentralized network. We consider swarms to be locally connected portions of the network, comprised of synchronized nodes. The nodes synchronize by passing messages peer-to-peer and by running the same synchronization algorithm. When nodes come within message-passing range of one another, they are able to connect and synchronize. Two or more connected and synchronized nodes form a swarm. When a node moves away from a swarm, and is no longer in range of message passing, it disconnects from that portion of the network, leaving the swarm. The network's topology changes as nodes (bikes) move in or out of message passing range of one another, and connect or disconnect, and swarms thereby form, change shape, or dissolve (Fig. 2).

Fig. 2. Nodes synchronize when near each other, and fall out of synchrony when they move apart.

There may be multiple swarms of synchronized bikes in the city, with each swarm not necessarily in synchrony with another distant swarm. As such, the network of nodes may have a number of connected portions (swarms) at any given time, and these swarms may not be connected to one another (Fig. 3).

Fig. 3. Examples of network states.

Our system exploits the transitive nature of synchrony: if node 1 is synchronized with node 2, and node 2 is synchronized with node 3, then node 1 and node 3 are synchronized as well. Since all nodes in a connected swarm are in synchrony with each other, a given node needs only to connect and synchronize with a single node in a swarm in order to synchronize with the entire swarm (Fig. 4).

Fig. 4. Synchrony of nodes in the network is transitive.
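To make the transitivity concrete, here is a toy sketch (our illustration, not code from the implementation) in which a pair of nodes in message-passing range settle on the greater of their two phases; a rebroadcast by the shared node then carries the new phase back through the swarm:

```python
def exchange(phases, i, j):
    """Two nodes in message-passing range settle on the greater phase."""
    phases[i] = phases[j] = max(phases[i], phases[j])

phases = {1: 400, 2: 900, 3: 1500}
exchange(phases, 1, 2)   # node 1 adopts node 2's phase (900)
exchange(phases, 2, 3)   # node 2 adopts node 3's phase (1500)
exchange(phases, 1, 2)   # node 2's rebroadcast aligns node 1 as well
assert phases[1] == phases[2] == phases[3] == 1500
```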

When two synchronized swarms that are not in synchrony with each other come into proximity and connect for the first time, our message broadcasting protocol and synchronization algorithm facilitate their merge and transition toward a mutually synchronized state (Fig. 5).


Fig. 5. Two swarms come in proximity with each other and merge as one swarm.

A feature of the message passing and synchronization protocol is that the nodes in the network need not be predetermined. New nodes can enter this decentralized network at any given time and immediately begin exchanging messages and synchronizing with pre-existing nodes.

3.2 Nodes as Oscillators

The behavior of the nodes (bikes) that needs to be synchronized is the timed pulsation of light. We can characterize this behavior by describing a node as an oscillator, similar to simple oscillators modeled in elementary physics. Nodes have two states:

1. Synchronized: the node's behavior is periodic and synchronized with another node.
2. Out of sync: the node's behavior remains steady; the node is not in communication with other nodes.

All nodes share a fixed period, T. When a node is in a state of synchrony, its behavior transitions over time, t, until t = T, at which point it returns to its behavior at time t = 0. We denote the phase of node i at time t as φi(t), such that φi(t) ∈ [0, T] and the phases 0 and T are identical. When nodes are in synchrony, their phases are aligned. Thus for two nodes, node i and node j, to be synchronized, φi(t) = φj(t) (Fig. 6).

When a node is out of sync (i.e. the bike is not in proximity of another bike and therefore not exchanging messages with other bikes), it ceases to act as an oscillator. When out of the synchronous state, the node's phase remains stable at φ = 0.
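As a minimal sketch of this model (illustrative Python, not our Arduino implementation), a synchronized node's phase is its elapsed relative time modulo the shared period, while an out of sync node holds φ = 0:

```python
T = 2200  # shared fixed period (ms), the value used in our implementation

def phase(t, in_sync):
    """Phase of a node at relative time t: oscillates when synchronized,
    stays pinned at 0 when out of sync."""
    return t % T if in_sync else 0

assert phase(0, True) == phase(T, True)   # phases 0 and T are identical
assert phase(500, True) == 500            # phase advances with time
assert phase(500, False) == 0             # out of sync: steady at 0
```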


Fig. 6. Out of sync nodes, and synchronized nodes.

3.3 Phase and Light

The pulsating effect of a bike node's light is the decay and growth of the light's amplitude over the node's period, T. The amplitude of the light is a function of the node's phase: A = fA(φ) (see Fig. 7). We denote the highest amplitude for the light as HI, and the lowest as LO¹, such that:

fA(0) = fA(T) = HI,  fA(T/2) = LO    (1)

Fig. 7. Graph of fA (φ)

When a node is in the synchronized state, and its phase oscillates, φ(t) ∈ [0, T], the amplitude of its light can be plotted as a function of time, t (Fig. 8). Note that nodes do not share a globally synchronized clock, so time t is relative to the node. Without loss of generality, we plot t = 0 as when the given node enters a state of synchrony.

When a node is in the out of sync state, the value of its phase, φ, is steady at φ = 0, so its light stays at the HI amplitude: A = HI = fA(0) (Fig. 9).

¹ In our implementation, the amplitude of light does not reach as low as 0 (LO > 0). This decision was made due to our desired aesthetics and user experience.


Fig. 8. Amplitude, A, plotted as a function of relative time, t, for a node in the synchronized state.

Fig. 9. Amplitude, A, plotted for a node in the out of sync state.

As soon as an out of sync node encounters another node and enters a state of synchrony, its phase begins to oscillate and the amplitude of its light transitions from HI to LO along the fA (φ) path (Fig. 10).

Fig. 10. Amplitude, A, plotted as a function of time, for a node transitioning into the synchronous state.

Implementation Notes. For our bicycle lighting system we chose the period T = 2200 ms and chose fA(φ) to be a sinusoidal curve:

fA(φ) = ((cos(φ · 2π/T) + 1) / 2) · (HI − LO) + LO    (2)

We visually tested a variety of period lengths and functions. We chose the combination that best produced a gentle rhythmic effect that would be aesthetically pleasing and noticeable, yet not distracting to drivers.
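Equation 2 can be checked directly against Eq. 1. In this Python sketch the HI and LO values are hypothetical 8-bit PWM levels (the exact levels we used are not given here):

```python
import math

T = 2200           # period (ms)
HI, LO = 255, 40   # hypothetical amplitude bounds; LO > 0 by design

def f_a(phi):
    """Sinusoidal amplitude curve of Eq. 2."""
    return (math.cos(phi * 2 * math.pi / T) + 1) / 2 * (HI - LO) + LO

# Eq. 1 holds: HI at the ends of the period, LO at the midpoint.
assert abs(f_a(0) - HI) < 1e-9
assert abs(f_a(T) - HI) < 1e-9
assert abs(f_a(T / 2) - LO) < 1e-9
```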


We also considered fA(φ) as a piecewise linear function (Fig. 11). For a slightly different effect, one might choose any other continuous function such that Eq. 1 holds. As long as the period, T, is the same as in other implementations, the nodes can synchronize.

fA(φ) = HI − k · φ, when φ < T/2
fA(φ) = LO + k · (φ − T/2), when φ ≥ T/2    (3)

where the slope k = 2(HI − LO)/T ensures that Eq. 1 holds.

Fig. 11. Graph of fA (φ) as a piecewise linear function.
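The piecewise linear variant behaves the same at the endpoints. A sketch with the slope k derived from Eq. 1 (HI and LO values are hypothetical, as before):

```python
T = 2200
HI, LO = 255, 40         # hypothetical amplitude bounds; LO > 0
k = (HI - LO) / (T / 2)  # slope chosen so that Eq. 1 holds

def f_a_linear(phi):
    """Piecewise linear amplitude curve of Eq. 3."""
    return HI - k * phi if phi < T / 2 else LO + k * (phi - T / 2)

assert f_a_linear(0) == HI
assert f_a_linear(T / 2) == LO
assert abs(f_a_linear(T) - HI) < 1e-9
```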

4 Message Broadcasting and Synchronization

4.1 Protocol

Nodes maintain anonymity by communicating information pertaining only to timing over a broadcast-and-receive protocol. Synchronization is coordinated by a simple set of rules that govern how nodes handle received messages.

Broadcasting Messages. The messages broadcast by a node are simply integers representing the node's phase, φ, at the time of broadcasting, t, i.e. nodes broadcast φ(t). Nodes in the out of sync state broadcast the message of 0 (zero), as φ = 0 for out of sync nodes.

Receiving Messages. Nodes update their phase values to match the highest phase value of nearby nodes. When a message is received by a node out of the synchronous state, the phase represented in the message, φm, is necessarily greater than or equal, φm ≥ φ, to the out of sync node's phase value of φ = 0. The out of sync node then sets its phase to match the phase in the received message, φ = φm, and enters a state of synchrony. Its phase then begins to oscillate from the value of φm, and bike lights pulsate in synchrony.

When a message is received by a node that is already in a state of synchrony, the node compares its own phase, φ, to the phase represented in the received message, φm. If the node's phase value is less than the phase value in the received message, φ < φm, then the node updates its phase to match the received phase, φ = φm. The node then continues in a state of synchrony, with its phase still oscillating, but now from the phase value of φm. The node is now in synchrony with the node that sent the message of φm (see Fig. 12).

Fig. 12. Node updates its phase value to match the phase value received in message.

There is an allowed phase shift, ϕallowed, to accommodate latency in message transit and receipt, and to keep nodes from changing phase more often than necessary (Fig. 13). Nodes do not update their phase to match a greater phase value if the difference between the phases is less than ϕallowed. For example, suppose node 1 has phase value φ1 and node 2 has phase value φ2, with φ1 < φ2. If (φ1 + ϕallowed > φ2) or (((φ2 + ϕallowed) mod T) > φ1), where the latter case covers phases that nearly coincide across the wrap from T back to 0, then node 1 does not update its phase to match φ2 upon receiving a message of φ2. In our implementation, ϕallowed is so small that the possible phase shift between the light pulsations of bike nodes is imperceptible.

Fig. 13. There is an allowed phase shift ϕallowed .
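The receive rule with the ϕallowed tolerance can be sketched as follows. This is illustrative Python, not our implementation: the ALLOWED value is hypothetical, and the wrap-around guard reflects our reading of the second condition above.

```python
T = 2200
ALLOWED = 30  # hypothetical allowed phase shift (ms); small in practice

def should_adopt(phi, phi_m):
    """Return True if a node with phase phi should adopt received phase phi_m."""
    if phi >= phi_m:
        return False            # only ever adopt a greater phase
    if phi_m - phi < ALLOWED:
        return False            # difference within tolerance: leave phase alone
    if (T - phi_m) + phi < ALLOWED:
        return False            # nearly coincident across the wrap (e.g. 2190 vs 10)
    return True

assert should_adopt(10, 1000)          # clearly behind: adopt
assert not should_adopt(10, 25)        # within tolerance
assert not should_adopt(10, 2190)      # nearly aligned across the wrap
assert not should_adopt(1000, 10)      # received phase is smaller
```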

Once a node updates its phase to match a greater phase received in a message, φm, it then broadcasts its new phase. Nodes in range of this new message may have been out of range of the original message, but these nearby nodes can now all synchronize around the new common phase φm. This simple protocol works as a mechanism for multiple swarms to merge and synchronize.

Moreover, whenever nodes come in proximity of each other's messages, they will synchronize. Even when node i with phase value φ receives message φm < φ from node m, and node i does not update its phase to match φm, node i and node m will still synchronize. Since they are in message passing range, node m will receive the message broadcast by node i of φ > φm, and node m will then update its own phase to match φ. Fig. 14 illustrates various scenarios for receipt of the broadcast message.

Once nodes synchronize, minimal messages are required to keep them synchronized, as all nodes share the same period of oscillation, T. When enough time passes without a node receiving any messages, the node leaves its synchronous state and returns to the out of sync state, where its phase stays steady at φ = 0 (and its lights cease to pulsate).

Consider the cases of Fig. 14 where nodes come in range of each other's messages and synchronize. We let the reader extend these small examples to the larger network topology of nodes previously provided. The broadcast messages are minimal, and the synchronization rule set simple; we consider this simplicity a feature. We demonstrate its implementation as an algorithm.

4.2 Algorithm

The implementation of our algorithm used for our working prototype is provided open source². Nodes execute their logical operations through a continuous loop: throughout the loop, they listen for messages and update their phase as time passes. Algorithm 1 and Algorithm 2 outline the loop operations. This simple protocol and algorithm offer the following benefits across the network, with the only requirements being that all nodes in the network run loops with this same logic and share the same fixed period.

– Nodes need not share a globally synchronized clock in order to synchronize their phases. Time can be kept relative to a node.
– Nodes need not share any metadata about their identity, nor know any information about other nodes, in order to synchronize. Unknown nodes can arbitrarily join or leave the network at any time while the network maintains its mechanisms for synchrony.

² https://github.com/aberke/city-science-bike-swarm/tree/master/Arduino/PulseInSync


Fig. 14. Scenarios of nodes receiving broadcast messages and updating their state of synchrony.


Algorithm 1. Routine to update phase

1: currentTime ← getCurrentTime()
2: if node is inSync then
3:     timeDelta ← currentTime − lastTimeCheck
4:     phase ← (phase + timeDelta) % period
5: else
6:     phase ← 0
7: end if
8: lastTimeCheck ← currentTime
9: return phase

Algorithm 2. Main loop

1: inSync ← FALSE
2: if currentTime − lastReceiveTime < timeToOutOfSync then
3:     inSync ← TRUE
4: end if
5: phase ← updatePhase()
6: phaseM ← receive()
7: if phaseM not null then
8:     lastReceiveTime ← getCurrentTime()
9:     phase ← updatePhase()
10:    if (phase < phaseM) & (computePhaseShift(phase, phaseM) < allowedPhaseShift) then
11:        phase ← phaseM + expectedLatency
12:        lastTimeCheck ← lastReceiveTime
13:    end if
14: end if
15: broadcast(phase)
16: phase ← updatePhase()
17: updateLights(phase)
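The two routines translate to Python roughly as follows. This is a sketch with an injectable clock, not the Arduino reference code: the constants are hypothetical, the outbox list stands in for the radio's broadcast(), we follow the prose tolerance rule (re-aligning only when the shift exceeds ϕallowed), and we mark the node in sync immediately on receipt for simplicity.

```python
class Node:
    """Python sketch of Algorithms 1 and 2 (constants are hypothetical)."""

    def __init__(self, period=2200, time_to_out_of_sync=5000,
                 allowed_shift=30, expected_latency=5):
        self.period = period
        self.time_to_out_of_sync = time_to_out_of_sync
        self.allowed_shift = allowed_shift
        self.expected_latency = expected_latency
        self.now = 0                     # relative clock; no global clock needed
        self.phase = 0
        self.in_sync = False
        self.last_time_check = 0
        self.last_receive_time = -10**9  # far in the past: start out of sync
        self.outbox = []                 # stands in for broadcast()

    def update_phase(self):              # Algorithm 1
        if self.in_sync:
            self.phase = (self.phase + self.now - self.last_time_check) % self.period
        else:
            self.phase = 0
        self.last_time_check = self.now
        return self.phase

    def loop(self, received=None):       # Algorithm 2, one iteration
        self.in_sync = self.now - self.last_receive_time < self.time_to_out_of_sync
        self.update_phase()
        if received is not None:         # a message arrived on the radio
            self.last_receive_time = self.now
            self.update_phase()
            if self.phase < received and received - self.phase >= self.allowed_shift:
                self.phase = received + self.expected_latency
                self.last_time_check = self.last_receive_time
            self.in_sync = True          # simplification: in sync on receipt
        self.outbox.append(self.phase)   # broadcast(phase)
        self.update_phase()              # updateLights(phase) would follow

n = Node()
n.loop()                    # alone: out of sync, phase pinned at 0
assert n.outbox[-1] == 0
n.now = 100
n.loop(received=700)        # hears a swarm: adopts 700 (+ expected latency)
assert n.outbox[-1] == 705 and n.in_sync
n.now = 600
n.loop()                    # keeps oscillating with no further messages
assert n.phase == (705 + 500) % 2200
```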

4.3 Addressing Scheme

A requirement of the system is that any two nodes must be able to communicate upon coming in proximity of one another, without knowing information about each other beforehand. Moreover, any new node that enters an existing network must be able to immediately begin broadcasting and receiving messages to synchronize with pre-existing nodes in the network. Thus the challenge is to accomplish this communication without nodes sharing identities or addresses. Because these nodes broadcast and receive messages over radio, they cannot simply all use the same channel, or else their messages will conflict and communication will be lost.

Methods have been developed to facilitate resource sharing among nodes in a wireless network such as our network of bike nodes (e.g. TDMA implementations [11]). These methods are designed to avoid the problems of nodes sending messages on the same channel at conflicting times by coordinating the timing at which messages are sent. The DESYNC algorithm [12] even supports channel sharing across decentralized networks of nodes that do not share a globally synchronized clock (such as ours): nodes monitor when other messages are sent, then self-adjust the time at which they send messages, until gradually the nodes send their messages at equally spaced intervals.

These strategies are not as well suited to our network of bike nodes, because its topology continuously changes (as new bikes join or leave the network, and as bikes pass each other, collect at stoplights, or go separate ways), and nodes need to exchange messages as soon as they enter proximity of each other. In addition, immediately after a node updates its own phase to match a phase received in a message, it must broadcast its phase so that other nearby nodes can resynchronize with it. This immediate resynchronization would be hindered by a resource sharing algorithm that required a node to wait its turn to broadcast a message. Bike nodes should be able to continuously listen for messages sent by other nodes, and be able to broadcast messages at any time.

We designed and use an addressing scheme to handle these requirements. The scheme exploits the fact that when multiple nodes are in proximity of each other, the messages they broadcast are often redundant: when nodes are in message passing range, they synchronize, and the messages they then broadcast contain the same information about their shared phase. In our addressing scheme, we allocate N predetermined addresses, which we number as address 1, address 2, address 3, ..., address N. All nodes in the network know these common addresses in the same way they all know the common period, T. We also consider our nodes as numbered: node 1, node 2, node 3, ...
Each node uses one of the N addresses to broadcast messages, and listens for messages on the remaining N − 1 addresses:

– node i broadcasts on address i,
– node i listens on address i + 1 mod N, address i + 2 mod N, ..., address i + (N − 1) mod N.

For example, node 1 broadcasts on address 1, while node 2 broadcasts on address 2. Since node 1 also listens on address 2, and node 2 listens on address 1, the two nodes can exchange messages without conflict.

Nodes determine their own node numbers by randomly drawing from a discrete uniform distribution over {1, 2, 3, ..., N}, such that a node has a 1/N chance of choosing any i ∈ {1, 2, 3, ..., N}. When a node in the out of sync state comes in proximity of another node, there is a small (1/N) probability that the nodes share a node number and therefore will not be able to exchange messages. To overcome this issue, nodes in the out of sync state regularly change their node numbers by redrawing from the discrete uniform distribution and then re-configuring which addresses they broadcast and listen on based on their node number. This change allows two nearby nodes with conflicting node numbers and addresses to get out of conflict. If a node encounters multiple synchronized nodes, it needs only a non-conflicting node number with one of them in order to synchronize with all of them, since the synchronized nodes share and communicate the same phase messages.

Discussion of Alternatives. We also considered an alternative synchronization scheme that would allow all nodes to share one common address to broadcast and receive messages. In this simpler alternative, nodes only broadcast messages when their phase is at 0 (nodes in the out of sync state broadcast at random intervals). Upon receiving such a message, a node sets its own phase to 0 and enters a state of synchrony with the message sender. Simplifying the message protocol in this way circumvents the issue of nodes sending messages over a shared channel and their messages conflicting. If two nodes do happen to send a message at the same time, then they must already be synchronized (they share a phase of 0 at the time of sending). Any other node in proximity that receives this broadcast will synchronize to phase 0 and then also broadcast its messages at the same time as the other nodes.

This message passing protocol has been studied and modeled in relation to pulse-coupled biological oscillators where the oscillation is episodic rather than smooth [7]. Examples include the flashing of a firefly, or the chirp of a cricket, where instead of the system interacting throughout the period of oscillation, there is a single "fire" (e.g. flash or chirp) event that occurs at the end of the period. This simplified synchronization scheme works well for discrete episodic events among oscillators, and while it could work for our bike nodes, we chose not to use it because our bike nodes have a continuous behavior (Fig. 15). Because they update the amplitude of their light continually throughout their phase, synchronizing at phase 0 is as important as synchronizing at any other phase value. Moreover, this simplified messaging protocol would make the time to synchronization longer, dependent on the length of the period, T: two nodes that come into proximity for the first time, but are already oscillating with phases out of synchrony with each other, would not have the opportunity to synchronize until one of their phases reaches 0 again.
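The addressing rules of Sect. 4.3 can be sketched as follows. This is illustrative Python, not the radio driver: we number nodes and addresses 0..N−1 so the modular arithmetic is exact, and N is a hypothetical size.

```python
import random

N = 8  # number of predetermined addresses (hypothetical)

def addresses(i):
    """Node i broadcasts on address i and listens on the other N - 1 addresses."""
    return i, [(i + k) % N for k in range(1, N)]

def redraw_number():
    """Out of sync nodes periodically redraw their number uniformly at random,
    resolving the small (1/N) chance of a conflict with a nearby node."""
    return random.randrange(N)

tx1, rx1 = addresses(1)
tx2, rx2 = addresses(2)
assert tx1 != tx2                  # distinct numbers never collide on broadcast
assert tx1 in rx2 and tx2 in rx1   # each node hears the other
assert tx1 not in rx1              # a node does not listen on its own address
assert 0 <= redraw_number() < N
```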

4.4 Faulty and Malicious Nodes

We note the unlikely case of faulty nodes, which broadcast messages to the bike swarm network without following the same protocol as other nodes. These faults may occur because one bike’s system breaks or was badly implemented, or may be due to malicious actors. These faulty nodes can destabilize the synchronization of nearby swarms. However, the issue will be spatially isolated to the swarms within broadcasting range of the faulty node, while the rest of the network continues to function successfully.


Fig. 15. The timing of message broadcasts in our synchronization scheme versus episodic message broadcasts in the simplified synchronization scheme.

5 Circuitry, Prototype Fabrication, and Tests

Our system design includes an integrated circuit. The circuit connects a low-cost radio to broadcast and receive the protocol messages, a microcontroller programmed to run our algorithm, and lights controlled by the microcontroller. The radio transmission range is limited by design in order to restrict swarm membership to nearby nodes (bikes). Our prototypes use nRF24 [13] radio transceivers to broadcast and receive the protocol messages without requiring individual nodes to pair. The nRF24 specification allows for software control of transmission range, which is used to constrain the spatial distance between connected nodes. The Arduino Nano [14] microcontroller was selected to run the synchronization protocol and algorithm. The other components in our circuit were used for the management of power and the pulsation of lights. The circuit schematic and Arduino code are open source.³ We implemented and tested our system for urban mobility swarms by fabricating a set of 6 prototypes. The prototypes strap on and off bicycles from a city bike-share program, and we rode throughout our city with them over a series of three nights. We tested various scenarios of bikes forming, joining, and leaving swarms, as well as swarms passing, and swarms merging, as shown in video footage that is available online: https://youtu.be/wUl-CHJ6DK0. Also available online is detailed photographic documentation of the prototype development and deployment: https://www.media.mit.edu/projects/bike-swarm.

6 Conclusion

We designed a system for the urban environment that draws from swarming behavior exhibited in the natural environment. In this paper we presented urban mobility swarms as a means to promote the use and safety of lightweight, sustainable transit. We described and demonstrated a system for their implementation, with a radio protocol, synchronization algorithm, and tested prototypes. The prototypes we designed are specific to synchronizing the lights of nearby bicycles in the dark. Riders within swarms collaboratively amplify the swarm's effect and collective safety, yet coordination and formation of swarms requires no effort from the riders. The riders are automatically inducted into ad-hoc swarms when in proximity due to our simple, yet powerful system design. The system we implemented can be easily extended and applied to transit options beyond bicycles.

More generally, our system treats individual riders as nodes in a decentralized network, and coordinates swarms as connected portions of the network, with a peer-to-peer message protocol and algorithm. Our design accommodates a dynamically changing network topology, as necessitated by the nature of an urban mobility network in which individuals are constantly moving, joining the network, or leaving altogether. Furthermore, the features of our decentralized design afford its flexible and secure implementation. There is no global clock, and nodes communicate with minimal radio messages without sharing metadata, allowing new nodes to immediately coordinate with the system while maintaining an individual's privacy.

Moreover, our system can be deployed at scale, which we demonstrated by implementing it with simple, low-cost circuit and hardware components, and by testing with bikes from a city bike-share. Bike-share programs manage fleets of bikes distributed across cities and could deploy the system at city-scale. The system can be integrated into bicycles, or strapped on and off, as riders typically do with bike lights. Once deployed, our modular hardware and decentralized system design allows arbitrary bikes to form or further grow a network with ease.

³ https://github.com/aberke/city-science-bike-swarm

References

1. De Hartog, J.J., Boogaard, H., Nijland, H., Hoek, G.: Do the health benefits of cycling outweigh the risks? Environ. Health Perspect. 118(8), 1109–1116 (2010)
2. Johansson, C., Lövenheim, B., Schantz, P., Wahlgren, L., Almström, P., Markstedt, A., Strömgren, M., Forsberg, B., Sommar, J.N.: Impacts on air pollution and health by changing commuting from car to bicycle. Sci. Total Environ. 584, 55–63 (2017)
3. BBC: Air pollution: benefits of cycling and walking outweigh harms – study, May 2016. https://www.bbc.com/news/health-36208003. Accessed 25 Sept 2019
4. Buck, J.B.: Synchronous rhythmic flashing of fireflies. Q. Rev. Biol. 13(3), 301–314 (1938)
5. Buck, J., Buck, E.: Synchronous fireflies. Sci. Am. 234(5), 74–85 (1976)
6. Strogatz, S.H., Stewart, I.: Coupled oscillators and biological synchronization. Sci. Am. 269(6), 102–109 (1993)
7. Mirollo, R.E., Strogatz, S.H.: Synchronization of pulse-coupled biological oscillators. SIAM J. Appl. Math. 50(6), 1645–1662 (1990)
8. Werner-Allen, G., Tewari, G., Patel, A., Welsh, M., Nagpal, R.: Firefly-inspired sensor network synchronicity with realistic radio effects. In: Proceedings of the 3rd International Conference on Embedded Networked Sensor Systems, pp. 142–153. ACM (2005)
9. International Transport Forum: Cycling, Health and Safety (2013)
10. Jacobsen, P.L.: Safety in numbers: more walkers and bicyclists, safer walking and bicycling. Inj. Prevent. 21(4), 271–275 (2015)
11. Miao, G., Zander, J., Sung, K.W., Slimane, S.B.: Fundamentals of Mobile Data Networks. Cambridge University Press, Cambridge (2016)
12. Degesys, J., Rose, I., Patel, A., Nagpal, R.: DESYNC: self-organizing desynchronization and TDMA on wireless sensor networks. In: Proceedings of the 6th International Conference on Information Processing in Sensor Networks, pp. 11–20. ACM (2007)
13. Nordic Semiconductor: nRF24 series (2019). https://www.nordicsemi.com/Products/Low-power-short-range-wireless/nRF24-series. Accessed 25 Sept 2019
14. Wikipedia: Arduino – Wikipedia, the free encyclopedia (2019). http://en.wikipedia.org/w/index.php?title=Arduino&oldid=917538577. Accessed 25 Sept 2019

Using AI Simulations to Dynamically Model Multi-agent Multi-team Energy Systems

D. Michael Franklin¹, Philip Irminger², Heather Buckberry³, and Mahabir Bhandari³

¹ College of Computing, Kennesaw State University, Marietta, GA, USA
[email protected]
² Power and Energy Systems Group, Oak Ridge National Laboratory, Oak Ridge, TN, USA
³ Building Technology Research and Integration Center, Oak Ridge National Laboratory, Oak Ridge, TN, USA

Abstract. Energy systems are well known to be complex and intricate. As a result, many extant studies rely on simplifications or generalizations that do not accurately reflect the nature of these systems. In particular, most HVAC systems are modeled as a single unit, or several large units, rather than as a hierarchical composite (e.g., as a floor rather than as a collection of disparate rooms). The net result is that the simulations are too generic to support meaningful analysis, machine learning, or integrated simulation. We propose using a multi-agent multi-team strategic simulation framework called SiMAMT to better define, model, simulate, and learn the HVAC environment. SiMAMT allows us to create distinct models for each type of room, hierarchically aggregate them into units (like floors or sections) and then into larger sets (like buildings or a campus), and then perform a simulation that interacts with each sub-element individually, with teams of sub-elements collectively, and with the entire set in aggregation. Further, and most importantly, we additionally model another 'team' within the simulation framework: the users of the systems. Again, each individual is modeled distinctly, aggregated into sub-sets, then collected into large sets. Each user, or agent, acts on their own but with respect to the larger team goals. This provides a simulation with much higher model fidelity and more applicable results that match the real world.

Notice: This manuscript has been co-authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy.
The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

© Springer Nature Switzerland AG 2020
K. Arai et al. (Eds.): SAI 2020, AISC 1229, pp. 19–32, 2020. https://doi.org/10.1007/978-3-030-52246-9_2

Keywords: Artificial intelligence · Multi-agent systems · Strategy · Machine learning · HVAC · Building management systems

1 Introduction

Energy systems and thermodynamic interactions are intricate and complex [1,2]. Peak-shaving algorithms attempt to mitigate the high cost of operating HVAC systems by applying machine learning to discover moments when the HVAC control system can be reduced, shut down, or adjusted [6,10]. In truth, the system can never be truly shut down, because air must always flow, but that limit can be approached. The overall effort is to learn the movements of the agents within the system to better understand their patterns of behavior. Once these patterns are understood, the control algorithm can be adjusted for peak shaving - reducing usage during peak moments of power utilization from HVAC operation. In this paper, we propose a new paradigm for the modeling and simulation of high-demand HVAC systems. This new paradigm utilizes the SiMAMT framework [4], a system designed to simulate multi-agent, multi-team strategic interactions. This framework is used to create a high-fidelity simulation to serve as the basis for the machine learning occurring in the HVAC system [3]. We chose this simulation framework because much of the existing work in this area uses lower-fidelity, generalized simulations. These common methodologies use average customers progressing across average days within average temperature ranges, marginalizing the variation within the system. This creates a system poorly tuned to the nuances of day-to-day operations within a complex system. Further, they miss the impact of the interactive nature of these environments. In the general example, these other simulation tools, even when they are highly detailed, model the building as a whole or in simple zones, discarding the people coming and going, the differing configurations of rooms and floors, and the subtleties of which windows get more sun throughout the day; rather, they just simulate the outside temperature and the inside thermostat setting, and then render the effective temperature within the building.
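To make the peak-shaving idea concrete, the sketch below clips a forecast hourly HVAC load at a peak threshold and defers the excess to off-peak hours. The load profile, threshold, and function names are hypothetical illustrations, not part of SiMAMT.

```python
def peak_shave(hourly_load, threshold):
    """Return an adjusted schedule whose peaks are clipped at `threshold`,
    with the deferred load spread over the hours with the most headroom."""
    shaved = [min(x, threshold) for x in hourly_load]
    deferred = sum(x - threshold for x in hourly_load if x > threshold)
    # Spread the deferred load into off-peak hours, greedily filling headroom.
    order = sorted(range(len(shaved)), key=lambda h: shaved[h])
    for h in order:
        if deferred <= 0:
            break
        move = min(threshold - shaved[h], deferred)
        shaved[h] += move
        deferred -= move
    return shaved

load = [3, 4, 9, 10, 8, 4, 2, 1]          # kW per hour, afternoon peak
flat = peak_shave(load, threshold=7)
assert max(flat) <= 7                      # peak is clipped
assert abs(sum(flat) - sum(load)) < 1e-9   # same total energy, flatter profile
```

The total energy delivered is unchanged; only its timing shifts, which is exactly what makes peak shaving attractive under time-of-use or demand-based tariffs.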
We wish to model the individual elements of the buildings and the nuances of the people as thoroughly as possible to capture as much data as we can. Additionally, the simulation uses predictive data analytics and machine learning to determine maximal responses to events within the simulation [11]. Our goal is to improve on existing modeling and simulation systems by including these factors. We set up two teams: one team is the campus, comprised, hierarchically, of rooms, units of rooms, floors, and buildings; the other team is the student body, comprised, hierarchically, of individuals, groups of individuals, and sets of groups. The two teams interact with each other in the simulated system to produce a high-fidelity model of the HVAC system. Multi-agent systems are difficult to model and make predictions in [12], and even more so when there are hierarchical layers of agents and groups of agents in teams.

2 Background

There are many factors that must be considered when understanding the energy profile of a building. The EnergyPlusTM [8] modeling system is a highly detailed building simulation tool developed by the Department of Energy (DOE). It factors in a wide variety of thermal loads from thermal envelopes, vents, air handlers, ceilings, floors, etc. It measures the thermal load generated by each of these inputs and calculates the total energy required to cool or heat the environment. This program has been used to great effect to model energy usage and HVAC system performance for several years. SiMAMT uses a set of models to declare the behavior of each disparate element of the simulated environment. The model-based approach allows each managed agent to hold varied, flexible, expressive, and locally powerful behaviors. These models can be designed as Finite-State Automata (FSAs) or Probabilistic Graphical Models (PGMs). The system architecture allows these models to be represented either as a diverse set of graphs, where each one represents a policy as a walk through a graph of possible actions or choices, or as a set of multiple isomorphic graphs whose edge weights encode the decision matrices. Thus the system allows multiple layers of hierarchical reasoning and representation where, at each level, an overarching directional policy guides the collected actions of the group of agents to accomplish a shared goal. This allows for a shared, distributed intelligence while preserving the individual agent policies.
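A minimal sketch of the graph-based policy idea: each state has weighted outgoing edges encoding its decision matrix, and a behavior is a weighted walk through the graph. The states and weights below are invented for illustration, not taken from SiMAMT.

```python
import random

# Each node is a state; outgoing edge weights encode the decision matrix.
POLICY = {
    "in_room": [("in_room", 0.7), ("hallway", 0.3)],
    "hallway": [("in_room", 0.6), ("outside", 0.4)],
    "outside": [("outside", 1.0)],          # absorbing state
}

def step(state, rng):
    """Sample the next state from the weighted edges of `state`."""
    nodes, weights = zip(*POLICY[state])
    return rng.choices(nodes, weights=weights, k=1)[0]

def walk(start, n, rng):
    """A policy realized as an n-step walk through the behavior graph."""
    states = [start]
    for _ in range(n):
        states.append(step(states[-1], rng))
    return states

trace = walk("in_room", 10, random.Random(0))
assert trace[0] == "in_room" and len(trace) == 11
assert all(s in POLICY for s in trace)
```

Swapping in a different weight matrix over the same node set gives the "isomorphic graphs" variant the text describes: the structure stays fixed while the decision matrix changes.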
Further, the large-scale policies flow downward (the coalition strategy flows down to the teams, the strategy directs each team uniquely, and the team policies direct each agent by assigning behaviors to each), while the information used for decisions flows upward (each agent collects observations and information and sends them to its team; the team collects, collates, and processes that information and sends higher-level observations upstream to the coalition). The output of the SiMAMT-based simulation is the total energy consumption of the campus. We use EnergyPlusTM to calibrate our simulation, to make sure that our version of the modeling and simulation works accurately. The total energy cost from the SiMAMT simulation is compared to the EnergyPlusTM total cost to test the veracity of the system. Once this is confirmed, the system is then pushed to learn the algorithms that run the system most efficiently. These results are the final output of our tests.
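The up/down flow can be sketched as follows: observations roll up from rooms to floor and building means, and a top-level target is pushed back down as per-room set points. The structure, damping factor, and numbers are illustrative, not the SiMAMT implementation.

```python
def aggregate_up(building):
    """Roll per-room temperature observations up to floor and building means."""
    floor_means = {f: sum(rooms.values()) / len(rooms)
                   for f, rooms in building.items()}
    return floor_means, sum(floor_means.values()) / len(floor_means)

def push_down(building, building_mean, target):
    """Assign each room a set point nudged toward the top-level target,
    damping the local spread around the building mean."""
    return {f: {r: target + 0.5 * (t - building_mean)
                for r, t in rooms.items()}
            for f, rooms in building.items()}

obs = {"floor1": {"r101": 71.0, "r102": 73.0},
       "floor2": {"r201": 69.0, "r202": 75.0}}
floors, mean = aggregate_up(obs)
setpoints = push_down(obs, mean, target=72.0)
assert mean == 72.0
assert setpoints["floor1"]["r101"] == 71.5
```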

3 Methodology

HVAC systems vary widely in operating systems, methods, and realizations, but, generally, they operate on a few standard principles. For this discussion, the HVAC system has a set point, the temperature that it is working to achieve. If the ambient temperature is 68◦ and the set point is 70◦, the HVAC system will work to increase the temperature to 70◦. Higher-end HVAC systems can regulate temperatures within certain
sub-zones by using baffles to increase or decrease the amount of regulated air flowing into a sub-zone. The effect is that the sub-zone can have a different temperature from the set point, and this temperature delta is called the offset. There are limitations, of course, to the amount of offset achievable, but it is reasonable to have an offset of ±3◦. From a function perspective, the HVAC system can be told to hold a certain temperature, adding cool or warm air to maintain it. This is the primary function of the thermostat, but not all systems can switch between cooling and heating. Many systems, mostly home HVAC systems, are designed to do only one or the other. For example, if a home thermostat were set to Heat with the temperature set to 68◦, the HVAC system would turn on the heat any time the temperature went below 68◦ in an effort to maintain the set point. However, if the temperature were to rise to 80◦, the HVAC system would not react, since it is in Heat mode. The converse is also true: if set to Cool and 68◦, it will cool when the air is above 68◦, but will not react when the temperature drops, even as far down as 32◦. Higher-end home HVAC systems and most commercial systems use an additional set of relays to shift the function of the HVAC system from Heat to Cool as appropriate. Many of these systems have two set points, one for the Heat function and one for the Cool function. In this scenario the HVAC system works to maintain a comfortable temperature range. This terminology is used throughout the rest of the paper.

First, we had to model the occupancy of the building by creating a population of agents within the system. Recall that the overall system goal is to increase fidelity, so it was imperative that the population of agents within the system accurately match the real-world population being modeled.
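The Heat/Cool set-point behavior just described can be captured in a small controller sketch; the set points, mode names, and dead band below are illustrative.

```python
def hvac_action(temp, mode, heat_sp=68.0, cool_sp=74.0):
    """Return 'heat', 'cool', or 'off'. Mirrors the behavior in the text:
    a Heat-only system ignores overheating, a Cool-only system ignores cold,
    and a dual-set-point system keeps temp inside [heat_sp, cool_sp]."""
    if mode == "heat":
        return "heat" if temp < heat_sp else "off"
    if mode == "cool":
        return "cool" if temp > cool_sp else "off"
    if mode == "auto":                    # two set points, commercial style
        if temp < heat_sp:
            return "heat"
        if temp > cool_sp:
            return "cool"
        return "off"
    raise ValueError(mode)

assert hvac_action(80.0, "heat") == "off"   # Heat mode never cools
assert hvac_action(32.0, "cool") == "off"   # Cool mode never heats
assert hvac_action(80.0, "auto") == "cool"
```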
To that end, a study was made of the population, classifying the students into groups, understanding their schedules and preferences, and matching their behaviors to the tasks within the system. While this is the general process, specifics are provided here to ground the work. In the example scenario, a large campus HVAC management system, the population consists of students on the campus. These students have a wide variety of schedules that vary between taking classes at various times, attending and participating in extracurricular activities, and socializing with other students. The study revealed that although there is a lot of variation within this population, several major groupings emerge. The final result was a series of five different types of schedules that represent the major combinatorial variants. Each student within the population was then assigned to one of these five groups in order to produce a distribution that matched the student population. In the general sense, these population distributions can be modeled by any standard, normal, or algorithmic distribution, or they can be modeled by hand (if known) or learned through machine learning. In this case, the distribution was learned. The number of groups settled at five, but the algorithm tested many options for groupings and chose the one with the lowest statistical error relative to the actual population. In addition to modeling the students' schedules, there was the need to model their preferences. As briefly outlined before, this meant understanding their preferences in relation to the set temperature.
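One way such a group count could be learned is sketched below with a simple 1-D k-means over a hypothetical "hours on campus" feature: several candidate group counts are tried and the one with the lowest error against the population is kept. The feature, data, and search are stand-ins for the actual study.

```python
import random

def kmeans_1d(xs, k, iters=50, seed=0):
    """Tiny 1-D k-means; returns (centers, total absolute error)."""
    rng = random.Random(seed)
    centers = rng.sample(xs, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[min(range(k), key=lambda i: abs(x - centers[i]))].append(x)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    err = sum(min(abs(x - c) for c in centers) for x in xs)
    return centers, err

# Hypothetical hours-on-campus samples clustered around a few schedule types.
hours = [2, 2.5, 3, 4.8, 5.0, 5.2, 6, 6.5, 7, 7.2, 10, 10.5, 11, 14, 14.5]
errors = {k: kmeans_1d(hours, k)[1] for k in range(2, 7)}
best_k = min(errors, key=errors.get)
assert 2 <= best_k <= 6 and all(e >= 0 for e in errors.values())
```

In the study itself the criterion also had to match the observed population distribution, not just minimize within-group spread, so this is only the skeleton of the search.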


Again, the study determined that the preferences fell within ±3◦ of the set point. The distribution of these preferences follows the normal distribution, with most people preferring to be within one degree of the set point and few wishing to be three degrees away. This distribution is vital to the veracity of the model. Each of these components formed the high-fidelity model that constitutes the agent population, hierarchically modeled as individuals comprising teams comprising the population.

Second, we had to model the buildings within the campus. As previously explained, this is modeled hierarchically from the room to the unit to the floor to the building to the campus. The main component of cost from the building perspective is heat load. This heat load, measured in BTUs, is the amount of heat transferred into the building from external factors, such as sunlight, and internal factors, such as lighting and motors. The EnergyPlusTM system was utilized to model the loads induced into the system from the windows, roofs, floors, walls, ambient loads, etc. This modeling was exhaustive, and every effort was made to include all heat loads so that the fidelity of the model was maintained. This data was then fed into the simulation as the contribution to the overall cost function for each room in the system (each room, as indicated earlier, has individual elements that determine its contribution to the heat load). Each room was then aggregated, along with its heat load, into a unit of rooms that all share some of these characteristics (e.g., they are all south-facing). These units add their shared heat load characteristics to the sum total of the rooms, and this continues on up the hierarchy to the floors. Each floor has its own characteristics, like being on the ground floor or being the top floor, that also add to or subtract from the heat load.
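The per-room algebraic heat-load models and their bottom-up aggregation can be sketched as below. The coefficients and the short weather trace are hypothetical placeholders for the EnergyPlus-derived loads and the NOAA data described in the text.

```python
def room_load(t_out, sun, ua=50.0, solar_gain=0.0, t_in=72.0, internal=300.0):
    """BTU/h gained by one room: conduction + solar + internal loads,
    left algebraic in outside temperature `t_out` and sunlight `sun`."""
    return ua * (t_out - t_in) + solar_gain * sun + internal

def campus_load(rooms, t_out, sun):
    """Aggregate room loads to a campus total for one time slice."""
    return sum(room_load(t_out, sun, **r) for r in rooms)

rooms = [
    {"ua": 60.0, "solar_gain": 800.0},   # south-facing room, big windows
    {"ua": 45.0, "solar_gain": 100.0},   # interior room, little sun
]
# Hourly (outside temperature, sunlight fraction) pairs for a short afternoon.
weather = [(85.0, 0.9), (88.0, 1.0), (84.0, 0.6)]
daily = sum(campus_load(rooms, t, s) for t, s in weather)
assert daily > 0   # a hot, sunny afternoon adds heat that must be removed
```

Summing per slice, per room, and then up the hierarchy is what makes the load an aggregate of individually modeled units rather than a single en-masse calculation.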
Again, each element was modeled algebraically, with the outside temperature and incident sunlight left as unknowns. This provides a model for the whole building made up of many smaller models that all vary with the outside temperature and the incident sunlight on each individual unit. To model any given day, data is collected from NOAA [7] for the region of the world where the campus is located. This data is then played out across the simulation day to vary the heat loads across the day. As this happens, the heat load is generated by each room, then rolled up to each unit, then up to the floor, then the building, and ultimately the campus. To be clear, this is not a heat load calculated en masse; rather, it is aggregated from each individual unit being modeled throughout the day as temperatures and sunlight vary. This means that each slice of each day of an entire year can be utilized to produce a high-fidelity, high-accuracy simulation of the HVAC system for the campus.

Third, we modeled the interactions. There were a few options for how the HVAC system could operate, so we modeled each type of interactive behavior. The first was a basic hold function, where the HVAC system holds one temperature throughout the day, or through large portions of the day. The second was a machine learning algorithm controlling the thermostat, adjusting it throughout the day as heat load and temperature vary, using micro-adjustments to save money. In this second scenario there are no occupancy detectors and no
baffles to create sub-zones. The third was to introduce sub-zones with offsets, so that there were unique preferences for each room in the building, and to examine how well the system could manage the HVAC with those dynamic load issues. Here, the movements of the students were factored in en masse, but not per room, because there were no occupancy sensors. The fourth scenario included the occupancy sensors, so that the simulation could track the students as they moved to and from classes. With this additional data, the simulation could fully perform machine learning on the patterns of movements and behaviors of the agents within the simulation and thus much more closely model the real-world scenario. The initial setup of the simulation is shown in Fig. 1. Please note that, due to privacy concerns, the image is small. The goal is to show a typical single-floor layout of a multi-floor building, similar to a dormitory. It shows one of the buildings being simulated on the campus. This particular building has six floors. The color of each room indicates its current occupancy setting. The small dots visible in this figure, and in Fig. 2, show the agents as they move about the building on their daily routine. Fig. 3 shows a single-floor view. The simulation is an interactive, real-time application, so the user can view the entire campus, a single building, or a single floor, and see tabular data for each view showing energy consumption, cost estimates, current set points, distribution of offsets, and much more. The SiMAMT framework provides the interactive simulation and gathers the data for the final results.

Fig. 1. Initial single building view of the simulation

There are some additional real-world considerations that must be factored in. While a simpler machine learning algorithm might just turn off the HVAC


Fig. 2. Initial single building progress view of the simulation

Fig. 3. Single floor view of the simulation

system during off-peak or reduced-load times, the real-world system must keep some air circulating to avoid poor air quality, mold, and other stagnant-air problems. There are also issues with hard-locking compressors and other mechanical components when the system switches rapidly back and forth from Heat to Cool. While this may be fine in a computer simulation, it is not sustainable in the real world because of the damage to the system. As a result, we factored in reset times and switchover delays (like using a small ε value between function changes) to avoid such issues. It is also important to recognize the exponential nature of heat exchange. Allowing the environment to reach an untenable temperature will require a disproportionate amount of energy to
resolve. This non-linear factor must be part of any energy calculation because it is how the real world works, no matter how inconvenient it may be for the simulation. In all, these factors reduce the efficiency of the resultant simulation by 4–9% in aggregate, but they increase veracity by over 10%, and that is critical to us. Additionally, the simulation system introduces statistical noise and variance into the process to produce realistic results. There are many small but significant factors that affect the performance of real-world systems, and these factors need a place in any simulation that attempts to model such real-world scenarios.

Once the models were completed, the simulation was run with a variety of settings and factors. The factors are covered in the Experiments section, but the reinforcement learning is described here. While the SiMAMT framework allows for many types of learning, and even supports mixing learning styles, we illustrate with a variant of SARSA-λ originally created in [5]. Each layer of the agent model performs SARSA-λ, though with different ranges and setups. Each layer learns according to the update function shown in Eq. 1. This updates the Q(s, a) table by utilizing the reward r for moving to the next state and the value of taking the chosen next action (a′) from the next state (s′) (stored as Q(s′, a′)). The update is scaled by the learning rate, α. The algorithm for the updates and the movement-tracking history is shown in Fig. 4, which uses the step-by-step updates shown in Eq. 2. The update amount, δ, is calculated in Eq. 3. The e-table is incremented for every space that is visited, according to Eq. 4. The decaying entries in the e-table are updated according to Eq. 5, using the discount rate γ and the decay rate λ.
This results in an eligibility trace (a history of decaying rewards over the previously visited, and thus eligible, spaces that can receive an update/reward). These traces are similar to those shown in Fig. 5.

Q(s, a) = Q(s, a) + α(r(s′, a′) + γQ(s′, a′) − Q(s, a))    (1)

Q(s, a) = Q(s, a) + αδe(s, a)    (2)

δ = r(s′, a′) + γQ(s′, a′) − Q(s, a)    (3)

e(s, a) = e(s, a) + 1    (4)

e(s, a) = γλe(s, a)    (5)

While this is only one example, it shows the flexibility of the framework: even a normally lower-performing algorithm like SARSA-λ can, with modification, provide in-depth insight into the modeling of complex systems. The framework also works with any temporal-difference learning or Q-learning technique, as well as modified genetic algorithms and deep learning [4].
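A minimal tabular SARSA-λ with accumulating eligibility traces, following Eqs. (2)–(5), might look like the sketch below. The toy chain environment, the ε-greedy exploration, and all hyperparameters are illustrative, not the HVAC state space used in the paper.

```python
import random
from collections import defaultdict

def sarsa_lambda(n_states=6, episodes=300, alpha=0.1, gamma=0.9,
                 lam=0.8, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)              # Q[(state, action)]
    actions = (+1, -1)                  # step right or left along a chain
    for _ in range(episodes):
        e = defaultdict(float)          # eligibility trace e[(state, action)]
        s, a = 0, +1
        for _ in range(200):            # cap episode length
            s2 = max(0, s + a)
            r = 1.0 if s2 == n_states - 1 else 0.0
            if rng.random() < epsilon:  # epsilon-greedy next action
                a2 = rng.choice(actions)
            else:
                a2 = max(actions, key=lambda act: Q[(s2, act)])
            delta = r + gamma * Q[(s2, a2)] - Q[(s, a)]   # Eq. (3)
            e[(s, a)] += 1.0                              # Eq. (4)
            for key in list(e):
                Q[key] += alpha * delta * e[key]          # Eq. (2)
                e[key] *= gamma * lam                     # Eq. (5)
            if s2 == n_states - 1:
                break
            s, a = s2, a2
    return Q

Q = sarsa_lambda()
assert Q[(0, +1)] > Q[(0, -1)]   # stepping toward the goal is preferred
```

The trace loop is what lets a single reward at the goal update every recently visited state-action pair at once, rather than only the last one.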


Fig. 4. SARSA-λ algorithm [9]

Fig. 5. SARSA-λ eligibility traces [9]

4 Experiments

First, it was critical to establish veracity, ensuring that the simulation produced the same final total heat loads and total costs that the subject system did. Using a model day for consistency, we compared the final total heat cost for both the simulation and the system to make sure that we had calibrated our system correctly. The experiment was run successfully and confirmed that the simulation was accurate to the real-world heat loads. Fig. 6 shows that the simulation results are right in line with the EnergyPlusTM results, showing only about an 8% variance across the entire simulation. Importantly, the shape of the day's energy usage matches in both iterations. An important note: this graph is below the 0 line,

Fig. 6. SiMAMT vs EnergyPlus


meaning that lines higher on the graph represent a lower energy cost, and those at the bottom of the graph represent a larger energy cost (the rate is expressed in negative numbers). Having verified the veracity of the system, the next step was to examine the effect of the hold temperature on total energy usage. The model day was fixed so as to examine the effect of only one variable at a time. As mentioned previously, there is still some variation because of the statistical noise and variance of the simulated elements, but the consistency holds across the variety of experiments. In this experiment the hold temperature was incremented across the range from 70◦ to 80◦ in two-degree increments. Each full day was run multiple times at each set point, and the data was recorded for each time slice. The results, shown in Fig. 7, show that the energy usage scales linearly across the set points. This finding is important, as it indicated one of the first surprising results. While, predictably, total energy cost was higher at each set point, it did not increase beyond linearity, so any set point is a viable option without penalties beyond the linear increase in cost. The consistent shape reaffirms the energy usage pattern of the day and further validates the model. Knowing the relationship across the set points, we wanted to allow the algorithm to learn a control schema to decrease costs, if possible, by making small micro-adjustments throughout the day. The machine learning algorithm, a variant of Q-learning that uses reinforcement learning to make adjustments over time, was rewarded for lower energy costs. As a result, even with no other data, it learned to be approximately 5% more efficient than the standard hold control function. This was an exciting result, because it showed that even placing the thermostats under computer or algorithmic control was already paying dividends. Fig. 8 shows the energy costs across the day under the standard hold function vs. under the learned model control. This model, again, was trained using reinforcement learning, but based on the models of the system already discovered during the initial study phase. These models were tuned, so they took the domain knowledge of the hold function and learned from that point (rather than learning from scratch). The result was that it took much less

Fig. 7. Energy cost by variation of setpoint


time (fewer iterations) to see positive results. This was both a surprising and welcome finding, and a next step in the progression of algorithmic control for the HVAC system.

Fig. 8. Energy cost of hold vs learned

Building on this success, the next step was to include occupancy sensor data in the simulation. We ran the same day of temperature and sunlight data and added the ability to learn the habits of the occupants as the simulation progressed. Initially, for this experiment, it was a reactive system. A naive algorithm would perhaps shift to 'Away' mode immediately upon exit, but if the agent returned immediately it would shift right back; this is not healthy for the system, though it would make for more impressive algorithmic results. Our goal, however, was fidelity, so a three-stage system was used to determine occupancy, making it more realistic for real-world usage. When there was at least one occupant in the room, the room was 'Occupied'. When the last person left, the room shifted to 'Recently Occupied'. After remaining unoccupied for a set period of time (e.g., 15 min), it shifted to 'Unoccupied'. The algorithm makes no adjustments to a room until this final state, preserving the integrity of the system. The results, shown in Fig. 9, show the dramatic energy savings achievable once occupancy is used. Recall, the graphs show negative numbers, so the higher the line, the more efficient the algorithm and the lower the total cost.
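The three-stage occupancy logic can be sketched as a small state machine. The 15-minute grace period matches the text; the one-tick-per-minute granularity is an assumption made for illustration.

```python
GRACE_TICKS = 15          # minutes a room may sit empty before 'Unoccupied'

class RoomOccupancy:
    def __init__(self):
        self.count = 0        # current occupants
        self.empty_ticks = 0  # minutes since the last occupant left

    def tick(self, entered=0, left=0):
        """Advance one minute and return the room's occupancy state."""
        self.count += entered - left
        if self.count > 0:
            self.empty_ticks = 0
            return "Occupied"
        self.empty_ticks += 1
        # The HVAC may only adjust this room after the grace period expires.
        if self.empty_ticks > GRACE_TICKS:
            return "Unoccupied"
        return "Recently Occupied"

room = RoomOccupancy()
assert room.tick(entered=1) == "Occupied"
assert room.tick(left=1) == "Recently Occupied"   # just emptied
states = [room.tick() for _ in range(20)]
assert states[-1] == "Unoccupied"                 # grace period expired
assert room.tick(entered=1) == "Occupied"         # a return resets the state
```

The grace period is what prevents the unhealthy rapid flip-flopping the text warns about when an agent leaves and immediately returns.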

Fig. 9. Energy cost with and without occupancy sensors


Once we had proven these methodologies via these initial experiments, we moved on to modeling a larger series of days. In the next experiment, the algorithm ran for 45 days and learned as it went. This experiment was run for three trials in the first instance and four trials in the second. The first instance used a Maximum Offset algorithm. In this method, unoccupied rooms were shifted to the maximum allowable offset temperature (e.g., a +3◦ offset above the set point). Once the room became occupied, the system would reactively shift back toward the occupant's desired offset temperature. The prediction was that this schema would produce the highest savings, because it was the least concerned with comfort and was rewarded for saving money, not creating comfort. The surprising comparative results follow the second part of this experiment. For now, Fig. 10 shows the results of the Max Offset algorithm for the total energy costs across the 45 days. The three trials show the progression of the learning (Energy 1 is the first trial, Energy 2 builds on that, etc.).

Next, the AI shifted from reactive to proactive. The reward for this machine learning was based on Maximum Comfort (keeping the agents as comfortable as possible, keeping the room temperature closest to their desired offset). In this modality, the AI learns the occupancy behaviors of the agents within the system. Now, once the room is unoccupied, the learning offers new insight into the next behavior. As the unoccupied time progresses, the room is allowed to warm up until the algorithm predicts the return of the occupant (part of the larger artificial intelligence within the SiMAMT system that learns the behaviors of agents within the simulation, predicts those behaviors, and adapts the system to proactively adjust based on predicted actions).
The algorithm calculates the time needed to return the room to the desired temperature from its current temperature and starts the cooling process that long before the predicted arrival of the occupant. If it would take 15 min to return the room to the occupant's offset temperature, the HVAC system would start cooling 15 min before the occupant's predicted arrival. The results of this algorithm, shown in Fig. 11, were predicted to be better than the normal offset, but not as good as the Max Offset formulation. However, the

Fig. 10. Energy cost of max offset algorithm


results were a surprise: the Max Comfort algorithm actually uses less energy than the Max Offset algorithm. The four trials show the progression of the algorithm as it learns. The learning rate is also shown in Fig. 12, indicating the progression of the learning over the four trials.
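The pre-cooling lead-time computation described for the Max Comfort schema can be sketched as follows; the cooling rate, times, and function name are hypothetical.

```python
def precool_start(predicted_arrival_min, room_temp, target_temp,
                  cool_rate_deg_per_min=0.2):
    """Minute of the day at which cooling should begin so that the room
    reaches `target_temp` by the occupant's predicted arrival."""
    degrees = max(0.0, room_temp - target_temp)
    lead = degrees / cool_rate_deg_per_min   # e.g. 3 degrees -> 15 minutes
    return max(0.0, predicted_arrival_min - lead)

# Occupant predicted back at minute 480 (8 hours in); room drifted +3 degrees.
start = precool_start(480, room_temp=75.0, target_temp=72.0)
assert start == 465.0   # begin cooling 15 minutes before predicted arrival
```

Because the room only drifts while demonstrably empty and is recovered just in time, the schema spends less total energy than holding the maximum offset and snapping back reactively, which matches the surprising result above.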

Fig. 11. Energy cost of max comfort algorithm

Each of the progressive experiments demonstrated in greater detail that the system works and that a computer-controlled HVAC system with predictive AI will consistently reduce energy use and save money.

Fig. 12. Learning rate of the max comfort algorithm

Table 1. Total energy consumption comparison

Cooling schema               Total energy   Percent savings
Baseline                     −54,813.30
Predictive w/ Max Offset     −38,891.46     29.0%
Predictive w/ Max Comfort    −36,631.41     33.2%

5 Conclusions

These experiments show that the SiMAMT multi-team, multi-agent framework can produce reliable, repeatable results. The framework is shown to model both the population of agents and the buildings accurately and with high fidelity. Further, the learning algorithms are effective at learning, the AI is effective at predicting, and the simulation framework can model an environment for learning, reacting, predicting, and controlling the HVAC system of the future. The final results of the simulation show that the potential energy savings for this fully modeled system, with both occupancy sensors and predictive analytics, can exceed 30% of energy consumption (and costs), as shown in Table 1.

References

1. Cook, D.: Discovering activities to recognize and track in a smart environment. IEEE Trans. Knowl. Data Eng. 23(4), 527–539 (2011)
2. Cook, D.: Learning setting-generalized activity models for smart spaces. IEEE Intell. Syst. (2011, to appear)
3. Franklin, D.M.: Strategy inference in multi-agent multi-team scenarios. In: Proceedings of the International Conference on Tools for Artificial Intelligence (2016)
4. Franklin, D.M., Hu, X.: SiMAMT: a framework for strategy-based multi-agent multi-team systems. Int. J. Monit. Surv. Technol. Res. 5, 1–29 (2017)
5. Franklin, D.M., Martin, D.: eSense: BioMimetic modeling of echolocation and electrolocation using homeostatic dual-layered reinforcement learning. In: Proceedings of the ACM SE 2016 (2016)
6. Leemput, N., Geth, F., Claessens, B., Van Roy, J., Ponnette, R., Driesen, J.: A case study of coordinated electric vehicle charging for peak shaving on a low voltage grid. In: 2012 3rd IEEE PES Innovative Smart Grid Technologies Europe (ISGT Europe), pp. 1–7, October 2012
7. US Department of Commerce: National Oceanic and Atmospheric Administration (2019)
8. Department of Energy: EnergyPlus - A Whole Building Modeling Software (2018)
9. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, Chaps. 4, 5, 8 (1998)
10. Wang, Z., Wang, S.: Grid power peak shaving and valley filling using vehicle-to-grid systems. IEEE Trans. Power Delivery 28(3), 1822–1829 (2013)
11. Weber, B.G., Mateas, M.: A data mining approach to strategy prediction. In: 2009 IEEE Symposium on Computational Intelligence and Games, pp. 140–147, September 2009
12. Yang, E., Gu, D.: Multiagent reinforcement learning for multi-robot systems: a survey. Technical report, Department of Computer Science, University of Essex (2004)

Prediction of Cumulative Grade Point Average: A Case Study

Anan Sarah, Mohammed Iqbal Hossain Rabbi, Mahpara Sayema Siddiqua, Shipra Banik, and Mahady Hasan

Independent University, Bangladesh, Dhaka, Bangladesh
[email protected], [email protected], [email protected], {banik,mahady}@iub.edu.bd

Abstract. Cumulative Grade Point Average (CGPA) prediction is an important area for understanding tertiary education performance trends of students and for identifying the demographic attributes needed to devise effective educational strategies and infrastructure. This paper analyzes the accuracy of CGPA predictions for students produced by three predictive models, namely the ordinary least squares (OLS) model, the artificial neural network (ANN) model, and the adaptive network-based fuzzy inference system (ANFIS). We have used standardized examination results (Secondary School Certificate and High School Certificate) from secondary and high school boards, together with the current CGPA in their respective disciplines, of 1,187 students from Independent University, Bangladesh, over the period April 2013 to April 2015. Evaluation measures such as mean absolute error, root mean square error, and the coefficient of determination are used to evaluate the performance of the above models. Our findings suggest that these predictive models are unable to predict students' CGPA values accurately with the currently used parameters.

Keywords: Prediction · CGPA · Classical prediction model · Soft computing models · Mean square error · Root mean square error · Coefficient of determination

1 Introduction

Economic, technological, and innovative growth of a society depends heavily on the share of the population receiving tertiary education. To achieve sustainable growth in industry, in both governmental and private sectors, employers search for a highly skilled workforce. Tertiary education builds the specialized individual skills necessary for sustainable economic growth. Cumulative Grade Point Average (CGPA) is a reliable performance metric of a student's overall tertiary academic achievement. CGPA measures students' performance by averaging the grade points obtained from all courses that the students have completed. Academic administrators need to understand the student demographic to provision effective strategies for improvement of the academic curriculum and infrastructure.

© Springer Nature Switzerland AG 2020. K. Arai et al. (Eds.): SAI 2020, AISC 1229, pp. 33–42, 2020. https://doi.org/10.1007/978-3-030-52246-9_3


It's imperative for the administration to identify the crucial underlying factors and their impact on an individual student's CGPA. Identifying students who are at risk of academic failure, and formulating strategies to reduce that risk, will help increase the academic success rate. In this context, it has proven particularly important to construct a predictive model for student academic performance. In this paper, our plan is to find an efficient predictive model for predicting the CGPA of Independent University, Bangladesh (IUB) students. Predictive models are constructed based on the following parameters: (i) prior performance of students in standardized board examinations, namely the SSC (Secondary School Certificate) and HSC (High School Certificate); (ii) university admission test results (Mathematics and English). We have used the Ordinary Least Squares (OLS) method, the ANN model, and the ANFIS model to estimate IUB students' CGPA [2, 12]. To our knowledge, this type of study has not yet been conducted in the context of our country, Bangladesh. Thus, we believe that this study will make an important contribution to the CGPA forecasting literature. The paper gives a brief description of related work on prediction models for CGPA in the literature review section, followed by the methodology of the selected predictive models in Sect. 3. In Sect. 4, we describe the experimental design of the predictive models. The results and discussion section then presents the findings of the implemented models. The paper concludes with the scope of subsequent work.
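As a sketch of the OLS portion of such a study, the snippet below fits a simple regression from a single prior-score feature to CGPA and reports the three evaluation measures named above (MAE, RMSE, R²). The tiny dataset is fabricated for illustration only; it is not the IUB data described in the paper.

```python
import math

def simple_ols(x, y):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    slope = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
             / sum((a - xbar) ** 2 for a in x))
    return ybar - slope * xbar, slope

hsc = [4.0, 4.8, 3.5, 4.9, 4.0, 3.0]       # fabricated HSC GPAs
cgpa = [3.1, 3.7, 2.6, 3.8, 3.0, 2.4]      # fabricated CGPAs
b0, b1 = simple_ols(hsc, cgpa)
pred = [b0 + b1 * h for h in hsc]

# The three evaluation measures used in the study.
mae = sum(abs(p - t) for p, t in zip(pred, cgpa)) / len(cgpa)
rmse = math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, cgpa)) / len(cgpa))
cbar = sum(cgpa) / len(cgpa)
r2 = 1 - (sum((p - t) ** 2 for p, t in zip(pred, cgpa))
          / sum((t - cbar) ** 2 for t in cgpa))
assert rmse >= mae and 0.0 <= r2 <= 1.0
```

The ANN and ANFIS models play the same role as this regression, just with nonlinear function approximators, and are compared on the same three measures.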

2 Background Details and Related Work

Numerous works [1–7] in the literature address CGPA prediction and improving student learning using techniques such as neural networks, Bayesian models, maximum-weight dependence trees and many others. Hien and Haddawy [1] used a Bayesian model to predict the CGPA of Master’s and Doctoral applicants from their background at the time of admission, with the mean absolute error and relative mean square error as evaluation measures; their results show that the Doctoral model performed better than the other model. Oladokun et al. [2] built an artificial neural network (ANN) model to determine students’ performance based on several factors that affect a student’s academic results; the network, a multilayer perceptron whose performance was measured with the mean square error, achieved 74% accuracy. Wang and Mitrovic [3] also established an ANN-based model to foresee students’ results: a feed-forward ANN with four inputs, a single hidden layer and a single output, using delta-bar-delta back-propagation (BP) and a linear tanh transfer function, achieved a prediction accuracy of 98.06%. Gedeon and Turner [4] used a full causal index method based on an ANN to calculate the probable final grade to be achieved by a student from their current performance and partial marks. A back-propagation trained feed-forward ANN was trained on an array of 153 samples, the class results of an undergraduate Computer Science subject; the applied model gave correct output in 94% of cases. Fausett and Elwasif [5] trained ANN models to foresee students’ performance in an assignment trial. Two ANNs, back-propagation and counter-propagation, were trained to anticipate students’ likely performance in a

Prediction of Cumulative Grade Point Average

35

subject based on their placement test responses. Their findings show that the BP networks achieved a very high level of accuracy in predicting student performance. Zollanvari et al. [6] created a predictive model of GPA with a maximum-weight first-order dependence tree structure; the assembled model distinguishes the training data with 82% precision. Rusli et al. [7] created three predictive models (namely logistic regression, an ANN and an adaptive neuro-fuzzy inference system (ANFIS)) based on students’ previous performance and the first semester’s CGPA of the undergraduate degree. The models’ efficiency was estimated by the root mean squared error, and their results show that ANFIS is superior to the other models. To our knowledge, these are the studies available in the recent literature.

3 Proposed Approach

In this case study, three methods have been proposed based on the nature of the data. These forecasting models are also widely used in diverse scenarios [14, 15]. The methods are the Ordinary Least Square (OLS) method, the Artificial Neural Network (ANN) method and the Adaptive Network based Fuzzy Inference System (ANFIS) method; this linear regression model and the two neural network models were chosen in order to identify the algorithm that works best as a predictive model. The data set used in this study consists of the CGPA (which measures students’ performance by averaging the grade points obtained over all completed courses) of IUB students from April 2013 to April 2015 whose number of completed credits is greater than or equal to 90 [Data source: IUB database (www.iras.iub.edu.bd)]. The data set consists of each student’s GPA in the SSC (SSC_GPA) and HSC (HSC_GPA) examinations and university admission test marks in English (IUBAENG_S) and Mathematics (IUBAMAT_S), along with the student’s current CGPA. The size of the data set is 1187. Before applying our selected prediction models, we computed a numerical summary (to give an idea of the basic properties of the raw data); the results are tabulated in Table 1.

Table 1. Numerical summary of the considered data set

                       CGPA   IUBAENG_S  IUBAMAT_S  HSC_GPA  SSC_GPA
Minimum                1.90    8.00       0.00       3.20     3.44
Maximum                3.99   48.00      50.00       5.00     5.00
Mean                   2.81   24.78      20.05       4.51     4.65
Standard deviation     0.43    7.65       8.59       0.46     0.39
Skewness               0.34    0.63       0.75      −0.57    −1.01
Correlation with CGPA  –       0.30       0.11       0.32     0.34

From Table 1, it is observed that the minimum CGPA is 1.90 and the maximum is 3.99, with most students’ CGPA around the mean of 2.81. To examine how much individual CGPAs deviate from this mean, we have


calculated the standard deviation of CGPA, which is 0.43, so CGPA typically ranges from about 2.38 to 3.24 (mean ± one standard deviation). In addition, to check whether the considered variables are symmetric, we calculated the coefficient of skewness. For example, the skewness of CGPA is 0.34, meaning that CGPA is positively skewed; that is, most students’ CGPA lies below 2.81. The skewness of HSC_GPA is −0.57, meaning that HSC_GPA is negatively skewed, so most students’ HSC_GPA lies above the mean of 4.51. To understand the relationship of CGPA with the other considered variables, we used the graphical method (scatter diagrams, see Fig. 1(a–d)) and numerically calculated the coefficient of correlation. For example, Fig. 1(c) shows a weak positive relationship between CGPA and SSC_GPA: the correlation coefficient of 0.34 indicates only a slight tendency for CGPA to increase as SSC_GPA increases.
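The summary quantities discussed above (mean, standard deviation, skewness and correlation with CGPA) can be computed with a few lines of code. The sketch below is illustrative only, using a small hypothetical sample rather than the actual IUB data set:

```python
# Illustrative sketch (not the authors' code): computing Table 1 style
# summary statistics for a small hypothetical sample of student records.
from statistics import mean, pstdev

def skewness(xs):
    """Moment coefficient of skewness: E[(x - mu)^3] / sigma^3."""
    mu, sd = mean(xs), pstdev(xs)
    return sum((x - mu) ** 3 for x in xs) / (len(xs) * sd ** 3)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

cgpa = [2.1, 2.8, 3.0, 3.5, 2.6, 3.9]   # hypothetical CGPA values
ssc  = [3.5, 4.2, 4.6, 5.0, 4.4, 5.0]   # hypothetical SSC_GPA values

print(round(mean(cgpa), 2), round(pstdev(cgpa), 2))
print(round(skewness(cgpa), 2), round(pearson(cgpa, ssc), 2))
```

A positive skewness indicates most values fall below the mean, matching the interpretation of CGPA above.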

Fig. 1. Scatter diagrams between CGPA and each considered independent variable: (a) IUBAMAT_S, (b) IUBAENG_S, (c) SSC_GPA, (d) HSC_GPA

Figure 2 represents the general approach used for all three methods. The first step was to collect the data and filter them as needed. The filtered data were then used to train and test the three proposed methods. The results, i.e. the accuracy levels of the proposed methods, were then compared and analysed to reach a decision.

Fig. 2. General approach used for the methods: Data Collection → Data Filtering → Training and Testing with OLS, ANN and ANFIS → Comparing the results → Decision

The widely used forecasting models applied here to predict CGPA are briefly discussed as follows:

3.1 The Ordinary Least Square (OLS) Model

OLS is a technique used to estimate the unknown parameters of a population linear regression model. To understand this model, consider the following:

CGPA_i = β0 + X_i β1 + e_i ,  i = 1, 2, . . . , n

where CGPA is the dependent variable, β0 is a constant, X is a vector of the selected independent variables (SSC_GPA, HSC_GPA, IUBAENG_S, IUBAMAT_S), β1 is the vector of parameters of the independent variables, and e_i ~ iid(μ, σ²), i.e. e_i is independent and identically distributed (iid) with mean μ and variance σ². In matrix form, we can rewrite the above equation as CGPA = Xβ + e. Our target is to estimate β from the collected data by minimizing the error sum of squares with respect to the parameters, i.e. setting ∂(eᵀe)/∂β = 0. Solving these equations gives β̂ = (Xᵀ X)⁻¹ Xᵀ CGPA. Thus, the estimated model becomes ĈGPA = X β̂, which will be used to predict CGPA based on our selected independent variables.

3.2 The ANN Model

McCulloch and Pitts [8] proposed this model, which contains the following processing steps: (i) receiving inputs, (ii) assigning proper weight coefficients to the inputs, (iii) computing the weighted sum of the inputs, (iv) comparing this sum with some threshold, and (v) producing a suitable output value (see Fig. 3).

Figure 3 presents an ANN configuration with one input layer, two hidden layers (with a sufficient number of neurons) and one output layer for the reader’s better understanding.


Fig. 3. An ANN design
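To complement Fig. 3, the gradient-descent back-propagation training loop used by such networks can be sketched on a toy network. This is illustrative only: it is not the paper’s 4:10:6:1 configuration, omits momentum, and fits a synthetic target rather than CGPA data:

```python
# Minimal back-propagation sketch (illustrative toy network):
# 2 inputs -> 3 tanh hidden neurons -> 1 linear output, full-batch training.
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(100, 2))
t = (0.5 * X[:, 0] - 0.3 * X[:, 1]).reshape(-1, 1)   # toy regression target

W1 = rng.normal(0, 0.5, (2, 3)); b1 = np.zeros(3)    # input -> hidden
W2 = rng.normal(0, 0.5, (3, 1)); b2 = np.zeros(1)    # hidden -> output
lr = 0.1                                             # learning rate

for _ in range(2000):
    h = np.tanh(X @ W1 + b1)            # forward pass: hidden activations
    y = h @ W2 + b2                     # forward pass: network output
    err = y - t                         # output-layer error
    # Backward pass: propagate the error and adjust the weights
    dW2 = h.T @ err / len(X); db2 = err.mean(0)
    dh = (err @ W2.T) * (1 - h ** 2)    # tanh derivative
    dW1 = X.T @ dh / len(X); db1 = dh.mean(0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

mse = float(np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - t) ** 2))
print(f"final MSE: {mse:.4f}")
```

The loop mirrors the description in this section: compare network output with the target, compute the output-layer error, and propagate it backwards to update the weights until the output matches the target closely.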

The net input is defined as n = WX + b, where R is the number of units in the input vector and N is the number of neurons in the hidden layers. The training algorithm is standard back-propagation, which uses gradient descent to minimize the error over all training data. During training, each desired output CGPA is compared with the actual output CGPA to compute the error at the output layer. The backward pass propagates this error back and adjusts the weights. Thus, the network is adjusted based on a comparison of the output CGPA and the target until the network output matches the target. After the training process has ended, the network with the learned weights can be applied to a test set different from the training data in order to predict CGPA from our selected independent variables. For details, see McCulloch and Pitts [8].

3.3 The Adaptive Network Based Fuzzy Inference (ANFIS) Model

Based on fuzzy set theory and fuzzy logic, Jang [9] developed this important soft computing forecasting model. A brief description follows (for details, see Jang [9]): ANFIS is a combination of an ANN and a fuzzy inference system (FIS), in which the ANN learning algorithm is used to find the parameters of the FIS. An ANFIS architecture is presented in Fig. 4, which shows that this design has 5 layers (1 input layer, 3 hidden

Fig. 4. An ANFIS architecture


layers that represent membership functions and fuzzy rules, and 1 output layer). Usually, ANFIS uses the Sugeno FIS model as its learning structure. In Fig. 4, the circular nodes represent fixed nodes, whereas the square nodes have parameters to be learnt. Each layer in the figure is associated with a particular step in the FIS (details in Jang [9]). The output structure can be rearranged as CGPA = XW, where X = [W1 x, W1 y, W1 , W2 x, W2 y, W2 ] and W = [p1, q1, r1, p2, q2, r2]ᵀ (details in Jang [9]). When input–output training patterns exist, the vector W can be solved using the ANFIS learning algorithm.
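The linear step CGPA = XW above can be sketched concretely: with the premise (membership-function) parameters held fixed, the consequent parameters W of a two-rule Sugeno model are obtained by least squares, as in ANFIS hybrid learning. All membership centres, widths and the toy target below are assumptions for the demo:

```python
# Illustrative sketch of the least-squares step in ANFIS hybrid learning:
# solve W = [p1, q1, r1, p2, q2, r2]^T from CGPA = X W for a two-rule model.
import numpy as np

def gauss(v, c, s):
    """Gaussian membership function with centre c and width s."""
    return np.exp(-((v - c) ** 2) / (2 * s ** 2))

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 300)                   # e.g. admission Maths mark
y = rng.uniform(3, 5, 300)                    # e.g. HSC_GPA
target = 0.1 * x + 0.4 * y + 0.5              # toy "CGPA" to be fitted

w1 = gauss(x, 2.0, 2.0) * gauss(y, 3.5, 0.7)  # firing strength of rule 1
w2 = gauss(x, 8.0, 2.0) * gauss(y, 4.5, 0.7)  # firing strength of rule 2
n1, n2 = w1 / (w1 + w2), w2 / (w1 + w2)       # normalised strengths

# Design matrix X = [n1*x, n1*y, n1, n2*x, n2*y, n2]
Xmat = np.column_stack([n1 * x, n1 * y, n1, n2 * x, n2 * y, n2])
W, *_ = np.linalg.lstsq(Xmat, target, rcond=None)
pred = Xmat @ W
rmse = float(np.sqrt(np.mean((pred - target) ** 2)))
print(f"fit RMSE: {rmse:.4f}")
```

Because the toy target is itself linear, the least-squares step recovers it almost exactly; on real data, this step alternates with gradient updates of the membership-function parameters.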

4 Results

To develop a predictive model that estimates CGPA efficiently from our selected inputs, a trial and error approach is used. Two different sets of data are used for training and testing (for details, see [13]). To compare the performance of ANFIS, CGPA is also predicted using the ANN and the OLS models. An ANN topology of 4:10:6:1, a learning rate of 0.15 and a momentum parameter of 0.98 were chosen by trial and error. The learning rate parameter controls the step size in each iteration (for details, refer to [10, 11]); the momentum parameter helps avoid getting stuck in local minima or slow convergence (details in [12]). We also performed the same prediction using the OLS model, minimizing the error sum of squares with respect to each parameter for the considered inputs. The performance of the selected prediction models has been measured by the mean absolute error (MAE), mean absolute percentage error (MAPE), root mean square error (RMSE), root mean square percentage error (RMSPE) and the coefficient of determination R². MAE (MAPE) and RMSE (RMSPE) measure the accuracy of prediction by representing the degree of scatter, while R² measures the accuracy of prediction of the trained models. Lower values of MAE (MAPE) and RMSE (RMSPE) and higher R² values indicate better prediction (details in [12]). In Table 2, we report the performance of the considered prediction models on the CGPA values using the selected error measures. For clarity, the selected error measures are visualized in Figs. 5, 6 and 7, respectively. It is observed that for both error measures, the ANFIS prediction model has the lowest MAPE/RMSPE compared to the other selected prediction models, ANN and OLS. Figure 7 presents the coefficient of determination for all chosen prediction models; the R² value for ANFIS (indicating a prediction accuracy of about 29%) again outperforms ANN and OLS [12].
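The evaluation measures above can be written as plain functions. The sketch below is illustrative, with hypothetical actual/predicted values (y denotes actual CGPA values, p predicted values):

```python
# Sketch of the evaluation measures used in this section.
def mae(y, p):   return sum(abs(a - b) for a, b in zip(y, p)) / len(y)
def mape(y, p):  return 100 * sum(abs((a - b) / a) for a, b in zip(y, p)) / len(y)
def rmse(y, p):  return (sum((a - b) ** 2 for a, b in zip(y, p)) / len(y)) ** 0.5
def rmspe(y, p): return 100 * (sum(((a - b) / a) ** 2 for a, b in zip(y, p)) / len(y)) ** 0.5

def r2(y, p):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ybar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, p))
    ss_tot = sum((a - ybar) ** 2 for a in y)
    return 1 - ss_res / ss_tot

actual    = [2.5, 3.0, 3.5, 2.8]   # hypothetical CGPA values
predicted = [2.4, 3.2, 3.4, 2.9]
print(round(mae(actual, predicted), 3), round(rmse(actual, predicted), 3))
```

Lower MAE/MAPE/RMSE/RMSPE and a higher R² indicate better prediction, as used in Table 2.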
According to our study, it may be concluded that even the best-performing model, ANFIS, does not predict CGPA with adequate accuracy from the given inputs.

Table 2. Performance measures of selected systems.

            OLS               ANN               ANFIS
            Train    Test     Train    Test     Train    Test
MAE         0.3038   0.3236   0.2998   0.3181   0.2634   0.2823
MAPE (%)    11.3601  11.7203  11.2245  12.0256  10.1334  10.9684
RMSE        0.3723   0.3959   0.3689   0.3904   0.3470   0.3725
RMSPE (%)   14.2300  14.6900  13.6732  13.4268  11.0453  11.9674
R²          0.2077   0.1414   0.2222   0.2576   0.2467   0.2890

Fig. 5. MAPE of training and testing data

Fig. 6. RMSPE of training and testing data

Fig. 7. R-square value of training and testing data

5 Conclusion

This paper aims to determine whether CGPA values can be predicted from selected inputs using several prediction models, namely the OLS model, the ANN model and the ANFIS model. We measured the performance of the considered models using various evaluation measures: MAE, MAPE, RMSE, RMSPE and the coefficient of determination. We observed that the ANFIS prediction model has the lowest MAE/MAPE for both testing and training data compared to the ANN and OLS prediction models. We also found that, with respect to RMSE/RMSPE, the ANFIS prediction model performs better than the other two selected models. The higher coefficient of determination for the ANFIS model likewise indicates a better prediction rate than the other selected models. Nevertheless, our findings based on the selected inputs lead us to conclude that none of the ANFIS, ANN and OLS prediction models is able to predict CGPA with good accuracy. We believe the results of this study will be a helpful contribution to the forecasting literature. There are other prediction models besides the aforementioned three, such as the weighted least squares model, Bayesian prediction models, decision tree models, genetic algorithm based prediction models and others. Our next plan is to apply such models to qualitative data along with the quantitative data, which might improve the prediction accuracy for CGPA. The qualitative data might include information on students’ psychological state, physical condition, and economic and social circumstances, which can be helpful in our future research.

References 1. Hien, N.T.N., Haddawy, P.: A decision support system for evaluating international student applications. In: 37th ASEE/IEEE Frontiers in Education Conference, Milwaukee (2007) 2. Oladokun, V., Adebanjo, A., Charles-Owaba, O.: Predicting students’ academic performance using artificial neural network. Pac. J. Sci. Technol. 9 (2008) 3. Wang, T., Mitrovic, A.: Using neural networks to predict student’s performance. In: International Conference on Computers in Education, Auckland, New Zealand (2002)


4. Gedeon, T., Turner, S.: Explaining student grades predicted by a neural network. In: Neural Networks 1993, IJCNN 1993, Nagoya, Japan (1993) 5. Fausett, L., Elwasif, W.: Predicting performance from test scores using backpropagation and counterpropagation. In: 1994 IEEE International Conference on Neural Networks. IEEE World Congress on Computational Intelligence, Orlando, FL, USA (1994) 6. Zollanvari, A., Kizilirmak, R.C., Kho, Y.H.: Predicting students’ CGPA and developing intervention strategies based on self-regulatory learning behaviors. IEEE Access 5, 23792–23802 (2017) 7. Rusli, N.M., Ibrahim, Z., Janor, R.M.: Predicting students’ academic achievement: comparison between logistic regression, artificial neural network and neuro-fuzzy. In: International Symposium on Information Technology, Kuala Lumpur, Malaysia (2008) 8. McCulloch, W.S., Pitts, W.: A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5, 115–133 (1943) 9. Jang, J.S.R.: ANFIS: adaptive-network-based fuzzy inference system. IEEE Trans. Syst. Man Cybern. 23, 665–685 (1993) 10. Fuzzy Logic Toolbox for Use with MATLAB, MathWorks, New York (2015) 11. Neural Network Toolbox for Use with MATLAB, MathWorks, New York (2015) 12. Banik, S., Chanchary, F.H., Rouf, A.R., Khan, K.: Modeling chaotic behavior of Dhaka stock market index values using the neuro-fuzzy model. In: 10th International Conference on Computer and Information Technology (2007) 13. Banik, S., Chanchary, F.H., Khan, K., Rouf, A.R., Anwer, M.: Neural network and genetic algorithm approaches for forecasting Bangladeshi monsoon rainfall. In: 11th International Conference on Computer and Information Technology (2008) 14. Banik, S., Khan, A.F.M.K.: Forecasting US NASDAQ stock index values using hybrid forecasting systems. In: 18th International Conference on Computer and Information Technology (ICCIT) (2015) 15. Banik, S., Anwer, M., Khan, A.F.M.K.: Soft computing models to predict daily temperature of Dhaka.
In: 13th International Conference on Computing and Information Technology (ICCT) (2010)

Warehouse Setup Problem in Logistics: A Truck Transportation Cost Model Rohit Kumar Sachan(B) and Dharmender Singh Kushwaha Motilal Nehru National Institute of Technology Allahabad, Allahabad, India [email protected]

Abstract. Fast, efficient, timely delivery of goods and optimal transportation cost are the major challenges in the logistics industry. A wellplanned transportation system overcomes these challenges and reduces the operational and investment costs of a logistics company. This transportation system is based on the Warehouse-and-Distribution Center (W&DC) network, which is similar to a Hub-and-Spoke (H&S) network. This paper presents a new hub location model based on the truck transportation cost instead of the unit cost of goods transportation, since this is more suitable for the real-world goods transportation scenario from the perspective of a logistics company. The Anti-Predatory Nature Inspired Algorithm (APNIA) is used to find the optimal solution of the proposed model, in terms of the W&DC (or H&S) network and the respective total logistics cost. The proposed approach first finds the locations of warehouses and DCs, and then allocates the DCs to warehouses in order to reduce the total logistics cost. Experimental evaluations are conducted on a real-life Warehouse Setup Problem (WSP) of 10 locations of Kanpur city, India. They reveal that the proposed Truck Transportation Cost based Model (TTCM) estimates the overall logistics cost approximately 2%–10% more accurately from the perspective of a logistics company. Keywords: Anti-predatory NIA · Hub Location Problem · Transportation · Logistics · Nature-inspired algorithms · Optimization

1 Introduction

Logistics is the management of the flow of goods from origin to destination. It includes various activities like packaging, order processing, material handling, transportation, inventory control and warehousing [3]. Of these, transportation and warehousing are the two key activities [1]: transportation is responsible for the end-to-end movement of goods, and warehouses provide intermediate storage of goods during transportation [1]. According to the Knight Frank report-2018 [2], the logistics cost in India is 13–17% of the Gross Domestic Product (GDP), which is nearly double the 6–9% logistics-cost-to-GDP ratio of developed countries such as the US, Hong Kong and France. © Springer Nature Switzerland AG 2020 K. Arai et al. (Eds.): SAI 2020, AISC 1229, pp. 43–62, 2020. https://doi.org/10.1007/978-3-030-52246-9_4

44

R. K. Sachan and D. S. Kushwaha

This statistic signifies the absence of an efficient transportation system in India. The other major challenges in Indian logistics are fast, reliable, on-time delivery and seamless transportation of goods at optimal logistics cost [17]. These challenges can be overcome by an efficient transportation system, whose sole objective translates directly into savings on logistics cost [2]. The most effective way to build such a system is to develop a Warehouse-and-Distribution Center (W&DC) network [32] between cities/locations, commonly termed a logistics network. In a W&DC network, warehouses act as distribution/dispatch/logistics hubs and return centers and are responsible for activities like consolidation, sortation, connectivity, switching and distribution of goods between stipulated origin and destination (O-D) points, while Distribution Centers (DCs) act as origin or destination points of goods and are responsible for the booking, distribution and packaging of goods [32]. An end-to-end transportation system based on the W&DC network is illustrated in Fig. 1, which shows the movement of goods from an origin DC to a destination DC via warehouses.

Fig. 1. An end-to-end transportation system based on the W&DC

A W&DC network is similar to a Hub-and-Spoke (H&S) network [16], where a warehouse acts as a hub and a DC acts as a spoke. In an H&S network, a few cities/locations act as hubs and the remaining ones act as spokes. The spokes are connected to the hubs such that all flow of goods passes through the hub(s) before reaching the destination spoke. The H&S network satisfies the demand for flow of goods via a smaller set of transportation links between O-D pairs than a fully connected network [30]. Finding a W&DC (or H&S) network from a given set of locations is a two-step process [5]: the first step is to identify the locations of warehouses and distribution centers, and the second is to allocate the DCs to the identified warehouses in such a way that the total logistics cost of routing the flow of goods for every O-D pair is optimized [7]. The problem of finding a W&DC network is named the Warehouse Setup Problem (WSP).
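The two-step process above (choose hub locations, then allocate each spoke to a hub) can be sketched by brute force on a toy instance. Real instances require meta-heuristics, and the distance matrix below is entirely hypothetical:

```python
# Illustrative brute-force sketch of the two-step hub location process:
# (1) pick P hub locations, (2) allocate every spoke to its nearest hub,
# and keep the cheapest overall configuration.
from itertools import combinations

# Symmetric distance matrix for 5 hypothetical locations
D = [
    [0, 4, 7, 3, 8],
    [4, 0, 2, 6, 5],
    [7, 2, 0, 5, 3],
    [3, 6, 5, 0, 9],
    [8, 5, 3, 9, 0],
]
P = 2  # number of warehouses (hubs) to open

def allocation_cost(hubs):
    """Sum of spoke-to-nearest-hub distances for a given hub set."""
    return sum(min(D[i][h] for h in hubs) for i in range(len(D)) if i not in hubs)

best = min(combinations(range(len(D)), P), key=allocation_cost)
print(best, allocation_cost(best))
```

Enumerating all hub sets is only feasible for tiny instances; the number of candidate sets grows combinatorially with the number of locations, which motivates the meta-heuristic approaches discussed below.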


The WSP is an application of the Hub Location Problem (HLP). The HLP finds a solution in terms of an H&S network with optimal logistics cost. The HLP is useful in logistics problems where some goods must be transported between every O-D pair and where it is impossible, too expensive or otherwise unreasonable to establish a direct transportation link between each pair of locations [14]. Several HLP models have been proposed in the past [11,15]. These models are broadly classified into five major categories: p-Hub Median Problem (pHMP), p-Hub Location Problem (pHLP), p-Hub Center Problem (pHCP), Hub Covering Problem and Hub Arc Location Problem (HALP) [6]. Further, these HLPs are classified based on the area of solution (discrete or continuous), type of objective function (minimax or minisum), assessment policy of the number of hubs (exogenous or endogenous), capacity of hubs (unlimited or limited), cost of hub establishment (no cost, fixed cost or variable cost), node allocation scheme (single or multiple) and cost of connection establishment between spokes and hubs (no cost, fixed cost or variable cost) [15]. Some of these models are: Uncapacitated Single Allocation p-HMP (USApHMP) [21], Uncapacitated Multiple Allocation HLP with Fixed Costs (UMAHLP-FC) [20], Capacitated Single Allocation HLP (CSAHLP) [14], Uncapacitated Multiple Allocation HLP (UMAHLP) [12], Capacitated Multiple Allocation HLP (CMAHLP) [8], Single Allocation p-HCP (SApHCP) [18], Capacitated Asymmetric Allocation HLP (CAAHLP) [29], Multiple Allocation p-HMP under Intentional Disruptions (MApHMP-I) [23], Capacitated Multiple Allocation Cluster HLP (CMACHLP) [19], Uncapacitated Multiple Allocation HLP (UMAHLP) [22], Uncapacitated Single Allocation p-HLP with Fixed Cost (USApHLP-FC) [24]. These models have different mathematical formulations and associated constraints for calculating the logistics cost.
To the best of our knowledge, all the aforementioned models/formulations of the HLP are based on the unit cost of goods transportation; none is based on a real-world goods transportation scenario that incorporates the truck transportation cost. Also, none of these models includes labour cost in the total logistics cost; most use the sum of the transportation cost and the fixed establishment cost of hubs as the total cost. Unit-cost based models are not well suited to the real-world goods transportation scenario from the perspective of a logistics company: a logistics company transports goods between locations via trucks, and often a few underloaded trucks also travel between locations. The transportation cost of these underloaded trucks is not captured by a unit-cost based model, yet the logistics company still has to bear the cost of the trucks’ underutilized space. For this reason, there is a need for a new HLP model that addresses this issue. This different charging issue is illustrated in Fig. 2. Due to the possibility of several solutions, finding a solution with optimal logistics cost and H&S network is challenging; as the number of location points increases, the complexity of the problem increases exponentially. Operations research and heuristic methods are best suited to solving small problems, but


Fig. 2. Illustration of different charging issue

when the number of locations is high, meta-heuristic algorithm based approaches are used. A meta-heuristic algorithm finds the optimal solution starting from randomly generated initial solutions, guided by the fitness value (here, the logistics cost) of the problem; the mathematical formulation of the HLP model is used as the fitness/objective function during the optimization process. Commonly used meta-heuristic algorithms for HLPs are the Genetic Algorithm (GA) [24,31], Particle Swarm Optimization (PSO) [4,22], Ant Colony Optimization (ACO) [18,19], Simulated Annealing (SA) [9,10,13] and the Anti-predatory NIA (APNIA) [28]. This paper proposes a new model for the real-world goods transportation scenario that is based on the transportation cost of trucks, named the “Truck Transportation Cost based Model (TTCM)”. The proposed TTCM formulation of the HLP is solved using APNIA [25]. For experimental evaluation, a WSP with 10 location points is created based on the real-world scenario of Kanpur city, India, and the obtained results are compared with the Unit Cost of Goods Transportation based Model (UCGTM). The rest of the paper is organized as follows: Sect. 2 presents the mathematical formulation of the proposed models with general assumptions for the HLP. Section 3 briefly describes APNIA, and the APNIA based approach for solving the HLP is presented in Sect. 4. Section 5 discusses the experimental evaluation of APNIA on UCGTM and TTCM on a real-life WSP. The analysis of the obtained results is presented in Sect. 6. Section 7 concludes with future remarks.
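The generic meta-heuristic loop described above can be sketched as follows. The fitness function here is a toy stand-in for a logistics-cost objective, not an HLP model:

```python
# Generic meta-heuristic skeleton: random initial solutions, a fitness
# (cost) function, and greedy acceptance of perturbed candidates.
import random

random.seed(42)

def fitness(x):
    """Toy cost function with minimum 0 at x = 3 (stand-in for logistics cost)."""
    return (x - 3) ** 2

# Randomly generated initial solutions
population = [random.uniform(-10, 10) for _ in range(20)]
best = min(population, key=fitness)

for _ in range(500):
    candidate = best + random.gauss(0, 0.5)   # perturb the incumbent
    if fitness(candidate) < fitness(best):    # keep the better solution
        best = candidate

print(round(best, 2))
```

GA, PSO, ACO, SA and APNIA all elaborate this basic pattern, differing mainly in how candidate solutions are generated and accepted.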

2 Proposed Models for Hub Location Problem (HLP)

Out of the dozen variants of HLP models, the Uncapacitated Single Allocation p-Hub Location Problem with Fixed Cost (USApHLP-FC) [24] is an HLP variant with no capacity constraints on hubs (i.e. unlimited capacity), single allocation constraints on spokes (i.e. a spoke is allocated to only one hub) and a fixed establishment/development cost for each hub location. The aim of USApHLP-FC is to find an optimal H&S network to route the flow of goods between O-D pairs of location points so that the total logistics cost is optimal (minimum). This logistics cost includes the total transportation cost and the total hub establishment cost. This section proposes two models and their mathematical formulations based on USApHLP-FC. The first model, UCGTM, is an extension of the unit cost of goods transportation model which incorporates the labour cost in the total logistics


cost. The second model, TTCM, is a novel one for the real-world goods transportation scenario, based on the truck transportation cost instead of the unit cost of goods transportation. Both models consider the sum of the transportation cost of goods (TC), the labour cost of loading and unloading of goods (LC), and the fixed hub and spoke establishment cost (HSC) as the total logistics cost. Both models share some common assumptions, decision variables and constraints, as listed below:

General Assumptions

– Number of hub locations is fixed and known.
– Hubs do not have capacity constraints.
– A spoke location is allocated to only one hub.
– All hubs are connected to each other via direct links.
– Direct connection between the spokes is not allowed.
– Distance between every pair of O-D locations is known.
– Flow of goods between every pair of O-D locations is known.
– Every location has a hub establishment cost which is fixed and known.
– Every location has a spoke rent cost which is fixed and known.
– Labour wage and labour productivity are known.
– Transportation cost between hubs is always lower than the transportation cost between a hub and a spoke.
– Two different capacity trucks are used for transportation of goods: lower capacity trucks between hub and spoke, and higher capacity trucks between hubs.

Output (or Decision) Variables

– Yk: a hub allocation variable
– Xij: a spoke to hub allocation variable

Yk = 1 if a hub is located at node k, 0 otherwise   (1)

Xij = 1 if node i is connected to a hub located at node j, 0 otherwise   (2)

Constraints

Σ_{j=1}^{N} Xij = 1   ∀i   (3)

Σ_{k=1}^{N} Yk = P   (4)

Yk and Xij ∈ {0, 1}   (5)

Xijkm ≥ 0   (6)
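For a candidate solution, the allocation constraints above can be checked programmatically. The sketch below uses hypothetical matrices for N = 4 locations and P = 2 hubs; the final check (spokes may only attach to open hubs) is implied by the single-allocation scheme rather than listed explicitly:

```python
# Minimal feasibility check for a candidate (Y, X) against constraints (3)-(5).
N, P = 4, 2
Y = [1, 0, 1, 0]      # hubs opened at locations 0 and 2
X = [
    [1, 0, 0, 0],     # location 0 assigned to hub 0 (itself)
    [1, 0, 0, 0],     # spoke 1 -> hub 0
    [0, 0, 1, 0],     # location 2 assigned to hub 2 (itself)
    [0, 0, 1, 0],     # spoke 3 -> hub 2
]

def feasible(Y, X):
    single = all(sum(row) == 1 for row in X)          # constraint (3)
    p_hubs = sum(Y) == P                              # constraint (4)
    binary = all(v in (0, 1) for v in Y + [v for row in X for v in row])  # (5)
    # Implied: a node can only be connected to an opened hub
    to_hub = all(X[i][j] <= Y[j] for i in range(N) for j in range(N))
    return single and p_hubs and binary and to_hub

print(feasible(Y, X))
```

Such a check is typically embedded in a meta-heuristic so that infeasible candidate networks are repaired or rejected before their cost is evaluated.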


Constraint (3) ensures that every spoke location will be allocated to exactly one hub, and constraint (4) ensures that exactly P hubs are selected. Constraints (5) and (6) are the standard integrality constraints. The details of both models are discussed in the following subsections.

2.1 Model 1. Unit Cost of Goods Transportation Based Model (UCGTM)

The mathematical formulation of the objective function of UCGTM is discussed in this subsection, covering the input variables and the objective function. This model is mainly based on the unit cost of goods transportation and a discount factor. The unit cost of goods transportation represents the monetary expense required to transport one kilogram of goods over one kilometer between a warehouse and a DC; the discount factor represents a reduction in the unit cost of goods transportation between warehouses with respect to the unit cost between a warehouse and a DC.

Input Variables

– N: number of location points
– P: number of hubs
– Dij: distance between the O-D locations (in kilometers (km))
– Wij: flow of goods between the O-D locations (in kilograms (kg))
– Fk: hub establishment cost at location k (in rupees (Rs.))
– Sk: spoke rent at location k (in rupees)
– LW: labour wage (in rupees)
– LP: labour productivity (kilograms per day)
– UC: unit cost of goods transportation (in rupees)
– Cij: unit cost of goods transportation between O-D locations (Cij = Dij × UC)
– α: discount factor for hub-to-hub transportation (0 < α

4.8 difference in WARDS. This is equivalent to three times the mean “optimality” score and allows for an effective margin of at most 3 mis-predicted variables, such as minutes, detection, item or level reveals. For example, a WARDS prediction of 3.9 compared against an actual measure of 6.7 counts as a correct prediction because it is within the error tolerance. Likewise, when analysing the results of the Artificial Neural Networks that had the individual variables as their targets, an error tolerance of 1 was used. Table 4 describes the performance as well as the training iterations for each of the described networks. As the table demonstrates, detection prediction provided the highest accuracy, while duration had the lowest. This is also reflected in Table 3, where, in order to achieve its accuracy, a different training function was employed, resulting in a significantly different architecture; for this reason, the number of training iterations for duration was also notably larger. This suggests that the main factor reducing the accuracy at present is the complex state space of the game: small variations in decision making can alter the outcome of a situation drastically, so accurately predicting the game state several minutes in advance becomes difficult.

Table 4. Neural networks result summary

Epoch Accuracy

WARDS

87

63.3%

Detection

93

69.3%

Duration

1339

55.7%

Item reveal

103

64.9%

Level reveal

92

65.3%

Consequence kills

97

59.6%
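The tolerance-based scoring described above can be sketched as follows. This is a minimal illustration, not the authors' code; the function name is ours, while the 4.8 threshold and the worked example come from the text:

```python
def tolerance_accuracy(predictions, actuals, tolerance):
    """Fraction of predictions within `tolerance` of the actual value."""
    correct = sum(1 for p, a in zip(predictions, actuals) if abs(p - a) <= tolerance)
    return correct / len(predictions)

# Worked example from the text: a WARDS prediction of 3.9 against an actual
# measure of 6.7 differs by 2.8, which is within the 4.8 tolerance.
print(tolerance_accuracy([3.9], [6.7], tolerance=4.8))  # 1.0
```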

Due to the novelty of the model, particularly its ability to report performance during a running game, there is no consistent baseline to compare against. We have looked at similar prediction algorithms that are aimed at different aspects of the game [3,12,20]. Although none of those prediction models have looked at warding, or at a similar time frame of approximately six minutes, we found the overall performance of the network to be in line with their predictive capabilities. Furthermore, we produced two simple baselines: (1) a random-guess algorithm, and (2) a small improvement on it that weights guesses closer to the mean more heavily. We found that baseline (1)


produced a very low guessing accuracy of 0.3% when the same error tolerance was applied. This performance matches what is expected given the continuous value space. In order to produce a better baseline, we modified the algorithm to produce random guesses weighted more heavily towards the mean (refer to Sect. 6.2). Model (2) produced a higher accuracy of 9.2%. Despite the improvement observed in baseline (2), our suggested Neural Network model is clearly far more accurate than simply guessing.
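The two baselines can be sketched as below. The exact sampling distributions are not specified in the text, so the uniform range and the Gaussian spread used here are our own assumptions; only the idea (uniform guessing versus guessing weighted towards the mean, scored under the same tolerance) comes from the paper:

```python
import random

def random_baseline(actuals, lo, hi, tolerance, rng):
    """Baseline (1): uniform random guesses over an assumed value range."""
    hits = sum(1 for a in actuals if abs(rng.uniform(lo, hi) - a) <= tolerance)
    return hits / len(actuals)

def mean_weighted_baseline(actuals, mean, spread, tolerance, rng):
    """Baseline (2): guesses drawn around the mean, so values near the mean
    are sampled more often (here via a Gaussian; the paper's exact weighting
    scheme is not given)."""
    hits = sum(1 for a in actuals if abs(rng.gauss(mean, spread) - a) <= tolerance)
    return hits / len(actuals)
```

On a value distribution concentrated near its mean, baseline (2) scores far more tolerance hits than baseline (1), mirroring the 9.2% versus 0.3% gap reported above.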

8 Discussion/Conclusions

Relatively little work has been done towards measuring and improving the effects of vision and information-gathering mechanics in esports games with imperfect information. In particular, the study of warding in MOBAs like League of Legends and Dota 2 has been limited to simple metrics despite the mechanic's significance. This paper analysed the current industry standard for measuring warding success, called the Vision Metric. We then used detailed expert interviews to model each individual ward with a technique named Wards Aggregate Record Derived Score (WARDS). We used the WARDS model to objectively measure the effect and impact of warding in Dota 2 and to generate a large amount of labelled data. Although this paper has focused primarily on MOBAs and Dota 2, the WARDS model described can be generalised to any title with similar mechanics, as long as all of the necessary data can be retrieved. The labelled data was then utilised in the design, training and evaluation of an Artificial Neural Network aimed at predicting the final WARDS value for any given ward prior to its expiration. Although the resulting networks had relatively low accuracy, we found that, given the complexity of the problem and the long time frame, their performance matches that of other predictive models focused on other aspects of the game once the differing time frames are taken into account. We also compared the network with a simple guessing solution and found that our network considerably outperformed it. The WARDS model as described by this paper has multiple applications.
The first is game analytics, where WARDS can help a coach assess their team's warding abilities or evaluate and explore different warding positions and their relative value based on what they want to achieve. The second is training and education, where WARDS can be used to improve a casual player's gameplay by helping them pick optimal warding positions during a game, or to evaluate their past games with alternate simulated ward placements. This feedback should accelerate a player's ability to learn effective information management in MOBAs. In addition to those applications, the WARD Score is a novel measurement that can be used in conjunction with existing metrics for Machine Learning purposes. For example, it would be possible to utilise the WARDS model as


an additional parameter for win-prediction models. This could improve the accuracy of those models, as it would be a step towards a better understanding of this complex game feature. The model's ability to predict short-term increases in team gold net worth in approximately 83% of cases in our dataset could be useful for accounting for unique variances and predicting team success. Furthermore, WARDS provides a mathematical model for a complex area of Dota 2, which can assist with understanding the game's noisy and complicated environment. The WARDS model can also serve as a baseline for other imperfect-information games, such as Counter Strike: Global Offensive (CSGO) [21] and Overwatch [7]. Neither of these games has wards as in-game items; instead, players themselves act as scouts and have to move around the map with the sole intent of acquiring game-state information and relaying it back to their team-mates. The same principle explored in the model can be utilised to measure how effective their performance has been when gathering intel for their team. Lastly, the model addresses a mechanic that is well established as advantageous in gameplay situations: the vision and warding mechanic enables, for example, a character to move to a different area in order to kill an enemy character, which may not have been possible without the knowledge that a ward provides [10]. After reviewing the performance of the Artificial Neural Network and the predictive problem itself, we suggest that the consistency of the scores demonstrates the potential for future work in the area. Our current Neural Network architecture makes predictions based on a single state snapshot taken at the start of each ward. One improvement that could increase prediction accuracy is to modify the architecture to incorporate updated state information throughout the ward's lifetime into its prediction.
This modification could increase the overall accuracy of the network by reducing the amount of uncertainty the network has to contend with as time progresses. Acknowledgments. This work has been created as part of the Weavr project (weavr.tv) and was funded within the Audience of the Future programme by UK Research and Innovation through the Industrial Strategy Challenge Fund (grant no. 104775) and supported by the Digital Creativity Labs (digitalcreativity.ac.uk), a jointly funded project by EPSRC/AHRC/Innovate UK under grant no. EP/M023265/1. We would also like to thank the "Ogreboy's Free Coaching Serve" Discord server for agreeing to let us use their server as a platform and the following players for their input: Fyre, Water, Arzetlam, Trepo and Ogreboy. Lastly, we would like to thank Isabelle Noelle for enabling the timely delivery of this project.

References

1. API Documentation - Riot Games API
2. Matchmaking | Dota 2
3. Block, F., Hodge, V., Hobson, S., Sephton, N., Devlin, S., Ursu, M.F., Drachen, A., Cowling, P.I.: Narrative bytes: data-driven content production in esports. In: Proceedings of the 2018 ACM International Conference on Interactive Experiences for TV and Online Video, pp. 29–41. ACM (2018)
4. Bonny, J.W., Castaneda, L.M., Swanson, T.: Using an international gaming tournament to study individual differences in MOBA expertise and cognitive skills. In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, CHI 2016, New York, NY, USA, pp. 3473–3484. ACM (2016)
5. Developer Community: Steam - Valve developer community, March 2011
6. Drachen, A., Yancey, M., Maguire, J., Chu, D., Wang, I.Y., Mahlmann, T., Schubert, M., Klabajan, D.: Skill-based differences in spatio-temporal team behaviour in Defense of the Ancients 2 (DotA 2). In: 2014 IEEE Games Media Entertainment, pp. 1–8, October 2014
7. Blizzard Entertainment: Overwatch (2019)
8. Ferrari, S.: From generative to conventional play: MOBA and League of Legends. In: DiGRA Conference (2013)
9. Raiol, G., Sato, G.: League of Legends - challenger's ranked games, June 2019
10. Hodge, V., Devlin, S., Sephton, N., Block, F., Drachen, A., Cowling, P.: Win prediction in esports: mixed-rank match prediction in multi-player online battle arena games. arXiv preprint arXiv:1711.06498 (2017)
11. Hoffman, R.R.: The problem of extracting the knowledge of experts from the perspective of experimental psychology. AI Mag. 8(2), 53–53 (1987)
12. Katona, A., Spick, R., Hodge, V., Demediuk, S., Block, F., Drachen, A., Walker, J.A.: Time to die: death prediction in DOTA 2 using deep learning. arXiv preprint arXiv:1906.03939 (2019)
13. Kinkade, N., Jolla, L., Lim, K.: Dota 2 win prediction. Univ. Calif. 1, 1–13 (2015)
14. Muhammad, L.J., Garba, E.J., Oye, N.D., Wajiga, G.M.: On the problems of knowledge acquisition and representation of expert system for diagnosis of coronary artery disease (CAD). Int. J. u- e-Serv. Sci. Technol. 11(3), 49–58 (2018)
15. do Nascimento Junior, F.F., da Costa Melo, A.S., da Costa, I.B., Marinho, L.B.: Profiling successful team behaviors in League of Legends. In: Proceedings of the 23rd Brazilian Symposium on Multimedia and the Web, WebMedia 2017, Gramado, RS, Brazil, pp. 261–268. ACM, New York, NY, USA (2017)
16. Olson, J.R., Rueter, H.H.: Extracting expertise from experts: methods for knowledge acquisition (1987)
17. Riot Games: Riot Games: who we are (2017)
18. Riot Games: Welcome to League of Legends (2019)
19. Riot Games: Vision score details (2017)
20. Schubert, M., Drachen, A., Mahlmann, T.: Esports analytics through encounter detection. In: Proceedings of the MIT Sloan Sports Analytics Conference, vol. 1 (2016)
21. The CSGO Team: Counter Strike: Global Offensive (2019)
22. Valve Corporation: DOTA 2, July 2013. Accessed 24 July 2019
23. Various Authors: Lane creeps, May 2019
24. Various Authors: Matchmaking rating, June 2019
25. Various Authors: Movement speed, July 2019
26. Various Authors: Observer wards, June 2019
27. Various Authors: Runes, May 2019
28. Wagner, W.P.: Trends in expert system development: a longitudinal content analysis of over thirty years of expert system case studies. Expert Syst. Appl. 76, 85–96 (2017)
29. Xia, B., Wang, H., Zhou, R.: What contributes to success in MOBA games? An empirical study of Defense of the Ancients 2. Games Cult. 14(5), 498–522 (2019)

Decomposition Based Multi-objectives Evolutionary Algorithms Challenges and Circumvention

Sherin M. Omran1(B), Wessam H. El-Behaidy1, and Aliaa A. A. Youssif1,2

1 Faculty of Computers and Artificial Intelligence, Helwan University, Cairo, Egypt
[email protected], [email protected]
2 College of Computing and Information Technology, Arab Academy for Science, Technology and Maritime Transport, Cairo, Egypt
[email protected]

Abstract. Decomposition based Multi Objectives Evolutionary Algorithms (MOEA/D) have become a research focus over the last decade, owing to their simplicity as well as their effectiveness in solving both constrained and unconstrained problems with different Pareto Front (PF) geometries. This paper presents the challenges in different research areas concerning MOEA/D. Firstly, the original MOEA/D algorithm is explained. Its basic idea is to decompose a Multi Objectives Optimization (MOO) problem into multiple single-objective optimization sub-problems and to solve these sub-problems concurrently, each optimized with the help of information gained from its neighborhood. Then, two major factors that affect the search ability of decomposition based MOO algorithms are discussed: Scalarizing Functions (SF), and weight vector generation and adaptation. Furthermore, research on two categories of MOEA/D variants is surveyed. Finally, the real-world application areas that have applied the decomposition approach are reviewed. Keywords: Multi Objective Optimization · MOEA/D · Decomposition approach · Evolutionary Algorithms

1 Introduction

Problems in the real world usually have conflicting objectives. For these kinds of problems, no single solution simultaneously fits all the objectives. Such problems are called Multi Objective Optimization (MOO) problems. A MOO problem can be stated as in [1]:

Maximize F(x) = (f_1(x), ..., f_m(x))^T  subject to x ∈ Ω   (1)

A. A. A. Youssif—On leave from Faculty of Computers & Artificial Intelligence, Helwan university, Cairo, Egypt. © Springer Nature Switzerland AG 2020 K. Arai et al. (Eds.): SAI 2020, AISC 1229, pp. 82–93, 2020. https://doi.org/10.1007/978-3-030-52246-9_6


Such that m is the number of objectives, Ω is the variable space (also called the decision space), and F: Ω → R^m maps into the objective space. The achievable objective set is defined as the set {F(x) | x ∈ Ω}. Since the objectives oppose each other, the solution to such a problem is the set of all non-dominated points [2, 3]. Let u, v ∈ R^m; u is said to dominate v if and only if u_i ≥ v_i for every i ∈ {1, ..., m} and u_j > v_j for at least one index j ∈ {1, ..., m}. A point x* ∈ Ω is Pareto-optimal for Eq. (1) if it is not dominated by any other point x ∈ Ω; in this case, F(x*) is called a Pareto-optimal objective vector. Improving one objective at a Pareto-optimal point always causes regression in at least one other objective. The set containing all such points is called the Pareto-optimal Set (PS), while the set containing all Pareto-optimal objective vectors is called the Pareto Front (PF) [2, 4].

The algorithms used for MOO problems fall into two categories: Pareto dominance based algorithms and decomposition based algorithms. Pareto dominance based algorithms such as the Non-dominated Sorting Genetic Algorithm (NSGA2) [5], Speed-constrained Multi objectives Particle Swarm Optimization (SMPSO) [6], and the Strength Pareto Evolutionary Algorithm (SPEA) [7] usually approximate the Pareto Front well for 2 or 3 objectives. However, their performance degrades sharply as the number of objectives increases, because almost no solution is then dominated by another [3, 4]. For most MOO problems it is very time consuming to cover the whole Pareto Front (PF), as they usually have an infinite set of Pareto-optimal vectors. Zhang et al. [3] therefore proposed the Multi-Objectives Evolutionary Algorithm using Decomposition (MOEA/D). Decomposition based MOO algorithms try to solve the problems of the dominance based techniques, among them the dominance-resistance phenomenon and increasing complexity [4]. The main idea of decomposition based MOO techniques is to break the MOO problem down into a group of single-objective sub-problems. The optimization of each sub-problem proceeds concurrently and collaboratively, using the information acquired from its neighboring sub-problems; this gives MOEA/D reduced complexity. The technique has proved to be one of the most promising for handling both multi- and many-objective optimization problems (i.e. problems with more than three objectives) [3]. Research on decomposition based algorithms falls into four categories:

• Scalarizing Functions (SF) adaptation.
• Weight vector generation or adaptation.
• Generation of new versions that can be applied to more complex problems.
• Applying decomposition based algorithms to different (i.e. real-world) applications.
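The dominance relation defined above can be sketched directly; this is a naive O(n²) illustration (not from the paper), written for maximisation as in Eq. (1):

```python
def dominates(u, v):
    """u dominates v (maximisation): u is no worse in every objective
    and strictly better in at least one."""
    return (all(ui >= vi for ui, vi in zip(u, v))
            and any(ui > vi for ui, vi in zip(u, v)))

def non_dominated(points):
    """The non-dominated subset of a finite set of objective vectors,
    i.e. a finite approximation of the Pareto Front."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]
```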

Some new scalarizing functions (SF) have been provided in the literature, such as the work of Grabisch [8], Santiago [9], Miettinen [10], and Jiang [11]. The concurrent use of


different SFs in order to get the benefits of all of them was found in Ishibuchi [12] and Rojas [13]. Xiaoliang Ma [14] and Wang [15] proposed new adaptive strategies to study the effect of Lp scalarizing functions on problems with different PF geometries. Many studies have investigated weight vector generation and adaptation strategies, including new adaptive weight vector generation strategies [16, 17, 25], uniformly random mechanisms [18], mechanisms to handle complex PFs [19], and Artificial-Intelligence-based mechanisms [19–21]. The decomposition approach has also been extended to a number of other evolutionary mechanisms [22–24], and new operators have been used to handle the trade-off between diversity and convergence [25, 26]. MOEA/D has also been used to solve real-world application problems [27, 28]. The next sections of this paper are organized as follows: the original MOEA/D algorithm is described in Sect. 2, followed by a review of Scalarizing Function (SF) adaptation techniques in Sect. 3. Section 4 presents weight generation and adaptation mechanisms. Different MOEA/D versions are explained in Sect. 5. Real-world applications are reviewed in Sect. 6. Finally, the conclusion is given in Sect. 7.

2 Decomposition Based Multi Objective Evolutionary Algorithm (MOEA/D) Framework

The first step in solving a MOO problem using MOEA/D is to decompose, or split, the MOO problem defined by Eq. (1) into several scalar sub-problems and to work on these sub-problems concurrently. According to Zhang et al. [3], there are several decomposition techniques: the Weighted Sum (WS), the Penalty Boundary Intersection (PBI), and the Weighted Tchebycheff (W-Tch). Only the W-Tch will be considered in this section, as it is regarded as the most effective [14]. The MOO problem of Eq. (1) can be handled as a group of N scalar sub-problems using the W-Tch technique, where the objective function of the jth sub-problem is given as stated in [3]:

Minimize g^tch(x | λ^j, z*) = max_{1 ≤ i ≤ m} { λ_i^j |f_i(x) − z_i*| }  subject to x ∈ Ω   (2)

where z* = (z_1*, z_2*, ..., z_m*)^T is the reference point, with z_i* = max{f_i(x) | x ∈ Ω} for i = 1, ..., m. For each Pareto-optimal point (i.e. non-dominated solution) x*, there is a weight vector λ^j = (λ_1, ..., λ_m)^T such that λ_i ≥ 0 for all i from 1 to m and Σ_{i=1}^{m} λ_i = 1, for all j from 1 to N sub-problems. In this case, every non-dominated solution found for Eq. (2) is Pareto-optimal for Eq. (1), and by changing the weight vector, various Pareto-optimal solutions can be obtained. Selecting appropriate weight vectors is therefore one of the factors that affect solution quality. For each weight vector λ^i there is a neighborhood: the set of the T closest weight vectors in {λ^1, ..., λ^N}. Hence, the neighborhood of the ith sub-problem contains the sub-problems with the T closest weight vectors to λ^i. "Figure 1" shows the complete algorithm steps.
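The W-Tch value of Eq. (2) for a single sub-problem can be computed as follows; this is a direct transcription of the formula (the function name is ours):

```python
def tchebycheff(f, lam, z_star):
    """Weighted Tchebycheff value of objective vector f, for weight vector
    lam and reference point z_star (Eq. 2); smaller is better."""
    return max(l * abs(fi - zi) for l, fi, zi in zip(lam, f, z_star))

# With z* = (1, 1) and equal weights, f = (0.2, 0.6) scores
# max(0.5*0.8, 0.5*0.4) = 0.4.
print(tchebycheff((0.2, 0.6), (0.5, 0.5), (1.0, 1.0)))  # 0.4
```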


MOEA/D algorithm

Inputs:
• A Multi-Objective Optimization (MOO) problem.
• N: the number of sub-problems (the population size).
• A set of N evenly sampled weight vectors {λ^1, ..., λ^N}.
• T: the neighborhood size.
• A stopping criterion.

Steps:
1. Initialization: Generate the initial population x^1, ..., x^N at random, where x^i is the current solution to the ith sub-problem. Initialize the reference point z as mentioned in Eq. (2). Calculate the Euclidean distances between each pair of weight vectors to determine the neighbors of each vector: for each i, set B(i) = {i_1, ..., i_T}, where λ^{i_1}, ..., λ^{i_T} are the T closest weight vectors to λ^i. For each i, evaluate the fitness value FV^i = F(x^i).
2. Update: For i = 1, ..., N:
   – Reproduction: Select at random two indexes k, l from B(i); then, using the genetic operators (i.e. crossover and mutation), generate a new solution y from x^k and x^l.
   – Repair: Apply a problem-specific improvement heuristic to y to produce y'.
   – Update z: For each j, if z_j < f_j(y'), set z_j = f_j(y').
   – Neighboring solutions update: For each index j ∈ B(i), if g^tch(y' | λ^j, z) ≤ g^tch(x^j | λ^j, z), set x^j = y' and FV^j = F(y').
3. If the stopping criterion is met, then stop; else go back to step 2.

Fig. 1. MOEA/D algorithm
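The steps of Fig. 1 can be sketched as a minimal, self-contained implementation. This is an illustration only, not the authors' code: the blend-and-mutate reproduction operator, the omission of the repair step, the toy test problem and all parameter defaults are our own simplifications of the genetic operators the figure assumes:

```python
import math
import random

def tch(f, lam, z):
    # Weighted Tchebycheff of Eq. (2): smaller means closer to the reference z.
    return max(l * abs(fi - zi) for l, fi, zi in zip(lam, f, z))

def moead(F, bounds, N=30, T=5, gens=100, seed=0):
    """Minimal bi-objective MOEA/D sketch for a maximisation problem
    F(x) -> (f1, f2) with a single decision variable."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Evenly sampled weight vectors (bi-objective case).
    lams = [(j / (N - 1), 1 - j / (N - 1)) for j in range(N)]
    # B[i]: indices of the T weight vectors closest to lambda^i (Euclidean).
    B = [sorted(range(N), key=lambda k: math.dist(lams[i], lams[k]))[:T]
         for i in range(N)]
    pop = [[rng.uniform(lo, hi)] for _ in range(N)]
    FV = [F(x) for x in pop]
    z = [max(fv[i] for fv in FV) for i in range(2)]   # reference point
    for _ in range(gens):
        for i in range(N):
            k, l = rng.sample(B[i], 2)                # two neighbour parents
            a = rng.random()
            # Reproduction: blend the parents, add Gaussian mutation, clamp.
            y = [min(hi, max(lo, a * pk + (1 - a) * pl + rng.gauss(0, 0.02)))
                 for pk, pl in zip(pop[k], pop[l])]
            fy = F(y)
            z = [max(zi, fi) for zi, fi in zip(z, fy)]  # update reference point
            for j in B[i]:                              # neighbourhood update
                if tch(fy, lams[j], z) <= tch(FV[j], lams[j], z):
                    pop[j], FV[j] = y, fy
    return pop, FV

# Toy problem: maximise f1 = x and f2 = 1 - x^2 on [0, 1]; every x in [0, 1]
# is Pareto-optimal, so the final population should spread along the front.
pop, FV = moead(lambda x: (x[0], 1 - x[0] ** 2), bounds=(0.0, 1.0))
```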

3 Scalarizing Functions (SF) Adaptation

The Scalarizing Function (SF), or aggregation function, is the function used to transform the MOO problem into a group of scalar sub-problems [29, 30]. It is one of the main factors that affect MOEA/D's search ability: selecting the most appropriate SF affects both the diversity and the convergence of the algorithm. Many researchers have investigated and proposed new SFs, or studied the effect of changing an SF's control parameters on the final results. Previously, Zhang et al. [3]


suggested three SFs: Weighted Sum (WS), Penalty Boundary Intersection (PBI), and Weighted Tchebycheff (W-Tch). Grabisch et al. [8] and Santiago et al. [9] proposed SFs such as the weighted exponential sum, weighted product, quadratic mean, and exponential mean, and more SFs were explained by Miettinen et al. [10]. Recently, Jiang et al. [11] presented two new SFs, MSF and PSF, and studied their impact on MOEA/D algorithms. The Multiplicative Scalarizing Function (MSF) is a general form of which the Weighted Tchebycheff (W-Tch) is a special case. The Penalty-based SF (PSF) updates the Weighted Tchebycheff function in a way inspired by the Penalty Boundary Intersection: it uses linearly decreasing α values through the search stages, where α is the penalty value used for diversity adjustment. The improved region of the PSF varies greatly with α, and α ≥ 0 is preferable to maintain diversity. Results proved the effectiveness of the framework based on the two proposed SFs, called eMOEA/D, compared with other recent approaches on classical problems with 2 or 3 objectives; however, further investigation is needed for larger numbers of objectives. The problem with this framework is that the linearly decreasing approach for the control parameter α is not always suitable for all kinds of problems. Other work combined different SFs to take advantage of each. Ishibuchi et al. [12] examined the concurrent use of both the WS and the W-Tch in a single algorithm, proposing two implementation schemes. The first is the multi-grid scheme: each SF has a complete grid of uniformly distributed weight vectors and the two similar grids are used simultaneously; as a result, the two grids can overlap and both the population size and the actual number of neighbors are doubled. The second is a single-grid scheme with different SFs.
Each SF has a non-complete grid of weight vectors, and each function is assigned alternately to each weight vector; the result is a single complete grid with two SFs instead of one, as in the original algorithm. The two implementation schemes were examined on multi-objective 0/1 knapsack problems with different numbers of objectives. Results showed that the proposed schemes could outperform MOEA/D using a single SF; their main advantages are their simplicity and their applicability to different types of SFs. Furthermore, Rojas and Coello [13] proposed a technique that supports a collaborative-populations mechanism (similar to the single-grid idea proposed by Ishibuchi et al. [12]) using several SFs and model parameter values through an adaptive selection strategy. The selection strategy selects from a pool the SF that fits each sub-population, according to an improvement fitness rate calculated for each sub-problem at each time span. They suggested combining SFs with similar target directions so as to generate uniformly distributed solutions all over the PF, and proposed two pools of strategies, S1 and S2, to combine different SFs. The S1 pool includes the Augmented Tchebycheff (ACHE), Modified Tchebycheff (MCHE), and Weighted Norm (WN), while the S2 pool includes the Augmented Achievement Scalarizing Function (AASF), Modified ASF (MASF), and Penalty Boundary Intersection (PBI). Results showed that the proposed technique could outperform its counterparts for different kinds of problems over a large set of objectives. Some studies examined the effect of using Lp-norm/Lp scalarizing methods in adaptive strategies on problems with different PF geometries. Xiaoliang Ma et al. [14]


proposed a W-Tch decomposition with a constrained Lp-norm on direction vectors (p-Tch), whose objective functions have a clear geometric property. In p-Tch, sub-problems are constructed on the basis of a direction vector instead of a weight vector; the direction vector λ can be thought of as a positive vector with ||λ||p = 1. They used 2-Tch as a representative example of their theory. The proposed algorithm proved its effectiveness compared with MOEA algorithms from the literature on both benchmark and real-world problems. Wang et al. [15] analyzed the Lp scalarizing method, showing the impact of the p value on MOEA/D's selection pressure: as p increases, the search ability is reduced while the method becomes more robust to PF geometries. They also proposed a new Pareto-adaptive scalarizing approximation, called Pas, that approximates the p value. The proposed MOEA/D-Pas proved its effectiveness compared with other benchmarks on various problems with different PF geometries over a large number of conflicting objectives.
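The three SFs from [3] discussed throughout this section can be written side by side. The formulas follow their standard definitions (for PBI, a distance d1 along the weight direction plus a θ-penalised deviation d2 from it); θ = 5 is a commonly used default, not a value taken from this paper:

```python
import math

def weighted_sum(f, lam):
    # WS: simple linear aggregation; struggles on non-convex fronts.
    return sum(l * fi for l, fi in zip(lam, f))

def w_tchebycheff(f, lam, z):
    # W-Tch: weighted max-norm distance to the reference point z.
    return max(l * abs(fi - zi) for l, fi, zi in zip(lam, f, z))

def pbi(f, lam, z, theta=5.0):
    # PBI: projection distance d1 along the weight direction, plus a
    # penalty theta * d2 for deviating perpendicularly from that direction.
    norm = math.sqrt(sum(l * l for l in lam))
    diff = [fi - zi for fi, zi in zip(f, z)]
    d1 = abs(sum(d * l for d, l in zip(diff, lam))) / norm
    d2 = math.sqrt(sum((d - d1 * l / norm) ** 2 for d, l in zip(diff, lam)))
    return d1 + theta * d2
```

A point lying exactly on the weight direction incurs no PBI penalty (d2 = 0), which is what makes θ an explicit knob for the diversity/convergence trade-off.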

4 Weight Generation or Adaptation Mechanisms

Weight vector generation is the second factor that affects a decomposition based algorithm's search ability. Similar weight vectors lead to poor solutions, because the resulting solutions will not be evenly distributed over the Pareto front [3, 9]. The state of the art suggests two types of weight vector generation: systematic and random [9, 31]. In systematic weight vector generation, the weight vectors are distributed evenly on a unit simplex [32, 33]; for irregular Pareto fronts, random weight vector generation is sometimes recommended [34]. These methods work very well for hyperplane Pareto Fronts, but they cannot guarantee solution diversity for more complex PF geometries [35]. Given the high influence of the weight vector distribution on the final solutions, much research on weight vector adaptation has been carried out. Jiang Siwei et al. [16] presented a Pareto-adaptive weight vector methodology called paλ that depends on Mixture Uniform Design (MUD). The proposed methodology modifies the weight vectors depending on the PF shape. Empirical results showed that the proposed paλ-MOEA/D methodology provided higher hypervolume and better solutions than both the classical MOEA/D and NSGA2 on 12 benchmark problems. Guo et al. [17] provided a new MOEA/D algorithm called AWD-MOEA/D that is based on an adaptive weight vector adjustment method. To get an adaptive weight vector set, a two-phase methodology was provided. In the first phase, the initial weight vectors are created using the uniform design method so as to investigate the objective space evenly. In the second phase, an adaptive weight vector generation method that combines generalized decomposition with uniform design is used.
This method dynamically adapts the weight vector settings, which in turn helps generate uniformly distributed non-dominated solutions. The proposed algorithm provided the best diversity and convergence against both UMOEA/D (Uniform MOEA/D) and MOEA/D on 2 standard test problems. Another adaptive weight generation method, called MOEA/D-AWG, was presented in [36]. It generates weight vectors according to the geometrical properties of the PF, which are first estimated using Gaussian process regression.


Results verified the efficiency of the adaptive weight generation method compared with MOEA/D alternatives using uniformly generated weight vectors. Farias et al. [18] proposed a Uniformly-Randomly-Adaptive Weights generation algorithm (MOEA/D-URAW) that combines the uniform random generation mechanism with the adaptation mechanism presented in [37] for sub-problem generation; sub-problems are generated according to the sparseness of the population. MOEA/D-URAW was evaluated on Walking Fish Group (WFG) problems with various PF geometries, where it provided the best results compared with other state-of-the-art techniques. Ch. Zhang et al. [19] presented a new weight vector adjustment method for bi-objective problems with non-continuous PFs, called MOEA/D-ABD. The method starts by detecting weight vectors that require adjustment and allocates vectors along the Pareto front depending on its length; the solutions for these vectors are produced by a linear interpolation mechanism. MOEA/D-ABD produced the best solutions compared with MOEA/D-AWA [37]. Its main problems are that it is not suitable for problems with larger numbers of objectives and that it does not guarantee good solutions for continuous PFs. Different Artificial-Intelligence-based mechanisms have also been used for weight vector generation, such as Artificial Neural Networks (ANN) and evolutionary techniques. Gu et al. [20] developed an innovative weight generation mechanism called MOEA/D-SOM, based on Self-Organizing Maps (SOM). Using the objective vectors of recent individuals, the SOM network is periodically trained with N neurons such that the weight and objective vectors have the same dimensions; the neurons' weights are taken as the weight vectors. Results showed that the proposed SOM-based mechanism could greatly outperform its counterparts on a set of both redundant and non-redundant test problems. Meneghini et al. [21] proposed an evolutionary weight vector generation technique whose main idea is to evolve a set of weight vectors depending on the desired characteristics in a MOEA/D framework. The proposed EA prevents the creation of weight vectors along the border of the orthant, where solutions are poor. Experiments proved that the proposed mechanism can provide a set of generally well-spread vectors close to the uniform distribution, without forming clusters or empty spaces.
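The systematic, evenly distributed weights on the unit simplex mentioned at the start of this section can be generated with a simplex-lattice (stars-and-bars) enumeration; this is a generic sketch rather than any of the surveyed methods, and H (the number of divisions per objective) is our notation:

```python
def compositions(total, parts):
    # All ways to write `total` as an ordered sum of `parts` non-negative ints.
    if parts == 1:
        return [(total,)]
    return [(k,) + rest
            for k in range(total + 1)
            for rest in compositions(total - k, parts - 1)]

def simplex_lattice_weights(m, H):
    """Evenly distributed weight vectors on the unit simplex: every component
    is a multiple of 1/H and each vector sums to 1, giving C(H+m-1, m-1)
    vectors in total."""
    return [tuple(k / H for k in c) for c in compositions(H, m)]
```

For m = 3 objectives and H = 4 divisions this yields C(6, 2) = 15 weight vectors, illustrating why the population size of systematic designs grows combinatorially with the number of objectives.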

5 Different MOEA/D Versions

Recent research has proposed different versions, or variants, of MOEA/D. These fall into two categories. The first applies the decomposition approach to other efficient evolutionary algorithms, in order to benefit from both and to balance diversity and convergence. The second updates the original MOEA/D to fit more complex problems. In the first category, algorithms like MOEA/DD [38], MOEA/DD-CMA [22], MO-GPSO/D [23], and MOEA/D-ACO [24] are clarified below. Li et al. [38] presented a novel algorithm called MOEA/DD that combines the decomposition and dominance approaches in a single algorithm. The proposed algorithm


proved its superiority to both state-of-the-art and recent algorithms on some constrained and unconstrained problems. Castro Jr et al. [22] combined MOEA/D-CMA [39], a variant of the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [40], with the MOEA/DD [38] algorithm; the combined algorithm, called MOEA/DD-CMA, was compared against MOEA/D-CMA over a large number of problems with 5 to 15 objectives. Martínez et al. [23] studied the incorporation of Geometric Particle Swarm Optimization (GPSO) [41], a variant of PSO used for discrete optimization problems, with the decomposition mechanism. The algorithm was tested on 0/1 knapsack problems with more than 3 objectives. Experiments showed that MO-GPSO/D provided very promising results compared with three benchmark algorithms: NSGA3, MOEA/D, and MOEA/D*. Liangjun Ke et al. [24] proposed an Ant Colony Optimization (ACO) algorithm using the decomposition principle, called MOEA/D-ACO, in which the number of ants equals the number of sub-problems and each ant tries to solve one sub-problem. Ants are split into groups, each targeting a particular part of the PF; the neighbors of an ant may be members of the same or a different group, and the final solution is obtained using the information shared by all group members. MOEA/D-ACO outperformed both the original MOEA/D and the Bicriterion-Ant algorithm on all test instances. The second category includes a large number of variants of the original MOEA/D for solving more complex problems, such as problems with complex or non-continuous PFs; the most recent and effective algorithms are mentioned here. Li et al. [25] proposed an update to their original MOEA/D algorithm [3] in order to handle more complex PS shapes.
90

S. M. Omran et al.

The new algorithm, MOEA/D-DE, incorporated the original decomposition approach with the Differential Evolution (DE) operator. DE is used to carry out the mating procedure, and a polynomial mutation is applied after DE, as it slightly improves the algorithm's performance. Although the results did not prove that the proposed algorithm is always superior to NSGA2 using the same operators (NSGA2-DE), it is still a promising technique for such kinds of problems. To solve complex problems, such as problems with non-continuous regions, sharp peaks and long tails, Jiang et al. [42] proposed a Two-Phase (TP) strategy and a niching method, namely MOEA/D-TPN. During the first phase, the algorithm searches for crowded solution areas so as to detect the shape of the PF. In the second phase, the algorithm determines the sub-problem form to be used depending on the results of the first phase. The niching method is introduced to avoid duplicate solutions/offspring by guiding the mating procedure towards parents in the regions with minimal crowding. The reproduction process is carried out using the same operators previously proposed by Li et al. [25]. Results showed the superiority of the algorithm as compared to both SPEA2 [43] with Shift-based Density Estimation (SPEA2 + SDE) [44] and NSGA3 [45]. Xu et al. [26] proposed a hierarchical decomposition based MOEA, named MOEA/HD, that divides the sub-problems into different layers/hierarchies. MOEA/HD then adjusts the search direction of the lower-hierarchy sub-problems using superior guiding sub-problems. Results showed that the algorithm provided promising results on problems with different PF features.
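The decomposition principle shared by MOEA/D and the variants above can be sketched with the weighted Tchebycheff scalarization, which turns a multi-objective problem into one single-objective sub-problem per weight vector. The weight vectors, objective values and reference point below are hypothetical, chosen only for illustration:

```python
def tchebycheff(f, weights, z_star):
    # Weighted Tchebycheff scalarization: g(x | w, z*) = max_i w_i * |f_i(x) - z*_i|
    return max(w * abs(fi - zi) for w, fi, zi in zip(weights, f, z_star))

# A bi-objective problem decomposed into three sub-problems via weight vectors.
weight_vectors = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]
z_star = (0.0, 0.0)  # ideal (reference) point

# Objective vectors of two hypothetical candidate solutions.
f_a = (0.2, 0.8)
f_b = (0.6, 0.4)

# Each sub-problem keeps whichever candidate minimizes its scalarized value.
for w in weight_vectors:
    best = min((f_a, f_b), key=lambda f: tchebycheff(f, w, z_star))
    print(w, "->", best)
```

Each sub-problem is a single-objective minimization, so neighboring sub-problems (those with similar weight vectors) can usefully share solutions; this is the cooperation mechanism that MOEA/D and its variants exploit.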

6 Real World Applications

Given the good results provided by the decomposition approach and its different variants on large and diverse sets of benchmark problems, it has also been tested on real-world application areas. Zhang et al. [27] extended their original decomposition approach presented in [3] into a new one gathering both the Normal Boundary Intersection (NBI) and the W-Tch approaches into one algorithm called NBI-style Tchebycheff. The algorithm was tested on the portfolio management optimization problem with two objectives: return and variance (risk). The proposed approach provided promising results comparable to NSGA2. Xing et al. [28] incorporated the MOEA/D approach with Population-Based Incremental Learning (PBIL) components and proposed MOEA/D-PBIL. The proposed algorithm was applied to the multicast routing with network coding optimization problem with three objectives: the coding cost, the end-to-end delay, and the link cost. Results proved the superiority of the algorithm as compared to six other MOEA variants. The decomposition mechanism has also been used in many other applications, such as natural and medical image segmentation [46], the sizing of a folded-cascode amplifier [47], aerospace applications [48], reservoir flood control operation [49] and agile satellite mission planning [50].
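For instance, the bi-objective portfolio setting (maximize return, minimize risk) ultimately reduces to keeping the non-dominated candidates. A minimal sketch with hypothetical (return, risk) pairs:

```python
def dominates(a, b):
    # a dominates b when a's return is no worse, a's risk is no worse,
    # and a is strictly better in at least one of the two objectives.
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])

def pareto_front(portfolios):
    # Keep the portfolios not dominated by any other candidate.
    return [p for p in portfolios
            if not any(dominates(q, p) for q in portfolios if q != p)]

# Hypothetical (expected return, risk) pairs of candidate portfolios.
candidates = [(0.08, 0.10), (0.12, 0.25), (0.10, 0.12), (0.07, 0.30)]
print(pareto_front(candidates))  # (0.07, 0.30) is dominated by (0.08, 0.10)
```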

7 Conclusions

Multi-objective optimization (MOO) techniques can be categorized into dominance-based and decomposition-based algorithms. The dominance-based algorithms work very well on problems with at most three objective functions; however, their performance deteriorates on optimization problems with more than three objectives. The decomposition-based approach works mainly to solve such kinds of problems. It decomposes the problem into several sub-problems and tries to solve each of them separately. Due to the promising results of MOEA/D algorithms, this paper presented the challenges concerning the decomposition functions/SFs, weight generation mechanisms, different MOEA/D versions, and real-world applications. Research on new SFs, their combinations, and the adaptive Lp-norm scalarizing method on problems with complex Pareto fronts was discussed. Also, the new adaptive weight generation mechanisms and AI-based weight generation mechanisms (paλ-MOEA/D, AWD-MOEA/D, MOEA/D-AWG, MOEA/D-URAW, MOEA/D-ABD, MOEA/D-SOM, and evolutionary-based weight vector generation) for handling complex problems were discussed. Different variants of MOEA/D that combine decomposition with other strategies were also presented, such as MOEA/DD, MOEA/DD-CMA, MO-GPSO/D, MOEA/D-ACO, MOEA/D-TPN, MOEA/D-DE, and MOEA/HD. MOEA/D has proved its effectiveness in many different fields, such as financial optimization, network routing, aerospace, image segmentation, and satellite mission planning.


References

1. Deb, K.: Multi-Objective Optimization Using Evolutionary Algorithms. Wiley, New York (2001)
2. Bui, L.T., Alam, S.: Multi Objective Optimization in Computational Intelligence: Theory and Practice. IGI Global, Hershey (2008)
3. Zhang, Q., Li, H.: MOEA/D: a multiobjective evolutionary algorithm based on decomposition. IEEE Trans. Evol. Comput. 11(6), 712–731 (2007)
4. Purshouse, R.C., Fleming, P.J.: On the evolutionary optimization of many conflicting objectives. IEEE Trans. Evol. Comput. 11(6), 770–784 (2007)
5. Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6(2), 182–197 (2002)
6. Nebro, A.J., Durillo, J.J., García-Nieto, J., Coello, C., Luna, F., Alba, E.: SMPSO: a new PSO-based metaheuristic for multi-objective optimization. In: IEEE Symposium on Computational Intelligence in Multi-Criteria Decision-Making (MCDM), Nashville, TN, USA (2009)
7. Zitzler, E., Thiele, L.: Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach. IEEE Trans. Evol. Comput. 3(4), 257–271 (1999)
8. Grabisch, M., Marichal, J.-L., Mesiar, R., Pap, E.: Aggregation functions. Part I: means. Inf. Sci. 181(1), 1–22 (2011)
9. Santiago, A., Huacuja, H.J.F., Dorronsoro, B., Pecero, J.E., Santillan, C.G., Barbosa, J.J.G., Monterrubio, J.C.S.: A survey of decomposition methods for multi-objective optimization. In: Recent Advances on Hybrid Approaches for Designing Intelligent Systems, vol. 547, pp. 453–465. Springer (2014)
10. Miettinen, K., Makela, M.M.: On scalarizing functions in multiobjective optimization. Oper. Res. Spektrum 24, 193–213 (2002)
11. Jiang, S., Yang, S., Wang, Y., Liu, X.: Scalarizing functions in decomposition-based multiobjective evolutionary algorithms. IEEE Trans. Evol. Comput. 22(2), 296–313 (2018)
12. Ishibuchi, H., Sakane, Y., Tsukamoto, N., Nojima, Y.: Simultaneous use of different scalarizing functions in MOEA/D. In: GECCO 2010, Portland, Oregon, USA (2010)
13. Pescador-Rojas, M., Coello, C.A.C.: Collaborative and adaptive strategies of different scalarizing functions in MOEA/D. In: IEEE Congress on Evolutionary Computation (CEC) (2018)
14. Ma, X., Zhang, Q., Tian, G., Yang, J., Zhu, Z.: On Tchebycheff decomposition approaches for multi-objective evolutionary optimization. IEEE Trans. Evol. Comput. 22(2), 226–244 (2018)
15. Wang, R., Zhang, Q., Zhang, T.: Decomposition based algorithms using Pareto adaptive scalarizing methods. IEEE Trans. Evol. Comput. 20(6), 821–837 (2016)
16. Siwei, J., Zhihua, C., Jie, Z., Yew-Soon, O.: Multiobjective optimization by decomposition with Pareto-adaptive weight vectors. In: Seventh International Conference on Natural Computation, Shanghai, China (2011)
17. Guo, X., Wang, X., Wei, Z.: MOEA/D with adaptive weight vector design. In: 11th International Conference on Computational Intelligence and Security, Shenzhen, China (2015)
18. Farias, L.R.C.d., Braga, P.H.M., Bassani, H.F., Araújo, A.F.R.: MOEA/D with uniformly randomly adaptive weights. In: GECCO 2018, Kyoto, Japan (2018)
19. Zhang, C., Tan, K.C., Lee, L.H., Gao, L.: Adjust weight vectors in MOEA/D for bi-objective optimization problems with discontinuous Pareto fronts. Soft Comput. 22(12), 3997–4012 (2018)
20. Gu, F., Cheung, Y.-M.: Self-organizing map-based weight design for decomposition-based many-objective evolutionary algorithm. IEEE Trans. Evol. Comput. 22(2), 211–225 (2018)


21. Meneghini, I.R., Guimaraes, F.G.: Evolutionary method for weight vector generation in multi-objective evolutionary algorithms based on decomposition and aggregation. In: CEC, San Sebastián (2017)
22. Castro, O.R., Santana, R., Lozano, J.A., Pozo, A.: Combining CMA-ES and MOEA/DD for many-objective optimization. In: IEEE Congress on Evolutionary Computation (CEC), San Sebastian, Spain (2017)
23. Zapotecas-Martínez, S., Moraglio, A., Aguirre, H., Tanaka, K.: Geometric particle swarm optimization for multi-objective optimization using decomposition. In: GECCO 2016, Denver, CO, USA (2016)
24. Ke, L., Zhang, Q., Battiti, R.: MOEA/D-ACO: a multiobjective evolutionary algorithm using decomposition and ant colony. IEEE Trans. Cybern. 43(6), 1845–1859 (2013)
25. Li, H., Zhang, Q.: Multiobjective optimization problems with complicated Pareto sets, MOEA/D and NSGA-II. IEEE Trans. Evol. Comput. 13(2), 284–302 (2009)
26. Xu, H., Zeng, W., Zhang, D., Zeng, X.: MOEA/HD: a multiobjective evolutionary algorithm based on hierarchical decomposition. IEEE Trans. Cybern. 49(2), 517–526 (2019)
27. Zhang, Q., Li, H., Maringer, D., Tsang, E.: MOEA/D with NBI-style Tchebycheff approach for portfolio management. In: IEEE Congress on Evolutionary Computation, Barcelona, Spain (2010)
28. Xing, H., Wang, Z., Li, T., Li, H.: An improved MOEA/D algorithm for multi-objective multicast routing with network coding. Appl. Soft Comput. 59, 88–103 (2017)
29. Gunantara, N.: A review of multi-objective optimization: methods and its applications. Cogent Eng. 5(1) (2018). https://doi.org/10.1080/23311916.2018.1502242
30. Emmerich, M.T.M.: A tutorial on multiobjective optimization: fundamentals and evolutionary methods. Num. Comput. 17(3), 585–609 (2018)
31. Trivedi, A., Srinivasan, D., Sanyal, K., Ghosh, A.: A survey of multi-objective evolutionary algorithms based on decomposition. IEEE Trans. Evol. Comput. 21(3), 440–462 (2017)
32. Messac, A., Mattson, C.A.: Normal constraint method with guarantee of even representation of complete Pareto frontier. AIAA J. 42(10), 2101–2111 (2004)
33. Das, I., Dennis, J.E.: Normal-boundary intersection: a new method for generating the Pareto surface in nonlinear multicriteria optimization problems. SIAM J. Optim. 8(3), 631–657 (1998)
34. Li, H., Ding, M., Deng, J., Zhang, Q.: On the use of random weights in MOEA/D. In: 2015 IEEE Congress on Evolutionary Computation (CEC), Sendai, Japan (2015)
35. Qi, Y., Ma, X., Liu, F., Jiao, L., Sun, J., Wu, J.: MOEA/D with adaptive weight adjustment. Evol. Comput. 22(2), 231–264 (2013)
36. Wu, M., Kwong, S., Jia, Y., Li, K., Zhang, Q.: Adaptive weights generation for decomposition-based multi-objective optimization using Gaussian process regression. In: GECCO 2017, Berlin, Germany (2017)
37. Qi, Y., Ma, X., Liu, F., Jiao, L., Sun, J., Wu, J.: MOEA/D with adaptive weight adjustment. Evol. Comput. 22(2), 231–264 (2014)
38. Li, K., Deb, K., Zhang, Q., Kwong, S.: An evolutionary many-objective optimization algorithm based on dominance and decomposition. IEEE Trans. Evol. Comput. 19(5), 694–716 (2015)
39. Zapotecas-Martínez, S., Derbel, B., Brockhoff, D., Aguirre, H.E., Tanaka, K.: Injecting CMA-ES into MOEA/D. In: GECCO 2015, Madrid, Spain (2015)
40. Hansen, N., Auger, A.: CMA-ES: evolution strategies and covariance matrix adaptation. In: GECCO 2011, Dublin, Ireland (2011)
41. Moraglio, A., Chio, C.D., Poli, R.: Geometric particle swarm optimisation. In: Genetic Programming, EuroGP 2007. Lecture Notes in Computer Science, vol. 4445. Springer, Heidelberg (2007)


42. Jiang, S., Yang, S.: An improved multiobjective optimization evolutionary algorithm based on decomposition for complex Pareto fronts. IEEE Trans. Cybern. 46(2), 421–437 (2016)
43. Zitzler, E., Laumanns, M., Thiele, L.: SPEA2: improving the strength Pareto evolutionary algorithm. TIK-Report 103, Zurich, Switzerland (2001)
44. Li, M., Yang, S., Liu, X.: Shift-based density estimation for Pareto-based algorithms in many-objective optimization. IEEE Trans. Evol. Comput. 18(3), 348–365 (2014)
45. Deb, K., Jain, H.: An evolutionary many-objective optimization algorithm using reference-point based non-dominated sorting approach, part I: solving problems with box constraints. IEEE Trans. Evol. Comput. 18(4), 577–601 (2014)
46. Sarkar, S., Das, S., Chaudhuri, S.S.: Multi-level thresholding with a decomposition-based multi-objective evolutionary algorithm for segmenting natural and medical images. Appl. Soft Comput. 50, 142–157 (2017)
47. Liu, B., Fernández, F.V., Zhang, Q., Pak, M., Sipahi, S., Gielen, G.: An enhanced MOEA/D-DE and its application to multiobjective analog cell sizing. In: IEEE Congress on Evolutionary Computation, Barcelona, Spain (2010)
48. Ho-Huu, V., Hartjes, S., Visser, H.G., Curran, R.: An efficient application of the MOEA/D algorithm for designing noise abatement departure trajectories. Aerospace 4(4), 54 (2017)
49. Qi, Y., Bao, L., Ma, X., Miao, Q.: Self-adaptive multi-objective evolutionary algorithm based on decomposition for large-scale problems: a case study on reservoir flood control operation. Inf. Sci. 367(10), 529–549 (2016)
50. Li, L., Chen, H., Li, J., Jing, N., Emmerich, M.: Preference-based evolutionary many-objective optimization for agile satellite mission planning. IEEE Access 6, 40963–40978 (2018)

Learning the Satisfiability of Ł-clausal Forms

Mohamed El Halaby and Areeg Abdalla

Department of Mathematics, Faculty of Science, Cairo University, Giza 12613, Egypt
{halaby,areeg}@sci.cu.edu.eg

Abstract. The k-SAT problem for Ł-clausal forms has been found to be NP-complete if k ≥ 3. Similar to Boolean formulas in Conjunctive Normal Form (CNF), Ł-clausal forms are important from theoretical and practical points of view for their expressive power and easy-hard-easy pattern, as well as for exhibiting a phase transition phenomenon. In this paper, we investigate predicting the satisfiability of Ł-clausal forms by training different classifiers (Neural Network, Linear SVC, Logistic Regression, Random Forest and Decision Tree) on features extracted from randomly generated formulas. Firstly, a random instance generator is presented and used to generate instances in the phase transition area over 3-valued and 7-valued Łukasiewicz logic. Next, numeric and graph features were extracted from both datasets. Then, different classifiers were trained and the best classifier (Neural Network) was selected for hyperparameter tuning, after which the mean of the cross-validation scores (CVS) increased from 92.5% to 95.2%.

Keywords: Satisfiability · Ł-clausal forms · Łukasiewicz logic · Fuzzy logic · Machine learning

1 Introduction and Preliminaries

In propositional logic, a Boolean variable x can take one of two possible values: 0 or 1. A literal l is a variable x or its negation ¬x. A disjunction C is a group of r literals joined by ∨, expressed as $C = \bigvee_{i=1}^{r} l_i$. A Boolean formula φ in Conjunctive Normal Form (CNF) is a group of m disjunctions joined by ∧ (i.e., a conjunction of disjunctions). From now on, we will refer to a disjunction in a CNF formula as a clause. If φ consists of m clauses where each clause Ci is composed of ri literals, then φ can be written as

$$\varphi = \bigwedge_{i=1}^{m} C_i \quad \text{where} \quad C_i = \bigvee_{j=1}^{r_i} l_{i,j}$$

© Springer Nature Switzerland AG 2020. K. Arai et al. (Eds.): SAI 2020, AISC 1229, pp. 94–102, 2020. https://doi.org/10.1007/978-3-030-52246-9_7


Given a propositional formula in CNF, the satisfiability problem (SAT) [1] is deciding whether φ has an assignment to its variables that satisfies every clause. SAT is a core problem in theoretical computer science because of its central position in complexity theory [2]. Moreover, numerous NP-hard practical problems have been successfully solved using SAT [3]. Fuzzy logic is an extension of Boolean logic introduced by Lotfi Zadeh in 1965 [4], based on the theory of fuzzy sets, a generalization of classical set theory. By introducing a notion of degree into the verification of a condition, enabling a condition to be in a state other than true or false (and thus admitting infinitely many truth degrees), fuzzy logic provides valuable flexibility for reasoning that makes it possible to take inaccuracies and uncertainties into account. The SAT problem in fuzzy logic, and specifically in Łukasiewicz logic, has been studied [5], as has its optimization version (maximizing the number of satisfied clauses) [6], but less attention has been paid to developing efficient solvers for the problem. One of the recent attempts [7] consists of enhancing the state-of-the-art Covariance Matrix Adaptation Evolution Strategy (CMA-ES) algorithm. This was done by having multiple CMA-ES populations running in parallel and then recombining their distributions if this leads to improvements. Another finding, in [8], showed that a hill-climber approach outperformed CMA-ES on some problem classes. A different idea was recently proposed which involves encoding the formula as a Satisfiability Modulo Theories (SMT) program, then employing flattening methods and CNF conversion algorithms to derive an equivalent Boolean CNF SAT instance [9]. For formulas in Łukasiewicz logic, the variables can take a value from a finite (or infinite) set of truth values, but this paper is concerned with finite truth sets. The basic connectives of Łukasiewicz logic are defined in Table 1.
We will be dealing with five operations, namely negation (¬), the strong and weak disjunctions (⊕ and ∨, respectively) and the strong and weak conjunctions (⊙ and ∧, respectively).

Table 1. Logical operations in Łukasiewicz logic.

Operation name       | Definition
Negation ¬           | ¬x = 1 − x
Strong disjunction ⊕ | x ⊕ y = min{1, x + y}
Strong conjunction ⊙ | x ⊙ y = max{x + y − 1, 0}
Weak disjunction ∨   | x ∨ y = max{x, y}
Weak conjunction ∧   | x ∧ y = min{x, y}
Implication →        | x → y = min{1, 1 − x + y}
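The connectives in Table 1 translate directly into code. A minimal sketch over truth values in [0, 1] (the function names are our own):

```python
def lneg(x):            # negation: 1 - x
    return 1 - x

def strong_or(x, y):    # strong disjunction: min{1, x + y}
    return min(1, x + y)

def strong_and(x, y):   # strong conjunction: max{x + y - 1, 0}
    return max(x + y - 1, 0)

def weak_or(x, y):      # weak disjunction: max{x, y}
    return max(x, y)

def weak_and(x, y):     # weak conjunction: min{x, y}
    return min(x, y)

def implies(x, y):      # implication: min{1, 1 - x + y}
    return min(1, 1 - x + y)

# In 3-valued logic the truth set is {0, 1/2, 1}:
print(strong_or(0.5, 0.5))   # 1
print(weak_or(0.5, 0.5))     # 0.5
print(implies(1, 0.5))       # 0.5
```

Note that the strong and weak connectives disagree on intermediate truth values (e.g. 0.5 ⊕ 0.5 = 1 but 0.5 ∨ 0.5 = 0.5), which is why both families appear in the definitions below.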

One obvious extension of Boolean CNF formulas to Łukasiewicz logic is called simple Ł-clausal forms [10], where the Boolean negation is replaced by the Łukasiewicz negation and the Boolean disjunction by the strong disjunction.


The resulting form is

$$\bigwedge_{i=1}^{m} \left( \bigoplus_{j=1}^{r_i} l_{ij} \right)$$

where each lij is a variable (that can take a truth value belonging to either a finite or an infinite set) or its negation. In this paper, we are concerned with a slightly different and more interesting class of formulas called Ł-clausal forms [10]. The following definition describes how these formulas are constructed.

Definition 1. Let X = {x1, . . . , xn} be a set of variables. A literal is either a variable xi ∈ X or ¬xi. A negated term is a literal or an expression of the form ¬(l1 ⊕ · · · ⊕ lk), where l1, . . . , lk are literals. An Ł-clause is a disjunction of negated terms. An Ł-clausal form is a weak conjunction of Ł-clauses.

For example, (x1 ⊕ ¬(x2 ⊕ x3)) ∧ (x3 ⊕ ¬x2 ⊕ ¬x1) is an Ł-clausal form, but not a simple Ł-clausal form, due to the presence of the negated term ¬(x2 ⊕ x3). It has been shown [10] that the satisfiability problem for any simple Ł-clausal form is solvable in linear time, contrary to its counterpart in Boolean logic, which is NP-complete in the general case. In addition, the expressiveness of simple Ł-clausal forms is limited, meaning that not every Łukasiewicz formula has an equivalent simple Ł-clausal form. The reason Ł-clausal forms are interesting is that if at most three literals appear in each Ł-clause (i.e., 3-SAT), the satisfiability problem becomes NP-complete (the proof involves reducing Boolean 3-SAT to the SAT problem for Ł-clausal forms). The authors also showed that 2-SAT is solvable in linear time for Ł-clausal forms.

For satisfiability problems over randomly generated instances, the transition from under-constrained instances with a very high probability of satisfiability to over-constrained instances with a very low probability of satisfiability is called the phase transition phenomenon. This phenomenon has been observed in Boolean satisfiability [11] as well as in the satisfiability of Ł-clausal forms [10]. Therefore, predicting the satisfiability of instances generated at or near the phase transition is more challenging than predicting the satisfiability of formulas generated elsewhere.

Due to the good performance of currently available Boolean SAT algorithms, SAT solving procedures are becoming interesting alternatives for an increasing number of problem domains. Applications of SAT include knowledge compilation [12], hash functions, cryptanalysis [13] and many others. Fuzzy logic has numerous applications [14–16]. Several problems in computer-aided design (CAD) and electronic design automation (EDA), for example, can be naturally stated as satisfiability problems over a multi-dimensional, multi-valued (e.g. fuzzy) solution space. If these practical problems are formulated using Boolean SAT, then an encoding (such as one-hot encoding) of the multi-valued dimensions using a set of Boolean variables must also be used. Such encodings require defining new constraints, which exclude encoded values that do not occur in the original problem. For example, if a five-valued domain variable is encoded using three binary variables (having eight possible settings), additional constraints must be specified to exclude the remaining three assignments. Therefore, understanding and solving the SAT problem efficiently over a powerful class of formulas such as Ł-clausal forms is a step towards taking advantage of this generic problem-solving strategy in solving combinatorial optimization problems over variables in Łukasiewicz logic.

There has been work on predicting the satisfiability of Boolean formulas, for example in [17] and [11], but to the best of our knowledge, there is no research yet that focuses on predicting the satisfiability of Ł-clausal forms. In this paper, we demonstrate that it is possible to achieve classification accuracies higher than 95% based on features computed in polynomial time.

The rest of the paper is structured as follows. First, the instance generator used to produce the formulas is described. Second, the details of the dataset used are illustrated along with the parameters chosen and the features extracted. Third, the machine learning models trained over the dataset are listed as well as their initial results. Finally, hyper-parameter tuning is performed on the best-scoring model and the corresponding improved results are reported.
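To make the definition concrete, the following sketch evaluates a small Ł-clausal form over the 3-valued truth set {0, 1/2, 1} and decides its satisfiability by brute force. The encoding is our own illustrative choice: a literal is (variable index, negated?), a term is a group of literals joined by ⊕ with an optional outer negation, and, following the worked example, the terms inside an Ł-clause are also combined with ⊕:

```python
from fractions import Fraction
from itertools import product

def eval_literal(lit, assignment):
    var, negated = lit
    return 1 - assignment[var] if negated else assignment[var]

def eval_term(term, assignment):
    negated, lits = term
    v = min(1, sum(eval_literal(l, assignment) for l in lits))  # strong disjunction
    return 1 - v if negated else v

def eval_clause(clause, assignment):
    return min(1, sum(eval_term(t, assignment) for t in clause))

def eval_form(form, assignment):
    return min(eval_clause(c, assignment) for c in form)  # weak conjunction

def satisfiable(form, n_vars, truth_set):
    # Brute force: is there an assignment making the whole form evaluate to 1?
    return any(eval_form(form, a) == 1 for a in product(truth_set, repeat=n_vars))

# (x1 ⊕ ¬(x2 ⊕ x3)) ∧ (x3 ⊕ ¬x2 ⊕ ¬x1), with variables indexed 0..2.
form = [
    [(False, [(0, False)]), (True, [(1, False), (2, False)])],
    [(False, [(2, False), (1, True), (0, True)])],
]
truth_set = (Fraction(0), Fraction(1, 2), Fraction(1))
print(satisfiable(form, 3, truth_set))  # True, e.g. x1 = 1, x2 = 0, x3 = 0
```

Exact rational arithmetic (Fraction) avoids any floating-point rounding when testing whether a clause evaluates to exactly 1.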

2 Instance Generator

We have carried out a similar experiment to the one done by Bofill et al. in [10] on 3-valued Ł-clausal forms. The instances used were generated in the following manner: given the number of variables n and the number of clauses m, each clause is generated from three variables xi1, xi2 and xi3 picked uniformly at random. Then, one of the following eleven Ł-clauses is drawn uniformly at random: (xi1 ⊕ xi2 ⊕ xi3), (¬xi1 ⊕ xi2 ⊕ xi3), (xi1 ⊕ ¬xi2 ⊕ xi3), (xi1 ⊕ xi2 ⊕ ¬xi3), (¬xi1 ⊕ ¬xi2 ⊕ xi3), (¬xi1 ⊕ xi2 ⊕ ¬xi3), (xi1 ⊕ ¬xi2 ⊕ ¬xi3), (¬xi1 ⊕ ¬xi2 ⊕ ¬xi3), (¬(xi1 ⊕ xi2) ⊕ xi3), (¬(xi1 ⊕ xi3) ⊕ xi2) and (xi1 ⊕ ¬(xi2 ⊕ xi3)). We aim to generalize the construction and generation of Ł-clauses to study them in more detail experimentally and theoretically. Our model generates Ł-clausal forms with parameters (m, n, k, p), where m is the number of Ł-clauses, n is the number of variables, k is the number of variables appearing in each Ł-clause and p is the degree of absence of negated terms. It is important to note that no generated Ł-clause has a variable appearing more than once. The decision of whether or not to put a negated term in a clause is made as follows: given p, we generate a random integer r ∈ {0, 1, . . . , p − 1}, and if r = 0, then we add a negated term with length less than or equal to k. So, as p increases, the probability of adding a negated term decreases. For example, when p approaches 1, the sum of the lengths of negated terms in each Ł-clause approaches k, and when p approaches ∞, that sum approaches 0. In the next section, we will discuss the relationship between p and the cost. Differently from Boolean CNF formulas, the phase transition phenomenon in Ł-clausal forms does not depend only on the ratio between m and n, but also on p. In other words, using our model, by changing p one can generate different Ł-clausal forms with the same values for m and n but having totally different costs.
The instances produced are then translated into Satisfiability Modulo Theories (SMT) programs and solved using Z3 [18].
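A minimal sketch of this generation procedure, using the (m, n, k, p) parameters as described. How individual literals are negated and how the length of a negated term is drawn are our own assumptions; the paper fixes only the r ∈ {0, 1, . . . , p − 1} test:

```python
import random

def generate_clause(n, k, p, rng):
    # k distinct variables, each literal represented as (variable, negated?)
    lits = [(v, rng.random() < 0.5) for v in rng.sample(range(n), k)]
    if rng.randrange(p) == 0:
        # group a random number (1..k) of the literals into one negated term
        t = rng.randint(1, k)
        return {"negated_term": lits[:t], "literals": lits[t:]}
    return {"negated_term": None, "literals": lits}

def generate_form(m, n, k, p, seed=0):
    rng = random.Random(seed)
    return [generate_clause(n, k, p, rng) for _ in range(m)]

for clause in generate_form(m=5, n=10, k=3, p=2, seed=42):
    print(clause)
```

With p = 2 roughly half of the clauses receive a negated term; larger p makes negated terms rarer, matching the behaviour described above.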

3 Dataset and Features

A dataset of 1030 instances in the phase transition area was produced using the described instance generator. The numbers of satisfiable and unsatisfiable instances in the dataset are 565 and 465, respectively. The formulas are 3-valued Ł-clausal forms with k = 3, 1000 Ł-clauses, p = 28 and 510 variables. The reason for choosing these parameters is the phase transition reported in [10], which occurs at a clauses-to-variables ratio of 1.9; this corresponds to 510 variables when the number of Ł-clauses chosen is 1000. Moreover, the value p = 28 was chosen as it leads to a probability of satisfiability of approximately 1/2 for the generated instances. The following two subsections describe the two sets of features, numeric and graph features, that are extracted from every instance.

3.1 Numeric Features

Let φ be an Ł-clausal form with m clauses, n variables and mneg negated terms.

1. m/n
2. mneg/m
3. mhorn/m, where mhorn is the number of Horn clauses in φ. A Horn clause is an Ł-clause with at most one positive literal.
4. Statistics on the numbers of positive and negative occurrences of each variable.
5. Statistics on the number of negated literals each clause contains.

The statistics calculated for the numbers of positive and negative occurrences and the number of negated literals are: minimum, maximum, mean, variance, standard deviation, geometric mean, quadratic mean, Shannon's entropy, kurtosis, 25th, 50th and 75th percentiles, skewness and the total sum of squares. Some of these statistics were used in [11].

3.2 Features from Graph Representations

1. The variable-clause graph is a bipartite graph with a node for each variable, a node for each clause, and an edge between them whenever a variable occurs in a clause (positively or negatively). The negations over the negated terms are removed before computing the variable-clause graph of the formula. For example, let φ = (¬x1 ⊕ x3 ⊕ x4) ∧ (x1 ⊕ x2 ⊕ ¬x4 ⊕ x5) ∧ (x2 ⊕ ¬x3 ⊕ x5). Figure 1 shows the variable-clause graph of φ.
2. The variable graph has a node for each variable and an edge between variables that occur together in at least one clause.

The following features are extracted from the degrees of the nodes in each of the graph representations mentioned: minimum, maximum, mean, variance, standard deviation, geometric mean, quadratic mean, Shannon's entropy, kurtosis, 25th, 50th and 75th percentiles, skewness and the total sum of squares.
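As an illustration, the following sketch builds the variable graph of the example formula φ above and computes a few of the listed degree statistics (only a subset; computing the entropy over the normalized degree distribution is our own assumption about the exact convention):

```python
from itertools import combinations
from collections import defaultdict
from statistics import mean, pvariance
from math import log2

# Clauses of φ as sets of variable indices (x1..x5 -> 0..4);
# signs are ignored when building the variable graph.
clauses = [{0, 2, 3}, {0, 1, 3, 4}, {1, 2, 4}]

# Edge between two variables whenever they co-occur in at least one clause.
edges = set()
for clause in clauses:
    edges.update(combinations(sorted(clause), 2))

degree = defaultdict(int)
for u, v in edges:
    degree[u] += 1
    degree[v] += 1
degrees = sorted(degree.values())

total = sum(degrees)
features = {
    "min": min(degrees),
    "max": max(degrees),
    "mean": mean(degrees),
    "variance": pvariance(degrees),
    "entropy": -sum((d / total) * log2(d / total) for d in degrees if d),
}
print(features)  # for this φ every variable has degree 4: the variable graph is K5
```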


Fig. 1. Variable-clause graph of φ.

4 Results

In this section, we first describe the features extracted and their final selection. Second, the classifiers used in the exploratory stage are mentioned, as well as their evaluation. Finally, hyper-parameter tuning is performed on the best-scoring classifier and the final accuracy is reported.

4.1 Feature Engineering and Selection

The following steps are carried out to engineer new features based on the original ones and to select the best among them.

1. A standard scaler transformation (the dataset is transformed such that its distribution has a mean value of 0 and a standard deviation of 1) is applied to the data.
2. A Logistic Regression model with an L1 penalty is used to produce an importance score for every feature. The model takes the feature matrix X and the target Y, then outputs a score for each feature such that the higher the score, the more relevant the feature is towards the target. The number of iterations used for the model is 1000.
3. The selected features in the second step are then transformed using a polynomial transformation of the third degree. This works as follows: given a feature vector (a, b), a polynomial transformation of the third degree produces the feature vector (1, a, b, ab, a², b², a³, b³, ab², a²b).
4. A standard scaler transformation is applied to the new features.
5. Due to the large number of features produced from the polynomial transformation, the second step is repeated using the features obtained from the fourth step. The final number of features we end up with is 80 and the number of iterations used for the Linear SVC model is 1000.
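The third-degree polynomial transformation described above can be sketched from scratch as follows; for a feature vector (a, b) it reproduces the ten monomials listed in the text (a real pipeline would typically use a library implementation):

```python
from itertools import combinations_with_replacement
from math import prod

def polynomial_features(x, degree):
    # All monomials of the input features up to the given degree, plus the bias 1.
    out = [1]
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(range(len(x)), d):
            out.append(prod(x[i] for i in combo))
    return out

# For (a, b) = (2, 3): 1, a, b, a^2, ab, b^2, a^3, a^2*b, a*b^2, b^3
print(polynomial_features([2, 3], 3))  # [1, 2, 3, 4, 6, 9, 8, 12, 18, 27]
```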

4.2 Classifiers and Evaluation

In the exploratory stage, we experimented with different classifiers. The following models were tested using stratified K-fold cross-validation with K = 10. Using a value of K = 10 has been shown through experimentation to generally result in a model skill estimate with low bias and a modest variance [19]:


1. Logistic Regression with an L1 penalty and an inverse of regularization strength of 1.0.
2. Neural Network with a tanh activation and a single hidden layer with 30 nodes.
3. Random Forest with 1000 trees and a minimum of 2 samples needed to split.
4. Linear Support Vector Machine with an L1 penalty and an inverse of regularization strength of 1.0.
5. Decision Tree.
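Stratified K-fold cross-validation, used for all of the models above, can be sketched as a from-scratch index splitter that preserves the class proportions in every fold (real experiments would typically rely on a library implementation):

```python
from collections import defaultdict

def stratified_kfold(labels, k):
    # Group sample indices by class, then deal each class round-robin over the folds.
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for i, idx in enumerate(indices):
            folds[i % k].append(idx)
    return folds

# 6 satisfiable (1) and 4 unsatisfiable (0) instances split into 2 folds:
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
for fold in stratified_kfold(labels, 2):
    print(sorted(fold))  # each fold keeps the 3:2 class ratio
```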

Table 2. Exploratory stage results, indicating the mean and the standard deviation of the cross-validation scores (CVS).

Classifier          | Mean CVS | STD CVS
Logistic Regression | 90.3%    | 3.3%
Neural Network      | 92.5%    | 2.3%
Random Forest       | 61.4%    | 2.2%
Linear SVC          | 90.8%    | 4.2%
Decision Tree       | 54.1%    | 5.8%

The Neural Network model had the highest mean CVS and the second lowest standard deviation, as Table 2 shows. Hyper-parameter tuning was performed using grid search augmented by cross-validation to optimize the model. The following list describes the parameter grid chosen for the search; the best parameter for each item is in bold.

– Learning rate: constant, invscaling and adaptive.
– Solver: Adam, Stochastic Gradient Descent (SGD) and Limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS).
– Sizes of hidden layer(s): (30), (35), (40), (45) and (30, 30).
– Activation functions: logistic, relu and tanh.
– Nesterov's accelerated gradient: off, on.
– L2 penalty parameter: 0.01, 0.001 and 0.0001.

After creating a Neural Network with the best parameters, the resulting mean and standard deviation of the CVS are 95.2% and 1.98%, respectively. This result is an improvement on the initial results described in Table 2. A different dataset of 1012 instances (531 satisfiable and 481 unsatisfiable) with the same parameters as the first dataset was generated, but over 7-valued instead of 3-valued logic. The produced formulas are also in the phase transition area. A Neural Network whose parameters were hyper-parameter tuned was tested on this dataset and the achieved mean CVS was 92%.
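Grid search of this kind simply evaluates every combination in the parameter grid and keeps the best-scoring one. A minimal sketch over a subset of the grid above; the scoring function is a stand-in (in practice the score would be a cross-validated CVS):

```python
from itertools import product

param_grid = {
    "learning_rate": ["constant", "invscaling", "adaptive"],
    "hidden_layer_sizes": [(30,), (35,), (40,), (45,), (30, 30)],
    "activation": ["logistic", "relu", "tanh"],
}

def grid_search(grid, score):
    # Exhaustively try every combination, returning the best parameters and score.
    keys = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        s = score(params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score

# Stand-in scorer that happens to prefer tanh and a single 30-node layer.
def toy_score(p):
    return (p["activation"] == "tanh") + (p["hidden_layer_sizes"] == (30,))

best, best_score = grid_search(param_grid, toy_score)
print(best, best_score)
```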

5 Conclusion and Future Work

This paper is concerned with the problem of predicting the satisfiability of an interesting class of logical formulas in Łukasiewicz logic, called Ł-clausal forms. In particular, we first presented an instance generator and used it to produce two datasets with instances at the phase transition over 3-valued and 7-valued Łukasiewicz logic. Next, numeric and graph features were extracted from both datasets. Then, different classifiers were trained and the best classifier (Neural Network) was selected for hyper-parameter tuning, after which the mean of the CVS increased from 92.5% to 95.2%. Many practical problems are naturally described in multi-valued logic, and this work will be beneficial in encoding many of these applications into Ł-clausal forms and solving them in a short time. The same technique has gained tremendous success and popularity over the years in the case of Boolean satisfiability [20–22]. A limitation of this study is the size of the dataset. Future studies will involve generating larger datasets with variable occurrences having different statistical distributions (power, exponential and normal distributions). This is important since variable occurrences in formulas coming from industrial applications often follow various distributions and not just a uniform distribution. In addition, we will investigate the ability of our model to predict the satisfiability of formulas from real-life applications. Finally, we will explore more efficient methods of solving the satisfiability of Ł-clausal forms. This will enable generating instances with more variables and Ł-clauses.

References

1. Biere, A., Heule, M., van Maaren, H. (eds.): Handbook of Satisfiability. IOS Press, Amsterdam (2009)
2. Cook, S.: The complexity of theorem-proving procedures. In: Proceedings of the Third Annual ACM Symposium on Theory of Computing, pp. 151–158. ACM (1971)
3. Marques-Silva, J.: Practical applications of Boolean satisfiability. In: 9th International Workshop on Discrete Event Systems, pp. 74–80. IEEE (2008)
4. Zadeh, L.: Fuzzy sets. Inf. Control 8(3), 338–353 (1965)
5. Rushdi, M., Rushdi, M., Zarouan, M., Ahmad, W.: Satisfiability in intuitionistic fuzzy logic with realistic tautology. Kuwait J. Sci. 45(2), 15–21 (2018)
6. El Halaby, M., Abdalla, A.: Fuzzy maximum satisfiability. In: Proceedings of the 10th International Conference on Informatics and Systems, pp. 50–55. ACM (2016)
7. Brys, T., Drugan, M., Bosman, P., De Cock, M., Nowé, A.: Solving satisfiability in fuzzy logics by mixing CMA-ES. In: Proceedings of the 15th Conference on Genetic and Evolutionary Computation, pp. 1125–1132. ACM (2013)
8. Brys, T., Drugan, M., Bosman, P., De Cock, M., Nowé, A.: Local search and restart strategies for satisfiability solving in fuzzy logics. In: 2013 IEEE International Workshop on Genetic and Evolutionary Fuzzy Systems (GEFS), pp. 52–59. IEEE (2013)
9. Soler, J., Manyà, F.: A bit-vector approach to satisfiability testing in finitely-valued logics. In: 2016 IEEE 46th International Symposium on Multiple-Valued Logic (ISMVL), pp. 270–275. IEEE (2016)
10. Bofill, M., Manyà, F., Vidal, A., Villaret, M.: New complexity results for Łukasiewicz logic. Soft Comput. 23(7), 2187–2197 (2019)
11. Devlin, D., O'Sullivan, B.: Satisfiability as a classification problem. In: Proceedings of the 19th Irish Conference on Artificial Intelligence and Cognitive Science (2008)
12. Darwiche, A.: New advances in compiling CNF to decomposable negation normal form. In: Proceedings of the 16th European Conference on Artificial Intelligence, pp. 318–322. IOS Press (2004)
13. Mironov, I., Zhang, L.: Applications of SAT solvers to cryptanalysis of hash functions. In: International Conference on Theory and Applications of Satisfiability Testing, pp. 102–115. Springer (2006)
14. Yager, R., Zadeh, L. (eds.): An Introduction to Fuzzy Logic Applications in Intelligent Systems, vol. 165. Springer, New York (2012)
15. De Silva, C.: Intelligent Control: Fuzzy Logic Applications. CRC Press, Boca Raton (2018)
16. Srivastava, P., Bisht, D.: Recent trends and applications of fuzzy logic. In: Advanced Fuzzy Logic Approaches in Engineering Science, pp. 327–340. IGI Global (2019)
17. Xu, L., Hoos, H., Leyton-Brown, K.: Predicting satisfiability at the phase transition. In: 26th AAAI Conference on Artificial Intelligence (2012)
18. De Moura, L., Bjørner, N.: Z3: an efficient SMT solver. In: International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pp. 337–340. Springer, Heidelberg (2008)
19. Koller, D., Friedman, N., Džeroski, S., Sutton, C., McCallum, A., Pfeffer, A., Abbeel, P., Wong, M., Heckerman, D., Meek, C., Neville, J.: Introduction to Statistical Relational Learning. MIT Press (2007)
20. Favalli, M., Dalpasso, M.: Applications of Boolean satisfiability to verification and testing of switch-level circuits. J. Electron. Test. 30(1), 41–55 (2014)
21. Vizel, Y., Weissenbacher, G., Malik, S.: Boolean satisfiability solvers and their applications in model checking. Proc. IEEE 103(11), 2021–2035 (2015)
22. Aloul, F., El-Tarhuni, M.: Multipath detection using Boolean satisfiability techniques. J. Comput. Netw. Commun. 2011 (2011)

A Teaching-Learning-Based Optimization with Modified Learning Phases for Continuous Optimization

Onn Ting Chong1, Wei Hong Lim1(B), Nor Ashidi Mat Isa2, Koon Meng Ang1, Sew Sun Tiang1, and Chun Kit Ang1

1 Faculty of Engineering, Technology and Built Environment, UCSI University, 56000 Kuala Lumpur, Malaysia
[email protected]
2 School of Electrical and Electronic Engineering, Universiti Sains Malaysia, 14300 Nibong Tebal, Malaysia

Abstract. The deviation between the modelling of the teaching-learning based optimization (TLBO) framework and the actual scenario of the classroom teaching and learning process is considered one factor contributing to the imbalance between the algorithm's exploration and exploitation searches, hence restricting its search performance. In this paper, TLBO with modified learning phases (TLBO-MLPs) is proposed to achieve better search performance through further refinement of the learning framework so that it reflects the actual teaching and learning processes in the classroom more accurately. A modified teacher phase is first introduced in TLBO-MLPs, where each learner is modelled to have a different perspective of the mainstream knowledge in the classroom in order to maintain the diversity of the population's knowledge. A modified learner phase consisting of an adaptive peer learning mechanism and a self-learning mechanism is also proposed in TLBO-MLPs. The former mechanism enables each learner to interact with multiple learners in gaining new knowledge for different subjects, while the latter facilitates the update of new knowledge through personal efforts. The overall performance of TLBO-MLPs in solving the CEC 2014 test functions is compared with seven competitors. Extensive simulation results show that TLBO-MLPs demonstrates the best search performance among all compared methods on the majority of test functions.

Keywords: Global optimization · Modified learning phases · Teaching-learning based optimization (TLBO)

© Springer Nature Switzerland AG 2020
K. Arai et al. (Eds.): SAI 2020, AISC 1229, pp. 103–124, 2020.
https://doi.org/10.1007/978-3-030-52246-9_8

1 Introduction

Recently, research areas related to optimization have received overwhelming attention, given the promising performance of optimization in decision making. An optimization problem consists of an objective function that represents the intended goal. Depending on the problem characteristics, the optimal combination of decision variables can lead to the
largest or smallest objective function values. Given the rapid advancement of technology, substantial numbers of modern engineering problems are represented as highly complex, nonlinear and large-scale optimization problems that are difficult to address with deterministic mathematical programming methods. It is crucial to develop alternative optimization schemes that can handle these increasingly complex problems more robustly.

Metaheuristic search algorithms (MSAs) employ unique search strategies to emulate certain natural phenomena, and their robust search performances enable them to tackle various complex real-world optimization problems. Swarm intelligence (SI) algorithms and evolutionary algorithms (EAs) are the two main branches of MSAs, with different sources of inspiration. The search mechanisms of EAs are inspired by Darwin's theory of evolution, and some representatives include the genetic algorithm (GA) [1], evolution strategies (ES) [2], genetic programming (GP) [3], differential evolution (DE) [4], etc. Meanwhile, SI algorithms emulate the collective behavior of a group of simple agents that interact locally with each other and their environments in self-organized and decentralized manners. Some notable examples of SI are particle swarm optimization (PSO) [5], ant colony optimization [6], artificial bee colony (ABC) [7], the grey wolf optimizer (GWO) [8], etc. These MSAs are widely used by practitioners to tackle a wide range of modern optimization problems, such as those reported in [9–19], because of their competitive advantages such as high efficiency and simple implementation.

Motivated by the pedagogy of the conventional classroom, a new SI algorithm known as teaching-learning based optimization (TLBO) has emerged [20, 21]. Similar to most MSAs, TLBO has a large population of learners that represent different candidate solutions of a given optimization problem, while the teacher represents the best solution found.
The search trajectory of each learner is adjusted based on its interaction with the teacher and other peer learners during the optimization process. In contrast to most existing MSAs, TLBO has the additional advantage of not requiring any algorithm-specific parameter tuning, such as the inertia weight and acceleration coefficients of PSO, the mutation and crossover rates of GA, etc. Given these appealing features, TLBO was extended to solve more challenging types of optimization problems with greater complexities of fitness function landscapes, such as those described in multi-objective problems [22–24], constrained problems [25–27], etc.

Although different TLBO variants were proposed to solve optimization problems with enhanced performance, their robustness in addressing problems with different complexities of fitness landscapes (e.g., multimodal, expanded, hybrid and composite functions) remains arduous. Most TLBO variants only demonstrate good performance in limited classes of problems and deliver inferior optimization results for other types of problems due to the imbalance between the algorithm's exploration and exploitation strengths [21]. The design of robust mechanisms to attain proper regulation of the exploration and exploitation strengths of TLBO variants remains an open research question that is actively pursued by researchers in this arena. It is also observed that some mechanisms adopted in the algorithmic framework of TLBO are not aligned with the real-world teaching and learning processes in a classroom. For instance, the knowledge of each learner is updated based on the same teacher and the same mainstream knowledge of the classroom, represented by the best and mean solutions of the population, respectively. In the real-world
scenario, different learners tend to have different perspectives of the mainstream knowledge in order to preserve the diversity of the knowledge acquired. Additionally, it is also observed that each learner tends to interact with only one peer learner when updating his or her knowledge in all subjects taken, as shown in the TLBO learner phase. Nevertheless, it is more realistic for a learner to interact with several peers to enhance the knowledge of different subjects. The discrepancies found between the TLBO framework modelling and real-world teaching and learning processes in the classroom can be another factor restricting the robustness of TLBO in handling a wide range of optimization problems [28].

In this paper, a new TLBO variant known as teaching-learning based optimization with modified learning phases (TLBO-MLPs) is proposed to overcome the challenges mentioned above. The modelling of the teaching and learning mechanisms proposed in the TLBO-MLPs framework is refined further so that it reflects the actual teaching and learning processes in the classroom more accurately, hence leading to better search performance. Some notable contributions of TLBO-MLPs are highlighted below:

– A modified teacher phase is designed in TLBO-MLPs by introducing a new concept of weighted mean position that aims to simulate the different perspectives of mainstream knowledge perceived by different learners when updating their knowledge in the teacher phase.
– A modified learner phase is designed in TLBO-MLPs by allowing each learner to interact with multiple peers when updating his or her knowledge in different subjects (i.e., dimensional components). A peer learning probability is also introduced to quantify the tendency of each learner to interact with its peers in the modified learner phase, based on the knowledge level of the learner.
– The modified learner phase of TLBO-MLPs is incorporated with a self-learning process that aims to simulate the tendency of some learners to update their knowledge through personal efforts instead of interacting with their peers.
– Rigorous performance evaluations are performed on TLBO-MLPs with the CEC 2014 test functions and verified using statistical analyses.

The remaining sections of this paper are organized as follows. The literature review of this work is provided in Sect. 2. Detailed descriptions of TLBO-MLPs are given in Sect. 3. An extensive performance evaluation of TLBO-MLPs in solving the complete set of CEC 2014 test functions is described in Sect. 4, followed by performance validation with statistical analyses. The conclusions drawn from this work and its future directions are summarized in Sect. 5.

2 Literature Review

2.1 Conventional TLBO

The teaching-learning-based optimization (TLBO) algorithm was proposed in [20], and its search mechanisms are motivated by the classroom's conventional teaching and learning processes. At the beginning of optimization, random initialization is used to produce a group of learners with a population size of I. Each i-th learner,
X_i = (X_{i,1}, ..., X_{i,d}, ..., X_{i,D}), represents a potential solution of a given optimization problem, where d ∈ [1, D] and D refer to the dimension index and the total dimensional size of the problem to be optimized, respectively. Suppose that f(X_i) is the objective function value of the i-th solution; it implies the knowledge level of the i-th learner in the classroom, which can be enhanced via the teacher or learner phases. All learners are updated in the teacher phase by interacting with the best learner in the population, i.e., the teacher solution X_teacher, taking the average knowledge level of the population, denoted as X_mean, into account, where:

    X_mean = (1/I) Σ_{i=1}^{I} X_i                                   (1)

Suppose that r_1 denotes a uniformly distributed random number in the range of 0 to 1, and T_f refers to a teaching factor with an integer value of either 1 or 2 that emphasizes the influence of the mainstream knowledge of the population X_mean. Let X_i^new be the new solution of the i-th learner obtained in the teacher phase; then:

    X_i^new = X_i + r_1 (X_teacher − T_f X_mean)                     (2)

On the other hand, each i-th learner interacts with its peers in the population during the learner phase in order to enhance its knowledge level (i.e., fitness). Denote by s the index of a randomly selected peer learner assigned to the i-th learner in the learner phase, where s ∈ [1, I] and s ≠ i, and by r_2 a uniformly distributed random number in the range of 0 to 1. If the randomly selected learner X_s is fitter than X_i, the inferior X_i is attracted towards the fitter X_s as depicted in Eq. (3). In contrast, a repel scheme is incorporated into Eq. (4) in order to prevent the rapid convergence of the fitter X_i towards the inferior X_s, which could lead to premature convergence.

    X_i^new = X_i + r_2 (X_s − X_i)                                  (3)

    X_i^new = X_i + r_2 (X_i − X_s)                                  (4)
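One iteration of these teacher- and learner-phase updates, Eqs. (1)–(4), can be sketched in NumPy as follows. This is a generic rendering for a minimization problem, not the authors' code; learners keep a new position only when it improves their fitness.

```python
import numpy as np

def tlbo_iteration(X, fit, f, rng):
    """One iteration of conventional TLBO (Eqs. (1)-(4)), minimization.
    X: (I, D) population, fit: current objective values, f: objective."""
    I, D = X.shape
    # --- teacher phase ---
    X_teacher = X[np.argmin(fit)]            # best learner acts as the teacher
    X_mean = X.mean(axis=0)                  # Eq. (1): population mean
    for i in range(I):
        Tf = rng.integers(1, 3)              # teaching factor, 1 or 2
        x_new = X[i] + rng.random() * (X_teacher - Tf * X_mean)   # Eq. (2)
        f_new = f(x_new)
        if f_new < fit[i]:                   # keep only improving moves
            X[i], fit[i] = x_new, f_new
    # --- learner phase ---
    for i in range(I):
        s = rng.choice([a for a in range(I) if a != i])  # random peer, s != i
        if fit[s] < fit[i]:
            x_new = X[i] + rng.random() * (X[s] - X[i])  # Eq. (3): attract
        else:
            x_new = X[i] + rng.random() * (X[i] - X[s])  # Eq. (4): repel
        f_new = f(x_new)
        if f_new < fit[i]:
            X[i], fit[i] = x_new, f_new
    return X, fit
```

Iterating this until a fitness-evaluation budget is exhausted and returning the teacher solution gives the full algorithm.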

The new solution X_i^new obtained by the i-th learner during the teacher and learner phases is used to replace the original X_i if the former is found to be superior to the latter. Otherwise, X_i^new is discarded due to its inferior fitness. The knowledge of all TLBO learners is updated iteratively via both the teacher phase and the learner phase until the termination conditions are satisfied. At the end of the search process, the teacher solution X_teacher is returned as the best solution for the given optimization problem.

2.2 TLBO Variants and Improvements

Significant numbers of extended works on TLBO have been reported since its inception for enhancing its performance [21]. A promising research direction for improving TLBO's performance is the employment of parameter adaptation strategies. In [29], an
adaptive weight factor that decreases linearly with the iteration number was introduced to encourage the TLBO learners to focus more on exploration during the earlier phase of optimization and then exploit a smaller search space in the later phase. Another similar adaptive weight was proposed in [30], aiming to improve the exploration ability of TLBO. The learning efficiency of TLBO was enhanced in [31] using the concepts of acceleration coefficients and inertia weight to determine the influence of the previous learner and the learning step size, respectively, based on the fitness of each learner. A nonlinear inertia weighted TLBO (NIWTLBO) was proposed in [32]. The nonlinear inertia weight factor was employed to regulate the memory rate and scale the learner's existing knowledge, while a time-varying weight value was used to enhance the differential increment and impose random fluctuations on existing TLBO learners. A TLBO variant whose population size varies in a triangular form was proposed in [33]. A Gaussian distribution was used to produce new learners when the population size of TLBO increases from minimum to maximum, while a similarity concept was employed to discard redundant learners when the population size decreases from maximum to minimum.

Neighborhood structure is another common strategy used to improve the performance of TLBO, given its ability to govern the propagation rate of the best solution in the population. A ring topology was introduced in [34] to enhance the exploration strength of TLBO by facilitating information exchange between each learner and its two nearest neighbors during the teacher phase. In [35], a quasi von Neumann topology was integrated into the learner phase of TLBO to enable each learner to improve its knowledge level via either the conventional search operator or a neighborhood search operator with a certain probability. Multi-population schemes are also commonly used to construct different neighborhood structures for TLBO so that diverse areas of the search space can be explored by different subpopulations simultaneously. For instance, a dynamic group strategy (DGS) was incorporated into DGSTLBO in [36] to divide the population into multiple groups with equal numbers of learners based on their Euclidean distances. Apart from learning via the teacher and the subpopulation's mean, there is a probability for learners to update their knowledge with a quantum-behaved learning strategy. Different clustering techniques such as fuzzy K-means [37], adaptive clustering [38], random clustering [39], etc. were applied to divide the main population of TLBO into a certain number of clusters by referring to a metric known as spatial distribution. A two-level hierarchical multi-swarm cooperative TLBO (HMCTLBO) with enhanced exploration strength and population diversity was proposed in [40]. Each subpopulation was first constructed at the bottom layer via random partitioning and evolved independently. The best learner in each subpopulation at the bottom layer was then selected to form the top layer and evolved through Gaussian sampling learning.

Another popular approach widely used for enhancing TLBO's performance is the modification of the learning strategy. A modified TLBO variant was designed in [41] by replacing the original learning strategy of the teacher phase with a Gaussian probabilistic model, while the learner phase was incorporated with a neighborhood search operator and a permutation-based crossover to better guide learners in searching for more promising areas. In [42], the concept of a tutorial class was introduced into the learner phase, and performance enhancement of the modified TLBO (mTLBO) was observed because of the close collaborative interactions between the teacher and learners as well as among the learners.


An improved TLBO with learning experience of other learners (LETLBO) was proposed in [43] to facilitate the knowledge improvement of each learner by accessing the experience information of other learners during the teacher and learner phases. In [44], an improved TLBO with differential learning (DLTLBO) was developed to achieve better population diversity and exploration capability. During the teacher phase of DLTLBO, two trial vectors are first produced using a neighborhood learning operator and a differential learning operator; this is followed by a crossover operation on the two trial vectors in order to produce a new learner. Another similar TLBO variant known as TLBO-DE was proposed in [45] to generate new solutions efficiently by using a hybrid search operator developed from the learning mechanism of original TLBO and differential learning. A TLBO with a learning enthusiasm mechanism (LebTLBO) was proposed in [46]; it was inspired by the correlation between the grades of learners and their enthusiasm in acquiring knowledge. It was assumed that learners with better grades have a higher tendency to pursue new knowledge due to higher learning enthusiasm, and vice versa. In [47], a generalized oppositional TLBO (GOTLBO) with improved convergence characteristics was proposed by leveraging the benefits of opposition-based learning in producing new learners.

3 TLBO-MLPs

This section elaborates the detailed search mechanisms of the proposed TLBO-MLPs. First, the concept of weighted mean position is introduced into the modified teacher phase, aiming to avoid premature convergence of the population by simulating the behavior of learners with different perspectives of the mainstream knowledge in the classroom. Second, the modified learner phase is equipped with a probabilistic mutation operator to simulate the tendency of some learners to choose self-learning over peer learning to enhance their knowledge. Third, an adaptive peer learning mechanism is devised in the modified learner phase of TLBO-MLPs to determine the likelihood of a particular learner interacting with multiple learners in order to learn different subjects (i.e., decision variables).

3.1 Modified Teacher Phase of TLBO-MLPs

During the teacher phase of conventional TLBO, it is observed from Eq. (1) that all learners contribute equally to the population mean value X_mean. In addition, the search processes of all learners in the teacher phase are guided by the same direction information obtained from the best learner (i.e., the teacher) and the mainstream knowledge of the population (i.e., X_mean), as demonstrated in Eq. (2). If the teacher with the best knowledge level is trapped in a suboptimal region, the search processes of the remaining TLBO learners will be misguided towards the local optimum by the identical X_teacher and X_mean. This behavior explains the tendency of conventional TLBO to experience rapid diversity loss and premature convergence of the population, particularly in tackling optimization problems with complicated search spaces.

In order to address the aforementioned drawback, an alternative is proposed in the teacher phase of TLBO-MLPs to obtain the mean value of the population. Unlike the conventional approach, it is suggested that each TLBO-MLPs learner has slightly different
perceptions of the mainstream knowledge of the population; hence, a different mean position should be derived as a unique source of influence to guide the search process of each learner in the teacher phase. Let X_a be any a-th learner in the population, which plays a crucial role in deriving a unique mean position for guiding each i-th learner. In order to preserve the diversity level, define r_a as a uniformly distributed random number in the range between 0 and 1 that indicates the weightage of X_a in deriving the unique mean position. Let X̄_i^mean be the weighted mean position of each i-th learner; then:

    X̄_i^mean = ( Σ_{a=1}^{I} r_a X_a ) / ( Σ_{a=1}^{I} r_a )        (5)

Based on X_teacher and X̄_i^mean, the modified teacher phase of TLBO-MLPs updates the new solution of each i-th learner as follows:

    X_i^new = X_i + r_3 (X_teacher − T_f1 X_i) + r_4 (X̄_i^mean − T_f2 X_i)   (6)

where r_3 and r_4 are uniformly distributed random numbers in the range between 0 and 1, and T_f1 and T_f2 refer to two parameters known as teaching factors that can be set with values between 1 and 2. According to Fig. 1 and Eq. (6), every i-th TLBO-MLPs learner is able to update its knowledge by learning directly from the knowledge gap observed between (i) the teacher and the i-th learner and (ii) the weighted mean position of the other learners in the classroom and the i-th learner.

Algorithm 1: Modified Teacher Phase
Input: I, X_i, i
1: Identify the best learner in the population as X_teacher;
2: Calculate the weighted mean X̄_i^mean of each learner using Eq. (5);
3: Randomly generate T_f1 and T_f2, where T_f1, T_f2 ∈ [1, 2];
4: Update the new position X_i^new of the i-th learner using Eq. (6);
Output: X_i^new

Fig. 1. Pseudo-code for modified teacher phase of TLBO-MLPs.
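As an illustration, the modified teacher phase of Eqs. (5)–(6) can be sketched in NumPy as below. This is a hedged rendering rather than the authors' implementation; the teaching factors are drawn uniformly from [1, 2] as stated in Algorithm 1.

```python
import numpy as np

def modified_teacher_phase(X, fit, rng):
    """Candidate positions from the modified teacher phase (Eqs. (5)-(6)).
    X: (I, D) population, fit: objective values (minimization)."""
    I, D = X.shape
    X_teacher = X[np.argmin(fit)]            # best learner acts as the teacher
    X_new = np.empty_like(X)
    for i in range(I):
        ra = rng.random(I)                   # random weightage of each learner
        mean_i = ra @ X / ra.sum()           # Eq. (5): weighted mean position
        Tf1, Tf2 = rng.uniform(1.0, 2.0, size=2)   # teaching factors in [1, 2]
        r3, r4 = rng.random(2)
        X_new[i] = (X[i] + r3 * (X_teacher - Tf1 * X[i])
                         + r4 * (mean_i - Tf2 * X[i]))   # Eq. (6)
    return X_new
```

Because the random weights ra are redrawn per learner, each learner sees a different weighted mean, so the guidance signal varies across the population even though the teacher is shared.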

3.2 Modified Learner Phase of TLBO-MLPs

The exploration strength of TLBO is emphasized during the learner phase of the algorithm via the repulsion scheme introduced in Eq. (4), particularly when a poorly performing peer is randomly assigned to the i-th learner. As the search process of TLBO progresses further, the majority of learners tend to converge in a certain region of the search space, and the population tends to stabilize without significant changes in diversity. Under this circumstance, it is less likely for a randomly selected peer learner to assist a given learner in jumping out of a local optimum region, especially for problems with complicated fitness landscapes. Given these limitations, two major modifications
are introduced into the modified learner phase of TLBO-MLPs, and their mechanisms are described as follows.

A stochastic mutation operator that performs perturbation on the TLBO-MLPs learners with a probability of P_MUT = 1/D is first incorporated into the modified learner phase as a new learning strategy. From the perspective of the teaching and learning paradigm, learners with different learning styles exist in the same classroom. Certain learners choose to adopt a self-learning approach over peer interaction to enhance their knowledge level. The incorporation of the mutation scheme enables selected TLBO-MLPs learners to perform self-learning during the modified learner phase after the modified teacher phase is completed. If any i-th learner triggers the self-learning mechanism with probability P_MUT, a randomly selected dimension of the i-th learner, denoted as d_r ∈ [1, D], is perturbed as follows:

    X_{i,dr}^new = X_{i,dr} + r_5 (X_{dr}^U − X_{dr}^L)              (7)

where r_5 is a uniformly distributed random number in the range between −1 and 1, and X_{i,dr}^new, X_{dr}^U and X_{dr}^L represent the d_r-th dimension of the i-th learner as well as the upper and lower boundary values of the decision variables, respectively. Figure 2 provides a detailed description of the aforementioned self-learning mechanism.

Algorithm 2: Self Learning
Input: X_i, D, X^U, X^L
1: Randomly generate a dimension index d_r ∈ [1, D];
2: Extract the d_r-th components of X_i, X^U and X^L;
3: Perform perturbation on X_{i,dr} to produce X_{i,dr}^new using Eq. (7);
4: Return X_i^new as the perturbed solution;
Output: X_i^new

Fig. 2. Pseudo-code for self-learning mechanism of TLBO-MLPs.
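A sketch of this self-learning perturbation (Eq. (7)), assuming NumPy arrays for the learner and the box bounds:

```python
import numpy as np

def self_learning(x, lower, upper, rng):
    """Perturb one randomly chosen dimension of learner x (Eq. (7))."""
    x_new = x.copy()
    dr = rng.integers(x.size)                # random dimension index d_r
    r5 = rng.uniform(-1.0, 1.0)              # r5 in [-1, 1]
    x_new[dr] = x[dr] + r5 * (upper[dr] - lower[dr])
    return x_new
```

In the full algorithm this branch is triggered with probability P_MUT = 1/D; otherwise the adaptive peer learning branch described next in the text is used.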

For other learners that prefer to achieve knowledge improvement through peer interaction, an adaptive peer learning mechanism is introduced into TLBO-MLPs, intending to enhance the search efficiency of the algorithm. During the learner phase of conventional TLBO, each learner can only interact with one peer to update all of its dimensional components, as shown in Eqs. (3) and (4). These formulations might not describe the actual classroom scenario accurately, because a peer learning process can be more effective if every learner can interact with more peers. Furthermore, different learners might be more knowledgeable in certain subjects (i.e., dimensions); hence, the weaker subjects have higher urgency to be improved via peer interaction. Based on these motivations, an adaptive peer-learning strategy is developed to enhance the search efficiency of TLBO-MLPs by emulating a more accurate peer learning mechanism, explained as follows.

After completing the modified teacher phase, all TLBO-MLPs learners are sorted based on their current fitness values from best to worst. Let R_i be the ranking of each i-th learner; then:

    R_i = I − i                                                      (8)

From Eq. (8), fitter learners are assigned higher ranking values and vice versa. Let P_i^PL ∈ [0, 1] be the peer learning probability of each i-th learner, i.e.:

    P_i^PL = 1 − R_i / I                                             (9)

Given the peer learning probability P_i^PL, the new solution X_i^new of each i-th learner can be produced using the following procedure. For every d-th dimension of the i-th new solution, denoted as X_{i,d}^new, a random number r_6 ∈ [0, 1] is generated from a uniform distribution and compared with the peer learning probability P_i^PL of the i-th learner. If r_6 is smaller than P_i^PL, three learners denoted as X_j, X_k and X_l are randomly selected from the population to produce a new value for X_{i,d}^new, as indicated in Eq. (10), where i ≠ j ≠ k ≠ l. Otherwise, the i-th learner retains its original value of X_{i,d} in X_{i,d}^new. Define φ_i ∈ [0.5, 1] as the peer learning factor of the i-th learner, randomly generated from a uniform distribution; the adaptive peer learning mechanism of TLBO-MLPs can then be formulated as follows:

    X_{i,d}^new = X_{j,d} + φ_i (X_{k,d} − X_{l,d}),  if r_6 < P_i^PL or d = d_rand
    X_{i,d}^new = X_{i,d},                            otherwise       (10)

As shown in Eq. (10), an i-th learner with a worse fitness value (i.e., higher P_i^PL) has a higher tendency to interact with its peers to update most of the dimensional components of X_i^new, as compared to those with better fitness (i.e., lower P_i^PL). Unlike conventional TLBO, the adaptive peer learning mechanism proposed in TLBO-MLPs not only allows a given learner to interact with multiple peers, but also adaptively determines the tendency of the learner to update its dimensional components via peer interaction based on its fitness value. The pseudocode describing the modified learner phase of TLBO-MLPs is presented in Fig. 3.

Algorithm 3: Modified Learner Phase
Input: X_i, D, X^U, X^L, P_i^PL
1: Randomly generate rand ∈ [0, 1];
2: if rand ≤ P_MUT then /*Perform self-learning*/
3:   Produce X_i^new using Self Learning (Algorithm 2);
4: else /*Perform adaptive peer-learning*/
5:   for d = 1 to D do
6:     Calculate X_{i,d}^new using Eq. (10);
7:   end for
8: end if
Output: X_i^new

Fig. 3. Pseudo-code for modified learner phase of TLBO-MLPs.
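The ranking, peer-learning probability and per-dimension update of Eqs. (8)–(10) can be sketched as follows. This is an illustrative NumPy rendering: d_rand guarantees that at least one dimension is updated for each learner, and the population must have at least four learners so that j, k, l and i are distinct.

```python
import numpy as np

def adaptive_peer_learning(X, fit, rng):
    """Candidate positions from adaptive peer learning (Eqs. (8)-(10))."""
    I, D = X.shape
    pos = np.empty(I, dtype=int)
    pos[np.argsort(fit)] = np.arange(1, I + 1)   # 1 = best, I = worst
    # Eq. (8)/(9): R_i = I - pos_i, so P_i = 1 - R_i/I = pos_i/I
    P = pos / I
    X_new = X.copy()
    for i in range(I):
        peers = [a for a in range(I) if a != i]
        j, k, l = rng.choice(peers, size=3, replace=False)  # three distinct peers
        phi = rng.uniform(0.5, 1.0)              # peer learning factor
        d_rand = rng.integers(D)                 # forces one updated dimension
        for d in range(D):
            if rng.random() < P[i] or d == d_rand:
                X_new[i, d] = X[j, d] + phi * (X[k, d] - X[l, d])   # Eq. (10)
    return X_new
```

Note how the worst-ranked learner has P ≈ 1 and rewrites almost every dimension from its peers, while the best learner mostly keeps its current knowledge.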

3.3 Overall Framework of TLBO-MLPs

The overall framework of TLBO-MLPs is summarized in Fig. 4, where γ and FE_max represent a counter used to record the number of fitness evaluations (FEs) consumed and
the maximum number of FEs used as the termination criterion of TLBO-MLPs, respectively. At the beginning of optimization, all learners of TLBO-MLPs are randomly initialized and their associated fitness values are evaluated. Crucial information such as the teacher solution, the weighted mean positions, the peer learning probabilities, etc. can be determined from these fitness values. The new solution of each learner is obtained via the modified teacher phase and the modified learner phase explained in Figs. 1 and 3, respectively. The search process is repeated until the termination condition defined as γ > FE_max is satisfied. The teacher solution obtained in the final stage is returned as the best solution to the given optimization problem.

4 Performance Evaluation on Test Functions

4.1 Test Functions and Performance Metrics

Performance evaluation of TLBO-MLPs is conducted using the 30 real-parameter single-objective optimization functions introduced in CEC 2014 [48]. These test functions have different characteristics and can be categorized as unimodal functions (F1–F3), simple multimodal functions (F4–F16), hybrid functions (F17–F22) and composition functions (F23–F30). The mean fitness F_mean and standard deviation SD are adopted to measure the search performance of all compared algorithms. In particular, F_mean indicates the mean error between the fitness value of the best solution obtained by a compared algorithm and the theoretical global optimum of a function over multiple simulation runs. Meanwhile, the consistency of a compared algorithm in solving a given test function is evaluated using SD. Smaller values of F_mean and SD imply the capability of an algorithm to tackle a test function consistently with promising search accuracy.

Non-parametric statistical procedures are also employed to compare the proposed TLBO-MLPs and its peer algorithms rigorously. The Wilcoxon signed rank test [49] is first applied to compare TLBO-MLPs with each of its peers in a pairwise manner at the significance level of α = 0.05, reporting R+, R−, p and h values. The sums of ranks indicating the outperformance and underperformance of TLBO-MLPs against each compared peer are represented by R+ and R−, respectively. The p value is used to identify significant performance deviations between the compared algorithms: the better result of an algorithm is statistically significant if the p value obtained is smaller than α. Based on the p and α values, the h value is used to conclude whether TLBO-MLPs is significantly better (h = '+'), not significantly different (h = '='), or significantly worse (h = '−') than its compared peers.
For the multiple comparison of algorithms with non-parametric statistical analysis, the Friedman test is first performed to determine the average rank values of all compared algorithms [50]. If significant global differences are observed between these algorithms based on the p and α values, three post-hoc procedures, namely Bonferroni-Dunn, Holm and Hochberg, are employed to further investigate the concrete performance deviations among all algorithms by referring to their adjusted p-values [50].
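The Friedman average ranks and chi-square statistic can be computed as in the textbook sketch below (lower Fmean receives the better, i.e. smaller, rank); this is an illustrative implementation, not the authors' code.

```python
def friedman_test(results):
    """results[f][a]: Fmean of algorithm a on function f (lower is better).
    Returns the average rank per algorithm and the Friedman chi-square statistic."""
    n, k = len(results), len(results[0])
    rank_sums = [0.0] * k
    for row in results:
        order = sorted(range(k), key=lambda a: row[a])
        ranks = [0.0] * k
        i = 0
        while i < k:                    # average ranks over tied Fmean values
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            avg = (i + j) / 2 + 1
            for t in range(i, j + 1):
                ranks[order[t]] = avg
            i = j + 1
        for a in range(k):
            rank_sums[a] += ranks[a]
    avg_ranks = [s / n for s in rank_sums]
    # chi2 = 12N/(k(k+1)) * (sum of squared average ranks - k(k+1)^2/4)
    chi2 = (12 * n / (k * (k + 1))) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4)
    return avg_ranks, chi2
```

The statistic is then compared against a chi-square distribution with k − 1 degrees of freedom to obtain the global p-value.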

[Fig. 4. Pseudo-code for complete framework of TLBO-MLPs (Algorithm 4), with stated inputs and outputs, invoking Algorithm 1 and Algorithm 3 within its main loop. Figure not reproduced.]


4.2 Parameter Settings for Compared Algorithms

The search performance of TLBO-MLPs in tackling all CEC 2014 test functions is compared with that of seven well-established TLBO variants known as: conventional TLBO [20], modified TLBO (mTLBO) [42], differential learning TLBO (DLTLBO) [44], nonlinear inertia weighted TLBO (NIWTLBO) [32], TLBO with learning experiences of other learners (LETLBO) [43], generalized oppositional TLBO (GOTLBO) [47] and TLBO with learning enthusiasm mechanism (LebTLBO) [46]. The parameter settings of all compared algorithms are presented in Table 1. To ensure a fair comparison, all TLBO variants are simulated independently for 30 runs on all CEC 2014 test functions at D = 30, using a maximum number of fitness evaluations of γmax = 10000 × D. All simulations are performed using Matlab 2019a on a workstation equipped with an Intel® Core i7-7500 CPU @ 2.0 GHz.

Table 1. Parameter settings of all eight selected TLBO variants.

Algorithm  | Parameter settings
TLBO       | Population size I = 50
mTLBO      | I = 50
DLTLBO     | I = 50, scale factor F = 0.5, crossover rate CR = 0.9, neighborhood size ns = 3
NIWTLBO    | I = 50, inertia weight ω = 0 ∼ 1.0
LETLBO     | I = 50
GOTLBO     | I = 50, jumping rate Jr = 0.3
LebTLBO    | I = 50, maximum learning enthusiasm LEmax = 1.0, minimum learning enthusiasm LEmin = 0.3, F = 0.9
TLBO-MLPs  | I = 50, peer learning factor φn ∈ [0.5, 1]

4.3 Comparison of Search Performance for All Algorithms

The Fmean and SD values produced by the proposed TLBO-MLPs and the seven other peer algorithms in tackling all of the CEC 2014 functions are provided in Table 2. The best and second-best results are shown in boldface and underlined text, respectively. Table 2 also summarizes the performance analysis between TLBO-MLPs and its peers in terms of w/t/l and #BR. In particular, w/t/l shows that TLBO-MLPs performs better on w functions, similarly on t functions and worse on l functions. #BR is the number of test functions for which each algorithm produces the best Fmean result. For the unimodal functions F1 to F3, it is observed that the proposed TLBO-MLPs produces two best Fmean values, locating the global optima of functions F2 and F3. The performance of LebTLBO in tackling the unimodal functions is also promising, as it produces the best and second-best results on functions F1 and F2, respectively. In contrast, the search performance of mTLBO in solving the unimodal functions is inferior

Table 2. Performance comparison between TLBO-MLPs with seven peer algorithms in 30 CEC 2014 test functions. (The per-function Fmean and SD values for F1–F30 are not reproduced here; the recoverable summary rows are listed below.)

Algorithm  | w/t/l (TLBO-MLPs vs. peer) | #BR
TLBO       | 24/3/3                     | 2
mTLBO      | 27/1/2                     | 1
DLTLBO     | 24/1/5                     | 2
NIWTLBO    | 24/2/4                     | 3
LETLBO     | 25/2/3                     | 1
GOTLBO     | 21/2/7                     | 5
LebTLBO    | 23/2/5                     | 5
TLBO-MLPs  | –                          | 21


because it produces the two worst Fmean results on functions F1 and F2.

For the simple multimodal functions F4 to F16, TLBO-MLPs demonstrates the most dominant search accuracy in tackling these 13 test functions, producing ten best Fmean values (i.e., functions F4 to F8, F10, F12 to F14 and F16) and one second-best Fmean value (i.e., function F15). The search performances of DLTLBO and LebTLBO in solving the multimodal functions are promising as well. In particular, DLTLBO produces the best Fmean value on function F11 and the second-best Fmean values on functions F6, F8, F9, F10, F14 and F16. Meanwhile, LebTLBO demonstrates good performance by solving functions F9 and F15 with the best search accuracy and functions F4, F7 and F13 with the second-best search accuracy.

For the hybrid functions F17 to F22, the proposed TLBO-MLPs successfully solves these six functions with five best Fmean values (i.e., functions F17 to F19, F21 and F22) and one second-best Fmean value (i.e., function F20). DLTLBO is observed to be the second most competitive algorithm in handling the hybrid functions, solving function F20 with the best search accuracy and functions F17, F18, F21 and F22 with the second-best Fmean values. Although LebTLBO solves the unimodal and simple multimodal functions with relatively good results, some performance degradation is observed when it tackles the more complex hybrid functions, where it achieves only the second-best Fmean value on function F19.

For the composition functions F23 to F30, the search accuracies demonstrated by both TLBO-MLPs and GOTLBO in addressing these eight challenging problems are comparable. In particular, the proposed TLBO-MLPs successfully solves these composition functions with four best Fmean values (i.e., functions F24, F25, F29 and F30) and three second-best Fmean values (i.e., functions F23, F27 and F28).
GOTLBO is reported to produce five best Fmean values, for functions F23 to F25, F27 and F28. Other compared algorithms that did not perform well in solving the unimodal and multimodal functions, such as TLBO, mTLBO, NIWTLBO and LETLBO, are also reported to produce at least one best or second-best result in solving the tested composition functions.

In summary, the proposed TLBO-MLPs has the best optimization performance in tackling the 30 CEC 2014 test functions, producing a total of 21 best Fmean values and five second-best Fmean values. Notably, the proposed TLBO-MLPs is the only algorithm that locates the global optima of functions F2, F3, F6 and F7. The modified teacher and learner phases incorporated into the proposed TLBO-MLPs give it competitive robustness in handling optimization problems with various types of fitness landscapes (i.e., unimodal, multimodal, hybrid and composition functions) with excellent search accuracy. In contrast, the other compared methods can only solve certain types of problems competitively. For example, the search performance of LebTLBO in solving the unimodal, multimodal and composition functions is promising, but its search accuracy in handling the hybrid functions is relatively poor. DLTLBO is able to solve both the multimodal and hybrid functions with good accuracy, but it has relatively inferior performance in handling the unimodal and composition functions.

The Wilcoxon signed rank test is also used to compare TLBO-MLPs with each of the selected peers in a pairwise manner [49]. The non-parametric statistical analysis results of the R+, R−, p and h values are reported in Table 3. Notable enhancement of TLBO-MLPs


over TLBO, mTLBO, DLTLBO, NIWTLBO, LETLBO and LebTLBO is observed in Table 3, as shown by the value of h = ‘+’ at the significance level of α = 0.05. While no significant difference is observed between TLBO-MLPs and GOTLBO in Table 3, as indicated by the h-value of ‘=’, the Wilcoxon signed rank test confirms that TLBO-MLPs significantly outperforms GOTLBO if α = 0.10.

Table 3. Wilcoxon signed rank test for the pairwise comparison between TLBO-MLPs with peer algorithms.

TLBO-MLPs vs. | R+    | R−    | p-value  | h-value
TLBO          | 376.5 | 58.5  | 5.63E−04 | +
mTLBO         | 407.0 | 28.0  | 4.00E−05 | +
DLTLBO        | 337.0 | 98.0  | 9.47E−03 | +
NIWTLBO       | 394.5 | 70.5  | 8.31E−04 | +
LETLBO        | 405.5 | 59.5  | 5.59E−04 | +
GOTLBO        | 326.5 | 138.5 | 5.19E−02 | =
LebTLBO       | 362.5 | 102.5 | 7.27E−03 | +

Apart from the pairwise comparison, the optimization performance of TLBO-MLPs is also evaluated using the Friedman test for multiple comparison analysis [50]. Table 4 reports that TLBO-MLPs and all the peer algorithms are ranked according to their corresponding Fmean values as follows: TLBO-MLPs, LebTLBO, DLTLBO, GOTLBO, TLBO, LETLBO, NIWTLBO and mTLBO. The p-value obtained from the chi-square statistic of the Friedman test is reported to be 0.00E+00, which is smaller than α = 0.05. This analysis result shows that a notable difference exists among all algorithms from the global perspective. Based on the Friedman test results, a set of post-hoc statistical analyses known as the Bonferroni-Dunn, Holm and Hochberg procedures is further performed to investigate the significant differences with respect to the proposed TLBO-MLPs [50]. Table 5 reports the adjusted p-values (APVs) associated with the three aforementioned post-hoc procedures. Referring to the threshold significance level of α = 0.05, all post-hoc procedures verify the substantial performance advantage demonstrated by TLBO-MLPs over mTLBO, NIWTLBO, LETLBO, TLBO and GOTLBO. No significant differences are reported between TLBO-MLPs and DLTLBO or LebTLBO through these three post-hoc analysis procedures.
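The three APVs reported in Table 5 can be derived from the unadjusted p-values of m comparisons against a control method, following the standard Bonferroni-Dunn, Holm and Hochberg formulas described in [50]; the function and variable names below are illustrative, not the authors' code.

```python
def adjusted_p_values(p_unadjusted):
    """APVs for m comparisons against a control (Bonferroni-Dunn, Holm, Hochberg).
    Input: dict {algorithm: unadjusted p}; output: {algorithm: (bonf, holm, hoch)}."""
    items = sorted(p_unadjusted.items(), key=lambda kv: kv[1])  # ascending p
    m = len(items)
    # Bonferroni-Dunn: multiply every p by the number of comparisons
    bonf = [min(1.0, m * p) for _, p in items]
    # Holm: running maximum of (m - i) * p over increasing p
    holm, running = [], 0.0
    for i, (_, p) in enumerate(items):
        running = max(running, (m - i) * p)
        holm.append(min(1.0, running))
    # Hochberg: running minimum of (m - i) * p, scanning from the largest p down
    hoch, running = [0.0] * m, 1.0
    for i in range(m - 1, -1, -1):
        running = min(running, (m - i) * items[i][1])
        hoch[i] = running
    return {name: (bonf[i], holm[i], hoch[i]) for i, (name, _) in enumerate(items)}
```

Holm APVs are never smaller than the corresponding Hochberg APVs, and Bonferroni-Dunn is the most conservative of the three, which matches the ordering visible in Table 5.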


Table 4. Average ranking and the p-value obtained from the Friedman test.

Algorithm  | Ranking
TLBO-MLPs  | 2.17
TLBO       | 5.17
mTLBO      | 6.85
DLTLBO     | 3.33
NIWTLBO    | 5.73
LETLBO     | 5.40
GOTLBO     | 4.20
LebTLBO    | 3.15

Chi-square statistic: 8.51E+01; p-value: 0.00E+00

Table 5. APVs for Bonferroni-Dunn, Holm and Hochberg procedures.

TLBO-MLPs vs. | Bonferroni-Dunn p | Holm p   | Hochberg p
mTLBO         | 0.00E+00          | 0.00E+00 | 0.00E+00
NIWTLBO       | 0.00E+00          | 0.00E+00 | 0.00E+00
LETLBO        | 2.00E−06          | 2.00E−06 | 2.00E−06
TLBO          | 1.50E−05          | 8.00E−06 | 8.00E−06
GOTLBO        | 9.13E−03          | 3.91E−03 | 3.91E−03
DLTLBO        | 4.56E−01          | 1.30E−01 | 1.20E−01
LebTLBO       | 8.40E−01          | 1.30E−01 | 1.20E−01

5 Conclusions

A new TLBO variant known as teaching-learning-based optimization with modified learning phases (TLBO-MLPs) is proposed in this paper, aiming to enhance its robustness in handling optimization problems with different types of characteristics. Substantial efforts are made to refine the framework of TLBO-MLPs so that the adopted teaching and learning mechanisms represent the real-world classroom paradigm more accurately, hence improving search performance. A modified teacher phase is first designed in TLBO-MLPs to preserve the population diversity level by enabling the learners to have different perceptions of the mainstream knowledge used to update their knowledge. Meanwhile, the modified learner phase incorporated into TLBO-MLPs aims to improve its convergence characteristics by allowing each learner to interact with different peers in different subjects based on the adaptive peer learning probability.


A self-learning mechanism is also introduced to allow certain learners to update their knowledge through personal effort rather than via peer interactions. Comprehensive analyses were conducted to assess the performance of TLBO-MLPs in tackling the CEC 2014 test functions. Rigorous comparisons between TLBO-MLPs and seven other TLBO variants were verified through non-parametric statistical analysis procedures. In future studies, an extensive theoretical framework can be formulated to investigate the convergence characteristics of TLBO-MLPs. It is also worth studying the applicability of the proposed TLBO-MLPs to real-world optimization problems such as material machining optimization and robust controller design optimization. Finally, the feasibility of TLBO-MLPs in tackling more challenging optimization problems with multimodal, multi-objective, constrained and dynamic characteristics is another promising future research direction.

References

1. Whitley, D., Sutton, A.M.: Genetic algorithms — a survey of models and methods. In: Rozenberg, G., Bäck, T., Kok, J.N. (eds.) Handbook of Natural Computing, pp. 637–671. Springer, Heidelberg (2012)
2. Kramer, O.: Evolutionary self-adaptation: a survey of operators and strategy parameters. Evol. Intell. 3(2), 51–65 (2010)
3. Burke, E., Gustafson, S., Kendall, G.: A survey and analysis of diversity measures in genetic programming. In: Proceedings of the 4th Annual Conference on Genetic and Evolutionary Computation, New York (2002)
4. Das, S., Suganthan, P.N.: Differential evolution: a survey of the state-of-the-art. IEEE Trans. Evol. Comput. 15(1), 4–31 (2011)
5. Valle, Y.D., Venayagamoorthy, G.K., Mohagheghi, S., Hernandez, J., Harley, R.G.: Particle swarm optimization: basic concepts, variants and applications in power systems. IEEE Trans. Evol. Comput. 12(2), 171–195 (2008)
6. Dorigo, M., Blum, C.: Ant colony optimization theory: a survey. Theor. Comput. Sci. 344(2–3), 243–278 (2005)
7. Karaboga, D., Gorkemli, B., Ozturk, C., Karaboga, N.: A comprehensive survey: artificial bee colony (ABC) algorithm and applications. Artif. Intell. Rev. 42(1), 21–57 (2014)
8. Faris, H., Aljarah, I., Al-Betar, M.A., Mirjalili, S.: Grey wolf optimizer: a review of recent variants and applications. Neural Comput. Appl. 30(2), 413–435 (2018)
9. Ang, C.K., Tang, S.H., Mashohor, S., Ariffin, M.K.A.M., Khaksar, W.: Solving continuous trajectory and forward kinematics simultaneously based on ANN. Int. J. Comput. Commun. Control 9(3), 253–260 (2014)
10. Alrifaey, M., Tang, S.H., Supeni, E.E., As'arry, A., Ang, C.K.: Identification and prioritization of risk factors in an electrical generator based on the hybrid FMEA framework. Energies 12(4), 649 (2019)
11. Lim, W.H., Isa, N.A.M.: Particle swarm optimization with dual-level task allocation. Eng. Appl. Artif. Intell. 38, 88–110 (2015)
12. Yao, L., Shen, J.Y., Lim, W.H.: Real-time energy management optimization for smart household. In: 2016 IEEE International Conference on Internet of Things (iThings), Chengdu, China, pp. 20–26 (2016)
13. Yao, L., Damiran, Z., Lim, W.H.: Energy management optimization scheme for smart home considering different types of appliances. In: 2017 IEEE International Conference on Environment and Electrical Engineering and 2017 IEEE Industrial and Commercial Power Systems Europe (EEEIC/I&CPS Europe), Milan, Italy, pp. 1–6 (2017)
14. Solihin, M.I., Akmeliawati, R., Muhida, R., Legowo, A.: Guaranteed robust state feedback controller via constrained optimization using differential evolution. In: 6th International Colloquium on Signal Processing & its Applications, pp. 1–6 (2010)
15. Solihin, M.I., Wahyudi, Akmeliawati, R.: PSO-based optimization of state feedback tracking controller for a flexible link manipulator. In: International Conference of Soft Computing and Pattern Recognition, pp. 72–76 (2009)
16. Lim, W.H., Isa, N.A.M., Tiang, S.S., Tan, T.H., Natarajan, E., Wong, C.H., Tang, J.R.: Self-adaptive topologically connected-based particle swarm optimization. IEEE Access 6, 65347–65366 (2018)
17. Sathiyamoorthy, V., Sekar, T., Natarajan, E.: Optimization of processing parameters in ECM of die tool steel using nanofluid by multiobjective genetic algorithm. Sci. World J. 2015, 6 (2015)
18. Yao, L., Lim, W.H., Tiang, S.S., Tan, T.H., Wong, C.H., Pang, J.Y.: Demand bidding optimization for an aggregator with a genetic algorithm. Energies 11(10), 2498 (2018)
19. Yao, L., Damiran, Z., Lim, W.H.: A fuzzy logic based charging scheme for electric vehicle parking station. In: 2016 IEEE 16th International Conference on Environment and Electrical Engineering, Florence, Italy (2016)
20. Rao, R.V., Savsani, V.J., Vakharia, D.P.: Teaching–learning-based optimization: a novel method for constrained mechanical design optimization problems. Comput. Aided Des. 43(3), 303–315 (2011)
21. Zou, F., Chen, D., Xu, Q.: A survey of teaching–learning-based optimization. Neurocomputing 335, 366–383 (2019)
22. Natarajan, E., Kaviarasan, V., Lim, W.H., Tiang, S.S., Parasuraman, S., Elango, S.: Non-dominated sorting modified teaching-learning-based optimization for multi-objective machining of polytetrafluoroethylene (PTFE). J. Intell. Manuf. 31, 911–935 (2020)
23. Rao, R.V., Waghmare, G.G.: Multi-objective design optimization of a plate-fin heat sink using a teaching-learning-based optimization algorithm. Appl. Therm. Eng. 76, 521–529 (2015)
24. Natarajan, E., Kaviarasan, V., Lim, W.H., Tiang, S.S., Tan, T.H.: Enhanced multi-objective teaching-learning-based optimization for machining of Delrin. IEEE Access 6, 51528–51546 (2018)
25. Yu, K., Wang, X., Wang, Z.: Constrained optimization based on improved teaching–learning-based optimization algorithm. Inf. Sci. 352–353, 61–78 (2016)
26. Savsani, V.J., Tejani, G.G., Patel, V.K.: Truss topology optimization with static and dynamic constraints using modified subpopulation teaching–learning-based optimization. Eng. Optim. 48(11), 1990–2006 (2016)
27. Zheng, H., Wang, L., Zheng, X.: Teaching–learning-based optimization algorithm for multi-skill resource constrained project scheduling problem. Soft. Comput. 21(6), 1537–1548 (2017)
28. Akhtar, J., Koshul, B., Awais, M.: A framework for evolutionary algorithms based on Charles Sanders Peirce's evolutionary semiotics. Inf. Sci. 236, 93–108 (2013)
29. Satapathy, S.C., Naik, A., Parvathi, K.: Weighted teaching-learning-based optimization for global function optimization. Appl. Math. 4(3), 429–439 (2013)
30. Cao, J., Luo, J.: A study on SVM based on the weighted elitist teaching-learning-based optimization and application in the fault diagnosis of chemical process. MATEC Web Conf. 22, 05016 (2015)
31. Li, G., Niu, P., Zhang, W., Liu, Y.: Model NOx emissions by least squares support vector machine with tuning based on ameliorated teaching–learning-based optimization. Chemom. Intell. Lab. Syst. 126, 11–20 (2013)
32. Wu, Z.-S., Fu, W.-P., Xue, R.: Nonlinear inertia weighted teaching-learning-based optimization for solving global optimization problem. Comput. Intell. Neurosci. 2015(292576), 15 (2015)
33. Chen, D., Lu, R., Zou, F., Li, S.: Teaching-learning-based optimization with variable-population scheme and its application for ANN and global optimization. Neurocomputing 173, 1096–1111 (2016)
34. Wang, L., Zou, F., Hei, X., Yang, D., Chen, D., Jiang, Q.: An improved teaching–learning-based optimization with neighborhood search for applications of ANN. Neurocomputing 143, 231–247 (2014)
35. Chen, D., Zou, F., Li, Z., Wang, J., Li, S.: An improved teaching–learning-based optimization algorithm for solving global optimization problem. Inf. Sci. 297, 171–190 (2015)
36. Zou, F., Wang, L., Hei, X., Chen, D., Yang, D.: Teaching–learning-based optimization with dynamic group strategy for global optimization. Inf. Sci. 273, 112–131 (2014)
37. Zhai, Z., Li, S., Liu, Y., Li, Z.: Teaching-learning-based optimization with a fuzzy grouping learning strategy for global numerical optimization. J. Intell. Fuzzy Syst. 29(6), 2345–2356 (2015)
38. Reddy, S.S.: Clustered adaptive teaching–learning-based optimization algorithm for solving the optimal generation scheduling problem. Electr. Eng. 100(1), 333–346 (2018)
39. Li, M., Ma, H., Gu, B.: Improved teaching–learning-based optimization algorithm with group learning. J. Intell. Fuzzy Syst. 31(4), 2101–2108 (2016)
40. Zou, F., Chen, D., Lu, R., Wang, P.: Hierarchical multi-swarm cooperative teaching–learning-based optimization for global optimization. Soft. Comput. 21(23), 6983–7004 (2017)
41. Shao, W., Pi, D., Shao, Z.: An extended teaching-learning based optimization algorithm for solving no-wait flow shop scheduling problem. Appl. Soft Comput. 61, 193–210 (2017)
42. Satapathy, S.C., Naik, A.: Modified teaching–learning-based optimization algorithm for global numerical optimization—a comparative study. Swarm Evol. Comput. 16, 28–37 (2014)
43. Zou, F., Wang, L., Hei, X., Chen, D.: Teaching-learning-based optimization with learning experience of other learners and its application. Appl. Soft Comput. 37, 725–736 (2015)
44. Zou, F., Wang, L., Chen, D., Hei, X.: An improved teaching-learning-based optimization with differential learning and its application. Math. Probl. Eng. 2015(754562), 19 (2015)
45. Wang, L., et al.: A hybridization of teaching–learning-based optimization and differential evolution for chaotic time series prediction. Neural Comput. Appl. 25(6), 1407–1422 (2014)
46. Chen, X., Xu, B., Yu, K., Du, W.: Teaching-learning-based optimization with learning enthusiasm mechanism and its application in chemical engineering. J. Appl. Math. 2018(1806947), 19 (2018)
47. Chen, X., Yu, K., Du, W., Zhao, W., Liu, G.: Parameters identification of solar cell models using generalized oppositional teaching learning based optimization. Energy 99, 170–180 (2016)
48. Liang, J.J., Qu, B.Y., Suganthan, P.N.: Problem definitions and evaluation criteria for the CEC 2014 special session and competition on single objective real-parameter numerical optimization. Computational Intelligence Laboratory, Zhengzhou University, Zhengzhou, China (2013)
49. García, S., Molina, D., Lozano, M., Herrera, F.: A study on the use of non-parametric tests for analyzing the evolutionary algorithms' behaviour: a case study on the CEC'2005 special session on real parameter optimization. J. Heuristics 15(6), 617 (2008)
50. Derrac, J., García, S., Molina, D., Herrera, F.: A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm Evol. Comput. 1(1), 3–18 (2011)

Use of Artificial Intelligence and Machine Learning for Personalization Improvement in Developed e-Material Formatting Application

Kristine Mackare1, Anita Jansone1, and Raivo Mackars2
1 Faculty of Science and Engineering, Liepaja University, Liepaja, Latvia
[email protected]
2 IK, Liepaja, Latvia

Abstract. Although screen technology is used for most daily tasks, including everyday work and education, a significant proportion of users and learners have problems reading for extended periods without complaints. The developed e-material formatting application works on the basis of the developed methodology for e-material formatting, which improves text perception from the screen. The methodology is limited and includes several variables for general formatting improvement and primary personalization. More in-depth and more specific personalization requires a broader range of variables, which leads to an enormous number of potential configurations in total. This is not something the human mind can handle while generating appropriate methodologies. The use of Artificial Intelligence and Machine Learning can improve personalization, as they can process huge amounts of data and analyze algorithms faster to find a solution and reach the goal. Methodology: use case analysis of the e-material formatting application. Results: the paper includes a short description of the current situation and the developed application; a description of the existing application's limitations and challenges; an analysis of Artificial Intelligence and Machine Learning use in this case; an analysis of the data necessary for Machine Learning development; a description of the information types Artificial Intelligence generates after the learning process; and a description of the Artificial Intelligence process for e-material formatting personalization. Conclusions: the use of Artificial Intelligence and Machine Learning is reasonable for personalization improvement in the developed e-material formatting application.

Keywords: Artificial Intelligence · E-material formatting application · Machine learning

1 Introduction

Smart technologies with screens are in daily use for accomplishing humans' everyday tasks. This includes personal and work tasks as well as education and acquiring knowledge [1]. Despite the accessibility of a wide range of different visual, graphical and audio solutions and materials, textual information is still the most popular type of e-material for everyday use [2].

© Springer Nature Switzerland AG 2020
K. Arai et al. (Eds.): SAI 2020, AISC 1229, pp. 125–132, 2020. https://doi.org/10.1007/978-3-030-52246-9_9


Reading from screens differs from reading from paper [3]. People need to adapt and use a new reading model [4], but this is a slow process, and the human evolution of visual perception [5] is not as fast as the development of technology [6]. A significant proportion of users have complaints after screen reading [7]. This can cause various problems, including health problems [8], that affect people's quality of life. The human visual system needs help; there is a need for a solution. A possible solution is an e-material formatting application. Appropriate formatting of e-materials should be applied for natural and comfortable perception of the text and its content. Typographical aspects of the text closely interact with visual processes. They should also help the learning process and facilitate memorization.

2 Developed Application

2.1 Methodologies

The developed methodology is the result of several years of research aimed at developing recommendations for e-material formatting based on vision science, user behavior, and user needs and preferences. For the methodology development, several previous tasks were completed:

• A broad literature review of the currently offered and available recommendations, guidelines and methodologies for e-materials was carried out, and it revealed ambiguity in the suggestions.
• Statistical research on digital device and internet use in the population, as well as involvement in educational activities worldwide, in Europe and in Latvia, was carried out.
• Research on users' needs, preferences and habits was conducted.
• Users' complaints and vision problems related to near work and screen work were reviewed.
• Patient record analysis from practice showed vision conditions, the most frequent symptoms and complaints after screen work, as well as refraction changes and other findings in ocular health.

An extended presentation of all collected and analyzed data can be found in the previous publications [9–12]. In previous work [9, 13] it was proposed that recommendations for five main parameters (font type, font size, line spacing, text color and background color) be used as the basic parameters for e-material formatting. The target audience was primarily divided into nine age groups: pre-school children 3–5 and grade-schoolers 6–11, teens 12–15, youth 16–25, young adults 26–35, adults 36–39, middle-aged adults 40–55, senior adults 55+, and the elderly 65+ [14]. However, the methodology is limited: it includes several variables for general formatting improvement and primary personalization. Because it builds on this methodology, the application's formatting personalization possibilities are also limited.
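The five basic parameters and nine age groups can be represented as a simple lookup structure, sketched below in Python. The concrete parameter values and the helper function `recommend` are illustrative placeholders only, not the recommendations published in [9, 13].

```python
# Illustrative sketch: the parameter values below are placeholders, not the
# recommendations derived in the cited studies.
FORMAT_PARAMS = {
    # group: (font type, font size in pt, line spacing, text color, background)
    "pre-school (3-5)":      ("sans-serif", 18, 1.8, "#000000", "#FFFFF0"),
    "grade-schooler (6-11)": ("sans-serif", 16, 1.6, "#000000", "#FFFFF0"),
    "teens (12-15)":         ("sans-serif", 14, 1.5, "#000000", "#FFFFFF"),
    "youth (16-25)":         ("sans-serif", 12, 1.5, "#000000", "#FFFFFF"),
    "young adults (26-35)":  ("sans-serif", 12, 1.4, "#000000", "#FFFFFF"),
    "adults (36-39)":        ("sans-serif", 12, 1.4, "#000000", "#FFFFFF"),
    "middle-aged (40-55)":   ("sans-serif", 14, 1.5, "#000000", "#FFFFFF"),
    "senior adults (55+)":   ("sans-serif", 16, 1.6, "#000000", "#FFFFF0"),
    "elderly (65+)":         ("sans-serif", 18, 1.8, "#000000", "#FFFFF0"),
}

def recommend(age):
    """Map a user's age to the basic formatting parameters of their age group."""
    bounds = [(3, 5, "pre-school (3-5)"), (6, 11, "grade-schooler (6-11)"),
              (12, 15, "teens (12-15)"), (16, 25, "youth (16-25)"),
              (26, 35, "young adults (26-35)"), (36, 39, "adults (36-39)"),
              (40, 55, "middle-aged (40-55)"), (56, 64, "senior adults (55+)")]
    for lo, hi, name in bounds:
        if lo <= age <= hi:
            return FORMAT_PARAMS[name]
    return FORMAT_PARAMS["elderly (65+)"]
```

In such a scheme, deeper personalization means replacing the static lookup with values learned per user, which is exactly where the configuration space grows beyond hand-crafted rules.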

Use of Artificial Intelligence and Machine Learning for Personalization Improvement

127

2.2 The Application

Concept of the Application. A concept for the application prototype was created. It is based on the developed recommendations for e-material formatting and on users' individual factors and needs. The concept covers the idea of the tool itself (what it must do and how), what information the database must contain, and a vision of the design. It must give the programmer a clear understanding of what is to be built. The app must work for both e-material creators and e-material users. The main idea of the concept is represented by three main components of the app: user, interface, and database. The scheme also represents the collaboration processes between these components and the main idea of the app: text formatting of documents based on user groups. A more detailed description of the concept has been published previously [15–17].

The First Prototype of the Application. The first prototype gives an overview of how the application works: the implemented working schemes, the relationships between the main components, and the collaboration process with the user, the material and the database. To be able to give formatting recommendations, the application collects the necessary data from the user. A solution is then found step by step through a tree scheme of user answers and corresponding application responses, which yields a recommendation of text formatting and applies it. After the user has tried the new formatting of the e-material, the application provides a short feedback questionnaire. The application can be described from four perspectives: developers, e-material formatting users as readers, e-material creators, and researchers. The application prototype is developed on a Moodle-type platform but can be transformed and adapted to a different environment and a wider range of use.
As the application gives researchers access to the database, it supports user-habit research and allows the application to be kept up to date so that the application's learning process can be achieved. That is an important part of today's user-centric design for user satisfaction. A more detailed description of the application prototype has been published previously [15, 18].

The Second Prototype of the Application. After developing the first version of the prototype, the authors concluded not only that such an application is feasible but also that there is room for improvement [18]. As the app will be used in an e-study environment, the second prototype is written in PHP 7.3, a server-side language used for interaction between the browser and the server, which makes it a good option for this app. PHP 7.3 functions allow the XML code to be overwritten, thus changing and modifying the e-materials and adapting them to learner needs. Secondly, the improved version of the prototype focuses on being accessible and workable with the most popular e-material formats, including PDF. In theory, the app could be developed as a web-browser add-on that can modify the base code of any loaded webpage; it would then be able to edit the HTML format by modifying its CSS parameters as well. This will be added in the final version of the app. The third significant upgrade is that the app is built so that it can be used with any e-material system, not only Moodle. Moreover, as e-learning is becoming more and more popular on the web, beyond traditional schools and universities, the authors

128

K. Mackare et al.

concluded that the app must be able to format web materials that are viewable only in a browser, and therefore edit HTML and CSS. Since it is planned to implement machine learning (artificial intelligence) to suggest the best parameters, such as fonts, sizes and colors, for each person, and since this initially needs a decent amount of data on users' preferences, it is important to collect as much data as possible during alpha tests. The second prototype can be used for this. It was therefore decided that, on requesting a file, the user will see two file previews on screen: the original file on one side and the prototype's modified version on the other, with the option to download either version. If the user chooses to download the modified version, this indicates interest in the personalized file seen in the preview, and a short survey can be given the next time the user receives or downloads another file in the system, asking whether the personalized version was liked and whether anything should be changed. Alternatively, if the user chooses the original file, an instant survey is given to find out whether the user was not interested in a personalized file at all or simply did not like the personalized option that was offered.
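The paper's second prototype uses PHP 7.3 to rewrite the material's presentation code. Purely as an illustration of the underlying idea (applying the five basic formatting parameters to an HTML e-material by injecting CSS), here is a hedged Python sketch; the parameter names and helper functions are assumptions, not the prototype's actual API:

```python
# Illustrative sketch only: apply the five basic formatting parameters
# (font type, font size, line spacing, text color, background color)
# to an HTML e-material by injecting a CSS rule before </head>.

def format_css(params):
    """Build a CSS rule from the five formatting parameters."""
    return (
        "body {"
        f" font-family: {params['font_type']};"
        f" font-size: {params['font_size']}pt;"
        f" line-height: {params['line_spacing']};"
        f" color: {params['text_color']};"
        f" background-color: {params['background_color']};"
        " }"
    )

def personalize_html(html, params):
    """Inject the generated rule into the material, mimicking the idea
    of overwriting the presentation code of a loaded page."""
    style = f"<style>{format_css(params)}</style>"
    return html.replace("</head>", style + "</head>", 1)

params = {"font_type": "Verdana", "font_size": 14, "line_spacing": 1.5,
          "text_color": "#222222", "background_color": "#FFFFF0"}
doc = "<html><head><title>Material</title></head><body>Text</body></html>"
print(personalize_html(doc, params))
```

The same transformation could equally be performed server-side (as in the PHP prototype) or client-side in a browser add-on, as the paper envisions for the final version.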

3 Intelligence and Machine Learning Use in Improvement of e-Material Formatting Application

3.1 Benefits of Using Artificial Intelligence and Machine Learning in This Case

Machine learning involves feeding complex algorithms, designed to carry out data-processing tasks in a way similar to the human brain, with huge amounts of data. The result is computer systems which become capable of learning [19]. Why could the machine learning process make an automated formatting app so powerful as a tool for e-material personalization? Consider the numbers:
• There are 5 basic parameters (font style, font size, line spacing, text color and background color) used as the most important parameters to format text appropriately.
• Each parameter can take 3 recommended values (described in the recommendations of the e-material formatting methodologies) plus at least 3 additional possible values (not in the general recommendations, as they are more specific to individual users). That gives 3^5 = 243 general combinations for formatting improvement, and 6^5 = 7776 total possible combinations for formatting personalization.

These combinations should be chosen on the basis of a huge number of user variables (about 34381):
• users' general features such as age and gender (about 81 combinations, even when looking only at nine complex age groups rather than each age),
• and a variety of individual characteristics such as vision problems, disabilities, limitations, reading or learning disorders, as well as possible cultural and professional aspects (at least 343 combinations).
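Recomputing the combination counts from the stated figures (5 parameters, each with 3 recommended values plus 3 additional ones) is a one-line sanity check:

```python
# Sanity check of the combination counts stated above.
parameters = 5           # font style, size, line spacing, text and background color
recommended_values = 3   # values per parameter in the general recommendations
additional_values = 3    # extra values for more specific individual needs

general = recommended_values ** parameters
total = (recommended_values + additional_values) ** parameters
print(general, total)  # 243 7776
```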


That is an enormous number of potential configurations in total: not something the human mind can operate on and optimize. Moreover, this is without taking into consideration specific individual features and personal preferences that can come up only after an intensive learning process. So, with AI and machine learning, there is an opportunity to go a step further and create an e-material formatting tool with a more personalized and individual approach, as it allows as many variables as necessary to be taken into consideration for the best solution, and opens up the possibility of using other features that become visible only while the tool is being used and during its learning process. The goal is to build a product that completely automates the e-material formatting functions for the individual user.

3.2 Necessary Data for Machine Learning Development

For a successful machine learning process in the application, a huge amount of data is needed to learn from. The necessary data for machine learning are:
• the previously developed methodologies and recommendations,
• the list of parameters for formatting,
• the list of variations of parameter values,
• the list of possible features that are involved in, or potentially affect, user preferences or satisfaction with formatting,
• users' general information, such as age and gender,
• users' specific personal information, such as the existence or absence of:
  – complaints during or after screen reading,
  – vision problems or ocular health problems,
  – disabilities,
  – specific limitations,
  – reading disorders,
  – learning disorders,
• information about users' possible cultural and professional aspects,
• information about users' past satisfaction or dissatisfaction with various types of formatting,
• information about users' manual changes,
• feedback from users.
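A hedged sketch of how one such training record could be structured in code; the field names are illustrative assumptions, not the application's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class UserRecord:
    # General information
    age_group: str              # one of the nine age groups, e.g. "26-35"
    gender: str
    # Specific personal information (existence or absence)
    complaints: bool = False
    vision_problems: bool = False
    disabilities: bool = False
    limitations: bool = False
    reading_disorders: bool = False
    learning_disorders: bool = False
    # Feedback collected while the tool is used
    tried_formattings: list = field(default_factory=list)  # (params, satisfied) pairs
    manual_changes: list = field(default_factory=list)

u = UserRecord(age_group="26-35", gender="f", vision_problems=True)
u.tried_formattings.append(({"font_size": 14}, True))
print(u.age_group, len(u.tried_formattings))
```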


3.3 Information Types Artificial Intelligence Generates After the Learning Process

The types of information the AI builds on include:
• information from a user's personally saved application settings or set of features,
• information coupling that data with users' past satisfaction or dissatisfaction with various types of formatting,
• statistics gathered from previous uses, users' manual changes and successful matches,
• the preparation of the individual e-material formatting options.

3.4 Description of the Artificial Intelligence Process for e-Material Formatting Personalization

A user can use the Moodle platform or another educational website and search for the necessary materials as they always would, but for personalization they need to choose the required information or set of features and upload it in the system. All this data is saved, and the app can immediately give general and basic personalized recommendations. From that point on, the AI gathers and syncs all data, incorporating and building on other data points to improve the formatting matching and deliver more appropriate and more individual e-material formatting options to the user. The real power behind this technology, however, is how the AI interacts with the user's information and prepares the individual e-material formatting options. The product allows the user to enjoy a good-quality e-material with increased comfort in screen reading, quickly and with much less human interaction.
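The recommend-then-refine loop described above can be sketched in a few lines. This is a heavily hedged illustration, not the application's algorithm: the group defaults and the update rule (keep whatever the user confirmed liking) are invented for the example.

```python
# Hedged sketch of the recommend-then-refine loop: start from an
# age-group default, then apply preferences learned from feedback.
# Defaults and update rule are illustrative assumptions.

GROUP_DEFAULTS = {
    "6-11":  {"font_size": 14, "line_spacing": 1.5},
    "26-35": {"font_size": 12, "line_spacing": 1.3},
    "65+":   {"font_size": 16, "line_spacing": 1.6},
}

def recommend(profile, history):
    """Age-group default, overridden by the user's accepted formattings."""
    params = dict(GROUP_DEFAULTS.get(profile["age_group"],
                                     {"font_size": 12, "line_spacing": 1.3}))
    for chosen, satisfied in history:
        if satisfied:          # keep what the user confirmed liking
            params.update(chosen)
    return params

history = [({"font_size": 15}, True), ({"line_spacing": 1.1}, False)]
print(recommend({"age_group": "26-35"}, history))
```

As more feedback accumulates, such a loop shifts weight from the group-level defaults toward the individual's own confirmed choices, which is the personalization effect the section describes.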

4 Results

The first result is an application with a fully automated e-material formatting process. The new automated process:
• can collect and analyze data very fast,
• takes a few seconds to produce an outcome, compared with the long hours or days a human would need first to develop a pattern and algorithms and then to make several attempts before finding the correct combination of formatting,
• can offer much more personalized e-material formatting combinations for each user by taking more individual variables and specifications into consideration,
• can learn and improve its recommendations much faster than a human,
• increases users' satisfaction with e-material formatting.

The second result is the development of a full and complete e-material formatting methodology for user-centric and adaptive e-material creation and formatting.


5 Conclusions

The developed methodology and the application prototype based on it are suitable for general e-material formatting and primary personalization, but they have limitations. These limitations stem from the huge number of possible variations of individual features, and consequently from the even larger number of formatting combinations needed to satisfy the user. Machine learning will help deal with all these variables and combinations and make the e-material formatting process fully automated once the learning process has been developed in the application. For that to be done, a huge amount of data is necessary. It is believed that the second prototype of the application will be able to gather all the necessary data through alpha testing. The AI uses experience-based learning to make future decisions, and it makes those decisions fully automatically. The use of AI in improving the application is vital for a user-centered approach and personalization, as it aligns with the aim of AI to solve problems.

Acknowledgments. The article was written with the financial support of European Regional Development Fund project Nr. 1.1.1.5/18/I/018 "Pētniecības, inovāciju un starptautiskās sadarbības zinātnē veicināšana Liepājas universitātē".

References

1. Hargreaves, T., Wilson, C.: Who uses smart home technologies? Representations of users by the smart home industry. In: Proceedings of ECEEE 2013 Summer Study – Rethinking, Renew, Restart, pp. 1769–1780 (2013)
2. Khan, M., Khushdil: Comprehensive study on the basis of eye blink, suggesting length of text line, considering typographical variables the way how to improve reading from computer screen. Adv. Internet Things 3(1), 9–20 (2013)
3. W3C: Web Content Accessibility Guidelines (WCAG) 2.0. In: Caldwell, B., Cooper, M., Reid, L.G., Vanderheiden, G. (eds.) W3C (2008)
4. Nielsen, J.: Designing Web Usability: The Practice of Simplicity. New Riders Publishing, Indianapolis (2000)
5. Ramamurthy, M., Lakshminarayanan, V.: Human vision and perception. In: Handbook of Advanced Lighting Technology. Springer (2015)
6. UNCTAD: Technology and Innovation Report 2018: Harnessing Frontier Technologies for Sustainable Development. United Nations, New York and Geneva (2018)
7. Holden, B.A., Fricke, T.R., Wilson, D.R.: Global prevalence of myopia and high myopia and temporal trends from 2000 through 2050. Ophthalmology, Epub, February 2016 (2016)
8. Low, W., Dirani, M., Gazzard, G., Chan, Y.H., Zhou, H.J., Selvaraj, P., et al.: Family history, near work, outdoor activity, and myopia in Singapore Chinese preschool children. Br. J. Ophthalmol. 94(8), 1012–1016 (2010)
9. Mackare, K., Jansone, A.: Research of guidelines for designing e-study materials. In: Proceedings of ETR17, The 11th International Scientific and Practical Conference Environment. Technology. Resources, Latvia, 15–17 June 2017, vol. 2 (2017)
10. Mackare, K., Jansone, A.: Digital devices use for educational reasons and related vision problems. In: Proceedings of ICLEL18, The 4th International Conference on Lifelong Education and Leadership, Poland, 2–4 July 2018 (2018)


11. Mackare, K., Jansone, A.: Habits of using internet and digital devices in education. In: Proceedings of SIE18, The International Scientific Conference Society. Integration. Education, Latvia, 25–26 May 2018, vol. V (2018)
12. Mackare, K., Zigunovs, M., Jansone, A.: Justification of the need for a custom e-material creation program. In: Proceedings of the Conference Society and Culture, Liepaja, Latvia, May 2018 (2018)
13. Mackare, K., Jansone, A.: Recommended formatting parameters for e-study materials. IJLEL 4(1), 8–14 (2018)
14. Mackare, K., Jansone, A.: Personalized learning: effective e-material formatting for users without disabilities or specific limitations. In: Proceedings of ERD2019, The 10th Anniversary International Conference of Education, Research and Development, Burgas, Bulgaria, 23–27 August 2019 (2019)
15. Zigunovs, M., Jansone, A., Mackare, K.: E-learning material adaptive software development. In: Presentation at ICIC18, The 2nd International Conference of Innovation and Creativity, Liepaja, Latvia, 5–7 April 2018 (2018)
16. Mackare, K., Jansone, A., Zigunovs, M.: E-material creating and formatting application. Adv. Intell. Syst. Comput. 876, 135–140 (2018)
17. Mackare, K., Jansone, A.: The concept for e-material creating and formatting application prototype. Period. Eng. Nat. Sci. 7, 197–204 (2019). ISSN 2303-4521
18. Mackare, K., Jansone, A., Konarevs, I.: The prototype version for e-material creating and formatting application. BJMC 7(3), 383–392 (2019)
19. Singh, P.: Understanding AI and ML for mobile app development. Towards Data Science, December 2018

Probabilistic Inference Using Generators: The Statues Algorithm

Pierre Denis(B)
Louvain-la-Neuve, Belgium
pie.denis[email protected]

Abstract. The Statues algorithm is a new probabilistic inference algorithm that gives exact results in the scope of discrete random variables. This algorithm calculates the marginal probability distributions on graphical models defined as directed acyclic graphs. These models are made up of five primitives that allow expressing, in particular, conditioning, joint probability distributions, Bayesian networks, discrete Markov chains and probabilistic arithmetic. The Statues algorithm relies on an original technique based on the generator construct, a special form of coroutine. This new algorithm aims to promote both efficiency and scope of application. This makes it valuable compared with other probabilistic inference approaches, especially in the field of probabilistic programming.

Keywords: Probability · Probabilistic programming · Probabilistic arithmetic · Graphical model · Bayesian network · Algorithm · Generator

1 Introduction

Problems characterized by some uncertainty can be modeled using different approaches, formalisms and primitives. These include, among others, joint probability distributions, Bayesian networks, Markov chains, hidden Markov models, probabilistic arithmetic and probabilistic logic [5,6,19]. In order to perform actual problem resolution, each modeling approach has its own catalogue of algorithms, characterized by different merits and trade-offs. Several algorithms produce exact results but are limited practically by complexity barriers [4], whilst other algorithms can deal with intractable problems by delivering approximate results. In the specific case of Bayesian networks [14,15], exact algorithms include enumeration, belief propagation, clique-tree propagation, variable elimination and clustering algorithms. On the other hand, approximate algorithms include rejection sampling, Gibbs sampling and Markov chain Monte Carlo (MCMC). Besides the Bayesian reasoning domain, probabilistic arithmetic and, more generally, the study of functions applied on random variables (+, −, ×, /, min, max, etc.) is a research field on its own that includes, for example, convolution-based approaches [1,10,23,24] and discrete envelope determination [2,3].

P. Denis—Independent scholar.
© Springer Nature Switzerland AG 2020
K. Arai et al. (Eds.): SAI 2020, AISC 1229, pp. 133–154, 2020. https://doi.org/10.1007/978-3-030-52246-9_10

134

P. Denis

These well-established algorithms, in their original formulation, are specialized for one single modeling approach. In particular, algorithms for probabilistic arithmetic do not handle Bayesian reasoning or even simple conditioning. On the other hand, the above-cited inference algorithms for BN do not handle arithmetic (e.g. the sum of two random variables of the BN, whether latent or observed). Also, many BN algorithms handle only observations expressible as conjunctions of equalities; without extensions, these algorithms cannot treat conditioning in its generality, that is, considering any Boolean function of the BN variables as a possible assertion. In short, early probabilistic models and associated algorithms have been constrained by some compartmentalization. These limitations tend now to disappear with the advent of probabilistic programming (PP) and richer probabilistic models [11,12,16] that mix several approaches together. Following this trend, the present paper introduces a unifying modeling framework in the scope of discrete random variables. Then, it presents an inference algorithm for this framework, namely the Statues algorithm. This algorithm is in essence a (distant) variant of the enumeration algorithm that provides significant improvements in terms of scope and efficiency. The enabler of this algorithm is the generator construct, a special form of coroutine (as defined by Knuth in [13]). This construct, which is available in several modern programming languages, is of great interest for combinatorial generation [20]. It seems, however, to be overlooked in the computer science literature: coroutines/generators are not widely used in published algorithms, for which the subroutine construct is prevalent. At the time of writing, to the best of the author's knowledge, no probabilistic inference algorithm using generators has been published yet.¹ The paper is organized as follows. Section 2 introduces the aforementioned modeling framework.
Section 3 details the Statues algorithm, with pseudocode and some example of execution. Section 4 discusses the salient points of the algorithm as well as possible extensions. Section 5 presents existing implementations, with some examples of usage. Section 6 gives the conclusions.
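To make the generator construct concrete before diving in, here is a minimal, hedged Python sketch (an illustration, not the paper's pseudocode) of how generators can lazily yield weighted outcomes of discrete variables:

```python
# Minimal illustration (not the Statues algorithm itself): generators
# that lazily yield (value, probability) pairs for discrete variables.

def die():
    """Generator yielding each face of a fair die with its probability."""
    for face in range(1, 7):
        yield face, 1 / 6

def total_two_dice():
    """Combine two independent dice by nesting generators."""
    for v1, p1 in die():
        for v2, p2 in die():
            yield v1 + v2, p1 * p2

pmf = {}
for value, p in total_two_dice():
    pmf[value] = pmf.get(value, 0.0) + p
print(round(pmf[7], 4))  # 0.1667
```

The point is that nothing is computed until the consumer asks for the next pair, which is the lazy, coroutine-like behavior the paper exploits.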

2 p-Expressions: A Unifying Probabilistic Framework

The Statues algorithm uses a modeling framework that scopes discrete random variables having a finite domain. It aims, in particular, to unify Bayesian reasoning with probabilistic arithmetic in this scope. There is no restriction on the domains of the variables, provided that they are discrete and finite: these can be numbers, matrices, symbols, Booleans, tuples, functions, etc. An order relationship on these domains is not required. In the following, the objects characterized above shall be referred to as "random variables", or simply "RV". In the present framework, the probabilistic model for any RV is defined either by a probability mass function or by a precise dependency to other RV's.² In order to express such a model in a form usable by an algorithm, a set of primitives is introduced hereafter under the name of "p-expressions" or "pex", for short ("pexes", in the plural). Five types of p-expression are defined, namely the elementary pex, the tuple pex, the functional pex, the conditional pex and the table pex. As will be shown, these primitives allow building up graphical models [6] for, among others, joint probability distributions, Bayesian networks, discrete Markov chains, conditioning and probabilistic arithmetic. In the following, the typographical convention uses uppercase (e.g. X) for random variables and bold lowercase (e.g. x) for p-expressions.

¹ The continuation construct, originated from functional languages, is another way to achieve coroutines. It is worth pointing out that "continuation passing style" (CPS) is used in the marginalization algorithms of WebPPL [12], a modern probabilistic programming language based on JavaScript.

2.1 Elementary Pex

An elementary pex models an RV X characterized by a given probability mass function x.

Elementary pexes are the most basic type of pex. They require a probability mass function (pmf) specifying a prior probability for each possible value of their domain. This function shall obey the Kolmogorov axioms: in particular, the individual probabilities shall be non-negative and the sum of all probabilities over the domain shall be 1. As a simple example, an RV C giving the result obtained by flipping a fair coin can be specified by the elementary pex c defined as

    c := { (tail, 1/2), (head, 1/2) }    (1)

using the pmf notation borrowed from Williamson [23]. Continuous random variables are excluded from the definition of elementary pex, but their probability density functions can be approximated by a pmf through discretization; several methods exist for this purpose, with known shortcomings [1,2]. The Poisson and hypergeometric distributions are also excluded since, although discrete, they are not finite; such distributions could however be approximated, e.g. by considering only the finite set of values having a probability above a given threshold and normalizing the probabilities to a total of 1. Elementary pexes model RV's that are pairwise independent. It is important to avoid confusion between the pex and pmf concepts: each pex represents one defined event or outcome, even if several pexes may have identical pmf's. For instance, n throws of the same die shall be represented by n pexes that are defined using the same pmf; the same applies if n similar dice are thrown together. As a special case, an elementary pex may have a domain of one unique element; this element is then certain and has a probability of 1. For instance,² the π number can be represented as the elementary pex { (π, 1) } and the empty tuple [ ] as { ([ ], 1) }. For easiness, the notation π, [ ], etc. shall be used for such special pexes. Even if there is no randomness in such a contrived construct, this assimilation is meant to simplify the inference algorithm when constants and random variables are mixed in the same probabilistic model. Since any finite domain is accepted for an elementary pex, two other special cases are worth mentioning because they are ubiquitous.

– A Boolean elementary pex is used to model a proposition that is uncertain, having a given probability p to be true. It is modeled as an elementary pex having Booleans as domain:

    b := { (true, p), (false, 1 − p) }    (2)

For convenience, the notation b := t(p) shall be used to represent such an elementary Boolean pex. Remark that, according to the notation given above, t(1) = true and t(0) = false.

– A joint elementary pex is used to define a given joint probability distribution. Its domain is a set of tuples of the same length; each tuple represents a possible outcome and each element of the tuple represents a given attribute or measure of this outcome. For example, the following joint elementary pex links the weather (W) and someone's mood (M) by means of tuples [ W, M ]:

    d := { ([ rainy, sad ], 0.20),
           ([ rainy, happy ], 0.10),
           ([ sunny, sad ], 0.05),
           ([ sunny, happy ], 0.65) }

² The term "random variable" is stricto sensu specific to real number domains. This limitation is deliberately set aside here for the sake of generality. Consistently, the wording "probability mass distribution" could be replaced by "categorical distribution". Also, the mathematical formalism of probability spaces (Ω, F, P) is avoided here, even if the present framework could be expressed using this formalism.

The joint elementary pex is one of the ways to model interdependence between random phenomena. Such interdependence shows up in the present example since, in particular, the probability Pr([ W, M ] = [ sunny, happy ]) = 0.65 is not equal to the product of marginal probabilities Pr(W = sunny) × Pr(M = happy) = (0.05 + 0.65) × (0.10 + 0.65) = 0.525. It is well known that joint probability distributions are not well suited if the number of attributes (or their domains' size) is large; see for example [15] and [19]. The following sections introduce derived pex types, which allow modeling interdependence in more expressive and more compact ways.
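As an aside, the pmf of an elementary pex is straightforward to represent and validate programmatically; a hedged Python sketch (the use of exact fractions is an assumption for the example, not something prescribed by the paper):

```python
from fractions import Fraction

def is_valid_pmf(pmf):
    """Check the Kolmogorov constraints stated above: non-negative
    probabilities summing exactly to 1 over a finite domain."""
    return all(p >= 0 for p in pmf.values()) and sum(pmf.values()) == 1

c = {"tail": Fraction(1, 2), "head": Fraction(1, 2)}   # the coin pex (1)
d = {("rainy", "sad"): Fraction(20, 100),              # the joint pex d
     ("rainy", "happy"): Fraction(10, 100),
     ("sunny", "sad"): Fraction(5, 100),
     ("sunny", "happy"): Fraction(65, 100)}
print(is_valid_pmf(c), is_valid_pmf(d))  # True True
```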

2.2 Tuple Pex

A tuple pex models a given tuple of n RV [ X1 , . . . , Xn ], with n ≥ 1. It is noted x1 ⊗ ... ⊗ xn , where xi are the pexes modeling Xi .


For instance, a tuple pex y can be defined to model a tuple of two RV [ X1, X2 ] characterized by two elementary pexes x1, x2 having Bernoulli distributions:

    x1 := { (0, 1/2), (1, 1/2) }
    x2 := { (0, 3/4), (1, 1/4) }
    y := x1 ⊗ x2

Without any further condition involving x1 or x2, the pmf related to y can be calculated by enumeration as { ([ 0, 0 ], 3/8), ([ 0, 1 ], 1/8), ([ 1, 0 ], 3/8), ([ 1, 1 ], 1/8) }. The order of elements in a tuple is significant; so, the tuple pex x1 ⊗ x2 is different from x2 ⊗ x1 (incidentally, the pmf calculated on this latter tuple pex differs from the former one). Note that, by definition, the empty tuple [ ] is not a tuple pex: as seen in Sect. 2.1, it can be assimilated to the elementary pex [ ]. It is important to understand that a tuple pex containing elementary pexes (such as y) is not equivalent to a joint elementary pex (as seen in Sect. 2.1), despite the fact that the domains involved are sets of tuples in both cases. A tuple pex is a derived pex and, as such, it cannot be used to specify a joint probability distribution. The true interest of a tuple pex is to express dependencies in other derived pexes, as will be shown soon.

The tuple pex is the first type of derived pex. Generally speaking, a derived pex may refer to several pexes, which may themselves be derived. Before going any further, two important constraints that shall apply to any type of pex must be introduced.

No Cycle – Cycles are forbidden in any pex. The interdependence between RV's shall be expressible in a directed acyclic graph (DAG); see Sect. 2.4.

Referential Consistency – As will be shown soon, a derived pex may contain multiple references to the same pex x at different places. Since any given pex is meant to represent one given random variable, it is required that the values for any outcome are consistent between all occurrences of x. This constraint is referred to throughout the present paper as referential consistency.
A rather contrived example is given when a given pex appears twice in the same tuple, e.g. v := x2 ⊗ x2. Referential consistency forces the two occurrences to refer to the very same pex; using the definition of x2 given above, the pmf calculated from v is then { ([ 0, 0 ], 3/4), ([ 1, 1 ], 1/4) }. More meaningful examples will be given in the next subsections. Referential consistency is closely linked to the concept of stochastic memoization found at least in Church [11] and WebPPL [12]. The two concepts actually enforce the same consistency constraint. The difference lies essentially in the fact that referential consistency applies to exact probabilistic inference, whereas stochastic memoization applies to approximate probabilistic inference (e.g. MCMC).
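The two pmf calculations above can be reproduced by a short, hedged Python enumeration sketch (not the Statues algorithm): distinct pexes are enumerated independently, while repeated occurrences of the same pex must take the same value.

```python
from fractions import Fraction
from itertools import product

x1 = {0: Fraction(1, 2), 1: Fraction(1, 2)}
x2 = {0: Fraction(3, 4), 1: Fraction(1, 4)}

def tuple_pmf(*pexes):
    """Enumerate a tuple pex; repeated references to the same dict
    object are bound once per outcome (referential consistency)."""
    distinct = list(dict.fromkeys(id(p) for p in pexes))
    by_id = {id(p): p for p in pexes}
    pmf = {}
    for values in product(*(by_id[i].items() for i in distinct)):
        binding = dict(zip(distinct, values))   # pex id -> (value, prob)
        outcome = tuple(binding[id(p)][0] for p in pexes)
        prob = Fraction(1)
        for _, p in binding.values():
            prob *= p
        pmf[outcome] = pmf.get(outcome, Fraction(0)) + prob
    return pmf

print(tuple_pmf(x1, x2))  # pmf of y = x1 ⊗ x2
print(tuple_pmf(x2, x2))  # pmf of v = x2 ⊗ x2: only (0,0) and (1,1) remain
```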

2.3 Functional Pex

A functional pex models the RV f(X) obtained by applying a given unary function f on a given RV X. It is noted f̃(x), where x is the pex modeling X.


f shall be a pure function, that is, deterministic and without side effects. It can use any algorithm, provided that it can evaluate the result in a finite time and in a deterministic way, whatever the argument given. Any given n-ary function with n > 1 can easily be converted into a unary function by packing the arguments into an n-tuple.³ Then, a functional pex can be defined by grouping arguments in an n-tuple pex. It is important to understand the distinction between the notations f(x) and f̃(x): the former denotes a value conforming to the usual mathematical meaning, while the latter denotes an assembly made up of the function f and the pex x, as objects in their own right. Actually, the latter is close to the concept of lazy evaluation found in programming language theory. Functional pexes allow representing, first and foremost, basic mathematical operations on RV's (in the following, N, X, Y, Z have numerical domains and A, B have Boolean domains):

– arithmetic: X + Y, X − Y, X.Y, −X, √X, X^Y, etc.
– comparison: X = Y, X ≠ Y, X < Y, X ≤ Y, etc.
– logical: ¬A, A ∧ B, A ∨ B, A ⇒ B, etc.

and any combination of these operations using usual function composition, like E := (X + Y ≥ 6) ∧ (Y ≤ 4). Using the notations introduced above and replacing infix subexpressions by named unary functions, E could be modeled by the functional pex e := ãnd(g̃e(ãdd(x ⊗ y) ⊗ 6) ⊗ l̃e(y ⊗ 4)). In this last example, referential consistency (Sect. 2.2) shall ensure that all occurrences of y refer to the same RV Y. More generally, referential consistency enforces the rules of algebra and logic in the context of probabilistic models. It ensures, for example, that a pex representing X + X shall result in the same probability distribution as the pex representing 2X (or Y − X − Y + 3X or (X + 1)² − X² − 1 or ...). The lack of referential consistency is referred to with the terms "dependency error" in [23] and [24].
In contrast to these approaches, which investigate how these errors can be bounded, any such dependency error is outlawed here. As will be shown in the following, referential consistency is essential to enable conditioning and Bayesian reasoning. Besides the aforementioned usual mathematical functions, many other useful functions could be added: checking the membership of an element in a set, getting the minimum/maximum element of a set, summing/averaging the numbers of a dataset, getting the attribute of an object, etc. Among these functions, the indexing of a given tuple is worth mentioning. Reconsidering the joint probability distribution d seen in Sect. 2.1 and defining extract([ t, i ]) as the function giving the i-th element of tuple t, the functional pexes w := ẽxtract(d ⊗ 1) and m := ẽxtract(d ⊗ 2) model the weather and mood RV's (resp. W, M). The referential consistency on d enforces the interdependence between the two pexes and guarantees the same marginal probability results as those exemplified in Sect. 2.1.

³ For instance, a 2-ary function g shall be converted into a unary function g′ such that g′([x, y]) ≜ g(x, y).
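The footnote's g → g′ conversion is a plain packing step; a minimal Python sketch (the helper name pack is hypothetical):

```python
def pack(g):
    """Convert an n-ary pure function g into a unary function g1
    that takes a single n-tuple as argument (the g' of the footnote)."""
    def g1(t):
        return g(*t)           # unpack the tuple into g's arguments
    return g1

def g(x, y):                   # a 2-ary function
    return x + y

g1 = pack(g)
assert g1((2, 3)) == g(2, 3) == 5
```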

The Statues Algorithm

2.4 Conditional Pex

A conditional pex models a given RV X under the condition that a given Boolean RV E is true. It is noted x | e, where x and e are the pexes modeling resp. X and E. The RV E could represent an evidence, an assumption or a constraint. E has its own prior probability to be true but, in the conditional pex context, it is assumed to be certainly true. A valid conditional pex x | e requires that x can produce at least one value verifying the condition expressed in e. If this is not the case, then no pmf can be calculated and the inference algorithm shall report an error. Although not required, the interesting cases happen of course when X and E are interdependent; this entails the sharing of one or several pexes, e.g. f̂(... ⊗ y ⊗ ...) | ĝ(... ⊗ y ⊗ ...).
For example, consider the following pexes, which represent the throwing of two fair dice and the total of their values:

d1 := {(1, 1/6), (2, 1/6), (3, 1/6), (4, 1/6), (5, 1/6), (6, 1/6)}
d2 := {(1, 1/6), (2, 1/6), (3, 1/6), (4, 1/6), (5, 1/6), (6, 1/6)}
s := âdd(d1 ⊗ d2)

Suppose now that some evidence ensures that the dice total is greater than or equal to 6, while the second die's value is less than or equal to 4. The conditional pex expressing the dice total given this evidence is

x := s | ând(ĝe(s ⊗ 6) ⊗ l̂e(d2 ⊗ 4))        (3)

The pmf related to x can be calculated using the definition of conditional probability: {(6, 4/14), (7, 4/14), (8, 3/14), (9, 2/14), (10, 1/14)}; actually, the Statues algorithm presented in Sect. 3 aims to calculate such a result.
At this stage, it may be worthwhile to represent derived pexes as graphs, in accordance with the concept of graphical model [6]. One may remark that a tree structure is usually inadequate, since the same pex may be referred to multiple times (here, s and d2), as dictated by the referential consistency. In full generality, any pex can be represented as a directed acyclic graph (DAG). Figure 1 shows the DAG corresponding to x.
The arrow direction, from parent node p to child node c, is meant to represent that p depends on c.⁴

⁴ One may deplore that this is the exact opposite of the convention used in graphical models. Actually, the point of view adopted here is more suited for an algorithm: arrows represent references, as these are drawn for example in the abstract syntax trees representing arithmetic expressions.


P. Denis

Fig. 1. x as a DAG
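The pmf stated for x in (3) can be cross-checked without any pex machinery by enumerating the 36 equiprobable worlds in plain Python (an independent sanity check, not the Statues algorithm):

```python
from fractions import Fraction
from collections import Counter

# keep the dice totals compatible with the evidence: total >= 6 and d2 <= 4
kept = [d1 + d2
        for d1 in range(1, 7) for d2 in range(1, 7)
        if d1 + d2 >= 6 and d2 <= 4]

counts = Counter(kept)                        # 14 surviving worlds
pmf = {v: Fraction(c, len(kept)) for v, c in sorted(counts.items())}

assert len(kept) == 14
assert pmf == {6: Fraction(4, 14), 7: Fraction(4, 14), 8: Fraction(3, 14),
               9: Fraction(2, 14), 10: Fraction(1, 14)}
```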

Many approaches, including those handling Bayesian networks, constrain conditions to be observations [15,19], which are equalities of the form X = x or a conjunction of such equalities X1 = x1 ∧ ... ∧ Xn = xn. Compared to these approaches, the combination of conditional and functional pexes provides a significant generalization. The sole constraint is to be able to express the evidence as a Boolean function applied to some RVs. Besides equalities and conjunctions, this includes inequalities, negations, disjunctions, membership, etc.

2.5 Table Pex

A table pex models an RV obtained by associating a given RV Xi with each value ci of a given RV C, with dom(C) = {c1, ..., cn}. It is noted c ▸ g, where pex c models C and where g := {c1 → x1, ..., cn → xn} is an associative array with pexes xi modeling the Xi.
Table pexes allow defining conditional probability tables (CPT), which are used in particular for defining Bayesian networks. To exemplify the idea, table pexes combined with tuple pexes can be used to model the classical "Rain-Sprinkler-Grass" BN. Three Boolean RVs are defined: R represents whether it is raining, S represents whether the sprinkler is on and G represents whether the grass is wet. R has a prior probability 0.20; the other probabilities and dependencies are quantified using CPTs. S's probability depends on the weather: if it is raining, then the probability of S is 0.01, otherwise it is 0.40. G depends on both the weather and the sprinkler state; the probabilities for G depending on the values of the tuple [R, S] are: [false, false]: 0.00, [true, false]: 0.80, [false, true]: 0.90, [true, true]: 0.99. This BN can be modeled as follows, using the elementary pex r for R and the table pexes s and g for S and G (t(p) denoting the elementary Boolean pex {(true, p), (false, 1 − p)}):


r := t(0.20)
s := r ▸ { true  → t(0.01),
           false → t(0.40) }
g := (r ⊗ s) ▸ { [false, false] → t(0.00),
                 [true, false]  → t(0.80),
                 [false, true]  → t(0.90),
                 [true, true]   → t(0.99) }

In order to make queries on such a BN considering obtained information, new pexes mixing conditional, functional and table pexes can be built up. This allows for forward chaining (e.g. g | n̂ot(s), for calculating Pr(G | ¬S) = 0.2336), as well as Bayesian inference (e.g. n̂ot(s) | g, for calculating Pr(¬S | G) = 0.3533).
The number of entries in a table pex shall be the cardinality of the domain of the conditioning RV C. This can be cumbersome if this domain is large, e.g. if the condition is a tuple having many inner RVs (the domain of C is usually the cartesian product of these RVs' domains). However, in several CPTs, such as those having the property of contextual independence [14,17], redundancies can be avoided by embedding table pexes one in another.
Other applications of the table pex include mixture models and discrete-time Markov chains (DTMC). For the latter, the initial state can be modeled by an elementary pex (a pmf giving the probability to be in each state), while the transition matrix can be expressed as a table pex giving the pmf of the next state for each current state.
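The two probabilities quoted above for the Rain-Sprinkler-Grass BN can be verified by brute-force enumeration over the eight joint states of (R, S, G), using only the CPT numbers given in the text (plain Python, independent of any pex library; the helper pr is an ad hoc name):

```python
from itertools import product

P_R = 0.20
P_S = {True: 0.01, False: 0.40}                    # Pr(S=true | R)
P_G = {(False, False): 0.00, (True, False): 0.80,  # Pr(G=true | [R, S])
       (False, True): 0.90,  (True, True): 0.99}

def pr(event):
    """Total probability of the worlds (r, s, g) satisfying event."""
    total = 0.0
    for r, s, g in product([False, True], repeat=3):
        w = (P_R if r else 1 - P_R)
        w *= P_S[r] if s else 1 - P_S[r]
        w *= P_G[r, s] if g else 1 - P_G[r, s]
        if event(r, s, g):
            total += w
    return total

# forward chaining: Pr(G | not S)
assert round(pr(lambda r, s, g: g and not s) / pr(lambda r, s, g: not s), 4) == 0.2336
# Bayesian inference: Pr(not S | G)
assert round(pr(lambda r, s, g: not s and g) / pr(lambda r, s, g: g), 4) == 0.3533
```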

3 The Statues Algorithm

Assuming that some RV D has been modeled in a pex d, the marginalization inference consists in calculating, from d only, the pmf associated to D. Following the classification given in [5], this inference function shall be named marg. The aim of the Statues algorithm is to calculate marg(d) as the exact pmf {..., (vi, Pr(D = vi)), ...}. The name "Statues" is borrowed from the popular children's game of the same name.⁵ The analogy with the algorithm should hopefully become clear after the explanations given below.

⁵ Other names include "Red Light, Green Light" (US), "Grandmother's Footsteps" (UK), "1–2–3, Soleil !" (France), "1–2–3, Piano !" (Belgium) and "Annemaria Koekkoek !" (Netherlands).

The examples seen so far show that, in full generality, marg(d) cannot proceed by simple recursive evaluation, as done for example in usual arithmetic; this is actually valid only if the underlying RVs are independent, that is, if each inner pex occurs only once in the pex under evaluation. Generally speaking, a given subsidiary pex x cannot be replaced by marg(x): this would transform it into an elementary pex, removing any dependency of x on other pexes. To obtain correct results in any case, the referential consistency (see Sect. 2.2) shall be enforced everywhere in the model. The divide-and-conquer paradigm needs to be revisited here.
The Statues algorithm uses a construction called generator, a special case of coroutine, as presented in [13] and [20]. Generators are available in several modern programming languages (e.g. C#, Python, Ruby, Lua, Go, Scheme), whether natively or as libraries. To state it in simple words, a generator is a special form of coroutine which can suspend its execution to yield some object towards the caller and which can be resumed as soon as the caller has treated the yielded object. In the following algorithms, generators are identified as such by the presence of yield x statement(s) and, incidentally, by the absence of any return y statement. At the time yield x is executed, the generator yields the value x to its caller. The execution control is then passed to the caller: it treats x, then waits for the next value. The execution resumes in the generator until the next yield x statement, and so on until the generator terminates. Generators can be recursive, which makes them particularly well suited for combinatorial generation (see [20]). This ability is extensively used in the algorithm presented below.
For detailing the algorithm, the term atom shall be used to designate a couple (v, p) made up of a value v and an associated probability p. An atom relates to a particular event that excludes the events related to other atoms. This condition makes it possible to add, without error, the probabilities of atoms in a condensation treatment: more precisely, if n atoms (v, p1), (v, p2), ..., (v, pn) are collected for some value v, then p1 + p2 + ... + pn is the unnormalized probability of v.
For instance, when throwing two fair dice, the probability to get the total 3 can be obtained by collecting the two atoms ([1, 2], 1/36) and ([2, 1], 1/36) for the 2-tuple pex, then converting them into the atoms (3, 1/36) and (3, 1/36) by the addition functional pex; these two probabilities can then be safely added, giving the expected probability 1/18 (here, already normalized).
Another important concept used in the algorithm is that of binding. At any stage of the execution, a given pex is either bound or unbound. At start-up, all pexes are unbound, which means that they have not yet been assigned a value. When an unbound pex is required to browse the values of its domain, each yielded value is bound to this pex until the next value is yielded; when there are no more values, the pex is unbound again. Consistently with referential consistency, once a pex is bound, it yields the bound value for any subsequent occurrence of this pex. The fact that a bound value stays immobile for a while during the execution explains why it can be likened to a statue in the aforementioned game. In the algorithm below, the pex binding is materialized in the binding store β, which is an associative array {pex → value}, initially empty.
The Statues algorithm is made up of three parts. The entry-point is the subroutine marg. This subroutine calls the genAtoms generator, which itself may call the genAtomsByType generator. These two generators are mutually recursive, as shown in the call graph given in Fig. 2.


Fig. 2. Call graph of Statues algorithm

The entry-point marg subroutine is given, as pseudocode, in Algorithm 1. marg takes the given pex d to be evaluated as argument. It invokes the genAtoms generator and collects the atoms yielded one by one. Using the associative array a {value → probability}, the condensation treats atoms containing the same value so that they are merged together, by summing their probabilities. Once the genAtoms generator is exhausted, a check verifies that at least one atom has been received; otherwise an error is reported and the subroutine halts (as seen in Sect. 2.4, this may occur if the evaluated pex is conditional while the given condition is impossible). The final step normalizes the distribution a to ensure that the sum of the probabilities equals 1, a post-processing common to many marginalization algorithms.⁶ The pmf is then returned.
The genAtoms generator (Algorithm 2) uses the binding store β to check whether, at the current stage of the algorithm, the given pex is bound or not. If the given pex is unbound, which is the case at least for the very first call on this pex, then genAtomsByType is called and each atom yielded is bound to the pex before being yielded in turn to genAtoms's caller. Otherwise, if the pex is currently bound, then the atom yielded is the bound value with probability 1. This behavior is actually the crux of the algorithm for enforcing referential consistency.
The genAtomsByType generator (Algorithm 3) is the last part of the Statues algorithm. It yields the atoms according to the semantics of each type of pex. The dispatching is presented here as a pattern-matching switch construct, although other constructs are feasible, e.g. using object orientation (inheritance/polymorphism). In the case of an elementary pex, the treatment is simple and non-recursive. In the case of a derived pex, the dependent pexes are accessed by calling genAtoms on them; this causes recursive calls, yielding atoms and updating the current bindings.

⁶ It can be shown that the probability sum may differ from 1 only if the evaluated pex contains some conditional pex. Actually, the division performed is closely related to the formula of conditional probability: Pr(A | C) ≜ Pr(A ∧ C) / Pr(C).
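The condensation and normalization steps of marg are easy to isolate; the snippet below replays them on the three atoms that the toy example of Sect. 3.1 delivers to marg (plain Python, with exact fractions):

```python
from fractions import Fraction

# atoms (value, probability) as received from genAtoms in Sect. 3.1
atoms = [(0, Fraction(1, 4)), (0, Fraction(1, 12)), (1, Fraction(1, 2))]

a = {}
for v, p in atoms:                 # condensation: merge atoms sharing a value
    a[v] = a.get(v, 0) + p

s = sum(a.values())                # 5/6 < 1: the condition pruned some mass
pmf = {v: p / s for v, p in a.items()}
assert pmf == {0: Fraction(2, 5), 1: Fraction(3, 5)}
```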


Algorithm 1 Statues algorithm – marg subroutine (entry-point)
 1: function marg(d)
 2:   β ← {}                           ▷ init global binding store
 3:   a ← {}                           ▷ init unnormalized pmf
 4:   for (v, p) ← genAtoms(d) do      ▷ collect atoms
 5:     if ∄ a[v] then
 6:       a[v] ← 0
 7:     end if
 8:     a[v] ← a[v] + p                ▷ condense pmf
 9:   end for
10:   if a = {} then                   ▷ pmf is empty: error
11:     halt with error
12:   end if
13:   s ← Σ_{(v,p) ∈ a} p              ▷ normalize pmf
14:   return { (v, p/s) | (v, p) ∈ a }
15: end function

Algorithm 2 Statues algorithm – genAtoms generator
 1: generator genAtoms(d)
 2:   if ∃ β[d] then                        ▷ d is bound
 3:     yield (β[d], 1)                     ▷ yield unique atom to caller
 4:   else                                  ▷ d is unbound
 5:     for (v, p) ← genAtomsByType(d) do
 6:       β[d] ← v                          ▷ (re)bind d to value v
 7:       yield (v, p)                      ▷ yield atom to caller
 8:     end for
 9:     delete β[d]                         ▷ unbind d
10:   end if
11: end generator

To ease the writing of the algorithm in a recursive way, the LISP-like notation [h | t] is used to represent a tuple with h as its first element (the "head") and t as the tuple of remaining elements (the "tail"). For instance, the 2-tuple [x, y] could be written as [x | [y | []]]. The tuple pex shall follow a similar recursive structure: for any pexes x and y, the notation x ⊗ y introduced in Sect. 2.2 shall actually be interpreted as x ⊗ (y ⊗ []). This comment is required to accurately trace the treatment of tuple pexes in genAtomsByType.
To get an in-depth understanding of the algorithm, one has to remember that genAtoms and genAtomsByType are not subroutines returning a list of atoms; these are generators working cooperatively and yielding atoms one by one. During algorithm execution, two generators (genAtoms and genAtomsByType) are created for each pex reachable from the root pex under evaluation. All these generators live together, the flow of control switching between the generators at each yield statement. Furthermore, at each yield


Algorithm 3 Statues algorithm – genAtomsByType generator
 1: generator genAtomsByType(d)
 2:   switch d do
 3:     case {...}                          ▷ d is an elementary pex
 4:       for (v, p) ∈ {...} do
 5:         yield (v, p)
 6:       end for
 7:     case f̂(x)                           ▷ d is a functional pex
 8:       for (v, p) ← genAtoms(x) do
 9:         yield (f(v), p)
10:       end for
11:     case h ⊗ t                          ▷ d is a tuple pex
12:       for (v, p) ← genAtoms(h) do
13:         for (s, q) ← genAtoms(t) do
14:           yield ([v | s], pq)
15:         end for
16:       end for
17:     case x | e                          ▷ d is a conditional pex
18:       for (v, p) ← genAtoms(e) do
19:         if v then
20:           for (s, q) ← genAtoms(x) do
21:             yield (s, pq)
22:           end for
23:         end if
24:       end for
25:     case c ▸ g                          ▷ d is a table pex
26:       for (v, p) ← genAtoms(c) do
27:         for (s, q) ← genAtoms(g[v]) do
28:           yield (s, pq)
29:         end for
30:       end for
31: end generator

statement, new bindings are created or removed. For instance, in the treatment of the conditional pex in genAtomsByType, the outer for loop at line 18 makes some bindings that act on the inner statements: then, only atoms (s, q) compatible with these bindings are yielded in the inner for loop (line 20).
A proof of correctness of the Statues algorithm is provided in [9]. In essence, this proof shows that the atoms collected by marg(d) form a partition of the sample space (or of a subset of it) and that they conform to the semantics of the p-expression d. Besides this proof, and well before it, good confidence in the correctness of the algorithm has been gained through informal reasoning and, above all, by verification against results given in examples from the literature ([5,16] and [19], in particular).
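As a compact illustration, here is a direct Python transcription of Algorithms 1–3. The class names echo μLea's terminology but the code is a minimal sketch, not the library itself; the binding store β is a dict keyed by pex object identity, and NIL is an ad hoc terminal for tuple pex chains.

```python
from fractions import Fraction

class ElemPex:                               # elementary pex: explicit pmf
    def __init__(self, pmf): self.pmf = pmf  # dict {value: probability}

class FuncPex:                               # functional pex f̂(x)
    def __init__(self, f, x): self.f, self.x = f, x

class TuplePex:                              # tuple pex h ⊗ t, recursive
    def __init__(self, head, tail): self.head, self.tail = head, tail

class CondPex:                               # conditional pex x | e
    def __init__(self, x, e): self.x, self.e = x, e

class TablePex:                              # table pex c ▸ g
    def __init__(self, c, g): self.c, self.g = c, g

NIL = ElemPex({(): Fraction(1)})             # terminal of tuple pex chains

def gen_atoms(d, beta):                      # Algorithm 2
    if d in beta:                            # d is bound
        yield beta[d], Fraction(1)
    else:                                    # d is unbound
        for v, p in gen_atoms_by_type(d, beta):
            beta[d] = v                      # (re)bind d to value v
            yield v, p
        del beta[d]                          # unbind d

def gen_atoms_by_type(d, beta):              # Algorithm 3
    if isinstance(d, ElemPex):
        yield from d.pmf.items()
    elif isinstance(d, FuncPex):
        for v, p in gen_atoms(d.x, beta):
            yield d.f(v), p
    elif isinstance(d, TuplePex):
        for v, p in gen_atoms(d.head, beta):
            for s, q in gen_atoms(d.tail, beta):
                yield (v,) + s, p * q
    elif isinstance(d, CondPex):
        for v, p in gen_atoms(d.e, beta):
            if v:                            # prune branches violating e
                for s, q in gen_atoms(d.x, beta):
                    yield s, p * q
    elif isinstance(d, TablePex):
        for v, p in gen_atoms(d.c, beta):
            for s, q in gen_atoms(d.g[v], beta):
                yield s, p * q

def marg(d):                                 # Algorithm 1
    beta, a = {}, {}
    for v, p in gen_atoms(d, beta):          # collect and condense atoms
        a[v] = a.get(v, 0) + p
    if not a:
        raise ValueError("impossible condition")
    s = sum(a.values())
    return {v: p / s for v, p in a.items()}  # normalize

# the toy model of Sect. 3.1: Pr(B1 | B1 + B2 <= 1)
b1 = ElemPex({0: Fraction(1, 3), 1: Fraction(2, 3)})
b2 = ElemPex({0: Fraction(3, 4), 1: Fraction(1, 4)})
s = FuncPex(sum, TuplePex(b1, TuplePex(b2, NIL)))
d = CondPex(b1, FuncPex(lambda t: t[0] <= 1, TuplePex(s, NIL)))
assert marg(d) == {0: Fraction(2, 5), 1: Fraction(3, 5)}
```

Referential consistency comes for free from the binding store: inside the conditional branch, b1 is already bound by the evaluation of the condition, so it yields its bound value with probability 1, exactly as in the trace of Sect. 3.1.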

3.1 Example of Execution

As a support for understanding how the Statues algorithm works in practice, the present section shows the key steps of the execution for a toy problem involving an addition and a condition. This problem involves two Bernoulli RVs B1 and B2, with respective probabilities 2/3 and 1/4. Supposing that some evidence ensures that the sum S := B1 + B2 does not exceed 1, the goal is to calculate the odds of B1 given this evidence. This problem can be modeled by the conditional pex d (and its subsidiary pexes) defined as follows:

b1 := {(0, 1/3), (1, 2/3)}
b2 := {(0, 3/4), (1, 1/4)}
s := âdd(b1 ⊗ b2)
d := b1 | l̂e(s ⊗ 1)

This model can be represented by the DAG shown in Fig. 3. Note that, as stated earlier, the tuple pexes shown above (⊗ nodes) are slightly simplified for the sake of conciseness: in the present model, b2 stands for b2 ⊗ [] while 1 stands for 1 ⊗ [].
Table 1 shows the sequence of steps executed for calculating marg(d). A step is defined by all the actions made by the main generator genAtoms to yield a new atom (line 4 of marg). Each row shows some key data present or exchanged at a given step. The first two columns show the values bound to b1 and b2 during the given step. The remaining columns, labeled c→p, show the atom yielded by pex c to its parent pex p during the given step; this atom is the one yielded at line 3 or line 7 of genAtoms(c). The rightmost column → shows the atom yielded by the main generator genAtoms: it is collected in marg, which is the final action of the step.
When starting marg(d), genAtoms / genAtomsByType generators are created in cascade for each pex, in a top-down order, i.e. from d down to the elementary pexes b1, b2, 1 and []. Since the root node is a conditional pex, the first processing is the evaluation of the condition defined on the l̂e pex (line 18 of genAtomsByType).
During the execution of each step, the atoms are yielded one by one until they reach the root conditional pex; graphically, they climb the DAG from bottom to top. The first three steps yield three atoms to marg, viz. (0, 1/4), (0, 1/12) and (1, 1/2). In this processing, the atoms yielded on b1→| have probability 1 because b1 is already bound at this stage (see line 3 of genAtoms). Step #4 blocks the last atom yielded, since it does not verify the given condition (see line 19 of genAtomsByType). During this process, marg has made on the fly the condensation of the three received atoms into the associative array a = {(0, 1/3), (1, 1/2)}. The final treatment of marg(d) consists in normalizing a to get the final pmf {(0, 2/5), (1, 3/5)}. This result is correct with regard to the definition of conditional probability. Incidentally, it is different from the pmf of b1; this shows that the given evidence, here, brings information on top of the prior beliefs.


Fig. 3. d as a DAG

Table 1. Execution trace of marg(d)

     b1  b2 | b1→⊗      b2→⊗        ⊗→add           add→⊗      ⊗→le            le→|          b1→|    →
 #1   0   0 | (0, 1/3)  ([0], 3/4)  ([0, 0], 1/4)   (0, 1/4)   ([0, 1], 1/4)   (true, 1/4)   (0, 1)  (0, 1/4)
 #2   0   1 |           ([1], 1/4)  ([0, 1], 1/12)  (1, 1/12)  ([1, 1], 1/12)  (true, 1/12)  (0, 1)  (0, 1/12)
 #3   1   0 | (1, 2/3)  ([0], 3/4)  ([1, 0], 1/2)   (1, 1/2)   ([1, 1], 1/2)   (true, 1/2)   (1, 1)  (1, 1/2)
 #4   1   1 |           ([1], 1/4)  ([1, 1], 1/6)   (2, 1/6)   ([2, 1], 1/6)   (false, 1/6)
The case above is decidedly basic. The Statues algorithm is nonetheless able to treat correctly all the examples given in the present paper, as well as far more involved problems (see, in particular, the libraries referred to in Sect. 5).

4 Discussion

Due to the usage of generators, the execution model of the Statues algorithm is quite singular compared to the large majority of algorithms based on subroutines. During algorithm execution, each pex involved in the evaluated query gives rise to two generators, namely genAtoms and genAtomsByType. These generators live together and their call graph mimics the DAG on which they operate; the yielded atoms are passed through the arcs, from child node to parent node. Each genAtomsByType node performs a very simple treatment where probabilities are multiplied together. The collecting of atoms and their condensation are done at the root of the DAG by the marg subroutine, hence the only place where probabilities are added together.
As already stated, the Statues algorithm belongs to the category of exact probabilistic algorithms. At its very heart, it explores all possible paths or "possible worlds" [5] compatible with the given query. Without much surprise, it is limited by the NP-hard nature of inference on unconstrained BNs [4]. However, it performs far more efficiently than a naive inference by enumeration. There are three reasons supporting this assertion. Firstly, due to the way the algorithm travels through the DAG model starting from the given node d, the variables

148

P. Denis

that do not impact the query at hand (i.e. those that are unreachable by any path from d) are not considered in the calculation. There is then a de facto elimination of unused variables. Secondly, when looking for possible bindings, the treatment of the conditional pex performs a pruning of the solution branches that do not comply with the given evidence (lines 19–23 of genAtomsByType). In many cases, this prevents wasteful calculations (see below for an extension that may improve such pruning even further). Finally, since the binding done by genAtoms occurs for every pex (whether elementary or derived), it has the virtue of memoizing on the fly the results of functional pexes, avoiding redoing the same calculation over and over. For instance, suppose that, among a large set of RVs, a variable D is defined as D := √(X² + Y²). Even if D is used at multiple places of the query, like in the conditional expression D² − U × V | (A ≤ D) ∧ (D ≤ B), the values of D will be calculated only once for each pair of values [X, Y], hence not for each combination of [X, Y, A, B, U, V]. This memoization is allowed without restriction since functional pexes use pure functions, by definition.
Although the five pex types introduced in Sect. 2 cover a large scope of probabilistic modeling, it is possible to add new pex types in order to improve modeling expressiveness or execution performance. Their handling in the algorithm just requires the addition of case clauses in the genAtomsByType generator, the rest of the algorithm remaining unchanged. A first example of extension consists in generalizing the conditional pex to handle chained conjunctions of conditions C1 ∧ ... ∧ Cn. This would enable short-circuit evaluation when a (false, p) atom is encountered for any Ci. This extension may then perform early pruning, which can dramatically speed up the calculations in several cases. This kind of optimization can be extended to any operation having an absorbing element (true for disjunction, 0 for multiplication, etc.). A second example of extension is a variant of the table pex: instead of providing an explicit CPT as a lookup table, the modeler may provide a function that returns a specific pex depending on the value of the decision RV. Such a construct allows defining a CPT in an algorithmic way, which may be far more compact than an explicit table. This may be helpful in particular for defining noisy-OR and noisy-MAX models (see [15,19]).
The construction of suitable probabilistic models, using pexes or other formal frameworks, is usually a difficult task for a human being. If enough observation data are available, then the Statues algorithm could be coupled to machine learning algorithms, e.g. maximum-likelihood or expectation-maximization [19], to help determine the model best fitting these data.
Exact algorithms, such as the Statues algorithm, have general merits and liabilities relative to approximate algorithms. As stated before, any exact probabilistic inference algorithm is limited in practice by the intractability of many problems, including large or densely connected BNs. For such intractable problems at least, approximate algorithms like MCMC provide a fallback. Despite this constraint, exact algorithms remain very useful for a number of reasons, besides their exactness! Firstly, several problems can be solved exactly in an acceptable time; these cover, at the very least, many sparsely connected BNs and the example cases used for education. Secondly, exact algorithms offer the opportunity to represent probabilities in different manners, beyond the prevailing floating-point numbers. Probability representation as fractions enables perfect accuracy of results, tackling the usual (and annoying) rounding errors. Furthermore, symbolic computation is made possible by defining probabilities with variable symbols instead of numbers, e.g. p, q, ..., and by coupling the algorithm with a symbolic computation system (such as SymPy [22]); the output of marg is then a parametric pmf made up of probability expressions instead of numbers, e.g. p²(1 − q). Such an approach using probability symbols instead of numbers is useful when the same query is made over and over on the same complex model, with only the prior probability values varying: the query result may be compiled offline once and for all into a raw arithmetic expression (maybe taking a long processing time); then the resulting expression can be evaluated many times using fast arithmetic computations, with varying parameters. As an even bolder objective, one may envision the usage of probability amplitudes, i.e. complex numbers, in the context of quantum computing. This would allow defining pseudo-probability distributions for simulating qubits, quantum registers and quantum circuits. This extension would require a careful analysis of what can be kept/changed in the algorithm; at the very least, a "measure" post-processing should be set up to square the modulus of the probability amplitudes, in order to obtain true probabilities (the so-called "Born rule").
Besides the proposed extensions, further research is definitely needed to factually assess the assets and liabilities of the Statues algorithm among existing probabilistic inference algorithms.
This includes at least the following research tracks:
– to make an objective comparison of the expressiveness of the p-expression framework with those used in other approaches,
– to study the performance of the algorithm, both in space and time, and to put these results in perspective with other comparable algorithms.
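Returning to the memoization point made earlier in this section: its effect can be mimicked with an explicit cache playing the role of the binding store. Counting the calls to an "expensive" pure function shows that the evaluation happens once per [X, Y] pair, not once per full variable combination. All names here are hypothetical stand-ins, not part of the algorithm itself.

```python
from itertools import product
from math import sqrt

calls = 0
def dist(x, y):
    """An 'expensive' pure function: D := sqrt(X^2 + Y^2)."""
    global calls
    calls += 1
    return sqrt(x * x + y * y)

cache = {}                        # plays the role of the binding store
def memo_dist(x, y):
    if (x, y) not in cache:       # evaluate once per (x, y) binding
        cache[x, y] = dist(x, y)
    return cache[x, y]

xs, ys = range(10), range(10)
others = range(50)                # stand-in for the A, B, U, V combinations
for x, y, _ in product(xs, ys, others):
    memo_dist(x, y)

assert calls == 100               # |X| * |Y|, not 100 * 50 = 5000
```

Memoizing is safe here only because the function is pure, which is exactly the condition the paper imposes on functional pexes.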

5 Implementations: The Lea Libraries

The Statues algorithm has been successfully implemented in the Python programming language [18], following the concept of "probabilistic programming language" (PPL) [11,12,16]. Python is well suited for the task because it natively supports generators [20,21].
The primary implementation in Python is an open-source library named Lea [7]. Lea is fully usable, comprehensive and well documented. Also, it encompasses several useful extensions, such as those presented in Sect. 4; this includes the support of fractions, floating-point numbers and symbols for representing probabilities (for symbolic computation, Lea uses the SymPy package [22]). However, understanding the core Statues algorithm in Lea is quite difficult because the implementation contains many optimizations and extraneous functions, such as standard indicators, information theory, random sampling, machine learning, etc. Also, besides the Statues algorithm, Lea implements an approximation algorithm


based on Monte-Carlo rejection sampling, providing a fallback alternative for intractable problems.
To help the understanding of the Statues algorithm, a lighter open-source Python library has been developed: MicroLea, abbreviated as μLea [8]. μLea is much smaller and much simpler than Lea: it focuses on the Statues algorithm and nothing more. μLea has limited functionality and usability compared to Lea's, but it is well suited to study how the Statues algorithm can be practically implemented. Incidentally, the names of its classes and methods match exactly the terminology used in the present paper.
As a short introduction to μLea, the Rain-Sprinkler-Grass Bayesian network seen in Sect. 2.5 is demonstrated below. Here are the statements for instantiating this BN in μLea:

from microlea import *

rain      = ElemPex.bool(0.20)
sprinkler = TablePex( rain,
                      { True : ElemPex.bool(0.01),
                        False: ElemPex.bool(0.40)} )
grass_wet = TablePex( TuplePex(sprinkler, rain),
                      { (False, False): False,
                        (False, True ): ElemPex.bool(0.80),
                        (True , False): ElemPex.bool(0.90),
                        (True , True ): ElemPex.bool(0.99)} )

Note that μLea makes automatic conversion of fixed values into elementary pexes when needed; this is why False is allowed in place of ElemPex.bool(0.0) in the first entry of grass_wet.
From these definitions, μLea allows making several queries, for which the marg subroutine is called implicitly. If the resulting pmf r is Boolean, the convenience function P(r) is useful to extract the probability of true. The method given builds a conditional pex from the Boolean pex passed in argument; operator overloading is used to build functional pexes for arithmetic and for the logical operators NOT (~), AND (&) and OR (|). Lines beginning with # -> display the returned objects.

sprinkler                        # -> {False: 0.6780, True: 0.3220}
P(sprinkler)                     # -> 0.3220
P(rain & sprinkler & grass_wet)  # -> 0.00198
P(grass_wet.given(rain))         # -> 0.8019
P(rain.given(grass_wet))         # -> 0.35768767563227616

For checking the consistency of these results, it is possible to retrieve the very last calculated probability thanks to the following expressions, which check respectively the definition of conditional probability and Bayes' theorem:

P(rain & grass_wet) / P(grass_wet)
# -> 0.35768767563227616

The Statues Algorithm

151

P(grass_wet.given(rain)) * P(rain) / P(grass_wet)
# -> 0.35768767563227616

Other relationships, including the axioms of probability and the chain rule, can be verified similarly in μLea. One may note that these relationships do not appear explicitly in the Statues algorithm; actually, they are emerging properties thereof. Functional pexes allow expressing more complex queries or evidences:

P(rain.given(grass_wet & ~sprinkler))     # -> 1.0
P(rain.given(~grass_wet | ~sprinkler))    # -> 0.27889355229430157
P((rain | sprinkler).given(~grass_wet))   # -> 0.12983575649903917
P((rain == sprinkler).given(~grass_wet))  # -> 0.87020050034444

It is easy to get the full joint probability distribution of the BN by using the tuple pex construct. This gives the probability of each atomic state of the three variables, taking their interdependence into account:

TuplePex(rain, sprinkler, grass_wet)
# -> {(False, False, False): 0.48000,
#     (False, True , False): 0.03200,
#     (False, True , True ): 0.28800,
#     (True , False, False): 0.03960,
#     (True , False, True ): 0.15840,
#     (True , True , False): 0.00002,
#     (True , True , True ): 0.00198}

One can notice that the case (False, False, True) is absent, since it is impossible according to the given grass_wet CPT. Using tuple pexes, it is possible to derive any joint probability distribution, whether full or partial, of any pex model (e.g. the factors calculated by the variable elimination algorithm [6,19]). This may provide useful clues to understand returned results, such as those given above.
To provide an example involving a numerical RV, the above use case can be extended by adding a hygrometer, showing a measure of the grass humidity as an integer from 0 to 4. Assuming that this device is imprecise, the measure is modeled as a CPT depending on the state of the grass:

measure = TablePex( grass_wet,
                    { True : ElemPex({2: 0.125, 3: 0.375, 4: 0.500}),
                      False: ElemPex({0: 0.500, 1: 0.375, 2: 0.125})})

Booleans, numerical values and comparison operators can then be freely mixed in the same query:

measure               # -> {0: 0.2758, 1: 0.2069, 2: 0.1250, 3: 0.1681, 4: 0.2242}
measure.given(~rain)  # -> {0: 0.3200, 1: 0.2400, 2: 0.1250, 3: 0.1350, 4: 0.1800}
P((measure <= 2).given(~rain))  # -> 0.685
P(~rain.given(measure <= 2))    # -> 0.9018089662521034
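The marginal pmf of measure shown above can be cross-checked with elementary arithmetic, combining the measure CPT with Pr(grass_wet) taken from the joint distribution listed earlier (plain Python; all variable names are ad hoc):

```python
p_wet = 0.28800 + 0.15840 + 0.00198        # Pr(grass_wet), from the joint pmf
p_not_wet = 1 - p_wet

cpt_wet = {2: 0.125, 3: 0.375, 4: 0.500}   # measure CPT, grass wet
cpt_dry = {0: 0.500, 1: 0.375, 2: 0.125}   # measure CPT, grass not wet

marginal = {m: cpt_wet.get(m, 0.0) * p_wet + cpt_dry.get(m, 0.0) * p_not_wet
            for m in range(5)}

# matches μLea's {0: 0.2758, 1: 0.2069, 2: 0.1250, 3: 0.1681, 4: 0.2242}
# up to rounding
expected = {0: 0.2758, 1: 0.2069, 2: 0.1250, 3: 0.1681, 4: 0.2242}
assert all(abs(marginal[m] - expected[m]) < 1e-4 for m in range(5))
```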


Finally, to elaborate functional pexes on this use case, a variable dry is defined hereafter as the negation of rain, while a new variable norm_measure converts measure into a normalized value ranging from −1.0 to +1.0. The queries made below are consistent with the previous results:

dry = ~rain
norm_measure = (measure - 2.0) / 2.0
norm_measure.given(dry)  # -> {-1.0: 0.3200, -0.5: 0.2400, 0.0: 0.1250, 0.5: 0.1350, 1.0: 0.1800}
P(dry.given(norm_measure <= 0.0))  # -> 0.9018089662521034

These examples demonstrate that μLea, supported by the Statues algorithm, lets the user express models and queries in a natural way, quite close to the underlying random variables. They also show that Bayesian reasoning and usual functions (such as inequalities, arithmetic and logical operators) can be combined, providing capabilities absent from classical probabilistic approaches.

6 Conclusions

The present paper has introduced a new probabilistic framework, namely the p-expressions, that allows modeling discrete random variables having a finite domain. In essence, this framework provides primitives to define graphical models capturing the dependencies between random variables, up to those having known prior probabilities. As sketched in the provided examples, this formalism appears to be rich enough to model probabilistic arithmetic, conditioning, discrete-time Markov chains and Bayesian networks. Then, a new inference algorithm has been presented, the Statues algorithm, which calculates the exact marginal probability distribution of any given p-expression. This algorithm relies on a special binding mechanism that uses recursive generators. Besides the validity of its results on several problems covered in the literature, a proof of the algorithm's correctness is available.

The Statues algorithm is successfully implemented in the Lea and MicroLea libraries, using the Python programming language. The usage of MicroLea has been demonstrated on a simple Bayesian network, with some non-standard variations mixing Boolean and numerical random variables. The merits and liabilities of the Statues algorithm have been briefly discussed, as well as possible extensions. The algorithm handles only discrete random variables and it does not overcome the computational limitations of exact probabilistic inference. However, one of its interests from the perspective of probabilistic programming resides in its ability to address a set of problems traditionally handled by different specialized probabilistic modeling approaches. On the question of time efficiency, the Statues algorithm appears to have several strengths for competing with other exact algorithms, notably its pruning and memoization features. As for the algorithm's inner machinery, the binding mechanism based on recursive generators has proven to be elegant and powerful for handling the dependencies between random variables.


Despite these promising results, further research is needed to assess the Statues algorithm's actual assets/liabilities among other probabilistic inference algorithms.

Acknowledgments. The author warmly thanks Nicky van Foreest for reviewing the early version of the present paper and for providing fruitful advice to improve it. The author is grateful to Frédéric and Marie-Astrid Buelens for their wise guidelines on writing scientific papers. The author thanks Gilles Scouvart, Nicky van Foreest, Zhibo Xiao, Noah Goodman, Rasmus Bonnevie, Paul Moore, Thomas Laroche and Guy Lalonde for their feedback, support, suggestions or contributions provided for the Lea library. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

References

1. Agrawal, M.K., Elmaghraby, S.E.: On computing the distribution function of the sum of independent random variables. Comput. Oper. Res. 28(5), 473–483 (2001)
2. Berleant, D., Goodman-Strauss, C.: Bounding the results of arithmetic operations on random variables of unknown dependency using intervals. Reliable Comput. 4(2), 147–165 (1998)
3. Berleant, D., Xie, L., Zhang, J.: Statool: a tool for Distribution Envelope Determination (DEnv), an interval-based algorithm for arithmetic on random variables. Reliable Comput. 9(2), 91–108 (2003)
4. Cooper, G.F.: The computational complexity of probabilistic inference using Bayesian belief networks. Artif. Intell. 42(2–3), 393–405 (1990)
5. De Raedt, L., Kimmig, A.: Probabilistic programming concepts. arXiv preprint arXiv:1312.4328 (2013)
6. Jordan, M.I.: Graphical models. Statist. Sci. 19(1), 140–155 (2004)
7. Denis, P.: Lea: discrete probability distributions in Python (2014). http://www.bitbucket.org/piedenis/lea
8. Denis, P.: MicroLea: probabilistic inference in Python (2017). http://www.bitbucket.org/piedenis/microlea
9. Denis, P.: Probabilistic inference using generators – the Statues algorithm, appendix C. arXiv preprint arXiv:1806.09997 (2018)
10. Evans, D.L., Leemis, L.M.: Algorithms for computing the distributions of sums of discrete random variables. Math. Comput. Modell. 40(13), 1429–1452 (2004)
11. Goodman, N., Mansinghka, V., Roy, D.M., Bonawitz, K., Tenenbaum, J.B.: Church: a language for generative models. In: Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence (2012)
12. Goodman, N., Stuhlmüller, A.: The design and implementation of probabilistic programming languages (2014). http://dippl.org
13. Knuth, D.E.: The Art of Computer Programming: Fundamental Algorithms, vol. 1, pp. 193–200, 3rd edn. Addison-Wesley, Boston (1997)
14. Pearl, J.: Reverend Bayes on inference engines: a distributed hierarchical approach, pp. 133–136. Cognitive Systems Laboratory, School of Engineering and Applied Science, University of California, Los Angeles (1982)
15. Pearl, J.: Fusion, propagation, and structuring in belief networks. Artif. Intell. 29(3), 241–288 (1986)


16. Pfeffer, A.: Practical Probabilistic Programming. Manning Publications Co., Greenwich (2016)
17. Poole, D., Zhang, N.L.: Exploiting contextual independence in probabilistic inference. J. Artif. Intell. Res. 18, 263–313 (2003)
18. Python Software Foundation (2001). http://www.python.org
19. Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach, 2nd edn. Prentice Hall, Upper Saddle River (2003)
20. Saba, S.: Coroutine-based combinatorial generation. Doctoral dissertation, University of Victoria (2014)
21. Schemenauer, N., Peters, T., Hetland, M.L.: PEP 255 - Simple Generators (2001). http://www.python.org/dev/peps/pep-0255/
22. SymPy Development Team: SymPy: Python library for symbolic mathematics (2016). http://www.sympy.org
23. Williamson, R.C.: Probabilistic arithmetic. Doctoral dissertation, University of Queensland (1989)
24. Williamson, R.C., Downs, T.: Probabilistic arithmetic. I. Numerical methods for calculating convolutions and dependency bounds. Int. J. Approximate Reasoning 4(2), 89–158 (1990)

A Q-Learning Based Maximum Power Point Tracking for PV Array Under Partial Shading Condition

Roy Chaoming Hsu1(B), Wen-Yen Chen2, and Yu-Pi Lin1

1 Electrical Engineering Department, National Chiayi University, Chiayi City 600, Taiwan
[email protected]
2 Computer Science and Information Engineering Department, National Chiayi University, Chiayi City 600, Taiwan

Abstract. Due to the rise of environmental awareness and the impact of the disaster at the Fukushima nuclear power plant, green energy has become an important development area. Solar power generation is pollution-free and noise-free, making it the most important direction for green energy development. However, when a portion of a PV array is shaded, the P-V characteristic curve exhibits multiple peaks, so that traditional maximum power point tracking (MPPT) methods have difficulty tracking successfully. In this paper, a maximum power tracking method based on reinforcement learning is proposed for a PV array under partial shading condition, and the state, the required action and the reward function of the reinforcement learning are designed to effectively track the maximum power point for a PV array in a partial shading environment.

Keywords: Reinforcement learning · PV array · Partial shading condition · MPPT

1 Introduction

In recent years, the demand for electric energy has been increasing, while awareness of environmental protection has risen and the safety of nuclear power generation has been questioned. Therefore, many alternative energy generation sources have been developed. To reduce the risk of nuclear power generation, the nuclear waste generated by nuclear power plants, and the air pollution caused by thermal power generation, alternative green energy generation has been promoted, including wind energy, hydropower, solar energy, etc. Among these alternative power generation sources, solar energy is the most popular and widely used. A photovoltaic array (PV array for short) is a photoelectric component [1] in solar power generation, which converts solar and radiant energy into electrical energy. Because the main function of a PV array is to convert solar energy into electrical energy, improving the photoelectric conversion efficiency of PV arrays has become an important issue in solar power research.

© Springer Nature Switzerland AG 2020
K. Arai et al. (Eds.): SAI 2020, AISC 1229, pp. 155–168, 2020. https://doi.org/10.1007/978-3-030-52246-9_11


Maximum power point tracking (MPPT) [2–5] is the technology used to track the maximum power point of a PV array under uniform insolation for solar power generation. The power vs. voltage characteristic curve (P-V curve for short) of a solar array under uniform insolation has a single maximum power point at a fixed illumination and temperature. When the illuminance of sunlight on the PV array or the temperature changes, the characteristic curve of the solar array also changes. Technically, if the load of a PV array can be adjusted to the point of highest power transmission on the P-V characteristic curve, the photoelectric conversion of the PV system will have the best efficiency. MPPT is a method of adjusting the load such that the maximum power point of the P-V characteristic curve can be tracked and maintained. However, PV systems are generally built by connecting PV modules in series and in parallel into a PV array; if the array is partially shaded, the P-V characteristic curve will have multiple local maxima, and traditional MPPT methods such as P&O have difficulty tracking because they are likely to become trapped in a local maximum.

A Two-Stage method [6] has been proposed to solve MPPT for a PV array under partial shading condition (PSC for short). To find the maximum power point of a PV array under PSC, the Two-Stage method divides the MPPT into two stages. The first stage uses the open-circuit voltage Voc and the short-circuit current Isc as a load line for the first step of the operating point voltage movement; the second stage then uses the incremental conductance method to move the operating point voltage. After searching for the maximum operating power point in the second stage, it is compared with the maximum power point found in the first stage. If the maximum power stored in the first stage is greater than the maximum power found in the second stage, the operating point voltage is moved back to the operating point voltage stored in the first stage; that is, the global maximum power point is found.

In this study, we apply the Q-Learning [7] method of reinforcement learning to track the power changes caused by shadow shading and to track the maximum power point, such that higher power generation efficiency can be achieved, and the results are compared with those of the Two-Stage [6] method. In the following, Sect. 2 gives the background of the proposed methodology. Section 3 shows the system architecture and simulation of this study. Section 4 exhibits the experimental results, and the conclusion comes in Sect. 5.
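To make the local-maximum trap concrete, a minimal perturb-and-observe (P&O) hill-climber can be sketched as follows. This is an illustrative sketch, not the paper's implementation or the Two-Stage method; the starting voltage, step size and the toy power curve in the usage note are assumptions:

```python
def perturb_and_observe(measure_power, v_start=10.0, dv=0.1, steps=200):
    """Minimal P&O: keep perturbing the operating voltage in the direction
    that increased power, reversing direction whenever power drops. On a
    multi-peak P-V curve under partial shading, this climber can stall at
    whichever local maximum is nearest to its starting point."""
    v = v_start
    p_prev = measure_power(v)
    direction = 1.0
    for _ in range(steps):
        v += direction * dv
        p = measure_power(v)
        if p < p_prev:          # power dropped: reverse the perturbation
            direction = -direction
        p_prev = p
    return v

# On a single-peak toy curve the climber settles around the peak (here 12 V):
v_mpp = perturb_and_observe(lambda v: -(v - 12.0) ** 2)
```

Started between two peaks of a shaded array's curve, the same loop locks onto the nearer, possibly lower, peak, which is the failure mode both the Two-Stage method and the proposed RLMPPT address.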

2 Reinforcement Learning and Partial Shading Condition of PV Array

Reinforcement learning [7–9] is an area of machine learning that emphasizes how to act based on the environment so as to maximize the expected benefit. In the reinforcement learning framework, an agent explores and learns in the environment, recognizes the current state according to the information given by the environment, executes an action after a specific decision-making process, and then receives a reward value from the environment. A common model of reinforcement learning is shown in Fig. 1. In Fig. 1, the agent observes the state of the unknown environment and takes action(s) through specific decisions; a reward associated with the state-action pair, exhibiting its positive or negative impact, is then determined and given by the environment. After continuous updates of action and state to accumulate the maximum reward value, a set of optimal (state, action) strategies can consequently be learned.

A Q-Learning Based Maximum Power Point Tracking for PV Array

157

Fig. 1. Model of reinforcement learning: the agent observes the state and reward given by the environment and takes an action on the environment.

The difference between reinforcement learning and supervised learning is that reinforcement learning does not require correct input/output pairs, nor does it require precise correction of suboptimal behavior. Reinforcement learning focuses on online planning, finding a balance between exploring known/unknown environments and exploiting existing knowledge. In the learning, the agent first explores the environment and tries a variety of different actions to obtain enough information. When certain knowledge about the environment has been acquired, the agent then exploits the state-action pairs with a higher chance of obtaining a higher reward value. In reinforcement learning, generally either the ε-greedy [10] or the softmax [11] method is employed for action selection.

2.1 Q-Learning

Q-Learning [7] is one reinforcement learning algorithm among various reinforcement learning methods. The Q(state, action) value (Q-value) in the Q-table is used to judge the pros and cons of the agent's actions in given states. The Q value is determined based on the agent's chosen action (A) at the specific state (S). The resulting Q value is obtained after the agent takes the action chosen by the specific exploration strategy, and, to find the best strategy, the Q values are constantly updated such that the agent, after the learning, will consequently perform the same action under a similar environment with the associated Q-value. The Q value is the result of learning after each exercise of an action at a certain state, and the final Q values each tend to reach the best value Q* through many rounds of learning. The Q* is obtained according to Eq. (1):

Q*(s) = max_a Q(s, a)    (1)

and each Q value in Q(S, A) is updated by Eq. (2):

Q(S_t, A_t) ← Q(S_t, A_t) + α [R_t + γ max_a Q(S_{t+1}, a) − Q(S_t, A_t)]    (2)

In (2), α is the learning coefficient with a value within [0, 1]; the value of α is normally decreased over time to obtain convergence of Q. R_t is the immediate reward the agent receives for every exercised action, exhibiting its positive or negative impact. max_a Q(S_{t+1}, a) is the experience the agent has learned so far, indicating the accumulated reward starting from the next state S_{t+1} onward. γ is the discount factor, decreasing the influence of future rewards.
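The ε-greedy selection and the update of Eq. (2) can be sketched in tabular form as below. The skeleton is standard tabular Q-learning; the number of actions, the parameter values and the example transition are illustrative assumptions, not the paper's tuned settings:

```python
import random
from collections import defaultdict

alpha, gamma, epsilon = 0.5, 0.9, 0.1   # illustrative values for Eq. (2)
n_actions = 2
Q = defaultdict(float)                  # Q[(state, action)], initialized to 0

def select_action(state):
    """ε-greedy: explore with probability ε, otherwise pick the max-Q action."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[(state, a)])

def q_update(s, a, r, s_next):
    """One application of Eq. (2)."""
    best_next = max(Q[(s_next, a2)] for a2 in range(n_actions))
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

q_update(0, 1, 50.0, 2)   # a reward of 50 moves Q[(0, 1)] from 0 to 25
```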

158

R. C. Hsu et al.

2.2 The Shading Effect of PV Array

To facilitate the discussion of the MPP of the PV, the equivalent circuit of the PV is analyzed. The equivalent circuit of the ideal solar cell is composed of a current source in parallel with a diode, together with a shunt resistance and a series resistance, where the magnitude of the current source is proportional to the sunlight illuminance, as shown in Fig. 2.

Fig. 2. The equivalent circuit of a solar cell and the load.

Ideally, the current through the shunt resistance Rp of the solar cell is very small and can be ignored. The output current I of the solar cell can hence be expressed as Eq. (3):

I = I_L − I_D = I_L − I_OS [exp(q(V + IR)/(A k_B T)) − 1] ≈ I_L − I_OS [exp(qV/(A k_B T)) − 1]    (3)

In (3), the parameters are:

I: output current (A)
V: output voltage (V)
I_L: light-generated current (A)
I_OS: the dark saturation current (A)
k_B: Boltzmann constant
q: the electron charge
A: the non-ideality factor
R: the series resistance (Ω)
T: the temperature
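Equation (3) can be evaluated numerically as below. The physical constants are standard, but the values of I_L, I_OS, A and T are illustrative assumptions for demonstration, not measured values from the paper:

```python
import math

q   = 1.602e-19   # electron charge (C)
k_B = 1.381e-23   # Boltzmann constant (J/K)

def cell_current(V, I_L=3.8, I_OS=1e-9, A=1.3, T=298.0):
    """Output current of Eq. (3), with the series-resistance term neglected."""
    return I_L - I_OS * (math.exp(q * V / (A * k_B * T)) - 1.0)

def cell_power(V, **kw):
    return V * cell_current(V, **kw)

# Sweeping the voltage and taking the argmax of P = V * I locates the single
# MPP of this uniform-insolation curve:
v_mpp = max((v / 100.0 for v in range(0, 80)), key=cell_power)
```

At low voltage the diode term is negligible and I ≈ I_L; near the open-circuit voltage the exponential term dominates, which is what shapes the knee of the I-V curve and hence the MPP.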

2.3 The PV Array Under Partial Shading Condition

A PV array placed outdoors is often shaded by uncertainties such as clouds, leaves, or dust. When the PV array is partially shaded, the photoelectric conversion efficiency of the PV array may be affected. The following example shows the case where two solar cells are connected in series and one of them is shaded. The influence on the I-V and P-V characteristic curves is shown in Fig. 3(a) and (b), respectively.

A Q-Learning Based Maximum Power Point Tracking for PV Array

159

Fig. 3. Two solar cells connected in series, one of which is subjected to shading

When two solar cells are connected in parallel and one of them is shaded, the influence on the I-V and P-V characteristic curves is shown in Fig. 4(a) and (b), respectively.

Fig. 4. Two solar cells connected in parallel, one of which is subjected to shading

The shadow generated by external shading of the PV array causes multiple local maxima in the P-V characteristic curve. Taking a 2 × 2 PV array as an example, the P-V characteristic curve is illustrated in Fig. 5. The shaded solar cells generate less current than the other, unshaded solar cells and act as a resistor in the circuit, such that a so-called hot spot [12] can occur.

Fig. 5. The P-V characteristic curve under partial shading condition.

160

R. C. Hsu et al.

2.4 Calculating the Number of Partial Shading Patterns

To calculate the number of partial shading patterns, it is assumed that an M × N PV array consists of strings of N series-connected modules, with M such strings in parallel [14]. An example of a 3 × 4 PV array, with 2 and 1 shaded PV modules in the first and second parallel strings, respectively, is shown in Fig. 6.

Fig. 6. An example of 3 × 4 PV array.

First, consider the case of 1 shaded PV module in any of the parallel-connected strings, denoted i = 1. Extending in the parallel direction, there are two cases, i.e., (i = 1, j = 1) and (i = 1, j = 2). It is then confirmed by enumeration that there are two cases when i = 1, as shown in Fig. 7.

Fig. 7. All the shading patterns when i = 1: the first parallel array with 1 module is shaded.

Next, consider the shading pattern for i = 2, i.e., 2 shaded modules in a parallel connection. When i = 2, there are two kinds of shading cases when extending to j = 2, as Fig. 8 shows, where the shaded numbers i and j are regarded as an i × j array; fixing i in the series direction, the shading patterns can be counted over the j shaded modules in parallel. The shading patterns of a 2 × 2 solar array are shown in Fig. 7 for the case of i = 1, and in Fig. 8 for the case of i = 2 and j = 2. When the size of the PV array is known to be M × N, the partial shading patterns can be calculated from i and j, such that a table for calculating the shading patterns of an M × N PV array can be obtained, as in Fig. 9. For example, if M = 2 and N = 2, i.e., a 2 × 2 array, one can find from the table that the number of shading patterns is 6.

A Q-Learning Based Maximum Power Point Tracking for PV Array

161

Fig. 8. The shading pattern when i = 2 and j = 2

Fig. 9. Table for calculating shading pattern of M × N PV array.

It can be seen from the table that, for a 2 × 2 PV array, when the series shading is i = 1 there are two parallel cases, j = 1 and j = 2, and when the series shading is i = 2 there are three cases over j = 1 and j = 2, so there are five kinds of shading patterns for the 2 × 2 PV array. However, this count does not include the non-shaded situation, which must be added. Hence, there are six shading patterns for the 2 × 2 PV array.
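The count of six for the 2 × 2 array can be cross-checked combinatorially. Assuming (our reading of the table) that a pattern is characterized by the number of shaded modules in each parallel string, with string order irrelevant, the patterns of an M × N array are the multisets of M counts drawn from {0, …, N}:

```python
from itertools import combinations_with_replacement

def n_shading_patterns(m_strings, n_series):
    """Number of multisets of m per-string shading counts in {0, ..., n}."""
    return sum(1 for _ in combinations_with_replacement(range(n_series + 1),
                                                        m_strings))

print(n_shading_patterns(2, 2))  # -> 6, matching the count stated above
```

For the 2 × 2 case the six multisets are (0,0), (0,1), (0,2), (1,1), (1,2) and (2,2), i.e., the five shaded patterns plus the non-shaded one.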

3 System Architecture and Simulation

3.1 Bullseye Reward Function of RLMPPT for PV Array with PSC

To simulate the bullseye reward function of the RLMPPT for a PV array with PSC, a 3 × 2 PV array, as shown in Fig. 10, is taken as an example, where in Fig. 10(a), (b) and (c), respectively, none, three, and two of the modules are shaded. For each kind of partial shading condition, the pre-trained bullseye reward function of the RLMPPT is different. The bullseye reward functions for the shading patterns of Fig. 10(a), (b), and (c) are shown in Fig. 11(a), 11(b), and 11(c), respectively. The simulation flowchart of the proposed RLMPPT [13] for a PV array with PSC is shown in Fig. 12.

162

R. C. Hsu et al.

Fig. 10. Three different kinds of partial shading condition for the 3 × 2 PV array.

Fig. 11(a). Bullseye reward function without shading

Fig. 11(b). Bullseye reward function with 3 PV modules under shading.


Fig. 11(c). Bullseye reward function with 2 PV modules under shading.

Fig. 12. The simulation flowchart of the RLMPPT for PV array with PSC. The steps are: start; initialize the learning parameters and calculate the shading patterns; detect the current and voltage, respectively, of the in-series and in-parallel PV array; decide the shading type and select its bullseye reward function; execute the RLMPPT; detect the current and voltage again; if the shading type has changed, return to the shading-type decision step, otherwise continue executing the RLMPPT.


3.2 The Algorithm of RLMPPT for PV Array with PSC

The algorithm of the RLMPPT for a PV array with partial shading condition is shown in Fig. 13.

1. Read the value of temperature and illuminance.
2. Read the partial shading pattern of the PV array.
3. Initialize all Q(s, a) ← 0
4. While in the learning loop:
5.     select an action by ε-greedy and observe the working voltage V(i) and power P(i)
6.     determine the reward from the bullseye reward function
7.     if P(i) > Max P then
8.         Max P ← P(i)
9.         Max V ← V(i)
10.    end
11.    set i = i + 1, for next execution time
12.    calculate reward r' and observe s'
13.    update Q value by Eq. (2)
14.    s ← s'
15. end while loop

Fig. 13. The algorithm of the proposed RLMPPT for PV array with PSC

Line 1 reads in the values of the current temperature and illuminance for the PV array simulation. Line 2 reads the sensed data to detect the shading pattern of the PV array and searches memory to obtain the bullseye reward function for the detected shading pattern. In the reinforcement learning loop, which starts at line 4, the agent explores the current environment and selects an action by ε-greedy, then judges whether the current pair of working voltage and obtained power, i.e., (V, P), falls into the bullseye of the reward function. If it is inside the bullseye, a reward value of 50 is given; if it is outside the bullseye, a reward value of 0 is given. Lines 8–10 concern the conditions of ΔP and ΔV after the agent's action: if ΔP > 0 and ΔV > 0, i.e. (+, +), the state value is 0; if ΔP > 0 and ΔV < 0, i.e. (+, −), the state value is 1; and if ΔP < 0 and ΔV > 0, i.e. (−, +), the state value is 3. After the action, reward and state have been updated, all the Q values can be updated and the predicted working voltage is sent out, reaching the end of this single execution.
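The state encoding and the bullseye reward described above can be sketched as follows. The 50/0 reward values and the three listed states come from the text; the circular in/out test, its radius, and the state value 2 for the remaining sign combination are our assumptions:

```python
import math

def state_of(dP, dV):
    """State from the signs of the power and voltage changes, per the text."""
    if dP > 0 and dV > 0:
        return 0      # (+, +)
    if dP > 0 and dV < 0:
        return 1      # (+, -)
    if dP < 0 and dV > 0:
        return 3      # (-, +)
    return 2          # assumed for the remaining case; not stated in the text

def bullseye_reward(V, P, V_mpp, P_mpp, radius=1.0):
    """Reward 50 if (V, P) lands inside the bullseye around the expected MPP,
    0 otherwise. The circular region and its radius are assumptions."""
    inside = math.hypot(V - V_mpp, P - P_mpp) <= radius
    return 50 if inside else 0
```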

4 Experimental Results

In the experiment and simulation, the PV module used is the MSX-60, whose specification is shown in Table 1. In the pre-training to obtain the reinforcement learning's bullseye reward function, 5 days of real climate data recorded at Loyola Marymount University are used, with the average values given in Table 2. To show the advantage of the proposed method, our results are compared with those of the Two-Stage method. It can be seen from Fig. 14 that for the un-shaded PV array of Fig. 10(a), our method reaches convergence starting from about 45 s.

A Q-Learning Based Maximum Power Point Tracking for PV Array

165

Table 1. The PV module used is the MSX-60; its specification is as follows.

Parameter | Value
Pmax      | 60 W
Voc       | 21.1 V
Isc       | 3.8 A
VMPP      | 17.1 V
IMPP      | 3.5 A

Table 2. Averaged climate values over 5 days.

Parameter                          | Value
Temperature                        | 2015/08/01–08/05, AM 10:00–PM 2:00
Averaged temperature               | 25.025 °C
Temperature variation              | 0.4387 °C
Average illumination               | 785.9202 W/m2
Standard deviation of illumination | 86.14 W/m2

Fig. 14. The distribution of maximum power difference vs. time of the proposed method.

To simulate the shading pattern changing from Fig. 10(b) to Fig. 10(c), the distributions of maximum power difference vs. time are shown in Fig. 15 and Fig. 16, respectively. It can be seen from Fig. 15 that convergence takes about 50 s, while in Fig. 16, because the shading pattern changes from 3 shaded PV modules to 2 shaded PV modules, it takes a longer time to reach convergence. Under the same climate and PV array specification, the results of the Two-Stage method for the un-shaded PV array, for 3 shaded PV modules, and for the change from 3 shaded PV modules to 2 shaded PV modules are shown in Figs. 17, 18, and 19, respectively. In Figs. 17, 18, and 19 the Two-Stage method converges at about 38 s, 20 s, and 170 s, respectively, for the un-shaded PV array, for the 3 shaded PV modules, and for the change from 3 shaded PV modules to 2 shaded PV modules, which is comparable with our proposed RLMPPT for PV array with PSC.


Fig. 15. The distribution of maximum power difference vs. time for Fig. 10(b) of the proposed method

Fig. 16. The distribution of maximum power difference vs. time for the change from 3 shaded PV modules to 2 shaded PV modules, for the proposed method

Fig. 17. The distribution of maximum power difference vs. time of the Two-stage method.

Compared with the Two-Stage method, our proposed RLMPPT converges a little more slowly, yet it achieves better accuracy of the maximum power, as can be seen from Fig. 20 and Fig. 21. In Fig. 20, the maximum power difference is below 1 W, while that of the Two-Stage method is about 6–7 W.


Fig. 18. The distribution of maximum power difference vs. time for Fig. 10(b) of the Two-stage method.

Fig. 19. The distribution of maximum power difference vs. time for the change from 3 shaded PV modules to 2 shaded PV modules, for the Two-stage method.

Fig. 20. The distribution of maximum power difference of the proposed method.

Fig. 21. The distribution of maximum power difference of the Two-stage method.


5 Conclusions

In this study, a 3 × 2 PV array was used to verify the proposed RLMPPT for a PV array with partial shading condition. With actual environmental parameters, the proposed method can converge to the maximum power point in a short time, even under changes from one shading pattern to another. Compared to the existing Two-Stage method, the proposed method reaches the maximum power point with better accuracy, which indicates that our method can also find the actual global maximum better than the Two-Stage method. In reality, an actual PV array may be larger than the experimental one, but as long as the proposed method can calculate and detect the real shading pattern as described in this paper, global maximum power point tracking can be achieved even under variable conditions, achieving more accurate and efficient power generation.

References

1. Singh, P.O.: Modeling of photovoltaic arrays under shading patterns with reconfigurable switching and bypass diodes. The University of Toledo Digital Repository, Paper 723 (2011)
2. Bahgata, A.B.G., Helwab, N.H., Ahmad, G.E., Shenawy, E.T.E.: Estimation of the maximum power and normal operating power of a photovoltaic module by neural networks. J. Renew. Energy 29(3), 443–457 (2004)
3. Femia, N., Petrone, G., Spagnuolo, G., Vitelli, M.: Optimization of perturb and observe maximum power point tracking method. IEEE Trans. Power Electron. 20(4), 963–973 (2005)
4. Bahgata, A.B.G., Helwab, N.H., Ahmad, G.E., Shenawy, E.T.E.: Maximum power point tracking controller for PV systems using neural networks. J. Renew. Energy 30(8), 1257–1268 (2005)
5. Esram, T., Chapman, P.L.: Comparison of photovoltaic array maximum power point tracking techniques. IEEE Trans. Energy Convers. 22(2), 439–449 (2007)
6. Kobayashi, K., Takano, I., Sawada, Y.: A study of a two stage maximum power point tracking control of a photovoltaic system under partially shaded insolation conditions. Sol. Energy Mater. Sol. Cells 90(18–19), 2975–2988 (2006)
7. Kaelbling, L.P., Littman, M.L., Moore, A.W.: Reinforcement learning: a survey. J. Artif. Intell. Res. 4, 237–285 (1996)
8. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
9. Zhan, Z., Wang, Q., Chen, X.: Reinforcement learning model, algorithms and its application (2011)
10. Rodrigues Gomes, E., Kowalczyk, R.: Dynamic analysis of multiagent Q-learning with ε-greedy exploration. In: ICML 2009, Proceedings of the 26th Annual International Conference on Machine Learning (2009)
11. Hinton, G.E., Salakhutdinov, R.R.: Replicated softmax: an undirected topic model (2009)
12. Herrmann, W., Wiesner, W., Vaassen, W.: Hot spot investigations on PV modules - new concepts for test standard and consequences for module design with respect to bypass diodes. In: IEEE PV Specialists Conference, pp. 1129–1132 (1997)
13. Hsieh, H.-I., Wang, H., Liu, C.-T., Chen, W.-Y., Hsu, R.C.: A reinforcement learning based maximum power point tracking method for photovoltaic array. Int. J. Photoenergy 2015 (2015)
14. Walker, G.: Evaluating MPPT converter topologies using a MATLAB PV model. J. Electr. Electron. Eng. 21, 49–55 (2001)

A Logic-Based Agent Modelling Paradigm for Investment in Derivatives Markets

Jonathan Waller(B) and Tarun Goel

Endfield Derivatives, LLC, Wilmington, DE 19808, USA
[email protected], [email protected]

Abstract. This paper describes multiagent systems, going into the details of the agents themselves and how they can be configured for financial gain in the stock market. The interaction of multiagent systems with one another and their behaviour within an environment, specifically the environment in which they reside, is detailed, following their growth from a single, relatively simple model to a large network of models constantly interacting with different agents. The stock market is used as the setting to show that multiagent systems can be more efficient than other conventional models, such as machine learning models. The common problems that can arise from bad agents are described, introducing a real-world system scenario to better explain the concept. Finally, the model itself is presented, with an insight into how it works and some of the high-level profitable returns made by the model over the course of its deployed timeframe. The future of the design is then discussed.

Keywords: Multiagent Systems · Quantitative finance · Optimization · Investment decisions

1 Introduction

1.1 Features

Multiagent systems (MASs) are generally described by their individual agents, the interactions among those agents, and the environments in which they reside. Agents can be described in many ways, but for this paper they are considered as separate nodes participating in an activity and reporting the results either to a central user or to each other, with some sort of moderator or overseer analysing the decisions made. The agents themselves are described by such attributes as uniformity, autonomy, goals, abilities, flexibility, etc., which in turn help to describe the overall system in which they serve a purpose. Uniformity asks whether the agents are homogeneous or heterogeneous, i.e., whether the agents are the same or have different goals and abilities. These agents have goals, rather than following preset procedures fixed at the creation of the agents; and rather than locking in explicit instructions for how to achieve their goals, optimal methods are learned by the agents, given certain constraints set by the designer only at the initialization of the system. The agents constantly learn from actions and interactions with one another as time passes.

© Springer Nature Switzerland AG 2020
K. Arai et al. (Eds.): SAI 2020, AISC 1229, pp. 169–180, 2020. https://doi.org/10.1007/978-3-030-52246-9_12
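The node/moderator arrangement described above can be sketched as follows. The class names, the two toy heterogeneous strategies, and the majority-vote aggregation are illustrative assumptions for demonstration, not the authors' trading model:

```python
class Agent:
    """A node participating in the activity and reporting its result."""
    def __init__(self, name, strategy):
        self.name, self.strategy = name, strategy

    def act(self, percept):
        # Each agent observes a percept (here, a price series) and reports
        # a vote: +1 (buy), -1 (sell) or 0 (hold).
        return self.strategy(percept)

def momentum(prices):      # heterogeneous goal 1: follow the trend
    return 1 if prices[-1] > prices[0] else -1

def contrarian(prices):    # heterogeneous goal 2: bet against the trend
    return -momentum(prices)

class Moderator:
    """Central overseer analysing the decisions reported by the agents."""
    def decide(self, agents, prices):
        total = sum(a.act(prices) for a in agents)
        return "buy" if total > 0 else "sell" if total < 0 else "hold"

agents = [Agent("m1", momentum), Agent("m2", momentum), Agent("c1", contrarian)]
decision = Moderator().decide(agents, [100.0, 101.5])  # -> "buy"
```

In a real MAS the strategies would be learned rather than hard-coded, and the moderator's aggregation rule would itself be subject to the designer's constraints.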

170

J. Waller and T. Goel

1.2 Interactions

MASs achieve these goals through agents interacting with other agents via sensors, cognition and the like to obtain information (percepts) about their environment. The ways in which these agents socialize within a MAS can range from simple signal exchange to more knowledge-intensive, semantically rich language exchange.

1.3 Environment

It is important to know what type of environment the agents will be interacting with. Of primary interest are accessibility, predictability, periodicity, dynamics and the number of states in which the environment can exist. An accessible environment is one in which the agent can obtain complete, accurate, up-to-date information about the environment's state [1]. The stock market would be considered inaccessible, as one cannot know all the information about that environment, e.g., earnings reports before they happen, what every investor is thinking, or national security concerns that will affect the market. This leads to the static versus dynamic question. A static environment is one that remains unchanged except for the performance of actions by the agent [1]. A dynamic environment, by contrast, is one affected by external processes, and there are outside actors beyond those in the stock market that affect our environment, making it dynamic in nature. A recent paper in the Federal Reserve Bank of St. Louis Review argues for a mathematical fear-gauge model of forward guidance, modelling connections from various announcements by members of the Federal Reserve Board to subsequent movements in a basket of securities [2]. Even if this model is sound, it is believed to be already priced into the market. This is a game-theoretic belief that whatever actions or inactions are made by external actors will influence the environment, i.e., securities prices. This is why the model focuses upon prices themselves rather than upon the actions that move prices.
Each price point of a security represents a potential state of the environment. Securities prices and asset returns may appear discrete, as prices move by discrete ticks; however, there is an incalculable number of possible price movements, making the environment effectively continuous. In a chess game, by contrast, the environment is discrete, as there is a finite number of possible moves. Furthermore, the environment is non-episodic: there is an intertemporal relation between present and future episodes. People do care about past performance in the market, and current sentiment will affect future performance somewhat. Linking this to the stock market, only upon carefully analysing real-time actions can one achieve a desired end goal, profiting from stock market trades or, analogously, winning a game of chess. Financial markets are of the general environment class of inaccessible, non-deterministic, dynamic, continuous and non-episodic, making them the most complex to model [2].

1.4 Objectives

The hierarchy above suggests that investment strategies are not completely random but can in principle be proven deterministic. When considering a risk-based scale and

A Logic-Based Agent Modelling Paradigm for Investment in Derivatives Markets

171

training a MAS model to lower risk and maximise profit, a valid investment-decision opportunity can be created. The MAS will use a combination of agents, which will make all possible investment decisions for a shareholder and report the potential successes and failures back to a central mediator. This in turn leads to an incremental strategy in which, upon vigorous training regimes, a least-risk solution is highlighted, leading to profit maximisation. The training itself is also an important topic to consider, as the processing power required for such a system demands unique solutions given limited hardware performance. The reasons why MASs can be favoured over other conventional models will also be highlighted. Section 2 looks at a single agent and how it develops into a system of multiple agents, with an overview of the introduction of disturbances to the agents and a comparison of this design against other common artificial-intelligence-based approaches. Section 3 revisits the concept of disturbances in the system and their effects. Section 4 describes the created model, while Sect. 5 looks at the results from the model.
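The environment classification laid out in Sect. 1.3 can be captured in a small sketch. The record type and the difficulty tally below are illustrative assumptions, not the paper's code; the stock-market instantiation follows the paper's own classification, and the chess instantiation is a common textbook contrast.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Environment:
    accessible: bool      # complete, up-to-date state obtainable?
    deterministic: bool   # does an action have a single guaranteed effect?
    episodic: bool        # are episodes independent of one another?
    static: bool          # unchanged except by the agent's own actions?
    discrete: bool        # countable, finite set of states and actions?

    def hard_properties(self) -> int:
        """Tally the properties that make an environment hard to model."""
        return sum(not p for p in (self.accessible, self.deterministic,
                                   self.episodic, self.static, self.discrete))

# The paper's classification of financial markets: inaccessible,
# non-deterministic, dynamic, continuous and non-episodic.
stock_market = Environment(accessible=False, deterministic=False,
                           episodic=False, static=False, discrete=False)
# A common textbook contrast: chess is discrete, though still non-episodic.
chess = Environment(accessible=True, deterministic=True,
                    episodic=False, static=True, discrete=True)
```

On this tally, financial markets score worst on all five properties, which is the sense in which they are "the most complex to model".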

2 Background

2.1 Intelligent Agents

Intelligent agents exhibit goal-directed behavior. The system has two goals in mind: profit maximisation and risk minimisation. Intelligent agents socialise, i.e. cooperate and/or compete, otherwise known as satisfying sociability. There is also the desire to seek as many trades as possible (see Fig. 1). This creates a signal-to-noise problem: taking more trades may increase returns, but it will also increase risk, which may ultimately decrease returns. Various metrics can be used for determining a marginal rate of transformation between expected returns and risk to maximize utility (profit). There is also competition about when to place trades and whether each is a call or a put. The agent can perceive (stock prices) and can react in a timely manner. This means that at some point it must decide whether the goal is still feasible. A hard stop is programmed if the objective is not reached by a certain date. The agents may also decide to stop trying earlier, should they reach their objective or should the objective become untenable (see Fig. 1 and Fig. 2).

2.2 Amplification and Disturbances

One way to further understand the risk scenario is by looking at a common example. Heavy traffic is a major problem in many cities, and many solutions have been put in place to maximise traffic flow, such as infrared sensors to detect vehicle presence and inductive coils embedded in the roads at junctions, whose states feed algorithms that choose which lights to make red and which green for maximum efficiency [4]. However, when people break the rules in this setting, it can compromise the algorithm's integrity; for example, a car runs a red light but there is not enough space for it to safely pass the junction, so it is stuck in the middle of the junction, blocking traffic from other lanes.
This same concept can be applied to an agent-based model to combat the negative impacts that one bad action can have on


Fig. 1. Single agent showing its continuous interactions with the surrounding system and how these affect its output signal, or logic. The figure shows an interaction event and three behavioral states through time: Behavior 1 (null/quit), Behavior 2 (buy/enter), Behavior 3 (sell/exit). An orthogonal edge-routing algorithm is used because it limits edge crossings and edge length for a small number of nodes [3].

an overall system. The vehicle-based traffic example also translates to the stock market workflow and to how the structure of the investment market is ordered. A break in the flow of the organisations can propagate, sometimes exponentially, and have rapid positive or negative consequences.

2.3 Why Multiagent Systems

With the massive advances in computing power, due to decreases in cost and increases in scalability from cloud computing, quantitative finance has blossomed. The field of machine learning (ML) offered promise in discovering and creating financial models once considered too computationally expensive [5]. However, the model presented herein attempts to explain how and why MASs should be used above other techniques such as ML.


Fig. 2. The development of a single agent into a multi-agent network, creating a network of interactions that converge onto a consensus, or else cluster into multiple competing consensuses. The figure shows multiple agents, interaction events and three behavioral states through time: Behavior 1 (null/quit), Behavior 2 (buy/enter), Behavior 3 (sell/exit). Clusters can be seen representing interaction events. Edges represent travelling from behavioral state to state.
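The three behavioural states shown in Figs. 1 and 2 can be sketched as a single decision rule built from the stop conditions of Sect. 2.1. The function name, thresholds and dates below are hypothetical, not the authors' implementation.

```python
from datetime import date

def agent_step(equity: float, target: float, floor: float,
               today: date, deadline: date) -> str:
    """Map the agent's situation onto the three behavioural states of Fig. 1."""
    if equity >= target:
        return "sell/exit"     # objective reached, possibly early
    if today >= deadline or equity <= floor:
        return "null/quit"     # hard stop: deadline passed or goal untenable
    return "buy/enter"         # goal still feasible: keep seeking trades

# Still short of the target, before the deadline, above the floor:
state = agent_step(105.0, target=110.0, floor=90.0,
                   today=date(2019, 6, 1), deadline=date(2019, 12, 31))
```

The hard stop on the deadline, and the early exit on reaching the objective, are exactly the feasibility checks described in Sect. 2.1.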

ML generally comes in two flavours: supervised and unsupervised [5]. Supervised learning involves a great deal more human intervention than MASs, as it requires the architect to provide the inputs (percepts) and the desired outputs. For example, a vending machine may be required to determine the various denominations of coins being inserted. Upon being provided with features of said coins, the supervised ML algorithm will then attempt to classify them; here, the algorithm is given preset classes to fit the data into. In the unsupervised arena, an ML system would determine the classes by clustering data from a large dataset, at which time the architect must name, or attempt to name, the clusters discovered by the ML system. However complex the ML system becomes, (i) it will only prescribe one optimal decision at a time, and (ii) it will suffer from being monolithic. That is, it behaves like a single-agent system. The algorithm does not compete or cooperate with itself, and all processes are centralized. Even if there are multiple inputs, sensors, actuators or robots, a single agent parses the inputs and decides its action [6]. Monolithic System. Monolithic systems have monolithic architectures; that is, there is a single overarching control of all internal processes [7].


Fig. 3. Part of the network in Fig. 2, with interaction events and three behavioral states through time: Behavior 1 (null/quit), Behavior 2 (buy/enter), Behavior 3 (sell/exit). Clusters can be seen representing interaction events. Edges represent travelling from behavioral state to state. The blue box in Fig. 2 is magnified and represented here in Fig. 3.

Certain problems would prove intractable in monolithic systems. The monolithic architecture suffers from low fault tolerance, as its components are tied closely together, giving the system limited flexibility. This type of structure means that the architect must understand and code for how all the constituent parts of the system will fit together [7]. On the other hand, a subsumption agent architecture allows for multiple actions to be prescribed at any one state in which the environment may exist [6]. The MAS may suggest entering 100 securities at once, or just one. The investment advice for Apple doesn't affect the investment advice for Microsoft. It also allows for paraconsistent, or contradictory, prescriptive actions. In an environment describing the stock market, this may lead to buying calls and puts on the same security, or even to buy and don't-buy advice contemporaneously. One might then consider that one of these actions would have to prove ill-advised to avoid contradiction. However, such actions, like straddles, could be conducted in a securities market and


could be considered contradictory, as the agent would be both betting for and against the same security. Straddle. A straddle order is an order to purchase calls and puts in the same quantity, having the same expiration and strike price, on the same security at the same time [8].
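The payoff of the straddle just defined can be sketched in a few lines. The premium figures below are hypothetical, and commissions are ignored.

```python
def straddle_payoff(spot: float, strike: float,
                    call_premium: float, put_premium: float) -> float:
    """Payoff at expiry of one long call plus one long put, same strike."""
    call = max(spot - strike, 0.0)
    put = max(strike - spot, 0.0)
    return call + put - (call_premium + put_premium)

# The position profits from a large move in either direction and loses
# only the premiums if the price stays put, hence betting both for and
# against the same security without strict contradiction.
assert straddle_payoff(120.0, 100.0, 5.0, 5.0) == 10.0   # big move up
assert straddle_payoff(80.0, 100.0, 5.0, 5.0) == 10.0    # big move down
assert straddle_payoff(100.0, 100.0, 5.0, 5.0) == -10.0  # no move
```

This is why simultaneous buy and don't-buy advice from different agents need not be ill-advised: combined, the two legs form a coherent position.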

2.4 Overtraining

Lastly, despite any ML algorithm's ability to learn and evolve, at any state space within said model it is effectively static. A top-down or a bottom-up approach would suffer from a division fallacy or a composition fallacy, respectively. A top-down approach would in essence use macroeconomic indicators to model markets, fallaciously assuming that component parts behave the same as the macroeconomic whole. Conversely, if a model were trained with data from a single sector or security, say Apple (AAPL), it would falsely try to model the whole based upon movements seen in AAPL. The problem remains that a monolithic system is generated trying to describe both the whole and its constituents. We have seen how this paradigm falls apart in physics: relativity describes the macroscopic universe fairly well but fails disastrously in the quantum world, and the converse is true for quantum mechanics. The argument being made here is not for the obsolescence of machine learning; rather, an agent-based model running multiple independent, heterogeneous ML algorithms contemporaneously, one against each component part of the whole, is needed. In this way, no ML system will over-train itself by fixing what is not broken.

3 Literature Review

3.1 String Stability

In the context of autonomous vehicles, string stability refers to the maintenance of a constant distance between a series of vehicles moving in the same direction behind one another [9]. Sometimes a disturbance occurs which disrupts this distance, making each vehicle alter its spacing somewhat, so that the noise is amplified through the set of vehicles (see Fig. 4). This problem is typically present when trying to model a controller that controls the acceleration of an object, due to the following derivation:

$\dot{P} = v$  (1)

$\dot{v} = G_c$  (2)

$\ddot{P} = G_c$  (3)

$H(s) = \frac{1}{s^2}$  (4)


Fig. 4. Simulation of variations in position between cars travelling behind one another, with a disturbance introduced to Car 1; the other cars alter their positions to compensate, amplifying the disturbance and producing inconsistencies.

Here P is the position, v is the velocity, G_c is the controller (acceleration, in this case), and H(s) is the transfer function of the model. Modelling a controller on acceleration thus yields two poles at the origin, which tend to yield instability. This sort of problem can be related to the compliance levels of an individual: the past actions witnessed affect future outcomes, hence the cost at each junction may need to scale based on the results obtained at the previous junction, making it necessary to manage the entire network appropriately [10].

3.2 Flash Crashes, Bad Actors and Market Perturbations

Per our string-stability theory of markets, one bad behavior begets many more of its kind, descending systems into chaos. In traffic flux dynamics, a Cambridge study found that autonomous driving vehicles could reduce traffic congestion by up to 35% over egocentric driving. Researchers programmed a small fleet of cars and recorded changes to traffic flow when one random car stopped [11]. Bad agents with poor decisions can generally be avoided in a MAS, since no decision is made until all agents have reported back a result that is found to be least-risk. However, the introduction of rogue traders into the system could be a concern. They could have the negative effect of training the MAS model to follow their decision pattern, effectively yielding a solution that is non-optimal. A perfect example of this is when the rogue trader does not obviously introduce extra risk, but introduces more costs, which has the unrecognized effect of greater risk.
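The double pole at the origin behind Eq. (4), and hence the system's sensitivity to sustained disturbances, can be illustrated by Euler-integrating Eqs. (1)–(2). This is a toy simulation under assumed step sizes, not the paper's controller.

```python
def double_integrator(gc: float, steps: int, dt: float = 0.01):
    """Euler-integrate dP/dt = v, dv/dt = G_c, starting from rest."""
    p, v = 0.0, 0.0
    for _ in range(steps):
        p += v * dt   # Eq. (1)
        v += gc * dt  # Eq. (2)
    return p, v

# A constant controller input of 2 held for 10 s: velocity grows linearly and
# position quadratically (analytically P = 0.5 * G_c * t^2 = 100, v = 20).
# This 1/s^2 behaviour lets any sustained disturbance accumulate without bound.
p, v = double_integrator(gc=2.0, steps=1000)
```

A small, persistent acceleration error therefore never washes out on its own, which is why pure acceleration control tends toward instability and why damping or spacing feedback is added in string-stable platoon controllers.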


Along with rogue traders, market perturbations have a high impact on risk. Smaller companies that depend even slightly on their larger competitors could be disproportionately impacted by fluctuations in their co-dependent companies' stock prices. Consider an example. In 1990, Harry Markowitz won the Nobel Prize in economics for his Modern Portfolio Theory (MPT), a mathematical formulation and extension of the idea of diversification to maximize asset return given a specific level of risk aversion. The model prescribes an optimal portfolio diversification based upon the covariances of asset returns [12]. This only reduces static risk: the risk as determined by MPT at one point in time. Static risk can be alleviated, but MPT does nothing about intertemporal risk. To address the broader idea, another dimension of diversification must be taken into account: diversifying to remove intertemporal risk. The aim is to minimise the probability of an unexpected situation (black swans or flash crashes) having untoward effects upon assets. Current models of diversification are long-term, accounting only for static risk. As evidence of this, the California Public Employee Retirement System (CalPERS), the largest pension fund in the United States, experienced a 51.4% loss in three quarters from July 2008 to March 2009 [13]. At seemingly no point did the managers, of whom there were 76 acting as external securities managers, decide to start betting against the market [14]. Either they could not, due to bylaw restrictions, or they would not hedge against intertemporal risk, even though there were points within that three-quarter stretch at which the managers could have turned around and started betting against the market, or pulled out. Admittedly, there are potential geopolitical and feasibility concerns.
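The intertemporal-diversification argument can be put in back-of-envelope terms: if a black swan arrives at a uniformly random moment, the expected fraction of assets it catches scales with both the share of time spent exposed and the share of AUM exposed. The toy model below is our illustration, not a formula from the paper.

```python
def expected_crash_exposure(time_in_market: float, aum_exposed: float) -> float:
    """Expected fraction of AUM hit by a crash arriving at a random time
    (both arguments are fractions in [0, 1])."""
    return time_in_market * aum_exposed

# Fully invested and always exposed, versus exposed 20% of the time
# with half of AUM in the market.
buy_and_hold = expected_crash_exposure(1.0, 1.0)
intertemporal = expected_crash_exposure(0.2, 0.5)
```

Under these assumed figures the intertemporal strategy cuts expected crash exposure tenfold, which is the second dimension of diversification the text argues current models ignore.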
Subsequently, CalPERS experienced three major corrections in their domestic equity investments some time after the Great Recession: in 2010Q2 (−11%), 2011Q3 (−11%) and 2015 (−11%). This suggests either that the consensus ideology was that diversification would be enough even with foreknowledge of a severe market downturn, or else that a false sense of security has been placed in these fiduciaries [15]. The main reason for distributing some of the total assets among a large number of external securities managers is to allow for nimble transaction maneuvering. However, it would seem that the general consensus is to remain exposed in the market and ride through the corrections. This means current models do not account for a second dimension of diversification: exposure and non-exposure. The purpose of intertemporal risk minimization is to avoid these events by limiting the percentage of AUM exposed at a given state, and by limiting portfolio exposure to the markets across time. This decreases the probability that an investment will be exposed during a black swan event.

3.3 Existing Models

The process by which MASs are used to assess patterns within financial markets is not unique. Others have implemented their own versions of such models by characterizing single agents with particular parameters, so that a MAS is generated with a slightly altered end goal. However, most of the successful solutions essentially yield the same output: financial profit. One such example attempts to simulate the behaviour of the entire stock market through agents [16]. It does this by specifying a fixed number of starting agents, three in this case, labelling them as investors with different personas. The agents are then given categories in the ways they make investment


decisions. The interaction between the agents helps to form the model and determine what should actually be invested in, probabilistically speaking. The research in [16] also does well to highlight the existing work in this field. In many ways, most of the work in this area differs only in the initial tuning stage, otherwise known as the creation of the agents. The constraints that the agents then rely on, typically mathematical formulae, are defined based on which factors are thought to be more significant than others in forming the MAS.

4 Proposed Investment Strategy

4.1 The Created Model

Agents with a memory architecture: agents have percepts; in this case only price data is used, both historical and real-time. Percepts about the market are used to make determinations about the state of the environment. The agents use a rule set to determine optimal actions given a specific state of the environment. These states are stored in a database to support intertemporal reasoning about the evolution of the environment, more specifically the market and its constituent parts, i.e. securities. The agents also catalog decisions across time to measure the performance of those decisions and course-correct. These decisions make tiny market perturbations in the environment, which creates a feedback loop and generates new percepts.
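A minimal sketch of the memory architecture just described: price percepts go into a bounded history, a rule set maps the inferred state to an action, and every decision is catalogued for later performance measurement. The class, window size and trend rule are illustrative assumptions, not the authors' implementation.

```python
from collections import deque

class MemoryAgent:
    def __init__(self, window: int = 3):
        self.percepts = deque(maxlen=window)   # recent price history
        self.decision_log = []                 # catalog of (state, action)

    def perceive(self, price: float):
        """Ingest a new price percept; oldest percepts fall out of the window."""
        self.percepts.append(price)

    def act(self) -> str:
        """Apply a toy rule set to the current state and catalog the decision."""
        if len(self.percepts) < 2:
            action = "null/quit"
        elif self.percepts[-1] > self.percepts[0]:
            action = "buy/enter"    # toy rule: uptrend over the window
        else:
            action = "sell/exit"
        self.decision_log.append((tuple(self.percepts), action))
        return action

agent = MemoryAgent(window=3)
for price in (100.0, 101.0, 103.0):
    agent.perceive(price)
decision = agent.act()  # three rising percepts, so the toy rule enters
```

The `decision_log` is what allows the agent to measure its own decisions across time and course-correct, as the text describes.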

5 Outputs from the Model

5.1 Results

The described model was then put through a test. The statistics of the output were analysed and the returns were recorded throughout the trial period of the model. An initial sum was invested into a proof-of-concept fund. The model assumes a game-theoretic movement of price, makes decisions about when to enter, speculates about direction and timeframe, and maximizes price-target trajectories whilst minimizing exposure time in the market, i.e. intertemporal risk minimization. The agents act like state machines, re-evaluating their choices depending on the state. The graph in Fig. 5 shows cumulative benchmark returns year-to-date. The overall model has yielded over a 200% return on what was initially invested into the design. Although there are stages in the design whereby the model's decisions incur a loss, the overall projection has yielded an almost linear progression of profit after an initial period of fine-tuning of the design parameters.

5.2 Limitations and Weaknesses

The current model potentially suffers from over-communication, or at the very least from the need to communicate. This leads to information lag. Unlike a system underpinned by ML algorithms, a MAS does not have constant, instantaneous information about its component agents and their behaviors. This is because the agents are semi-autonomous or fully autonomous and can make novel decisions and/or change their belief structure. There is no central controller (monolithic system) that predetermines their behavior.


Fig. 5. Cumulative benchmark returns for the Dow Jones Industrial Average (INDU, green), the NASDAQ Composite (COMP, purple) and the S&P 500 (SPX, red), measured against the prototypical account (blue), year-to-date 2019 and stopping at 8 Oct 2019, based on the decisions the model indicated should be taken.

6 Conclusion

The idea behind the use of MASs was discussed in detail throughout the report, highlighting its implications and applications for maximising returns when investing in stocks. The design, when run, showed that the return can be greater than the amount invested. There is always risk when it comes to such a model, but with further training and data the model can be improved to optimally minimise that risk. One evident approach would be to analyse the stock prices and trends of companies on a regular basis by exporting the model to a cloud-based solution, so that new data is continually read in and used as training information and the model stays up to date with the news affecting stock price values.

References

1. Weiss, G.: Multiagent Systems, 2nd edn. The MIT Press, Cambridge (2019)
2. Kliesen, K., Levine, B., Waller, C.: Gauging market responses to monetary policy communication. Federal Reserve Bank of St. Louis Review 101(2), 69–91 (2019)
3. Freivalds, K., Glagolevs, J.: Graph compact orthogonal layout algorithm. In: Fouilhoux, P., Gouveia, L., Mahjoub, A., Paschos, V. (eds.) Combinatorial Optimization. ISCO 2014. LNCS, vol. 8596. Springer, Cham (2014)
4. Engineers Journal. http://www.engineersjournal.ie/2015/06/30/traffic-monitoring-nra/. Accessed 10 Nov 2019
5. Simon, A., Singh, M., Venkatesan, S., Ramesh, D.: An overview of machine learning and its applications. Int. J. Electr. Sci. Eng. 1(1), 22–24 (2015)
6. Stone, P., Veloso, M.: Multiagent systems: a survey from a machine learning perspective. Auton. Robot. 8(3), 1–57 (2000)


7. Stephens, R.: Beginning Software Engineering, 1st edn. Wiley, Indianapolis (2015)
8. SEC Options Trading Rule 6. https://www.sec.gov/rules/sro/pcx/34-49451_a6.pdf. Accessed 07 Nov 2019
9. Oncu, S., Wouw, N., Heemels, W., Nijmeijer, H.: String stability of interconnected vehicles under communication constraints. In: 51st IEEE Conference on Decision and Control (2012). https://doi.org/10.1109/cdc.2012.6426042
10. Ferraro, P., King, C., Shorten, R.: Distributed ledger technology for smart cities, the sharing economy, and social compliance. arXiv:1807.00649 (2018)
11. Cambridge University: Driverless cars working together can speed up traffic by 35 percent. https://www.cam.ac.uk/research/news/driverless-cars-working-together-can-speed-uptraffic-by-35-percent. Accessed 07 Nov 2019
12. Markowitz, H.: Portfolio selection. J. Financ. 7(1), 77–91 (1952)
13. CalPERS Comprehensive Annual Financial Report, Fiscal Year Ending June 30, p. 17 (2009)
14. CalPERS Comprehensive Annual Financial Report, Fiscal Year Ending June 30, pp. 71–72 (2009)
15. California Public Employee Retirement System 13F Metrics. https://whalewisdom.com/filer/california-public-employees-retirement-system. Accessed 11 Nov 2019
16. Souissi, M.A., Bensaid, K., Ellaia, R.: Multi-agent modeling and simulation of a stock market. Invest. Manag. Financ. Innov. 15(4), 123–134 (2018)

An Adaptive Genetic Algorithm Approach for Optimizing Feature Weights in Multimodal Clustering

Manar Hosny(B) and Sawsan Al-Malak

Computer Science Department, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
[email protected]

Abstract. Social Media is a popular channel of communication, where people exchange different types of high volume and multimodal data. Cluster analysis is used to categorize this data to extract useful information. However, the variation of features that can be used in clustering makes the clustering process difficult, since some features may be more important than others, and some may be irrelevant or redundant. An alternative to traditional feature selection techniques, especially with the absence of domain knowledge, is to assign feature weights that depend on their importance in the clustering process. In this paper, we introduce a multimodal adaptive genetic clustering (MAGC) algorithm that clusters data according to multiple features. This is done by adding feature weights as an extension to the clustering solution. In other words, feature weights evolve and improve alongside the original clustering solution. In addition, the number of clusters is not determined a priori, but it is adapted and optimized during the evolutionary process as well. Our approach was tested on a large collection of Flickr images metadata and was found to perform better than a non-adaptive genetic algorithm clustering approach and to produce semantically related clusters.

Keywords: Clustering · Genetic algorithms · Multimodal data · Feature selection · Adaptive feature weights

1 Introduction

User-generated content has been growing vastly on the World Wide Web (WWW). In fact, the essence of Web 2.0 is to transform users into co-developers who generate and share content in websites. Over the last few years the Web has developed into a channel of communication that is widely known as Social Media. People now share, like, and annotate content of all types, such as text (posts or tweets), documents, images, and videos. Social media data is complex and multimodal in nature. Multimodality refers to seeing an object from different perspectives. For example, a post, a tweet, or a shared photo or video may have associated tags, geolocation features, temporal features, visual characteristics, etc. Multimodality makes it hard to manage social media data, due to its volume and structure diversity [1]. Research efforts have been made to manage and

© Springer Nature Switzerland AG 2020
K. Arai et al. (Eds.): SAI 2020, AISC 1229, pp. 181–197, 2020. https://doi.org/10.1007/978-3-030-52246-9_13

182

M. Hosny and S. Al-Malak

categorize data in social media using a variety of approaches, one of which is cluster analysis. Clustering techniques, in general, construct groups of data objects such that objects belonging to the same group are similar and objects belonging to different groups are dissimilar. Clustering data based on its semantic meaning has recently gained the attention of researchers, and efforts to optimize clustering quality in this domain are still ongoing. Using metadata in clustering has proven effective in many studies, yet exploiting the multimodal nature of shared information has been shown to give better semantically related clusters [1]. One problem associated with multimodality, though, is how to weigh the different features associated with the data, since some features may be more important than others during the clustering process. In addition, traditional clustering approaches usually require certain parameters, such as the number and shape of clusters, to be known a priori. The latter is considered a limitation, since an algorithm that requires such prior knowledge does not efficiently discover the naturally existing groups within the dataset. In this research, our aim is to implement an optimization model that considers multimodal feature weights during clustering, where the number of clusters and the feature weights are not determined a priori. Genetic algorithms (GAs) are well-known evolutionary methods that have been successfully used to handle clustering problems [2]. Thus, we introduce in this paper a novel multimodal adaptive genetic clustering (MAGC) algorithm that clusters information based on multiple features. Our approach adds feature weights as an extension to the clustering solution, such that the feature weights are optimized along with the original clustering solution.
Optimizing feature weights aims to achieve a clustering in which the most important features, i.e., those that are more relevant to the clustering solution, are targeted by adaptively adjusting their weights during the evolutionary process. The rest of this paper is organized as follows: Sect. 2 overviews some related work to our study. Section 3 formally defines the problem. Section 4 presents the details of the proposed method. Sections 5 and 6 introduce the experimental setup and the architecture design and implementation environment of our system, respectively. Section 7 details the results of the experiments and discusses some challenges and limitations of this work. Finally, Sect. 8 concludes this paper.
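The encoding idea described above, cluster assignments extended with one weight per feature so that both parts evolve together under crossover and mutation, can be sketched as follows. This is an illustrative representation; the paper's exact chromosome layout may differ.

```python
import random

def random_chromosome(n_objects: int, n_features: int, max_clusters: int):
    """Build one GA individual: a cluster label per object plus a
    normalised weight per feature, evolved together."""
    labels = [random.randrange(max_clusters) for _ in range(n_objects)]
    weights = [random.random() for _ in range(n_features)]
    total = sum(weights)
    weights = [w / total for w in weights]   # weights sum to 1
    return labels, weights

random.seed(42)  # deterministic example
labels, weights = random_chromosome(n_objects=10, n_features=4, max_clusters=3)
```

Because the label part may leave some of the `max_clusters` labels unused, the effective number of clusters can itself vary between individuals, which is one simple way the cluster count can be adapted during evolution rather than fixed a priori.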

2 Related Work

Clustering is considered an unsupervised learning approach, because it tries to identify naturally occurring groups of objects without prespecified labels. Examinable features of data clusters include cluster compactness (the closeness of data objects belonging to the same cluster to each other) and external separation (how far separated clusters are from one another) [3]. Clustering has been widely used in a variety of fields, such as statistics, pattern recognition, machine learning, image processing, data compression and data mining. In addition, there has recently been an emergent need for robust and efficient systems that can manage the exploding volume of data available on the WWW. Thus, clustering has been applied in many applications of Web indexing and data retrieval, as well as in recommender systems [1].

An Adaptive Genetic Algorithm Approach for Optimizing Feature Weights

183

As previously mentioned, another issue that has emerged with the multimedia available on the Web is how to handle multimodality in clustering. A common approach in this context is to consider each modality separately. This approach, though, does not consider the interaction between modalities. For example, a dataset that is clustered based on the temporal modality only may not be able to detect clusters of data that are related in both the spatial and temporal modalities, which could be interpreted as a relation to some event. Multimodal clustering has thus been adopted to obtain more meaningful clusterings [1]. Clustering data has gradually progressed from one-way to multi-way, proving the effectiveness of exploiting multimodality. One-way clustering is based on the traditional bag of keywords, which uses the text associated with the document or within it. On the other hand, two-way clustering groups together documents based on keywords and, at the same time, groups keywords based on the common documents they appear in. The latter approach has proved effective for documents that are high in dimensionality [4]. The works in [5–7] exploited spatial and temporal modalities, in addition to other modalities, in order to identify events in social media. The work in [8] focused on the fusion of text features (tags, text descriptions, etc.) and spatial knowledge to provide a better description of data and add extra semantics. The approach is claimed to be effective in many realistic applications, such as content classification, clustering, and tag recommendation. Images are one of the most common types of contributed content on the Web, due to the ease of capturing them on devices and their richness in content. Online public photo-sharing websites are widespread, and billions of images are shared through them. Managing such large collections of images is challenging, and has resulted in many attempts at archiving and clustering algorithms that automate the process.
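One common way to let modality importance act during clustering is a feature-weighted distance, sketched here generically; the cited works define their own, often more elaborate, measures.

```python
import math

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance across multimodal feature values."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

# With equal weights this reduces to the ordinary Euclidean distance;
# raising one modality's weight lets that modality dominate the grouping.
d_equal = weighted_distance((0.0, 0.0), (3.0, 4.0), (1.0, 1.0))
d_spatial = weighted_distance((0.0, 0.0), (3.0, 4.0), (4.0, 0.25))
```

Adjusting the `weights` vector is precisely the lever that adaptive approaches, including the MAGC algorithm of this paper, seek to tune automatically.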
Unlike text documents, images can be difficult to interpret automatically, since making sense of them depends largely on metadata, which in many cases may not be fully available or consistent. Clues about the content of images may be extracted from tags or associated captions, and possibly from some visual features. The research in [9] and [10] relies on tags and visual features to enhance the experience of browsing images in social media. Search results can likewise be enhanced through multimodal clustering based on the tags and visual features of images [11]. The work in [12] added an extra probabilistic modality, user preference, in order to deliver a personalized image retrieval experience. Visual features and tags can also be used to enhance computer vision and identify salient objects inside images [13]. Analyzing images in social media with reliance on tags alone lacks precision, since tags are user generated and hence error prone. In addition, this reliance on tags motivates users to overwhelm the data with redundant tags (tag synonymy), irrelevant tags, and invalid tags, which leads to inaccurate retrieval. In [14], the authors proposed a method that generates additional semantics after mining images in social media; together with other modalities, these extracted semantics are employed in an unsupervised learning model. The works in [14] and [15] also proposed approaches for generating additional semantics based on multimodal (textual, visual, and spatial) analysis; their approach, however, uses the visual notes provided in Flickr as part of the visual features.


M. Hosny and S. Al-Malak

The technique proposed in our work is an evolutionary algorithm inspired by natural biological evolution. Evolutionary algorithms are meta-heuristic methods that are considered efficient in solving complex problems, usually providing near-optimal solutions. Clustering is an NP-hard grouping problem [16], which justifies the use of meta-heuristic algorithms to solve it. Many evolutionary algorithms have been used to solve clustering problems in the literature; some of these techniques can be found in [6, 17–19], and for more information about evolutionary clustering techniques, the reader is referred to the survey in [2]. Besides evolutionary approaches that apply classical clustering methods without adapting feature weights, a few studies have considered using evolutionary algorithms to adapt feature weights while clustering. For example, the work in [20], which in turn adopts some features from the work in [21], presents a co-evolutionary framework for optimizing feature weights in multi-dimensional data clustering. The idea is to have two populations evolving simultaneously, one for the clustering solution and another for the feature weights. The approach was tested on datasets from the UCI machine learning repository, where the results indicate its superiority over a version of the algorithm that does not adapt feature weights; it also significantly outperforms classical K-means clustering. Finally, Ant Colony Optimization (ACO) [22] is another popular technique, inspired by the behavior of ants finding the shortest path from their colony to food sources. The works in [1] and [23] adopt ACO for clustering a set of social media images, using it to optimize feature weights in applications that handle large, high-dimensional datasets.

3 Problem Statement

As defined in [7] and [20], we assume a dataset of N objects X = {X1, X2, ..., XN}. Each object Xi is described by a set of P features, Xi = (xi1, xi2, ..., xiP), where xij is the value of the jth feature of the ith data object (Xi). The objective is to partition the N data objects into K non-overlapping clusters, where K is not previously defined. In addition, we need to find a set W = {w1, w2, ..., wP} of P feature weights, such that each wj lies in the range [0, 1] and Σ_{j=1}^{P} wj = 1 (i.e., the feature weights sum to 1). The partition should optimize a certain objective function, described in Sect. 4.5.
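As a concrete illustration of these constraints, the following minimal Python sketch (the function names are ours, not from the paper) checks the weight constraints and shows one plausible way a weight vector can modulate per-feature distances:

```python
import math

def valid_weights(w, tol=1e-9):
    """Check the problem-statement constraints: each w_j in [0, 1]
    and the weights summing to 1 (within a numerical tolerance)."""
    return all(0.0 <= wj <= 1.0 for wj in w) and abs(sum(w) - 1.0) <= tol

def weighted_distance(xi, xj, w):
    """Distance between two P-dimensional objects, with each squared
    feature difference scaled by its weight. This combination rule is an
    illustrative assumption; the paper's unweighted distance is Eq. (4)."""
    return math.sqrt(sum(wl * (a - b) ** 2 for wl, a, b in zip(w, xi, xj)))

# Example weight vector over three features (values are illustrative).
w = [0.41, 0.23, 0.36]
```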

4 Methodology: A Multimodal Adaptive Genetic Clustering (MAGC)

In this section, we introduce the details of our proposed Multimodal Adaptive Genetic Clustering (MAGC), where the number of clusters and the feature weights are not previously determined. Rather, they are optimized together with the clustering solution (i.e., the partitioning of objects into clusters). Our approach is explained in four parts: a genetic algorithm overview, the solution representation, the genetic operators, and the objective function.

4.1 Genetic Algorithm (GA)

GA is a popular optimization technique inspired by natural genetics [2]. It constructs a collection of solutions, through selection and combination, in search of a near-optimal solution. An individual solution is called a chromosome, and a set of chromosomes is a population. As an initial step, a population is created randomly, and a degree of goodness is then assigned to each chromosome based on a fitness function. The fittest individuals are selected for further genetically inspired operations, such as crossover and mutation, in order to produce a new population. These operations are applied for multiple iterations until a specified number of generations is reached or a termination condition is satisfied. The solution to the problem is the fittest individual of the last population [24].

4.2 Solution Representation

Each genetic solution (chromosome) in our MAGC is divided into two parts: one for the clustering solution and one for the feature weights, as follows:

Clusters. To represent the clusters, we use a label-based representation with integer encoding. In this representation, cluster centroids are actual data objects (a.k.a. medoids) that represent the cluster [21]. Assuming we have N data objects to cluster, we generate a number of solutions for the initial population. For each solution, k medoids are randomly selected from the N data objects. The value k (the number of clusters) is randomly chosen within a range [kmin, kmax], where kmin ≥ 2 and kmax ≈ √(N/2), as recommended in [25]. In a chromosome of length N + 1, the first gene, at index 0, contains the number of clusters (k) in the partition. Gene values of medoids are −1, and gene values of other data objects are the indices of their nearest medoid. The choice of the −1 gene value is justified by the fact that no data object has a −1 index, so a medoid is easily distinguished from other objects. A solution representation is illustrated in Fig. 1, where the clustering part is the non-shaded part of the chromosome.

Feature Weights. Feature weights are appended at the end of the chromosome; the length of this part equals the number of features considered in the clustering, and the weights are real numbers in the range [0, 1]. Initially, weights are chosen randomly such that, within a single chromosome, they sum to 1. For example, Fig. 1 shows a chromosome that represents a partition of five data objects into two clusters using three feature weights. In this figure, the number of clusters is stored in the first gene (index 0), objects 3 and 4 are the chosen medoids, objects 1 and 5 belong to the cluster of the medoid with index 3, and object 2 belongs to the cluster of the medoid with index 4. The shaded part of the chromosome represents the three feature weights.
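The initialization just described can be sketched as follows (an illustrative Python version; the function name is ours, and plain Euclidean distance over numeric feature vectors is an assumption of this sketch — the paper's distances are feature-specific):

```python
import math
import random

def random_chromosome(data, kmin, kmax, num_features):
    """Build one chromosome in the paper's encoding: gene 0 holds k,
    medoid genes hold -1, every other gene holds the (1-based) index of
    its nearest medoid, and the tail holds feature weights summing to 1."""
    n = len(data)
    k = random.randint(kmin, kmax)
    medoids = random.sample(range(1, n + 1), k)  # 1-based object indices
    genes = [k] + [0] * n
    for i in range(1, n + 1):
        if i in medoids:
            genes[i] = -1  # this object is itself a medoid
        else:
            genes[i] = min(medoids,
                           key=lambda m: math.dist(data[i - 1], data[m - 1]))
    raw = [random.random() for _ in range(num_features)]
    weights = [r / sum(raw) for r in raw]  # normalise so weights sum to 1
    return genes + weights
```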

Index:   0   1   2   3   4   5  |  w1    w2    w3
Gene:    2   3   4  −1  −1   3  |  0.41  0.23  0.36

Fig. 1. Solution representation

4.3 Genetic Operators

Crossover. Through the crossover operator, parents transfer some of their properties to the children. We propose a Join and Split (J&S) crossover [21]. In this crossover, parents randomly pass down the number of medoids they carry to the children: if the two parents contain partitions of k1 and k2 clusters, then one child will randomly receive k1 clusters, and the other will receive k2 clusters. The (k1 + k2) cluster medoids are then randomly distributed between the two children, with the constraint that duplicate medoids are not allowed in the same child. Consequently, if the same medoid appears in both parents, it appears in both children, which makes the J&S crossover heritable (i.e., it respects the decision made by both parents), a desirable property in crossover operators [2]. Afterwards, both children are amended by reallocating their objects to the nearest cluster medoids.

P1:  [2 |  3  4 −1 −1  3]   weights [0.41  0.23  0.36]
P2:  [3 | −1  4  5 −1 −1]   weights [0.15  0.52  0.33]

        ── Crossover ──

C1:  [3 | −1  4 −1 −1  3]   weights [0.15  0.52  0.33]
C2:  [2 |  5  4  5 −1 −1]   weights [0.41  0.23  0.36]

Fig. 2. Crossover example

Regarding the weights part of the chromosome, this part will also be passed randomly to the children. In other words, one child will randomly inherit the weights of parent 1,

while the other child will randomly inherit the weights of parent 2. Figure 2 illustrates a crossover example.

Mutation. With a small probability, mutation is applied in a cluster-oriented fashion: we add or remove a cluster and update the number of clusters in the chromosome accordingly. When a cluster is added, all data objects must be redistributed, whereas when a cluster is removed, we only need to redistribute the data objects belonging to the removed cluster to the other clusters. For the feature weights part, mutation subtracts a small value ε from one weight and adds it to another [21]. Mutation takes place right after the crossover step in the reproduction process. Figure 3 illustrates a weights mutation example.

Before:  [2 | 5 4 5 −1 −1]   weights [0.41  0.23  0.36]
After:   [2 | 5 4 5 −1 −1]   weights [0.42  0.23  0.35]

Fig. 3. Weights mutation example with ε = 0.01
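The ε-transfer mutation of the weights part shown in Fig. 3 can be sketched as follows (an illustrative Python version; the function name is ours, and the donor/receiver weights are chosen at random as described in Sect. 4.3):

```python
import random

def mutate_weights(weights, eps=0.01, rng=random):
    """Weights-part mutation: move a small amount eps from one randomly
    chosen weight to another, keeping the total sum equal to 1."""
    w = list(weights)
    donors = [i for i in range(len(w)) if w[i] >= eps]  # avoid going negative
    i = rng.choice(donors)
    j = rng.choice([k for k in range(len(w)) if k != i])
    w[i] -= eps
    w[j] += eps
    return w
```

In Fig. 3 the donor is the third weight (0.36 → 0.35) and the receiver the first (0.41 → 0.42); here both are picked at random.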

4.4 Selection Strategy

The selection strategy is the scheme followed to select individuals for reproduction. A good selection scheme gives all individuals a chance of reproduction, while making fitter individuals more likely to be selected [26]. Various methods have been used in the literature to assign this probability; one of the most commonly used is Roulette Wheel Selection [27]. As its name implies, each individual is given a slice of a roulette wheel, where fitter individuals receive bigger slices and hence a larger chance of selection. According to [26], this stochastic selection scheme and many others perform almost identically and are not reported to affect the overall result. Our algorithm uses the roulette wheel scheme to select individuals for reproduction.

4.5 Objective Function

Recalling the problem definition in Sect. 3, the objective function of our algorithm depends on the quality of the clustering solution; in other words, it is our cluster validity measure. Our chosen cluster validity measure is the Davies-Bouldin (DB) index

[28], since it does not increase monotonically with k, and hence it is suitable for a clustering algorithm with a variable number of clusters. It has also been proven to be computationally efficient [29]. The DB index is given in Eq. (1):

    DB = (1/k) Σ_{i=1}^{k} max_{j≠i} (SC_i + SC_j) / M_ij        (1)

where k is the number of clusters and C_i and C_j are clusters with i ≠ j. The scatter SC_i is the average distance between the objects belonging to C_i and their cluster medoid R_i, while M_ij is the distance between the two cluster medoids R_i and R_j. The scatter is obtained as shown in Eq. (2):

    SC_i = (1/|C_i|) Σ_{X ∈ C_i} d(X, R_i)        (2)

The smaller the DB index, the better the clustering solution; in other words, individuals with smaller DB values have better fitness. Thus our fitness function f(z) is as shown in Eq. (3), where DB is the clustering measure given by Eq. (1):

    minimize f(z) = DB        (3)

To measure the distance between two objects X_i, X_j, we use the Euclidean distance (Eq. (4)), which combines the distances over each feature x_l, where l = 1, 2, ..., P and P is the number of features:

    d(X_i, X_j) = √( Σ_{l=1}^{P} (x_il − x_jl)² )        (4)
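Equations (1), (2), and (4) can be combined into a short, illustrative Python implementation (the function name is ours; lower values indicate a better clustering):

```python
import math

def db_index(clusters, medoids):
    """Davies-Bouldin index per Eqs. (1)-(2). `clusters` is a list of
    lists of feature vectors; `medoids` holds the corresponding medoid
    vectors. Distances are Euclidean, as in Eq. (4)."""
    k = len(clusters)
    # Eq. (2): scatter = mean distance from cluster members to their medoid
    sc = [sum(math.dist(x, m) for x in c) / len(c)
          for c, m in zip(clusters, medoids)]
    total = 0.0
    for i in range(k):
        total += max((sc[i] + sc[j]) / math.dist(medoids[i], medoids[j])
                     for j in range(k) if j != i)
    return total / k  # Eq. (1)
```

Two tight, well-separated clusters yield a small DB value, matching the intuition that individuals with smaller DB have better fitness.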

5 Experimental Setup

The computational experiments aim to apply our MAGC algorithm and compare its performance to a non-adaptive genetic clustering version, where all features are assumed to have equal weights. In the following subsections we explain the details of the experimental setup and the implementation environment used to test our approach.

5.1 Data Set

Our algorithm was tested on the CoPhIR test collection, the largest Flickr metadata collection available for studies on scalable similarity search techniques [30]. It consists of 106 million images processed and extracted from Flickr, with metadata structured as XML files containing a number of standard MPEG-7 image features alongside the Flickr image entries (e.g., title, location, tags, comments, etc.).

5.2 Feature Selection

For the feature selection phase, we followed a manual approach to select a subset of the image features in the CoPhIR dataset. The selected features were categorized as follows: 1) the upload date, 2) the photograph owner's location, 3) the photograph title, 4) the photograph tags, 5) the photograph's location, and 6) the photograph's visual descriptors. These will be referred to hereafter as feature 1 to feature 6, respectively.

5.3 XML Parsing

In order to extract the image information embedded in the XML files, a tool can be used to ease the process of reading the content between XML tags. For this purpose, we used the Java Architecture for XML Binding (JAXB) [31] to parse the XML. This parser follows a Document Object Model (DOM) approach, which creates a tree of objects representing the content and hierarchy of the data in the document and stores it in memory. The XML schema of the CoPhIR dataset is used with JAXB to create image objects, and the objects within them, in the correct object hierarchy.

5.4 Data Pre-processing

The CoPhIR dataset contains parts of text that are not preprocessed and may hence introduce noise that can affect the clustering process. We used the Apache Lucene API [32] to preprocess text fields such as the image title and tags. This API is a Java-based, full-featured text search engine library that provides various capabilities for processing text. Part of the API is the Analysis package, which provides features to convert text into searchable (or, in our case, comparable) tokens. The first step of text pre-processing is removing stop words, followed by removing non-word characters, and finally stemming the resulting text. After text pre-processing, a lookup table [33] is generated. The lookup table is a symmetric matrix data structure, written to a text file, that stores the distances between each image and all other images in the data collection. This is important to avoid recalculating the distances between images repeatedly. The lookup table is loaded into memory whenever it is needed during runtime. Each image has one distance entry per feature between it and every other image's features.
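The per-feature lookup table described above might be sketched as follows (an illustrative Python version; a toy absolute-difference distance per numeric feature is assumed here, since the actual per-feature distance functions, e.g. for text, are not specified in this excerpt):

```python
def build_lookup(data):
    """Build a symmetric per-feature distance lookup: for every pair of
    objects, store one distance entry per feature, so distances are
    computed once and reused during clustering. `data` is a list of
    numeric feature vectors."""
    n = len(data)
    p = len(data[0])
    table = [[[0.0] * p for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            for f in range(p):
                d = abs(data[i][f] - data[j][f])
                table[i][j][f] = table[j][i][f] = d  # symmetric matrix
    return table
```

Because the matrix is symmetric, only the upper triangle is computed and mirrored, halving the precomputation work.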

6 Architecture Design and Implementation

In this section, we explain the design decisions and implementation of our MAGC. First, we describe the components and their interactions; we then introduce an important part of the implementation, the Watchmaker Framework.

6.1 Software Components

Our algorithm is composed of many components that interact with one another in order to produce the final result. Figure 4 illustrates the software architecture of our MAGC

and its components' interactions. The XMLUnmarshaller is a component built on the JAXB Unmarshaller, which is used to transform each XML file into a Java object. First, this component produces a list of objects representing the dataset. The objects are then updated with the preprocessed text produced by the TextPreprocessor. After this step, the list is ready to be used as input to the Lookup Table Generator, which outputs its results to the Lookup Table store. The Lookup Table Reader fetches the distance values stored in the lookup table into working memory before the algorithm starts. The next component is based on the Watchmaker Framework, which is explained in the following subsection.

6.2 The Watchmaker Framework

The Watchmaker framework [34] is an efficient, extensible, object-oriented framework for developing evolutionary/genetic algorithms in Java. It is open source software that provides a variety of ready-to-use operators, such as crossover and mutation implementations for common data types.

Fig. 4. MAGC software architecture

The core component of the framework is the EvolutionEngine, which performs the evolution of the populations. The evolution takes multiple objects as inputs. These objects

are implementations of the already-defined interfaces, which are the CandidateFactory, FitnessEvaluator, EvolutionaryOperator, and the EvolutionObserver. First, an implementation of the CandidateFactory, which is the ChromosomeFactory, is created to randomly initiate the first population of candidates. ClusterEvaluator is our implementation of the FitnessEvaluator interface that utilizes the Davies-Bouldin (DB) index. The evolution engine accepts a pipeline of evolutionary operators, which in our case are the JoinAndSplitCrossover and the ClusterMutation based on the EvolutionaryOperator interface. Finally, the EvolutionLogger is our implementation of EvolutionObserver interface, whose core functionality is to output the best candidate of each generation, the value of its fitness, and the average fitness of the complete generation.
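A framework-agnostic sketch of what this wiring amounts to — candidate factory, fitness evaluator, operator pipeline, and stagnation-based termination — might look like the following Python loop (our simplification, not the Watchmaker API itself; the survivor-truncation refill stands in for roulette-wheel selection):

```python
import random

def evolve(factory, evaluate, operators, pop_size=100, stagnation=10,
           rng=random):
    """Minimal evolution loop: create a population, score it (lower
    fitness is better, matching the DB index), breed via the operator
    pipeline, and stop after `stagnation` generations with no
    improvement in the best fitness."""
    pop = [factory(rng) for _ in range(pop_size)]
    best, since = None, 0
    while since < stagnation:
        scored = sorted(pop, key=evaluate)
        if best is None or evaluate(scored[0]) < evaluate(best):
            best, since = scored[0], 0
        else:
            since += 1
        # keep the fitter half and refill through the operator pipeline
        survivors = scored[: pop_size // 2]
        pop = list(survivors)
        while len(pop) < pop_size:
            child = rng.choice(survivors)
            for op in operators:
                child = op(child, rng)
            pop.append(child)
    return best
```

For instance, minimizing f(x) = x² with a Gaussian-free random-walk mutation converges quickly toward zero; in MAGC the candidates are chromosomes and the evaluator is the DB index.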

7 Results and Evaluation

To illustrate the performance of our MAGC, we conducted three experiments. First, we show the results of multiple runs of the algorithm on five different data collection sizes. Second, we compare the results of the algorithm with a non-adaptive genetic clustering algorithm on larger data collections. Third, we demonstrate some visually evaluated results of a clustering solution found by MAGC. Finally, we highlight some of the challenges and limitations of this work.

7.1 Multi-run Experiment

In this experiment we ran our MAGC multiple times on a number of non-overlapping collections from the dataset, with sizes ranging from 100 to 500 images. The experiments were conducted on a machine with an Intel Core i5 CPU with a clock speed of 1.70 GHz and 4 GB of RAM. We set the population size to 100, and the crossover and mutation probabilities to pc = 0.8 and pm = 0.2, respectively.

The number of clusters in all individuals ranges between kmin ≈ √(N/8) and kmax ≈ √(N/2). These parameter values were chosen by trial (more explanation in Sect. 7.4). The termination condition halts the evolution if no improvement in fitness is observed within 10 consecutive generations. Table 1 lists the best and average fitness values, calculated in terms of the DB index, over 10 runs for each collection size. Recall that the lower the DB index, the better the result (i.e., the fitness) obtained. From the results in Table 1, we can observe that the best DB index value slightly increases (i.e., the fitness slightly worsens) as the dataset size increases. This means that the clustering quality becomes slightly worse as the collection size grows, which is expected, since more images in a collection make the clustering task more difficult. On the other hand, the average number of generations remains relatively stable irrespective of the dataset size (ranging from approximately 27 to 40). With respect to processing time, the algorithm is quite fast, needing less than 7.5 s to process the largest dataset of 500 images. Throughout the experiment, the evolution of the weights was recorded in order to detect any possible patterns in the assigned feature weights. The results,

Table 1. MAGC multi-run experiment results

Dataset size | Best fitness (DB) | Avg fitness (DB) | Avg no. generations | Processing time (s)
    100      |       1.07        |       1.57       |        31.70        |        2.62
    200      |       1.39        |       1.86       |        38.60        |        3.41
    300      |       1.47        |       1.76       |        40.50        |        4.73
    400      |       1.54        |       1.86       |        27.00        |        5.04
    500      |       1.48        |       1.78       |        35.20        |        7.49
however, were inconclusive, since each run produced different weights. This means the weights were not inclined towards certain features; i.e., overall, no particular feature(s) can be considered more important in the clustering process as far as the algorithm is concerned. Nevertheless, observing the fittest individuals, we noticed that the weights for the location (features 2 and 5) and the title (feature 3) were relatively higher than the other weight values. The relation between 10 of the fittest solutions found in multiple runs (sorted in ascending order) and their feature weights is illustrated in Fig. 5.

[Bar chart: feature weights (y-axis, 0–0.6) of the 10 fittest solutions, identified by their DB index values (1.253294 to 1.472045, x-axis), with one series per feature (features 1–6).]

Fig. 5. MAGC weight values of multiple runs

7.2 Comparing with Non-adaptive Genetic Clustering

In this experiment, we ran the same MAGC on larger datasets, ranging from 100 to 1500 images each. The algorithm was run 10 times on each dataset. In addition, another version of the algorithm, a non-adaptive genetic clustering algorithm (GCA) that excludes the adaptation of weights (i.e., all features have equal weights), was run the same number of times on the same datasets. The population size and the crossover and mutation rates are identical to those of the former experiment. The results of this experiment are listed in Table 2.

Table 2. Results of genetic clustering with and without adaptive weights

             |                    MAGC                      |              Non-Adaptive GCA
Dataset size | Best DB | Avg DB | Avg gen. | Proc time (s) | Best DB | Avg DB | Avg gen. | Proc time (s)
    100      |  1.04   |  1.50  |  35.00   |     1.32      |  1.22   |  1.79  |  18.00   |     1.04
    300      |  1.46   |  1.85  |  24.90   |     2.40      |  1.53   |  1.94  |  20.40   |     2.06
    500      |  1.40   |  1.76  |  40.10   |     5.04      |  1.56   |  1.97  |  21.40   |     3.70
    700      |  1.50   |  1.77  |  47.80   |    13.78      |  1.61   |  1.95  |  20.50   |     9.35
   1000      |  1.51   |  1.72  |  62.40   |    27.07      |  1.64   |  1.91  |  24.00   |    14.96
   1500      |  1.44   |  1.64  |  73.60   |    56.65      |  1.63   |  1.92  |  29.40   |    25.65
   Avg.      |  1.39   |  1.71  |  47.30   |    17.71      |  1.53   |  1.91  |  22.30   |     9.46
From the results in Table 2, we can observe that the non-adaptive GCA produced solutions that are worse in fitness (i.e., have a higher DB index) than the MAGC. Looking at the overall averages in the last row of the table, our MAGC outperforms the non-adaptive GCA by approximately 10% in terms of the best solution fitness, and by approximately 12% in terms of the average solution fitness. Moreover, as indicated by the smaller number of generations, the non-adaptive version converged faster than the adaptive version, which probably means it became stuck in a local optimum and could not improve further.

7.3 Visual Evaluation

To visualize the results of our genetic clustering algorithm, we chose one of the solutions that assigned a relatively high weight to the visual features (feature 6), and collected some of the images belonging to the same cluster. Figure 6 displays images collected from three clusters. MAGC was clearly able to produce semantically related image clusters, since images belonging to one cluster had visual similarities. For example, cluster B contains two images of the same child, as well as color ranges similar to those of the other images in the cluster.

Fig. 6. Sample images of three clusters A, B, and C

7.4 Challenges and Limitations

The following are some of the challenges and limitations of our work. First, in assessing the clustering solutions, we observed that the fittest individuals sometimes contained empty clusters. This required adding a code fragment to detect and remove empty clusters. However, this also affected the minimum number of clusters, previously defined as kmin = 2, in case one of the two clusters happened to be empty. Thus, we increased the minimum threshold on the number of clusters to make it a function of the number of objects N (i.e., it was changed from kmin = 2 to kmin ≈ √(N/8)), as explained in Sect. 7.1. Second, our approach slightly alters the weight values as part of mutation only. Mutation is usually given a very low probability, which may make the change in weights insignificant in some cases. More experiments are needed on the mutation probability and the amount of change allowed for the weights part of the chromosome, or on handling the weights part differently in the crossover operation. Evolving the weights more significantly may better reveal which features are more important to the clustering solution. Finally, one limitation of our work is the handling of missing values in the dataset. For example, the best solutions found emphasized the weight of the location feature, even though it is not available for all data objects. In addition, our work needs more analysis in terms of interpreting solutions and assessing visual results. The latter is a tedious and time-consuming process when done manually, due to the large size of the dataset. Moreover, some image URLs are inaccessible, which makes it a challenge to visually evaluate those images.
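The empty-cluster repair mentioned above could be sketched as follows (an illustrative Python version operating on the clustering part of the chromosome only; the function name is ours, and reassigning a demoted medoid to the first remaining medoid is a simplification — the paper reassigns objects to their nearest medoid):

```python
def remove_empty_clusters(genes):
    """Detect and remove empty clusters in the paper's encoding: a
    cluster is empty when a medoid gene (-1) has no other gene pointing
    at its index. Such medoids are demoted to ordinary objects and the
    cluster count in gene 0 is reduced accordingly."""
    n = len(genes) - 1
    medoids = [i for i in range(1, n + 1) if genes[i] == -1]
    used = {genes[i] for i in range(1, n + 1) if genes[i] != -1}
    empty = [m for m in medoids if m not in used]
    keep = [m for m in medoids if m in used]
    out = list(genes)
    for m in empty:
        out[m] = keep[0]  # simplification: paper uses the nearest medoid
    out[0] = len(keep)   # update k in gene 0
    return out
```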

8 Summary and Future Work

Research efforts have been made to manage and categorize data in social media using a variety of approaches, one of which is cluster analysis. The focus on multimedia is due to two main reasons: it is one of the most commonly shared content types in social media, and it has many modalities that can contribute to learning the semantic meanings it contains. As a result, research is heading towards mining information in social media with multiple modalities employed in an unsupervised learning model. Evolutionary techniques can be used to optimize the fusion of modalities, aiming to enhance clustering performance. In this work, we implemented a Multimodal Adaptive Genetic Clustering (MAGC) algorithm that exploits multimodality in clustering in order to produce semantically related clusters. Our algorithm optimizes both the number of clusters and the feature weights to accomplish this objective. We tested the algorithm extensively on the CoPhIR dataset. The test results show that our adaptive algorithm outperforms a non-adaptive genetic clustering algorithm and produces semantically related clusters. Our future work will emphasize the adaptation of feature weights by involving them more in the evolutionary process through crossover and mutation, which may improve the cohesiveness of the produced clusters. We will also investigate the scalability of the algorithm by testing it on larger datasets and using other sources of data.

References

1. Nikolopoulos, S., Giannakidou, E., Kompatsiaris, I.: Combining multi-modal features for social media analysis. In: Hoi, S.C.H., Luo, J., Boll, S., Xu, D., Jin, R., King, I. (eds.) Social Media Modeling, pp. 71–96. Springer, London (2011)
2. Hruschka, E., Campello, R., Freitas, A.A.: A survey of evolutionary algorithms for clustering. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 39, 133–155 (2009)
3. Hansen, P., Jaumard, B.: Cluster analysis and mathematical programming. Math. Program. 79, 191–215 (1997)
4. Dhillon, I.S., Mallela, S., Modha, D.S.: Information-theoretic co-clustering. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD 2003, p. 89. ACM Press, New York (2003)
5. Becker, H., Naaman, M., Gravano, L.: Event identification in social media. In: 12th International Workshop on the Web and Databases (WebDB), Rhode Island, USA (2009)
6. Sheng, W., Liu, X.: A hybrid algorithm for k-medoid clustering of large data sets. In: Evolutionary Computation, CEC2004. IEEE (2004)
7. Liu, Y., Zheng, F., Cai, K., Jiang, B.: Cross-media retrieval method based on temporal-spatial clustering and multimodal fusion. In: 2009 Fourth International Conference on Internet Computing for Science and Engineering, pp. 78–84. IEEE (2009)
8. Sizov, S.: GeoFolk: latent spatial semantics in web 2.0 social media. In: Proceedings of the Third ACM International Conference on Web Search and Data Mining - WSDM 2010, p. 281. ACM Press, New York (2010)
9. Olivares, X., Ciaramita, M., van Zwol, R.: Boosting image retrieval through aggregating search results based on visual annotations. In: Proceedings of the 16th ACM International Conference on Multimedia - MM 2008, p. 189. ACM Press, New York (2008)

10. Aurnhammer, M., Hanappe, P., Steels, L.: Augmenting navigation for collaborative tagging with emergent semantics. In: International Semantic Web Conference (ISWC), pp. 58–71. Springer, Heidelberg (2006)
11. Wu, F., Pai, H.-T., Yan, Y.-F., Chuang, J.: Clustering results of image searches by annotations and visual features. Telemat. Inform. 31, 477–491 (2014)
12. Zhuang, Y., Chiu, D.K.W., Jiang, N., Jiang, G., Wu, Z.: Personalized clustering for social image search results based on integration of multiple features. In: Zhou, S., Zhang, S., Karypis, G. (eds.) Advanced Data Mining and Applications, pp. 78–90. Springer, Heidelberg (2012)
13. Chatzilari, E., Nikolopoulos, S., Patras, I.: Enhancing computer vision using the collective intelligence of social media. In: New Directions in Web Data Management 1, pp. 235–271. Springer, Heidelberg (2011)
14. Giannakidou, E., Kompatsiaris, I.: SEMSOC: semantic, social and content-based clustering in multimedia collaborative tagging systems. In: 2008 IEEE International Conference on Semantic Computing (2008)
15. Lienhart, R., Romberg, S., Hörster, E.: Multilayer pLSA for multimodal image retrieval. In: Proceedings of the ACM International Conference on Image and Video Retrieval, p. 9 (2009)
16. Falkenauer, E.: Genetic Algorithms and Grouping Problems. Wiley, Hoboken (1998)
17. Lu, Y., Lu, S., Fotouhi, F., Deng, Y., Brown, S.: FGKA: a fast genetic k-means clustering algorithm. In: Proceedings of the 2004 ACM Symposium on Applied Computing, pp. 622–623 (2004)
18. Ma, P., Chan, K., Yao, X., Chiu, D.K.: An evolutionary clustering algorithm for gene expression microarray data analysis. IEEE Trans. Evol. Comput. 10, 296–314 (2006)
19. Alhenak, L., Hosny, M.: Genetic-frog-leaping algorithm for text document clustering. Comput. Mater. Contin. 61, 1045–1074 (2019)
20. Hosny, M.I., Hinti, L.A., Al-Malak, S.: A co-evolutionary framework for adaptive multidimensional data clustering. Intell. Data Anal. 22, 77–101 (2018)
21. Al-Malak, S., Hosny, M.: A multimodal adaptive genetic clustering algorithm. In: Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2016), pp. 1453–1454, Denver, Colorado. ACM (2016)
22. Dorigo, M.: Optimization, learning and natural algorithms. Ph.D. thesis, Politecnico di Milano, Italy (1992)
23. Piatrik, T., Izquierdo, E.: Subspace clustering of images using ant colony optimisation. In: 2009 16th IEEE International Conference on Image Processing (ICIP), pp. 229–232. IEEE (2009)
24. Goldberg, D.E.: Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Boston (1989)
25. Mardia, K.V., Kent, J.T., Bibby, J.M.: Multivariate Analysis. Academic Press, London (1979)
26. Goldberg, D.E., Deb, K.: A comparative analysis of selection schemes used in genetic algorithms. Found. Genet. Algorithms 1, 69–93 (1991)
27. De Jong, K.A.: An Analysis of the Behavior of a Class of Genetic Adaptive Systems. Ph.D. thesis, University of Michigan, USA (1975)
28. Davies, D.L., Bouldin, D.W.: A cluster separation measure. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-1, 224–227 (1979)
29. Petrovic, S.: A comparison between the silhouette index and the Davies-Bouldin index in labelling IDS clusters. In: Proceedings of the 11th Nordic Workshop of Secure IT Systems, pp. 53–64 (2006)
30. Bolettieri, P., Esuli, A., Falchi, F., Lucchese, C., Perego, R., Piccioli, T., Rabitti, F.: CoPhIR: a test collection for content-based image retrieval. CoRR abs/0905.4627 (2009)
31. JAXB Reference Implementation - Project Kenai
32. Apache Lucene 5.3.1 Documentation

An Adaptive Genetic Algorithm Approach for Optimizing Feature Weights

197

33. Lin, H., Yang, F., Kao, Y.: An efficient GA-based clustering technique. Tamkang J. Sci. 8, 113–122 (2005) 34. The Watchmaker Framework for Evolutionary Computation (evolutionary/genetic algorithms for Java)

Extending CNN Classification Capabilities Using a Novel Feature to Image Transformation (FIT) Algorithm

Ammar S. Salman1(B), Odai S. Salman2, and Garrett E. Katz1

1 Syracuse University, Syracuse, NY 13244, USA

{assalman,gkatz01}@syr.edu

2 Carleton University, Ottawa, ON K1S-5B6, Canada

[email protected]

Abstract. In this work, we developed a novel approach with two main components to process raw time-series and other data forms as images. The first is a feature extraction component that returns 18 Frequency and Amplitude based Series Timed (FAST18) features for each raw input signal. The second is the Feature to Image Transformation (FIT) algorithm, which generates uniquely coded image representations of any numerical feature set to be fed to Convolutional Neural Networks (CNNs). The study used two datasets: 1) a behavioral biometrics dataset in the form of time-series signals and 2) an EEG eye-tracker dataset in the form of numerical features. In earlier work, we used FAST18 to extract features from the first dataset; different classifiers were tested, and a Deep Neural Network (DNN) was the best. In this work, we applied FIT to the same features and invoked a CNN, which scored 96% accuracy, surpassing the best DNN results. For the second dataset, FIT with a CNN significantly outperformed the DNN, scoring ~90% compared to ~60%. An ablation study was performed to test noise effects on classification, and the results show high tolerance to large noise. Possible extensions include time-series classification, medical signals, and physics experiments where classification is complex and critical.

Keywords: Fingerprint · Biometrics · Spoofing · Feature to Image Transformation (FIT) · Anti-spoofing protection · CNN stochastic gradient descent optimizer · Ablation · Frequency and Amplitude based Series Timed (FAST18) · CFS

1 Introduction

A novel approach that transforms raw signals into images promises to add a major extension to CNN capabilities [1]. We have transformed signals into images by first extracting signal features using the FAST18 algorithm [2, 3] (or using ready-made features), and then using our novel FIT algorithm to transform them into coded images. No previous work has used this combination of the FAST18 and FIT algorithms. We ran tests comparing DNN and other classifiers with the CNN after these added capabilities.

© Springer Nature Switzerland AG 2020
K. Arai et al. (Eds.): SAI 2020, AISC 1229, pp. 198–213, 2020. https://doi.org/10.1007/978-3-030-52246-9_14

Extending CNN Classification Capabilities

199

The first experiment, reported separately, involves a spoofing detection system using behavioral biometrics [3], which collects data from several sensors prior to any unlocking attempt. The sensors detect a variation between intentional and forced application of the fingerprint. That work used Naïve-Bayes [4], Support Vector Machine (SVM) [5], and Deep Neural Network (DNN) [1] classifiers; the DNN was the most successful, with SVMs coming close when Correlation-based Feature Selection (CFS) [6] was used. In this work we have used a CNN to classify the same data, plus another set featuring eye-print data [7]. The biometrics raw data consists of timestamped signals, and the eye-print set has some features which are time dependent. Features were extracted using the FAST18 algorithm developed by one of the authors, as reported in the previous work [3].

The novel method we have developed and tested in this work can effectively transform signals or features into coded images that are optimized for CNN classification. The method differs from other works in its transformations, feature extraction, and optimization. Success in building a robust transformation can open the door to extending CNN capabilities into a general classifier not restricted to images, where CNNs already excel. Section 2 covers related work, and Sect. 3 presents the methodology. Sections 4 and 5 provide a description of the datasets and extensive tests, including accuracy, ablation and optimizations. We outline our conclusions and future work in Sect. 6.

2 Related Work

There are three main types of related work. Many works transform time-series signals into images through amplitude correlations and then apply a CNN [8, 9]. The second type splits the signal into amplitude-based and frequency-based content, then extracts features of each part in a format suitable for a CNN [10]. The third type projects the signal into a 2D format and extracts features from the images by mapping the pixels, recording the information as features for a non-CNN learner (SVM) [11]. Finally, there are review studies of the various Time-Series Classification (TSC) works and their observations about the University of California Riverside (UCR) archive. None of these approaches addresses our strategy of creating coded images with maximum contrast between classes; hence this work is truly novel in using the FIT and feature extraction algorithms for CNN use.

Hatami et al. [8] used Recurrence Plots to transform time-series into 2D texture images and then invoked a CNN classifier. They reported competitive results on the UCR archive compared to deep architectures and state-of-the-art TSC algorithms. Wang and Oates [9] proposed a novel framework to encode time series data as different types of images, namely Gramian Angular Fields (GAF) and Markov Transition Fields (MTF). Using a polar coordinate system, GAF images are represented as a matrix whose elements are the trigonometric sum of different time intervals. MTF images represent the first-order Markov transition probability along one dimension and temporal dependency along the other. They used Tiled CNNs on 12 standard datasets to learn high-level features from individual GAF, MTF, and combined GAF-MTF images. They reported results competitive with five state-of-the-art approaches.

Cui et al. [10] proposed a Multi-Scale CNN (MCNN) method, which incorporates feature extraction and classification in a single framework with a multi-branch layer and learnable convolutional layers. MCNN extracts features at different scales and frequencies. They made an empirical evaluation against various methods and benchmark datasets and claim that MCNN showed superior accuracy. Comparing an ordinary CNN with their method, they claim their system fared better, but they did not specify what CNN or what feature extraction method was used for the ordinary CNN; hence their conclusions are not solidly tested.

The approach of Azad et al. [11] transforms 1-D signals into 2-D grayscale images and extracts features by taking the pixel values to calculate and present energy as a gray image. They normalize by measuring the signals within the time intervals and use Empirical Mode Decomposition to remove low-frequency noise. They used the Segmentation-based Fractal Texture Investigation (SFTA) algorithm to create the feature vectors, and an SVM for classification. They claim their method preserves more information from the 1D signal compared to the correlation losses of other methods. Their accuracy is 88.57%, but they do not explain how their mapping of the signal preserves more correlations.

Dau et al. [12] review various TSC works and observations about the UCR archive sets. Chen et al. [13] contains standard datasets with an exact form or time duration for any class signal. The sets are rigidly constructed and divided into train and test sets that cannot be changed. They do not represent real-life data but are exact, with zero noise, for use in benchmark testing. The high achievements for some signals do not reflect a reliable measure of a method's strength or robustness in real applications, but can produce fair and reasonable rankings since they use the exact same standard. Hu et al. [14] challenge the assumption in TSC that the beginning and ending points of a signal or pattern can be correctly identified during training or testing, which can lead to over-optimism about algorithms' performance. They show that correctly extracting individual gait cycles, heartbeats, gestures, and behaviors is more difficult than classification. They propose a solution by introducing an alignment-free TSC that requires only weakly annotated data. They claim that extending machine-learning data editing to streaming/continuous data enabled building robust, fast and accurate classifiers. By testing real-world problems, they claim their framework is faster and more accurate than other solutions. Their work is related to our method in terms of testing on real-world data, in the sense that our method addresses their concerns implicitly through the contrasted transformed images.

3 Methodology

3.1 System Layout

Figure 1 shows the main configuration of the experiment scope. The first stage is data collection; we used the same data as provided in references [3] and [7]. Feature extraction can be accomplished by many means; for the biometric data, the FAST18 algorithm was used [2]. The FIT algorithm and its trainable parameters are described in Sect. 3.2.

Fig. 1. General system layout

The CNN receives the transformed data; learning and optimization are incorporated in the interface between the FIT and the CNN. In the following we provide some details of the various pipeline steps.

3.2 Features to Images Transformation (FIT)

The developed FIT algorithm has parameters that can be tuned for each dataset to maximize the features' contrast in the generated images. It transforms the signal features' values into gray-scale spatial gradients with different angles based on their amplitudes and exponents (Fig. 2). "Gradient" here refers to a gradual change of brightness within a 2D image from light to dark. The rotation hyper-parameters can be optimized based on the dataset. The constructed images are then fed to the CNN for classification; results during the learning stage can be used to calibrate the parameters of the FIT algorithm. Stochastic Gradient Descent with Momentum (SGDM) is used as the CNN optimizer. In addition, we studied the dependence of accuracy on the rotation hyper-parameters to get the best settings.

Mapping a machine-learning problem for efficient use with CNNs requires generating an image that effectively represents the features as a valid input. For this reason, a unique representation of the features is highly valued. To achieve uniqueness, one must identify the parameters that help distinguish features from each other. Once each feature has its own unique representation, the features are distributed on an N × N grid image. This way, the neural network can maximize learning of feature-to-class correlations. Moreover, since these features are grouped onto one grid, and given that a CNN passes a filter along the whole image, the filters will also learn some feature-to-feature correlations (e.g. when a filter covers portions of several features).

Computing Feature Map Generation Parameters. The first step in mapping features to images is to represent the feature value X with m significant digits and an exponent. We used m = 3, which provides all needed variations. Setting m to other values could be needed for complex datasets, but m = 3 covers a sufficient range of variations for our

purposes. We write a three-digit number X in the form X = ddd × 10^p (e.g. 45905 → 459 × 10^2). This representation is important to generate a unique combination of feature intensity I in the range [0–255] and rotation R in degrees in the range ±p·[0–360]. I and R are calculated using

I(x) = 255 − (x mod 255) if x > 255, otherwise x   (1)

R(x, p) = θ(p + (x − 255)/1000) if x > 255, otherwise θp   (2)

The free parameter θ defines the coded image rotation. If Rmaxf is the largest absolute rotation in degrees for a given feature, we define

Nf = |int(255 R / Rmaxf)|   (3)

as the multiplier of the cell border intensity. In this setting we have three defining quantized parameters with a wide range of values: I takes values 0–255, R values are coarse multiples of p plus a fine sub-range, and Nf takes values 0–255. The number of unique micro-image representations is 2 × range(Nf) × range(I) × coarse(R); in our settings the maximum is around 130,000·p. The coding has a wide range of possibilities and can be extended if needed. θ can be optimized to maximize classification accuracy, which generally requires training and testing results on the dataset.
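As a concrete illustration, the digit/exponent decomposition and the mappings of Eqs. (1)–(3) can be sketched in Python as follows; the function names are ours, not from the paper, and θ is passed explicitly:

```python
import math

def decompose(value, m=3):
    """Split a value into m significant digits ddd and an exponent p,
    so that value ~ ddd * 10**p (e.g. 45905 -> (459, 2))."""
    if value == 0:
        return 0, 0
    p = int(math.floor(math.log10(abs(value)))) - (m - 1)
    return int(round(abs(value) / 10 ** p)), p

def intensity(x):
    """Eq. (1): gray-scale intensity I(x) in [0, 255]."""
    return 255 - x % 255 if x > 255 else x

def rotation(x, p, theta):
    """Eq. (2): rotation R(x, p), scaled by the free parameter theta."""
    return theta * (p + (x - 255) / 1000) if x > 255 else theta * p

def border_multiplier(r, r_max):
    """Eq. (3): N_f = |int(255 R / R_maxf)|, the cell-border intensity multiplier."""
    return abs(int(255 * r / r_max))

ddd, p = decompose(45905)        # (459, 2)
I = intensity(ddd)               # 459 > 255 -> 255 - 459 % 255 = 51
R = rotation(ddd, p, theta=1.0)  # 1.0 * (2 + (459 - 255)/1000) = 2.204
```

Note how a single feature value yields the (I, R, Nf) triple that defines its micro image.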

Fig. 2. Example feature-vector representation

Generating the Features Image. Figure 2 shows how I and R are used to generate a feature micro image: a rectangle with a rotation of R degrees, an intensity I in the range [0–255], and a border with a normalized intensity. Other parameters that add valuable information are the sign of the feature itself and the sign of the rotation. The feature sign is added as a character in the top-left corner of the micro image, and a negative rotation sign applies a color inversion to the generated feature map, doubling the number of possible representations. A feature matrix for the instance is created by concatenating the micro images.
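A minimal NumPy sketch of this assembly step (our own illustrative implementation, not the authors' code; border and sign marks are omitted) renders each feature as a small light-to-dark gradient tile at angle R and peak intensity I, then concatenates the tiles into the feature matrix with unused cells left black:

```python
import math
import numpy as np

def micro_image(I, R_deg, size=16):
    """One feature tile: a light-to-dark gradient oriented at R_deg degrees,
    peak brightness I in [0, 255]."""
    ys, xs = np.mgrid[0:size, 0:size]
    d = xs * math.cos(math.radians(R_deg)) + ys * math.sin(math.radians(R_deg))
    d = (d - d.min()) / (d.max() - d.min() + 1e-12)  # normalize to [0, 1]
    return (I * (1.0 - d)).astype(np.uint8)

def feature_matrix(tiles, size=16):
    """Concatenate micro images into a dim x dim grid, dim = int(sqrt(#F)) + 1;
    excess cells stay black (zeros)."""
    dim = int(math.sqrt(len(tiles))) + 1
    grid = np.zeros((dim * size, dim * size), dtype=np.uint8)
    for k, tile in enumerate(tiles):
        r, c = divmod(k, dim)
        grid[r * size:(r + 1) * size, c * size:(c + 1) * size] = tile
    return grid

tiles = [micro_image(200, 30 * k) for k in range(126)]
img = feature_matrix(tiles)  # 126 features -> 12 x 12 grid of 16-px tiles
```

With 126 features, as in the biometric dataset, this yields the 12 × 12 layout described in Sect. 3.3.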

3.3 Designing a CNN

Given that the generated input differs from one dataset to another, we considered designing a neural network architecture that is adaptive to such variations. Figure 3 shows the CNN network architecture. It consists of 5 convolutional layers, each followed by a max-pooling layer with a stride of (1, 1) and a ReLU.

Fig. 3. A CNN architecture to classify generated images

We used a working empirical formula that was established from extensive optimizations; it is valid for datasets having fpr (the number of features per row) less than or equal to 12. For the input layer, the filter size is set to [3 × round(12/fpr)]² with a stride of (1, 1) × round(12/fpr). For the remaining layers, the filter size is [2 × round(12/fpr)]², always with a stride of (1, 1). The optimum CNN has a constant learning rate of 0.001, a mini-batch size of 64, and an SGDM optimizer. We tested and recorded results for several mini-batch sizes {16, 32, 64} with numbers of epochs {3, 5, 10}, respectively, and two values for the number of filters/layer, {32, 64}. An epoch is the cycle of iterations needed to complete the dataset training, and an iteration is the feed-forward/backpropagation process needed to adjust the weights using one mini-batch. Figure 2 also shows the feature matrix for an instance. The matrix dimension is dim = int(√#F) + 1, where #F is the number of features; in our case #F = 126, hence dim = 12. The excess matrix elements are blackened. 2D or 3D representations provide more flexibility for correlations between the features; we decided to use 2D for simplicity.

3.4 Optimizations

The parameter θ is a multiplier within the rotation angle of the generated gradient. By its definition, it does not have a preferred value for the best unique representation of features, hence we do not expect it to have a large influence on accuracy. In that sense our algorithm constructs a well-behaved form that preserves and adds correlations for the features. We do not expect to need optimizations beyond the CNN's standard SGDM [15].
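The filter-sizing rule of Sect. 3.3 can be written out directly; the helper below (our naming, a sketch rather than the authors' code) returns filter sizes and strides for a given fpr ≤ 12:

```python
def cnn_layer_plan(fpr):
    """Empirical filter/stride rule, valid for fpr <= 12
    (fpr = number of features per row of the feature matrix)."""
    assert fpr <= 12, "rule is only valid for fpr <= 12"
    k = round(12 / fpr)
    return {
        "input_filter": (3 * k, 3 * k),   # [3 * round(12/fpr)]^2
        "input_stride": (k, k),           # (1, 1) * round(12/fpr)
        "hidden_filter": (2 * k, 2 * k),  # [2 * round(12/fpr)]^2
        "hidden_stride": (1, 1),
    }

plan = cnn_layer_plan(12)  # biometric data: 126 features -> 12 per row
# input filter 3x3 with stride (1, 1); hidden filters 2x2 with stride (1, 1)
```

For a hypothetical dataset with only 4 features per row, the same rule would give a 9 × 9 input filter with stride (3, 3) and 6 × 6 hidden filters.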

Studying Error Dependence on θ. This method requires selecting a number of values for θ and, for each one, training a CNN to obtain the final accuracy. There is generally no functional form for error vs. θ available a priori (indeed there is none in our case), hence optimization requires constructing the function. Therefore, we first run the CNN with adequate cross-validation for θ values in (0–2π], then plot the error vs. θ. If θ is critical for data conversion, the initial function construction will show responses with a wide error range. If it has no significance beyond statistical fluctuations, the CNN will optimize regardless of the initial value. In this sense, measuring the dependence adds another layer of optimization when the conversion is sensitive to θ.
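This calibration loop can be sketched as follows; `train_and_eval` stands in for a full cross-validated CNN training run and is hypothetical:

```python
import math

def calibrate_theta(train_and_eval, n_points=12):
    """Sweep theta over (0, 2*pi] at n_points equally spaced values,
    record the cross-validated error for each, and return the best
    theta together with the full error curve."""
    thetas = [2 * math.pi * (i + 1) / n_points for i in range(n_points)]
    errors = {t: train_and_eval(t) for t in thetas}
    best = min(errors, key=errors.get)
    return best, errors

# toy stand-in: a nearly flat error curve with small fluctuations,
# mimicking the insensitivity to theta reported later in Sect. 5.5
best, curve = calibrate_theta(lambda t: 0.10 + 0.02 * math.sin(5 * t))
```

In practice `train_and_eval` would run the full FIT-plus-CNN pipeline under cross-validation for each θ and return the mean test error.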

3.5 Training, Configurations, and Ablation Settings We used SGD with Momentum (SGDM). We have built and tested 6 configurations for the number of filters/layer and mini-batches. The 4-fold Cross Validation (CV) datasets are split into 25% for Test1, the other 75% for Training, and half of Test1 for Test2. For 10-fold CV the Test is 10% and the other 90% is for Training; then repeat the run 10 times covering all portions for testing. Standard errors are generally small, and their training errors asymptotically decay to zero. We have applied the same settings with random shuffling of the dataset elements for the various runs covering the configurations, or the ablation studies. The number of iterations used are well beyond saturation, and as it turned out, most near optimum configurations needed around 60 iterations, while the weak ones required over a hundred. For the ablation studies we have three types of noise on the constructed images; Gaussian; salt and pepper, and speckle. In addition, we have applied different levels of Gaussian noise on the numerical features before using the FIT.

4 Datasets

4.1 Biometrics Data

Table 1 [3] shows the biometric dataset description. A total of 6 runs with nearly 100 instances each represented a reasonable range of values and statistical samples for training and testing the classifiers. The application we developed runs on Android devices and collects the sensor data when a fingerprint push is made on the scanner. The data is collected during the time span when the system is processing the authenticity of the fingerprint. We have the original raw signals and the features as constructed by our feature extraction algorithm, which is designed to provide highly contrasted features.

Table 1. Biometric dataset: run codes and settings. We used events from runs 1, 2 and 4: a total of 426 instances and 18 features/sensor. Class = 1 intentional; class = 0 forced. Pressing force ratio F01 = F0/F1; Fx = force for class x. Number of features: 126. Composition for all runs: 50% class = 0 and 50% class = 1. From [3].

Run code | Type | F0/F1 | Configuration
R1 | Classification | 2.5/1 | CP1: two different fixed pushes for the two classes to optimize contrast between the two cases for most sensors
R2 | Reproduction of R1 | 2.5/1 | CP1
R3 | Classification | 2/1 | CP2: two different pushes for the two classes; the push changes slightly during the application for each class. The push ratio is slightly less than CP1
R4 | Classification | 1.5/1 | CP3: two different pushes for the two classes; the push changes during the application for each class. The push ratio is a little less than R3
R01 | Calibration | 1/1 | CP4: push and positioning are fixed for all instances, but labeling is 50% class = 1 and 50% class = 0
R02 | Calibration | 2/1 | CP5: slightly different positioning and different pushes for the two classes

4.2 EEG Eye State Dataset

The second dataset has a large number of instances (Roesler 2019) [7], and the purpose is to classify whether the eye print was done correctly or not. The same procedure was applied and achieved similar CNN accuracy results. Table 2 shows the dataset description. The data comes from one continuous EEG measurement with the Emotiv EEG Neuroheadset; the duration of the measurement was 117 s. The eye state was detected via a camera during the EEG measurement and added to the file manually later, after analyzing the video frames. '1' indicates eye-closed, and '0' eye-open.

Table 2. EEG Eye State Dataset: ~15k instances. The number of attributes = 15 (14 for features and 1 for the class). From [7].

Property | Description
Dataset characteristics | Multivariate, Sequential, Time-Series
Attribute characteristics | Integer, Real
Associated tasks | Classification, two classes
Number of instances | 14980
Number of features | 15 (matrix of dimension 4)
Date donated | 2013-06-10

5 Results

5.1 Data Transformation Results

Figure 4 shows an instance demonstration from the biometric dataset. A feature vector of 126 features was created by extracting and concatenating FAST18 features from seven different sensors in the form of time-series readings. The vector was transformed by the FIT algorithm into a coded image matrix that was fed to a trained CNN, which classified it as not forced.

Fig. 4. Instance demonstration: the image representation and the CNN classification, from the biometric dataset

5.2 Six Different Hyper-parameter Configurations

We ran experiments with six hyper-parameter configurations (variable filters/layer and mini-batch settings). Figure 5 shows the learning curves (classification error vs. iteration). The first two configurations are not saturated, while the other four are. All six configurations used exactly the same set of data and reshuffling, reflecting the configuration and statistical contributions only. The results are generally good, and saturation is reached fairly early. These experiments were applied only to the fingerprint biometric dataset given its small size and the nature of its instances. Each experiment was run using 10-fold cross validation. The EEG dataset was tested using a single configuration (1024 batch size and 32 filters, Sect. 5.4).

5.3 Ablation Study

After training, we added noise to the data. Figure 6 shows the impact of four types of noise, with training on noiseless data only. Data-noise refers to noise applied to the numerical features before the FIT is used to generate images, while the other three were applied to the images after transformation. The degradation is about 5% for noise on the signal, while on the images it went up to 20–30%. These effects do not impose a serious impact on the robustness of the main algorithm, because what is critical is noise on the signals.

Fig. 5. Classification accuracy percentage (mean ± SD) for the 6 configurations on the biometric dataset. Results are stable and the variation between them is within the statistical error. A fold is 10% of the 426-instance dataset. 10-fold cross validation, learning rate = 0.001, train = 90%, test = 10%.

Fig. 6. Training on noiseless data, 64/32 config. 10-Fold CV

Figure 7 shows the results after applying the same four noise types, with training on both noiseless and noisy data. The noise training helps accuracy compared to Fig. 6, with little improvement for data-noise and the highest improvements for the image noise types.

Fig. 7. Training on noiseless and noisy data. 64/64 config. 10-Fold CV

On the other hand, it is important to note the drop in standard deviation from Fig. 6 to Fig. 7. This indicates a significant increase in classification precision after including noisy data in the training. Therefore, we conclude the results are consistent when training on noise: while it did not dramatically affect the accuracy mean on noiseless data, it certainly improved precision. Figure 8 shows the results after applying the same four noise types and testing their effects individually. Four training and testing sessions were conducted, one for each noise type. Data-noise had the least effect on accuracy since the SNR was high enough and the CNN was trained only on noisy instances. As for the image noise types, the results are consistent with the Fig. 7 experiment.

Fig. 8. Training on noisy data only, 64/32 config. 4-Fold CV

We also made a fourth test by training on a mixed set of all noise types together with the original data. The accuracy dropped more than when training on a single noise type. Noiseless data classification was also affected negatively, suggesting the complexity of using many noise types at once in training. We should note that the original signals are real-life measurements and carry certain amounts of error, which was more pronounced in this test. Figure 9 shows the classification error vs. the noise-to-signal ratio for data-noise, where training is done on noiseless data only. The error increase resembles a log function and nearly saturates at around 25% when the ratio reaches about 20%. This is impressive because the global structure of the image can still be seen even if the noise ratio is 1. This demonstrates the power of vision vs. numbers.

Fig. 9. Error vs. noise-to-signal ratio. The noise is applied to the original signal, then the test is made. Training is on the clean signal only.

5.4 Testing EEG Eye State Dataset

Figure 10 shows the results of the EEG Eye State dataset. We selected near-optimal settings, where the learning curve was stable after 60 iterations. The standard deviation is smaller as a result of the larger number of instances. The end results are comparable with the biometric data, taking into account the provided feature content of the dataset. The raw signals of this set are not available, which did not allow us to see the impact of applying our FAST18 to the EEG data for the DNN. The point is that the FIT algorithm provided good contrast and high CNN accuracy for both datasets. On the other hand, the features of the EEG set are not well separated and the DNN suffered, while the CNN kept almost the same accuracy; the FIT algorithm provided enough contrast through the image representation. Figure 11 shows the scatter of the feature values for both sets.

Fig. 10. CNN classification accuracy percentage (mean ± SD) for the EEG eye state dataset; 1024/32 configuration, 4-fold CV, train = 75%, test = 25%. DNN accuracy mean: 57.7% ± 1.8.

Fig. 11. (top) Feature values for the two classes of the biometric data are well separated due to a good FE algorithm. (bottom) Normalized feature values for the two classes of the EEG data are not well separated, resulting in low DNN accuracy, while there is no negative effect on the CNN due to the FIT solution.

5.5 Testing Dependence on θ

We made a number of test runs as outlined below. Our initial expectation was that the error vs. θ curve is almost flat: there is little to expect from changing θ, since the FIT algorithm is designed to give a unique representation for all features through the versatility of the transformation variables, independent of θ. We sought to confirm this empirically; to ensure stability, we made a 4-fold cross-validation CNN run using 12 different θ points covering the range (0–2π]. Figure 12 shows the error vs. θ. The results are consistent with the assumption, and the curve indicates accuracy is not sensitive to θ. The error range goes from 8% to

12%, and the data statistical error is around 3%; the variations mainly reflect statistical errors. This leads to the conclusion that we can calibrate the behavior and select a stable minimum. Since we have to perform these calibrations before the final experiments, it seems natural to always plot such a curve and select the best operating range. On the other hand, if the angular parameter were critically unstable for the operation, with a variation impact significantly beyond the statistical range, we could always restrict the calibration to operate over stable regions only. Our FIT algorithm is powerful, reliable, and stable with minimal variations.

Fig. 12. Error vs. θ for the biometric CNN data. The variations are within sigma. We used θ = 1 rad, which shows the least error. From 4-fold CV.

6 Discussion and Conclusions

In summary, we have utilized CNNs to classify signals that are hard to classify with CNNs directly. The main point is to transform raw data signals into feature images; the CNN is highly capable of image classification, hence by doing the transformation we utilize the best of the CNN. The FIT algorithm has some tunable parameters in the case of data dependence: the system learns the best parameter values for the image transformation, and from there the CNN takes the best classification path.

The ability of the CNN system to achieve comparable accuracies for two different datasets, with different feature extraction methods that had an extreme impact on DNN classification, suggests that the FIT method is stable and powerful. It means in principle we can take real-life, variable datasets (an efficient feature extraction algorithm is better but not critical), apply the FIT algorithm, and the CNN will perform well. In that sense, we think testing many diverse datasets with a large number of classes is needed to find the limits of the procedure. In the reported tests, the accuracy limit reflects the nature of the data more than the system accuracy. The comparison shows that time and resource costs are fair compared to other methods, which is an additional point for FIT-CNN utilization.

The ablation studies showed that the method is not sensitive to a wide range of noise levels. This is mainly because the image represents a global structure, and a small variation does not change the overall indicators. This is a very strong point

for extending the process to its maximum limits. It supports our main vision that image processing can make recognition much easier and more reliable; the main task is how to design the transformation that gets the best out of the CNN and SGDM. Applying different noise levels over the coded images is more complicated, but not critical: what usually matters is the noise on the signals. Still, this sensitivity can be utilized to create augmented sets for small datasets without the need to generate noisy copies of the originals. One only needs to add images with a little noise that does not harm the selection efficiency, and a statistical enhancement is obtained.

We are working on extending the testing to other diverse datasets. It is natural to extend it to data with multiple classes; in principle, we do not anticipate this would be a problem, but we would rather test the limits of the FIT algorithm's uniqueness capabilities. The FIT algorithm can handle very large fluctuations while still providing a unique representation. The maximum number of unique representations for micro images is about 1.3 × 10^5 · p, where p is the exponent in the 3-digit representation. The combination with the micro images of other features extends the limit further. Theoretically the limit is not exhaustible, and we expect the practical limit to remain high, with the ability to manage feature classes in the thousands, with stable capabilities, through the 2D capabilities of the CNN. We expect applications with good success to a wide range of signal-based feature extraction processes, and our transformation algorithm can provide maximum uniqueness for instance representations. The configuration of features as images provides an advantage for the CNN compared to numbers only for other deep learners.

This work shows the capacity of our novel FIT algorithm to extend CNN applications to cover different datatypes including time-series classification, medical signals, high-energy physics experiments, big-data mining, and speech recognition, where classification is complex. We conclude with some notes. In separate work [2] we have shown that FIT + CNN compares favorably with many other classifiers, such as DNNs applied to non-temporal features extracted from the raw data. In future work, FIT could be compared with other models such as Long Short-Term Memory (LSTM) networks that are designed to operate on raw time-series data rather than extracted features. However, there is no mathematical limitation of FIT to time-series data, so it could also be tested on other types of datasets to empirically demonstrate its generality. We hypothesize that on other types of data, FIT will be similarly insensitive to parameter tuning, as observed here, and that could be verified empirically as well.

Acknowledgments. This work was thoroughly and critically reviewed, evaluated, and the manuscript corrected by Professor Salman M. Salman from Alquds University.

References
1. Schmidhuber, J.: Deep learning in neural networks: an overview. Neural Netw. 61, 85–117 (2015). https://doi.org/10.1016/j.neunet.2014.09.003
2. Salman, A.S., Salman, O.S.: Spoofed/unintentional fingerprint detection using secondary biometric features. In: SAI Computing Conference, London (2020)


3. Salman, O., Jary, C.: Frequency and amplitude based series timed signals 18 features extraction algorithm (FAST18), pattern classification project report. SCE, Carleton University, Spring 2018
4. Rish, I.: An empirical study of the Naive Bayes classifier. T.J. Watson Research Center, 30 Saw Mill River Road, Hawthorne, NY 10532 (2001)
5. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995). https://doi.org/10.1023/A:1022627411411
6. Hall, M.A.: Correlation-based feature selection for machine learning. Ph.D. thesis, Department of Computer Science, University of Waikato, Hamilton, New Zealand (1999)
7. Roesler, O.: EEG Eye State Dataset. Baden-Wuerttemberg Cooperative State University (DHBW), Stuttgart (2013)
8. Hatami, N., Gavet, Y., Debayle, J.: Classification of time-series images using deep convolutional neural networks. In: Proceedings of the Tenth International Conference on Machine Vision. International Society for Optics and Photonics, Vienna (2017). https://doi.org/10.1117/12.2309486
9. Wang, Z., Oates, T.: Encoding time series as images for visual inspection and classification using tiled convolutional neural networks. In: Trajectory-Based Behavior Analytics: AAAI Workshop 2015 (2015)
10. Cui, Z., Chen, W., Chen, Y.: Multi-scale convolutional neural networks for time series classification (2016). arXiv preprint arXiv:1603.06995 [cs.CV]
11. Azad, M., Khaled, F., Pavel, M.I.: A novel approach to classify and convert 1D signal to 2D greyscale image implementing support vector machine and imperial mode decomposition algorithm. Int. J. Adv. Res. (IJAR) 7(1), 328–335 (2019). https://doi.org/10.21474/IJAR01/8331
12. Dau, H.A., Bagnall, A., Kamgar, K., Yeh, C.M., Zhu, Y., Gharghabi, S., Ratanamahatana, S.A., Keogh, E.: The UCR time series archive (2018). arXiv preprint arXiv:1810.07758 [cs.LG]
13. Chen, Y., Keogh, E., Hu, B., Begum, N., Bagnall, A., Mueen, A., Batista, G.: UCR Time Series Classification Archive (2015). www.cs.ucr.edu/~eamonn/time_series_data/
14. Hu, B., Chen, Y., Keogh, E.: Time series classification under more realistic assumptions. In: Proceedings of the 2013 SIAM International Conference on Data Mining, Austin, Texas (2013). https://doi.org/10.1137/1.9781611972832.64
15. Bergen, K., Chavez, K., Ioannidis, A., Schmit, S.: Distributed Algorithms and Optimization. CME 323 Lecture Notes, Institute for Computational & Mathematical Engineering (ICME), Stanford University, CA (2015)

MESRS: Models Ensemble Speech Recognition System Ben Zagagy(B) and Maya Herman Department of Mathematics and Computer Science, The Open University of Israel, Ra’anana, Israel [email protected], [email protected]

Abstract. Speech recognition (SR) technology is used to recognize spoken words and phonemes recorded in audio and video files. This paper presents a novel method for SR, based on our implementation of an ensemble of multiple deep learning (DL) models with different architectures. Contrary to standard SR systems, we ensemble the most commonly used DL architectures and combine their outputs with dynamic weighted averages in order to classify audio clips correctly. The models are trained after converting the audio signals from the audio space into the image space; the converted images serve as the training input for the models. This way, most of the default parameters originally used for training models on images can also be used for training our models. We show that the combination of space conversion and model ensembling can achieve high-accuracy results. This paper has two main objectives: the first is to extend the DL process to the audio space; the second is to present a new platform for ensembling deep learning models using weighted averages. Previous works in this field tend to stay in the comfort zone of a single DL architecture, fine-tuned to capture all edge cases. We show that applying a dynamic weighted average over multiple architectures can improve the final classification results significantly. Since models that classify high-pitch audio well might not be as good at classifying low-pitch audio and vice versa, we harness the capabilities of multiple architectures in order to handle the various edge cases. Keywords: Data mining · Deep learning · Ensemble classifier · Speech recognition

1 Introduction Speech recognition is the ability of a device or program to recognize words in spoken language and convert them into text. The most common speech recognition applications include turning speech into text, voice dialing, and voice search. Although some of these applications work properly for end users, improvement is still needed in several respects: speech is sometimes difficult to detect due to variations in pronunciation, speech recognition performs poorly for most non-English languages, and background noise must be filtered. All these factors can lead to inaccuracies; hence speech recognition is still an interesting area of research.
© Springer Nature Switzerland AG 2020 K. Arai et al. (Eds.): SAI 2020, AISC 1229, pp. 214–231, 2020. https://doi.org/10.1007/978-3-030-52246-9_15


The goal of identifying voice commands is to enable people to communicate naturally and efficiently with machines, as well as with other people who do not speak the same language. A voice recognition process, also known as Automatic Speech Recognition (ASR), converts voice signals into words. Using fully connected deep neural networks (DNNs), deep learning methods have been successful in facing the problem of speech recognition [1–11]. As shown in [12] and [10], over the past few years DNN-based systems used by multiple research groups have provided speech recognition solutions with significantly higher accuracy for continuous phone and word recognition than the earlier GMM-based systems [13] could provide. Advanced deep learning techniques that have been successfully applied to the speech recognition problem include recurrent neural networks (RNNs) [5, 8, 11, 12] and convolutional neural networks (CNNs) [9, 14–16]. In recent years, a new cutting-edge method called "deep learning" has emerged and made a huge impact upon the entire computer vision community. Since they first appeared, deep neural networks and mixed neural networks have achieved results of high quality and accuracy in many computer vision problems, even those research teams considered the most difficult to crack. Deep learning is, roughly, a type of neural network consisting of several layers. When applied to computer vision problems, such networks are capable of automatically finding a set of highly expressive features. Based on empirical results, in many cases these features have been shown to be better than manually engineered ones for solving the given problem. This methodology has another great benefit: there is no need for researchers to design these features manually, since the network is responsible for that. In addition, the features learned by deep neural networks can be considerably abstract.
An effective way to train a deep neural network model on new input is to take an existing, already-trained network and retrain its weights on the new training input files. This approach is called Transfer Learning [5]. In the groundbreaking article "CNN Features off-the-shelf: an Astounding Baseline for Recognition", published by researchers in Stockholm, it was shown that a network that has learned to classify a particular problem can undergo a simple basic conversion process and adapt to a different classification problem. This approach succeeds because a deep neural network contains many layers of abstraction and modeling of the received input. These layers focus on different, partial features of the original image. These features are determined by the architecture of each network and are not always visible to the human eye. Focusing on these features helps the network improve the quality of the classification it returns. The prevalent approach among deep learning researchers, reinforced, among other things, in the above article, is that the abstraction in the various layers of the neural network is generic and holds for problems from other spaces as well. The implication for the system built for our project is that deep neural networks trained for purposes other than voice signal classification can be used to classify voice signals - and this is how the system was built.
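A minimal PyTorch sketch of this transfer-learning idea: freeze a pretrained feature extractor and retrain only a new classification head. The layer sizes, the tiny stand-in backbone, and the 12-class output are illustrative assumptions, not the networks actually used in this paper.

```python
import torch
import torch.nn as nn

# Illustrative stand-in for a pretrained feature extractor; in practice this
# would be the convolutional layers of a network such as VGG or ResNet.
backbone = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
for p in backbone.parameters():
    p.requires_grad = False  # keep the "pretrained" weights fixed

head = nn.Linear(8, 12)      # new classifier head: 12 classes (assumption)
model = nn.Sequential(backbone, head)

# A batch of 4 spectrogram-like single-channel images.
out = model(torch.randn(4, 1, 64, 64))
print(out.shape)
```

During fine-tuning, only the parameters of `head` receive gradient updates, which is what lets a network trained on one image-classification problem adapt cheaply to another.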


The implication for the system suggested in this paper is that deep neural networks trained for purposes other than voice signal classification can be used to classify voice signals in the audio space. The purpose of this paper is to present an effective method for speech recognition using an ensemble deep learning technique. Contrary to the basic approach presented in the literature, where one highly tuned DNN is trained to classify all spoken words, we train multiple deep learning neural networks with various architectures in order to cover all the different edge cases, utilizing the different architectures' built-in properties. Eventually the system classifies a specific audio clip with a weighted average performed on top of the trained networks' classifications, in order to retrieve the most likely classification result. An implementation that considers the classifications of multiple architectures instead of just one is more robust: it can cover edge cases that a single architecture cannot (since models that classify high-pitch audio well might not be as good at classifying low-pitch audio and vice versa). The paper is organized as follows: • Section 2 overviews our methodology for solving the problem. • Section 3 provides concrete platform implementation details. • Section 4 presents results of a real ensemble of deep learning models trained to classify human speech. • Section 5 provides conclusions and a discussion of future improvements and extensions.

2 Methodology The developed system consists of two core components: an offline batch process for training the system and an online real-time process for producing test-data classifications. The offline batch process is responsible for data retrieval and data preprocessing, including converting the signals into images in spectral space. Deep learning models are then applied to find the most promising models. Finally, the models are deployed to a shared location, to be used during the online process. 2.1 Offline Process The offline process is responsible for generating and storing deep learning neural networks with various architectures, to be used during the online process. The offline flow, shown in Fig. 1, describes data retrieval from audio files in the sound space. The audio data is then preprocessed into data in the image space in the form of Mel spectrograms (as shown in Fig. 2), to be loaded as input for the deep learning models' training mechanism. The training mechanism generates the output of the offline batch by creating a series of deep learning models, to be used later by the online flow. The offline batch process can (and should) be executed on a separate, powerful computer, regardless of the machine on which the online process will be executed.


Fig. 1. The offline process flow

Fig. 2. An Example for the Mel Spectrogram image generated from the word “Down”

Data Source. The data sources used for the system contain more than 60,000 audio files, each containing different words spoken by different people. All files are labeled according to the spoken words within them. The labeled audio folders contain 1 s clips of voice commands, with the folder name being the label of the audio clip. This collection contains 31 words, including: "bed", "bird", "cat", "dog", "down", "eight", "five", "four", "go", "happy", "house", "left", "marvin", "nine", "no", "off", "on", "one", "right", "seven", "sheila", "six", "stop", "three", "tree", "two", "up", "wow", "yes", "zero". This data source was taken from TensorFlow's speech commands data [17] and


was collected using crowdsourcing [18]. The files contained in the training audio are not uniquely named across labels, but are unique within their labeled folder. Data Preprocessing. Prior to the training phase for labeled input data, or to the classification phase for non-labeled data, a conversion is performed: the audio files' content is converted into the image space by generating Mel spectrogram images from the audio clips. The conversion uses the open-source Python library LibROSA [19]. Deep Learning. Deep learning neural network models are created using the preprocessed data; their classification output will be used during the online phase. The better and more accurately the system is trained on the training files, the more accurate the system's classification of the input data during the online phase.

2.2 Online Process The online process is responsible for classifying the content of a given audio input (in other words, returning the output classification of our speech recognition system). The online flow, shown in Fig. 3, describes data retrieval and the setting of weights and impact for each of the models produced during the offline stage. It then describes data preprocessing, including conversion of the given input from the audio space to the image space. Each of the deep learning models generated during the offline phase produces its own classification for the given input; the majority voting component then performs a weighted average on the identification results, using the given models' weights, thus generating a single classification for a given audio input. Note that the online phase is performed "live" on the user's machine and is not machine dependent. Data Source. The online process receives two main inputs. The first is the set of weights given to each of the models trained during the offline phase (the weights sum up to 100); these weights are used during the majority voting step to create a weighted average. The second input is an audio clip that the system is required to classify. Data Preprocessing. Prior to classification of the given input audio clip, a conversion is performed, similar to the one performed during the offline process, in which audio files were converted from the audio space into the image space in the form of Mel spectrograms. Majority Voting. This component is responsible for producing the system's final classification, which is obtained by performing a weighted average over the system's models' classifications according to their configured weights.
Since the system contains more than one neural network model, and since different models can produce different classifications, obtaining a single classification for a specific input audio file requires calculating a weighted average of the classification results produced by the various models in the system. The calculation considers the different weights (percentages) of the various neural network models when setting the system's final classification of a given audio input file.


Fig. 3. The online process flows

3 Implementation Our speech recognition system relies on several modules. There was a need to separate the offline and online processes, so they can run on different machines at different times. Most of our code was written in Python, as it has multiple free and easy-to-use audio-to-image conversion libraries; the same goes for the deep learning frameworks for training models. We chose LibROSA [19] as our audio conversion library and PyTorch [20] (by Facebook) as our deep learning framework. Using PyTorch we created six deep learning neural network models, based on the following architectures: DenseNet [21], ResNet [22], SeNet [23], VGG [24].


3.1 Architecture Components The Models Ensembled Speech Recognition System (MESRS) architecture is described in Fig. 4; its components are:

Fig. 4. MESRS Architecture

Training Data Storage. This module stores all information and data required for training the deep learning neural networks. It contains labeled audio files in "wav" format, categorized according to the spoken word contained in them. Test Data Storage. This module stores all information and data required for testing the system. It contains non-labeled audio files in "wav" format. Audio to Image Conversion Service. This service converts wav file content into image format. The conversion is performed using LibROSA, an open-source Python library that converts an audio file into a Mel spectrogram image. Training Service. This service is the heart of the system. In it, the various models are trained on input data from the training files. The neural network models created in this process are activated during the classification phase. The better and more accurately the system is trained on the training files, the more accurate the system's classification of the testing files will be. The "PyTorch" deep learning training library, by Facebook, is widely used in this service. Models Storage. This module stores the models created during the training phase. These models are in the standard PyTorch model format (with a "pth" extension), and in later stages they are used to classify the various test files.


CSV Generator Service. The purpose of this service is to produce CSV files (one file per built neural network model) containing the classification results of each network for each test file. Each CSV file contains as many rows as there are test files, and each row contains two columns: the first holds the test file name, and the second holds the current model's classification result for that file. For example, for the model "vgg1d_mel", a file called "vgg1d_mel_predictions.csv" is generated; its content contains a classification for every test file. For instance, lines 60 and 61 might hold the following information: "clip_0018ff8e9.wav, Up", "clip_001a5ce9c.wav, Stop", meaning that for the test file clip_0018ff8e9.wav, the model called vgg1d_mel classifies the word "up". Models Classifications Storage. Here the classification results are stored for each of the built models (where each model has a corresponding CSV file). Majority Voting Service. The purpose of this service is to obtain the system's final classification from the classifications of the various test files. The need for this service stems from the system architecture: since the system contains more than one neural network model and different models can produce different classifications, obtaining a single classification for a specific test file requires a weighted average of the classification results of the various models in the system. Verification Service. The purpose of the verification service is to compare the final classification results obtained from the system for each test file with the correct identification results for each test file. This service returns a percentage score between 0 and 100 that describes the quality and correctness of the system's results. UI Client.
The UI interface is designed to load the different weight percentages for each model, which are entered as input to the Majority Voting service. After receiving the results from the Majority Voting phase, the interface shows the user the obtained result, characterizing the model after the system's results are compared with the correct results during the verification phase. The interface is written in HTML and communicates via the Flask framework with the project's Python libraries.
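The per-model CSV output described for the CSV Generator Service can be sketched with Python's standard csv module. The file names and labels below are illustrative only; the real service writes one such file per trained model to the models-classifications storage.

```python
import csv
import io

# Hypothetical classifications produced by one model for a few test files.
predictions = [
    ("clip_0018ff8e9.wav", "Up"),
    ("clip_001a5ce9c.wav", "Stop"),
]

# Each row: test-file name, then that model's classification for the file.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerows(predictions)

# In the real service this buffer would be written to a file such as
# "vgg1d_mel_predictions.csv" (name pattern per the text above).
print(buf.getvalue())
```

Writing through the csv module (rather than joining strings by hand) keeps quoting correct if a label or file name ever contains a comma.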

3.2 The System’s Training Phase Figure 5 describes a sequence diagram of all the events during the system’s training phase. The person responsible for creating the system sends all the training files in their initial form (“wav” audio files) to the service that converts them into image files, to be used to train the deep learning system. The generated information is then sent to the training service, whose role is to produce models using Facebook’s PyTorch framework.


Fig. 5. MESRS training phase sequence diagram

When the training phase ends (as mentioned above, this step can take a long time, from hours on a powerful computer to days or weeks on a home computer), the newly created model is stored in a folder containing the system's model pool. Following various attempts, we concluded that in order to obtain the best system score, a golden number of six different neural network models is needed. Following the creation of the six neural network models as described, the system produces six CSV files containing the classifications of each model. First, the system converts the test files the same way it converted the training files earlier (from "wav" audio files to a format that the system's deep network models can classify, i.e., image file format). The system then goes over each test file and, for each of the system's neural network models, creates that model's own CSV file. Each CSV line contains the test file name and the classification the model gave it, from among the words the neural network model can classify. The CSV files for the various models are stored in a folder used during the next phase of system classification. An example of a few rows from the CSV file created for the VGG1D model, which classifies Mel spectrogram images, is shown in Table 1.


Table 1. An example for different classifications between the system's models.

Classification | Test file name
Unknown | clip_00147bbb6.wav
Yes | clip_0014ea384.wav
No | clip_0014ed3d5.wav
No | clip_0014f2f18.wav
Unknown | clip_00150496f.wav
Unknown | clip_0015fa156.wav
Up | clip_00169a7f7.wav
Go | clip_0017365f5.wav
Unknown | clip_0017714af.wav

3.3 The System’s Query Phase Figure 6 describes a sequence diagram for all events during the system’s query phase.

Fig. 6. MESRS query phase sequence diagram

A system query is performed using the user interface. In this interface, the user determines the weight percentages for each of the six classifications created by the six deep learning neural network models. After the weight percentages are determined, they are sent to the Majority Voting service, which performs a weighted average based on the classification results and the user-entered data. The Majority Voting service returns the classification obtained from the data for each of the testing files. At this point, the received classifications will be


sent to another service, which compares these classifications to the correct classifications of the testing files. After the comparison is done, this service returns to the user a final score for the chosen weights, relative to the correct classification results obtained in previous work. The system then calculates an overall grade according to the method shown in Eq. (1):

Grade = ( Σ_{i=0}^{N−1} match(i) ) / N    (1)

where N is the total number of test data files and match(i) = 1 if SystemClassification[i] = CorrectClassification[i] and 0 otherwise. Each test file whose system classification matches the original classification contributes to the final calculation: the overall score is the number of times the system's classification was identical to the correct classification, divided by the total number of files to be classified. This way a score between 0 and 100 can be quantified to reflect the accuracy of the system.
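The grading step of Eq. (1) amounts to a simple match-counting accuracy, scaled to 0-100. A minimal sketch is below; the file names and labels are illustrative, not taken from the actual dataset.

```python
def grade(system_classifications, correct_classifications):
    """Return the percentage of files whose system classification matches ground truth."""
    assert len(system_classifications) == len(correct_classifications)
    hits = sum(
        1 for s, c in zip(system_classifications, correct_classifications) if s == c
    )
    return 100.0 * hits / len(system_classifications)

# Hypothetical predictions and ground truth for four test files.
predicted = {"a.wav": "Up", "b.wav": "Go", "c.wav": "Stop", "d.wav": "Yes"}
truth     = {"a.wav": "Up", "b.wav": "Go", "c.wav": "Stop", "d.wav": "No"}

files = sorted(predicted)
score = grade([predicted[f] for f in files], [truth[f] for f in files])
print(score)  # 75.0 (3 of 4 files match)
```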

4 Experimental Results 4.1 Experiment Phases Our experiments comprise four main stages. We begin by training deep learning neural networks (colored blue in Fig. 7). Then we classify the testing data files using the trained neural networks (colored orange in Fig. 7). Afterwards we perform a weighted average on top of the testing-data classifications (colored yellow in Fig. 7) in order to retrieve a single system classification for each testing file. Finally, we compare the system's classifications to the correct classifications of the testing files (colored green in Fig. 7) in order to retrieve an overall grade for the system's classifications. Train Deep Neural Networks. Colored blue in Fig. 7, this phase consists of two main parts: the first is the conversion of the labeled audio clips designated for training the system from the audio space to the image space in the form of Mel spectrograms; the second is the actual training of neural networks, using deep learning techniques, on top of the converted audio clips. The outputs of this action are six neural network models, generated using deep learning, that can classify audio clip content (following its conversion into the image space) into one of the following options: Yes, No, Up, Down, Left, Right, On, Off, Stop, Go, Silence, Unknown. Classify Testing Data Using the Trained Networks. Colored orange in Fig. 7, this phase consists of two main parts: the first is the conversion of the non-labeled audio clips designated for testing the system from the audio space into the image space, in Mel spectrogram form; the second is retrieving and saving the classifications of the neural network models (generated in the previous phase) for the Mel spectrograms of


Fig. 7. The complete system flows

the testing audio files. The outputs of this action are the classification results of each neural network model generated in the previous phase, for each of the testing files. Perform Majority Voting and Retrieve a Single System Classification Per Testing File. Colored yellow in Fig. 7, in the third phase we perform a weighted average over all the different models' classifications for each of the testing files. The outputs of this action are the system's overall classification results for each testing file. Following this action, there is only one classification per testing file (unlike the output of the second phase, in which each testing file had six different classifications, according to the number of trained models). Comparison Between the System's Classifications and the Correct Classifications. Colored green in Fig. 7, in the fourth phase a comparison is made between the MESRS system classification results obtained during the third phase and the correct results received from previous work. The output of this phase is a final classification grade that represents the correctness of the entire system.


4.2 An Experimental Ensemble-Learning Paradigm The uniqueness of MESRS is in its use of an ensemble of very different deep learning neural network architectures. Thus, it can cover many edge cases that a single neural network architecture cannot cover. The ensemble of the deep learning neural network models is performed by assigning different weights (percentages) to the different neural network models and applying a weighted average over their classifications in order to select the final classification of each test file in the system. For example, given the following data: 1. An audio file named "sample.wav". 2. Six neural network models based on the VGG, ResNet, SeNet and DenseNet architectures. 3. Weights, in percentages, for the above models, as shown in Table 2.

Table 2. An example of weights distribution between the system's models.

DL model architecture | Weight of model in the final classification decision
VGG-1D on Raw Data | 30%
VGG-1D on Mel Spectrogram images | 10%
VGG-2D on Mel Spectrogram images | 15%
DenseNet on Mel Spectrogram images | 15%
SeNet on Mel Spectrogram images | 15%
ResNet on Mel Spectrogram images | 15%

An example of the different classifications that could be received from the above models for the sample file is shown in Table 3.

Table 3. An example for different classifications between the system's models.

DL model architecture | Classification for "sample.wav"
VGG-1D on Raw Data | Up
VGG-1D on Mel Spectrogram images | Go
VGG-2D on Mel Spectrogram images | Go
DenseNet on Mel Spectrogram images | Go
SeNet on Mel Spectrogram images | Up
ResNet on Mel Spectrogram images | Up
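The weighted vote in this worked example can be sketched in a few lines of Python. The model identifiers below are assumed short names for the six architectures; the weights and votes follow the hypothetical "sample.wav" example above.

```python
from collections import defaultdict

# Each model's classification is weighted by its configured percentage,
# and the label with the largest weight total wins.
weights = {"vgg1d_raw": 30, "vgg1d_mel": 10, "vgg2d_mel": 15,
           "densenet_mel": 15, "senet_mel": 15, "resnet_mel": 15}
votes = {"vgg1d_raw": "Up", "vgg1d_mel": "Go", "vgg2d_mel": "Go",
         "densenet_mel": "Go", "senet_mel": "Up", "resnet_mel": "Up"}

def weighted_vote(weights, votes):
    totals = defaultdict(int)
    for model, label in votes.items():
        totals[label] += weights[model]
    return max(totals, key=totals.get)

print(weighted_vote(weights, votes))  # Up (60% vs. 40% for Go)
```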


The classification obtained from the weighted average of the above results is "Up", since this classification holds 60%, compared with the "Go" classification, which holds only 40%. Choosing the Right Architectures. These are the criteria for choosing deep learning neural network architectures for MESRS: 1. They have been proven in the past. 2. They have a wide pool of online support and documentation. 3. They have been able to classify images with high accuracy. The architectures selected for MESRS were VGG [24], SeNet [23], DenseNet [21] and ResNet [22]. Since all the carefully selected networks for the MESRS implementation are commonly used in the deep learning world, pre-prepared implementations of them exist in PyTorch. To match the implementation of the above architectures in Python, one can inherit from the PyTorch model object (torch.nn.Module) [25]. Table 4 shows the models' classification grades when a single architecture is used to classify the testing audio clips. Note that the grades are verified against the correct classifications of the audio clips dedicated for testing.

Table 4. Single architecture classification results (without models' ensemble)

Model | Grade (out of 100)
VGG-1D on Raw Data | 91.55
VGG-1D on Mel Spectrogram images | 90.71
VGG-2D on Mel Spectrogram images | 92.04
DenseNet on Mel Spectrogram images | 92.13
SeNet on Mel Spectrogram images | 92.05
ResNet on Mel Spectrogram images | 89.93

Table 5 holds the ideal weights, i.e., those that produce the best result for the system's six trained models when compared with the correct classifications of the audio clips dedicated for testing. Audio Clips Classification Using Ensemble of Neural Networks Models. As shown in Fig. 8, a simple user interface was created to help calculate the different grades given by MESRS for different model weights. The grade that the MESRS implementation received using the above weights over the entire testing data was 94.03. We see that there is no single dominant model; only the combination of the predictions from the different models yields the optimal result. Moreover, when examining the scores of the single models (for each of the six models), the highest score is 92.04, almost two percent below the optimal score obtained by using all models together. It can be thought

Table 5. The ideal weights distribution for the system's models

Model | Weight
ResNet on Mel Spectrogram images | 15
SeNet on Mel Spectrogram images | 15
DenseNet on Mel Spectrogram images | 15
VGG-2D on Mel Spectrogram images | 15
VGG-1D on Mel Spectrogram images | 10
VGG-1D on Raw Data | 30

that those 2% do not justify the "complexity" of the MESRS system, but this is not so: 2% of the test files (158,539 files) is roughly 3,170 files, meaning that by using six models and the elaborate weighted-average mechanism, the system achieves about 3,170 more correct identifications than it would achieve without them. This is a very large number of correct identifications that would otherwise be lost; in a world where neural network models can be deployed in critical systems, so many correct identifications are essential. The best results are obtained when one model carries 30% of the weight and the remaining 70% of the result is distributed relatively uniformly among the other models.
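As a sketch (not the authors' code), the weighted ensemble vote described above can be written as follows; the model order and the 15/15/15/15/10/30 weights are taken from Table 5, while the per-model softmax outputs and the two-class "Up"/"Go" label set are hypothetical, echoing the earlier example:

```python
import numpy as np

# Ideal weights from Table 5 (percent), in a fixed model order:
# ResNet, SeNet, DenseNet, VGG-2D, VGG-1D (Mel), VGG-1D (Raw)
WEIGHTS = np.array([15, 15, 15, 15, 10, 30]) / 100.0

CLASSES = ["Up", "Go"]  # illustrative label set

def ensemble_classify(per_model_probs, weights=WEIGHTS):
    """Weighted average of per-model class probabilities.

    per_model_probs: shape (n_models, n_classes), one softmax row
    per model. Returns the winning label and the combined vector."""
    probs = np.asarray(per_model_probs)
    combined = weights @ probs           # weighted sum over models
    return CLASSES[int(np.argmax(combined))], combined

# Hypothetical softmax outputs of the six models for one audio clip:
votes = [
    [0.7, 0.3], [0.6, 0.4], [0.8, 0.2],
    [0.4, 0.6], [0.5, 0.5], [0.6, 0.4],
]
label, combined = ensemble_classify(votes)
print(label, combined.round(3))  # "Up" wins, roughly 60% vs 40%
```

Because the weights sum to 1, the combined vector remains a valid probability distribution over the classes, and the argmax realizes the majority-voting decision described in the text.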

Fig. 8. The user interface to load and calculate different weights into MESRS with the calculated grade.

Comparative Outcome Analysis. The above case study was developed and tested within the restrictions of Kaggle's speech recognition competition [26]. As part of the competition, two large databases of audio files were provided. The training set is an audio clip pool containing 65,000 files; the clips were recorded by thousands of different people, each file containing a single command, and the files are tagged by content. The testing set is a repository of almost 160,000 files which, like the training set, were recorded by thousands of different people and contain different voice commands. Unlike the training set, the testing set files are not tagged according to their content. This case study has two main objectives:


1. Present the optimal situation and the optimal system score for the above input.
2. Demonstrate the ease of use of the system for testing various weights for various trained neural network models, thereby also demonstrating the ease of obtaining a system quality indicator score.

4.3 Comparative Outcome Analysis and Other Experiments

Additional experiments with different weight distributions are shown in Table 6; the weights are set during the Majority Voting phase for each model. For each distribution of the models' weights, the obtained score is indicated in the "Overall system grade" column.

Table 6. MESRS grades during testing of different weight distributions over the system models (all models operate on Mel spectrogram images, except "VGG-1D (Raw)", which operates on raw audio data)

ResNet  SeNet  DenseNet  VGG-2D  VGG-1D (Mel)  VGG-1D (Raw)  Overall system grade
15      15     15        15      10            30            94.03%
20      20     10        10      30            10            93.46%
15      15     15        15      35            5             93.19%
10      20     5         5       50            10            91.43%
10      20     10        30      15            15            93.56%
20      10     30        10      15            15            93.68%
10      10     10        10      10            50            92.47%
10      10     10        10      50            10            91.43%
10      10     10        50      10            10            92.47%
10      10     50        10      10            10            92.44%
10      50     10        10      10            10            91.49%
50      10     10        10      10            10            90.29%
30      10     10        10      10            30            93.62%
10      20     20        20      20            10            93.51%
0       0      0         0       0             100           91.55%
0       0      0         0       100           0             90.71%
0       0      0         100     0             0             92.04%
0       0      100       0       0             0             92.13%
0       100    0         0       0             0             92.05%
100     0      0         0       0             0             89.93%


5 Conclusions

This paper described an automated speech recognition system developed under the name MESRS (Models Ensemble Speech Recognition System). The conventional data format used for training and working with deep learning neural networks is data from the visual image space. This paper presented MESRS, a system designed for speech recognition using an ensemble of deep learning models that gives high-quality classification results. As deep learning is a young and pioneering field, and deep learning on audio files is significantly younger still, the various developments in that area were not sufficiently robust and usable for our system. MESRS was initially based on Google's TensorFlow Simple Audio Recognition library [27], which trains on audio data without any further conversion; however, the classification results from this library were not robust enough for the MESRS system. The method that proved most efficient for training such networks was to use regular image-oriented deep learning neural networks, with the twist of converting the input data from the audio space into the image space. This is a well-known and highly robust method for deep learning purposes. The results clearly indicate that creating a robust, high-quality system using an ensemble of deep learning model architectures is a viable task. This paper could serve as a base for future studies in the area of speech recognition or other deep learning model ensembles. It has been clearly shown that, using an ensemble of deep learning model architectures, one can obtain classification results for trivial and non-trivial problems that a single architecture would not be able to obtain.

References

1. Sainath, T., Kingsbury, B., Ramabhadran, B., Novak, P., Mohamed, A.: Making deep belief networks effective for large vocabulary continuous speech recognition. In: Proceedings of the ASRU (2011)
2. Deng, L., Abdel-Hamid, O., Yu, D.: A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion. In: Proceedings of the ICASSP (2013)
3. Yu, D., Deng, L., Seide, F.: The deep tensor neural network with applications to large vocabulary speech recognition. IEEE Trans. Audio Speech Lang. Process. 21(2), 388–396 (2013)
4. Mohamed, A., Dahl, G., Hinton, G.: Acoustic modeling using deep belief networks. IEEE Trans. Audio Speech Lang. Process. 20, 14–22 (2012)
5. Sainath, T., Kingsbury, B., Soltau, H., Ramabhadran, B.: Optimization techniques to improve training speed of deep neural networks for large speech tasks. IEEE Trans. Audio Speech Lang. Process. 21(11), 2267–2276 (2013)
6. Sainath, T., Kingsbury, B., Mohamed, A., Dahl, G., Saon, G., Soltau, H., Beran, T., Aravkin, A., Ramabhadran, B.: Improvements to deep convolutional neural networks for LVCSR. In: Proceedings of the ASRU (2013)
7. Deng, L., Li, J., Huang, Yao, K., Yu, D., Seide, F., Seltzer, M., Zweig, G., He, X., Williams, J., Gong, Y., Acero, A.: Recent advances in deep learning for speech research at Microsoft. In: Proceedings of the ICASSP (2013)
8. Deng, L., Hinton, G., Kingsbury, B.: New types of deep neural network learning for speech recognition and related applications: an overview. In: Proceedings of the ICASSP (2013)


9. Baker, J., Deng, L., Glass, J., Khudanpur, S., Lee, C.-H., Morgan, N., O'Shaughnessy, D.: Research developments and directions in speech recognition and understanding. IEEE Signal Process. Mag. 26(3), 75–80 (2009)
10. Hinton, G., Deng, L., Yu, D., Dahl, G., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T., Kingsbury, B.: Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Process. Mag. 29(6), 82–97 (2012)
11. Seide, F., Li, G., Yu, D.: Conversational speech transcription using context-dependent deep neural networks. In: Proceedings of the Interspeech (2011)
12. Yu, D., Deng, L., Dahl, G.E.: Roles of pre-training and fine-tuning in context-dependent DBN-HMMs for real-world speech recognition. In: NIPS Workshop on Deep Learning and Unsupervised Feature Learning (2010)
13. Jaitly, N., Nguyen, P., Vanhoucke, V.: Application of pre-trained deep neural networks to large vocabulary speech recognition. In: Proceedings of the Interspeech (2012)
14. Kingsbury, B., Sainath, T., Soltau, H.: Scalable minimum Bayes risk training of deep neural network acoustic models using distributed Hessian-free optimization. In: Proceedings of the Interspeech (2012)
15. Sainath, T., Mohamed, A., Kingsbury, B., Ramabhadran, B.: Convolutional neural networks for LVCSR. In: Proceedings of the ICASSP (2013)
16. Dahl, G., Yu, D., Deng, L., Acero, A.: Context-dependent, pre-trained deep neural networks for large vocabulary speech recognition. IEEE Trans. Audio Speech Lang. Process. 20, 30–42 (2012)
17. Training and testing sets used in Kaggle's Speech Recognition Challenge. https://www.kaggle.com/c/tensorflow-speech-recognition-challenge/data
18. Google's crowdsourcing Open Speech Recording. http://aiyprojects.withgoogle.com/open_speech_recording
19. Librosa – Python package for music and audio analysis. https://librosa.github.io/librosa
20. PyTorch – an open source deep learning platform by Facebook. https://pytorch.org
21. Squeeze-and-Excitation Networks (SeNet). https://arxiv.org/abs/1709.01507
22. Deep Residual Learning for Image Recognition (ResNet). https://arxiv.org/abs/1512.03385
23. Very Deep Convolutional Networks for Large-Scale Image Recognition (VGG). https://arxiv.org/pdf/1409.1556.pdf
24. Densely Connected Convolutional Networks (DenseNet). https://arxiv.org/pdf/1608.06993.pdf
25. PyTorch neural networks module inheritance documentation. https://pytorch.org/docs/stable/nn.html
26. TensorFlow Speech Recognition Challenge. https://www.kaggle.com/c/tensorflow-speech-recognition-challenge
27. TensorFlow Simple Audio Recognition library. https://github.com/tensorflow/docs/blob/master/site/en/r1/tutorials/sequences/audio_recognition.md

DeepConAD: Deep and Confidence Prediction for Unsupervised Anomaly Detection in Time Series

Ahmad Idris Tambuwal(B) and Aliyu Muhammad Bello

Faculty of Engineering and Informatics, University of Bradford, Bradford, UK
[email protected], [email protected]

Abstract. In the current digital era of Industrial IoT and automotive technologies, it has become standard for a large number of sensors to be installed on machines or vehicles and for the time-series data from such sensors to be captured and exploited for health monitoring tasks such as anomaly detection, fault detection, and prognostics. Anomalies or outliers are unexpected observations which deviate significantly from the expected observations and typically correspond to critical events. Current literature demonstrates good performance of autoencoders on anomaly and novelty detection problems, due to their efficient data encoding in an unsupervised manner. Despite their unsupervised nature, autoencoder-based anomaly detection methods are limited by the identification of anomalies using a threshold defined on the distribution of the reconstruction cost. Often, it is difficult to set a precise threshold when the distribution of the reconstruction cost is not known. Motivated by this, we propose a new unsupervised anomaly detection method (DeepConAD) that combines an autoencoder with a forecasting model and uses uncertainty estimates, or confidence intervals, from the forecasting model to identify anomalies in multivariate time series. We performed an experimental evaluation and comparison of DeepConAD with two other anomaly detection methods using the Yahoo benchmark dataset, which contains both real and synthetic time series. Experimental results show that DeepConAD outperforms the other anomaly detection methods in most cases.

Keywords: Time series · Anomaly detection · Uncertainty estimation · Deep Neural Networks · Autoencoder

1 Introduction

Anomalies have become familiar in our daily life activities, where every day we observe and identify abnormalities. When something deviates markedly from the usual sequence of these activities, it is labelled as an anomaly or an outlier. In the field of data mining, "anomaly" and "outlier" are used interchangeably; both refer to unexpected observations which deviate significantly from the expected observations and typically correspond to critical events [1, 2].

© Springer Nature Switzerland AG 2020
K. Arai et al. (Eds.): SAI 2020, AISC 1229, pp. 232–244, 2020. https://doi.org/10.1007/978-3-030-52246-9_16

The current digital era of the Industrial Internet of Things [3] and automotive technologies has made it common for a large number of sensors to be installed on devices, and to


capture and exploit time-series data from such sensors for health monitoring tasks such as anomaly detection, fault detection, and prognostics. In time-series signals, an anomaly is any unexpected change in the pattern of one or more of the signals. Timely detection of abnormalities in such signals is vital to avoid the total failure of a system [4–6]. In the context of the automotive industry, for example, anomaly detection can provide prior information on mechanical faults [7] and sensor faults [8]. Several types of anomalies exist in the literature, but the three most popular types are point, contextual, and collective anomalies [6, 9]. Point anomalies are generally studied in the context of multidimensional, dependency-oriented data types such as time series, discrete sequences, and spatial and temporal data. In this paper, we focus on point anomalies in time series data. Several techniques for point anomaly detection exist in the literature, but we focus on prediction (regression-based) models due to their ability to handle the temporal features that exist within the time series [6]. In prediction models, the anomaly score is computed as the rate of deviation of a point from its predicted value. Current literature also shows the use of deep learning-based regression models, such as Recurrent Neural Networks (RNN), based mainly on Long Short-Term Memory (LSTM) [10, 11] or Gated Recurrent Units (GRU) [12], and Convolutional Neural Networks (CNN) [13], for anomaly detection in time-series data. Their performance in unsupervised representation learning of time sequences, applicable to text, video, and speech recognition, shows their ability to handle the temporal nature of time series data [14]. Despite the performance of these techniques and their unsupervised learning approach, they use the prediction error for the detection of anomalies. However, many real-life scenarios involve complex systems (e.g.
automotive driving) in which there are often external factors or variables, not captured by sensors, that make the time series unpredictable. Detecting anomalies in such scenarios becomes challenging using a standard approach based on prediction models that use prediction errors to detect anomalies; hence the introduction of autoencoders. An autoencoder is another type of deep neural network, which is trained to reconstruct its input. Autoencoders are used for dimensionality reduction, which helps in classification and visualization tasks, and for learning a hidden representation of a time series. As a result of their efficient data encoding in an unsupervised manner, they are also gaining popularity in anomaly and novelty detection problems [15–17]. Despite their unsupervised nature, autoencoder-based anomaly detection methods are limited by the identification of anomalies using a threshold that is defined based on the distribution of the reconstruction cost. Often, the distribution of the reconstruction cost is not known, or the experimenter does not want to make any assumption about a specific distribution. In that case, it is not possible to define a precise threshold value that can aid the identification of anomalies; hence the need for probabilistic time-series forecasting to identify anomalies. This area has not been extensively explored in deep learning, where prediction models focus more on estimating an optimal value as defined by a loss function. In contrast, Lingxue and Nikolay proposed a deep learning model that provides time-series predictions along with uncertainty estimates and used it for forecasting of extreme events [18]. Similarly, Reunanen et al. demonstrated the use of the autoencoder reconstruction cost and Chebyshev's inequality to calculate upper and lower outlier detection limits in sensor streams [2].
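For contrast, the threshold-based detection scheme this paragraph critiques can be sketched as follows; this is a hypothetical example (not code from any cited paper), assuming reconstruction errors on normal data roughly follow a known distribution:

```python
import numpy as np

def fit_threshold(train_errors, k=3.0):
    """Fix a detection threshold at mean + k*std of the reconstruction
    errors seen on normal training data -- this presumes the error
    distribution is (roughly) known, which is the limitation
    discussed above."""
    return train_errors.mean() + k * train_errors.std()

train_errors = np.array([0.10, 0.12, 0.09, 0.11, 0.10])  # normal data
threshold = fit_threshold(train_errors)
new_errors = np.array([0.11, 2.50, 0.10])                # incoming points
flags = new_errors > threshold
print(flags)  # only the second point is flagged as anomalous
```

When the error distribution is unknown or heavy-tailed, the choice of k becomes arbitrary, which is precisely the gap DeepConAD addresses with probabilistic forecasting.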


Motivated by probabilistic time series forecasting and the ability of the autoencoder to learn a hidden representation of the data, we propose a new unsupervised anomaly detection method (DeepConAD) that uses the hidden representation provided by the autoencoder and uncertainty estimates from the prediction model to identify anomalies in multivariate time series. We employ window-based sequence learning by combining an LSTM autoencoder and an LSTM forecaster for encoding and forecasting the next time step based on a window of previous time steps. In this context, an encoder is used to encode the input sequence and capture the patterns of the time series. This encoded sequence is then passed to a regression model, which forecasts the next step for each value along with an uncertainty estimate. We achieve this by developing a quantile regression model, which approaches the regression problem as estimating a continuous probability distribution instead of a single value or sequence of values. By computing the upper and lower quantiles of the model's predicted output, we assume the model has captured the most likely values reality can take. We then compute the interval between the upper and lower quantiles to see whether our model finds any anomalies. We consider any observed value that falls outside this range to be an anomaly. DeepConAD also captures deviations in the correlation of data features via the autoencoder, which enhances its performance in handling the multivariate and temporal characteristics of time series. As such, DeepConAD is suitable for domains where the time series is collected from heterogeneous sensors. When tested on the publicly available Yahoo anomaly benchmark dataset, DeepConAD outperforms most of the state-of-the-art anomaly detection methods.
In summary, the main contributions of this paper are as follows:

1) To the best of our knowledge, DeepConAD is the first deep learning-based anomaly detection method that uses model uncertainties for detecting point anomalies in time series data.
2) The proposed method is flexible and can easily be adapted to different use cases and domains where dynamic behaviour and complex temporal dependencies exist among the sensors.
3) In contrast to the LSTM- and CNN-based approaches, DeepConAD uses an autoencoder to learn a hidden representation of the time series, which increases its performance.

The rest of the paper is organized as follows: Sect. 2 provides a literature review on anomaly detection methods. In Sect. 3, we provide the theoretical background and a detailed description of our approach. Section 4 provides the experimental evaluation, and Sect. 5 concludes with future work.

2 Literature Review

The broad diversity of application domains, together with the different types of data, affects the choice of anomaly detection technique. Temporal data such as social network data streams, computer network traffic, astronomy data, sensor data and commercial transactions generated from different application domains have led to the rise of the field of anomaly detection for temporal data. Anomaly detection problems in the field of temporal data can be categorized in different ways due to the diversity of the area.


One categorization is based on data type, the nature of the data, the types of abnormality in context, and the availability of labelled anomalies [6]. Different types of data exist from different application domains, such as continuous series (e.g. sensor readings), discrete series (e.g. weblogs), data streams (e.g. text streams), and network data (e.g. graph and social streams). In this section, we review anomaly detection methods designed for time series data. The literature shows two significant types of anomaly detection techniques for time series. The first involves anomaly detection in a time series database, where the focus is on identifying entire time series sequences in the database that are anomalous [19, 20]. The second involves the identification of anomalies within a given time series sequence, which includes point (instantaneous) anomalies and window-based anomalies. Window-based detection requires the identification of an anomalous subsequence within the time series, where the current window is unexpected and deemed abnormal. Although several window-based anomaly detection methods have been proposed in the literature [21, 22], these methods require the definition of the expected pattern. Unfortunately, without expert knowledge of the system, it is difficult to define such patterns, which affects the performance of the model. Point-based anomaly detection involves identifying an element or point within a subsequence of the time series as an anomaly. That is, given a window of the time series, the aim is to identify an abnormal point within the window. Point-based anomaly detection methods are also used for the identification of sensor drift, where an individual continuous sensor suddenly drifts. Many point anomaly detection techniques exist in the literature, and a review of those techniques is beyond the scope of this paper. We refer the reader to [6] for a detailed study of those techniques.
Similarly, the current literature shows the use of Deep Neural Networks (DNNs), such as Recurrent Neural Networks (RNN) based mainly on Long Short-Term Memory (LSTM) or Gated Recurrent Units (GRU), for point anomaly detection in time series data. Their performance in unsupervised representation learning of time sequences, applicable to text, video, and speech recognition, shows their ability to handle the temporal nature of time series data. Sangha et al. [8] used an RBF network for online fault diagnosis on real engine sensor data from the automotive industry. An anomaly detection method based on stacked LSTMs was also proposed in [10]. The authors developed a predictive model that was trained on normal sequences and then evaluated to compute error vectors based on its performance on anomalous sequences. An anomaly is then defined by setting an error threshold using the validation set. A similar approach was also used to detect anomalies in ECG data [23]. A deep CNN was also used as a regression model for anomaly detection on time series data [13]. The model predicts the next timestamp using a window of previous timestamps and uses the Euclidean distance to measure the difference between the predicted and actual values. Based on these differences, an anomaly is identified at a given timestamp using a threshold value. The techniques reviewed above work on Univariate Time Series (UTS) and therefore cannot handle the correlations of Multivariate Time Series (MTS). Saurav et al. [12] proposed another closely related anomaly detection method that uses a sliding window to handle both the multidimensional and streaming nature of time series.


Similarly, in their proposed approach the authors used GRU units, which are a simplified version of LSTM units. Even though GRU units are simpler, LSTM units are more powerful because of their ability to learn long-term correlations in a sequence and to accurately model complex multivariate sequences without the need for a pre-specified time window [24]. As such, it becomes more appropriate to consider LSTM units with multiplicative gates that enforce constant error flow through the internal states of individual units, called "memory". LSTMs also have internal memory that operates like a local variable, allowing them to accumulate state over the input sequence. Despite the ability of the method mentioned above to handle the multidimensional and streaming nature of time series, it uses the prediction error for the detection of anomalies. However, many real-life scenarios involve complex systems (e.g. automotive driving) with external factors or variables that are not captured by sensors and lead to unpredictable time series. Detecting anomalies in such scenarios is challenging with a standard approach based on prediction models that use prediction errors; hence the introduction of autoencoders. An autoencoder is another type of deep neural network, which is trained to reconstruct its input. Autoencoders are used for dimensionality reduction, which helps in classification and visualization tasks, and for learning a hidden representation of a time series. As a result of their efficient data encoding in an unsupervised manner, they are also gaining popularity in anomaly and novelty detection problems. In this context, an LSTM encoder-decoder has been used to handle MTS characteristics and has been shown to be useful for anomaly detection [15].
In that paper, the encoder-decoder model learns to capture the normal behaviour of the machine by learning to reconstruct MTS corresponding to normal functioning in an unsupervised manner, thereby using the reconstruction error to detect anomalies. Since the model is trained only on time series corresponding to normal behaviour, it is expected to reconstruct normal MTS with a small error and abnormal MTS with a larger error. The reconstruction error is then used as an anomaly score to identify anomalies. In a similar approach, Amarbayasgalan et al. [17] combined autoencoding and a clustering method for unsupervised novelty detection. The authors used compressed data and a reconstruction error threshold obtained from autoencoders and applied density-based clustering on the compressed data to detect novelty groups with low density. Schreyer et al. [16] also used deep autoencoders to detect anomalies in large-scale accounting data in the area of fraud detection. Despite the unsupervised nature of autoencoder-based anomaly detection methods, they are limited by the assumption on the distribution of the reconstruction cost: they use this distribution to define a threshold that can help in identifying anomalies. Often, the distribution of the reconstruction cost is not known, or the experimenter does not want to make any assumption about a particular distribution. In that case, it is not possible to define a precise threshold value that can aid the identification of anomalies; hence the need for probabilistic time-series forecasting to identify anomalies. This area has not been extensively explored in deep learning, where prediction models focus more on estimating an optimal value as defined by a loss function. In contrast, Lingxue and Nikolay proposed a deep learning model that provides time-series predictions along


with uncertainty estimates and used it for forecasting of extreme events [18]. Similarly, Reunanen et al. demonstrated the use of the autoencoder reconstruction cost and Chebyshev's inequality to calculate upper and lower outlier detection limits in sensor streams [2]. Motivated by probabilistic time series forecasting and the ability of the autoencoder to learn a hidden representation of the data, we propose a new unsupervised anomaly detection method (DeepConAD) that uses the hidden representation provided by the autoencoder and uncertainty estimates from the prediction model to identify anomalies in multivariate time series. We employ window-based sequence learning by combining an LSTM autoencoder and an LSTM forecaster for encoding and forecasting the next time step based on a window of previous time steps. In this context, an encoder is used to encode the input sequence and capture the patterns of the time series. This encoded sequence is then passed to a regression model, which forecasts the next step for each value along with an uncertainty estimate. We achieve this by developing a quantile regression model, which approaches the regression problem as estimating a continuous probability distribution instead of a single value or sequence of values. By computing the upper and lower quantiles of the model's predicted output, we assume the model has captured the most likely values reality can take. We then compute the interval between the upper and lower quantiles to see whether our model finds any anomalies. We consider any observed value that falls outside this range to be an anomaly. DeepConAD also captures deviations in the correlation of data features via the autoencoder, which enhances its performance in handling the multivariate and temporal characteristics of time series data, and it is therefore suitable for domains where the time series is collected from heterogeneous sensors.

3 DeepConAD: Proposed Approach

In this section, we describe our proposed approach, which is divided into seven stages: normalization, time series segmentation, auto-encoding, forecasting, uncertainty estimation, anomaly identification, and visualization of anomalies. The overall steps are depicted in Fig. 1.

Fig. 1. The workflow of the proposed approach


Normalization. Motivated by the existence of different value scales in each time series, normalization of the MTS is carried out to enhance the performance of the regression model. Consider an MTS x = {x_1, x_2, ..., x_t}, where t is the length of the time series and each point x_i ∈ R^m (for i = 1 ... t) is an m-dimensional vector corresponding to the m features or sensor channels read at time i. We scale each feature between 0 and 1 (x_ij ∈ [0, 1], for j = 1 ... m), as shown in (1):

x_i = (x_i − x_min) / (x_max − x_min)    (1)

where x_max and x_min are vectors that contain the maximum and minimum values of the features. The scaling is done for each point per feature. The output of this stage is an array of scaled values representing the MTS.

Time Series Segmentation. The input to this stage is the array of scaled MTS values from the previous stage. In order to leverage our regression model for sequence-to-sequence learning, the input MTS is transformed into multiple sequences of time steps. This transformation converts the MTS from one sequence into pairs of input and output sub-sequences; the MTS is segmented into two overlapping fixed-length windows. The first is a history window (h_w), which defines the number of previous time steps used as input to the autoencoder; that is, given an MTS we use a history window {x_{t−(w−1)}, ..., x_{t−1}, x_t} of length w. The second is the prediction window (p_w), which represents the number of time steps to be forecasted as output from the regression model. The aim is to predict the next time step given a window of previous time steps, as shown in (2) for w = 5 and p_w = 1:

(x_{t−4}, x_{t−3}, x_{t−2}, x_{t−1}, x_t) → x_{t+1}    (2)

In a regression problem such as ours, the left-hand side serves as the input data to the model and the right-hand side as the output, which is treated as a label for the input data. The output of this stage is two arrays of sub-sequences representing h_w and p_w.

Auto-Encoding. The input to this stage is the array of history windows from the previous step. In this stage, an encoder-decoder model is used to reconstruct each window of the sub-sequence and extract a useful representation, or pattern, from the window. The last LSTM cell states of the encoder are extracted; they contain both the learned features for forecasting and any unusual input captured in the feature space, and they are propagated to the regression model in the next stage. The output of this stage is an array of features extracted from each window, which is passed as input for forecasting in the next step.

Forecasting. In this step, the array of extracted features obtained from the auto-encoding step and the array of prediction windows (p_w) are used as input. An LSTM-based regression model is trained to forecast the next time step (p_w) using the learned features. The model forecasts the next time step taking uncertainty into account, as explained in the next stage.
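As an illustrative sketch of the first two stages (not the authors' implementation; the function names, window sizes and sample data are our own), the scaling of Eq. (1) and the windowing of Eq. (2) can be written as:

```python
import numpy as np

def minmax_scale(x):
    """Scale each feature of an MTS x of shape (t, m) into [0, 1],
    per Eq. (1); min/max are taken per feature (column)."""
    x_min, x_max = x.min(axis=0), x.max(axis=0)
    return (x - x_min) / (x_max - x_min)

def segment(x, w=5, pw=1):
    """Split a scaled MTS into (history window, prediction window)
    pairs, per Eq. (2): w previous steps predict the next pw step(s)."""
    H, P = [], []
    for i in range(len(x) - w - pw + 1):
        H.append(x[i:i + w])           # history window h_w
        P.append(x[i + w:i + w + pw])  # prediction window p_w
    return np.array(H), np.array(P)

x = minmax_scale(np.random.rand(100, 3))  # 100 steps, 3 sensor channels
H, P = segment(x, w=5, pw=1)
print(H.shape, P.shape)  # (95, 5, 3) (95, 1, 3)
```

The H windows feed the autoencoder, whose final encoder states feed the forecaster; the P windows serve as the regression targets.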

DeepConAD: Deep and Confidence Prediction

239

Uncertainty Estimation. Uncertainty estimation in prediction is the process of quantifying the uncertainty (or confidence) of prediction models: instead of estimating a single optimal value, the model estimates probability distributions. It is an instance of Bayesian inference, which aims at finding the posterior distribution over model parameters p(ω|X, Y), given sets of sub-sequences X and Y, where X represents hw and Y represents pw. For a new data point x, the distribution of the prediction is obtained by marginalizing out the posterior distribution, as shown in (3), where ω is the collection of model parameters:

p(y|x) = ∫ p(y|x, ω) p(ω|X, Y) dω    (3)

Taking the variance of this distribution quantifies the prediction uncertainty, which is further elaborated using the law of total variance, as shown in (4).
Var(y|x) = Var_ω(E(y|ω, x)) + E_ω(Var(y|ω, x)) = Var(ω(x)) + δ²    (4)
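The decomposition in (4) can be estimated empirically by repeatedly evaluating a stochastic model (e.g. with dropout kept active at evaluation time) and adding an inherent-noise estimate δ². The sketch below is a hypothetical numpy illustration of this idea, not the paper's implementation; `stochastic_predict` and `noise_var` are stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_predictive_variance(stochastic_predict, x, n_iter=100, noise_var=0.0):
    """Monte Carlo estimate of Var(y|x) ≈ Var(ω(x)) + δ².

    stochastic_predict: callable returning one prediction per call, with the
    model's randomness (e.g. dropout) active -- a stand-in for sampling ω
    from the posterior. noise_var: estimate of the inherent noise δ².
    """
    samples = np.array([stochastic_predict(x) for _ in range(n_iter)])
    model_variance = samples.var()  # Var(ω(x)): spread due to model parameters
    return samples.mean(), model_variance + noise_var

# Toy stochastic "model": y = 2x plus parameter noise mimicking dropout.
pred_mean, total_var = mc_predictive_variance(
    lambda x: 2.0 * x + rng.normal(0.0, 0.1), x=1.5, n_iter=100, noise_var=0.05)
```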

From the last part of (4), we can see that the variance is decomposed into two terms: (i) Var(ω(x)), which represents model uncertainty and reflects our ignorance of the model parameters ω; and (ii) δ², which represents the noise level of the data generation process, referred to as inherent noise. The assumption is that y is generated from the same distribution, which may not hold in most scenarios. In anomaly detection, for example, we expect some time series sub-sequences to contain unexpected points, which can differ from the sequences used in training the model. Therefore, we can conclude that a complete measurement of prediction uncertainty should combine model uncertainty, model misspecification, and the inherent noise level [18]. In order to achieve this, a quantile regression model was developed which focuses on forecasting extreme values (the lower (10th), upper (90th), and classical (50th) quantiles). The model is implemented in Keras LSTM by creating a quantile loss function which penalizes errors based on the quantile, depending on whether the error is positive or negative. By computing the upper and lower quantiles, the model captures the most likely values that reality can assume. The width of this range can vary: it will be small when the model is confident about the future, and large when the model is uncertain about significant changes within the time series. This behaviour lets the model detect whether a predicted point is an anomaly or not. The computed quantiles are stored in arrays and passed as input to the next stage for anomaly identification. A detailed description of this procedure is given in the following subsection.

Anomaly Identification. The input to this step is the arrays of computed quantiles. Anomalies are identified by computing an interval as an array of differences between upper and lower quantiles (the 90th–10th quantile range).
The interval is expected to be small when the data is normal and within the model's learned range. On the contrary, larger intervals are expected for anomalous values. We consider any observed value x_t that falls outside of the 95% prediction interval to be an anomaly. The output of this stage is given as an array to the next step for visualization.

Visualization of Anomalies. In this stage, the array of computed intervals received from the previous step is plotted against the actual sequence to visualize the actual points that are anomalous.
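The two ingredients described above can be sketched in plain numpy (a minimal illustration; the paper's actual loss would wrap the same pinball formula in Keras backend operations passed to `model.compile(loss=...)`):

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Quantile (pinball) loss: penalizes errors asymmetrically depending on
    the quantile q (e.g. 0.1, 0.5, 0.9) and on the sign of the error."""
    e = y_true - y_pred
    return np.mean(np.maximum(q * e, (q - 1) * e))

def flag_anomalies(actual, lower, upper):
    """Mark observations falling outside the predicted [lower, upper]
    quantile interval as anomalies."""
    return (actual < lower) | (actual > upper)

actual = np.array([1.0, 1.1, 5.0, 0.9])
lower  = np.array([0.8, 0.9, 0.9, 0.8])
upper  = np.array([1.3, 1.4, 1.4, 1.2])
print(flag_anomalies(actual, lower, upper))  # anomaly flagged at index 2
```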

240

A. I. Tambuwal and A. M. Bello

4 Experimental Evaluation

In this section, we design and conduct extensive experiments to evaluate and compare the performance of DeepConAD with current state-of-the-art methods, using both real and artificial datasets. The section starts with a description of the datasets used, followed by the experimental setup, and finally the results and discussion.

4.1 Dataset Description

DeepConAD is evaluated using the Yahoo Webscope dataset, a commonly used anomaly benchmark dataset in the literature. The Yahoo Webscope dataset is a publicly available dataset released by Yahoo Labs. It has been widely used in research to validate anomaly detection algorithms and to determine their accuracy on various types of anomalies, including outliers and change-points [12, 13]. This dataset was chosen because of the availability of anomaly labels that we can use to validate our model. The dataset consists of 367 real and synthetic time series with labeled anomaly points. The real datasets consist of time series representing the metrics of various Yahoo services. The synthetic datasets consist of time series that exhibit trend, noise, and seasonality changes, with anomalies present at random positions. Each time series contains 1,420 to 1,680 instances. The benchmark is further divided into four sub-benchmarks: A1Benchmark, A2Benchmark, A3Benchmark, and A4Benchmark. In each dataset file, there is a Boolean attribute named "is_anomaly" with values 1 and 0 that indicates whether the value at a particular timestamp is an anomaly. Because we use an unsupervised learning approach in our evaluation, we drop the label attribute from each dataset.

4.2 Experimental Setup

All experiments are run on the same computer, with an Intel Core i7 processor, the Windows operating system, and Python Anaconda 3.7 configured with deep learning libraries.
60% of each dataset is used as the training set and the remaining 40% as the test set. Similarly, to test the power of the model on unseen time series data, the training set is further divided, and 20% is used as unseen data to validate the model. Window sizes of 35 and 45 were used, which give optimal performance as demonstrated in [13]. Similarly, 100 iterations were used as the number of times the model is evaluated, with a dropout probability of 0.5. The F-score was used as the evaluation metric for DeepConAD and all anomaly detection methods in our comparison; it evaluates the models in terms of the number of detected and rejected point anomalies. Average F-scores (5) per sub-benchmark are reported for each anomaly detection method.

F-score = 2 × (Precision × Recall) / (Precision + Recall)    (5)
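For point anomalies, (5) can be computed directly from the point-wise labels; a small sketch (the toy label lists are illustrative, not the benchmark data):

```python
def f_score(true_labels, predicted_labels):
    """F-score (5) from point-wise anomaly labels (1 = anomaly, 0 = normal)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # detected anomalies
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed anomalies
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

print(f_score([0, 1, 0, 1, 1], [0, 1, 1, 0, 1]))  # 2/3: precision = recall = 2/3
```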


4.3 Experimental Results and Discussion

This section contains two sets of results. It starts by demonstrating how the Autoencoder recognizes the patterns of the time series, thereby improving the prediction performance of the model with uncertainty estimation. It then shows the performance of DeepConAD on anomaly detection and its comparison with other anomaly detection methods.

Prediction Performance and Uncertainty Estimation. Table 1 reports the Mean Absolute Error (MAE) and related uncertainties of DeepConAD and a single LSTM model on the validation set. As can be observed from the table, DeepConAD shows an overall improvement of at least 1% in accuracy and at least a 0.3% lower degree of uncertainty on all the datasets. From this result, we can assert that the LSTM Autoencoder used in DeepConAD improves its performance through its ability to extract important unseen features from the time series.

Table 1. MAE and its relative uncertainties for the DeepConAD and LSTM models

Datasets    | Models    | MAE    | Uncertainties
A1Benchmark | DeepConAD | 0.0296 | 0.0008
            | LSTM      | 0.0824 | 0.0009
A2Benchmark | DeepConAD | 0.0342 | 0.0006
            | LSTM      | 0.0386 | 0.0007
A3Benchmark | DeepConAD | 0.0253 | 0.0002
            | LSTM      | 0.0370 | 0.0007
A4Benchmark | DeepConAD | 0.0288 | 0.0003
            | LSTM      | 0.0332 | 0.0006
In order to demonstrate the performance of DeepConAD on anomaly detection, the A1Benchmark original time series is depicted against the prediction interval in Fig. 2. The actual values are shown in red, with blue dots showing the interval range between the predicted quantiles in the time series. As illustrated in the figure, the quantile interval range (blue dots) is wider in periods of uncertainty, and the model tends to generalize well in other cases. Investigating the periods of high uncertainty in more depth, we notice that these coincide with the anomaly points in the original labeled time series, which demonstrates the performance of DeepConAD in detecting anomalous points within the time series.

Fig. 2. An example of identified anomalous points in an A1Benchmark time series. The original time series is shown in red lines and the quantile interval in blue dots.

Performance Comparison. In this subsection, we describe the performance comparison of DeepConAD with other state-of-the-art anomaly detection methods. On a detailed level, Table 2 shows a comparison of the technique with the DeepAnT [13] and LSTM-AD [10] anomaly detection methods. The table shows that our method outperforms the other methods on three sub-benchmarks, whereas on A2Benchmark our approach and DeepAnT tie for runner-up, both outperformed by the LSTM-AD model. It is also important to note that our method performs better than LSTM-AD on three sub-benchmarks and only slightly worse on one. This demonstrates how the Autoencoder improves prediction performance by extracting essential features from the time series.

Table 2. Average F-score of DeepConAD, LSTM-AD, and DeepAnT on the Yahoo dataset

Yahoo dataset | DeepConAD | LSTM-AD | DeepAnT
A1Benchmark   | 0.57      | 0.44    | 0.46
A2Benchmark   | 0.94      | 0.97    | 0.94
A3Benchmark   | 0.98      | 0.72    | 0.87
A4Benchmark   | 0.98      | 0.59    | 0.68
5 Conclusion and Future Work

This paper presented a new unsupervised anomaly detection method (DeepConAD) that utilizes uncertainty estimates from a prediction model to detect anomalies in multivariate time series. We employ window-based sequence learning, using an LSTM autoencoder and an LSTM forecaster to predict the next time step based on previous time steps. In this context, an encoder is used to encode the input sequence and capture the patterns of the time series. This encoded sequence is then passed to a prediction model, which makes the next-step prediction for each value along with an uncertainty estimate in the output sequence. This is achieved by developing a quantile regression model, which approaches the regression problem as estimating a continuous probability distribution instead of a single value or sequence of values. By computing the upper and lower


quantiles of the model's predicted output, we assume the model has captured the most likely values that reality can assume. The interval between the upper and lower quantiles is then computed to determine whether the model finds any anomalies. Observed values that fall outside the 95% confidence interval are considered anomalies. Experiments were conducted to evaluate the performance of DeepConAD and to compare it with other state-of-the-art anomaly detection methods. The results show that in most cases, DeepConAD outperforms the other methods. DeepConAD also handles the multivariate and temporal characteristics of time series data and is therefore suitable for domains where unlabeled time series are collected from heterogeneous sensors. Our future work will focus on extending the method with an anomaly prediction component that will use the anomaly labels obtained by the current approach to train a classifier for the prediction of anomalies. Furthermore, an online model will be explored to detect both concept drift and anomalies from sensor streams.

References

1. Wang, J.: Outlier detection in big data. J. Clean. Prod. 16(15), 2862 (2014)
2. Reunanen, N., Räty, T., Jokinen, J.J., Hoyt, T., Culler, D.: Unsupervised online detection and prediction of outliers in streams of sensor data. Int. J. Data Sci. Anal. 9(3), 285–314 (2019)
3. Da Xu, L., He, W., Li, S.: Internet of things in industries: a survey. IEEE Trans. Ind. Inform. 10(4), 2233–2243 (2014)
4. Teng, M.: Anomaly detection on time series. In: International Conference on Progress in Informatics and Computing, vol. 1, pp. 603–608 (2010)
5. Galeano, P., Pena, D., Tsay, R.S.: Outlier detection in multivariate time series by projection pursuit. J. Am. Stat. Assoc. 101, 654–669 (2006)
6. Gupta, M., Gao, J., Aggarwal, C.C., Han, J.: Outlier detection for temporal data: a survey. IEEE Trans. Knowl. Data Eng. 26(9), 2250–2267 (2014)
7. Theissler, A., Dear, I.: An anomaly detection approach to detect unexpected faults in recordings from test drives. In: Proceedings of the WASET International Conference on Vehicular Electronics and Safety 2013, vol. 7, no. 7, pp. 195–198 (2013)
8. Sangha, M.S., Yu, D.L., Gomm, J.B.: Sensor fault diagnosis for automotive engines with real data evaluation. Multicr. Int. J. Eng. Sci. Technol. 3(8), 13–25 (2011)
9. Kandhari, R., Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: a survey. ACM Comput. Surv. 41(3), 1–6 (2009)
10. Malhotra, P., Vig, L., Shroff, G., Agarwal, P.: Long short term memory networks for anomaly detection in time series. In: Proceedings of the European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), Bruges, Belgium, 22–24 April 2015 (2015)
11. Aldosari, M.S.: Unsupervised anomaly detection in sequences using long short term memory recurrent neural networks. PhD dissertation, George Mason University, p. 98 (2016)
12. Saurav, S., Malhotra, P., Vishnu, T.V., Gugulothu, N., Vig, L., Agarwal, P., Shroff, G.: Online anomaly detection with concept drift adaptation using recurrent neural networks. In: Proceedings of the ACM India Joint International Conference on Data Science and Management of Data (CoDS-COMAD 2018), pp. 78–87 (2018)
13. Munir, M., Siddiqui, S.A., Dengel, A., Ahmed, S.: DeepAnT: a deep learning approach for unsupervised anomaly detection in time series. IEEE Access 7, 1991–2005 (2019)
14. Gugulothu, N., Vishnu, T.V., Malhotra, P., Vig, L., Agarwal, P., Shroff, G.: Predicting remaining useful life using time series embeddings based on recurrent neural networks. In: 2nd ML for PHM Workshop, SIGKDD 2017, vol. 10 (2017)
15. Malhotra, P., Ramakrishnan, A., Anand, G., Vig, L., Agarwal, P., Shroff, G.: LSTM-based encoder-decoder for multi-sensor anomaly detection. arXiv.org, July 2016
16. Schreyer, M., Sattarov, T., Borth, D., Dengel, A., Reimer, B.: Detection of anomalies in large scale accounting data using deep autoencoder networks. arXiv:1709.05254, September 2017
17. Amarbayasgalan, T., Jargalsaikhan, B., Ryu, K.: Unsupervised novelty detection using deep autoencoders with density based clustering. Appl. Sci. 8(9), 1468 (2018)
18. Zhu, L., Laptev, N.: Deep and confident prediction for time series at Uber. In: IEEE International Conference on Data Mining Workshops (ICDMW), pp. 103–110 (2017)
19. Hyndman, R.J., Wang, E., Laptev, N.: Large-scale unusual time series detection. In: Proceedings of the 15th IEEE International Conference on Data Mining Workshop (ICDMW 2015), pp. 1616–1619 (2016)
20. Keogh, E., Lonardi, S., Chiu, B.Y.: Finding surprising patterns in a time series database in linear time and space. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2002), p. 550 (2002)
21. Keogh, E., Lin, J., Fu, A.: HOT SAX: efficiently finding the most unusual time series subsequence. In: Proceedings of the IEEE International Conference on Data Mining (ICDM), pp. 226–233 (2005)
22. Bontemps, L., Cao, V.L., McDermott, J., Le-Khac, N.A.: Collective anomaly detection based on long short-term memory recurrent neural networks. In: Lecture Notes in Computer Science, vol. 10018, pp. 141–152 (2016)
23. Chauhan, S., Vig, L.: Anomaly detection in ECG time signals via deep long short-term memory networks. In: Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA 2015) (2015)
24. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)

Reduced Order Modeling Assisted by Convolutional Neural Network for Thermal Problems with Nonparametrized Geometrical Variability

Fabien Casenave1(B), Nissrine Akkari1, and David Ryckelynck2

1 Safran Tech, Rue des Jeunes Bois, Châteaufort, 78114 Magny-Les-Hameaux, France
[email protected]
2 Centre des Matériaux, Mines ParisTech PSL Research University, CNRS UMR 7633, 63-65 rue Henri Auguste Desbruères, Corbeil-Essonnes, France

Abstract. In this work, we consider a nonlinear transient thermal problem numerically solved by a high-fidelity model. The objective is to derive fast approximations of the solutions to this problem, under nonparametrized variability of the geometry and of the convection and radiation boundary conditions, using physics-based reduced order models (ROM). Nonparametrized geometrical variability is a challenging task in model order reduction, which we propose to address using deep neural networks. First, a convolutional neural network (CNN) is trained to compute the discretization error of a quickly simulated solution on a coarse mesh, under the aforementioned geometry and boundary conditions variability. Then, for a fixed geometry, a ROM is constructed under the boundary conditions variability; the data used to construct the ROM are the coarse solutions and the CNN-predicted discretization errors. We illustrate that in all our tested configurations, the reduced order model improves the accuracy of the coarse and CNN predictions. Keywords: Convolutional neural network · Thermal problem · Physical reduced order modeling · Nonparametrized geometrical variability · Discretization error

1 Introduction

In the industry, fast procedures to simulate physical problems are very important in design and uncertainty quantification processes. Meta modeling, deep learning or physical reduced order modeling are possible candidates to speed up numerical simulations. All these techniques require first solving some instances of the problem, in order to construct a procedure able to approximate the solution under new variability. In this work, the industrial problem of interest is the thermal analysis of mechanical parts of aircraft engines. Such parts are subjected

© Springer Nature Switzerland AG 2020
K. Arai et al. (Eds.): SAI 2020, AISC 1229, pp. 245–263, 2020. https://doi.org/10.1007/978-3-030-52246-9_17

246

F. Casenave et al.

to thermal loads, and their strength is usually limited by the maximum temperature reached by the material: too close to the melting point, the mechanical properties of the part quickly deteriorate. We find in the literature many papers where the authors are interested in using regression in deep learning combined with physics-based approaches in order to control the sharpness of the data of interest. In [16], it is proposed to train convolutional neural networks (CNNs) without any labeled data, where the high-fidelity partial differential operators are incorporated in the likelihood loss functions. These CNNs are called physics-constrained deep learning models. In [7], the authors propose to use machine learning in order to quantify the errors made by some approximations of partial differential equations. These approximations could be obtained at the end of an iterative scheme, from a lower-fidelity model, or even from projection-based reduced order models. The machine learning framework for model errors is based on a regression method that constructs the noise model as a zero-mean Gaussian random variable. In [13], a Gaussian process regression method is used to construct a set of response functions for the errors between the high fidelity model and a parametric nonintrusive reduced order model (PNIROM). In [1], physics-based data-driven models are proposed, where the distance to the admissible points in a physical sense is used to find the optimal weights of a regression model. In [9], it is shown that one-dimensional flow models can be used to constrain the outputs of deep neural networks, so that their predictions satisfy the conservation of mass and momentum principles. In [11], a new method is proposed to transform the source domain knowledge to fit the target domain, using a deep learning method and limited samples from the target domain to transform the source or input domain dataset.
It is proposed in [15] to find feature representations which minimize the distance between source and target domains, as this is not naturally done in classical deep learning methods: a novel semi-supervised representation deep learning framework is proposed, based on a softmax regression model auto-encoder for manifold regularization. In this work, we are interested in the fast simulation of nonlinear transient thermal problems, under a nonparametrized variability of the geometry, as well as of the boundary conditions. We propose to combine physical reduced order models and deep learning in the following fashion: a first coarse prediction is computed by solving the problem on a coarse mesh. We then use CNNs to predict the error of this coarse prediction with respect to the reference solution, namely the discretization error. This prediction is then improved using physical reduced order modeling. More precisely, we use the snapshots Proper Orthogonal Decomposition [3,12] combined with the Empirical Cubature Method [8] to reduce this nonlinear problem. Having a physical reduced order model compute the final prediction has numerous advantages: the boundary conditions and physical equations are weakly satisfied, i.e. on the reduced order basis (besides, the homogeneous Dirichlet boundary conditions are exactly enforced), the constitutive law is exactly solved, and in some cases, we even dispose of an error estimation of the approximation [10]. We illustrate in our numerical applications

CNN-Assisted ROM for Thermal Problems

247

that the reduced order model improves the prediction of the coarse model and the CNNs in all the tested configurations. In what follows, we first present in Sect. 2 our problem of interest: a transient nonlinear thermal problem under a nonparametrized variability of the geometry, as well as the convection and radiation boundary conditions. The proposed CNNs to predict the discretization error of the coarse solution are presented in Sect. 3 and elements on physical reduced order modeling are provided in Sect. 4. Our proposed framework combining CNNs with reduced order modeling is detailed in Sect. 5, and numerical applications are given in Sect. 6.

2 Description of the Problem of Interest

Consider a structure of interest denoted Ω, such that the boundary ∂Ω is partitioned into d = 4 surfaces: ∂Ω = ∪_{i=1}^{d} Γ^{(i)}, Γ^{(i)◦} ∩ Γ^{(j)◦} = ∅, 1 ≤ i, j ≤ d, where ·◦ denotes the interior of a set, see Fig. 1.

Fig. 1. Representation of the structure of interest Ω, with a partitioning of the boundary ∂Ω = ∪_{i=1}^{d} Γ^{(i)}, Γ^{(i)◦} ∩ Γ^{(j)◦} = ∅, 1 ≤ i, j ≤ d = 4.

We are interested in the prediction of the temperature field T(x, t) over the structure Ω during a time interval [0, t_f]. In the absence of work (the geometry of the structure is fixed) and of volumic heat source, the heat equation reads

∂u/∂t (x, t) + ∇ · q(x, t) = 0,    (1)

where u is the volumic internal energy (in J.m⁻³) and q is the heat flux density (in J.s⁻¹.m⁻²). We make the following additional assumptions: the density ρ (in kg.m⁻³), the massic heat capacity c_p (in J.kg⁻¹.K⁻¹) and the heat conductivity λ (in J.s⁻¹.m⁻¹.K⁻¹) are supposed uniform over Ω and constant in time. The massic internal energy U = u/ρ (in J.kg⁻¹) is assumed to depend only on the temperature. Moreover, the heat exchanges between the structure and the exterior are modeled by convection and radiation boundary conditions:

q(x, t) · n(x) = h(x, t) (T(x, t) − T_{1,e}(x, t)) + (εσ)(x, t) (T⁴(x, t) − T_{2,e}⁴(x, t)),   (x, t) ∈ ∂Ω × [0, t_f],

where h denotes the convection coefficient (in J.s⁻¹.m⁻².K⁻¹), σ the Stefan–Boltzmann constant (in J.s⁻¹.m⁻².K⁻⁴), ε is the emissivity coefficient (dimensionless), and T_{1,e} and T_{2,e} are external temperature values. The coefficients σ and ε are fixed, and h is uniform over each surface and constant in time:

h(x, t) = h^{(i)},                 Γ^{(i)} × [0, t_f]
T_{1,e}(x, t) = T_{1,e}^{(i)}(t),  Γ^{(i)} × [0, t_f]    (2)
(εσ)(x, t) = εσ,                   ∂Ω × [0, t_f]
T_{2,e}(x, t) = T_{2,e}(t),        ∂Ω × [0, t_f]

Under these assumptions, T(x, t), x ∈ Ω, t ∈ [0, t_f], is the solution of the following system of equations:

ρ c_p ∂T/∂t (x, t) − λ ΔT(x, t) = 0,   Ω × [0, t_f]
λ ∇T(x, t) · n(x) = h^{(i)} (T(x, t) − T_{1,e}^{(i)}(t)) + εσ (T⁴(x, t) − T_{2,e}⁴(t)),   Γ^{(i)} × [0, t_f], 1 ≤ i ≤ d    (3)
T(x, t = 0) = T_init(x),   Ω × {0}

The strong form (3) of the partial differential equations of interest is weakened into a variational formulation, then discretized in space using the Galerkin method with finite elements, and in time using a backward Euler time-stepping scheme. This leads to a system of nonlinear equations, for which an approximate solution is computed using a Newton algorithm. The solution temperature is obtained as T(x, t_s) = Σ_{k=1}^{N} T_k(s) ϕ_k, 1 ≤ s ≤ J, where {0, t_1, ..., t_J = t_f} is the time discretization and {ϕ_k}_{1≤k≤N}, the finite element basis of cardinal N, is the space discretization. At time step s + 1, the p-th iteration of the Newton algorithm writes:

T^{(0)}(s + 1) = T(s)
(DF_s/DV)(T^{(p)}(s + 1)) (T^{(p+1)}(s + 1) − T^{(p)}(s + 1)) = −F_s(T^{(p)}(s + 1)),    (4)

where T^{(p)}(s + 1) ∈ R^N is the p-th iterate at time step s + 1 of the solution temperature coefficients on the finite element basis, T(s) is the known solution at the previous time step s, and where, for V ∈ R^N, F_s(V) ∈ R^N, 0 ≤ s ≤ J − 1, is such that


F_{s,l}(V) = (ρ c_p / dt) Σ_{k=1}^{N} ( ∫_Ω ϕ_k(x) ϕ_l(x) dx ) (V_k − T_k(s))
  + λ Σ_{k=1}^{N} ( ∫_Ω ∇ϕ_k(x) · ∇ϕ_l(x) dx ) V_k
  − εσ ∫_{∂Ω} ( (Σ_{k=1}^{N} V_k ϕ_k(x))⁴ − T_{2,e}⁴(s + 1) ) ϕ_l(x) dx
  − Σ_{i=1}^{d} h^{(i)} ∫_{Γ^{(i)}} ( Σ_{k=1}^{N} V_k ϕ_k(x) − T_{1,e}^{(i)}(s + 1) ) ϕ_l(x) dx,   1 ≤ l ≤ N,    (5)
and where (DF_s/DV)(T^{(p)}(s + 1)) ∈ R^{N×N} is such that

(DF_s/DV)_{k,l}(T^{(p)}(s + 1)) = ∂F_{s,k}/∂V_l (T^{(p)}(s + 1))
  = (ρ c_p / dt) ∫_Ω ϕ_k(x) ϕ_l(x) dx + λ ∫_Ω ∇ϕ_k(x) · ∇ϕ_l(x) dx
  − 4εσ ∫_{∂Ω} ( Σ_{k′=1}^{N} T_{k′}^{(p)}(s + 1) ϕ_{k′}(x) )³ ϕ_k(x) ϕ_l(x) dx
  − Σ_{i=1}^{d} h^{(i)} ∫_{Γ^{(i)}} ϕ_k(x) ϕ_l(x) dx,   1 ≤ k, l ≤ N.    (6)

The Newton iterations (4) are stopped when ||T^{(p+1)}(s + 1) − T^{(p)}(s + 1)||₂ / ||b^{ext}(s + 1)||₂ ≤ ε_{HF}^{tol}, where ||·||₂ denotes the Euclidean norm on R^N, and where b_l^{ext}(s + 1) = εσ T_{2,e}⁴(s + 1) ∫_{∂Ω} ϕ_l(x) dx + Σ_{i=1}^{d} h^{(i)} T_{1,e}^{(i)}(s + 1) ∫_{Γ^{(i)}} ϕ_l(x) dx, 1 ≤ l ≤ N.

The problem (3), solved by the Newton algorithm (4), is our reference high fidelity model (HFM). This HFM will be solved to generate the data needed in our training tasks. In our application, we are interested in fast numerical strategies to approximate the solution temperature under nonparametrized variations of the geometry Ω, the convection coefficients h, and the time evolutions of the external temperatures T_{1,e} and T_{2,e}. To generate the random geometries, the coordinates of the four corners of the unit 2D square are shifted by values taken randomly following a uniform probability distribution over [−0.25, 0.25]. In practice, the unit square is first meshed, and then the mesh is morphed using radial basis function interpolation, see [5]. The convection coefficients h^{(i)}, 1 ≤ i ≤ 4, are taken randomly following a uniform probability distribution over [0, 10000]. To generate the evolutions of T_{1,e} and T_{2,e}, for each temperature, 11 values are taken following a uniform probability distribution over [0, 1000], corresponding to time instances 0, 100, ..., 1000, and the time evolution is obtained by linear interpolation between these values, see Fig. 2 for an example.
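The random variability described above can be sketched as follows (an illustrative reconstruction, not the authors' code; the radial basis function mesh morphing is omitted):

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_variability():
    """Draw one random configuration of geometry and boundary conditions,
    following the distributions described in the text."""
    # Shifts of the 4 corners of the unit square, uniform over [-0.25, 0.25].
    corner_shifts = rng.uniform(-0.25, 0.25, size=(4, 2))
    # Convection coefficients h^(i), uniform over [0, 10000].
    h = rng.uniform(0.0, 10000.0, size=4)
    # 11 temperature values at t = 0, 100, ..., 1000, linearly interpolated.
    knot_times = np.linspace(0.0, 1000.0, 11)
    knot_temps = rng.uniform(0.0, 1000.0, size=11)
    temperature = lambda t: np.interp(t, knot_times, knot_temps)
    return corner_shifts, h, temperature

corner_shifts, h, T1e = sample_variability()
```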

Fig. 2. Example of external temperature time evolution (temperature in °C versus time in s, for T_{2,e} and T_{1,e}^{(i)}, 1 ≤ i ≤ 4).

Remark 1 (external temperatures). The external temperatures T_{1,e}^{(i)}, 1 ≤ i ≤ 4, and T_{2,e} can differ at the same time since they do not model the same physical phenomenon. The convection models the heat exchanged by contact at the surface Γ^{(i)}, where T_{1,e}^{(i)} is the temperature of the fluid medium in contact. The radiation models the heat exchanged with the exterior surfaces seen by ∂Ω, in the linear optics sense, where T_{2,e} is the temperature of these external seen surfaces (assumed here uniform in space).

Remark 2 (nonparametrized variability). The fact that we choose some parametrization to generate our data in the previous paragraph does not contradict the fact that our variability is nonparametrized. When exploiting our fast approximation, we want to be able to use more general variations. In practice, the boundary conditions come from numerical simulations from another solver and another physics, for which we do not know any explicit parametrization.

3 Convolutional Neural Networks

In this section, we consider pairs of meshes for the same geometry: one set of coarse meshes, for which the HFM is solved very fast, and one set of fine meshes, for which the HFM provides our reference solutions, in a runtime considered too long for efficient exploitation. A possibility for deriving fine solutions from coarse ones is to use superresolution strategies. These strategies are popular for computational fluid dynamics simulations, where the data is often produced as piecewise constant fields on grids by finite volume schemes: the coarse solution is already available in the form of a coarse grid, a natural candidate for the input of a deep neural network. In our case, taking a subsampling of the coarse


solutions would neglect a lot of available information: the solutions of (most) finite element problems are available as continuous functions over the complete structure through the finite element interpolation. We recall that the difference between a coarse solution and the reference solution at each point of the structure is called the discretization error. Hence, we propose to use deep learning to learn the discretization error of the coarse solutions. The proposed approximation is the prediction of the discretization error by the neural network, added to the coarse solution. The considered data consist of 125 simulations of 100 time steps each, for random geometries and boundary conditions, each computed on a coarse and a fine mesh. We keep 100 simulations for the training set and 25 for the testing set. The coarse solution and the difference between the fine and coarse solutions (the discretization error) are projected on a regular grid of 81 by 81 cells. The obtained tensors are scaled between 0 and 1. The regression task addressed by the CNNs has as input a tensor of size (10000 × 81 × 81 × 2): 100 time steps for 100 random geometries for the number of samples, a 2D grid of 81 × 81 cells, and 2 channels: one for the projected coarse field and another for the mask of the fine mesh projected onto the grid. The output tensor is of size (10000 × 81 × 81 × 1): still 10000 samples on the same 81 × 81-cell grid, and only one channel: the prediction of the discretization error. The data preparation for the training of the CNNs is illustrated in Fig. 3. For the computation of the prediction using the trained CNNs, the input data is prepared in the same way as for the training (the scaling of the data is done using the same function as the one fitted on the training data). After the CNNs have been applied to the data, an inverse projection is done to represent the data on the fine mesh, see Fig. 4.
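The assembly of the input and output tensors described above can be sketched as follows (a hypothetical numpy illustration: the projection of the finite element fields onto the 81 × 81 grid is assumed already done, `build_cnn_tensors` is an invented helper name, and the scaling here is a simplified global min-max instead of the function fitted on the training data):

```python
import numpy as np

def build_cnn_tensors(coarse_fields, fine_fields, masks):
    """Assemble the CNN input/output tensors.

    coarse_fields, fine_fields: fields already projected onto the 81 x 81
    grid, shape (n_samples, 81, 81). masks: fine-mesh masks in {0, 1}.
    """
    def scale01(a):  # global min-max scaling to [0, 1]
        return (a - a.min()) / (a.max() - a.min() + 1e-12)

    errors = fine_fields - coarse_fields  # discretization error: the target
    x = np.stack([scale01(coarse_fields), masks], axis=-1)  # 2 input channels
    y = scale01(errors)[..., np.newaxis]                    # 1 output channel
    return x, y

rng = np.random.default_rng(0)
n = 8  # toy sample count (the paper uses 100 geometries x 100 time steps)
coarse = rng.random((n, 81, 81))
fine = coarse + 0.01 * rng.random((n, 81, 81))
x, y = build_cnn_tensors(coarse, fine, np.ones((n, 81, 81)))
print(x.shape, y.shape)  # (8, 81, 81, 2) (8, 81, 81, 1)
```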
Notice that both the coarse and the fine meshes are two different approximations of an underlying geometry. To help the CNNs and prevent them from having to learn how to transform the coarse boundary into the fine one, all the projection and inverse projection operations are done so that the mask of the fine mesh is used as the geometrical support. Depending on the convexity of the boundary, data is suppressed or extrapolated so that the fine mask is always used, see Fig. 5. Two CNNs are trained for 24 h on two Nvidia Quadro K5200 Graphics Processing Units using keras [4]; their architectures are represented in Fig. 6. They consist of a succession of 2D convolution layers with an increasing number of filters, followed by 2D deconvolution layers with a decreasing number of filters. In all layers, the kernel size is (3 × 3); 'tanh' activation functions are chosen for the first CNN, whereas 'relu' ones are chosen for the second CNN. The batch size is 100, and the Adam optimizer is chosen with a learning rate of 10⁻⁴ and the mean square error metric. In this work, we use for predictions the average of the predictions of these two CNNs. Some other tested architectures led to worse performance on our particular data, namely the stochastic gradient descent optimizer, and adding 50% dropout layers after each convolution and deconvolution layer.


F. Casenave et al.

Fig. 3. Data manipulation for the training of a CNN.

Two predictions using data from the testing set are presented in Fig. 7. We see that the CNNs successfully reproduce patterns and values of the discretization error on a configuration not seen during the training. Remark 3 (Prediction by windows). Notice that the learning of the discretization error using the CNNs is restricted to a square subdomain (a window), see the images at the top of Fig. 3. When using the CNNs, the discretization error is not predicted outside the window, see the bottom-right image in Fig. 4. This enables us to use the CNNs on larger structures by predicting the discretization error window by window, as done in Sect. 6.2. In this section, we have constructed a framework to generate fast approximations of the HFM (3) from quickly computed coarse solutions using CNNs. As we will see in Sect. 6, such predictions can in some cases lead to worse L∞-errors than the coarse solution.
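The window-by-window use of the CNNs mentioned in Remark 3 can be sketched as follows, assuming for simplicity a non-overlapping tiling and a `predict` callable wrapping the trained CNNs (both are assumptions of this sketch, not details given in the text):

```python
import numpy as np

def predict_by_windows(predict, field, win=81):
    """Tile a large projected field into (win x win) windows, apply the
    trained CNNs on each, and reassemble the predicted error field.
    Points outside a full window are left at zero."""
    nx, ny = field.shape
    out = np.zeros((nx, ny))
    for i in range(0, nx - win + 1, win):
        for j in range(0, ny - win + 1, win):
            out[i:i + win, j:j + win] = predict(field[i:i + win, j:j + win])
    return out
```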

CNN-Assisted ROM for Thermal Problems


Fig. 4. Data manipulation for the prediction using a trained CNN.

4 Model Order Reduction

Physical reduced order model (ROM) techniques can also be used to accelerate the computation of the high fidelity problem (3). ROM procedures consist of two stages: an offline stage, where information from the HFM is learned using machine learning, and an online stage, where the reduced order model is constructed in the form of an approximation of the physical equations and solved. Expensive tasks are computed during the offline stage, whereas the online stage is required to be efficient: only operations whose computational complexity is independent of the number N of degrees of freedom of the HFM are usually allowed. In our case, as for most physical ROM techniques, the online approximation consists of solving the same equations as the HFM, namely the Newton algorithm (4), but where the Galerkin method is applied on a particular basis, called the reduced-order basis (ROB), instead of the finite element basis. The ROB being


Fig. 5. Zoom over a couple of coarse and fine meshes represented together: depending on the convexity of the boundary, the coarse or the fine mask is larger.

Fig. 6. Architecture of the two tested CNNs.

specifically tailored to our problem of interest, its construction requires solving instances of the HFM. The cardinality n of the ROB is much smaller than N, leading to reduced runtimes for the reduced problem. As far as the ROM task is concerned, our objective presents important challenges. First, the nonparametrized variabilities of the geometry are moved out of the scope of the ROM: each time a new geometry is considered, a complete new ROM procedure has to be restarted. Then, the other challenges are the nonparametrized variability of the external temperature time evolutions and


Fig. 7. Two predictions using data from the testing set at two different time steps. Left: predictions from the CNNs, right: exact discretization error.

the nonlinearity of the equations of the HFM; both have been recently addressed in the literature. The ROB is obtained using a principal component analysis (PCA) (called proper orthogonal decomposition in the ROM community [12]) on a collection of solution temperature fields. The efficiency of the reduced problem is obtained by replacing the costly integrals in (5) and (6) with precomputed tensors when possible, and with reduced quadrature schemes constructed using a Non-Negative Orthogonal Matching Pursuit algorithm otherwise, see [14, Algorithm 1], [6] and [8] (this last approximation is often called hyperreduction).
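The PCA/POD construction of the ROB from a snapshot matrix can be sketched with a plain SVD; centering the snapshots and the choice of the mode count are assumptions of this sketch, not details specified in the text:

```python
import numpy as np

def compute_rob(snapshots, n_modes):
    """Reduced-order basis via PCA/POD on solution snapshots.
    snapshots: (n_snapshots, N) array, one temperature field per row.
    Returns an (n_modes, N) orthonormal basis and the singular values,
    which measure the energy content of each mode."""
    centered = snapshots - snapshots.mean(axis=0)
    # Right singular vectors of the snapshot matrix span the ROB
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_modes], s
```

In practice n_modes is chosen so that the truncated singular values capture a prescribed fraction of the total energy.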

5 Proposed Framework

We recall our objective: the fast computation of the HFM (3) under nonparametrized variations of the geometry Ω, the convection coefficients h, and the time evolutions of the external temperatures T1,e and T2,e. In this section, we propose to improve the accuracy of the CNN predictions using a physical ROM in the following framework. First, CNNs are trained to provide a first prediction under the considered variabilities, using some training data. Then, for each geometry of interest (in the exploitation phase), we construct a ROM in the ROM offline stage, using training data representing the variabilities of h, T1,e and T2,e. The data used in the training of the ROM, on which the PCA is applied to construct the ROB, are the coarse solutions and the predictions of the discretization errors from the CNNs. Finally, in the ROM online stage, we can quickly construct an approximation of the HFM for any variability of h, T1,e and T2,e. The framework is illustrated in Fig. 8.

Fig. 8. Proposed framework combining CNNs on discretization error and physical ROMs.

Using a ROM constructed with the projection of the coarse solution on the fine mesh would not improve the accuracy of the coarse prediction, since the ROB is a subspace of the linear space generated by the data. On the contrary, CNN predictions do not account for the underlying physics, as explained in [16], where physics-based constraints have been imposed in the loss function. In this work, the data provided to the ROM is the coarse solution enriched by predictions of the CNNs. The intuition is that the ROM will filter this data to keep only the parts relevant to the current configuration and partial differential equations. The irrelevant data will be discarded by the ROM when solving the physical reduced problem. As stated in the introduction, the advantages of the ROM prediction are, in our case, the boundary conditions and physical equations being satisfied on the ROB and the constitutive law being exactly solved (more precisely, only on the reduced integration scheme for the terms needing hyperreduction in the online assembling). For these reasons, we expect the ROM predictions to be more accurate than the CNN ones.
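The resulting pipeline can be summarized as a short driver; every callable below is a hypothetical stand-in for the corresponding stage of the framework:

```python
def build_rom_for_geometry(geometry, bc_samples, coarse_solve, cnn_error,
                           rom_offline, rom_online):
    """Sketch of the proposed framework (all callables hypothetical):
    1) cheap coarse solves enriched by CNN error predictions form the data,
    2) the ROM offline stage fits the reduced model for this geometry,
    3) the ROM online stage answers any new boundary-condition query fast."""
    snapshots = []
    for bc in bc_samples:
        coarse = coarse_solve(geometry, bc)           # coarse simulation
        snapshots.append(coarse + cnn_error(coarse))  # enriched by CNN error
    reduced_model = rom_offline(snapshots)            # PCA + hyperreduction
    return lambda bc: rom_online(reduced_model, bc)   # fast online predictor
```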

6 Numerical Results

In this section, we apply the framework proposed in Sect. 5 to our problem of interest. For the moment, we do not consider the variabilities of h, T1,e and T2,e in the ROM: the CNNs are constructed under the complete variability, but when considering the ROM, the offline and online stages are computed for one set of h and one pair of temporal profiles for T1,e and T2,e. In what follows, we denote Tcoarse the coarse temperature solution, TCNN the temperature predicted by the CNNs (namely, the prediction of the discretization error added to the coarse solution), TROM the temperature predicted by the ROM using the framework described in Sect. 5, and Tref the reference temperature solution (which is, in our case, the fine temperature solution). We define the following indicators, where Tpred is taken among Tcoarse, TCNN and TROM:

$E^{\infty}(T_{\mathrm{pred}}) := \max_{1 \le s \le J} \max_{x \in \Omega} |T_{\mathrm{pred}} - T_{\mathrm{ref}}|(x, t_s)$,

$\bar{E}^{\infty}(T_{\mathrm{pred}}) := \frac{1}{J} \sum_{s=1}^{J} \max_{x \in \Omega} |T_{\mathrm{pred}} - T_{\mathrm{ref}}|(x, t_s)$,

$E_{L^2}(T_{\mathrm{pred}}) := \sqrt{\dfrac{\frac{1}{J}\sum_{s=1}^{J} \int_{\Omega} (T_{\mathrm{pred}} - T_{\mathrm{ref}})^2(x, t_s)\,dx}{\max_{1 \le s \le J} \int_{\Omega} (T_{\mathrm{ref}})^2(x, t_s)\,dx}}$,

$E^{Q}(T_{\mathrm{pred}}) := \frac{1}{J} \sum_{s=1}^{J} \dfrac{\left|Q_{T_{\mathrm{pred}}}(t_s) - Q_{T_{\mathrm{ref}}}(t_s)\right|}{\max_{1 \le s \le J} |Q_{T_{\mathrm{ref}}}(t_s)|}$, where $Q_T(t_s) := \int_{\partial\Omega} \nabla T(x, t_s) \cdot n(x)\,dx$,

$E_{\mathrm{mat}}(T_{\mathrm{pred}}) := \frac{1}{|D|} \sum_{(x', s') \in D} |T_{\mathrm{pred}} - T_{\mathrm{ref}}|(x', t_{s'})$, where $D := \{(x', s') \in \Omega \times \{1, \cdots, J\} \mid T_{\mathrm{ref}}(x', t_{s'}) > 0.98 \max_{(x,s) \in \Omega \times \{1, \cdots, J\}} T_{\mathrm{ref}}(x, t_s)\}$ and $|D|$ is the cardinal of $D$.
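On the discretized fields, most of these indicators can be evaluated directly; a NumPy sketch for fields of shape (J, nx, ny) follows (E^Q is omitted since it requires boundary normals, and a uniform cell measure is assumed so that integrals reduce to sums):

```python
import numpy as np

def error_indicators(T_pred, T_ref, threshold=0.98):
    """Discrete evaluation of the accuracy indicators on fields of shape
    (J, nx, ny): J time steps on a uniform spatial grid."""
    diff = np.abs(T_pred - T_ref)
    E_inf = diff.max()                         # max over time and space
    E_bar_inf = diff.max(axis=(1, 2)).mean()   # time average of spatial max
    E_L2 = np.sqrt(((T_pred - T_ref) ** 2).sum(axis=(1, 2)).mean()
                   / (T_ref ** 2).sum(axis=(1, 2)).max())
    # E_mat: mean error where T_ref is within 2% of its global maximum
    hot = T_ref > threshold * T_ref.max()
    E_mat = diff[hot].mean()
    return E_inf, E_bar_inf, E_L2, E_mat
```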

The indicators E∞(Tpred), Ē∞(Tpred) and EL2(Tpred) quantify L∞- and L2-errors, EQ(Tpred) quantifies the error on the prediction of the amount of heat exchanged with the exterior, and Emat(Tpred) quantifies the error where the reference temperature is close to its maximal value. We explained in the introduction that this last indicator is related to material integrity in an industrial context, where materials are pushed to the limit of their strength.

6.1 Geometrical Variabilities for Structures of the Same Size as the Training of the CNNs

We first consider random geometries located inside the limits of the grid used for the training of the CNNs of Sect. 3, so that no part of the geometry can go outside the grid, as illustrated in Figs. 3 and 4. The proposed framework is tested on the geometries illustrated in Fig. 9. The predicted discretization errors using CNNs and ROM, as well as the reference discretization errors, for the third geometry at t = 450 s are illustrated in Fig. 10. We notice that the ROM improves the CNN prediction in the areas where the difference Tref − Tcoarse is maximal and minimal, and on the boundaries of the structure.


Fig. 9. Five geometries used for testing the proposed framework.

Fig. 10. Predicted discretization errors using CNNs and ROM, and reference discretization error, for the third geometry at t = 450 s.

The accuracy indicators of the different predictions for the five geometries, under five random instances of the boundary conditions, are provided in Table 1. In all cases and for all indicators, the ROM provides the best approximation, with a significant accuracy improvement for the last two indicators, which are more closely related to the physics of the problem and the industrial stakes. We notice that for the last two indicators, the CNN predictions can be worse than the coarse ones. The durations of the different steps of the framework are given in Table 2. We recall that the computational complexity of the ROM online stage is independent of N, the number of degrees of freedom of the associated HFM. Without accounting for the training of the CNNs, the duration of the complete procedure is approximately the same as the duration of the fine reference simulation. Here, we do not consider variability of the boundary conditions in the ROM. Supposing that the constructed ROM is accurate in a certain variability regime for the boundary conditions, any new set of boundary conditions in this regime can be computed at the price of the ROM online stage, which is 4.2 s in this case. The indicated durations for our framework can be improved by optimizing the code for the projection and inverse projection operations.
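The many-query payoff suggested above can be made concrete with the Table 2 timings: once the offline costs are paid, each additional boundary-condition sample costs only the online stage.

```python
# Timings from Table 2 (seconds)
fine_simulation = 61.83
rom_online = 4.2

# Each new boundary-condition query in the trusted variability regime
# costs only the ROM online stage:
speedup_per_query = fine_simulation / rom_online
print(round(speedup_per_query, 1))  # roughly 14.7x per additional query
```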

6.2 Geometrical Variability for Structures Larger Than the Training of the CNNs

In this experiment, we consider a structure 8 times larger than the previous ones (twice as high and four times as wide), see Fig. 11.

Table 1. Accuracy indicators applied to five different configurations. The bold results correspond to the most accurate prediction, and the underlined results to the least accurate one.

              Geometry 1  Geometry 2  Geometry 3  Geometry 4  Geometry 5
E∞   Tcoarse  45.1 °C     55.1 °C     70.8 °C     46.1 °C     63.8 °C
     TCNN     45.4 °C     50.0 °C     66.0 °C     44.9 °C     58.8 °C
     TROM     33.1 °C     46.9 °C     58.3 °C     37.8 °C     49.4 °C
Ē∞   Tcoarse  18.7 °C     29.2 °C     32.5 °C     23.1 °C     29.8 °C
     TCNN     17.2 °C     26.6 °C     30.0 °C     21.0 °C     27.6 °C
     TROM     16.6 °C     25.9 °C     26.9 °C     19.2 °C     24.0 °C
EL2  Tcoarse  0.00926     0.00872     0.0107      0.00713     0.00901
     TCNN     0.00753     0.00688     0.0080      0.00561     0.00699
     TROM     0.00733     0.00650     0.0072      0.00525     0.00659
EQ   Tcoarse  0.0212      0.0165      0.0191      0.0145      0.0228
     TCNN     0.0210      0.0165      0.0192      0.0146      0.0222
     TROM     0.0097      0.0097      0.0129      0.0088      0.013
Emat Tcoarse  2.69 °C     14.2 °C     6.48 °C     5.04 °C     3.65 °C
     TCNN     2.76 °C     13.4 °C     7.33 °C     4.98 °C     3.58 °C
     TROM     2.16 °C     7.6 °C      4.77 °C     3.75 °C     3.23 °C

Table 2. Duration of the different steps of the framework when testing with small geometries.

Fine simulation                            61.83 s
Coarse simulation                           0.46 s
Projection coarse mesh to grid              6.2 s
CNNs prediction                             4.8 s
Projection grid to fine mesh               11.1 s
ROM offline                                37 s
ROM online                                  4.2 s
Complete procedure (coarse + CNNs + ROM)   63.76 s


Fig. 11. Large geometry used for testing the proposed framework.

Fig. 12. Predicted discretization errors using CNNs and ROM, and reference discretization error, for the large geometry at t = 770 s.

Table 3. Accuracy indicators applied to the configuration with the large geometry. The bold results correspond to the most accurate prediction, and the underlined results to the least accurate one.

              Large geometry
E∞   Tcoarse  30.1 °C
     TCNN     29.2 °C
     TROM     28.9 °C
Ē∞   Tcoarse  13.5 °C
     TCNN     18.8 °C
     TROM     13.1 °C
EL2  Tcoarse  0.00306
     TCNN     0.00301
     TROM     0.00242
EQ   Tcoarse  0.0184
     TCNN     0.0237
     TROM     0.0138
Emat Tcoarse  3.69 °C
     TCNN     4.32 °C
     TROM     3.50 °C

Table 4. Duration of the different steps of the framework when testing with the large geometry.

Fine simulation                           483 s
Coarse simulation                           2.4 s
Projection coarse mesh to grid             61 s
CNNs prediction                            10 s
Projection grid to fine mesh               90 s
ROM offline                               176 s
ROM online                                  1 s
Complete procedure (coarse + CNNs + ROM)  340 s

The predicted discretization errors using CNNs and ROM, as well as the reference discretization errors, for the large geometry at t = 770 s are illustrated in Fig. 12. We notice that the ROM improves the CNN predictions on the boundaries of the structure and removes some irregularities. The accuracy indicators of the different predictions for the large geometry, under random instances of the boundary conditions, are provided in Table 3. The comments made in Sect. 6.1 are also valid for this experiment. The durations of the different steps of the framework are given in Table 4. Notice that the faster ROM online stage with respect to the previous section is due to a less stringent accuracy requirement enforced for the ROM approximation in this case.

7 Conclusions and Future Works

In this work, we considered a transient nonlinear thermal problem, for which we proposed fast numerical approximations in a context of nonparametrized variability of the geometry and of the temporal boundary condition scenarios. This approximation is based on a first numerical resolution on a coarse mesh, from which the discretization error is predicted by convolutional neural networks. As illustrated in our numerical experiments by error indicators adapted to the physical problem, this first prediction can sometimes be less accurate than the coarse solution. This prediction is then improved by constructing a physical reduced order model on data composed of the coarse solution and the prediction of the discretization error by the neural networks. In all our numerical experiments, the prediction of the reduced order model is more accurate than both the coarse and the neural network predictions. Hence, among the data generated by the neural networks, the reduced order model keeps only the part pertinent to the problem at hand. The numerical applications contain an experiment where the neural network predicts the discretization error window by window on a structure 8 times larger than the ones used during the training, and the reduced order model, constructed on the complete structure, could also improve the accuracy of the prediction.


For the moment, we do not consider variations of the boundary condition temporal scenarios in the reduced order model procedure, and the complete framework computes an approximation in the same runtime as the reference high-fidelity model in this case. As a perspective, we plan to consider this variability in the physical model order reduction, which should lead to practical speedups with respect to the high-fidelity model in a many-queries context. With our second numerical application, we experimented with using the CNNs window by window. In a parallel computing context, the proposed framework can generate, in parallel with distributed memory, the data for the offline stage of the reduced order modeling method, which can also be computed in parallel with distributed memory using the procedure detailed in [2]. Another way to put this in perspective is: for any new geometry, a reduced order model strategy can be derived without having to solve any global equilibrium over the high-fidelity structure, since the data required to construct the reduced order model consists only of coarse computations and discretization errors computed locally in parallel by the CNNs. This must be confirmed by further increasing the prediction ability of the CNNs, to provide better data to the reduced order modeling procedure.

References

1. Ayensa-Jiménez, J., Doweidar, M.H., Sanz-Herrera, J.A., Doblaré, M.: An unsupervised data completion method for physically-based data-driven models. Comput. Methods Appl. Mech. Eng. 344, 120–143 (2019)
2. Casenave, F., Akkari, N., Bordeu, F., Rey, C., Ryckelynck, D.: A nonintrusive distributed reduced order modeling framework for nonlinear structural mechanics - application to elastoviscoplastic computations. Int. J. Numer. Methods Eng. 121, 32–53 (2020)
3. Chatterjee, A.: An introduction to the proper orthogonal decomposition. Curr. Sci. 78(7), 808–817 (2000)
4. Chollet, F., et al.: Keras (2015). https://github.com/fchollet/keras
5. de Boer, A., van der Schoot, M.S., Bijl, H.: Mesh deformation based on radial basis function interpolation. Comput. Struct. 85(11), 784–795 (2007). Fourth MIT Conference on Computational Fluid and Solid Mechanics
6. Farhat, C., Avery, P., Chapman, T., Cortial, J.: Dimensional reduction of nonlinear finite element dynamic models with finite rotations and energy-based mesh sampling and weighting for computational efficiency. Int. J. Numer. Methods Eng. 98(9), 625–662 (2014)
7. Freno, B.A., Carlberg, K.T.: Machine-learning error models for approximate solutions to parameterized systems of nonlinear equations. Comput. Methods Appl. Mech. Eng. 348, 250–296 (2019)
8. Hernández, J.A., Caicedo, M.A., Ferrer, A.: Dimensional hyper-reduction of nonlinear finite element models via empirical cubature. Comput. Methods Appl. Mech. Eng. 313, 687–722 (2017)
9. Kissas, G., Yang, Y., Hwuang, E., Witschey, W.R., Detre, J.A., Perdikaris, P.: Machine learning in cardiovascular flows modeling: predicting arterial blood pressure from non-invasive 4D flow MRI data using physics-informed neural networks. Comput. Methods Appl. Mech. Eng. 358, 112623 (2020)


10. Ryckelynck, D., Gallimard, L., Jules, S.: Estimation of the validity domain of hyper-reduction approximations in generalized standard elastoviscoplasticity. Adv. Model. Simul. Eng. Sci. 2(1), 19 (2015)
11. Salaken, S.M., Khosravi, A., Nguyen, T., Nahavandi, S.: Seeded transfer learning for regression problems with deep learning. Expert Syst. Appl. 115, 565–577 (2019)
12. Sirovich, L.: Turbulence and the dynamics of coherent structures, parts I, II and III. Q. Appl. Math. XLV, 561–590 (1987)
13. Xiao, D.: Error estimation of the parametric non-intrusive reduced order model using machine learning. Comput. Methods Appl. Mech. Eng. 355, 513–534 (2019)
14. Yaghoobi, M., Wu, D., Davies, M.E.: Fast non-negative orthogonal matching pursuit. IEEE Sig. Process. Lett. 22(9), 1229–1233 (2015)
15. Zhu, Y., Wu, X., Li, P., Zhang, Y., Hu, X.: Transfer learning with deep manifold regularized auto-encoders. Neurocomputing 369, 145–154 (2019)
16. Zhu, Y., Zabaras, N., Koutsourelakis, P.-S., Perdikaris, P.: Physics-constrained deep learning for high-dimensional surrogate modeling and uncertainty quantification without labeled data. J. Comput. Phys. 394, 56–81 (2019)

Deep Convolutional Generative Adversarial Networks Applied to 2D Incompressible and Unsteady Fluid Flows

Nissrine Akkari1(B), Fabien Casenave1, Marc-Eric Perrin2, and David Ryckelynck2

1 Safran Tech, Modelling and Simulation, Rue des Jeunes Bois, 78114 Châteaufort, Magny-Les-Hameaux, France
[email protected]
2 Centre des Matériaux, Mines ParisTech PSL Research University, CNRS UMR 7633, 63-65 Rue Henri Auguste Desbruères, Corbeil-Essonnes, France

Abstract. In this work, we are studying the use of Deep Convolutional Generative Adversarial Networks (DCGANs) for numerical simulations in the field of Computational Fluid Dynamics (CFD) for engineering problems. We claim that these DCGANs can be used to represent high-dimensional realistic samples in an efficient fashion. Consider, for example, the computation of fluid flows' unsteady velocity and pressure fields subjected to random variations associated, for example, with different design configurations or with different physical parameters, such as the Reynolds number and the boundary conditions. The evolution of all these variables is usually very hard to parameterize and to reproduce in a reduced order space. We would like to be able to reproduce the time coherence of these unsteady fields and their variations with respect to design variables or physical ones. We claim that the use of the data generation field in Deep Learning will enable this exploration in numerical simulations of large dimensions for CFD problems in engineering sciences. Therefore, it is important to point out that the training procedure in DCGANs is completely legitimate, because we need to explore afterwards large-dimensional variabilities within the Partial Differential Equations. In the literature, it is stated that theoretically the generative model could learn to memorize training examples, but in practice it is shown that the generator does not memorize the training samples and is capable of generating new realistic ones. In this work, we show an application of DCGANs to a 2D incompressible and unsteady fluid flow in a channel with a moving obstacle inside. The input of the DCGAN is a Gaussian vector field of dimension 100 and the outputs are the generated unsteady velocity and pressure fields in the 2D channel with respect to time and to an obstacle position.
The training set consists of 44 unsteady and incompressible simulations of 450 time steps each, on a Cartesian mesh of dimension 79 × 99. We discuss the architectural and optimization hyper-parameter choices in our case, following guidelines from the literature on stable GANs. We quantify the GPU cost needed to train a generative model on the 2D unsteady flow fields: 892 s on one Nvidia Tesla V100 GPU, for 40 epochs and a batch size equal to 128.
© Springer Nature Switzerland AG 2020
K. Arai et al. (Eds.): SAI 2020, AISC 1229, pp. 264–276, 2020. https://doi.org/10.1007/978-3-030-52246-9_18

Keywords: DCGAN · CFD · Velocity field · Pressure field · High dimensional realistic samples · Data generation

1 Introduction

Unsteady simulations for fluid mechanics in the domain of aeronautics are increasingly costly. Due to the statistical nature of the flow of unsteady and turbulent fluids, data-driven algorithms could potentially reduce the computational cost through trained reduced models. In fluid mechanics for computer graphics, the abundant amount of high-fidelity simulations has been used for training deep neural networks to approximate the behavior of a complex solver [1], to compress and decompress fluid simulations [7], or to synthesize high-resolution fluid flows starting from low-resolution velocities or vorticities [10]. Among the different techniques from the deep learning community, Generative Adversarial Networks (GANs) [2] are particularly interesting for our task. GANs aim to capture the data distribution such that they can then easily generate new realistic samples similar to the real ones. A large number of papers discuss empirical and heuristic choices for obtaining a stable GAN architecture for a given domain of application. In the paper of Goodfellow et al. [2], the GANs were trained on a range of datasets including MNIST [3], the Toronto Face Database [4], and CIFAR-10 [5]. The empirical rules for the models' architectures are based on rectifier linear activations and sigmoid activations for the generative model, and maxout activations for the discriminative one. Dropout was used at the final layer of the generator only. Goodfellow also showed in [6] different domains of application of GANs other than the representation and manipulation of high-dimensional realistic samples, such as reinforcement learning and super-resolution. In [7], the authors applied deep neural networks to parameterized fluid simulations. They called these networks generative networks because the inputs are defined by a reduced latent space of the corresponding problem parameters.
These generative models were, however, optimized during the training phase of an auto-encoder whose reduced latent space is the corresponding parameter latent space. Hence, the decoder part of this auto-encoder is considered afterwards for the generation of parameterized solutions. In [9], constraints on the architectural topology of GANs were proposed that make them stable to train. The authors proposed to use batch normalization in both the generator and the discriminator, to remove fully connected hidden layers for deeper architectures, to use the ReLU activation in the generator for all layers except for the output, which uses Tanh, and to use the LeakyReLU activation in the discriminator for all layers. These guidelines were applied to the generation of images of human faces with different poses and of bedrooms. As already stated in the abstract, we are interested in the use of DCGANs for engineering problems, more particularly in the domain of aeronautics, in order to generate simulation data (which are usually generated by the high-fidelity Partial Differential Equations, PDEs) in an efficient fashion. More precisely, we need to generate realistic high-dimensional samples very fast. This could be very useful in a design exploration procedure for finding the optimal design with respect to a given industrial criterion. In this work, we propose to study the performance and robustness of DCGANs in order to generate high-dimensional realistic samples of high-fidelity simulations associated with 2D incompressible and unsteady fluid flows, with geometrical variations. To our knowledge, it is the first time that DCGANs are applied to generate unsteady fluid flows with geometrical variations. Nevertheless, there exist applications of GANs in the field of numerical simulations aiming to perform super-resolution and upscaling of simulation data, see for instance [10] for more details. The architectures of the generative and discriminative models used in this work are inspired by the DCGAN tutorial of Pytorch, see [8] for instance. These architectures involve the main guidelines we already mentioned for a stable GAN guarantee. The paper is organised as follows: in Sect. 2 we show the simulation data set considered for the training phase of the GAN. In Sect. 3 we provide the models' architectures, the choice of the gradient descent algorithm, and the hyper-parameter values such as the learning rate, the batch size and the generator inputs. In Sect. 4, we show the generated 2D fluid flows in the training and test phases after 40 epochs of training. In Sect. 5, we give some conclusions and prospects for this work.

2 Training Fluid Simulations

The training fluid simulations consist of 44 CFD simulations of unsteady and incompressible fluid flows in a 2D channel around a moving square solid obstacle. The dimensions of this 2D channel are x ∈ [0.0 m, 0.07536 m] and y ∈ [−0.006 m, 0.006 m]. The position of the obstacle is imposed randomly for each of the 44 training simulations, as an immersed solid boundary using a level set function, see for instance [15]. This test case runs on a Cartesian mesh discretized into 79 parts in the x-axis direction and 99 parts in the y-axis direction. The boundary conditions are given by a constant inlet velocity equal to 2 m/s, an outlet condition, and two wall boundary conditions, see Fig. 1 for a sample of these training unsteady simulations. The physical time for each of these simulations is 45 ms, and snapshots of the instantaneous velocity and pressure fields are extracted every 0.1 ms, i.e. we have 450 snapshots of the velocity and the pressure per simulation. This makes a total of 19800 3-channel snapshots (2D velocity field and 1D pressure field) in the training set, each of dimension 3 × (79 × 99), as we concatenate the velocity and the pressure fields for the training phase of the GAN.
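The bookkeeping of the training set can be sketched as follows; the arrays are zero-filled placeholders for the actual extracted snapshots:

```python
import numpy as np

n_sims, n_steps, nx, ny = 44, 450, 79, 99

# Placeholder snapshot arrays, channel-first as in PyTorch:
velocity = np.zeros((n_sims * n_steps, 2, ny, nx))  # 2 velocity components
pressure = np.zeros((n_sims * n_steps, 1, ny, nx))  # scalar pressure

# Channel-wise concatenation gives the 3-channel training "images"
snapshots = np.concatenate([velocity, pressure], axis=1)
assert snapshots.shape == (19800, 3, 99, 79)
```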


Fig. 1. Three chosen couples of the magnitude of the high-fidelity velocity fields (on the top) and the high-fidelity pressure fields (at the bottom) in a batch of the training set.

3 GAN and Discriminator Architectures and the Optimization Hyper-parameters

GAN and Discriminator Architectures. As already mentioned in the introduction, we adopted the configurations of the deep convolutional generative and discriminative models following the Pytorch tutorial on DCGAN [8], as it satisfies the main features and guidelines for stable GANs found in the literature. The Optimization Hyper-parameters. In what follows, we enumerate the hyper-parameter choices for the optimization algorithm of the weights of the generative and discriminative models from [8]: – The 19800 snapshots are loaded using a dataloader in Pytorch. The number of workers is set to 4 in this case.

– The data are scaled to the interval [−1, 1].
– The data are resized to an image size of 64 × 64.
– The batch size is 128.
– The number of channels is 3, as we concatenate the two-dimensional velocity field with the one-dimensional pressure field.
– The size of the feature maps in the generator is 64.
– The size of the feature maps in the discriminator is 64.
– The ADAM optimizer is used, with a learning rate of 0.0002.
– During the training phase, random noise is added to the target labels of the generator and the discriminator.
– The generative model inputs are random distributions of dimension 100.
– The number of epochs is 40, in order to obtain sharp and realistic generated samples.
– The number of GPUs is 1.

The training phase runs on one Nvidia Tesla V100 GPU node. We are able to complete 40 epochs in 892 s.

4 Experimental Results

Generated 2D Fluid Flows by the Generator at the Last Epoch of the Training Phase. The generator loss and the discriminator loss during the training stage are shown in Fig. 2. We can deduce from this plot that the training phase was very stable; this can also be seen in Fig. 3, which shows the generator outputs on a fixed noise during the training stage. More precisely, we show the generator outputs at the last epoch of the training stage for 64 different inputs. These 64 output images are compared to 64 training images picked from a batch (of 128 images) of the dataloader. Newly Generated 2D Fluid Flows by Optimized Generative Models Trained for 40 Epochs. In this part, we apply the optimized generative model to a set of input distributions of dimension 100 obtained by linear interpolation between two fixed random distributions of dimension 100: this allows us to verify the assumption that the generator did not memorize the training images and is able to generate new realistic ones. We repeat this application several times and obtain the results in Figs. 4, 5, 6, 7, 8, 9, 10 and 11. These results show that the generator was able to learn the temporal evolution and the geometrical variation during the training phase. Variation Laws of the Newly Generated Data in the Space of Time. In the following part, we show some logarithmic laws describing the evolution of the velocity magnitude of such newly generated fields at given points of the fluid domain, with respect to the physical simulation time. Thanks to these laws, we aim to detect whether the generator is producing time-coherent


Fig. 2. The generator loss in blue and the discriminator loss in orange.

Fig. 3. Comparison between 64 training images (RGB ones containing a concatenation of the velocity field and the pressure one) on the right and 64 fake ones generated after 40 epochs of the training phase.

velocity fields and/or velocity fields associated with different positions of the obstacle. We would like to precise that we are not considering the obstacle’s position variable as we do not have an explicit parameter that describes this variation, for instance. However, we are able to establish a non-linear logarithmic law of the velocity with respect to time if the obstacle’s position is changing and, a linear one if the simulation’s time is going forward or backward for a given channel configuration. Therefore, we might be able to establish, in real time, physical models with respect to design/time variables and to calibrate them


N. Akkari et al.

Fig. 4. Test phase: From the left to the right a newly generated couple of velocity and pressure fields from a linear interpolation between two fixed random distributions of dimension 100.

Fig. 5. Test phase: From the left to the right a newly generated couple of velocity and pressure fields from a linear interpolation between two fixed random distributions of dimension 100.


Fig. 6. Test phase: From the left to the right newly generated couples of velocity and pressure fields from a linear interpolation between two fixed random distributions of dimension 100.

Fig. 7. Test phase: From the left to the right three newly generated couples of velocity and pressure fields from a linear interpolation between three couples of fixed random distributions of dimension 100.


Fig. 8. Test phase: From the left to the right three newly generated couples of velocity and pressure fields from a linear interpolation between three couples of fixed random distributions of dimension 100.

Fig. 9. Test phase: From the left to the right three newly generated couples of velocity and pressure fields from a linear interpolation between three couples of fixed random distributions of dimension 100.

with respect to data of realistic behavior and distribution generated thanks to the DCGAN. In Fig. 12, 13 and 14, we confirm the different logarithmic laws with respect to time. Moreover, we recover the fact that the distance between the fields generated from the interpolated distributions and the first generated field tends to increase with a decreasing rate, see Fig. 12, 13 and 14.
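The linear log–log laws discussed above can be checked numerically with an ordinary least-squares fit of ln(d) against ln(|t − t0|); the sketch below uses synthetic data, and the function and variable names are ours, not the authors':

```python
import math

def fit_loglog_law(times, distances, t0):
    """Least-squares fit of ln(d) = a * ln(|t - t0|) + c.
    A well-fitting constant slope a indicates the linear logarithmic
    law discussed in the text."""
    xs = [math.log(abs(t - t0)) for t in times]
    ys = [math.log(d) for d in distances]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    c = my - a * mx
    return a, c

# Synthetic check: d = |t - t0|^0.5 should give slope a = 0.5.
slope, intercept = fit_loglog_law([1.0, 2.0, 4.0, 8.0],
                                  [1.0, 2.0 ** 0.5, 2.0, 8.0 ** 0.5],
                                  t0=0.0)
```

In the paper's setting, `distances` would hold the ℓ2-norm d(V − V0) between generated velocity magnitudes at fixed points of the fluid domain.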


Fig. 10. Test phase: From the left to the right three newly generated couples of velocity and pressure fields from a linear interpolation between three couples of fixed random distributions of dimension 100.

Fig. 11. Test phase: From the left to the right three newly generated couples of velocity and pressure fields from a linear interpolation between three couples of fixed random distributions of dimension 100.

5 Conclusions and Prospects

In this paper, we applied Deep Convolutional Generative Adversarial Networks (DCGANs) to the generation of unsteady and incompressible 2D fluid flows in the wake of a moving squared obstacle, the latent reduced space being of dimension 100 and the high-fidelity fields of dimension 3 × (79 × 99). The 2D fluid flows newly generated by the optimized generative model with random distribution


Fig. 12. Test phase: Evolution of ln(d(V − V0)) with respect to ln(|t − t0|), where d is the ℓ2-norm over given points of the fluid domain, V denotes the magnitude of the velocity generated from an interpolation point, V0 denotes the magnitude of the velocity field generated from the first interpolation random distribution, and t is the time value.

Fig. 13. Test phase: Evolution of ln(d(V − V0 )) with respect to ln(|t − t0 |).

inputs are very interesting. We were able to identify realistic characteristics inherited from the real training samples, such as temporal coherence and geometrical variability. This is a first step towards the efficient manipulation and representation of high-dimensional realistic numerical data in engineering sciences. Moreover, we were able to quantify the GPU cost of this learning phase: 892 s on one GPU node of an Nvidia Tesla V100. These results indicate that the GPU cost of the generator training might become substantial on a data set of 3D geometrical and unsteady fluid flows. As a prospect of this work, we want to be able to prove that the generative model


Fig. 14. Test phase: Evolution of ln(d(V − V0 )) with respect to ln(|t − t0 |).

is not merely memorizing the training samples. This is very important in order to accomplish the desired objectives of fast design conception in engineering-science problems. Moreover, these results are encouraging and promising for future studies, in which we would like to assist the large number of velocity and pressure fields newly generated by the generative model with physical reduced order modeling, based on the projection of the high-fidelity equations on a Proper Orthogonal Decomposition (POD) basis built with the generated fields, see for instance [11–14]. In other words, we aim to add more physical constraints to these GAN-generated velocity and pressure fields by solving efficient reduced order systems of the Navier-Stokes equations.

References

1. Tompson, J., Schlachter, K., Sprechmann, P., Perlin, K.: Accelerating Eulerian fluid simulation with convolutional networks. arXiv (2016). https://arxiv.org/abs/1607.03597
2. Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial networks. In: Proceedings of the International Conference on Neural Information Processing Systems (NIPS 2014), pp. 2672–2680 (2014)
3. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
4. Susskind, J., Anderson, A., Hinton, G.E.: The Toronto face dataset. Technical report UTML TR 2010-001, University of Toronto (2010)
5. Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images. Technical report, University of Toronto (2009)
6. Goodfellow, I.J.: NIPS 2016 tutorial: generative adversarial networks. arXiv:1701.00160
7. Kim, B., Azevedo, V.C., Thuerey, N., Kim, T., Gross, M., Solenthaler, B.: Deep fluids: a generative network for parameterized fluid simulations. Eurographics 38(2), 59–70 (2019)


8. Inkawhich, N.: PyTorch tutorial: DCGAN. https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html
9. Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. In: International Conference on Learning Representations (ICLR 2016) (2016)
10. Xie, Y., Franz, E., Chu, M., Thuerey, N.: tempoGAN: a temporally coherent, volumetric GAN for super-resolution fluid flow. ACM Trans. Graph. 37(4), Article 95 (2018). arXiv:1801.09710
11. Akkari, N., Mercier, R., Moureau, V.: Geometrical reduced order modeling (ROM) by proper orthogonal decomposition (POD) for the incompressible Navier-Stokes equations. In: 2018 AIAA Aerospace Sciences Meeting, AIAA SciTech Forum (AIAA 2018-1827) (2018)
12. Akkari, N., Casenave, F., Moureau, V.: Time stable reduced order modeling by an enhanced reduced order basis of the turbulent and incompressible 3D Navier-Stokes equations. Math. Comput. Appl. 24(2), 45 (2019). http://www.mdpi.com/2297-8747/24/2/45
13. Akkari, N.: A velocity potential preserving reduced order approach for the incompressible and unsteady Navier-Stokes equations. In: AIAA SciTech Forum and Exposition (2020)
14. Akkari, N., Casenave, F., Ryckelynck, D.: A novel gappy reduced order method to capture non-parameterized geometrical variation in fluid dynamics problems. Working paper (2019). https://hal.archives-ouvertes.fr/hal-02344342
15. Abgrall, R., Beaugendre, H., Dobrzynski, C.: An immersed boundary method using unstructured anisotropic mesh adaptation combined with level-sets and penalization techniques. J. Comput. Phys. 257(Part A), 83–101 (2014)

Improving Gate Decision Making Rationality with Machine Learning

Mark van der Pas1(B) and Niels van der Pas2

1 European Center for Digital Transformation, Roermond, The Netherlands

[email protected]
2 European Center for Machine Learning, Roermond, The Netherlands

Abstract. Canceling ideas and projects is an important part of the Innovation Portfolio Management (IPM) process, as stopping the unsuccessful ones avoids sunk costs and frees up resources for successful ideas and projects. A large body of literature is available on decision making in IPM. In this study, we analyzed within IPM the cancellation of ideas and projects by gatekeeping boards, as well as the possibilities of applying machine learning. The hypotheses were tested with data from three large European telecommunication organizations. In total, the three organizations shared 9,118 canceled ideas and projects, of which 0.3% were canceled in the gatekeeping boards; 2.7% of the 1,469 gate requests on the agenda of these boards were canceled. The dataset of one organization was used to train four machine learning models to predict the likelihood of idea and project cancellation. The first model is trained to predict the ideas that will be canceled before the first gate approval; the second model does the same for projects canceled after the first gate is approved but before the second gate is approved. Models three and four are trained for projects in a later phase; the fourth model predicts the cancellation of projects that hold a go-on-implementation approval. All four models achieve an area under the curve of at least 0.802, making them potentially valuable instruments for predicting project cancellation.

Keywords: Innovation Portfolio Management · Mortality curve · Machine learning

1 Introduction

Where numerous ideas for innovations are generated and present in organizations, not all of these are turned into projects and implemented. Several mechanisms can be used by organizations to determine which ideas to realize or, if they are not realized, which to cancel. These mechanisms could embrace the ideas with the strongest contribution to realizing organizational goals whilst canceling the remaining ideas. It is generally accepted that the ideas and projects that need to be stopped are to be canceled as early as possible [11, 28].

Sifting out successful, value-contributing ideas from the total set of ideas is not always done error-free. Back in 2004, Cooper, Edgett and Kleinschmidt claimed that products

© Springer Nature Switzerland AG 2020
K. Arai et al. (Eds.): SAI 2020, AISC 1229, pp. 277–290, 2020. https://doi.org/10.1007/978-3-030-52246-9_19


fail at an alarming rate; amongst others, they found that 21.3% of businesses' total NPD efforts meet their project objectives [5]. In a more recent study, Van der Pas and Furneaux [29] found that 43% of new products and 69% of cost-saving investments deliver at least 80% of the expected revenue, whereas 26% of new products and 7% of cost-saving investments never earned back more than the costs they incurred and thus delivered a bottom-line negative value.

Practitioners from different telecom organizations explained several mechanisms to cancel ideas and projects. E.g. some ideas never get captured as the idea generator never takes the effort to register the idea. In a Spanish telecommunication organization, the capturing of ideas was structured in such a way that an idea could only be registered by a small group of employees. Idea generators who could not convince their colleagues would see their idea being canceled even before it was formally captured in a process or system. For the ideas that were registered, other mechanisms were identified in telecommunication organizations. Practitioners from a German organization explained that ideas, as well as projects, were canceled automatically after nobody had worked on them for 90 days. In a Dutch organization, ideas were only worked on if they were budgeted for that year. Idea generators needed to wait for the next year's budget cycle in case their ideas were not on the budget list. During that waiting time, a lot of ideas lost their initial energy and were never resubmitted in the next budget cycle. Practitioners from organizations in several countries explained the usage of IRR and payback hurdles; not passing these would mean the cancellation of an idea or project.

This study focuses on two specific instruments to cancel ideas and projects. The first is a well-known governance instrument, being a board to approve or reject innovation calls. The focus is on the contribution of these boards to canceling ideas and projects. The second is the usage of machine learning to predict the probability that an idea or project will be canceled. The purpose of this study is to better understand these instruments so organizations can improve the quality of decision making as well as the structure of their innovation funnel. This could improve their Innovation Portfolio Management (IPM) performance, which is important as it is related to firm performance [24]. Researchers can use the output of this study to better understand the contribution these instruments hold in canceling ideas and projects. This brings us to the research question: what contribution can decisions of gatekeeping boards and machine learning deliver in canceling ideas and projects?

On our research we elaborate in the following sections, commencing with a discussion of our theoretical framework including the resulting hypotheses. We then outline our research methodology and present our results. Finally, we present our conclusion as well as the managerial implications, limitations and suggestions for future research.

2 Theoretical Framework

Innovation Portfolio Management [25] is defined as the dynamic decision-making process in which projects are evaluated and selected, and resources are allocated to them [19]. This process can be used to strive for an efficient portfolio [22], meaning that it is impossible to obtain a greater average return (benefits and costs) without incurring a greater standard deviation (risk), or it is impossible to obtain a smaller standard deviation


without giving up return. Alternatively, it can be focused on growing profitability over the long term [18] or on selecting high-value projects in a balanced portfolio reflecting the business strategy, with a good balance between projects and available resources [5, 6]. Regardless of the goals pursued with Innovation Portfolio Management, IPM is the decision-making process to select and cancel projects.

The decision-making process can be organized along gates. A well-known example of this is the Stage-Gate® process, which holds gates and is used to evaluate, select or cancel projects by managing ideas through key steps into launched products [8, 10]. The gates in the Stage-Gate® process are formal decision points that split the project into multiple stages. Cooper [9] presents a Stage-Gate overview with five gates and five stages. Alternative models use different setups with a different number of gates or a different split of work between those gates (e.g. [13, 14]). Before an idea enters this formal product development process it is in the pre-Stage-Gate® phase [7]. This 'front end' of IPM can include work such as technical feasibility, financial viability analysis, business model development and business plan preparation [20].

In this study, we focus on the captured ideas and follow these through the formal gate decisions. Within those, we studied the early decisions up to the decision to technically realize the project and to start the development [2]. With this, we capture the decisions determining the allocation of a large part of the project resources. Moreover, these early IPM phases are important as they determine the structure of the innovation funnel and have even been shown to influence product success, time to market and financial performance [21].

Captured ideas can be canceled by the gatekeeper formally rejecting a requested gate or by the next gate not being requested. In the latter case, the "decision" to cancel the project is not a gate decision but is made in a stage of the gated process. This can be an offline decision by senior management, but the project can also be abandoned and left to bleed to death. The cancellation of projects can be supported by numerous reasons. Examples are not passing a financial threshold [12] and extensive project risks [22]. Other reasons mentioned are that the idea or project cannot be financed, is technically not feasible, does not fit the strategy or the existing product portfolio, is illegal, is immoral, can hurt the company's reputation, is politically not wanted, the initiator's credibility is too low, etc.

Gate approvals are normally organized in review teams [14], boards [11] or with experienced gatekeepers [10]. In contrast, we have seen in several telecommunication organizations IPM structures where early gates were approved by one person. Practitioners have chosen this setup as it assures a clear commitment of one manager as the overall end-responsible manager of the investment, since he or she approved the start of the initiative. Governance that empowers an individual senior manager to approve the first gate and a board to approve the next gate also affects escalation of commitment [11, 30]. Since the board did not commit to the first gate, it can decide on the second gate without any prior commitment, immunizing it against escalation of commitment. Furthermore, practitioners were convinced that turning down an idea by one senior manager has a less negative impact on the idea generator than turning it down in front of a board with multiple senior gatekeepers. And even though an idea was not prioritized, the organizations are nevertheless interested in the upcoming, better proposals of that


idea generator. Finally, practitioners set up this governance as they were convinced that a large part of the requests on the agenda of the board would be approved. They believed this was valid for IPM as well as for all kinds of other board decisions and considered a board an inappropriate instrument to turn down large numbers of investment calls. As one CEO put it, 'the colleague determining the agenda holds more influence on what the organization will be doing than every board member alone and all board members together'. Although no exact figures were known, this CEO claimed that 85% or more of the requested gates on the board agenda were approved.

Several reasons were mentioned for a high board approval rate. In one organization the board members agreed to work as a team and to stop fighting and arguing. This could steer board participants to avoid tough decisions (assuming that approving is easier than canceling). The group dynamic in the gate-approving board could also influence the cancellation rate, as board members do not criticize or reject proposals of board colleagues to avoid future push-back on their own board requests. Finally, it was mentioned that board presentations are prepared in detail and hold a lot of upfront (emotional) investment. Turning down an idea, especially in front of a board with senior managers, would kill this positive energy and could even lead to a culture where innovative ideas are not respected. Board members could also be reluctant to cancel projects as they value innovations. In hypothesis 1 we test the practitioners' thesis that more than 85% of requests are approved by the gatekeeping board:

H1. More than 85% of gate requests on the agenda of a board are approved.

In case more than 85% of the gate requests on the agenda of boards are approved and less than 15% are rejected, the latter is a low rate compared to published cancellation rates. Cooper, Edgett and Kleinschmidt [5] found that average businesses kill 19.0% of projects that entered the development stage prior to launch. As IPM also includes the stages before development, in which ideas and projects can be canceled, the IPM cancellation rate will be higher than 19%. The cancellation rate can be visualized and benchmarked with mortality curves presenting the progressive rejection of projects through the stages of the gated process [13]. Although the structure of the mortality curve improved strongly from 1982 to 2004, approximately 75% of captured ideas never make it to commercialization [2]. The termination rates are even higher in case ideas are captured outside the organization, by customers or suppliers: Mendeley canceled 91.1% (feedback.mendely.com) of the captured ideas, the idea boxes of Ericsson canceled 96.3% [3], whereas IdeaStorm [16] had a cancellation rate of 97.8% and MyStarbucksIdeas canceled 99.8% [15].

Based on the published cancellation rates and the low number of gate requests rejected by boards, we expect that a large part of the cancellations is not done by gatekeepers in their boards. Cancellations outside board meetings would support the suggestion that IPM is a decision-making process made up of more than gate decisions at single points in time [19]. In the second hypothesis, we investigate the contribution of gatekeeping boards to the total number of cancellations:

H2. Decisions of gatekeeping boards deliver a contribution of less than 5% of the total number of canceled initiatives.


Organizations have an interest in the likelihood of cancellation as it can be used to avoid sunk costs and to mitigate the risks that lead to cancellation. If cancellation is a rational, reproducible process, the organization needs all the relevant data points as well as the capacity to process them in order to predict cancellation. Relevant data points can be split into structured and unstructured data. Examples of structured data are a classification of the innovation, e.g. into new product, sales expansion and cost reduction [4], or into new-to-the-world, new-to-the-firm, major revisions and incremental improvements, repositioning and cost reduction [13], as well as the name of the initiator and of the project leader, the time-to-market in calendar days, the NPV and the project budget. Unstructured data are the text documents describing the product, the business case or the weekly report of the project leader.

Thousands of structured data points can be found on each project, and the unstructured data can be dozens of pages of documentation. A lot of project information is scattered over emails, presentations, minutes, spreadsheets and text documents spread over all kinds of (personal) drives. It is also available in databases, on whiteboards or 'in the air' as spoken words. Collecting this data is a task consuming significant resources. Collecting, internalizing and processing project data is, due to its volume, challenging for most human beings (from here on called organizational agents), especially when considering portfolios of hundreds or even thousands of projects. The time component also makes this analysis of projects challenging, as the project data evolve together with the project over time.

As an alternative or complementary instrument to organizational agents processing this data, machine learning could be applied. Machine learning models could be trained to predict the likelihood of cancellation based on the available information. Machine learning can be an interesting IPM instrument in case it can outperform organizational agents' performance, that is, in case it can better predict project cancellation than organizational agents. Given the size and complexity of the data, we expect machine learning to outperform organizational agents in predicting the likelihood of project cancellation:

H3. Machine learning can outperform organizational agents in predicting the likelihood of project cancellation.

The amount, level of detail and quality of information is expected to grow as projects progress along the stages and gates. Furthermore, due to the structure of the mortality curve, the percentage of canceled projects normally reduces as more gates are passed. Based on these two effects, we expect that predictability improves when different machine learning models are used for the different gates:

H4. Machine learning performance improves when different models are used for each of the gates.

3 Methodology

In this study, we applied data from three European telecommunication organizations. These organizations run operations in three different countries and all three use boards as gatekeepers. In two organizations (A and B) the first three gates are approved by boards, whereas the third organization (C) holds a governance where the first gate is approved


by a senior manager and the second and third gates are approved by gatekeeping boards. The board approvals are recorded in minutes and the senior manager approval is captured in an automated workflow. Two organizations (A and C) hold a low threshold to enter ideas into the NPD process, by allowing every employee to enter ideas in a low-threshold, browser-based system. In organization B the ideas are entered by a PMO organization: the idea owner contacts the PMO and requests them to enter the idea. The organizations have included the activities normally defined in the front end of IPM in their stage-gated process. The exact allocation of activities to the early stages differs between the three organizations, but all three hold three gates before the actual development and realization of the idea starts. As the names of the gates differ between the organizations, we normalized them for this study as Gate 1, 2 and 3.

The data we received for H1 and H2 covered different time frames, ranging from August 2008 up to August 2018. In total, we received information on 9,118 canceled projects and 1,469 early gate decisions. Details on the time frames as well as the numbers of decisions are in Table 1.

Table 1. The number of board decisions as well as the total number of cancellations for the three organizations

                                 Organization A                   Organization B         Organization C
Time frame covered               November 2014–September 2016     February 2017–July 2018  August 2008–August 2018
Number of board gate decisions   285                              126                      1,058
Average decisions per month      12.8                             7.0                      8.8
Number of canceled projects      523                              67                       8,528

For H1 we used the number of approvals and cancellations as well as the number of gate requests (the sum of approvals and rejections) on the agenda of the gatekeeping boards. For postponed gate decisions we neglected the delay and counted the outcome of the decision. Some of the cancellations turned out to be temporary; for that reason we checked whether a rejected gate was approved at a later stage, and in those cases we counted both decisions together as one approval since the gate was finally approved. For H1 we calculated the number of cancellations divided by the number of topics on the agenda. For H2 we compared the number of board cancellations to the total number of cancellations during the time frame defined by the received board meeting minutes.

H3 and H4 were tested using data from organization A. In total 2,451 ideas were captured that were canceled or launched in the time frame from November 2014 up to September 2019. Figure 1 presents the mortality curve for this data set. For each of the 2,451 ideas, we received 37 data points. These data points were used to train the machine learning models for H3 and H4. The output of the model (Y) is a


Fig. 1. Mortality curve of the data set for H3 and H4 from organization A (y-axis: number of projects, 0–3,000; x-axis: gates Pre Gate 1, Gate 1, Gate 2 and Gate 3)

Boolean parameter with the value 1 for canceled and 0 for implemented and launched. The remaining data points are features on:

1. The project organization. This includes categorical identifiers for the demand owner who initiated the idea as well as for the project leader. It also includes Booleans on departments involved in the project (e.g. IT, Technology and Digital).
2. Project type. The project type includes a classification into business-to-consumer, business-to-business or business-to-partner products and cost-saving projects, but also an indication of the Net Promoter Score as well as the targeted number of customers affected.
3. Project financials. This includes information on the estimated project cost as well as the net present value and payback period.
4. Time to market. Information on the date the idea was captured as well as on the estimated duration in days between gates 1, 2, 3 and the closing of the project.
5. Project execution. Information on the number of days a project was on hold, how long a project was flagged with a red risk and how often it was flagged with red risks.

The data are used to train a linear model-based learning algorithm using binary classification, and the loss was minimized using stochastic gradient descent. The dataset was shuffled randomly and the first 70% was used to train the model. The trained model was evaluated with the remaining 30%. Four models were created. In the first (Pre Gate 1) model, all 2,451 captured ideas were used; in the second (Gate 1) model, all captured ideas with an approved gate 1 were used. The third (Gate 2) model used all projects with an approved gate 2, and the fourth (Gate 3) model required an approved gate 3. Some data points are only available after certain gates have been approved; e.g. the approvals of gates 1 and 2 as well as any budget approvals are unknown in organization A before gate 1 is approved. In the trained models, only data points that are available at the time of decision making were included.
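A minimal sketch of such a linear binary classifier trained with stochastic gradient descent is shown below; the feature encoding, learning rate and epoch count are illustrative assumptions, not the study's actual pipeline:

```python
import math
import random

def train_sgd_logistic(X, y, lr=0.1, epochs=300, seed=0):
    """Linear binary classifier trained with stochastic gradient descent
    on the logistic (cross-entropy) loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)                    # visit samples in random order
        for i in idx:
            z = b + sum(wj * xj for wj, xj in zip(w, X[i]))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted P(canceled)
            g = p - y[i]                    # gradient of the log loss
            w = [wj - lr * g * xj for wj, xj in zip(w, X[i])]
            b -= lr * g
    return w, b

def predict_proba(w, b, x):
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

# Toy example with one hypothetical feature (e.g. days flagged with a red
# risk); projects flagged longer are canceled (label 1).
X = [[0.0], [1.0], [2.0], [3.0], [7.0], [8.0], [9.0], [10.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_sgd_logistic(X, y)
```

Thresholding the predicted probability (e.g. at 0.5 or 0.33) then yields the binary cancel/continue prediction evaluated in the hypotheses.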


To test H3 we compared the trained machine learning models to organizational agents' performance. Based on the data set, the following results of the machine learning models are known:

• true positive (TP): the model predicted a cancellation and the project was canceled,
• false positive (FP): the model predicted a cancellation and the project was launched and implemented,
• true negative (TN): the model predicted a launch and implementation and the project was launched and implemented, and
• false negative (FN): the model predicted a launch and implementation and the project was canceled.

The dataset includes ideas and projects that were captured in organization A. Based on inputs from practitioners, we assume that an idea could only be captured if at least one colleague (the idea-capturing colleague) expected it to be a success. Against this expectation, false and true negatives can be measured as a cancellation or as a launched and implemented project. Due to this baseline, which only includes expected launches, both the true and false positives of the organizational agents' ideas are unknown. Metrics like recall, accuracy, precision and false positive rate cannot be calculated for organizational agents as they all require false or true positives. The percentage of false negatives of the total data set can be calculated for both machine learning and organizational agents' performance. This false negative percentage is defined as:

false negative percentage = FN / (FN + TN)   (1)

The thresholds used in the machine learning models can be used to optimize the false negative rate. For this reason, we test H3 with two thresholds: 0.5 and 0.33. The first was chosen as the de facto default in machine learning models, the second since it optimizes significantly towards our focus area for H3: the false negative percentage. H4 was tested using new machine learning models trained for pre gate 1 and gate 3 from hypothesis 3. As stated earlier, 30% of the total dataset was used to evaluate the pre gate 1 model. Of this 30%, the projects without an approval for gate 3 were filtered out, leaving the projects from the evaluation dataset with a gate 3 approval. It is likely that the training data set used to create the gate 3 model included projects that were used to evaluate the pre gate 1 model. Evaluating a model on the same data it was trained on is considered bad research practice, as it may give unreliable results due to the risk of overfitting. Therefore, a new gate 3 model was trained on the total dataset for gate 3, excluding the gate 3 approved projects in the evaluation set of the pre gate 1 model. The gate 3 model holds additional features not included in the pre gate 1 model. These additional features are excluded when using the pre gate 1 model but included when evaluating the gate 3 model. This approach allows testing both models on the same projects while these projects have not been used for training either model. As described e.g. by Powers, recall, accuracy, precision and false positive rate (FPR) are measures to evaluate machine learning models [26]. These measures are defined as follows:

Recall = TP / (TP + FN)  (2)

Improving Gate Decision Making Rationality with Machine Learning

Precision = TP / (TP + FP)  (3)

Accuracy = (TN + TP) / (TN + TP + FN + FP)  (4)

FPR = FP / (FP + TN)  (5)
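These four measures can be sketched in Python from the confusion counts (the counts below are hypothetical, for illustration only):

```python
def metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Recall, precision, accuracy and false positive rate, Eqs. (2)-(5)."""
    return {
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "accuracy": (tn + tp) / (tn + tp + fn + fp),
        "fpr": fp / (fp + tn),
    }

# Hypothetical confusion counts for illustration only.
print(metrics(tp=40, fp=10, tn=140, fn=10))
# {'recall': 0.8, 'precision': 0.8, 'accuracy': 0.9, 'fpr': 0.06666666666666667}
```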

By comparing these four measures as well as the AUC of both models, H4 can be tested.

4 Results

The cancellation rates of the gatekeeping boards are shown in Table 2. Of the 1,469 captured gate decisions over the three organizations, 1,430 (= 97.3%) were approved. Organization A holds, with 95.4%, the lowest approval rate. In organization B 100% of the requested gates were approved; this can be explained by the way ideas are captured, as in organization B only a select group of senior managers is empowered to capture and register an idea. Overall, in the three organizations in this study, the percentage of approved board requests is clearly above 85%. H1 was tested with a binomial test with N = 1,469 and p = 0.85; the cumulative probability of getting up to 1,430 approvals equals 1.00, and based on this H1 is accepted.

Table 2. The percentage of approvals and cancellations by gatekeeping boards

           Organization A  Organization B  Organization C  Total
Approved   95.4%           100.0%          97.5%           97.3%
Canceled   4.6%            0.0%            2.5%            2.7%
Total      100.0%          100.0%          100.0%          100.0%
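The binomial tests for H1 and H2 can be reproduced with the Python standard library; the binomial CDF below is summed in log space to stay numerically stable at these sample sizes (a sketch, using the counts reported in the text):

```python
from math import exp, lgamma, log

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed in log space."""
    total = 0.0
    for i in range(k + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log(1 - p))
        total += exp(log_pmf)
    return min(total, 1.0)

# H1: cumulative probability of up to 1,430 approvals out of the 1,469
# captured gate decisions with p = 0.85 -- effectively 1.00.
print(round(binom_cdf(1430, 1469, 0.85), 4))  # 1.0
# H2: 1.00 minus the probability of 39 or fewer board cancellations out of
# 9,118 with p = 0.05 -- also effectively 1.00.
print(round(1.0 - binom_cdf(39, 9118, 0.05), 4))  # 1.0
```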

Table 3 shows the percentage of gates canceled in the gatekeeping board (in board) as well as the percentage of cancellations outside the board (not in board). On average over all three organizations, 99.6% of the cancellations are done outside the gatekeeping board. The gatekeeping board of organization A accounts for 2.5% of the total cancellations in that organization, and since no cancellations were made by the gatekeeping board of organization B, all of its cancellations were done outside the board. With 0.4% of all cancellations made in gatekeeping boards, the data seems to support H2. H2 was also tested with a binomial test with N = 9,118 and p = 0.05; 1.00 minus the probability of getting 39 or fewer cancellations is also 1.00, allowing us to accept H2. The results of the tests of the four machine learning models are presented in Table 4. The models show an area under the curve (AUC) of 0.802 or higher. An AUC of 0.5 is equal to guessing at random, and the closer the figure is to 1 the better the prediction. The AUC for all four models is classified as "very good" or better.

M. van der Pas and N. van der Pas

Table 3. The percentage of cancellations in a board as well as outside the board

              Organization A  Organization B  Organization C  Total
In board      2.5%            0.0%            0.3%            0.4%
Not in board  97.5%           100.0%          99.7%           99.6%
Total         100.0%          100.0%          100.0%          100.0%

Table 4. The area under the curve of the four machine learning models

            Implemented and launched projects (y = 0)  Canceled (y = 1)  Total  AUC
Pre Gate 1  1,004                                      1,447             2,451  0.942
Gate 1      960                                        308               1,268  0.871
Gate 2      734                                        118               852    0.859
Gate 3      727                                        105               832    0.802
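An AUC like those in Table 4 can be read as the probability that a canceled project is scored above a launched one; a small pairwise (Mann–Whitney) sketch, with made-up scores rather than the study's data:

```python
def auc(scores_y0, scores_y1):
    """AUC: probability that a canceled project (y = 1) gets a higher
    predicted score than a launched one (y = 0); ties count one half."""
    wins = 0.0
    for s1 in scores_y1:
        for s0 in scores_y0:
            if s1 > s0:
                wins += 1.0
            elif s1 == s0:
                wins += 0.5
    return wins / (len(scores_y0) * len(scores_y1))

# Made-up model scores for illustration only.
launched = [0.1, 0.2, 0.3, 0.4]   # y = 0
canceled = [0.35, 0.6, 0.8]       # y = 1
print(auc(launched, canceled))    # 0.9166666666666666 (11 of 12 pairs)
```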

Table 5. False negative percentage of organizational agents and the machine learning models with thresholds of 0.5 and 0.33

            Organizational agents  Machine learning threshold 0.5  Machine learning threshold 0.33
Pre Gate 1  0.590                  0.103                           0.033
Gate 1      0.243                  0.142                           0.110
Gate 2      0.138                  0.073                           0.051
Gate 3      0.126                  0.082                           0.081

The false negative percentages for both the current decisions of organizational agents and the machine learning models are presented in Table 5. Machine learning outperforms organizational agents on false negative percentage by a factor of 1.5 to 5.7 for threshold 0.5 and 1.5 to 17.9 for threshold 0.33. Based on this, H3 is accepted. To test H4 we used the model created for pre gate 1 and evaluated it with data from gate 3. Since the models are now evaluated with different test data compared to the models created for H3, the AUC might differ from H3. The AUC as well as the four metrics (recall, accuracy, precision, and false positive rate) are presented in Table 6. For the metrics, the threshold was set to 0.5. The gate 3 model returns a better AUC as well as better precision and a lower false positive rate. The recall of the pre gate 1 model is better. Based on the strong reduction of the recall, we cannot accept H4 for this dataset.


Table 6. AUC and metrics of the Pre Gate 1 and Gate 3 machine learning models

            AUC    Recall  Accuracy  Precision  False positive rate
Pre Gate 1  0.800  0.581   0.884     0.529      0.073
Gate 3      0.824  0.161   0.884     0.625      0.014
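The role of the decision threshold in these results can be illustrated with a toy example: lowering it from 0.5 to 0.33 converts borderline predictions into "cancel", trading false negatives for false positives (all probabilities below are invented for illustration):

```python
def confusion(probs, labels, threshold):
    """TP, FP, TN, FN when predicting cancellation (y = 1)
    for probabilities at or above the threshold."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        pred = 1 if p >= threshold else 0
        if pred and y:
            tp += 1
        elif pred and not y:
            fp += 1
        elif not pred and not y:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

probs = [0.9, 0.7, 0.45, 0.4, 0.2, 0.1]   # invented cancellation probabilities
labels = [1, 1, 1, 0, 0, 0]               # 1 = actually canceled
print(confusion(probs, labels, 0.5))   # (2, 0, 3, 1): one cancellation missed
print(confusion(probs, labels, 0.33))  # (3, 1, 2, 0): caught, one extra FP
```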

5 Conclusion

H1, more than 85% of gate requests on the agenda of a board are approved, has been accepted based on a binomial test with N = 1,469 and p = 0.85. The cumulative probability of getting up to 1,430 approvals equals 1.00, which is sufficient to accept the hypothesis. H2, decisions of gatekeeping boards deliver a contribution of less than 5% of the total number of canceled initiatives, has also been accepted based on a binomial test with N = 9,118 and p = 0.05: 1.00 minus the probability of getting 39 or fewer cancellations is also 1.00. Based on this, H2 was accepted. H3, machine learning can outperform organizational agents in predicting the likelihood of project cancellation, has been accepted based on the data in Table 5. Machine learning outperforms organizational agents on false negative percentage by a factor of 1.5 to 5.7 for threshold 0.5 and 1.5 to 17.9 for threshold 0.33. H4, machine learning performance improves when different models are used for each of the gates, has been rejected based on the data in Table 6. The rejection was mainly based on the strong reduction of the recall between the models.

6 Discussion and Managerial Implications

According to H1, gatekeeping boards approve most of the requests on their agenda. For practitioners, this creates more insight into the limitations of gatekeeping boards, as the decisions of these boards will only help in a modest way in canceling projects. Furthermore, accepting H2 makes clear that these boards are, in the studied organizations, not in use as a main instrument to cancel initiatives; canceling projects in IPM is done outside the gates. The conclusion that gatekeeping boards are not a main cancellation instrument does not mean that these boards hold a limited contribution to the cancellation of projects. Practitioners could use the agenda of the board as an important cancellation instrument by blocking projects from the agenda in the preparation process for the boards. Projects blocked from the agenda are cancelled outside the board without the need for a cancellation by the gatekeeping board. A gatekeeping board can also support cancellation by defining clear thresholds: a project could be canceled outside the board once it becomes clear that it cannot pass a board-defined threshold of e.g. an IRR > 20%. With AUCs that can be tuned above 0.800, the generated machine learning models hold an interesting capability in predicting the likelihood of project cancellation. Machine learning could provide additional information that the organizational agents


can consider. For example, gatekeepers could be informed of the probability of cancellation, so each gate request would be accompanied by a figure between 0 and 1. Furthermore, they could be shown previous gate requests that received a similar valuation from the machine learning models. This enables gatekeepers to compare probabilities of cancellation. Project leaders could run scenario analyses by changing features and studying the effects of those changes on the predictions. Machine learning could become a new category of instruments supporting IPM, complementary to the known instruments defined by e.g. Mauerhoefer, Strese and Brettel [23]. Currently, the performance of machine learning models and organizational agents is compared based on the false negative percentage. A high false negative percentage is an indication of sunk costs, and reducing it could reduce sunk costs. To that end, the machine learning models can be used to support gatekeeper decision making.

7 Limitations and Future Research

The hypotheses are only tested on telecommunication companies operating in Europe. Innovations in this industry are almost completely executed using IT; no physical production processes nor any distribution optimizations were in scope. This limits the possibility of applying the findings to the manufacturing, logistics and services industries. The findings can probably be transferred more easily to the finance, IT or insurance industries. If the machine learning findings are used to reduce the cancellation rate as studied in H3 and the counteractions are successful, then this could reduce the accuracy of the machine learning models. In future research, these countermeasures should be used to train the machine learning models. Future research could also focus on the self-fulfilling prophecy that occurs when organizations decide to cancel projects because machine learning models predict the cancellation. Adding additional insights into the decision process based on machine learning does not guarantee an improvement of the outcome of that process. Projects are carried on even when there are clear indications that the project needs to be canceled [27]. Ajzen developed the theory of planned behavior [1], which can be used to explain organizational agents' behavior [31]. Behavior is, amongst others, influenced by the subjective norm, being the inputs received from other persons. This influence can be, but is not per definition, aligned with rationality. Kahneman has done extensive research on the limitations of humans in acting in an optimized way [17]. Nevertheless, the usage of a conscious rational thinking style outweighs the disadvantages of the limitations of rationality of organizational agents [11]. The machine learning models were only trained with project metadata such as the budget, NPV, and time-to-market. The actual content of the project (what product the project is creating) was not an input parameter.
In future research, this could be added to the features to improve the models. A further limitation of this study is that false kills are left out of scope. False kills are ideas or projects that are canceled even though they would have been a success. Both gatekeepers and machine learning models are prone to false kills, but the data provided did not allow us to study this effect. A future study could examine the canceled projects and focus on the false kills.

Finally, we only studied the cancellation process. In future research, machine learning models could be tested to predict budget overruns, late delivery and even the percentage of expected value delivered.

Acknowledgements. We would like to thank Stijn van Rozendaal for jointly setting up the first machine learning models, as well as Ruben van der Linden and Dan Heinen for optimizing the models. Finally, we thank Dominik Mahr for his inputs and thoughts on this paper.

References

1. Ajzen, I.: The theory of planned behavior. Organ. Behav. Hum. Decis. Process. 50(2), 179–211 (1991)
2. Barczak, G., Griffin, A., Kahn, K.B.: Perspective: trends and drivers of success in NPD practices: results of the 2003 PDMA best practices study. J. Prod. Innov. Manage. 26(1), 3–23 (2009)
3. Björk, J., Karlsson, M.P., Magnusson, M.: Turning ideas into innovations - introducing demand-driven collaborative ideation. Int. J. Innov. Reg. Dev. 5(4–5), 429–442 (2014)
4. Bower, J.L.: Managing the Resource Allocation Process, revised edn. Harvard Business Review Press, Boston (1986)
5. Cooper, R.G., Edgett, S.J., Kleinschmidt, E.J.: Benchmarking best NPD practices I. Res. Technol. Manage. 47(1), 31–43 (2004)
6. Cooper, R.G., Edgett, S.J., Kleinschmidt, E.J.: Benchmarking best NPD practices II. Res. Technol. Manage. 47(3), 50–59 (2004)
7. Cooper, R.G.: Fixing the fuzzy front end of the new product process: building the business case. CMA Mag. 71(8), 21–23 (1997)
8. Cooper, R.G.: Perspective: the stage-gate® idea-to-launch process-update, what's new, and NexGen systems. J. Prod. Innov. Manage. 25(3), 213–232 (2008)
9. Cooper, R.G.: Stage-gate systems: a new tool for managing new products. Bus. Horiz. 33(3), 44–54 (1990)
10. Cooper, R.G.: The invisible success factors in product innovation. J. Prod. Innov. Manage. 16(2), 115–133 (1999)
11. Eliens, R., Eling, K., Gelper, S., Langerak, F.: Rational versus intuitive gatekeeping: escalation of commitment in the front end of NPD. J. Prod. Innov. Manage. 35(6), 890–907 (2018)
12. Figueiredo, P.S., Loiola, E.: Enhancing new product development (NPD) portfolio performance by shaping the development funnel. J. Technol. Manage. Innov. 7(4), 20–35 (2012)
13. Griffin, A.: PDMA research on new product development practice: updating trends and benchmarking best practices. J. Prod. Innov. Manage. 14(6), 429–458 (1997)
14. Grönlund, J., Rönnberg-Sjödin, D., Frishammar, J.: Open innovation and the stage-gate process: a revised model for new product development. Calif. Manage. Rev. 52(3), 106–131 (2010)
15. Hossain, M., Islam, K.M.Z.: Generating ideas on online platforms: a case study of "My Starbucks Idea". Arab Econ. Bus. J. 10(2), 102–111 (2015)
16. IdeaStorm Homepage. http://www.ideastorm.com. Accessed 06 Aug 2018
17. Kahneman, D.: Thinking, Fast and Slow. Penguin Books Ltd, London (2011)
18. Kester, L., Griffin, A., Hultink, E.J., Lauche, K.: Exploring portfolio decision-making processes. J. Prod. Innov. Manage. 28(5), 641–661 (2011)
19. Kock, A., Heising, W., Gemünden, H.G.: Antecedents to decision-making quality and agility in innovation portfolio management. J. Prod. Innov. Manage. 33(6), 670–686 (2016)
20. Markham, S.K., Ward, S.J., Aiman-Smith, L., Kingon, A.I.: The valley of death as context for role theory in product innovation. J. Prod. Innov. Manage. 27(3), 402–417 (2010)
21. Markham, S.K.: The impact of front-end innovation activities on product performance. J. Prod. Innov. Manage. 30(S1), 77–92 (2013)
22. Markowitz, H.M.: Portfolio Selection: Efficient Diversification of Investment. BookCrafters, Michigan (1959)
23. Mauerhoefer, T., Strese, S., Brettel, M.: The impact of information technology on new product development performance. J. Prod. Innov. Manage. 34(6), 719–738 (2017)
24. McNally, R.C., Durmuşoğlu, S.S., Calantone, R.J.: New product portfolio management decisions: antecedents and consequences. J. Prod. Innov. Manage. 30(2), 245–261 (2012)
25. Meifort, A.: Innovation portfolio management: a synthesis and research agenda. Creativity Innov. Manage. 25(2), 251–269 (2016)
26. Powers, D.M.: Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. J. Mach. Learn. Technol. 2, 37–63 (2011)
27. Schmidt, J.B., Calantone, R.J.: Escalation of commitment during new product development. J. Acad. Mark. Sci. 30(2), 103–118 (2002)
28. Unger, B.N., Kock, A., Gemünden, H.G., Jonas, D.: Enforcing strategic fit of project portfolios by project termination: an empirical study on senior management involvement. Int. J. Proj. Manage. 30(6), 675–685 (2012)
29. Van der Pas, M., Furneaux, B.: Improving the predictability of IT investment business value. In: 2015 ECIS Proceedings, paper 190. AIS Electronic Library, Münster (2015)
30. Van der Pas, M., Van der Pas, N.: Escalation of commitment in NPD and cost saving IT projects. In: Nunes, M.B., Isaías, P., Powell, P., Ravesteijn, P., Ongena, G. (eds.) 12th IADIS International Conference Information Systems 2019, Utrecht, Netherlands, pp. 276–280 (2019)
31. Van der Pas, M., Walczuch, R.: Behaviour of organisational agents to improve information management impact. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) Advances in Intelligent Systems and Computing. Paper presented at the Science and Information Conference: Intelligent Computing, London, United Kingdom, pp. 774–788. Springer, Cham (2018)

End-to-End Memory Networks: A Survey

Raheleh Jafari(1), Sina Razvarz(2), and Alexander Gegov(3)

(1) School of Design, University of Leeds, Leeds LS2 9JT, UK, [email protected]
(2) Departamento de Control Automático, CINVESTAV-IPN (National Polytechnic Institute), Mexico City, Mexico, [email protected]
(3) School of Computing, University of Portsmouth, Buckingham Building, Portsmouth PO1 3HE, UK, [email protected]

Abstract. Constructing a dialog system which can speak naturally with a human is considered a major challenge of artificial intelligence. The end-to-end dialog system is a primary research topic in the area of conversational systems. Since an end-to-end dialog system is structured around learning a dialog policy from transactional dialogs to a defined extent, useful datasets are required for evaluating the learning procedures. In this paper, different deep learning techniques are applied to the Dialog bAbI datasets, and the performance of the proposed techniques on this dataset is analyzed. The performance results demonstrate that all the proposed techniques attain decent precisions on the Dialog bAbI datasets. The best performance is obtained utilizing the end-to-end memory network with a unified weight tying scheme (UN2N).

Keywords: Memory networks · Deep learning · Dialog bAbI dataset

© Springer Nature Switzerland AG 2020. K. Arai et al. (Eds.): SAI 2020, AISC 1229, pp. 291–300, 2020. https://doi.org/10.1007/978-3-030-52246-9_20

1 Introduction

Instructing machines that can converse like a human for real-world objectives is possibly one of the crucial challenges in artificial intelligence. In order to construct a meaningful conversation with a human, the dialog system is required to be qualified in the perception of natural language, constructing intelligent decisions as well as producing proper replies [1–3]. Dialog systems, recognized as interactive conversational agents, communicate with humans through natural language in order to aid, supply information and amuse. They are utilized in an extensive domain of applications, from technical support services to language learning tools [4,5]. Artificial intelligence techniques are viewed as the most efficient techniques in recent decades [6–17]. For example, fuzzy logic systems are broadly utilized to model systems characterized by vague and unreliable information [18–37]. In the artificial intelligence area [38,39], end-to-end dialog systems have attained interest because of the current progress of deep neural networks. In [40] a gated


end-to-end trainable memory network is proposed which learns in an end-to-end manner without the utilization of any extra supervision signal. In [41] the original task is broken down into short tasks that are individually learned by the agent and then composed in order to perform the original task. In [42] a long short-term memory (LSTM) model is suggested which learns to interact with APIs on behalf of the user. In [43] a dynamic memory network is introduced which contains tasks for part-of-speech classification as well as question answering, and uses two gated recurrent units in order to carry out inference. In [44] a memory network has been implemented which needed supervision in every layer of the network. In [45] a set of four tasks to test the capability of end-to-end dialog systems has been introduced which focuses on the domain of movie entities. In [46] a word-based method for dialog state tracking utilizing recurrent neural networks (RNNs) is proposed which needs less feature engineering. Even though neural network models include a tiny learning pipeline, they need a remarkable amount of training data. Gated recurrent unit (GRU) and LSTM units permit RNNs to deal with the longer texts needed for question answering. Additional advancements such as attention mechanisms, as well as memory networks, permit the network to focus on the most relevant facts. In this paper, the applications of different types of memory networks are studied on data from the Dialog bAbI dataset. The performance results demonstrate that all the proposed techniques attain decent precisions on the Dialog bAbI datasets. The best performance is obtained utilizing UN2N. The remainder of the article is organized as follows. In Sect. 2, different types of memory networks are demonstrated and explained in detail. Experimental results are given in Sect. 3. Section 4 concludes the work.

2 Memory Networks

2.1 End-to-End Memory Network with Single Hop

The end-to-end memory network (N2N) with single hop has two story embeddings A and C, as well as a question embedding B (see Fig. 1). Matrix dot products are utilized in order to match each word in the story with each word in the question, which creates the attention. By passing the attention through a softmax layer, it is changed into a probability distribution across all the words of the story. Afterward, these probabilities are applied to the story embedding C, and the sum of that with the question embedding B passes through a dense layer and the softmax prediction layer.

2.2 End-to-End Memory Network with Stacked Hops

Fig. 1. End-to-end memory network with single hop

The N2N architecture contains two major components: supporting memories and final answer prediction [47]. Supporting memories consist of a set of input and output memories represented by memory cells. In complicated tasks that require multiple supporting memories, the model can be developed to contain more than one set of input-output memories by stacking a number of memory layers. Each memory layer in the model is called a hop, and the input of the (κ + 1)th hop is the output of the κth hop:

$\tilde{u}^{\kappa+1} = \tilde{o}^{\kappa} + \tilde{u}^{\kappa}$  (1)

Each layer contains its own embedding matrices $A^{\kappa}$, $C^{\kappa}$, utilized in order to embed the inputs $\tilde{x}_i$. The prediction of the answer to the question $\tilde{q}$ is carried out by

$\tilde{a} = \mathrm{softmax}(W(\tilde{o}^{\kappa} + \tilde{u}^{\kappa}))$  (2)
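A compact numpy sketch of the single-hop attention step and the stacked-hop recurrence in (1)–(2); the dimensions are arbitrary, the embeddings are random stand-ins for the learned matrices A, B, C, W, and layer-wise weight tying (one shared A and C across hops) is used for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)
d, V, n_words, n_sents, hops = 20, 50, 6, 4, 3  # arbitrary toy sizes

# Random stand-ins for the learned embedding matrices.
A = rng.normal(size=(V, d))   # input memory embedding
B = rng.normal(size=(V, d))   # question embedding
C = rng.normal(size=(V, d))   # output memory embedding
W = rng.normal(size=(V, d))   # final prediction matrix (V x d)

story = rng.integers(0, V, size=(n_sents, n_words))  # word ids per sentence
question = rng.integers(0, V, size=n_words)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

m = A[story].sum(axis=1)      # one memory vector per story sentence
c = C[story].sum(axis=1)      # output memory vectors
u = B[question].sum(axis=0)   # controller state from the question

for _ in range(hops):
    p = softmax(m @ u)        # attention over the story sentences
    o = p @ c                 # weighted read-out of the output memory
    u = o + u                 # Eq. (1): input of the next hop

a = softmax(W @ u)            # Eq. (2): distribution over the vocabulary
print(a.shape, round(float(a.sum()), 6))  # (50,) 1.0
```

In a trained model the attention would concentrate on the supporting sentences; here it merely demonstrates the shapes and the data flow.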

where $\tilde{a}$ is taken to be the predicted answer distribution, $W$ (of size $V \times d$) is considered to be a parameter matrix for the model to learn, and $\kappa$ is the total number of hops. The N2N architecture with three hop operations is shown in Fig. 2. The hard max operations within each layer are substituted with a continuous weighting from the softmax. The method takes a discrete set of inputs $\tilde{x}_1, ..., \tilde{x}_n$, which are stored in the memory, and a question $\tilde{q}$, and outputs a reply $\tilde{a}$. The model can write all $\tilde{x}$ to the memory up to a fixed buffer size, and it obtains a continuous representation for $\tilde{x}$ and $\tilde{q}$. Afterward, the continuous representation is processed with multiple hops in order to generate $\tilde{a}$. This permits backpropagation of the error signal through multiple memory accesses back to the input during training.

2.3 Gated End-to-End Memory Network

The gated end-to-end memory network (GN2N) is able to dynamically condition the memory reading operation on the controller state $\tilde{u}^{\kappa}$ at every hop, see Fig. 3. In GN2N, (1) is reformulated as below [48]:

$T^{\kappa}(\tilde{u}^{\kappa}) = \sigma(W_{T}^{\kappa}\tilde{u}^{\kappa} + b_{T}^{\kappa})$  (3)


Fig. 2. A three layer end-to-end memory network

$\tilde{u}^{\kappa+1} = \tilde{o}^{\kappa} \odot T^{\kappa}(\tilde{u}^{\kappa}) + \tilde{u}^{\kappa} \odot (1 - T^{\kappa}(\tilde{u}^{\kappa}))$  (4)

where $W_{T}^{\kappa}$ and $b_{T}^{\kappa}$ are taken to be the hop-specific parameter matrix and bias term for the κth hop, respectively, $T^{\kappa}(\tilde{x})$ is the transform gate for the κth hop, and $\odot$ is the Hadamard product.
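The gated update in (3)–(4) is a learned interpolation between the memory read-out and the previous controller state; a numpy sketch with random stand-in weights (in GN2N, W_T and b_T would be learned per hop):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

d = 20
rng = np.random.default_rng(1)
W_T = rng.normal(size=(d, d))   # stand-in for the learned gate weights
b_T = rng.normal(size=d)        # stand-in for the learned gate bias

u = rng.normal(size=d)          # controller state u^k
o = rng.normal(size=d)          # memory read-out o^k

T = sigmoid(W_T @ u + b_T)      # Eq. (3): transform gate, values in (0, 1)
u_next = o * T + u * (1.0 - T)  # Eq. (4): Hadamard-gated update

# Each component of u_next lies between the corresponding components of o
# and u: the gate decides per dimension how much of the memory to admit.
print(u_next.shape)  # (20,)
```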

Fig. 3. Gated end-to-end memory network

2.4 End-to-End Memory Networks with Unified Weight Tying

In [47], two kinds of weight tying are proposed for N2N, namely adjacent and layer-wise. The layer-wise approach shares the input and output embedding matrices across the various hops (i.e., $A^{1} = A^{2} = ... = A^{\kappa}$ and $C^{1} = C^{2} = ... = C^{\kappa}$). The adjacent approach shares the output embedding for a given layer with the corresponding input embedding (i.e., $A^{\kappa+1} = C^{\kappa}$). Furthermore, the matrix $W$ which predicts the answer, as well as the question embedding matrix $B$, are developed as $W^{T} = C^{\kappa}$ and $B = A^{1}$. In [48], a dynamic mechanism is designed which permits the model to choose the proper kind of weight tying on the basis of the input. Therefore, the embedding matrices are developed dynamically for every instance, which makes UN2N more efficient compared with N2N and GN2N, where the same embedding matrices are implemented for each input. In UN2N a gating vector $\tilde{z}$, described in (8), is used in order to develop the embedding matrices $A^{\kappa}$, $C^{\kappa}$, $B$, and $W$. The embedding matrices are influenced by the information transported by $\tilde{z}$ related to the input question $\tilde{u}^{0}$ and the context sentences in the story $\tilde{m}_{t}$. Therefore,

$A^{\kappa+1} = A^{\kappa} \odot \tilde{z} + C^{\kappa} \odot (1 - \tilde{z})$  (5)

$C^{\kappa+1} = C^{\kappa} \odot \tilde{z} + \hat{C}^{\kappa+1} \odot (1 - \tilde{z})$  (6)

where $\odot$ is taken to be the column element-wise multiplication operation and $\hat{C}^{\kappa+1}$ is the unconstrained embedding matrix. In (5) and (6), a large value of $\tilde{z}$ leads UN2N towards the layer-wise approach and a small value of $\tilde{z}$ leads UN2N towards the adjacent approach. In UN2N, at first, the story is encoded by reading the memory one step at a time with a gated recurrent unit (GRU) as below:

$\tilde{h}_{t+1} = GRU(\tilde{m}_{t}, \tilde{h}_{t})$  (7)

such that $t$ is the recurrent time step and $\tilde{m}_{t}$ is the context sentence in the story at time $t$. Afterward, the following relation is defined:

$\tilde{z} = \sigma(W_{\tilde{z}}[\tilde{u}^{0}; \tilde{h}_{T}] + \tilde{b}_{\tilde{z}})$  (8)

where $\tilde{h}_{T}$ is the last hidden state of the GRU which represents the story, $W_{\tilde{z}}$ is a weight matrix, $\tilde{b}_{\tilde{z}}$ is a bias term, $\sigma$ is the sigmoid function, and $[\tilde{u}^{0}; \tilde{h}_{T}]$ is the concatenation of $\tilde{u}^{0}$ and $\tilde{h}_{T}$. A linear mapping $G \in \mathbb{R}^{d \times d}$ is also added for updating the connection between memory hops as below:

$\tilde{u}^{\kappa+1} = \tilde{o}^{\kappa} + (G \odot (1 - \tilde{z}))\tilde{u}^{\kappa}$  (9)
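Equations (5) and (8) can be sketched as follows; the GRU story encoding of (7) is replaced here by a fixed random hidden state, and every matrix is a random stand-in for a learned parameter:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

d, V = 20, 50
rng = np.random.default_rng(2)

u0 = rng.normal(size=d)        # question representation u^0
h_T = rng.normal(size=d)       # stand-in for the last GRU state over the story

W_z = rng.normal(size=(d, 2 * d))  # stand-in for the learned gate weights
b_z = rng.normal(size=d)           # stand-in for the learned gate bias

z = sigmoid(W_z @ np.concatenate([u0, h_T]) + b_z)  # Eq. (8), values in (0, 1)

A_k = rng.normal(size=(V, d))  # input embedding of hop k
C_k = rng.normal(size=(V, d))  # output embedding of hop k

# Eq. (5): column element-wise interpolation; z -> 1 reuses A (layer-wise
# tying), z -> 0 sets A^{k+1} = C^k (adjacent tying).
A_next = A_k * z + C_k * (1.0 - z)   # z broadcasts over the columns
print(A_next.shape)  # (50, 20)
```

Because the gate is computed per instance, each question-story pair ends up with its own interpolated embeddings, which is the mechanism that distinguishes UN2N from the fixed tying schemes.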

3 Experiments and Results

3.1 Experiment Setup

In this section, an extensive range of parameter settings along with dataset configurations is utilized in order to validate the techniques proposed in this paper.

3.2 Task Explanations

The tasks in the dataset are divided into five groups, where each group focuses on a specific objective.

Task 1: Issuing API calls. The chatbot asks questions in order to fill the missing fields and finally produces a valid corresponding API call. The questions asked by the bot collect the information needed to make the prediction possible.

Task 2: Updating API calls. In this part users update their requests. The chatbot asks users whether they have finished their updates, then generates the updated API call.

Task 3: Demonstrating options. The chatbot provides options to users utilizing the corresponding API call.

Task 4: Generating additional information. Users can ask for the phone number and address, and the bot should use the knowledge base facts correctly in order to reply.

Task 5: Organizing entire dialogs. Tasks 1–4 are combined in order to generate entire dialogs.

For evaluating the capability of the techniques to deal with out-of-vocabulary (OOV) items, a set of test data is used which contains entities different from the training set. Task 6 is the Dialog State Tracking 2 task (DSTC-2) [49] with real dialogs, and only has one setup.

3.3 Experimental Results

Efficiency results on the Dialog bAbI tasks are demonstrated in Table 1 for seven techniques which are among the most important ones, namely rule-based systems, TF-IDF match, nearest neighbor, supervised embedding, N2N, GN2N, and UN2N. As shown in Table 1, the rule-based system has a high performance on tasks 1–5. However, its performance drops when dealing with the DSTC-2 task. TF-IDF match has poor performance compared with the other methods on both the simulated tasks 1–5 and the real data of task 6. The performance of the TF-IDF match with match type features increases considerably but is still behind the nearest neighbor technique. Supervised embedding has higher performance compared with TF-IDF match and the nearest neighbor technique. In task 1, supervised embedding is fully successful, but its performance drops in tasks 2–5, even with match type features. The GN2N and UN2N models outperform the other methods on the DSTC-2 task and the Dialog bAbI tasks, respectively.


Table 1. The accuracy results of rule-based systems, TF-IDF, nearest neighbor, supervised embedding, N2N, GN2N, and UN2N methods

4 Conclusion

The end-to-end learning scheme is suitable for constructing dialog systems because of its simplicity in training as well as its effectiveness in model updating. In this paper, the applications of various memory networks are studied on data from the Dialog bAbI dataset. The performance results demonstrate that all the proposed techniques attain decent precisions on the Dialog bAbI datasets. The best performance is obtained utilizing UN2N. In order to evaluate the true performance of the proposed methods, extra experimentation is required utilizing wide non-synthetic datasets.

References

1. Araujo, T.: Living up to the chatbot hype: the influence of anthropomorphic design cues and communicative agency framing on conversational agent and company perceptions. Comput. Hum. Behav. 85, 183–189 (2018)
2. Hill, J., Ford, W.R., Farreras, I.G.: Real conversations with artificial intelligence: a comparison between human-human online conversations and human-chatbot conversations. Comput. Hum. Behav. 49, 245–250 (2015)
3. Quarteroni, S.: A chatbot-based interactive question answering system. In: 11th Workshop on the Semantics and Pragmatics of Dialogue, pp. 83–90 (2007)
4. Young, S., Gasic, M., Thomson, B., Williams, J.D.: POMDP-based statistical spoken dialog systems: a review. Proc. IEEE 101, 1160–1179 (2013)
5. Shawar, B.A., Atwell, E.: Chatbots: are they really useful? LDV Forum 22, 29–49 (2007)
6. Dote, Y., Hoft, R.G.: Intelligent Control Power Electronics Systems. Oxford Univ. Press, Oxford (1998)
7. Mohanty, S.: Estimation of vapour liquid equilibria for the system carbon dioxide-difluoromethane using artificial neural networks. Int. J. Refrig. 29, 243–249 (2006)
8. Razvarz, S., Jafari, R., Yu, W., Khalili, A.: PSO and NN modeling for photocatalytic removal of pollution in wastewater. In: 14th International Conference on Electrical Engineering, Computing Science and Automatic Control (CCE), pp. 1–6 (2017)
9. Jafari, R., Yu, W.: Artificial neural network approach for solving strongly degenerate parabolic and Burgers-Fisher equations. In: 12th International Conference on Electrical Engineering, Computing Science and Automatic Control (2015). https://doi.org/10.1109/ICEEE.2015.7357914
10. Jafari, R., Razvarz, S., Gegov, A.: A new computational method for solving fully fuzzy nonlinear systems. In: Computational Collective Intelligence. ICCCI 2018. Lecture Notes in Computer Science, vol. 11055, pp. 503–512. Springer, Cham (2018)
11. Razvarz, S., Jafari, R.: ICA and ANN modeling for photocatalytic removal of pollution in wastewater. Math. Comput. Appl. 22, 38–48 (2017)
12. Razvarz, S., Jafari, R., Gegov, A., Yu, W., Paul, S.: Neural network approach to solving fully fuzzy nonlinear systems. In: Fuzzy Modeling and Control Methods Application and Research, pp. 45–68. Nova Science Publishers Inc., New York (2018). ISBN: 978-1-53613-415-5
13. Razvarz, S., Jafari, R.: Intelligent techniques for photocatalytic removal of pollution in wastewater. J. Elect. Eng. 5, 321–328 (2017). https://doi.org/10.17265/2328-2223/2017.06.004
14. Graupe, D.: Chapter 112. In: Chen, W., Mlynski, D.A. (eds.) Principles of Artificial Neural Networks. Advanced Series in Circuits and Systems, 1st edn., vol. 3, p. 4e189. World Scientific (1997)
15. Jafari, R., Yu, W., Li, X.: Solving fuzzy differential equation with Bernstein neural networks. In: IEEE International Conference on Systems, Man, and Cybernetics, Budapest, Hungary, pp. 1245–1250 (2016)
16. Jafari, R., Yu, W.: Uncertain nonlinear system control with fuzzy differential equations and Z-numbers. In: 18th IEEE International Conference on Industrial Technology, Canada, pp. 890–895 (2017). https://doi.org/10.1109/ICIT.2017.7915477
17. Jafarian, A., Measoomy, N.S., Jafari, R.: Solving fuzzy equations using neural nets with a new learning algorithm. J. Adv. Comput. Res. 3, 33–45 (2012)
18. Werbos, P.J.: Neuro-control and elastic fuzzy logic: capabilities, concepts, and applications. IEEE Trans. Ind. Electron. 40, 170–180 (1993)
19. Jafari, R., Yu, W., Razvarz, S., Gegov, A.: Numerical methods for solving fuzzy equations: a survey. Fuzzy Sets Syst. (2019). https://doi.org/10.1016/j.fss.2019.11.003
20. Kim, J.H., Kim, K.S., Sim, M.S., Han, K.H., Ko, B.S.: An application of fuzzy logic to control the refrigerant distribution for the multi type air conditioner. In: Proceedings of IEEE International Fuzzy Systems Conference, vol. 3, pp. 1350–1354 (1999)
21. Wakami, N., Araki, S., Nomura, H.: Recent applications of fuzzy logic to home appliances. In: Proceedings of IEEE International Conference on Industrial Electronics, Control, and Instrumentation, Maui, HI, pp. 155–160 (1993)
22. Jafari, R., Razvarz, S.: Solution of fuzzy differential equations using fuzzy Sumudu transforms. In: IEEE International Conference on Innovations in Intelligent Systems and Applications, pp. 84–89 (2017)
23. Jafari, R., Razvarz, S., Gegov, A., Paul, S.: Fuzzy modeling for uncertain nonlinear systems using fuzzy equations and Z-numbers. In: Advances in Computational Intelligence Systems: Contributions Presented at the 18th UK Workshop on Computational Intelligence, 5–7 September 2018, Nottingham, UK. Advances in Intelligent Systems and Computing, vol. 840, pp. 66–107. Springer, Cham (2018)
24. Jafari, R., Razvarz, S.: Solution of fuzzy differential equations using fuzzy Sumudu transforms. Math. Comput. Appl. 23, 1–15 (2018)
25. Jafari, R., Razvarz, S., Gegov, A.: Solving differential equations with Z-numbers by utilizing fuzzy Sumudu transform. In: Intelligent Systems and Applications. IntelliSys 2018. Advances in Intelligent Systems and Computing, vol. 869, pp. 1125–1138. Springer, Cham (2019)
26. Yu, W., Jafari, R.: Modeling and Control of Uncertain Nonlinear Systems with Fuzzy Equations and Z-Number. IEEE Press Series on Systems Science and Engineering. Wiley-IEEE Press, Hoboken (2019). ISBN-13: 978-1119491552
27. Negoita, C.V., Ralescu, D.A.: Applications of Fuzzy Sets to Systems Analysis. Wiley, New York (1975)
28. Zadeh, L.A.: Probability measures of fuzzy events. J. Math. Anal. Appl. 23, 421–427 (1968)
29. Zadeh, L.A.: Calculus of fuzzy restrictions. In: Fuzzy Sets and Their Applications to Cognitive and Decision Processes, pp. 1–39. Academic Press, New York (1975)
30. Zadeh, L.A.: Fuzzy logic and the calculi of fuzzy rules and fuzzy graphs. Multiple Valued Logic 1, 1–38 (1996)
31. Razvarz, S., Jafari, R.: Experimental study of Al2O3 nanofluids on the thermal efficiency of curved heat pipe at different tilt angle. J. Nanomater. 2018, 1–7 (2018)
32. Razvarz, S., Vargas-Jarillo, C., Jafari, R.: Pipeline monitoring architecture based on observability and controllability analysis. In: IEEE International Conference on Mechatronics (ICM), Ilmenau, Germany, vol. 1, pp. 420–423 (2019). https://doi.org/10.1109/ICMECH.2019.872287
33.
Razvarz, S., Vargas-jarillo, C., Jafari, R., Gegov, A.: Flow control of fluid in pipelines using PID controller. IEEE Access 7, 25673–25680 (2019). https://doi. org/10.1109/ACCESS.2019.2897992 34. Razvarz, S., Jafari, R.: Experimental study of Al2O3 nanofluids on the thermal efficiency of curved heat pipe at different tilt angle. In: 2nd International Congress on Technology Engineering and Science, ICONTES, Malaysia (2016) 35. Jafari, R., Razvarz, S., Gegov, A.: Neural network approach to solving fuzzy nonlinear equations using Z-numbers. IEEE Trans. Fuzzy Syst. (2019). https://doi. org/10.1109/TFUZZ.2019.2940919 36. Jafari, R., Yu, W., Li, X., Razvarz, S.: Numerical solution of fuzzy differential equations with Z-numbers using Bernstein neural networks. Int. J. Comput. Intell. Syst. 10, 1226–1237 (2017) 37. Jafari, R., Yu, W., Li, X.: Numerical solution of fuzzy equations with Z-numbers using neural networks. In: Intelligent Automation and Soft Computing, pp. 1–7 (2017) 38. Sutskever, I., Martens, J., Hinton, G.E.: Generating text with recurrent neural networks. In: Proceedings of ICML-2011 (2011) 39. Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: a neural image caption generator. In: Proceedings of CVPR-2015 (2015) 40. Liu, F., Perez, J.: Gated end-to-end memory networks. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers. Association for Computational Linguistics, Valencia, Spain, pp. 1–10 (2017)

300

R. Jafari et al.

41. Bordes, A., Weston, J.: Learning end-to-end goal-oriented dialog, arXiv preprint. arXiv:1605.07683 (2016) 42. Williams, J.D., Zweig, G.: End-to-end LSTM-based dialog control optimized with supervised and reinforcement learning, arXiv preprint arXiv:1606.01269 (2016) 43. Kumar, A., Irsoy, O., Su, J., Bradbury, J., English, R., Pierce, B., Ondruska, P., Gulrajani, I., Socher, R.: Ask me anything: dynamic memory networks for natural language processing. In: Proceedings of ICML-2016 (2016) 44. Weston, J., Chopra, S., Bordes, A.: Memory networks. In: International Conference on Learning Representations (ICLR) (2015) 45. Dodge, J., Gane, A., Zhang, X., Bordes, A., Chopra, S., Miller, A.H., Szlam, A., Weston, J.: Evaluating prerequisite qualities for learning end-to-end dialog systems. In: Proceedings of ICLR-2016 (2016) 46. Henderson, M., Thomson, B., Young, S.: Word-based dialog state tracking with recurrent neural networks. In: Proceedings of SIGDIAL-2014 (2014) 47. Sukhbaatar, S., Szlam, A., Weston, J., Fergus, R.: End-to-end memory networks. In: Proceedings of Advances in Neural Information Processing Systems (NIPS 2015), Montreal, Canada, pp. 2440–2448 (2015) 48. Liu, F., Cohn, T., Baldwin, T.: Improving end-to-end memory networks with unified weight tying. In: Proceedings of the 15th Annual Workshop of The Australasian Language Technology Association (ALTW 2017), Brisbane, Australia, pp. 16–24 (2017) 49. Henderson, M., Thomson, B., Williams, J.D.: The second dialog state tracking challenge. In: Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL 2014), Philadelphia, USA, pp. 263–272 (2014)

Enhancing Credit Card Fraud Detection Using Deep Neural Network

Souad Larabi Marie-Sainte, Mashael Bin Alamir, Deem Alsaleh, Ghazal Albakri, and Jalila Zouhair

College of Computer and Information Sciences, Prince Sultan University, Riyadh, Saudi Arabia
{slarabi,jzouhair}@psu.edu.sa, [email protected], [email protected], [email protected]

Abstract. With the development of technology, e-commerce has become an essential part of everyday life, allowing individuals to easily purchase and sell products over the internet. However, fraud attempts, and specifically credit card fraud attacks, are rapidly increasing: cards may be stolen, fake records may be used, and credit cards are subject to hacking. Artificial Intelligence techniques tackle these credit card fraud attacks by identifying patterns that predict false transactions. Both Machine Learning and Deep Learning models are used to detect and prevent fraud attacks. Machine Learning techniques provide positive results only when the dataset is small and does not contain complex patterns. In contrast, Deep Learning deals with huge and complex datasets. However, most of the existing studies on Deep Learning have used private datasets and therefore did not provide a broad comparative study. This paper aims to improve the detection of credit card fraud attacks using a Long Short-Term Memory Recurrent Neural Network (LSTM RNN) with a public dataset. Our proposed model proved to be effective: it achieved an accuracy rate of 99.4%, which is higher than that of other existing Machine and Deep Learning techniques.

Keywords: Recurrent Neural Network · Long Short-Term Memory · Deep Learning · Machine Learning · Fraud detection

1 Introduction

Due to the rapid development of the internet, online shopping is becoming a trend. It enables customers to shop and pay online. Online payments cannot always be trusted, however, as theft is an issue. Theft is a crime committed by stealing payment information such as credit card details; this is referred to as credit card fraud [1]. It can occur in various ways, such as using a stolen card, using a fake card record, or hacking a credit card by making a fake copy of it. In the United States, the occurrence of credit card fraud has been increasing since 2014. According to [2], the US reported a financial loss of 9.1 billion dollars due to online fraud attacks. Crimes related to credit card fraud can generally be detected and prevented through technology and the application of Artificial Intelligence (AI).

© Springer Nature Switzerland AG 2020
K. Arai et al. (Eds.): SAI 2020, AISC 1229, pp. 301–313, 2020. https://doi.org/10.1007/978-3-030-52246-9_21

Machine Learning (ML) is a field of AI that aims to find patterns in data and garner information from them [3]. ML is widely used in credit card fraud detection, as it enables the detection of suspicious credit card transactions. Many techniques have been used, such as the Decision Trees, the Random Forest, the Majority Voting, the Artificial Neural Networks, and others [4, 5]. However, ML has limitations, such as the long time it takes to train the classifier when the data size is large [6]. Additionally, these techniques fail to detect patterns when the provided training data is biased [6]. Moreover, feature reduction is essential for ML. For these reasons, this research proposes using Deep Learning (DL) as a method of detecting credit card fraud.

DL is another field used for data classification and prediction. It can perform tasks directly on raw data of huge size, reaching millions of records. It accomplishes its purpose by learning through a hierarchy of concepts, thus enabling computers to build complex concepts out of simpler ones. DL is applied through various techniques such as Deep Boltzmann Machines, Deep Feedforward Networks, Deep Neural Network based Hidden Markov Models (HMM), Deep Convolutional Networks, etc. Applications of DL techniques range from object detection to speech recognition and natural language processing [7]. These techniques have already been applied to credit card fraud detection, including Deep Autoencoders [8–10], Recurrent Neural Networks [5, 11], Convolutional Neural Networks [12], Deep Belief Networks [13], and others.

The Recurrent Neural Network (RNN) is a common Deep Neural Network (DNN) technique that has proved its efficiency in many domains, such as Natural Language Processing [15], image processing [16], and music classification [17]. However, its application to credit card fraud detection has so far been limited to a private dataset from an institutional bank. To the best of our knowledge, no study has applied the RNN to the Kaggle dataset and compared it with the state-of-the-art techniques. The Kaggle dataset is a well-known credit card dataset available in [18] and used in many studies (for example, [19]). Hence, in this paper, we aim to investigate the effectiveness of applying the Long Short-Term Memory Recurrent Neural Network (LSTM RNN) to the Kaggle dataset to improve the detection rate of fraudulent credit card transactions.

The rest of the paper is organized as follows. Section 2 highlights the related works addressing the detection of credit card fraud. In Sect. 3, the methodology and the dataset used in this paper are discussed. Section 4 presents the experimental results. Finally, Sect. 5 addresses the discussion and conclusion of this study.

2 Related Works

Research studies related to the detection of credit card fraud have been trending in the last few years. The following sections cover the research in this area since 2016, based on ML and DL.

2.1 Machine Learning Related Works

For the purpose of detecting frauds in credit cards, the authors in [4] proposed two algorithms, namely the Random Forest and the Decision Tree. The aforementioned

methods were applied to the credit card transactions of a Chinese e-Commerce company. Their classification method resulted in an accuracy rate of 98.67%. However, the authors noted that the dataset used in their work was imbalanced, so the accuracy could be improved by handling this imbalance.

Moreover, in [20], the authors compared four different methods for detecting credit card fraud: the K-Nearest Neighbor (KNN), Random Tree, AdaBoost, and Logistic Regression. The dataset used was acquired from the UCI repository. The accuracy achieved by the KNN was 96.9%, whilst the Random Tree achieved 94.3%, the AdaBoost algorithm 57.7%, and the Logistic Regression 98.2%. The results revealed that the Logistic Regression method was the most effective model for credit card fraud detection. However, it was not deemed to be a practical method for detecting frauds at the time of the transaction.

Furthermore, in [12], the authors combined the Back Propagation Neural Network (NN) algorithm with the Whale Optimization Algorithm (WOA) on the Kaggle dataset. The authors used the WOA to optimize the weights of the NN and enhance its results. The proposed method yielded an accuracy rate of 96.4%.

In [21], the authors proposed a fraud detection system based on a Neuro-Fuzzy expert system. It combined evidence obtained from a rule-based fuzzy inference system with a learning component that uses a Back Propagation NN. The authors analyzed different transaction attributes and the deviation of a client's behavior from their normal spending profile. Data was obtained from a synthetic dataset developed with a simulator. The authors did not mention the accuracy of their work; however, in their results, they affirmed that the proposed system has higher accuracy compared to previously proposed systems.

Another model was proposed by [22]. The model used Decision Trees with a combination of Luhn's and Hunt's algorithms to detect credit card fraud. It also used an address matching rule to check whether the customer's billing address and shipping address matched. The authors did not mention any accuracy rate.

Additionally, [23] developed and implemented a fraud detection system for a large e-tail merchant. The experiments were applied to real data obtained from the company. The authors compared three distinct algorithms: the Logistic Regression (LR), the Random Forests (RF), and the Support Vector Machines (SVM). The RF resulted in a classification accuracy of 93.5%, the SVM in 90.6%, and the LR in 90.7%. However, the proposed model demonstrated a limitation in terms of its integration with real retail systems and services. Moreover, the model showed weakness due to the large difference between the training and testing accuracy results.

In [24], the authors used two Machine Learning techniques based on outlier detection, the Local Outlier Factor and Isolation Forest algorithms, to discover credit card fraud. A private dataset from a German bank collected in 2006 was used. The authors preprocessed the dataset using Principal Component Analysis. The algorithms reached an accuracy of 99.6% and a precision of 28%. The authors explained that the low precision was due to the unbalanced dataset.

In [10], the authors experimented with a new model based on linear mapping, nonlinear mapping, and probability, along with the RUSMRN algorithm for imbalanced datasets. The data was retrieved in October 2005 from a bank

in Taiwan. The classification result was 79.73%, which outperformed the RUS Boost, the AdaBoost, and the Naïve Bayes classifiers.

In [13], the paper presented an ANN to detect credit card fraud. To solve the issue of the imbalanced dataset, the Meta Cost algorithm was applied. The data was retrieved from a big Brazilian credit card issuer. However, the authors did not mention the accuracy of their results.

2.2 Deep Learning Related Works

In [25], the authors used a three-layer Autoencoder to detect credit card fraud. Two datasets were collected from companies in Turkey. To evaluate their method, they calculated its accuracy and precision. The accuracy reached 68%, whilst the precision was shown to be more than 61%. The accuracy and precision of the results were low despite the use of the Autoencoder; perhaps the datasets used were the reason for these low percentages.

In [5], the authors used multiple classifiers to detect credit card fraud: the ANN, the RNN, the Long Short-Term Memory (LSTM), and the Gated Recurrent Units (GRU). The dataset used in this study was provided by a US financial institution engaged in retail banking. The results of their classifiers are as follows: 88.9% for the ANN, 90.433% for the RNN, 91.2% for the LSTM, and finally 91.6% for the GRU. Although the authors achieved highly accurate results, they mentioned an issue regarding the limited number of instances in the dataset used.

In [19], the authors attempted to detect credit card fraud using Autoencoders. The Kaggle dataset was used. To evaluate their model, they calculated the recall and the precision. The recall achieved 0.9 and the precision 0.009; the results were therefore not very high and needed improvement.

The authors in [26] stated that since fraud behavior is changing continuously, it is better to use unsupervised learning to detect it. They proposed a model of Deep Autoencoder and Restricted Boltzmann Machine (RBM) that can identify both normal and suspicious transactions. The Autoencoder is a DL algorithm for unsupervised learning. Their experiments were conducted on three datasets: the German (1,000 instances), the Australian (600 instances), and the European (284,807 instances) datasets. The results demonstrated that DL is effective with huge datasets: their method achieved a 96% accuracy rate on the European dataset, while the accuracy for the other two datasets was around 50%.

In [27], the authors proposed using DL methods to detect fraud behavior. They tried several methods, such as the Autoencoder, the Restricted Boltzmann Machine, the Variational Autoencoder, and the Deep Belief Network. The authors applied their models to the European dataset (284,807 instances). Their Autoencoder, Variational Autoencoder, Restricted Boltzmann Machine, and Deep Belief Network achieved accuracy rates of 96.26%, 96.55%, 96.00%, and 96.31%, respectively.

In [28], the authors aimed to detect credit card fraud using Deep Autoencoder Artificial Neural Networks. They applied the proposed model to the German Credit dataset, achieving an accuracy of 82%. However, this accuracy rate is considered low compared to other studies that were conducted.

In [8], the authors aimed to detect fraudulent credit card transactions by applying the Backpropagation DNN algorithm. Two open-source DL libraries, TensorFlow and Scikit-learn, were used. The authors chose Logistic Regression (LR) as the benchmark model, which yielded an accuracy of 96.04%, demonstrating better performance on the test set than the NN. Then, different DNNs were tested with different numbers of hidden layers. The average validation accuracy differed based on the learning rate: for a learning rate of 0.1 the accuracy was 96.27%, for 0.01 it was 99.59%, and for 0.001 it was 99.12%. This study shows that the learning rate value can enhance the classification accuracy.

In [11], the authors used Deep Belief Networks (DBN) with multilayer belief networks to detect credit card fraud. The authors used Restricted Boltzmann Machines (RBM) as hidden layers in the model. The dataset used was the Markit dataset, which contains credit transactions from different regions of the United States. The authors did not mention the accuracy results of their experiment. However, they made a comparison between the proposed model, the SVM classifier, and the Multinomial Logistic Regression, and claimed that their approach accomplished the highest accuracy rate.

In [30], the authors aimed to detect credit card fraud using a Convolutional Neural Network (CNN). They used a credit card transaction dataset obtained from a commercial bank. Due to the imbalanced dataset, the authors used cost-based sampling in their experiments. The accuracy of the results was not mentioned.

The authors in [29] applied Autoencoder Deep Learning to detect credit card fraud in insurance companies. They used a private dataset collected in September 2013 in Europe. Although the dataset was unbalanced, this technique achieved a high accuracy of 91%, which outperformed the Logistic Regression algorithm that yielded an accuracy of 57%. The authors stressed that the Autoencoder algorithm is efficient in handling unbalanced datasets.

To summarize, the papers discussed all share the purpose of detecting credit card fraud. They used different classification methods based on ML (see Table 1), such as the ANN, the Decision Trees, the SVM, and many others. However, in ML, the classification accuracy is highly dependent on data preprocessing, cleaning, and feature selection. DL techniques have also been used (see Table 2), including the Deep Autoencoders, the Deep Belief Networks, and the Restricted Boltzmann Machine, among others. DL can directly use raw data while achieving high results. As is evident from the literature discussed, the DL models achieved higher accuracy rates than the ML techniques. On the other hand, most of the datasets used were obtained from either private banks or companies, which makes the related studies closed and difficult to compare. To the best of our knowledge, the RNN has not been used with a public dataset. In this research paper, the RNN and the Kaggle dataset are used to enhance the classification accuracy and make the use of the RNN open.

Table 1. Related works based on Machine Learning techniques

| Ref  | Year | Method                                                                    | Dataset                                      | Accuracy                      |
|------|------|---------------------------------------------------------------------------|----------------------------------------------|-------------------------------|
| [24] | 2019 | Local Outlier Factor and Isolation Forest algorithms                      | German bank dataset                          | 99.6%                         |
| [4]  | 2018 | Random Forest algorithm and Decision Tree                                 | Chinese e-Commerce company dataset           | 98.67%                        |
| [20] | 2018 | KNN / Random Tree / AdaBoost / Logistic Regression                        | UCI dataset                                  | 96.9% / 94.3% / 57.7% / 98.2% |
| [12] | 2018 | Back Propagation Neural Network with Whale Optimization Algorithm         | Kaggle                                       | 96.4%                         |
| [21] | 2017 | Neuro-Fuzzy based system                                                  | Synthetic dataset developed with a simulator | not mentioned                 |
| [22] | 2017 | Decision Tree with Luhn's and Hunt's algorithms                           | not mentioned                                | not mentioned                 |
| [23] | 2017 | Logistic Regression, Random Forest, and Support Vector Machines           | e-Commerce dataset                           | not mentioned                 |
| [10] | 2016 | RUS based on linear mapping, non-linear mapping, and probability          | Payment data of a Taiwanese bank             | 79.73%                        |
| [13] | 2016 | CSNN model based on Artificial Neural Networks (ANN) and Meta Cost        | Brazilian credit card issuer dataset         | not mentioned                 |

Table 2. Related works based on Deep Learning techniques

| Ref  | Year | Method                                                                                           | Dataset                                         | Accuracy                          |
|------|------|--------------------------------------------------------------------------------------------------|-------------------------------------------------|-----------------------------------|
| [29] | 2019 | Autoencoders Deep Learning                                                                       | Private data                                    | 91%                               |
| [28] | 2018 | Deep Autoencoder                                                                                 | German Credit dataset                           | 82%                               |
| [5]  | 2018 | ANN / RNN / LSTM / GRU                                                                           | US retail banking financial institution dataset | 88.9% / 90.433% / 91.2% / 91.6%   |
| [19] | 2018 | Deep Autoencoder                                                                                 | Kaggle                                          | 90%                               |
| [26] | 2018 | Deep Autoencoder with Restricted Boltzmann Machine                                               | German / Australian / European datasets         | 48% / 50% / 96%                   |
| [27] | 2018 | Deep Autoencoder / Variational Autoencoder / Restricted Boltzmann Machine / Deep Belief Networks | European dataset                                | 96.26% / 96.55% / 96.00% / 96.31% |
| [30] | 2017 | Convolutional Neural Network (CNN)                                                               | Commercial bank credit card transaction dataset | not mentioned                     |
| [8]  | 2017 | BP Deep Neural Networks                                                                          | not mentioned                                   | 96%                               |
| [25] | 2017 | Deep Autoencoder                                                                                 | Turkish companies                               | 68%                               |
| [11] | 2016 | Deep Belief Networks with Restricted Boltzmann Machine                                           | Markit dataset                                  | not mentioned                     |
3 Methodology

3.1 Recurrent Neural Network (RNN)

Deep Learning is derived from Machine Learning, where the machine learns from experience. Deep Learning takes raw data as input and processes it through hierarchical levels of learning in order to obtain useful conclusions [7]. The Recurrent Neural Network was first introduced in the 1980s [31]. It is considered a supervised learning algorithm. The RNN processes data in sequence loops, passing it through hidden layers, also known as state vectors, which store the history of past inputs in an internal state memory. While producing the output, the network learns from both the current input and the previous data stored in the internal state memory. During each iteration, the RNN takes two inputs: the new input and the recently stored past input. For the output, the RNN can match one input to one output, one input to many outputs, many inputs to many outputs, or many inputs to one output. Figure 1 shows the structure of the RNN.

Fig. 1. RNN structure

3.2 Long Short-Term Memory Recurrent Neural Network (LSTM RNN)

The RNN has the disadvantage of vanishing gradients, meaning that it fails to derive new learning from past inputs and outputs. For example, after reading a sequence of input at time 0 and then another sequence of input at time 1, the second output should be derived from the earlier inputs and outputs; however, the RNN tends to forget the first input at time 0 and produces an output from the current input only. On account of this, the Long Short-Term Memory (LSTM) RNN was introduced in [9]. The LSTM has a long-term memory which helps the network to remember better. This powerful characteristic contributes to enhancing the result of our problem. Figure 2 shows the architecture of the LSTM layer.

Fig. 2. LSTM RNN structure

The LSTM RNN has three steps to determine the weights' values:

1. Forget Gate Operation: This step takes the current input x at time t and the output h at time t-1, combines them, and passes them through a sigmoid into a single tensor called f_t:

    f_t = σ(W_f · [h_{t-1}, x_t] + b_f)    (1)

Where: f_t: the forget-gate tensor; h_{t-1}: the output at time t-1; x_t: the input at time t; W_f: the weight for the forget gate; b_f: the bias vector.

The output f_t will be between 0 and 1 due to the sigmoid operator. f_t is then multiplied by the internal state, which is why the gate is called the forget gate. If f_t = 0, the previous internal state is completely forgotten; if f_t = 1, it passes through unaltered.

2. Update Gate Operation: The LSTM joins values from the current and previous steps and applies the joint data to the tanh function. It then chooses which data to pick from the tanh results and updates the state using the update gate:

    C̃_t = tanh(W_i · [h_{t-1}, x_t] + b_i)    (2)

Where: C̃_t: the candidate cell state; tanh: the ratio of the corresponding hyperbolic sine and hyperbolic cosine; W_i: the weight for the input; x_t: the input at time t; h_{t-1}: the previous output; b_i: the bias vector.

3. Output Gate Operation: In this step, the old cell state C_{t-1} is updated into the new cell state C_t. The old state is multiplied by f_t to forget the previous state. Afterward, i_t ∗ C̃_t is added as the new candidate values, scaled by how much we want to update each state value:

    C_t = f_t ∗ C_{t-1} + i_t ∗ C̃_t    (3)

Where: C_t: the new cell state; C_{t-1}: the old cell state; f_t: the forget-gate tensor; i_t ∗ C̃_t: the new candidate values scaled by the update-gate value i_t.

The last step is to decide what to output. The output is based on the cell state; a sigmoid function decides what part of the state is presented as the output:

    O_t = σ(W_o · [h_{t-1}, x_t] + b_o)    (4)

    h_t = O_t ∗ tanh(C_t)    (5)

Where: O_t: the output gate; W_o: the weight for the output; h_{t-1}: the previous output at time t-1; x_t: the input at time t; b_o: the bias vector.

The cell state goes through tanh to push its values between -1 and 1, and the result is multiplied by the sigmoid gate, so only selected candidates are used as the output.
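As a concrete illustration, Eqs. (1)–(5) can be sketched in NumPy. This is a minimal reconstruction of one LSTM step, not code from the paper; in particular, the separate weights `W_g`, `b_g` for the update gate i_t are our assumption, since the paper does not give an explicit equation for i_t.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, W_f, b_f, W_g, b_g, W_c, b_c, W_o, b_o):
    """One LSTM step following Eqs. (1)-(5).

    Each W_* has shape (hidden, hidden + inputs) and acts on the
    concatenation [h_{t-1}, x_t]; each b_* has shape (hidden,).
    """
    z = np.concatenate([h_prev, x_t])   # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)        # (1) forget gate
    i_t = sigmoid(W_g @ z + b_g)        # update (input) gate -- assumed form
    c_tilde = np.tanh(W_c @ z + b_c)    # (2) candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde  # (3) new cell state
    o_t = sigmoid(W_o @ z + b_o)        # (4) output gate
    h_t = o_t * np.tanh(c_t)            # (5) new hidden state
    return h_t, c_t
```

Because o_t lies in (0, 1) and tanh(C_t) in (-1, 1), every component of h_t stays strictly inside (-1, 1).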

4 Experimental Study

For the experiment, Python 3 was used to implement the model through the Keras library. The experiment was conducted on a MacBook Pro with a 2.3 GHz Intel Core i5 processor and 8 GB of 2133 MHz LPDDR3 memory.

4.1 The Dataset

The data, obtained from [18], contains information about credit card transactions made in September 2013 by European users. The dataset is highly imbalanced, since it only has 39,206 fraud instances out of 284,807 transactions. Due to confidentiality, the original features are not disclosed in the dataset and were replaced with anonymized names (V1, V2, …, V28). The dataset contains only numerical attributes (features) as a result of the Principal Component Analysis (PCA) transformation applied to all features except Time (the number of seconds elapsed between the current transaction and the first transaction in the dataset), Amount (the transaction amount), and Class (1 for a fraud transaction, with 39,206 instances; 0 for a normal transaction, with 246,102 instances).
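Section 4.2 divides this dataset into 70% training, 15% testing, and 15% validation. A minimal sketch of such a split in NumPy follows; the shuffling, the fixed seed, and the use of a plain (rather than stratified) split are our assumptions, since the paper does not say how rows were assigned:

```python
import numpy as np

def split_indices(n, seed=42):
    """Shuffle row indices and cut them into 70% train / 15% validation / 15% test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(0.70 * n)
    n_val = int(0.15 * n)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return train, val, test

# 284,807 transactions, as reported above
train, val, test = split_indices(284_807)
```

The three index sets are disjoint and together cover every transaction.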

4.2 Experiment Settings At first, the data is divided into 70% training, 15% testing, and 15% validation. The classifier is based on three hidden layers that are: the LSTM layer which has 12 neurons, Flatten, and Dense layers. The LSTM and Dense layers can have different parameters related to the kernel initializer, activation function, and others. In contrast, the Flatten layer has no parameters and it is only used to resolve the obstacle of the dimensionality, meaning that the data needs to become accessible to the dense layer. Different parameters are added to the hidden layers to avoid the issue of overfitting and construct a powerful classifier. To find the best combination of the parameters, several experiments are conducted using different combinations of parameters for each of the LSTM and Dense layers. After repeating each experiment many times, we decided to set the length of the epochs to 50. This guaranteed that the accuracy of the results that are obtained in that way cannot be improved upon further. To save space, Table 3 summarizes only the best-found parameters that are used for the classification. First, the loss function is used for the purpose of neglecting false classifications. Incidentally, minimizing the loss function leads to maximizing the accuracy rate. Different loss functions are used to determine the most suitable one. From the conducted experiments, the Mean Square Error (mse) loss function is the best match that raised the classification accuracy. It works by summing the squared distance between the target and predicted values. In addition, the kernel initializer is was also used to randomly initiate the weights of the layers. Then these initialized values are passed to the LSTM, and Dense layers. Each layer has its own kernel_initializer including uniform, lecun_uniform, glorot_normal, and others. 
The best kernel_initializer in this study is uniform, which draws from the uniform distribution within [-limit, limit], where limit = sqrt(3/fan_in) and fan_in is the number of input units. Moreover, the activation function determines the output of a layer. Many activation functions are available, such as tanh, sigmoid, relu, softmax, and others. In this study, the tanh activation function is used for both layers, as previously explained in Sect. 3.2. Furthermore, an optimizer is required for compiling the Keras model. Many optimizers are available, such as rmsprop, Adam, Adagrad, Adadelta, etc. The rmsprop, Adagrad, and Adam optimizers provided the best accuracy results, with rmsprop deemed the best of this cohort.

4.3 Results and Discussion

After implementing the LSTM RNN with the parameters highlighted in Table 3, the resulting testing accuracy is found to be 99.4%, whilst the training accuracy is 99.97%. Firstly, we notice that the training accuracy is very high. Secondly, the testing accuracy remains high, the difference between the training and testing accuracy being only 0.57%. In fact, the LSTM RNN aims to remember the past information needed to predict the output over the long or short term. This characteristic is important when several layers exist, and it is a crucial benefit that a plain RNN does not possess. Moreover, an LSTM RNN can outperform a CNN when the data does not possess a hierarchical structure needed for the prediction. This was exactly our case: the dataset used in this study does not contain any hierarchical structure, and predicting the incidence of credit card fraud does not require hierarchical information.
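Putting the pieces of Sect. 4.2 together (an LSTM layer with 12 neurons, Flatten, Dense, tanh activations, the mse loss, and the rmsprop optimizer), such a classifier could be sketched in Keras as follows. This is our illustration, not the authors' code: in particular, reshaping the 30 tabular features of each transaction into a (30, 1) sequence is our assumption, since the paper does not state how the features are fed to the LSTM.

```python
# Hedged Keras sketch (ours) of the architecture described in Sect. 4.2.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(30, 1)),              # 30 features treated as a length-30 sequence (our assumption)
    layers.LSTM(12, activation="tanh",
                kernel_initializer="random_uniform",  # the paper's "uniform" initializer
                return_sequences=True),      # keep the sequence so Flatten has something to collapse
    layers.Flatten(),                        # resolves the dimensionality for the Dense layer
    layers.Dense(1, activation="tanh",
                 kernel_initializer="random_uniform"),
])
model.compile(optimizer="rmsprop", loss="mse", metrics=["accuracy"])
model.summary()
# model.fit(x_train, y_train, epochs=50, validation_data=(x_val, y_val))
```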

Enhancing Credit Card Fraud Detection Using Deep Neural Network


Table 3. Experimentation results (best-found parameters; in each run the LSTM and Dense layers share the same initializer and activation).

#  Epochs  Layers        Loss  kernel_initializer  Activation  Optimizer  Training  Validation
1  50      LSTM + Dense  mse   lecun_uniform       tanh        rmsprop    99.6      99.53
2  50      LSTM + Dense  mse   uniform             tanh        rmsprop    99.97     99.97
3  50      LSTM + Dense  mse   lecun_uniform       tanh        Adam       99.4      99.4
4  50      LSTM + Dense  mse   uniform             tanh        Adagrad    99.4      99.4
Secondly, the achieved testing accuracy was higher than the accuracy of the state-of-the-art methods discussed above (see Table 4). The LSTM RNN yielded a better result than the Back Propagation Neural Network combined with an optimization method, increasing the accuracy by 3%. Moreover, the accuracy obtained by the deep autoencoder is improved upon by 9.4%.

Table 4. Comparison with the related works.

Ref       Year  Method                                                         Dataset  Accuracy
[12]      2018  Back propagation neural network with whale optimization        Kaggle   96.4%
                algorithm
[32]      2003  Frequency domain                                               Kaggle   77%
[32]      2003  Random Forest                                                  Kaggle   95%
[14]      2018  Deep encoder                                                   Kaggle   90%
Proposed        LSTM RNN                                                       Kaggle   99.4%

To validate the performance of the proposed method, the study presented in [32] is used for comparison. Two models were proposed there, namely the Frequency domain and the Random Forest. The same dataset was used, along with 10-fold cross-validation. The accuracy reached 77% and 95%, respectively, for the two models (as shown in Table 4), which is less than that obtained by the proposed method. This proves the effectiveness of the LSTM RNN in credit card fraud detection.

5 Conclusion

To conclude, the application of Deep Learning techniques to credit card fraud detection has proved to be effective. The LSTM RNN outperformed both the Machine Learning and Deep Learning techniques that have been previously applied to credit card fraud detection. Although the data is imbalanced, the achieved results are high. For future work, the data could be balanced using the Synthetic Minority Over-sampling Technique (SMOTE) algorithm. Furthermore, a larger dataset could be used in order to achieve better results, and comparisons with different state-of-the-art methods could be performed within a common comparison framework.

Acknowledgment. We would like to acknowledge the Artificial Intelligence and Data Analytics (AIDA) Lab, Prince Sultan University, Riyadh, Saudi Arabia for supporting this work.

References
1. Inc. US Legal: Internet fraud law and legal definition
2. U.S. payment card fraud losses by type 2018 | Statistic
3. Mohammed, M., Khan, M.B., Bashier, E.B.M.: Machine Learning: Algorithms and Applications. CRC Press, Boca Raton (2016)
4. Xuan, S., Liu, G., Li, Z., Zheng, L., Wang, S., Jiang, C.: Random forest for credit card fraud detection. In: 2018 IEEE 15th International Conference on Networking, Sensing and Control (ICNSC), pp. 1–6. IEEE (2018)
5. Roy, A., Sun, J., Mahoney, R., Alonzi, L., Adams, S., Beling, P.: Deep learning detecting fraud in credit card transactions. In: Systems and Information Engineering Design Symposium (SIEDS), pp. 129–134 (2018)
6. The Royal Society: Machine Learning: the Power and Promise of Computers That Learn by Example (2017). royalsociety.org/~/media/policy/projects/machine-learning/publications/machinelearning-report.pdf
7. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning, vol. 1. MIT Press, Cambridge (2016)
8. Lu, Y.: Deep Neural Networks and Fraud Detection (2017)
9. Gandhi, R.: Introduction to Sequence Models - RNN, Bidirectional RNN, LSTM, GRU. Towards Data Science, 26 June 2018. towardsdatascience.com/introduction-to-sequence-models-rnn-bidirectional-rnn-lstm-gru-73927ec9df15
10. Charleonnan, A.: Credit card fraud detection using RUS and MRN algorithms. In: Management and Innovation Technology International Conference (MITicon), 2016, pp. MIT-73. IEEE (2016)
11. Luo, C., Desheng, W., Dexiang, W.: A deep learning approach for credit scoring using credit default swaps. Eng. Appl. Artif. Intell. 65, 465–470 (2017)
12. Wang, C., Wang, Y., Ye, Z., Yan, L., Cai, W., Pan, S.: Credit card fraud detection based on whale algorithm optimized BP neural network. In: 2018 13th International Conference on Computer Science & Education (ICCSE), pp. 1–4. IEEE (2018)
13. Ghobadi, F., Rohani, M.: Cost sensitive modeling of credit card fraud using neural network strategy. In: International Conference of Signal Processing and Intelligent Systems (ICSPIS), pp. 1–5. IEEE (2016)
14. Zhang, X.-Y., Yin, F., Zhang, Y.-M., Liu, C.-L., Bengio, Y.: Drawing and recognizing Chinese characters with recurrent neural network. IEEE Trans. Pattern Anal. Mach. Intell. 40(4), 849–862 (2018)
15. Yogatama, D., Dyer, C., Ling, W., Blunsom, P.: Generative and discriminative text classification with recurrent neural networks. arXiv preprint arXiv:1703.01898 (2017)
16. Toderici, G., Vincent, D., Johnston, N., Hwang, S.J., Minnen, D., Shor, J., Covell, M.: Full resolution image compression with recurrent neural networks. In: CVPR, pp. 5435–5443 (2017)
17. Choi, K., Fazekas, G., Sandler, M., Cho, K.: Convolutional recurrent neural networks for music classification. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2392–2396. IEEE (2017)
18. Kaggle: Your home for data science. https://www.kaggle.com
19. Sweers, T.: Credit Card Fraud. Bachelor thesis, Computer Science, Radboud University (2018)
20. Naik, H.: Credit card fraud detection for online banking transactions. Int. J. Res. Appl. Sci. Eng. Technol. 6(4), 4573–4577 (2018)
21. Behera, T.K., Panigrahi, S.: Credit card fraud detection using a neuro-fuzzy expert system. In: Computational Intelligence in Data Mining, pp. 835–843. Springer, Singapore (2017)
22. Save, P., Tiwarekar, P., Jain, K.N., Mahyavanshi, N.: A novel idea for credit card fraud detection using decision tree. Int. J. Comput. Appl. 161(13) (2017)
23. Carneiro, N., Figueira, G., Costa, M.: A data mining based system for credit-card fraud detection in e-tail. Decis. Support Syst. 95, 91–101 (2017)
24. Maniraj, S.P., Aditya, S., Shadab, A., Deep Sarkar, S.: Credit card fraud detection using machine learning and data science. Int. J. Eng. Res. Technol. (IJERT) 08(09) (2019)
25. Renstrom, M., Holmsten, T.: Fraud Detection on Unlabeled Data with Unsupervised Machine Learning. The Royal Institute of Technology (2018)
26. Pumsirirat, A., Yan, L.: Credit card fraud detection using deep learning based on auto-encoder and restricted Boltzmann machine. Int. J. Adv. Comput. Sci. Appl. 9(1), 18–25 (2018)
27. Reshma, R.S.: Deep learning enabled fraud detection in credit card transactions. Int. J. Res. Sci. Innov. (IJRSI) V(VII), 111–115 (2018)
28. Kazemi, Z., Zarrabi, H.: Using deep networks for fraud detection in the credit card transactions. In: 2017 IEEE 4th International Conference on Knowledge-Based Engineering and Innovation (KBEI), pp. 0630–0633. IEEE (2017)
29. Al-Shabi, M.: Credit card fraud detection using autoencoder model in unbalanced datasets. J. Adv. Math. Comput. Sci. 33(5), 1–16 (2019). https://doi.org/10.9734/jamcs/2019/v33i530192
30. Fu, K., Cheng, D., Tu, Y., Zhang, L.: Credit card fraud detection using convolutional neural networks. In: International Conference on Neural Information Processing, pp. 483–490. Springer, Cham (2016)
31. Pozzolo, A.D., Caelen, O., Johnson, R.A., Bontempi, G.: Calibrating probability with undersampling for unbalanced classification. In: 2015 IEEE Symposium Series on Computational Intelligence, pp. 159–166. IEEE (2015)
32. Financial Transaction Card Originated Messages, document ISO 8583-1 (2003). https://www.iso.org/standard/31628.html

Non-linear Aggregation of Filters to Improve Image Denoising

Benjamin Guedj(1,2)(B) and Juliette Rengot(3)

1 Inria, Paris, France, [email protected], https://bguedj.github.io
2 University College London, London, UK
3 Ecole des Ponts ParisTech, Paris, France, [email protected]

Abstract. We introduce a novel aggregation method to efficiently perform image denoising. Preliminary filters are aggregated in a non-linear fashion, using a new metric of pixel proximity based on how the pool of filters reaches a consensus. We provide a theoretical bound to support our aggregation scheme, illustrate its numerical performance, and show that the aggregate significantly outperforms each of the preliminary filters.

Keywords: Image denoising · Statistical aggregation · Ensemble methods · Collaborative filtering

1 Introduction

Denoising is a fundamental question in image processing. It aims at improving the quality of an image by removing the parasitic information that randomly adds to the details of the scene. This noise may be due to image capture conditions (lack of light, blurring, wrong tuning of field depth, …) or to the camera itself (increase of sensor temperature, data transmission errors, approximations made during digitization, …). The challenge therefore consists in removing the noise from the image while preserving its structure. Many denoising methods have been introduced in the past decades; while good performance has been achieved, denoised images still tend to be too smooth (some details are lost) and blurred (edges are less sharp). Seeking to improve the performance of these algorithms is a very active research topic.

The present paper introduces a new approach to denoising images, by bringing to the computer vision community ideas developed in the statistical learning literature. The main idea is to combine different classical denoising methods to obtain several predictions of the pixel to denoise. As each classical method has pros and cons and is more or less efficient according to the kind of noise or to the image structure, an asset of our method is that it makes the best out of each method's strong points, harnessing the "wisdom of the crowd". We adapt the strategy proposed by the algorithm "COBRA - COmBined Regression Alternative" [2,10] to the specific context of image denoising. This algorithm is implemented in the python library pycobra, available at https://pypi.org/project/pycobra/.

© Springer Nature Switzerland AG 2020. K. Arai et al. (Eds.): SAI 2020, AISC 1229, pp. 314–327, 2020. https://doi.org/10.1007/978-3-030-52246-9_22

Aggregation strategies may be rephrased as collaborative filtering, since information is filtered by using a collaboration among multiple viewpoints. Collaborative filters have already been exploited in image denoising. [8] used them to create one of the best-performing denoising algorithms: block-matching and 3D collaborative filtering (BM3D). It puts together similar patches (2D fragments of the image) into 3D data arrays (called "groups"). It then produces a 3D estimate by jointly filtering grouped image blocks. The filtered blocks are placed again in their original positions, providing several estimations for each pixel. The information is aggregated to produce the final denoised image. This method is praised for preserving fine details well. Moreover, [13] proved that the visual quality of denoised images can be increased by adapting the denoising treatment to the local structures. They proposed an algorithm, based on BM3D, that uses different non-local filtering models in edge or smooth regions. Collaborative filters have also been associated with neural network architectures, by [18], to create new denoising solutions.

When several denoising algorithms are available, finding the relevant aggregation has been addressed by several works. [16] focused on the analysis of patch-based denoising methods and shed light on their connection with statistical aggregation techniques. [6] proposed a patch-based Wiener filter which exploits patch redundancy. Their denoising approach is designed for near-optimal performance and reaches high denoising quality. Furthermore, [17] showed that usual patch-based denoising methods are less efficient on edge structures. The COBRA algorithm differs from the aforementioned techniques, as it combines preliminary filters in a non-linear way. COBRA has been introduced and analysed by [2].

The paper is organized as follows. We present our aggregation method, based on the COBRA algorithm, in Sect. 2. We then provide a thorough numerical experiments section (Sect. 3) to assess the performance of our method, along with an automatic tuning procedure of preliminary filters as a byproduct.

2 The Method

We now present an image denoising version of the COBRA algorithm [2,10]. For each pixel p of the noisy image x, we may call on M different estimators (f_1, …, f_M). We aggregate these estimators by doing a weighted average on the intensities:

\[ f(p) = \frac{\sum_{q \in x} \omega(p,q)\, x(q)}{\sum_{q \in x} \omega(p,q)}, \tag{1} \]

and we define the weights as

\[ \omega(p,q) = \mathbb{1}\left\{ \sum_{k=1}^{M} \mathbb{1}\big( |f_k(p) - f_k(q)| \le \varepsilon \big) \ge M\alpha \right\}, \tag{2} \]


Fig. 1. General model

where ε is a confidence parameter and α ∈ (0, 1) a proportion parameter. Note that while f is linear with respect to the intensity x, it is non-linear with respect to each of the preliminary estimators f_1, …, f_M. These weights mean that, to denoise a pixel p, we average the intensities of pixels q such that a proportion, at least α, of the preliminary estimators f_1, …, f_M have the same value at p and at q, up to a confidence level ε. Let us emphasize here that our procedure averages the pixels' intensities based on the weights (which involve this consensus metric). The intensity predicted for each pixel p of the image is f(p), and the COBRA-denoised image is the collection of pixels {f(p), p ∈ x}. This aggregation strategy is implemented in the python library pycobra [10]. The general scheme is presented in Fig. 1, and the pseudo-code in Algorithm 1. Users can control the number of features used thanks to the parameter patch_size. For each pixel p to denoise, we consider the image patch, centered on p, of size (2·patch_size + 1) × (2·patch_size + 1); for each pixel we thus construct a feature vector of size (2·patch_size + 1)². In the experiments section, patch_size = 2 is usually a satisfying value.

The COBRA aggregation method has been introduced by [2] in a generic statistical learning framework, and is supported by a sharp oracle bound. For the sake of completeness, we reproduce here one of the key theorems.

Theorem 1 (adapted from Theorem 2.1 in [2]). Assume we have M preliminary denoising methods. Let |x| denote the total number of pixels in image x. Let ε ∝ |x|^(−1/(M+2)). Let f* denote the perfectly denoised image and f̂ denote the COBRA aggregate defined in (1). Then we have

\[ \mathbb{E}\big[\hat{f}(p) - f^\star(p)\big]^2 \le \min_{m=1,\dots,M} \mathbb{E}\big[f_m(p) - f^\star(p)\big]^2 + C\,|x|^{-\frac{2}{M+2}}, \tag{3} \]

where C is a constant and the expectations are taken with respect to the pixels.
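To make the consensus weighting of (1)-(2) concrete, here is a minimal pure-Python sketch on a 1-D toy "image" (the paper operates on 2-D patches, and the actual implementation lives in pycobra; the function name and toy data below are ours):

```python
# Minimal sketch (ours) of the COBRA weights, Eq. (2), and aggregate, Eq. (1).
# `machines` holds the preliminary denoised images f_1, ..., f_M as lists
# indexed by pixel; `noisy` is the observed image x.

def cobra_denoise(noisy, machines, eps, alpha):
    M = len(machines)
    n = len(noisy)
    out = []
    for p in range(n):
        num, den = 0.0, 0.0
        for q in range(n):
            # consensus: how many machines give similar values at p and q
            agree = sum(abs(f[p] - f[q]) <= eps for f in machines)
            w = 1.0 if agree >= M * alpha else 0.0   # Eq. (2)
            num += w * noisy[q]
            den += w
        out.append(num / den)   # Eq. (1); den >= 1 since q = p always agrees
    return out

noisy = [0.0, 0.25, 1.0, 0.75]
machines = [[0.0, 0.0, 1.0, 1.0],          # f_1
            [0.125, 0.125, 0.875, 0.875]]  # f_2
print(cobra_denoise(noisy, machines, eps=0.25, alpha=1.0))
# → [0.125, 0.125, 0.875, 0.875]
```

With alpha = 1.0 both machines must agree: the first two pixels form one consensus group and the last two another, so each pixel is averaged only with its "similar" neighbours.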


Algorithm 1. Image denoising with COBRA aggregation

INPUT:
  im_noise = the noisy image to denoise
  psize    = the pixel patch size to consider
  M        = the number of COBRA machines to use
OUTPUT:
  Y = the denoised image

Xtrain ← training images with artificial noise
Ytrain ← original training images (ground truth)
cobra ← initial COBRA model
cobra ← adjust COBRA model parameters with respect to the data (Xtrain, Ytrain)
cobra ← load M COBRA machines
cobra ← aggregate the predictions
Xtest ← feature extraction from im_noise into a vector of size (nb_pixels, (2·psize + 1)²)
Y ← prediction of Xtest by cobra
Y ← add to Y the im_noise values lost at the borders of the image because of the patch processing

What Theorem 1 tells us is that, on average over all the image's pixels, the quadratic error between the COBRA-denoised image and the perfectly denoised image is upper bounded by the best (i.e., minimal) such error from the preliminary pool of M denoising methods, up to a term which decays to zero as the number of pixels raised to the power −2/(M + 2). As highlighted in the numerical experiments reported in the next section, M is of the order of 5–10 machines, and this remainder term is therefore expected to be small in most useful cases for COBRA. Note that in (3) the leading constant (in front of the minimum) is 1: the oracle inequality is said to be sharp. Note also that, contrary to more classical aggregation or model selection methods, COBRA matches or outperforms the best preliminary filter's performance even though it does not need to identify this champion filter. As a matter of fact, COBRA is adaptive to the pool of filters, as the champion is not needed to compute (1). More comments on this result, and proofs, are presented in [2].

3 Numerical Experiments

This section illustrates the behaviour of COBRA. All code material (in Python) to replicate the experiments presented in this paper is available at https://github.com/bguedj/cobra_denoising.

3.1 Noise Settings

We artificially add some disturbances to good quality images (i.e. without noise). We focus on five classical settings: the Gaussian noise, the salt-and-pepper noise, the Poisson noise, the speckle noise and the random suppression of patches (summarised in Fig. 2).

Fig. 2. The different kinds of noise used in our experiments.

3.2 Preliminary Denoising Algorithms

We focus on ten classical denoising methods: the Gaussian filter, the median filter, the bilateral filter, Chambolle's method [5], non-local means [3,4], the Richardson-Lucy deconvolution [14,15], the Lee filter [12], K-SVD [1], BM3D [8] and the inpainting method [7,9]. This way, we intend to capture different regimes of performance (Gaussian filters are known to yield blurry edges, the median filter is known to be efficient against salt-and-pepper noise, the bilateral filter preserves edges well, non-local means are praised for better preserving the details of the image, Lee filters are designed to address Synthetic Aperture Radar (SAR) image despeckling problems, K-SVD and BM3D are state-of-the-art approaches, inpainting is designed to reconstruct lost parts, etc.), as the COBRA aggregation scheme is designed to blend together machines with various levels of performance and to adaptively use the best local method.

3.3 Model Training

We start with 25 images (y1, …, y25), assumed not to be noisy, that we use as "ground truth". We artificially add noise as described above, yielding 125 noisy images (x1, …, x125). Then two independent copies of each noisy image are created by adding a normal noise: one goes to the data pool used to train the preliminary filters, the other one to the data pool used to compute the weights defined in (2) and perform aggregation. This separation is intended to avoid over-fitting issues (as discussed in [2]). The whole dataset creation process is illustrated in Fig. 3.

3.4 Parameters Optimisation

The meta-parameters of COBRA are α (how many preliminary filters must agree to retain a pixel) and ε (the confidence level with which we declare two pixel intensities similar). For example, choosing α = 1 and ε = 0.1 means that we impose that all the machines must agree on pixels whose predicted intensities differ by at most 0.1. The python library pycobra ships with a dedicated class to derive the optimal values using cross-validation [10]. Optimal values are α = 4/7 and ε = 0.2 in our setting.


Fig. 3. Data set construction.

3.5 Assessing the Performance

We evaluate the quality of the denoised image Id (whose mean is denoted μd and standard deviation σd) with respect to the original image Io (whose mean is denoted μo and standard deviation σo) with four different metrics.

– Mean Absolute Error (MAE; the closer to zero the better), given by
\[ \mathrm{MAE} = \frac{1}{N \times M} \sum_{x=1}^{N} \sum_{y=1}^{M} \left| I_d(x,y) - I_o(x,y) \right|. \]

– Root Mean Square Error (RMSE; the closer to zero the better), given by
\[ \mathrm{RMSE} = \sqrt{ \frac{1}{N \times M} \sum_{x=1}^{N} \sum_{y=1}^{M} \big( I_d(x,y) - I_o(x,y) \big)^2 }. \]

– Peak Signal to Noise Ratio (PSNR; the larger the better), given by
\[ \mathrm{PSNR} = 10 \cdot \log_{10} \frac{d^2}{\mathrm{RMSE}^2}, \]
with d the signal dynamic (maximal possible value for a pixel intensity).


– Universal image Quality Index (UQI; the closer to one the better), given by
\[ \mathrm{UQI} = \underbrace{\frac{\mathrm{cov}(I_o, I_d)}{\sigma_o \cdot \sigma_d}}_{(i)} \cdot \underbrace{\frac{2\,\mu_o\,\mu_d}{\mu_o^2 + \mu_d^2}}_{(ii)} \cdot \underbrace{\frac{2\,\sigma_o\,\sigma_d}{\sigma_o^2 + \sigma_d^2}}_{(iii)} \]
where term (i) is the correlation, (ii) is the mean luminance similarity, and (iii) is the contrast similarity [19, Eq. 2].
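The four metrics above translate directly into code; the sketch below (ours, for grayscale images flattened to lists, using population statistics) is one possible reading of the formulas:

```python
# Illustrative implementations (ours) of the four quality metrics above,
# for images flattened to equal-length lists of pixel intensities.
import math

def mae(Io, Id):
    return sum(abs(a - b) for a, b in zip(Io, Id)) / len(Io)

def rmse(Io, Id):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(Io, Id)) / len(Io))

def psnr(Io, Id, d=255.0):
    # d is the signal dynamic (maximal possible pixel intensity)
    return 10.0 * math.log10(d ** 2 / rmse(Io, Id) ** 2)

def uqi(Io, Id):
    n = len(Io)
    mu_o, mu_d = sum(Io) / n, sum(Id) / n
    var_o = sum((a - mu_o) ** 2 for a in Io) / n
    var_d = sum((b - mu_d) ** 2 for b in Id) / n
    cov = sum((a - mu_o) * (b - mu_d) for a, b in zip(Io, Id)) / n
    corr = cov / math.sqrt(var_o * var_d)                 # (i) correlation
    lum = 2 * mu_o * mu_d / (mu_o ** 2 + mu_d ** 2)       # (ii) luminance similarity
    con = 2 * math.sqrt(var_o * var_d) / (var_o + var_d)  # (iii) contrast similarity
    return corr * lum * con

Io = [10, 20, 30, 40]
Id = [12, 22, 32, 42]   # the original shifted by 2
print(mae(Io, Id), rmse(Io, Id))  # 2.0 2.0
```

A pure intensity shift leaves the correlation and contrast terms of the UQI at 1, so only the luminance term penalises it, while MAE and RMSE both report the shift directly.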

3.6 Results

Our experiments run on the gray-scale "lena" reference image (range 0–255). In all tables, experiments have been repeated 100 times to compute descriptive statistics. The green line (respectively, red) identifies the best (respectively, worst) performance. The yellow line identifies the best performance among the preliminary denoising algorithms whenever COBRA achieves the best performance. In each figure, the first image is noisy, the second is what COBRA outputs, and the third is the difference between the ideal image (with no noise) and the COBRA-denoised image.

Results – Gaussian noise (Fig. 4). We add to the reference image "lena" a Gaussian noise of mean μ = 127.5 and standard deviation σ = 25.5. Unsurprisingly, the best filter is the Gaussian filter, and the performance of the COBRA aggregate is trailing when the noise level is unknown. When the noise level is known, COBRA outperforms all preliminary filters. Note that the bilateral filter gives better results than non-local means. This is not surprising: [11] reaches the same conclusion for high noise levels.

Results – salt-and-pepper noise (Fig. 5). The proportion of white to black pixels is set to sp_ratio = 0.2, and the proportion of pixels to replace to sp_amount = 0.1. Even if the noise level is unknown, COBRA outperforms all filters, even the champion BM3D.

Results – Poisson noise (Fig. 6). COBRA outperforms all preliminary filters.

Results – speckle noise (Fig. 7). When confronted with a speckle noise, COBRA outperforms all preliminary filters. Note that this is a difficult task and most filters have a hard time denoising the image. The message of aggregation is that even in adversarial situations, the aggregate (strictly) improves on the performance of the preliminary pool of methods.

Results – random patches suppression (Fig. 8). We randomly suppress 20 patches of size (4 × 4) pixels from the original image. These pixels become white. Unsurprisingly, the best filter is the inpainting method; as a matter of fact, this is the only filter which succeeds in denoising the image, as it is quite a specific noise.

Results – images containing several kinds of noise (Fig. 9). On all previous examples, COBRA matches or outperforms the performance of the best filter for each kind of noise (with the notable exception of missing patches, where inpainting

[Fig. 4. Results – Gaussian noise: (a) noisy image, (b) COBRA, (c) difference between the ideal image and COBRA.]

[Fig. 5. Results – salt-and-pepper noise: (a) noisy image, (b) COBRA, (c) difference between the ideal image and COBRA.]

methods are superior). Finally, as the type of noise is usually unknown and even hard to infer from images, we are interested in putting all filters and COBRA to the test when facing multiple noise types. We apply a Gaussian noise in the upper left-hand corner, a salt-and-pepper noise in the upper right-hand corner, a Poisson noise in the lower left-hand corner and a speckle noise in the lower right-hand corner. In addition, we randomly suppress small patches over the whole image (see Fig. 9a). In this now much more adversarial situation, none of the preliminary filters can achieve proper denoising. This is the kind of setting where aggregation is the most interesting, as it makes the best of each filter's abilities. As a matter of fact, COBRA significantly outperforms all preliminary filters.

[Fig. 6. Results – Poisson noise: (a) noisy image, (b) COBRA, (c) difference between the ideal image and COBRA.]

[Fig. 7. Results – speckle noise: (a) noisy image, (b) COBRA, (c) difference between the ideal image and COBRA.]

[Fig. 8. Results – random suppression of patches: (a) noisy image, (b) COBRA, (c) difference between the ideal image and COBRA.]

[Fig. 9. Denoising an image afflicted with multiple noise types: (a) noisy image, (b) COBRA (unknown noise), (c) COBRA (known noise), (d) bilateral filter, (e) non-local means, (f) Richardson-Lucy deconvolution, (g) Gaussian filter, (h) median filter, (i) TV Chambolle, (j) inpainting, (k) K-SVD, (l) BM3D, (m) Lee filter.]

3.7 Automatic Tuning of Filters

Clearly, the internal parameters of the classical preliminary filters may have a crucial impact. For example, the median filter is particularly well suited to salt-and-pepper noise, although its filter size has to be chosen carefully, as it should grow with the noise level (which is unknown in practice). A nice byproduct of our aggregation scheme is that we can also perform automatic and adaptive tuning of those parameters, by feeding COBRA with one machine per candidate value of these parameters. Let us illustrate this on a simple example: we train our model with only one classical method, but with several values of the parameter to tune. For example, we can define three machines applying median filters with different filter sizes: 3, 5 or 10. Whatever the noise level, our approach achieves the best performance (Fig. 10). This casts our approach onto the adaptive setting where we can efficiently denoise an image regardless of its (unknown) noise level.

Fig. 10. Automatic tuning of the median filter using COBRA.
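As a toy illustration of this tuning-by-aggregation idea (ours, on a 1-D signal rather than an image, and with odd window sizes 3 and 5 instead of the paper's 3, 5 and 10), median filters of several window sizes can be packaged as the preliminary "machines", so that the aggregate effectively selects the window size suited to the actual noise level:

```python
# Toy illustration (ours): median filters of several window sizes as
# preliminary machines for COBRA-style aggregation, on a 1-D signal.
from functools import partial
from statistics import median

def median_filter_1d(signal, size):
    """Median filter with reflected borders; `size` is an odd window length."""
    half = size // 2
    padded = signal[half:0:-1] + signal + signal[-2:-2 - half:-1]
    return [median(padded[i:i + size]) for i in range(len(signal))]

# One machine per candidate filter size; the aggregate can then adapt to the
# (unknown) noise level instead of committing to a single size in advance.
machines = [partial(median_filter_1d, size=s) for s in (3, 5)]

noisy = [0, 0, 9, 0, 0, 0, 9, 9, 0, 0]   # impulse ("salt") noise on a flat signal
for size, m in zip((3, 5), machines):
    print(size, m(noisy))
# 3 [0, 0, 0, 0, 0, 0, 9, 9, 0, 0]   – size 3 removes the lone impulse but keeps the double one
# 5 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]   – size 5 removes both
```

The size-3 machine suffices for light impulse noise, while heavier bursts need the size-5 machine; feeding both to the aggregate removes the need to guess the right size beforehand.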

4 Conclusion

We have presented a generic aggregated denoising method, called COBRA, which improves on the performance of preliminary filters, makes the most of their abilities (e.g., adaptation to a particular kind of noise) and automatically adapts to the unknown noise level. COBRA is supported by a sharp oracle inequality demonstrating its optimality, up to an explicit remainder term which quickly goes to zero. Numerical experiments suggest that our method achieves the best performance when dealing with several types of noise. Let us conclude by stressing that our approach is generic, in the sense that any preliminary filters could be aggregated, regardless of their nature and specific abilities. While stacking many preliminary filters obviously induces an extra computational cost, the COBRA aggregate will benefit from a statistical accuracy perspective. We hope to help diffuse non-linear aggregation to the denoising community.

References
1. Aharon, M., Elad, M., Bruckstein, A.: K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. Signal Process. 54(11), 4311 (2006)
2. Biau, G., Fischer, A., Guedj, B., Malley, J.D.: COBRA: a combined regression strategy. J. Multivariate Anal. 146, 18–28 (2016)
3. Buades, A., Coll, B., Morel, J.: A non-local algorithm for image denoising. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), vol. 2, pp. 60–65 (2005)
4. Buades, A., Coll, B., Morel, J.M.: Non-local means denoising. Image Process. On Line 1, 208–212 (2011)
5. Chambolle, A.: Total variation minimization and a class of binary MRF models. Energy Minimization Methods Comput. Vis. Pattern Recogn. 3757, 132–152 (2005)
6. Chatterjee, P., Milanfar, P.: Patch-based near-optimal image denoising. IEEE Trans. Image Process. 21(4), 1635–1649 (2012)
7. Chui, C., Mhaskar, H.: MRA contextual-recovery extension of smooth functions on manifolds. Appl. Comput. Harmon. Anal. 28, 104–113 (2010)
8. Dabov, K., Foi, A., Katkovnik, V., Egiazarian, K.: Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Trans. Image Process. 16(8), 2080–2095 (2007)
9. Damelin, S., Hoang, N.: On surface completion and image inpainting by biharmonic functions: numerical aspects. Int. J. Math. Math. Sci. 2018, 8 (2018)
10. Guedj, B., Srinivasa Desikan, B.: Pycobra: a python toolbox for ensemble learning and visualisation. J. Mach. Learn. Res. 18(190), 1–5 (2018)
11. Kumar, B.S.: Image denoising based on non-local means filter and its method noise thresholding. SIViP 7(6), 1211–1227 (2013)
12. Lee, J.S., Jurkevich, L., Dewaele, P., Wambacq, P., Oosterlinck, A.: Speckle filtering of synthetic aperture radar images: a review. Remote Sens. Rev. 8(4), 313–340 (1994)
13. Liu, J., Liu, R., Chen, J., Yang, Y., Ma, D.: Collaborative filtering denoising algorithm based on the nonlocal centralized sparse representation model. In: 2017 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI) (2017)
14. Lucy, L.: An iterative technique for the rectification of observed distributions. Astron. J. 79, 745 (1974)
15. Richardson, W.H.: Bayesian-based iterative method of image restoration. J. Opt. Soc. Am. 62, 55–59 (1972)
16. Salmon, J., Le Pennec, E.: NL-means and aggregation procedures. In: 2009 16th IEEE International Conference on Image Processing (ICIP), pp. 2977–2980, November 2009
17. Salmon, J.: Agrégation d'estimateurs et méthodes à patch pour le débruitage d'images numériques. PhD thesis, Université Paris-Diderot - Paris VII (2010)
18. Strub, F., Mary, J.: Collaborative filtering with stacked denoising autoencoders and sparse inputs. In: NIPS Workshop on Machine Learning for eCommerce (2015)
19. Wang, Z., Bovik, A.C.: A universal image quality index. IEEE Signal Process. Lett. 9(3), 81–84 (2002)

Comparative Study of Classifiers for Blurred Images

Ratiba Gueraichi(B) and Amina Serir

Image Processing and Radiations Laboratory (LTIR), Houari Boumediene University of Science and Technology (USTHB), Algiers, Algeria
{rgueraichi,aserir}@usthb.dz

Abstract. In this paper, we take a first step toward the classification of images according to their degree of blur, based on the subjective DMOS quality scores of the Gblur database. For this purpose, we carried out a comparative study of several classifiers in order to build a robust learning model based on the discrete cosine transform (DCT). The class imbalance forced us to identify appropriate performance evaluation metrics so that the comparison is not biased. We found that random forests (RF) give the best overall performance, but that other classifiers discriminate certain types of images (depending on the degree of blur) better than others. Finally, we compared the classification obtained by the proposed model with a classification based on the NIQE quality measurement algorithm. Given its simplicity, the results of the proposed model are very promising.

Keywords: Discrete Cosine Transform (DCT) · Blur classification · Random Forests (RF)

1 Introduction
In recent years, we have seen a significant technological deployment of audio-visual content, particularly image acquisition and processing systems such as the digital cameras used in smartphones, video surveillance, etc. However, these technological advances are often accompanied by new issues, such as the introduction of artifacts during acquisition, coding or transmission. To control and improve image quality, it is imperative that management, communication and processing systems be able to detect, identify and quantify the degradation introduced into the image during acquisition. In practice, blur is considered a major cause of image quality degradation. It manifests itself as a loss of sharpness at the edges and a decrease in the visibility of fine details. There are several types of blur, such as Gaussian blur, motion blur, defocus blur, etc. To detect blur in images, one of two main approaches can be used: modeling the blur effect, or analyzing its disturbing effect on the human visual system (HVS) [1–6].

© Springer Nature Switzerland AG 2020 K. Arai et al. (Eds.): SAI 2020, AISC 1229, pp. 328–336, 2020. https://doi.org/10.1007/978-3-030-52246-9_23

The most popular techniques for characterizing the blur effect are approaches based on


edge analysis and transformation [7–9]. The widely used transform-based blur identification methods are generally applied in particular frequency domains: local DCT coefficients [9, 10] and image wavelet coefficients [11–13]. A good restoration of a blurred image depends on choosing a restoration algorithm that is matched to the blur rate of the image [14]. Thus, it would be wise to automatically classify images according to their degree of blurring, followed by adequate restoration. In this work, we limit ourselves to classifying the images of the LIVE database into three degrees of blur: low blur, medium blur and high blur (based on a descriptor vector formed from statistics of the DCT coefficients of (8 × 8) blocks), to comparing the following classifiers, namely k-NN, Naïve Bayes (NB), multilayer perceptron (MLP), support vector machines (SVMs) and random forests (RF), and finally to finding the classifier that best discriminates each category of blur. The remainder of this paper is organized as follows: Sect. 2 describes the approach taken for extracting features using the DCT transformation. Sect. 3 briefly describes the different classifiers used. Sect. 4 presents the experimental results obtained. Finally, the last section is devoted to the conclusion and gives some perspectives.

2 Features Extraction
The descriptor vector is computed in the local frequency domain:
• An image of the Gblur dataset (from the LIVE database) is divided into blocks of size (8 × 8), to which the discrete cosine transform (DCT) is applied.
• Each resulting block is divided (DC band aside) into three regions R1, R2 and R3, which represent the low-frequency (LF), medium-frequency (MF) and high-frequency (HF) regions, respectively (Fig. 1). These frequency regions are chosen according to their sensitivity to distortions and are consistent with experimental psychophysical results [15].
• On each frequency region, local statistics (mean, variance, kurtosis, skewness, entropy and energy of all AC components constituting the region) are calculated.
• The descriptor vector of the image consists of the averages, over all blocks of the image, of the respective statistics. The descriptor vector is therefore composed of 18 features.
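As a sketch, this block-DCT feature extraction can be written in a few lines of Python. The exact boundaries of the R1/R2/R3 regions used below are an assumption on our part (the paper defines the actual split in Fig. 1), as is the use of normalized absolute coefficients for the entropy term.

```python
import numpy as np
from scipy.fftpack import dct
from scipy.stats import kurtosis, skew

# Diagonal frequency index of each 8x8 DCT coefficient; the R1/R2/R3
# boundaries below are an assumption -- Fig. 1 defines the actual split.
u, v = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")
band = u + v
REGIONS = [(band >= 1) & (band <= 4),    # R1: low frequencies (AC only)
           (band >= 5) & (band <= 9),    # R2: medium frequencies
           band >= 10]                   # R3: high frequencies

def block_dct(block):
    """2-D type-II DCT of one 8x8 block."""
    return dct(dct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

def descriptor(image):
    """18-feature descriptor: 6 statistics x 3 frequency regions,
    averaged over all 8x8 blocks of the image."""
    h, w = image.shape
    rows = []
    for i in range(0, h - h % 8, 8):
        for j in range(0, w - w % 8, 8):
            coeffs = block_dct(image[i:i + 8, j:j + 8].astype(float))
            feats = []
            for mask in REGIONS:
                ac = coeffs[mask]
                p = np.abs(ac) / (np.abs(ac).sum() + 1e-12)
                feats += [ac.mean(), ac.var(), kurtosis(ac), skew(ac),
                          -(p * np.log2(p + 1e-12)).sum(),  # entropy
                          float((ac ** 2).sum())]           # energy
            rows.append(feats)
    return np.asarray(rows).mean(axis=0)
```

Averaging the per-block statistics over the whole image keeps the descriptor length fixed at 18, independently of the image size.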

3 Classification At this step, a test image (I) is classified as “slightly blurred”, “moderately blurred” or “strongly blurred” according to the three labels based on the DMOS (Difference Mean

330

R. Gueraichi and A. Serir

Fig. 1. Representation of the (8 × 8) DCT block frequency bands.

Opinion Score) values provided by LIVE's Gblur database, as expressed below:

I is slightly blurred if 19 < DMOS ≤ 30
I is moderately blurred if 30 < DMOS ≤ 60
I is strongly blurred if DMOS > 60    (1)
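Expression (1) is straightforward to encode; a minimal sketch (the function name is ours):

```python
def blur_class(dmos):
    """Map a DMOS score to one of the three blur labels of expression (1).
    Scores of 19 or below fall outside the three classes."""
    if 19 < dmos <= 30:
        return "slightly blurred"
    if 30 < dmos <= 60:
        return "moderately blurred"
    if dmos > 60:
        return "strongly blurred"
    return None
```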

A priori, all types of classifiers can be used for image classification, but formally only classifiers capable of dealing with the complexity of the data should be used. There are in fact two main families: the first is based on a statistical and probabilistic approach, which presupposes the form of the underlying laws, of which Bayesian networks are a good example; the second, which makes no a priori assumptions about the examples, includes the multilayer perceptron (MLP), k-nearest neighbors (k-NN) and support vector machines (SVMs), as well as a decision-tree classifier called random forests (RF). We give an overview of each classifier below.

3.1 Naive Bayes Networks (NB)
They are based on Bayesian decision theory. The decision problem is assumed to be probabilistic in nature, and the task is to calculate posterior probabilities for a given class. Naive Bayes networks are a special case of Bayesian networks in which the features of the data are assumed to be statistically independent, which simplifies the use of the Bayes rule [16–19].

3.2 The Multilayer Perceptron (MLP)
MLPs are neural networks built from hidden layers. Training involves adjusting the number of hidden layers and the weights in order to minimize the prediction error on data that are generally not linearly separable, using a learning algorithm, the most popular being gradient backpropagation [16–20].


3.3 k-Nearest Neighbors (k-NN)
The task is to assign a class to a given point based on a certain number of points around it. To do this, the distance from this point to its neighbors is calculated. Theoretically, several distances can be used (Euclidean, Mahalanobis, Minkowski, etc.); the most common is the Euclidean distance. The value of k that gives the best recognition rate is adopted [20, 22].

3.4 Support Vector Machines (SVMs)
This is a learning algorithm whose purpose is to search for a decision rule based on a maximum-margin separating hyperplane. The search for the optimal hyperplane rests on the key idea of SVMs: the separation margin between two classes is maximized while the classification errors are minimized, by finding the optimal parameters, namely the parameter of the kernel used (sigmoid, polynomial, RBF, …) and the regularization parameter C, which constitutes a compromise between maximizing the margin and the classification error due to data that are not linearly separable. In the multi-class case, several approaches can be used; the best known are the one-against-one and one-against-all approaches. In our case, we opted for the latter. It is the simplest and oldest decomposition method: it uses one binary classifier (with real-valued output) per class, where the k-th classifier is intended to distinguish class k from all the others. To classify an example, it is presented to all Q classifiers, and the decision follows the "winner-takes-all" principle: the selected label is the one associated with the classifier that returned the highest value [16–21].

3.5 Random Forests (RF)
A random forest is a classifier composed of a set of elementary decision-tree classifiers, noted {h(x, θk), k = 1, …, L}, where {θk} is a family of independent and identically distributed random vectors, and within which each tree votes for the most popular class for an input x. The main advantage of this type of classifier is that even as the number of decision trees increases, the model does not tend towards overfitting [20, 23].

4 Experimental Part and Results
Before presenting the outputs of each classifier applied to this database, some relevant remarks are in order:
• To compare the performance of the above-mentioned classifiers, we used k-fold cross-validation with k = 10.
• For the k-NN classifier, the best result is obtained with the parameter k = 3.
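The evaluation protocol (five classifiers under 10-fold cross-validation) can be sketched with scikit-learn. The data below are synthetic stand-ins: 145 samples with 18 features and roughly the paper's 39/78/28 class imbalance, not the real Gblur descriptors; hyperparameters other than k = 3 are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the real data: 145 samples, 18 DCT features,
# class weights approximating the paper's 39/78/28 split.
X, y = make_classification(n_samples=145, n_features=18, n_informative=8,
                           n_classes=3, weights=[0.27, 0.54, 0.19],
                           random_state=0)

classifiers = {
    "NB":   GaussianNB(),
    "k-NN": KNeighborsClassifier(n_neighbors=3),               # k = 3, as in the paper
    "SVMs": SVC(kernel="rbf", decision_function_shape="ovr"),  # one-vs-all
    "MLP":  MLPClassifier(max_iter=2000, random_state=0),
    "RF":   RandomForestClassifier(n_estimators=100, random_state=0),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
results = {name: cross_val_score(clf, X, y, cv=cv).mean()
           for name, clf in classifiers.items()}
```

Stratified folds keep the class proportions roughly constant across the ten splits, which matters with this degree of imbalance.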


The results of the learning algorithms must be carefully analyzed and correctly interpreted. This is why several studies have been carried out to find evaluation metrics that respond appropriately to each type of data [24, 25]. These studies focus essentially on the robustness of each metric in the presence of unbalanced data, as is the case for our data [26]. The chosen database of blurred images is LIVE's Gblur database, which consists of 145 blurred images. Applying the class selection criteria given by expression (1), we obtain 39 low-blur images (class 1), 78 medium-blur images (class 2) and 28 high-blur images (class 3). The distribution of classes is thus quite unbalanced. So, in order to properly compare the performance of the learning algorithms, it is wise to choose metrics that are not sensitive to class imbalance.

4.1 Selection of Evaluation Metrics for Unbalanced Data
Evaluation methods are sensitive to unbalanced data when a dataset contains more samples of one class than of the others. Consider the (2 × 2) confusion matrix given in Fig. 2:

Fig. 2. (2 × 2) Confusion matrix.

The class distribution is the ratio between positive and negative samples (P/N), which relates the two columns. Any evaluation metric that combines the values of the two columns will be sensitive to unbalanced data, such as accuracy and precision, except where changes in the class distribution cancel each other out [27].
• The geometric mean (GM) (also called G-score) is such an exception; its equation is given below [27]:

GM = √(TPR × TNR) = √( TP/(TP + FN) × TN/(TN + FP) )    (2)

where TPR and TNR denote the true positive rate and the true negative rate, respectively.
• A very good measure worth mentioning is Cohen's kappa statistic, because it handles multi-class and unbalanced-class problems very well. It is defined as:

κ = (p0 − pe)/(1 − pe) = 1 − (1 − p0)/(1 − pe)    (3)
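Both metrics are readily computed; the sketch below (with invented labels) uses scikit-learn for κ and, as one plausible multi-class extension of the binary G-score, the geometric mean of the per-class recalls (an assumption on our part):

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score, recall_score

# Invented ground-truth and predicted labels for a 3-class problem
y_true = [0, 0, 1, 1, 1, 2, 2, 0, 1, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 0, 1, 0]

# Cohen's kappa: (p0 - pe) / (1 - pe)
kappa = cohen_kappa_score(y_true, y_pred)

# The G-score is stated for the binary case (sqrt of TPR * TNR); a common
# multi-class generalization -- assumed here -- is the geometric mean of
# the per-class recalls.
recalls = recall_score(y_true, y_pred, average=None)
g_mean = float(np.prod(recalls) ** (1.0 / len(recalls)))
```

For these labels, the observed agreement is p0 = 7/10 and the chance agreement pe = (3·3 + 4·4 + 3·3)/100 = 0.34, so κ = (0.7 − 0.34)/0.66 ≈ 0.545.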


where p0 is the observed agreement and pe is the expected agreement. Kappa indicates how much better our classifier performs than a classifier that simply guesses at random according to the frequency of each class [27].
• There are also graph-based metrics, such as the well-known ROC curve (Receiver Operating Characteristics), which plots the TPR as a function of the FPR (false positive rate). This graph shows the trade-off between specificity and sensitivity. However, it requires particular caution and can be misleading when applied in unbalanced classification scenarios. In addition, false positive results are difficult to interpret in many cases.
• Many studies have shown that the alternative to ROC is the PRC (Precision-Recall Curve), which plots the precision as a function of the recall (or TPR). The PRC can provide the user with an accurate prediction of future classification performance, as it evaluates the fraction of true positives among the positive predictions. Thus, it is a robust metric even with unbalanced data [28]. The score of the area under the PRC curve, noted AUC (PRC), is also effective for multiple classifier comparisons [29]. The AUC can also be generalized to the multi-class setting. This approach is based on fitting one-vs-all classifiers where, in the i-th iteration, class i is defined as the positive class while all other classes are considered negative.

4.2 Obtained Results
The performances of the five classifiers computed with the first two selected (scalar) metrics are summarized in Table 1.

Table 1. Kappa and G-score values for the five classifiers.

Classifier   Kappa (%)   G-score (%)
NB           57.6        78.1
k-NN         66.2        82.9
SVMs         70          84.1
MLP          74.2        85.9
RF           75.3        86.3
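The per-class ("macro"-style) and pooled ("micro") AUC-PRC values can be computed with scikit-learn's average precision, which approximates the area under the PRC. The scores below are invented for illustration; in the study they would be a classifier's per-class decision scores on the Gblur images.

```python
import numpy as np
from sklearn.metrics import average_precision_score
from sklearn.preprocessing import label_binarize

# Invented per-class scores for 8 samples of a 3-class problem
y_true = np.array([0, 0, 1, 1, 2, 2, 1, 0])
y_score = np.array([[0.8, 0.1, 0.1],
                    [0.6, 0.3, 0.1],
                    [0.2, 0.7, 0.1],
                    [0.3, 0.5, 0.2],
                    [0.1, 0.2, 0.7],
                    [0.2, 0.2, 0.6],
                    [0.4, 0.4, 0.2],
                    [0.7, 0.2, 0.1]])

Y = label_binarize(y_true, classes=[0, 1, 2])

# One-vs-all AUC-PRC per class: class i is positive, the rest negative
per_class = [average_precision_score(Y[:, i], y_score[:, i]) for i in range(3)]

# Pooled ("micro") AUC-PRC over all one-vs-all decisions
micro = average_precision_score(Y, y_score, average="micro")
```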

The last metric (AUC-PRC) was applied from both a global and a per-class perspective. The results are given in the graph of Fig. 3. From these results, we can see that the classifier that best discriminates these images into slightly blurred, moderately blurred and highly blurred images is the random forests classifier. In addition, its overall performance (for the kappa statistic, the G-score and the global AUC-PRC) is also the best. However, the MLP and SVMs classifiers follow closely. The SVMs classifier recognizes better (than the MLP) slightly and


Fig. 3. Comparison of (micro and macro) performance (AUC-PRC) of the five classifiers.

strongly blurred images, while the MLP classifier is better than the SVMs for moderately blurred images.

4.3 Comparison of the Results Obtained from the Proposed Descriptor with Methods Based on Quality Measurement
As explained above, the three classes that represent the blur rate of the images were based on the value of the DMOS (Difference Mean Opinion Score), which is a subjective measure of image quality. Therefore, our classification results were compared against a classification based on an objective, blind (no-reference) image quality measurement method (NR-IQA): the NIQE (Natural Image Quality Evaluator) algorithm [30].
The NIQE Algorithm: This algorithm is based on (natural) images for which no human judgments are available, a so-called "opinion-unaware" (OU) model; moreover, the types of distortions that may affect these images are not known, a "distortion-unaware" (DU) model. As a result, this algorithm implements an "NSS-driven blind OU-DU IQA" model [30]. We limited ourselves to comparing our results to those of this method (applied to the Gblur database of LIVE), using Cohen's kappa metric, which is a relevant metric here. In addition, the comparison is made with our best result, that of the random forests classifier (Table 2).

Table 2. Evaluation of the proposed results relative to other methods.

Methods         Kappa (%)
NIQE            59.4
RF (proposed)   75.3

According to these results, the proposed method has the best performance: its kappa value is higher than 0.7, and it remains better than the classification performance obtained using the NIQE method.


5 Conclusion and Perspectives
The comparative study carried out on the five classifiers for the classification of images into three degrees of blurring is only the first step toward obtaining a training model that best discriminates the blur rate of an image, using a descriptor vector based on statistics of the DCT coefficients of (8 × 8) image blocks. This vector gives good classification performance with classifiers such as SVMs, MLP and RF, for which the kappa rate on blurred images is greater than or equal to 0.7. Moreover, this study has shown the superior performance of the random forests (RF) classifier and its robustness compared to the other classifiers, particularly with metrics adapted to unbalanced data. The model obtained from the RF classifier, which was applied to the Gblur database of LIVE, can be tested on other known databases such as the TID and CSIQ databases. It will also serve in practice for the restoration of blurred images in an adequate and simple manner, according to their blur classification and their quality measurement. These two aspects will be the subject of future work.

References
1. Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)
2. George, A.G., Prabavathy, K.A.: A survey on different approaches used in image quality assessment. Int. J. Emerg. Technol. Adv. Eng. 3(2) (2013)
3. Ferzli, R., Karam, L.J.: A human visual system based no-reference objective image sharpness metric. In: IEEE International Conference on Image Processing, Atlanta, pp. 2949–2952 (2006)
4. Ferzli, R., Karam, L.J.: A no-reference objective image sharpness metric based on the notion of just noticeable blur (JNB). IEEE Trans. Image Process. 18(4), 717–728 (2009)
5. Zhu, X., Milanfar, P.: A no-reference sharpness metric sensitive to blur and noise. In: First International Workshop on Quality of Multimedia Experience, San Diego, pp. 64–69 (2009)
6. Marziliano, P., Dufaux, F., Winkler, S., Ebrahimi, T.: A no-reference perceptual blur metric. In: International Conference on Image Processing, vol. 3, pp. 57–60 (2002)
7. Ong, E.P., Lin, W.S., Lu, Z.K., Yao, S.S., Yang, X.K., Jiang, L.F.: A no-reference quality metric for measuring image blur. In: Proceedings IEEE International Conference on Image Processing, vol. 1, pp. 469–472 (2003)
8. Caviedes, G.E., Oberti, F.: A new sharpness metric based on local kurtosis, edge and energy information. Sign. Process. Image Commun. 19, 147–163 (2004)
9. Marichal, X., Ma, W.Y., Zhang, H.: Blur determination in the compressed domain using DCT information. In: International Conference on Image Processing, ICIP 99, vol. 2 (1999)
10. Zhang, N., Vladar, A., Postek, M., Larrabee, B.: A kurtosis-based statistic for two-dimensional processes and its application to image sharpness. In: Proceedings of the Section of Physical and Engineering Sciences of the American Statistical Society, pp. 4730–4736 (2003)
11. Tang, H., Li, M.J., Zhang, H.J., Zhang, C.: Blur detection for digital images using wavelet transform. In: Conference of Multimedia and Expositions, vol. 1, pp. 17–20 (2009)
12. Kerrouh, F., Serir, A.: A no-reference quality metric for evaluating blur image in wavelet domain. Int. J. Digital Inf. Wireless Commun. (IJDIWC) 1(4), 767–776 (2012)


13. Tang, H., Li, M.J., Zhang, H.J., Zhang, C.: Blur detection for digital images using wavelet transform. In: Conference of Multimedia and Expositions, vol. 1, pp. 17–20 (2009)
14. Kerrouh, F.: A no-reference quality measure of blurred images (videos). PhD thesis in Electronics, Univ. Algiers (2014)
15. Bae, S.H., Kim, M.: A novel DCT-based JND model for luminance adaptation effect in DCT frequency. IEEE Sign. Process. Lett. 20(9), 893–896 (2013)
16. Cheriet, M., Kharma, N., Liu, C.L., Suen, C.Y.: Character Recognition Systems. Wiley, New Jersey (2007)
17. Theodoridis, S., Koutroumbas, K.: Pattern Recognition, 4th edn. Academic Press, Elsevier (2009)
18. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. Wiley, New Jersey (1997)
19. de Sá, J.P.M.: Pattern Recognition: Concepts, Methods and Applications. Springer (2001)
20. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, Elsevier, Burlington (2005)
21. Vapnik, V.: The Nature of Statistical Learning Theory. Springer, New York (2000)
22. Mathieu-Dupas, E.: Algorithmes des k plus proches voisins pondérés et application en diagnostic. In: 42èmes Journées de Statistique, Marseille, France (2010)
23. Breiman, L.: Random forests. Machine Learning 45(1), 5–32 (2001)
24. Hamdi, F.: Learning in unbalanced distributions. Doctoral thesis in Computer Science, Univ. Paris 13 (2012)
25. He, H., Garcia, E.A.: Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 21(9), 1263–1284 (2009)
26. Sokolova, M., Japkowicz, N., Szpakowicz, S.: Beyond accuracy, F-score and ROC: a family of discriminant measures for performance evaluation. In: Australasian Joint Conference on Artificial Intelligence, pp. 1015–1021 (2006)
27. Tharwat, A.: Classification assessment methods. http://www.sciencedirect.com. Accessed August 2019
28. Saito, T., Rehmsmeier, M.: The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE 10(3) (2015)
29. Davis, J., Goadrich, M.: The relationship between precision-recall and ROC curves. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 233–240 (2006)
30. Mittal, A., Soundararajan, R., Bovik, A.C.: Making a "completely blind" image quality analyzer. IEEE Signal Process. Lett. 20(3), 209–212 (2013)

A Raspberry Pi-Based Identity Verification Through Face Recognition Using Constrained Images Alvin Jason A. Virata1,2(B) and Enrique D. Festijo1 1 Technological Institute of the Philippines Arlegui, Manila, Philippines

[email protected], [email protected] 2 St. Dominic College of Asia, Bacoor, Philippines

Abstract. In this paper, we propose a novel, cost-effective and energy-efficient framework, introducing a Raspberry Pi-based identity verification through face recognition in an offline mode. A Raspberry Pi device and a mobile phone are wirelessly connected to a standard pocket WiFi to process the face detection and face verification. The experimental tests were done using 3000 public constrained images. The proposed method is implemented on a Raspberry Pi 3 running Python 3.7, where the datasets and trained datasets were evaluated using the LBP algorithm for face detection and face verification in five split tests. The results were interpreted using the confusion matrix and the Area Under the Curve (AUC) of the Receiver Operating Characteristics (ROC). To sum up, the proposed method showed an average accuracy of 98.135%, with a 98% recall score and an F1-score of 0.9881. During the offline-mode testing, the average face detection and verification time is 1.4 s. Keywords: Raspberry Pi · Local Binary Pattern (LBP) · Face recognition · Confusion matrix · Area Under the Curve (AUC) and Receiver Operating Characteristics (ROC)

1 Introduction
The study of face recognition has been a challenging topic since the 1970s due to its weaknesses and limitations. The growing number of studies on face recognition has caught the attention of many researchers who aim to improve biometrics, particularly with regard to security implementation [1]. With the fast-paced development of technology, even industry-sector leaders have adopted these technological changes and innovations, playing significant roles in the development of embedded devices like the Raspberry Pi and Arduino that greatly contribute to the expansion of mobile applications. Noticeably, with the current impact of mobile technology, it is becoming part of everyday human life, particularly as an effective and convenient means of communication [2]. However, in the Philippines, the adoption of identity verification systems through face recognition is still considered to be in its infancy, though consumers are already fascinated and excited to experience the benefits of implementing face recognition as part of security measures [3]. © Springer Nature Switzerland AG 2020 K. Arai et al. (Eds.): SAI 2020, AISC 1229, pp. 337–349, 2020. https://doi.org/10.1007/978-3-030-52246-9_24


In spite of the rapid growth of mobile technology and its recognition as a powerful trend in the field of commerce, the limitations of mobile technology resources in terms of battery life, storage and bandwidth cannot be left behind, as they may influence mobility and security with regard to communication efficiency and effectiveness. These weaknesses are a core dilemma for mobile technology service quality, as they affect the capacity to perform computationally intensive applications and the demand on storage processes. These obstacles may hinder mobile application capability, which calls for considering at least a cloud application to resolve the issue [4–6]. In addition, the human face is considered a complex pattern, and identifying human faces automatically in a certain domain can be difficult [7]. This issue of face recognition has strongly attracted the attention of researchers, even from multidisciplinary studies, due to its possible unique contribution to the research community. However, in the implementation phase, there are still several challenges and issues regarding the parameters that need to be considered and calculated to generate a precise result during the face recognition and verification process [8], because the human face is a dynamic object with a high level of variability in its attributes. A variety of techniques, from simple edge algorithms to complex high-level approaches, have been recommended to expedite the process of pattern recognition [9]. Another challenge in face recognition arises because the face is not a rigid object and face images may be taken from different viewpoints [10]. Commonly, face recognition begins with a face detection process that greatly contributes to the performance of the subsequent identity verification operations. Rapid developments in face detection algorithms have also been achieved by processing multiview (multipose) images within a similar framework [11]. In conducting face recognition, there are three tasks involved: (1) verification, (2) identification and (3) watch list. Verification happens when the query face image is matched against an expected identity, while identification takes place when the identity of the query face image is determined. The watch list, in turn, consists of the records in the database used for future queries of a person's identity [12]. Face detection is the process of locating faces in a given domain using different algorithms applied in the design environment [13, 14], while face recognition is mainly used for the tasks of identification or verification. Face identification is the process of confirming the identity of an unknown face image by comparing the person's face data to the face images stored in the database. Hence, identity verification is guaranteed genuine when the face image matches the individual's features or attributes. Within this wide scope, any face recognition is performed based on facial features, emerging technologies and algorithms [15, 16]. To resolve the aforementioned issues and challenges of face recognition in a mobile technology environment, and to make it possible to process face recognition without using a cloud environment, the main original contributions of the proposed framework are as follows:


1. A Raspberry Pi-based identity verification through face detection and recognition was done in an offline mode, with a microSD memory card inserted to be used as storage.
2. The local binary pattern (LBP) was incorporated in the framework; it is computationally inexpensive, making a Raspberry Pi-based framework possible.
3. The framework is computationally inexpensive, as the training sets were prepared separately using an ordinary desktop computer or laptop with a minimal amount of processing.
4. An average accuracy of 98% was established by conducting five split tests using the 3,000 downloaded constrained images.
The remaining structure of this study is organized as follows: Sect. 2 discusses the system design and development, followed by the results and discussions, and lastly, in Sect. 4, the conclusion and future research directions are explained.

2 System Design and Development

2.1 System Configuration
Identity verification through face recognition has widely attracted the research community due to its diverse applicability in different facets of real-time situations in the categories of security, monitoring, marketing, segmental analysis and data science. However, the mobile application challenges in terms of computational power consumption and memory storage open an opportunity to address these issues. In this study, the mobile phone is wirelessly connected to the Raspberry Pi device, which was used to store the trained datasets and to process the face detection and verification. The specification, configuration and hardware details of the Raspberry Pi device used are presented in Table 1.

Table 1. Raspberry Pi specifications for the proposed framework

Name           Configuration
Processor      Broadcom BCM2837B0, Cortex-A53 (ARMv8) 64-bit SoC @ 1.4 GHz
RAM            1 GB LPDDR2 SDRAM
Connectivity   2.4 GHz and 5 GHz IEEE 802.11 b/g/n/ac wireless LAN, Bluetooth 4.2, BLE
Power          5 V/2.5 A DC power input

The Raspberry Pi has a Broadcom BCM2837B0 system-on-chip, which includes a Cortex-A53 (ARMv8) 64-bit quad-core processor at 1.4 GHz, dual-band wireless LAN, Bluetooth 4.2/BLE, faster Ethernet, and Power-over-Ethernet support (with a separate PoE HAT). Also, to facilitate the offline face detection and face verification, the training of data was executed on an ordinary laptop with the hardware and software specifications shown in Table 2.

Table 2. Hardware and software specification

Name        Configuration
Display     AMD Radeon R7 Graphics
Processor   AMD A12-9720P Radeon R7, 12 Compute Cores 4C + 8G, 2.70 GHz
RAM         8 GB (6.97 GB usable)
System      64-bit operating system, x64-based processor

For the training process, most research recommends utilizing the Graphics Processing Unit (GPU) to speed up the training of datasets. However, a breakthrough in this study was to improve the performance of face detection and verification without primarily using the GPU, instead re-configuring the memory allocation by maximizing a page file that helps the primary storage memory handle the processing of the training datasets.

2.2 Face Images Collection
During the face images collection, the proponents administered the following:
• Downloaded the images from www.essex.ac.uk, selecting the face94 and face96 databases.
• The images are stored in 24-bit RGB, JPEG format.
• The constrained images have a resolution of 180 × 200 pixels in portrait format with a green background.

2.3 Experimental Test Setup
Figure 1 shows the system framework of the proposed project. The framework also describes how the experimental test setup was conducted to validate the prediction accuracy, the timing, as well as the training speed. The proponents downloaded 3000 images to be used as the sample. Of the 3000 images collected, 80% (2400 images) were used as the training set, while the remaining 20% (600 images) were used as test data. During the testing process, the proponents aim to measure the prediction accuracy, the recall and the F1 score. The time consumed in preparing the training set of 2400 constrained images was approximately less than three minutes using a laptop with the following specifications: AMD A12-9720P Radeon R7, 12 Compute Cores 4C + 8G, 2.70 GHz, 8 GB RAM (6.97 GB usable), 64-bit operating system, x64-based processor.

2.4 System Architecture
Illustrated in Fig. 2, the system architecture of the project describes the process of identity verification. Using a mobile phone, an image is captured to process the face


Fig. 1. System framework

detection and face extraction in the Raspberry Pi device, since the classifier/model is already installed there. The classifier that processes the identity verification was built using the LBP algorithm, since the use of LBP has been found successful in a recent study in terms of face authentication and face detection recognition [17]. Hence, in spite of the existing challenges mentioned in [18] pertaining to the implementation of face recognition in mobile application development, the proponents opted to pursue the proposed project to address the challenges of limited memory allocation and computational power by adopting offline identity verification. The offline identity verification was made possible since the preparation of the training set was done separately using a desktop computer.

2.5 System Prototype
A prototype was created to test the proposed project. As illustrated in Fig. 3, there are four menu options that can be selected from the prototype: (1) the capture menu, where images can be collected or gathered; (2) the detect menu, which compares the image against the taken photos; (3) the identify menu, an offline-mode process of verifying the person's identity; and lastly (4) the person's information menu, a database where the person's information is registered. In addition, the home menu option helps the user easily go back to the main page/window, the settings allow the user to easily manage the application, and the help menu provides detailed instructions to assist the user.
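For reference, the basic LBP operator underlying such a classifier can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: each interior pixel is replaced by an 8-bit code obtained by thresholding its 8 neighbors against it, and an LBP face recognizer (such as OpenCV's LBPH) compares per-region histograms of these codes.

```python
import numpy as np

def lbp_image(gray):
    """Basic 3x3 local binary pattern: each interior pixel becomes an
    8-bit code formed by thresholding its 8 neighbors against it."""
    g = gray.astype(int)
    c = g[1:-1, 1:-1]                       # interior pixels
    # 8 neighbors, clockwise from the top-left, each weighted by 2**bit
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(shifts):
        nb = g[1 + dy : g.shape[0] - 1 + dy, 1 + dx : g.shape[1] - 1 + dx]
        code += (nb >= c) << bit
    return code

def lbp_histogram(gray):
    """256-bin normalized LBP histogram -- the per-region feature that an
    LBP face recognizer compares (e.g. by chi-square distance)."""
    h = np.bincount(lbp_image(gray).ravel(), minlength=256)
    return h / h.sum()
```

Because the codes depend only on local intensity ordering, the features are cheap to compute and robust to monotonic illumination changes, which fits the limited compute budget of the Raspberry Pi.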

3 Results and Discussions

Using the 3,000 downloaded (face94 and face96 image database) public controlled/constrained images, the proponents came up with different methodologies to test the algorithm being used, specifically the local binary pattern (LBP).
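The basic LBP operator thresholds the 3 × 3 neighborhood of each pixel against its center and packs the comparisons into an 8-bit code; histograms of these codes over image regions then form the face descriptor. The following is a minimal, illustrative Python sketch of the operator, not the proponents' actual code, and the clockwise neighbor ordering used here is one common convention:

```python
def lbp_code(patch):
    """Compute the basic 3x3 LBP code for the center pixel of a patch.

    patch: 3x3 list of lists of grayscale intensities. Each neighbor whose
    intensity is >= the center contributes one bit to the 8-bit code;
    neighbors are visited clockwise starting at the top-left corner.
    """
    center = patch[1][1]
    # clockwise neighbor coordinates starting at top-left
    coords = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(coords):
        if patch[r][c] >= center:
            code |= 1 << bit
    return code


# Example: a patch whose left/bottom neighbors are brighter than the center
example = [[6, 5, 2],
           [7, 6, 1],
           [9, 8, 7]]
print(lbp_code(example))  # 241
```

In the full pipeline, these per-pixel codes are histogrammed per image cell and the concatenated histograms are compared between the probe image and the trained set.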


A. J. A. Virata and E. D. Festijo


Fig. 2. System architecture


Fig. 3. System prototype

As described by the source, the University of Essex, United Kingdom, Department of Computer Science and Electronic Engineering database (www.essex.ac.uk), the images were taken at a fixed distance from the camera, and the subjects were instructed to speak while a series of camera shots was taken. In addition, the source also considered variations of the images, including green backgrounds, no head-scale variation, minor variation of head turn, tilt, and slant, no lighting variation, and no hairstyle variation. Some of the subjects who participated have beards and glasses. The collected data were used by the proponents to produce a training data set and a test data set. The proponents divided the images into two subsets, where 80% was used


as a training set and 20% as the test set. During the process of testing, the results for accuracy, prediction, and timing were a success; the timing prediction was likewise concluded to be remarkable.

ℵi ← [ ⟨blsrcip, detectioncount, status⟩ ], i = 1, . . . , Nodemax : Structure of a blacklisted node entry in the blacklist table, where blsrcip represents the blacklisted node IP address, detectioncount represents the total number of times the node has been detected as an attacker, and status represents the status of the blacklisted node, i.e., set to FALSE for suspected and TRUE for permanently blocked.

Υi ← [ ⟨from, tprevious, trecent, DIOcount⟩ ], i = 1, . . . , Nodemax : Structure of a node entry in the neighbor table, where from represents the DIO sender IP address, tprevious represents the time of the previous DIO reception, trecent represents the time of the most recent DIO reception, and DIOcount represents the total number of DIOs received from that neighbor up to the current time.

Nblacklist : Number of blacklisted nodes
λcurrent : Current system clock time
Ψ : Flag to check if the node is present in the neighbor table or not
ρ : Flag to check if the node is present in the blacklist table or not
srcip : Source IP address of the DIO sender node
τ : Null IP address
σ : Safe DIO interval
β : Block threshold
l : Length of the node table at that time
Active : Indicates that the IDS's detection procedure is ready to check for attackers present in the neighbor table; it is set to TRUE by the legitimate node every 30 s
δ : Tuning parameter
x̃ : Median
Q1 : First quartile
Q3 : Third quartile
IQR : Interquartile range
Upper limit : The safe threshold for the number of DIOs received from a neighbor


A. Verma and V. Ranga

Algorithm 1. Pseudo-code of the proposed IDS

1:  procedure IDS( )                              ▷ Checks for malicious neighbors present in neighbor table
2:      l ← 0                                     ▷ Variable to store current length of neighbor table
3:      for i ← 1, Nodemax do
4:          if Q[Υi.from] != τ then
5:              l++
6:          end if
7:      end for
8:      if l > 1 then
9:          sort Q on DIOcount column
10:     end if
11:     compute x̃, Q1, Q3 values of Q[Υi.DIOcount], where i = 1, . . . , l
12:     IQR ← Q3 − Q1
13:     Upper limit ← Q3 + (δ × IQR)
14:     for i ← 1, l do
15:         if Q[Υi.DIOcount] > Upper limit then
16:             if Q[Υi.trecent] − Q[Υi.tprevious] ≤ σ then
17:                 ρ ← FALSE
18:                 for j ← 1, Nblacklist do
19:                     if Z[ℵj.blsrcip] = Q[Υi.from] then
20:                         ρ ← TRUE
21:                         if Z[ℵj.detectioncount] < β then
22:                             Z[ℵj.detectioncount] ← Z[ℵj.detectioncount] + 1
23:                             if Z[ℵj.detectioncount] = β then
24:                                 Z[ℵj.status] ← TRUE
25:                                 ▷ Neighbor is permanently blocked
26:                                 call remove neighbor table entry(i) procedure
27:                             end if
28:                         end if
29:                     end if
30:                 end for
31:                 if ρ = FALSE then
32:                     k ← Nblacklist++
33:                     Z[ℵk.blsrcip] ← Q[Υi.from]
34:                     Z[ℵk.detectioncount] ← Z[ℵk.detectioncount] + 1
35:                     Z[ℵk.status] ← FALSE
36:                     ▷ Neighbor is suspected to be an attacker
37:                 end if
38:             end if
39:         end if
40:     end for
41: end procedure
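The statistical core of Algorithm 1 (the IQR outlier test of lines 11-13 plus the blacklisting logic of lines 14-40) can be sketched in Python as follows. This is a simplified illustration, not the ContikiRPL implementation: it omits the safe-DIO-interval (σ) timing check and neighbor-table cleanup, uses Tukey's median-of-halves quartiles (the paper does not state its quartile method), and the δ and β defaults are illustrative:

```python
from dataclasses import dataclass


@dataclass
class BlacklistEntry:
    addr: str             # bl_srcip: blacklisted node IP
    detection_count: int  # times this node was detected as an attacker
    status: bool          # False = suspected, True = permanently blocked


def _median(seq):
    """Median of an already-sorted sequence."""
    m = len(seq)
    mid = m // 2
    return seq[mid] if m % 2 else (seq[mid - 1] + seq[mid]) / 2


def ids_round(neighbors, blacklist, delta=1.5, beta=3):
    """One detection round over the neighbor table.

    neighbors: dict mapping neighbor IP -> DIO count received so far.
    blacklist: dict mapping IP -> BlacklistEntry, updated in place.
    A neighbor whose DIO count exceeds Q3 + delta*IQR is suspected; after
    beta detections it is permanently blocked (status = True).
    Returns the computed upper limit.
    """
    counts = sorted(neighbors.values())
    half = len(counts) // 2
    q1 = _median(counts[:half])               # lower half -> Q1
    q3 = _median(counts[len(counts) - half:]) # upper half -> Q3
    upper = q3 + delta * (q3 - q1)
    for addr, c in neighbors.items():
        if c > upper:
            entry = blacklist.setdefault(addr, BlacklistEntry(addr, 0, False))
            if entry.detection_count < beta:
                entry.detection_count += 1
                if entry.detection_count == beta:
                    entry.status = True  # permanently blocked
    return upper
```

For example, with DIO counts {4, 5, 5, 6, 6, 7, 40}, the quartiles are Q1 = 5 and Q3 = 7, so the upper limit is 7 + 1.5 × 2 = 10 and only the neighbor replaying 40 DIOs is flagged; after three consecutive rounds it is permanently blocked.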

5 Performance Evaluation

We have evaluated our proposed IDS through experiments by implementing it on a popular embedded operating system. The next subsection presents the details

Intrusion Detection in IoT


of our experimental setup, the attack's impact on the network, and the performance results of our proposed IDS.

Table 2. Simulation parameters

Parameter | Values
Radio model | Multipath Ray-Tracer Medium (MRM)
Simulation area | 150 m × 150 m
Simulation time | 1800 s
Objective function | Minimum Rank with Hysteresis Objective Function (MRHOF)
Number of attacker nodes | 4
Number of gateway nodes | 1
Number of sensor nodes | 16
DIO minimum interval | 4 s
DIO maximum interval | 17.5 min
Replay interval | 1, 2, 3, 4 s
Data packet size | 30 bytes
Data packet sending interval | 60 s
Transmission power | 0 dBm

5.1 Experimental Setup

The proposed IDS is implemented by modifying the ContikiRPL library of the Contiki operating system. We have evaluated the proposed IDS using the Cooja simulator. Table 2 lists the simulation parameters considered in the experiments. The Multipath Ray-Tracer Medium (MRM) radio model is used in all the experiments to simulate a realistic channel. To realize the copycat attack, an attacker node is programmed to eavesdrop on and capture DIO messages from any legitimate node and then replay the captured messages at fixed replay intervals. The Random Waypoint Mobility Model is used to simulate the behavior of mobile nodes, where the speed of nodes is set between 1 m/s and 2 m/s. The attacker node is programmed to launch the attack 90 s after network initialization. Similarly, the proposed IDS is scheduled to activate 120 s after network initialization and monitors the neighbors every 30 s. The mean values of the results obtained from 10 independent replications, with errors at a 95% confidence interval, are reported.

5.2 Simulation Results

First, the attack's impact on network performance is studied in terms of PDR and AE2ED. Then, the performance evaluation of the proposed IDS is discussed in terms of Accuracy and First Response Time (FRT).
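Assuming the standard definitions of these two metrics (the paper does not give explicit formulas), they can be computed as:

```python
def pdr(sent, received):
    """Packet Delivery Ratio: percentage of sent packets that reached the sink."""
    return 100.0 * received / sent


def ae2ed(delays_ms):
    """Average End-to-End Delay: mean of the per-packet delays (here in ms)."""
    return sum(delays_ms) / len(delays_ms)


# Example: 150 of 200 data packets delivered -> PDR of 75%
print(pdr(200, 150))       # 75.0
print(ae2ed([10, 20, 30])) # 20.0
```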


5.3 Impact of Attack on Network Performance

The performance of Static RPL (the static reference model without attack), Static RPL-Under-Attack (the static reference model under attack), Mobile RPL (the mobile reference model without attack), and Mobile RPL-Under-Attack (the mobile reference model under attack) is evaluated and compared. In non-attack scenarios (i.e., Static RPL and Mobile RPL), the replay interval has no significance. Figure 1 shows the PDR obtained under attack with different replay intervals, i.e., 1, 2, and 3 s. It is observed that the attack severely degrades the network's performance in both the static and mobile cases, as confirmed by comparing the PDR values obtained in the attack and non-attack scenarios. The copycat attack has more impact on the mobile network than on the static network. Figure 2 shows the AE2ED obtained with different replay intervals. It is observed that the attack increases the network latency, as confirmed by the AE2ED values obtained in the different scenarios. Besides, it is also observed that the attack does not have any significant impact on the AE2ED of the static network, whereas in the case of a mobile network, AE2ED increases significantly. The reason for this is the network dynamicity.

5.4 Performance Evaluation of Proposed IDS

Figure 3 shows the accuracy achieved by the proposed IDS in different attack scenarios, where IDSStatic and IDSMobile denote the accuracy achieved in the case of the static and mobile network, respectively. The results indicate the effectiveness of the proposed IDS. As can be seen, the proposed IDS detects the attackers with

Fig. 1. PDR values obtained in different scenarios


Fig. 2. AE2ED values obtained in different scenarios

Fig. 3. Accuracy of proposed IDS

high accuracy. The IDS achieves a maximum accuracy of ≈94% in the static scenario and ≈85% in the mobile scenario. The main reason for the better accuracy of the IDS in the static network is the network's stability, while in the mobile scenario the network dynamicity limits the detection accuracy of the IDS. It is concluded that the accuracy of the proposed IDS is inversely proportional to the attacker's replay interval. The FRT of the proposed IDS is studied to analyze its responsiveness. Figure 4 illustrates the FRTs of the IDS against different attackers, where A1, A2, A3, and A4 represent different attackers. The proposed IDS takes less time


to detect the attacker in the static scenario than in the mobile scenario. This is because the stable network makes it easy for the detection mechanism to quickly find the malicious neighbor present in the neighbor table. The reason for delayed attacker detection in the mobile scenario is the network dynamicity, which increases the DIO transmissions of legitimate nodes. Hence, it becomes very difficult for the detection mechanism to differentiate between normal and attacker neighbors present in the neighbor table.
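For reference, the two evaluation metrics are presumably computed as follows; the paper does not define them formally here, so the standard classification-accuracy formula and an FRT defined as the elapsed time between attack launch and the first detection are assumptions:

```python
def detection_accuracy(tp, tn, fp, fn):
    """Fraction of IDS decisions that were correct (true pos. + true neg.)."""
    return (tp + tn) / (tp + tn + fp + fn)


def first_response_time(attack_start_s, first_detection_s):
    """FRT: seconds elapsed between attack launch and the IDS's first alarm."""
    return first_detection_s - attack_start_s


# Example: 90 true positives and 4 true negatives out of 100 decisions -> 0.94,
# matching the ~94% static-scenario accuracy reported above
print(detection_accuracy(90, 4, 3, 3))  # 0.94
print(first_response_time(90, 150))     # 60
```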

Fig. 4. FRT of proposed IDS to detect attackers with different replay intervals

6 Conclusion and Future Work

Many IoT applications are built upon LLNs due to the requirement of longer operational time. LLNs are vulnerable to different insider and outsider threats, putting users' security and privacy at risk. In this paper, we have addressed a routing attack named the copycat attack, which is shown to have a major negative impact on LLN performance. We presented an IDS to secure LLNs against copycat attacks. The experimental results show that our proposed IDS detects such attacks quickly and with high accuracy in both static and mobile network scenarios. As future work, we are interested in performing testbed experiments using real LLN nodes.

Acknowledgment. This research was supported by the Ministry of Human Resource Development, Government of India.



On the Analysis of Semantic Denial-of-Service Attacks Affecting Smart Living Devices

Joseph Bugeja(B), Andreas Jacobsson, and Romina Spalazzese

Internet of Things and People Research Center, Department of Computer Science and Media Technology, Malmö University, Malmö, Sweden
{joseph.bugeja,andreas.jacobsson,romina.spalazzese}@mau.se

Abstract. With the interconnectedness of heterogeneous IoT devices being deployed in smart living spaces, it is imperative to assure that connected devices are resilient against Denial-of-Service (DoS) attacks. DoS attacks may cause economic damage but may also jeopardize the life of individuals, e.g., in a smart home healthcare environment, since there might be situations (e.g., heart attacks) when urgent and timely actions are crucial. To achieve a better understanding of the DoS attack scenario in the ever so private home environment, we conduct a vulnerability assessment of five commercial-off-the-shelf IoT devices typically found in a smart living space: a gaming console, media player, lighting system, connected TV, and IP camera. This study was conducted using an automated vulnerability scanner, the Open Vulnerability Assessment System (OpenVAS), and focuses on semantic DoS attacks. The results of the conducted experiment indicate that the majority of the tested devices are prone to DoS attacks, in particular those caused by a failure to manage exceptional conditions, leading to a total compromise of their availability. To understand the root causes of successful attacks, we analyze the payload code, identify the weaknesses exploited, and propose some mitigations that can be adopted by smart living developers and consumers.

Keywords: Denial-of-Service (DoS) · Internet of Things (IoT) · OpenVAS · Smart home · Security vulnerabilities

1 Introduction

The availability of affordable Internet-connected household devices, such as IP cameras, Wi-Fi-enabled light bulbs, and smart TVs, is stimulating the growth of smart living spaces, a typical case of which is the smart connected home. A smart connected home is a residence that uses IoT technologies, such as sensors, smart devices, and communication protocols, allowing for remote access, control, and management, typically via the Internet [12]. As we rely more on IoT devices in daily life, these devices are vulnerable to active cyberattacks such as Denial-of-Service (DoS) [1]. DoS is typically

© Springer Nature Switzerland AG 2020. K. Arai et al. (Eds.): SAI 2020, AISC 1229, pp. 427-444, 2020. https://doi.org/10.1007/978-3-030-52246-9_32


described as a widely used attack vector employed by various malicious threat agents such as hackers, hacktivists, and thieves. Indeed, traditional DoS attacks on information systems can be threats to the smart home, given its Internet-connected components [10]. Such attacks may be the first step in removing a smart home component from a network to exploit a vulnerability in its disconnected failure state [10]. The impact of a DoS attack may range from a nuisance to loss of revenue to even loss of life. As an example, in 2016, a major IoT-oriented malware, Mirai [24], caused severe monetary damage by exploiting devices, mostly consumer IoT devices such as IP cameras found in homes, and converting them into a botnet. Mirai had the capability to perform various types of Distributed Denial-of-Service (DDoS) attacks (a DDoS attack is a DoS attack that uses a high number of hosts to make the attack even more disruptive [11]), like DNS, UDP, STOMP, SYN, and ACK flooding [20]. In 2019, Kaspersky Lab indicated that DDoS attacks in the IoT had escalated by 84%, and that their average duration had increased by 4.21 times [25]. This highlights the insecurity of current IoT devices and justifies the importance of studying DoS attacks. Research on DoS attacks tends to be primarily focused on attack detection techniques (e.g., anomaly detection) and response mechanisms (e.g., distributed packet filtering) [21]. To a lesser extent, fewer scholarly studies have been published that focus on the actual DoS attack scenario, which is crucial to determine the resilience of an IoT device. In fact, most of the available studies are made by professional penetration testers (cf. the white paper "The IoT Threat Landscape and Top Smart Home Vulnerabilities in 2018" by Bitdefender [34]). The mentioned report [34] indicates that DoS is the most common vulnerability present in the smart home, followed by code execution and buffer overflow.
There are two broad categories of DoS attacks: semantic and flooding attacks [30], [13], [21]. Respectively, these are also called software exploits or application-based attacks and brute-force attacks in the scholarly literature. While in flooding attacks a victim is sent a voluminous amount of network traffic to exhaust its bandwidth or computing resources, in semantic attacks packets that exercise specific software bugs (vulnerabilities) are sent to a victim's operating system or application. Although flooding attacks are important, in this paper we focus on semantic attacks for three main reasons: i) these attacks can be an enabler for other security and privacy threats; ii) most of the existing studies target traditional computer devices or resources, e.g., application servers, and not consumer IoT devices; and iii) arguably, while devices are in theory always prone to flooding, they may not be susceptible to software exploits if their software is updated. These characteristics make semantic DoS attacks interesting to study from a scientific perspective. Specifically, we conduct an experiment on five commercial-off-the-shelf devices: a gaming console, media player, lighting system, connected TV, and IP camera. These represent three different categories of smart living devices commonly found in a home: energy and resource management, entertainment systems, and security and safety. All the devices used in this paper are


manufactured by established industry leaders. The assessment is done through vulnerability scanning, the process of detecting potential weaknesses on a computer, network, or service. Specifically, we leverage the Open Vulnerability Assessment System (OpenVAS, https://www.openvas.org/) framework. To understand the root causes of successful attacks, we analyze the payload code, identify the weaknesses exploited, and propose some mitigations that can be adopted by smart living developers and consumers. The remainder of this paper is organized as follows. In Sect. 2, we provide an overview of a typical smart connected home architecture. Next, we summarize related work on DoS. The description of a DoS attack and the experiment design is elaborated on in Sect. 4. In Sect. 5, we summarize the achieved results. Subsequently, we discuss some implications of our findings and provide some guidance for mitigating such vulnerabilities in Sect. 6. Finally, in Sect. 7, we conclude and specify directions for future work.

2 Smart Connected Home Architecture

A smart connected home consists of heterogeneous devices. These typically exchange data about the state of the home, the environment, and the activities of residents. Commonly, the IoT devices are connected to an IoT gateway, which is in turn connected to the residential Internet router. The gateway/router is the endpoint that connects the IoT devices to the Internet Service Provider (ISP). Some connected home devices, in particular resourceful nodes such as certain smart TVs, may also have built-in gateway functionality allowing them to connect to the Internet router and sometimes to an ISP directly. The connection between the gateway and router tends to be Ethernet or Wi-Fi based, whereas the communication between the IoT devices and the gateway usually leverages wireless protocols such as Zigbee, Z-Wave, and Thread. These protocols are designed for power efficiency, making them ideal for battery-operated devices. Users can interact with the IoT devices and manage their smart connected home devices through different platforms, most commonly through smartphones. There are in general two interaction modalities: i) directly interacting with the devices using the services provided by the gateway, and ii) accessing Internet cloud services that interact with the gateway and the connected IoT devices. Typically, the smart connected home relies on a cloud-based infrastructure. These two scenarios are often present simultaneously to support local and remote interactions with the IoT devices. In Fig. 1, we provide a graphical overview of the smart connected home architecture.

3 Related Work

Karig and Lee [22] classify DoS attacks into five different categories: network-device level, OS level, application level, data flood, and protocol feature attack.



Fig. 1. Typical smart connected home architecture. The smart home devices tend to be connected to a gateway(s), typically via wireless protocols, which is in turn connected to the Internet through a broadband router. End-users commonly access the home through a mobile device, e.g., a smartphone.

This categorization is based on the attacked protocol level. The authors also provide countermeasures that mostly reflect the classification of attacks. This work is useful as a basis for understanding DoS attacks and their impact; however, it falls short in elaborating on the causes of certain attack categories, e.g., application-based attacks. Mirkovic and Reiher [29] group DDoS attacks into two categories: semantic and brute-force attacks. Brute-force attacks are related to the data flood attacks in the Karig and Lee [22] classification, as they involve the sending of a large volume of attack packets to a target, whereas the rest are non-flooding attacks. The authors also provide a taxonomy of defense mechanisms, differentiating between preventive and reactive mechanisms. While this work is relevant for comprehending DoS attacks, it is primarily focused on DDoS attacks. DDoS attacks tend to be more related to brute-force attacks and have specific attack types such as DNS, NTP, Chargen, and SSDP, which may not be as relevant to DoS. Bonguet and Bellaiche [11] classify DoS and DDoS attacks into two broad categories: overwhelming the resources and vulnerabilities. Respectively, these correspond to the brute-force and semantic attacks as described by Mirkovic and Reiher [29]. The authors present new types of DoS and DDoS attacks, in particular XML-DoS and HTTP-DoS, that affect cloud computing. They also discuss some detection and mitigation techniques. In our case, we are mainly


interested in investigating the causes of DoS attacks affecting devices found in smart living spaces. The Open Web Application Security Project (OWASP) [33] focuses on the types of vulnerabilities at the application level that allow a malicious user to make certain functionality, or sometimes the entire website, unavailable. They identified eight test cases, such as buffer overflows, each leading to DoS. We leverage the work of OWASP indirectly by conducting an experiment on connected home devices. In reviewing the existing work, we observe that the majority of the published work, whilst providing a solid theoretical basis, does not elaborate much on the method used for conducting a DoS attack. Specifically, we observe a shortage of studies that test IoT devices against semantic DoS attacks aimed at the application and data processing layers. Except for a few, most of these tend to run such tests on web applications, instead of services which may also include network and operating-system-based software components. With the rise of increasingly targeted attacks and motivated attackers, we believe that semantic DoS attacks are likely to be exploited and are thus important to study. Finally, we observe that most of the mitigations proposed, while generic enough, may not necessarily address certain discovered vulnerabilities. Thus, it is useful to investigate firsthand the causes of such attacks to propose more appropriate solutions.

4 Method

4.1 The DoS Attack

DoS attacks attempt to exhaust or disable access to resources at the victim. These resources are either network bandwidth, computing power, or operating system data structures. Effectively, DoS attacks can target all the different protocol layers of the TCP/IP protocol stack. In the home environment, DoS can occur directly at the IoT devices, at the residential router, and at cloud endpoints [16]. Typically, web servers embedded inside IoT devices are a frequent target of attacks. In this work, we focus on semantic attacks. These attacks take advantage of specific bugs in network services that are running on a target host, or use such applications to drain the resources of their victim [22]. It is also possible that the attacker has found points of high algorithmic complexity and leverages them to consume all available resources on a remote host [14].

4.2 Experiment Setup

An experiment was devised to test IoT devices for their resiliency against DoS attacks. The experiment was conducted in April 2019, and it featured smart devices that had their firmware upgraded to the latest version, as detailed in [3].


The experimental platform is based on the framework of Liang et al. [26], which implemented a DoS attack method for IoT systems. Effectively, our experiment setup is an instance of the smart connected home architecture described in Sect. 2. Each smart device had embedded gateway functionality and was directly connected via its Wi-Fi interface to the router. The smartphone role is delegated to the PC. The router is in turn connected to the broadband modem via Ethernet. A smart connected home is typically characterised by a mix of devices, but often contains a so-called starter kit with a few core devices that are typically manufactured by one supplier [36]. Our testbed devices are chosen to reflect this; however, we selected devices produced by different vendors to obtain a more generic overview of devices' exposure to DoS attacks. A schematic illustration of the setup is shown in Fig. 2 and consists of the following components:

Fig. 2. Experiment design setup. The setup consisted of five IoT devices connected to the broadband router over Wi-Fi protocol. The attacker platform (Kali Linux) was connected to the LAN and to the different smart devices over the Wi-Fi channel. Different DoS attacks were executed through OpenVAS software.

– PC: Portable workstation that reads data from smart devices and furnishes data to users through software applications. The PC, running Windows 10, had virtualization software installed; specifically, Oracle VM VirtualBox (https://www.virtualbox.org/), a free and open-source hosted hypervisor for x86 virtualization, which is used to host the "attacker platform". The PC had one physical network card installed.
– Attacker platform: A virtual machine installed with Kali Linux (https://www.kali.org/) and the OpenVAS vulnerability scanner. The attacker platform was connected to the Internet in order to install OpenVAS and later to download the latest vulnerability tests for it. Also, it was connected to the Local Area Network (LAN) in order to execute DoS attacks on the smart devices. Kali Linux was configured


with the Network Address Translation (NAT) networking mode as a means for accessing the Internet alongside the smart devices.
– Router: A networking device that forwards data packets between the connected devices and the Internet and assigns IP addresses to the PC and smart devices. In our case, the router was a Compal router that connected the PC, smart devices, and the attacker platform in a LAN setup.
– Smart devices: Five commercial-off-the-shelf IoT devices: a gaming console, media player, lighting system, connected TV, and IP camera [3]. The IP addresses for the devices were automatically assigned by the router using the Dynamic Host Configuration Protocol (DHCP).

Smart devices process data, which are transferred to the PC via the router. In reality, the role of the PC could be that of, for instance, a smartphone application or a web page that displays processed results from the smart devices. The components and their IP addresses are summarized in Table 1.

Table 1. Device types alongside their assigned IP addresses and roles.

Device type | IP address | Role
PC | 192.168.0.10 | Attacker host
Attacker platform | 192.168.0.10 | Attacker
Router | 192.168.0.1 | Local network gateway
Gaming console | 192.168.0.13 | Victim
Lighting system | 192.168.0.4 | Victim
Media player | 192.168.0.9 | Victim
Connected TV | 192.168.0.7 | Victim
IP camera | 192.168.0.29 | Victim

The network utility ping was used to check the connection between the PC and the smart devices prior to running the vulnerability scans.

4.3 Vulnerability Scanning

Various security tools (scanners) exist that can assist in finding and analyzing security vulnerabilities. Tundis et al. [42], in their review of network vulnerability scanning tools, group such tools into two main categories: automatic scanning tools with publicly shared results and personal interaction-based scanning tools. Whereas tools in the former category automatically scan the Internet and render their results publicly, in the latter results are only returned to the tool operator. In our case, we rely on personal interaction-based scanning tools for ethical reasons and because the devices were not configured with a public IP address. Three personal interaction-based scanning tools that are used by security researchers, e.g., in [18], are Nessus, Metasploit Pro, and OpenVAS. Nessus


is a proprietary vulnerability scanner produced by Tenable Network Security. Metasploit Pro is a security scanner that also allows for the exploitation of vulnerabilities (i.e., penetration testing). Both Nessus and Metasploit Pro are commercial tools that are used by various security professionals for security compliance and assessment purposes. OpenVAS is free software, effectively a fork of Nessus, for vulnerability scanning and management. Given that OpenVAS is free, offers a comprehensive vulnerability management platform with features similar to those of commercial tools, and has been used by other security researchers for purposes similar to ours, we rely on it as our scanner. In the experiment, we assumed an attack model where the malicious threat agent is located inside the smart home network, having both physical and digital access to the connected devices and the attacker platform. Nonetheless, we only consider semantic DoS exploits and not DoS caused by physically disabling a device. For the experiment, we configured OpenVAS on Kali Linux according to its official documentation [38]. First, we ensured that Kali Linux was updated and then installed the latest OpenVAS through the command "openvas-setup". Once the setup was completed, the command "netstat -antp" was entered to verify that OpenVAS' requisite network services (in particular, its manager, scanner, and the Greenbone Security Assistant Daemon (GSAD)) were open and listening. Next, the command "openvas-start" was keyed to start all the services. Once the services were successfully started, we connected to the OpenVAS web interface by pointing the web browser (in our case, Mozilla Firefox) to it. Therein, we configured OpenVAS scanning to "Full and very deep ultimate" and used as input the port list "All TCP and Nmap 5.51 top 100 UDP" [19].
This configuration allowed the scanner to test most of the smart devices' network ports (in total, 65,535 TCP ports and 99 UDP ports) for a broad range of vulnerability classes. Nonetheless, we limited the test cases to DoS attacks only; at the time of the experiment, OpenVAS provided 1,384 network tests for DoS.

4.4 Attack Introspection

After the scans were completed, the results were displayed on the PC. For each successful attack, we inspected the attack payload, i.e., the exploit code that caused the DoS attack to succeed. This was done to understand the mechanics of the attack. Online security vulnerability databases were used as sources of details about the exploits and their code. In doing so, the following public databases were used: SecurityFocus, CVE Details, and Vulners. These databases were used in tandem with the actual test case code as executed by OpenVAS.

https://www.securityfocus.com/ [accessed December 21, 2019].
https://www.cvedetails.com/ [accessed December 21, 2019].
https://vulners.com/ [accessed December 21, 2019].

On the Analysis of Semantic DoS Attacks Affecting Smart Living Devices

435

Furthermore, to identify the root causes that allow an attack to succeed, we leveraged the classification scheme employed by the National Vulnerability Database (NVD) of the National Institute of Standards and Technology. NVD has gained recognition from organizations such as the MITRE Corporation and has been used by researchers for purposes similar to ours [2]. This classification is based on the causes of vulnerabilities, grouping them into eight classes: input validation error, access validation error, exception condition error handling, environmental error, configuration error, race condition error, design error, and others [2].

5 Results

5.1 Smart Living Devices Vulnerabilities

Following the execution of the vulnerability scanning described in Sect. 4.3, a total of 13 DoS-related vulnerabilities were found to affect the tested smart living devices. The device most prone to semantic DoS attacks was the gaming console, with nine vulnerabilities, two of which were reported as having critical severity. Critical severity indicates that exploiting the vulnerability can result in total compromise of the device. One of the discovered vulnerabilities – Linksys WRT54G DoS – was rated with the most severe score (CVSS score: 10), allowing an intruder to "freeze" the gaming console's web server simply by sending empty GET requests. This leads to a total compromise of the confidentiality, integrity, and availability of the system. A similar high-severity (CVSS score: 9.3) vulnerability – LiteServe URL Decoding DoS – was found in an IP camera device. Here, the device's web server could be made unavailable simply by having it parse a URL consisting of a long invalid string of "%" symbols. Overall, seven of the thirteen vulnerabilities were ranked with medium severity – meaning that the vulnerability can reduce performance or cause a loss of some functionality on the targeted device – four were ranked as critical, and two as high severity. Furthermore, none of the discovered vulnerabilities required the attacker to authenticate to the victim host in order to exploit them. No DoS-related vulnerabilities were found to affect the tested lighting system and media player. While all the conducted attacks were semantic attacks, certain vulnerabilities, albeit a minority, compromised not only the high-level application (e.g., the administration console of an embedded web server) but also the underlying operating system (e.g., Windows) and hardware (i.e., the device's firmware). Only one vulnerability – HTTP Windows 98 MS/DOS device names DoS – targeted the operating system software.
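As an illustration (our own sketch, not the paper's test-suite code), the two payload shapes described above can be written as raw HTTP byte strings; the length of the "%" string and the host name are arbitrary, hypothetical choices:

```python
# Illustrative only: raw request shapes matching the two attacks described
# above. Such probes must only ever be sent to devices you own or are
# explicitly authorized to test.

# An empty GET request (no path, no headers), as used against the gaming console
empty_get = b"GET\r\n\r\n"

# A URL consisting of a long invalid string of '%' symbols, as used against
# the IP camera's web server (the length of 5000 is a hypothetical choice)
long_percent_url = (
    b"GET /" + b"%" * 5000 + b" HTTP/1.1\r\nHost: camera.local\r\n\r\n"
)
```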
Table 2 summarizes the discovered DoS-related vulnerabilities for each category of tested smart devices. The severity follows the qualitative severity rating scale defined in the CVSS v3.0 specification [15].

http://www.cve.mitre.org/ [accessed December 21, 2019]. https://www.mitre.org/ [accessed December 21, 2019].

5.2 DoS Attack Characteristics

The outcome of the attack introspection stage described in Sect. 4.4 is summarized in Fig. 3.

Table 2. Summary of semantic DoS-related vulnerabilities found in five commercial smart devices.

| Device type | Vulnerability title | Affected component | Severity | Availability |
|---|---|---|---|---|
| Gaming console | Linksys WRT54G DoS | Hardware | Critical | Complete |
| | Mongoose webserver content-length DoS | Application | High | Complete |
| | HTTP Windows 98 MS/DOS device names DoS | Operating system | Critical | Complete |
| | Format string on HTTP method name | Application | Medium | Complete |
| | Webseal DoS | Application | Medium | Partial |
| | Jigsaw web server MS/DOS device DoS | Application | Medium | Partial |
| | HTTP unfinished line Denial | Application | Medium | Partial |
| | mod_access_referer 1.0.2 NULL pointer dereference | Application | Medium | Partial |
| | Mereo "GET" request remote buffer overflow vulnerability | Application | Medium | Partial |
| Connected TV | Mongoose "Content-Length" HTTP header remote DoS | Application | Critical | Partial |
| | Jigsaw web server MS/DOS device DoS | Application | Medium | Partial |
| IP camera | LiteServe URL decoding DoS | Application | Critical | Complete |
| | Polycom ViaVideo DoS | Hardware | High | Partial |
| Lighting system | / | / | / | / |
| Media player | / | / | / | / |

From Fig. 3, we observe that most of the DoS attacks target the high-level application and belong to the "Exception condition error handling" vulnerability class. Vulnerabilities in this class arise from failures in responding to unexpected data or conditions.

Fig. 3. Summary of discovered vulnerability causes grouped by the affected component. The majority of DoS attacks were successful on the device’s application, typically the web server software, and were the result of a failure in managing exceptional conditions.


Table 3. Characteristics of DoS attacks that resulted in the compromise of IoT availability.

| Vulnerability type | Vulnerability title | Attack feature | Payload | Attack method | Remote | Reference |
|---|---|---|---|---|---|---|
| Design error | Linksys WRT54G DoS | Empty HTTP request | Script | HTTP HEAD | Yes | [8] |
| Design error | HTTP unfinished line Denial | Malformed HTTP request | Script | HTTP HEAD | Yes | [5] |
| Input validation error | Mereo "GET" request remote buffer overflow vulnerability | Large user-supplied input | Script | HTTP GET | Yes | [17] |
| Input validation error | Format string on HTTP method name | Malformed HTTP request | Script | HTTP HEAD | Yes | [4] |
| Input validation error | LiteServe URL decoding DoS | Large user-supplied input | Script | HTTP GET | Yes | [9] |
| Exception condition error handling | Polycom ViaVideo DoS | Incomplete HTTP request | Shellcode | HTTP GET | Yes | [41] |
| Exception condition error handling | Webseal DoS | Malformed HTTP request | Web browser | HTTP GET | Yes | [40] |
| Exception condition error handling | Mongoose webserver content-length DoS | "Content-Length" HTTP header field | Script | HTTP GET | Yes | [37] |
| Exception condition error handling | Mongoose "content-length" HTTP header remote DoS | "Content-Length" HTTP header field | Script | HTTP GET | Yes | [40] |
| Exception condition error handling | Jigsaw web server MS/DOS device DoS | Resource request | Script | HTTP GET | Yes | [7] |
| Exception condition error handling | HTTP Windows 98 MS/DOS device names DoS | Filename in URL | Script | HTTP GET | Yes | [6] |
| Exception condition error handling | mod_access_referer 1.0.2 NULL pointer dereference | "Referer" HTTP header field | Script | HTTP GET | Yes | [39] |

The rest of the vulnerabilities correspond to "Input validation error" and "Design error". Input validation errors include vulnerabilities that fail to verify incorrect input (boundary condition errors) and read/write operations involving an invalid memory address (buffer overflows). Design errors are caused by improper design of the software structure. In Table 3, we summarize the characteristics of the attacks that exploited these vulnerability classes. We observe that all of the attacks were remote exploits. Remote exploits work over a network, such as the Internet, exploiting the security vulnerability without requiring any prior access to the vulnerable system. This is in contrast to a local exploit, which requires prior access to the vulnerable system. The majority of the attacks required only basic programming knowledge to develop. At a minimum, this meant familiarity with the workings of the HTTP protocol (e.g., HTTP methods, in particular the GET method) and network programming (e.g., TCP/IP socket management). This is needed to create and send specially crafted packets to an IoT component. Mostly, the attack payload was transferred to the connected device by manipulating the content of a legitimate HTTP header field, e.g., the "Content-Length" attribute, which specifies the length of the request body.
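A minimal sketch of such a crafted packet (our own example, not the OpenVAS test-case code; the host name, port, and header value are hypothetical):

```python
import socket

def build_content_length_probe(host: str, length_value: str) -> bytes:
    """Build a GET request whose Content-Length header carries an unexpected
    value (e.g., negative or non-numeric) while no body is actually sent."""
    return (
        "GET / HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"Content-Length: {length_value}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")

def send_probe(host: str, port: int, payload: bytes, timeout: float = 5.0) -> bytes:
    """Send the crafted request over a plain TCP socket. Only ever run this
    against devices you own or are explicitly authorized to test."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
        return sock.recv(4096)

probe = build_content_length_probe("device.local", "-1")
```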

6 Discussion

6.1 The Impact of DoS Attacks

Even though the tested device types represent only around 6% of the available device categories in a smart connected home [12], the attained results already highlight the gravity of the current situation. This is especially so because these types represent on average around 25% of the devices available in a regular smart home [34], because they belong to the three categories of functionality with the most device types [12], and because our test subjects are manufactured by international companies with overall high security maturity. The majority of the remaining manufacturers are IoT startups that tend to prioritize simplicity and ease of use over security. Due to the limited energy capacities and interconnectedness of IoT devices, the impact of DoS attacks can be severe. For instance, DoS attacks can cause battery-draining issues leading to node outages or a failure to report an emergency situation. This can happen, for example, if an attacker targets an Internet-connected smoke detector, which may consequently disable the fire detection system and possibly lead to a fatality. In some cases, a successful DoS attack can also allow an attacker to lock down an entire building or access to a room, for instance by making certain online authentication services, e.g., a cloud service required by a smart lock, unavailable. In extreme cases, DoS attacks may cause permanent damage to a system, requiring a device replacement or re-installation of the hardware. This can happen, for example, when fake data are sent to connected thermostats in an attempt to cause irreparable damage via extreme overheating. Beyond affecting the availability of a system, DoS attacks conducted at different architecture layers can compromise other security requirements such as accountability, auditability, and privacy [31]. For instance, when devices are offline, adversaries can use that window of time to steal sensitive information or infer additional information.
Furthermore, when a high number of hosts are combined, as in the case of DDoS, the effects can be even more disruptive. For instance, in 2016, a DDoS attack with compromised IoT devices targeting the DNS service provider Dyn effectively took offline sites such as GitHub, Airbnb, and Amazon [28]. Overall, this resulted in reputation damage, diminished IT productivity, and revenue losses for different stakeholders.

6.2 On the Causes of DoS Attacks

Analyzing the software weaknesses that were exploited by the successful DoS attacks, we find that improper checks for unusual or exceptional conditions are the root cause of such vulnerabilities. This could indicate that: i) IoT developers are assuming that certain events or conditions will never occur; ii) IoT developers are reusing software libraries without performing proper security testing; iii) IoT developers are not properly trained in software security; or iv) security is not a top priority for the organization. Moreover, this raises broader concerns about the way IoT devices are being developed. Our study is similar in scope to that of Bonguet and Bellaiche [11]. However, instead of focusing on cloud computing DoS and DDoS, we focused specifically on consumer-based IoT devices. Moreover, we expanded on the semantic-based DoS attack category, which the aforementioned study classifies under "design flaws" and "software bugs" vulnerabilities, with vulnerability classes we identified firsthand. Analyzing the successful attacks, we observe the prevalence of HTTP GET DoS attacks, in which the application layer protocol HTTP is exploited. Interestingly, the HTTP GET DoS attacks did not have to be used repeatedly – for example by running the attack in a loop, as is required in an HTTP flood attack [27] – and yet had the same consequences. This signals the danger of these attacks and the challenges involved in detecting them. While HTTP flooding can trigger an alert about a possible intrusion, since multiple HTTP requests are sent to a target device, the chance of detecting a semantic DoS is relatively slim, as only one request may be needed to achieve the same effect. Many IoT devices share generic components from a relatively small set of manufacturers. This means that a vulnerability in one class of IoT hardware is likely to be repeated across a vast range of products. For instance, one of the vulnerabilities we discovered on the gaming console targeted the open source Mongoose web server.
Mongoose (https://cesanta.com/ [accessed December 21, 2019]) is identified as GitHub's most popular embedded web server and multi-protocol networking library. It is therefore likely that other IoT platforms are prone to the same security risk. This also makes us reflect on the state of the other IoT devices available on the smart home market and, more generally, on the security practices adopted by companies, especially since most companies develop their software by reusing existing software libraries. In doing so, besides the functionality, vulnerabilities are automatically inherited, putting customers at risk and the vendor's reputation at stake. A case in point: at the DEF CON 22 conference [35], a popular cloud-based Wi-Fi camera was revealed to be using a version of the OpenSSL library – a widely used software library for securing communications – affected by the Heartbleed vulnerability. Exploiting this could allow an attacker to eavesdrop on seemingly encrypted communications, steal private data, and impersonate services and users. In this study, we have investigated how DoS attacks are conducted and how exploit code takes advantage of missing security practices such as input validation. We believe that it is relatively easy to exploit these weaknesses and potentially launch large-scale attacks without the owner's knowledge. This is amplified by the availability of automatic scanning tools with publicly shared results, such as Shodan, that simplify the process of discovering and exploiting Internet-connected devices. Furthermore, it is aggravated by vendors offering services, often referred to as "stresser" or "booter" services, which can be used to perform, at a cost, unauthorized remote DoS attacks on Internet hosts.

6.3 Mitigating DoS Attacks

Protection against DoS and its distributed counterpart (DDoS) is a challenging task, especially for IoT architectures, considering their constraints, e.g., in terms of battery, memory, and bandwidth. Few research solutions, e.g., [23], have been proposed for the protection of IoT against DoS attacks. However, such approaches do not focus on the application layer but mainly deal with network layer protection. This concurs with reports from leading industry vendors, which underscore the difficulty of defending against application attacks and, simultaneously, the rise of attacks in this category [32]. Hereunder, we present some approaches that can be adopted by smart living developers and end-users to prevent, detect, and react to semantic DoS attacks.

Data Controller Mitigations. This represents safeguards that can be adopted by IoT device manufacturers, IoT developers, and service providers.

– Authentication mechanisms. Authentication plays a critical role in the security of any IoT device and service. It is useful for detecting and blocking unauthorized devices and services [43]. Strong authentication can potentially be applied at the home gateway, with this device often acting as the gatekeeper mediating requests between connected devices, services, and users.
– Input validation. As a secure coding principle, input validation helps prevent semantic-based DoS attacks. Additionally, if input validation is performed properly, including on the HTTP headers, it can also help prevent SQL injection, script injection, and command execution attacks.
– Secure architecture. IoT devices need to sustain their availability at desired levels. A robust architecture should leverage a defense-in-depth strategy, e.g., having multiple layers of controls at the device level, cloud level, and service level, thus reducing the risk of the entire system or stack becoming unavailable.
– Secure configuration. IoT devices should be configured not to disclose information about the internal network, server software, and installed plug-ins/modules (e.g., banner information). Primarily, this is important because otherwise such information may get indexed and picked up by online scanners, which could then be used to conduct attacks.
– Security testing. Code should be inspected for vulnerabilities before it gets released to consumers. Here, software auditing and penetration testing can be used, e.g., to detect test interfaces and weak configurations that could lead to compromise. Furthermore, a company may offer incentives, e.g., through bug bounty programs, especially to help discover zero-day vulnerabilities. At the same time, it is also key for vendors to release updates, possibly on a cyclical basis, to improve the security of their products.

Consumer Mitigations. This represents controls that can be adopted by end-users, in particular by the IoT device users.

– Filtering. Filtering techniques, e.g., ingress/egress filtering or history-based filtering, prevent unauthorized network traffic from entering a protected network [27]. Filtering can be applied on residential routers and can also be used as a strategy to respond to DoS attacks.
– Intrusion prevention/detection system. Intrusion prevention/detection mechanisms, such as signature-based and anomaly-based detection, can be used to proactively block malicious traffic and threats from reaching IoT devices. Such a system could be a separate physical device connected to the residential Internet router.
– Secure configurations. Operating system and server vendor-specific security configurations should be implemented where appropriate, including the disabling or removal of unnecessary users, groups, and default accounts.
– Secure network services. To prevent unauthorized users from connecting to IoT devices and mounting an attack, remote access options (e.g., Telnet or SSH) to the router and other network devices that may have them enabled for remote administration should be disabled or otherwise securely configured.
– Secure overlay. This method involves the creation of an overlay network, typically through a firewall, on top of the IP network. The overlay network then acts as the entry point for the outside network, ensuring that only trusted traffic can enter the protected network.
– Security patches. IoT devices should be kept updated with the latest security patches as issued by the vendor to ensure that the system is not affected by malware. When updates are not available, some possible alternatives are: putting another control, e.g., a perimeter firewall or intrusion prevention/detection system, in front of the vulnerable device; changing the IP address of the affected device; disabling the compromised feature; or replacing the hardware with a newer release.

Beyond the data controller and consumer-based mitigations, we also see the need for three other requirements that must be met to ensure the overall security and resiliency of IoT devices. First, more stringent regulations and potentially certification programs are needed for IoT device manufacturers. Second, security should be integrated early, from the design stage, and a risk management strategy should be enforced, potentially as a joint effort of legislators, security experts, and manufacturers. Third, recognizing that classical security solutions are challenging to port to the IoT domain, it is crucial to increase security awareness among consumers. This could, for instance, be done through government initiatives, but manufacturers can also educate consumers about security.

https://www.shodan.io/ [accessed December 21, 2019].
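As a concrete example of the input-validation advice above, applied to the "Content-Length" header exploited in Sect. 5 (a minimal sketch; the function name and size limit are our own choices, not from any particular device firmware):

```python
def parse_content_length(value, max_body: int = 1_000_000):
    """Return a validated Content-Length as an int, or None if the header is
    absent, non-numeric, negative, or larger than the allowed body size."""
    if value is None:
        return None
    value = value.strip()
    if not value.isdigit():  # rejects "-1", "", "1e9", and other junk
        return None
    n = int(value)
    if n > max_body:
        return None
    return n
```

Rejecting the header outright, rather than passing an unchecked value to the request parser, removes the exact failure mode behind the Mongoose "Content-Length" DoS cases.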

7 Conclusion and Future Work

The growth and heterogeneity of connected devices being deployed in smart living spaces, in particular inside homes, raise the importance of assessing their security. In this paper, we conducted a vulnerability assessment focusing on the availability of Internet-connected devices. The experiment was carried out using OpenVAS and featured five commercial off-the-shelf IoT devices: a gaming console, media player, lighting system, connected TV, and IP camera. The attained results indicate that the majority of the tested devices are prone to severe forms of semantic DoS attacks. Exploiting these vulnerabilities may lead to a complete compromise of the security of the entire smart living system. This indicates the gravity of the current situation, serving as a catalyst to raise awareness and stimulate further discussion of DoS-related issues within the IoT community. Furthermore, to understand the root causes of successful attacks, we analyzed the payload code, profiled the attacks, and proposed some mitigations that can be adopted by smart living developers and consumers. As part of future work, we intend to generalize this study in three areas. First, we plan to include a broader selection of devices, including routers. Routers tend to be one of the most vulnerable components, which a successful attack can leverage to potentially disable legitimate access to the entire smart connected home. Second, we aim to consider an attack model in which the malicious threat agent is located remotely behind a cloud or service provider infrastructure. Finally, we plan to research methods that can proactively allow for the detection of DoS attacks. Possibly, this will involve the use of machine learning to learn a baseline security profile for each device.

Acknowledgments. This work has been carried out within the research profile "Internet of Things and People," funded by the Knowledge Foundation and Malmö University in collaboration with 10 industrial partners.

References

1. Alanazi, S., Al-Muhtadi, J., Derhab, A., Saleem, K., AlRomi, A.N., Alholaibah, H.S., Rodrigues, J.J.: On resilience of wireless mesh routing protocol against DoS attacks in IoT-based ambient assisted living applications. In: 17th International Conference on E-health Networking, Application & Services (HealthCom), pp. 205–210. IEEE (2015)
2. Alhazmi, O.H., Woo, S.-W., Malaiya, Y.K.: Security vulnerability categories in major software systems. Commun. Netw. Inf. Secur. 2006, 138–143 (2006)
3. Andersson, S., Josefsson, O.: On the assessment of denial of service vulnerabilities affecting smart home systems (2019)
4. Arboi, M.: Format string on HTTP method name. https://vulners.com/openvas/OPENVAS:11801
5. Arboi, M.: HTTP unfinished line denial. https://vulners.com/openvas/OPENVAS:136141256231011171


6. Arboi, M.: HTTP Windows 98 MS/DOS device names DoS. https://vulners.com/openvas/OPENVAS:136141256231010930
7. Arboi, M.: Jigsaw webserver MS/DOS device DoS. https://vulners.com/openvas/OPENVAS:11047
8. Arboi, M.: Linksys WRT54G DoS. https://vulners.com/openvas/OPENVAS:136141256231011941
9. Arboi, M.: LiteServe URL decoding DoS. https://vulners.com/openvas/OPENVAS:11155
10. Barnard-Wills, D., Marinos, L., Portesi, S.: Threat landscape and good practice guide for smart home and converged media. European Union Agency for Network and Information Security (ENISA) (2014)
11. Bonguet, A., Bellaiche, M.: A survey of denial-of-service and distributed denial of service attacks and defenses in cloud computing. Future Internet 9(3), 43 (2017)
12. Bugeja, J., Davidsson, P., Jacobsson, A.: Functional classification and quantitative analysis of smart connected home devices. In: Global Internet of Things Summit (GIoTS), pp. 1–6. IEEE (2018)
13. Carl, G., Kesidis, G., Brooks, R.R., Rai, R.: Denial-of-service attack-detection techniques. IEEE Internet Comput. 10(1), 82–89 (2006)
14. Douligeris, C., Mitrokotsa, A.: DDoS attacks and defense mechanisms: classification and state-of-the-art. Comput. Netw. 44(5), 643–666 (2004)
15. FIRST: CVSS v3.1 specification document. https://www.first.org/cvss/specification-document
16. Geneiatakis, D., Kounelis, I., Neisse, R., Nai-Fovino, I., Steri, G., Baldini, G.: Security and privacy issues for an IoT based smart home. In: 40th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO)
17. Greenbone Networks GmbH: Mereo 'GET' request remote buffer overflow vulnerability. https://vulners.com/openvas/OPENVAS:100776
18. Gordin, I., Graur, A., Potorac, A., Balan, D.: Security assessment of OpenStack cloud using outside and inside software tools. In: International Conference on Development and Application Systems (DAS), pp. 170–174. IEEE (2018)
19. Greenbone.net: 16. Performance – Greenbone Security Manager (GSM) 4 documentation. https://docs.greenbone.net/GSM-Manual/gos-4/en/performance.html#about-ports
20. Herzberg, B., Bekerman, D., Zeifman, I.: Breaking down Mirai: an IoT DDoS botnet analysis. Incapsula Blog, Bots and DDoS, Security (2016)
21. Hussain, A., Heidemann, J., Papadopoulos, C.: A framework for classifying denial of service attacks. In: Proceedings of the 2003 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, pp. 99–110. ACM (2003)
22. Karig, D., Lee, R.: Remote denial of service attacks and countermeasures. Princeton University Department of Electrical Engineering, Technical report CE-L2001-002, 17 (2001)
23. Kasinathan, P., Pastrone, C., Spirito, M.A., Vinkovits, M.: Denial-of-service detection in 6LoWPAN based Internet of Things. In: IEEE 9th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob), pp. 600–607. IEEE (2013)
24. Kolias, C., Kambourakis, G., Stavrou, A., Voas, J.: DDoS in the IoT: Mirai and other botnets. Computer 50(7), 80–84 (2017)
25. Kupreev, O., Badovskaya, E., Gutnikov, A.: DDoS attacks in Q1 2019 – Securelist. https://securelist.com/ddos-report-q1-2019/90792/


26. Liang, L., Zheng, K., Sheng, Q., Huang, X.: A denial of service attack method for an IoT system. In: 8th International Conference on Information Technology in Medicine and Education (ITME), pp. 360–364. IEEE (2016)
27. Mahjabin, T., Xiao, Y., Sun, G., Jiang, W.: A survey of distributed denial-of-service attack, prevention, and mitigation techniques. Int. J. Distrib. Sens. Netw. 13(12), 1550147717741463 (2017)
28. Mansfield-Devine, S.: DDoS goes mainstream: how headline-grabbing attacks could make this threat an organisation's biggest nightmare. Netw. Secur. 2016(11), 7–13 (2016)
29. Mirkovic, J., Reiher, P.: A taxonomy of DDoS attack and DDoS defense mechanisms. ACM SIGCOMM Comput. Commun. Rev. 34(2), 39–53 (2004)
30. Moore, D., Shannon, C., Brown, D.J., Voelker, G.M., Savage, S.: Inferring Internet denial-of-service activity. ACM Trans. Comput. Syst. (TOCS) 24(2), 115–139 (2006)
31. Mosenia, A., Jha, N.K.: A comprehensive study of security of Internet-of-Things. IEEE Trans. Emerg. Top. Comput. 5(4), 586–602 (2016)
32. Muncaster, P.: DDoS attacks jump 18% YoY in Q2 – Infosecurity Magazine. https://www.infosecurity-magazine.com/news/ddos-attacks-jump-18-yoy-in-q2/
33. OWASP: OWASP testing guide. https://www.owasp.org/images/5/56/OWASP_Testing_Guide_v3.pdf
34. Pascu, L.: The IoT threat landscape and top smart home vulnerabilities in 2018. https://www.bitdefender.com/files/News/CaseStudies/study/229/Bitdefender-Whitepaper-The-IoT-Threat-Landscape-and-Top-Smart-Home-Vulnerabilities-in-2018.pdf
35. Moore, C., Wardle, P.: Optical surgery: implanting a DropCam. https://www.defcon.org/images/defcon-22/dc-22-presentations/Moore-Wardle/DEFCON-22-Colby-Moore-Patrick-Wardle-Synack-DropCam-Updated.pdf
36. Pătru, I.-I., Carabaş, M., Bărbulescu, M., Gheorghe, L.: Smart home IoT system. In: 15th RoEduNet Conference: Networking in Education and Research, pp. 1–6. IEEE (2016)
37. SecPod: Mongoose webserver content-length denial of service vulnerability. https://vulners.com/openvas/OPENVAS:1361412562310900268
38. Offensive Security: OpenVAS 8.0 vulnerability scanning – Kali Linux. https://www.kali.org/penetration-testing/openvas-vulnerability-scanning
39. SecurityFocus: Apache mod_access_referer NULL pointer dereference denial of service vulnerability. https://www.securityfocus.com/bid/7375/exploit
40. SecurityFocus: IBM Tivoli Policy Director WebSEAL denial of service vulnerability. https://www.securityfocus.com/bid/3685/exploit
41. SecurityFocus: Polycom ViaVideo denial of service vulnerability. https://www.securityfocus.com/bid/5962/exploit
42. Tundis, A., Mazurczyk, W., Mühlhäuser, M.: A review of network vulnerabilities scanning tools: types, capabilities and functioning. In: Proceedings of the 13th International Conference on Availability, Reliability and Security, p. 65. ACM (2018)
43. Yoon, S., Park, H., Yoo, H.S.: Security issues on smarthome in IoT environment. In: Park, J., Stojmenovic, I., Jeong, H., Yi, G. (eds.) Computer Science and Its Applications, pp. 691–696. Springer, Heidelberg (2015)

Energy Efficient Channel Coding Technique for Narrowband Internet of Things

Emmanuel Migabo¹,²(B), Karim Djouani¹,², and Anish Kurien¹

¹ Tshwane University of Technology (TUT), Staatsartillerie Road, Pretoria 0001, South Africa
[email protected]
² Université de Paris-Est Créteil, Laboratoire Images, Signaux et Systèmes Intelligents (LISSI), Vitry-sur-Seine, 94400 Paris, France

Abstract. Most of the existing Narrowband Internet of Things (NB-IoT) channel coding techniques are based on repeating transmission data and control signals as a way to enhance the network's reliability and thereby enable long-distance transmissions. However, most of these efforts come at the expense of the energy consumption of the NB-IoT network and do not always consider the channel conditions. Therefore, this work proposes a novel NB-IoT Energy Efficient Adaptive Channel Coding (EEACC) scheme. The EEACC approach is a two-dimensional (2D) approach which not only selects an appropriate channel coding scheme based on the estimated channel conditions, but also minimizes the transmission repetition number under a pre-assessed probability of successful transmission. It is aimed at enhancing the energy efficiency of the network by dynamically selecting the appropriate Modulation Coding Scheme (MCS) number and efficiently minimizing the transmission repetition number. Link-level simulations are performed under different channel conditions (good, medium, or bad) in order to assess the performance of the proposed uplink adaptation technique for NB-IoT. The obtained results demonstrate that the proposed technique outperforms the existing Narrowband Link Adaptation (NBLA) as well as the traditional repetition schemes in terms of the achieved energy efficiency, reliability, latency, and network scalability.

Keywords: Link adaptation · Adaptive · Energy efficiency · Data rates · Throughput · Modulation Coding Scheme (MCS) · Repetition number · Narrowband IoT (NB-IoT)

1 Introduction

© Springer Nature Switzerland AG 2020. K. Arai et al. (Eds.): SAI 2020, AISC 1229, pp. 445–466, 2020. https://doi.org/10.1007/978-3-030-52246-9_33

Low Power Wide Area Networks (LPWAN) are very promising Internet of Things (IoT) technologies for future wireless communications. In the recent decade, LPWANs have developed rapidly and have, therefore, been catching the attention of many researchers around the globe. A study by


[2] has identified the Narrowband Internet of Things (NB-IoT) and Long Range (LoRa) as the two leading LPWAN technologies within the licensed and the unlicensed bands, respectively, towards enabling the future of the Internet of Things. However, due to the observed rapid growth in the number of connected IoT devices, a couple of issues have been identified, among which is the energy consumption of the network, which is directly related to its lifetime [3].

Narrowband Internet of Things (NB-IoT) is a new narrowband radio technology introduced in Third Generation Partnership Project (3GPP) Release 13 towards the 5th generation evolution for providing low-power wide-area IoT [9]. Techniques for enhancing the performance of NB-IoT wireless communication systems are being studied in current research. Specifically, in [4], the authors presented a systematic review of IoT which includes its different definitions, key technologies, open issues and major challenges. Furthermore, in [5], the authors provided the first systematic survey of NB-IoT in industry. It reviewed extensive research, key enabling technologies and major industrial NB-IoT applications, and identified research trends and challenges.

At a recent plenary meeting in South Korea, 3GPP completed the standardization of NB-IoT [5], in which NB-IoT is regarded as a very important technology and a large step in the 5G IoT evolution. Industries, including Ericsson, Nokia, and Huawei, have shown great interest in NB-IoT as part of 5G systems and have focused significant effort on standardization. In 3GPP standardization, repeating transmission data and the associated control signaling several times has been adopted as a base solution to achieve coverage enhancement for NB-IoT [1].
However, repeated transmissions come with a multiplicative energy cost. It has been clearly demonstrated by several wireless network energy consumption modelling studies, such as [6] and [7], that the major contributors to the overall average energy consumption of a wireless network are the energy consumed during transmission and the energy consumed during reception, according to Eq. 1,

E_{Trans} = \frac{(P_{Tx} + P_{Rx}) \times M}{B \times P_{st}}    (1)

where,
– E_{Trans} is the energy consumed by the transceiver circuit on a single wireless link,
– P_{Tx} is the transmission power,
– P_{Rx} is the reception power,
– M is the number of transmitted bits,
– B is the transmission bit rate and
– P_{st} is the probability of successful transmission.

This high energy consumption is what the present article mitigates by proposing an adaptive, energy-efficient channel coding approach.
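As a worked example, Eq. 1 can be evaluated numerically. The sketch below uses purely illustrative power, message-size and rate values (none of them taken from the paper):

```python
def transmission_energy(p_tx, p_rx, m_bits, bit_rate, p_st):
    """Per-link transceiver energy of Eq. 1: (P_Tx + P_Rx) * M / (B * P_st).

    m_bits / bit_rate is the time on air of an M-bit message, and the
    1 / p_st factor scales the cost by the expected number of attempts
    needed for one successful transmission."""
    return (p_tx + p_rx) * m_bits / (bit_rate * p_st)

# Illustrative values: 200 mW Tx power, 100 mW Rx power, a 1000-bit
# message at 250 kbit/s, and an 80% probability of success.
energy_joules = transmission_energy(0.2, 0.1, 1000, 250e3, 0.8)
```

Note how halving P_st doubles the expected energy cost, which is exactly the effect an adaptive scheme tries to limit by keeping the probability of successful transmission high.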


It is also very important to note that, on the one hand, the choice of different Modulation Coding Scheme (MCS) levels considerably influences the overall performance of the NB-IoT system in terms of its reliability, energy efficiency, scalability and latency. In fact, the use of a low MCS coupled with high transmit power has been demonstrated to improve transmission reliability and, therefore, enhance the network coverage in terms of longer transmission distance and immunity to noise. However, this reduces the network's throughput and causes the overall energy consumption of the network to remain significantly high: although the number of repeated re-transmissions is significantly cut down, the transceiver's energy consumption as modelled by Eq. 2 remains quite high. This model is based on the fact that the NB-IoT receiving node has three operating states: it is either in synchronization mode, in active mode (handling the received packets), or in idle mode. Most of its energy consumption normally occurs in the active state, a medium amount in the synchronization state, and a lower amount in the idle state [11].

E_{Rx} = K P_{Rx} t_{synch} + \sum_{k=1}^{K} P_{sleep} t_{sleep}^{k} + P_{Idle} t_{active}^{K}    (2)

where,
– t_{synch} is the synchronization time,
– K is the number of cycles or iterations involved in the synchronization process for the connection to be effectively established,
– t_{sleep}^{k} is the time spent in the sleep state in reception cycle k,
– t_{active}^{K} is the total active time during all the K cycles and
– P_{Rx}, P_{sleep} and P_{Idle} are the power values for the receiving, sleeping and idle states, respectively.

On the other hand, according to 3GPP Release 13, repeating transmission data or control signals has been selected as a promising approach to enhance the coverage of NB-IoT systems, since a higher repetition number enhances transmission reliability, but it results in quite significant spectral efficiency loss [12]. Thus, the present work proposes a 2-dimensional channel coding and link adaptation scheme capable of providing a trade-off between the transmission reliability and the throughput of the system by selecting the suitable MCS on one hand and the appropriate transmission repetition number on the other hand. This work is motivated by the observation that most existing link adaptation schemes found in the literature are solely focused on the selection of a suitable MCS value, without consideration of the repetition number which, as demonstrated in the paragraphs above, is a key factor in addressing the energy efficiency of the NB-IoT network.
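The receiver-side model of Eq. 2 can likewise be sketched in code. The function below is a direct transcription of the three terms (synchronization, per-cycle sleep, total active time); all numeric values in the test are hypothetical:

```python
def receiver_energy(k, p_rx, t_synch, p_sleep, t_sleep_per_cycle, p_idle,
                    t_active_total):
    """Receiver energy of Eq. 2: K * P_Rx * t_synch
    + sum over the K cycles of P_sleep * t_sleep^k
    + P_Idle * t_active^K (total active time over all K cycles)."""
    assert len(t_sleep_per_cycle) == k, "one sleep interval per cycle"
    sync_term = k * p_rx * t_synch
    sleep_term = sum(p_sleep * t for t in t_sleep_per_cycle)
    active_term = p_idle * t_active_total
    return sync_term + sleep_term + active_term
```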

2 Methods/Experimental Approach

In this paper, we focus on analyzing key technologies in uplink scheduling and designing an uplink link adaptation scheme for NB-IoT systems. The remainder


of the article is organized as follows: First, in Sect. 3, the article provides a comprehensive literature survey that concisely and critically discusses the different existing channel coding and MCS selection approaches found in the literature. Then, in Sect. 4, the proposed algorithm is presented, modelled and discussed. This is followed by a detailed description of the methods used for validation of the proposed approach in Sect. 5, in which a comprehensive table of key simulation parameters is provided and the different design considerations and assumptions are presented and justified. Then, also in Sect. 5, the obtained results are presented, critically analysed and discussed: we present various simulation results obtained using MATLAB to demonstrate the effectiveness of the proposed scheme and, furthermore, the existing approaches are compared under the same settings to show the superiority of the proposed EEACC scheme over the traditional repetition-based as well as the NBLA schemes. Finally, in Sect. 6, conclusions are drawn based on the research objectives and a recommendation for future work is formulated.

3 Related Work

Low-Power Wide-Area Network (LPWAN) technologies, both in the licensed and in the unlicensed bands, such as NB-IoT, Long Range (LoRa) and many others, strive to be energy efficient over very long distances [13]. NB-IoT is designed for long-life devices and targets a battery lifetime of more than 10 years. To this end, the careful design of smart channel coding schemes has been identified as a potential approach towards enhancing NB-IoT energy efficiency. Channel coding is one of the most important aspects of digital communication systems and can be considered the main difference between analog and digital systems, making error detection and correction possible [14].

In its current form, NB-IoT reuses the Long Term Evolution (LTE) design extensively, including the numerologies, downlink orthogonal frequency-division multiple-access (OFDMA), uplink single-carrier frequency-division multiple-access (SC-FDMA), channel coding, rate matching, interleaving, etc. To the best of our knowledge, the only reason for this extensive reuse of the LTE channel coding was to significantly reduce the time required to develop the full NB-IoT specifications [15]. However, there are issues very specific to NB-IoT network designs, among which is the limited energy capacity issue. Therefore, researchers [16,17] have identified a crucial need to develop novel channel coding techniques specific to NB-IoT with different design goals. The research work introduced by the present article has identified the energy efficiency issue as a potential research problem. Before us, however, other researchers have looked into this problem from different perspectives, and their approaches are concisely reported in the next paragraphs.

3.1 Why is Energy Efficient Channel Coding Important for NB-IoT?

One of the most important issues in the design of NB-IoT systems is error correction because, if well designed, the channel coding technique for NB-IoT can help save a considerable amount of energy by significantly reducing the number of required re-transmissions. This justifies the fact that a good number of research works have proposed channel coding techniques with the objective of achieving energy efficiency. In its current form, the NB-IoT uplink baseband processing can be divided into channel coding and modulation. In the case of the NB-IoT uplink, the channel coding includes Cyclic Redundancy Check (CRC) generation and attachment, turbo or convolutional coding, and rate matching.

Reliability is a key performance criterion in any form of wireless communication, including NB-IoT. It is therefore very crucial that any NB-IoT design ensures reliable end-to-end communication between terminals. As in most engineering designs, striving to enhance one performance metric often comes with an associated cost in terms of another. Channel coding is often used in communication systems to ensure resilience to channel impairments and reliable link transmissions. Therefore, research works preceding the present one have proposed a couple of channel coding approaches to ensure that reliability. The main identified approaches have been the traditional repetition approach and the Narrowband Link Adaptation (NBLA) approach [9]. Despite their strive to enhance communication reliability, the latter have been identified to suffer significant energy costs, and that is what the present research work addresses. It strives to find a balance between ensuring reliable communication on NB-IoT uplinks and maintaining energy efficiency.

3.2 Existing NB-IoT Energy Efficient Channel Coding (CC) Approaches

From the survey of the literature, the following main approaches have been selected as the most relevant and recent works.

1. Automatic Repeat Request (ARQ) Approaches
In this category of approaches, the receiver requests re-transmission of data packets if errors are detected, using some error detection mechanism. Sami Tabbane in [16] proposes an open-loop forward error correction technique for NB-IoT networks with the objective of optimizing ARQ signaling. In this approach, signaling only needs to indicate the downlink (DL) data transfer completion and does not have to specify which particular Packet Data Units (PDU) were lost during the transmission. This reduces the complexity of the channel coding approach and, therefore, saves on computational energy consumption. This approach has demonstrated to be efficient in enhancing the data rate performance in the DL of the NB-IoT network. Due to its low complexity, this method has further proven to also enhance the energy efficiency of the NB-IoT network, as it considerably reduces the computational energy consumed by data reception at the transceiver of the IoT node. One approach proposed


in [17] consists of using a hybrid automatic repeat request (HARQ) process in scenarios where the NB-IoT network can only support half-duplex operation. The HARQ approach has demonstrated the capability of reducing the processing time at the IoT node. The results obtained in [17] prove that the use of the HARQ approach can lead to savings of up to 20% of the overall energy consumption of the network. This energy efficiency performance has also been proven not to be significantly affected by increases in the scale of the NB-IoT network when the HARQ approach is used. The authors in [18] propose a hybrid channel coding approach. It consists of signaling hybrid automatic repeat request (HARQ) acknowledgements for the narrowband physical downlink shared channel (NPDSCH) and uses a repetition code for error correction. In this case, the User Equipment (UE) can be allocated 12, 6 or 3 tones. However, the 6- and 3-tone formats are introduced for NB-IoT devices that, due to coverage limitations, cannot benefit from the higher device bandwidth allocation, and they result in higher energy consumption.

2. Forward Error Correction (FEC) Approaches
Recent research ([19–21]), published from 2011 and the first quarter of 2012 to date, has investigated the efficiency of different re-transmission and FEC techniques in NB-IoT systems. Several researchers have quantified the effect of a number of network parameters on the efficiency of error correction techniques (and their associated network costs). However, no effort has yet been made to unify these studies into a systematic approach that could help with the selection of the most effective technique given certain network conditions. The authors in [22] have proposed an improved error correction algorithm for multicast over the LTE network and, by extension, over the Narrowband IoT network. The used model assumes a random distribution of packet losses and a constant loss rate in each scenario. The model can be expanded to include different error distributions and varying loss conditions during a series of NB-IoT downlink transmissions. The obtained results demonstrate that the use of a hybrid approach (HARQ and FEC combined) outperforms both the HARQ method used alone and the FEC approach used alone in terms of energy efficiency. A research study [23] has proposed the use of an open-loop forward error correction technique as a mechanism to not only enhance the energy efficiency of the NB-IoT network, but also to concurrently achieve efficient downlink data rate performance. The benefit of this approach lies in the fact that it enables extremely reliable firmware downloads, which is an important IoT feature in a number of applications, among which are sensor network applications. Another forward error correction channel coding approach for Narrowband IoT, proposed by [25], has been specifically designed to reduce the number of re-transmission attempts. This is mainly because it has been identified and demonstrated by [28], [29] and [30] that most energy in Internet of Things and Wireless Sensor Networks (WSNs) is consumed during the transmission and reception phases.


Another quite unique forward error correction approach, which is based on algebraic-geometric theory, compares the BER performance of algebraic-geometric codes and Reed–Solomon codes at different modulation schemes over additive white Gaussian noise [26]. In this approach, it is found that there is a gain in terms of BER performance improvement, at the cost of high system complexity, when algebraic-geometric codes and Chase–Pyndiah's algorithm [27] are used in conjunction.

Two main channel coding and uplink link adaptation schemes have been found in our literature survey. The first one is what we would term the MCS-dominated approach, in which we first adjust the MCS level based on feedback signals and then adjust the repetition number. The second one is the repetition-dominated approach, in which the focus is first on determining the appropriate repetition number based on the predefined NB-IoT network design criteria, and only then on selecting the MCS level, using the currently determined repetition number as part of the decision criteria. Apart from these two dominating approaches, there exist other approaches in the literature, such as the cooperative approaches [24], in which the impact of uplink interference on resource (energy, spectral efficiency, etc.) utilization efficiency is each time investigated prior to making transmission decisions, by exploiting cooperation among base stations which needs to be designed in advance.

3.3 Efficient Selection of Modulation Coding Scheme (MCS)

The design of an energy efficient channel coding scheme for NB-IoT is directly linked to the selection of an appropriate MCS. In order to achieve long-range communication, some works on efficient NB-IoT designs found in the literature [9,31,35,36] have proposed efficient techniques for modulation scheme selection. The common idea behind most proposed approaches consists of trading off high data rate for higher energy in each transmitted bit (or symbol) at the physical layer (PHY). This design technique allows for a signal that is more immune to noise and that can travel longer transmission distances. Therefore, in general, the identified aim of most designs is to achieve a link budget of 150 ± 10 dB, which can translate into a few kilometers in urban areas and tens of kilometers in rural areas, respectively [9]. Encoding more energy into the signal's bits (or symbols) results in very high decoding reliability on the receiver side. Typical receiver sensitivities could, therefore, be as low as −130 dBm.

Modulation techniques used for most LPWAN technologies can be classified into two main categories, namely the narrowband technique and the spread spectrum technique. Spread spectrum techniques spread a narrowband signal over a wider frequency band but with the same power density. The actual transmission is a noise-like signal that is harder to detect by an eavesdropper, more resilient to interference, and robust to jamming attacks (secure) [37]. As opposed to other LPWAN technologies such as LTE Cat-M1, which mainly use spread spectrum modulation techniques, most works on NB-IoT designs found in the literature [9,37] and [35] propose the use of narrowband modulation techniques. In general, narrowband modulation techniques provide a high


link budget, often by using a signal band of less than 25 kHz. They are very efficient at sharing the frequency spectrum between multiple links, and the noise level experienced within each individual narrow band is very small. In order to further reduce the experienced noise, some LPWAN technologies, such as SIGFOX, WEIGHTLESS-N and TELENSA, use an ultra narrow band (UNB) as narrow as 100 Hz [35]. They are, therefore, capable of achieving longer transmission ranges. One of the major differences between narrowband modulation techniques and spread spectrum techniques remains that spread spectrum techniques often require more processing gain on the receiver side to decode the received signal (below the noise floor), while no processing gain through frequency despreading is required to decode the signal at the receiver in the case of narrowband modulation techniques, resulting in simpler and less expensive transceiver designs. Different variants of spread spectrum techniques, such as Chirp Spread Spectrum (CSS) and Direct Sequence Spread Spectrum (DSSS), are used by existing standard LPWA technologies.

3.4 Repetition-Dominated Channel Coding Approaches

In the repetition-dominated method, based on the feedback ACK/NACKs, we first adjust the repetition number and then update the MCS level. Repetition is the key solution adopted by most NB-IoT designs with the objective of achieving enhanced coverage with low complexity. On the other hand, for one complete transmission, repetition is required to be applied to both the data transmission and the associated control signaling transmission. Therefore, in most NB-IoT systems, before each Narrowband Physical Uplink Shared Channel (NPUSCH) transmission, the corresponding control signaling, which includes the Resource Unit (RU) number, the chosen MCS and the repetition number, is required to be sent via the Narrowband Physical Downlink Control Channel (NPDCCH) [9]. The sequence of transmission with repetition during a single transmission is illustrated in Fig. 1.

[Figure: time–frequency diagram showing NPDCCH repetitions of 1 ms each, a gap of max{3 ms, time until the beginning of the next search space}, and repetitions lasting {8, 16, 32, 64} ms as indicated in the DCI.]

Fig. 1. Illustration of data repetition during a single transmission

Figure 1 clearly illustrates repetition in NB-IoT, where both the NPDCCH and the NPUSCH transmission blocks, made of the same content (as highlighted by the use of the same color), are repeated four times within the duration of a single transmission. It is also very important to point out that, according to


the 3GPP TS 36.211 standard [32], the repetition number for the same block in NB-IoT can only be selected among 1, 2, 4, 8, 16, 32, 64 or 128.

3.5 The NBLA and Its Open-Loop Power Control Approach

The Narrowband Link Adaptation (NBLA) approach is mainly an inner-loop link adaptation scheme focused on addressing the rapid changes often observed in the transmission Block Error Ratio (BLER) of NB-IoT systems. The NBLA, as proposed by [9], works in the following manner: during a single period T, all transmission ACK/NACKs are counted in order to work out an estimated value of the BLER. Based on the obtained BLER value, the transmission repetition number is adjusted accordingly in order to cope with variations in the channel condition, ensure a lower probability of failed transmission and, in this way, save on the energy consumption of the NB-IoT network.

The main challenge faced by the NBLA approach resides in the fact that, despite its effort to estimate the channel condition based on the computed BLER value, because of the very narrow bandwidth and the considerably unstable channel conditions of NB-IoT systems, the NBLA power control strategy often fails to ensure reliable uplink transmissions. This results in significant energy wastage due to repeated unsuccessful transmissions. The development of a new, more energy efficient approach, capable of adaptively addressing the variations of channel conditions by looking not only at the one-dimensional aspect of previous BLER performance but also at the MCS level when selecting an appropriate repetition number, is highly needed. This is expected to contribute towards adequate energy management within NB-IoT systems.

3.6 NBLA Open-Loop Power Control

In the uplink scenario, the NB-IoT network normally only supports open-loop power control, as stated in [32]. This open-loop exclusivity is mainly motivated by the limited energy and processing capacity of most NB-IoT nodes, such as sensor nodes, which most of the time run on batteries. The manner in which this open-loop power control is implemented within NB-IoT nodes is that, based on the MCS and RU information alone, the NB-IoT node works out an estimate of the power required to achieve an uplink transmission. This means that the Base Station (BS) (the eNB in the case of LTE) does not send any form of power control command (information) to the NB-IoT node prior to the uplink transmission. According to the research studies carried out by [32] and [34], the transmit power P_{NPUSCH,c}(i) required by an NB-IoT node within a Narrowband Physical Uplink Shared Channel (NPUSCH) during an uplink session, within a given uplink slot i for a serving cell c, given that the number of repetitions of the allocated NPUSCH RUs is less than 2, can be modelled as follows,

P_{NPUSCH,c}(i) = \min\{P_{CMAX,c}(i),\; 10 \log_{10}(M_{NPUSCH,c}(i)) + P_{O\_NPUSCH,c}(j) + \alpha_c(j) \cdot PL_c\}    (3)


where,
– P_{CMAX,c}(i) is the configured NB-IoT node uplink transmit power in slot i for serving cell c,
– M_{NPUSCH,c}(i) takes its possible values in {1/4, 1, 3, 6, 12} as defined in [34],
– P_{O\_NPUSCH,c}(j) is a parameter composed of the sum of two components from the higher layers within the NPUSCH data re-transmission channel model,
– PL_c is the downlink path loss estimate calculated in the UE for serving cell c and
– \alpha_c(j) is a coefficient configured by higher layers based on the estimated total loss over the link.

Should the number of repetitions of the allocated NPUSCH RUs be higher than or equal to 2, then P_{NPUSCH,c}(i) can simply be modelled as,

P_{NPUSCH,c}(i) = P_{CMAX,c}(i)    (4)
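Taken together, Eqs. 3 and 4 amount to a small decision rule, sketched below in Python (power values in dBm; the example numbers are illustrative and not taken from the standard):

```python
import math

def npusch_tx_power(p_cmax_dbm, m_npusch, p_o_npusch_dbm, alpha, path_loss_db,
                    repetitions):
    """Open-loop NPUSCH transmit power per Eqs. 3 and 4.

    With 2 or more repetitions the node simply transmits at the configured
    maximum power (Eq. 4); otherwise the open-loop estimate of Eq. 3 applies,
    capped at the configured maximum."""
    if repetitions >= 2:
        return p_cmax_dbm  # Eq. 4
    open_loop = (10 * math.log10(m_npusch)  # bandwidth (RU) factor
                 + p_o_npusch_dbm           # higher-layer target power
                 + alpha * path_loss_db)    # fractional path-loss compensation
    return min(p_cmax_dbm, open_loop)  # Eq. 3

# Illustrative example: 23 dBm max power, M = 1, P_O = -100 dBm,
# full path-loss compensation (alpha = 1), 120 dB path loss.
power_dbm = npusch_tx_power(23.0, 1, -100.0, 1.0, 120.0, repetitions=1)
```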

4 The Proposed Adaptive Channel Coding Technique

The objective of the EEACC approach is to design an appropriate link adaptation scheme, integrated with a proper selection mechanism for the repetition number and the MCS, for NB-IoT systems, aimed at achieving higher energy efficiency, long transmission distance and high throughput while maintaining good transmission reliability.

The channel coding approach proposed by the present study is a 2-dimensional (2D) link adaptation approach, which translates into a bi-objective optimization problem aiming to enhance the NB-IoT network coverage without compromising its energy efficiency performance. The proposed adaptive channel coding technique is twofold. It is composed of an inner-loop and an outer-loop adaptation scheme, both aimed at enhancing the energy efficiency as well as the throughput of the network. In particular, the inner-loop adaptation scheme is designed based on the channel conditions to guarantee transmission reliability and, consequently, enhance the data rate of the network. The outer-loop scheme, on the other hand, is designed based on the Modulation Coding Scheme (MCS) number and the transmission repetition number.

Because the channel conditions of NB-IoT systems are known to be quite unstable [10], as their transmission block error ratio (BLER) rapidly changes, the present approach introduces an inner-loop link adaptation procedure which focuses on dynamically varying the transmission repetition number based on a periodically sampled and estimated channel condition, quantified by means of its BLER performance. The current BLER performance is each time used to predict the channel conditions of the next transmission, based on the sequential channel estimation in the presence of random phase noise in NB-IoT systems as proposed by [8]. In this channel estimation model, the main consideration is that the coherence time of the fading channel is assumed to be fairly long due to the assumed low mobility of NB-IoT user equipments (UEs). Therefore, phase


noise is considered before combining the channel estimates over repetitions, as a mechanism to improve the accuracy of the approach. With phase noise φ_l[n] caused by oscillator fluctuations and a residual frequency offset f_e normalized by the sub-carrier frequency, the time-domain baseband received signal at the nth sampling time of the lth orthogonal-frequency-division-multiplexing (OFDM) symbol can be expressed as,

s_{l(Estimated)}[n] = e^{j\phi_l[n]} \left( \frac{1}{\sqrt{N}} \sum_{k=-N/2}^{N/2-1} S_l[k] e^{\frac{2\pi j n (f_e + k)}{N}} \right) * h_l[n] + w(n)    (5)

where "∗" denotes linear convolution, S_l[k] is the transmit symbol on the lth OFDM symbol and the kth sub-carrier, h_l[n] denotes the discrete fading channel taps, w(n) is additive white Gaussian noise (AWGN), and N is the Fast Fourier Transform (FFT) size.

The purpose of the inner-loop link adaptation is to keep the transmission BLER at the target value. Accordingly, we refer to the MCS level selection and the repetition number determination as the outer-loop link adaptation. The proposed link adaptation method is presented as follows.

4.1 The Inner Loop Approach

As discussed in a previous section, the inner-loop link adaptation approach is designed to cope with rapid transmission BLER fluctuations. The proposed inner-loop approach works as follows:

– In one period T, all transmissions of both positive and negative acknowledgments (ACKs and NACKs, respectively) are counted to work out the average BLER for that period, which is then recorded. Specifically, the appropriate evaluation period T for LTE systems is chosen to be in the order of tens of milliseconds, while hundreds of milliseconds are used for NB-IoT systems. This selection is mainly motivated by the realistically expected traffic rate of LTE systems normally being higher than that of their NB-IoT counterpart.
– At the end of the considered period, the obtained BLER value is passed to the outer loop and used as a parameter in the selection of the appropriate transmission repetition number, as labelled in the description of the algorithm presented in Table 1.
– If the current BLER (the one of the present period) is found to be less than 7%, the repetition number for the next transmission should be decreased, because it means that the channel conditions are good and, therefore, the probability of successful transmission is high since there are fewer channel impairments.
– On the other hand, if the BLER is found to be between 7% and 13%, the channel is considered to be in the medium condition. Our proposed link adaptation approach proposes that the repetition number should be maintained in the next period.


– Finally, if the current BLER is greater than 13%, the channel is considered to be in bad condition and, therefore, the probability of successful transmission is reduced. This requires that the number of transmission repetitions be increased in order to guarantee a certain level of transmission reliability.
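The inner-loop rule described in the bullets above can be condensed into a few lines. The sketch below assumes the 3GPP-permitted repetition numbers (powers of two from 1 to 128) and the halve/keep/double adjustment around the 7% and 13% thresholds; the exact update steps used in the paper's algorithm (Table 1) may differ, so this is only an approximation:

```python
def inner_loop_update(bler, n, n_min=1, n_max=128):
    """Adjust the repetition number N from the measured BLER of period T.

    BLER < 7%          -> good channel:   halve N
    7% <= BLER <= 13%  -> medium channel: keep N
    BLER > 13%         -> bad channel:    double N
    Halving/doubling keeps N within the valid set {1, 2, 4, ..., 128}."""
    if bler < 0.07:
        return max(n_min, n // 2)
    if bler <= 0.13:
        return n
    return min(n_max, n * 2)
```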

4.2 The Outer Loop Link Adaptation Approach

The outer-loop link adaptation approach consists of the MCS level selection, which is performed as follows:

– If a certain number of ACKs are successively successfully decoded at the NB-IoT receiver, then the MCS level is increased.
– On the other hand, when a certain number of NACKs are successively decoded at the NB-IoT receiver, then the MCS level is decreased.

Generally, the required number of ACKs is greater than that of NACKs, to ensure a slow increase of the MCS level with ACK feedback and a quick decrease of the MCS level with NACK feedback. Because of the narrow band and low data rate of NB-IoT systems, the settings for LTE systems might no longer be applicable. Therefore, two aperiodic, event-based actions are defined for the EEACC approach, namely the fast upgrade (FUG) and the emergency downgrade (EDG). In the event of an FUG, the MCS is increased by one, while it is decreased by one in the event of an EDG. Thus the EEACC approach introduces a compensation factor ΔC(t), modelled as,

\Delta C(t) = \begin{cases} \min\{\Delta C(t-1) + C_{stepup},\; \Delta C_{max}\}, & \text{if HARQ feedback} = ACK; \\ \max\{\Delta C(t-1) - C_{stepdown},\; \Delta C_{min}\}, & \text{if HARQ feedback} = NACK; \\ \Delta C(t-1), & \text{if HARQ feedback} = N/A; \end{cases}    (6)

where,
– \Delta C_{max} and \Delta C_{min} are the upper and lower limits of the compensation factor \Delta C(t),
– C_{stepup} and C_{stepdown} are the incremental compensation step sizes, modelled as per the formula,

C_{stepdown} = C_{stepup} \cdot \frac{1 - BLER_{target}}{BLER_{target}}    (7)

– N/A means discontinuous transmission (DTX), which practically happens when the eNB does not detect any NPUSCH signal.
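A minimal sketch of the compensation-factor update of Eqs. 6 and 7 follows; the ΔCmax/ΔCmin bounds are left as parameters since their values are deployment-specific, and the defaults shown here are illustrative only:

```python
def update_compensation(delta_c_prev, harq_feedback, c_step_up=0.2,
                        bler_target=0.10, delta_c_min=-3.0, delta_c_max=3.0):
    """Compensation factor Delta C(t) per Eq. 6, with the step-size relation
    of Eq. 7: C_stepdown = C_stepup * (1 - BLER_target) / BLER_target.
    harq_feedback is "ACK", "NACK", or None for DTX (no NPUSCH detected)."""
    c_step_down = c_step_up * (1 - bler_target) / bler_target  # Eq. 7
    if harq_feedback == "ACK":
        return min(delta_c_prev + c_step_up, delta_c_max)
    if harq_feedback == "NACK":
        return max(delta_c_prev - c_step_down, delta_c_min)
    return delta_c_prev  # N/A (DTX): keep the previous value
```

With the 10% BLER target, each NACK pulls ΔC down nine times faster than an ACK pushes it up, which is how the update converges towards the target error rate.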

Energy Efficient Channel Coding Technique for NB-IoT


A number of simulation parameter values are selected for the simulation of the proposed algorithm. The choice of each of these parameter values is motivated by the objective of making the simulation process as realistic as possible. These parameter values include:

– A targeted Block Error Rate (BLER) value of 10%: The choice of this targeted BLER value is guided by the 3GPP standard Release 13 [32]. This targeted BLER value is considered to be the normal out-of-sync error rate condition for LTE/4G technology during Radio Link Monitoring (RLM) [33].
– The 7% and 13% thresholds are the typical values for classifying the channel state of an NB-IoT system as bad, medium or good. This choice is directly linked to the objective of maintaining the ±3% margin around the targeted 10% BLER, as per the 3GPP standard specifications [32].
– The Cstepup and Cstepdown values are the incremental compensation step sizes. As in any iterative process, a reasonable initial step size must be chosen. The authors chose an initial value for Cstepup rather than for Cstepdown, because the Cstepdown value is modelled as dependent on Cstepup.
– The initial value of 0.2 for the Cstepup parameter ensures that the MCS level is only stepped down under an initial 20% probability of communication error, since 20% is the maximum 3GPP standard BLER value for LTE/4G technology.

The proposed algorithm is summarized in Table 1. The inner loop part is covered by lines 2 → 10 of the pseudo-code presented in Table 1. In this part, the Block Error Rate (BLER) of the encoded blocks is checked every predefined period T against threshold values of 7% and 13%, which are typical values for classifying the channel state of an NB-IoT system as bad, medium or good.
Depending on the observed channel condition (good, medium or bad), the repetition number N is either reduced to half, progressively increased by one, or doubled, respectively. The pseudo-code for the outer loop section is presented in lines 11 → 52 of Table 1. This section consists of three scenarios:

– When the Modulation and Coding Scheme (MCS) level is between the two predefined threshold values (Lmin < L < Lmax). In this situation, the compensation value ΔC is updated towards its lowest or highest value depending on whether the transmitter receives a positive feedback (ACK) or a negative feedback (NACK), and the MCS level is increased or reduced by one accordingly.
– When the MCS level reaches the minimum value (Lmin). Then, based on the type of feedback received by the transmitter (ACK or NACK) and the position of the current repetition number N with respect to the predefined bounds Nmin and Nmax, the MCS level is either increased by one or maintained, while the transmission repetition number N is reduced by half or doubled accordingly.


Table 1. The link adaptation pseudo-code

Algorithm: Proposed Uplink Link Adaptation Algorithm for NB-IoT Systems
1:  Initialization: BLERtarget = 10%, Cstepup = 0.2,
    Cstepdown = Cstepup · (1 − BLERtarget)/BLERtarget, ΔC = 0, ΔCmax = +5, ΔCmin = −5,
    MCS level L and its bounds Lmax, Lmin, repetition number N and its bounds Nmax, Nmin.
    The MCS level and repetition number are initialized empirically based on the channel condition.
2:  if period T has expired then
3:      if BLER < 7% then
4:          N = N/2
5:      else if 7% < BLER < 13% then
6:          N = N + 1
7:      else if BLER > 13% then
8:          N = 2N
9:      end if
10: end if
11: if L > Lmin and L < Lmax then
12:     if HARQ feedback = ACK then
13:         ΔC = min{ΔC + Cstepup, ΔCmax}
14:     else if HARQ feedback = NACK then
15:         ΔC = max{ΔC − Cstepdown, ΔCmin}
16:     end if
17:     if ΔC = ΔCmax then
18:         L = L + 1
19:     else if ΔC = ΔCmin then
20:         L = L − 1
21:     end if
22: else if L = Lmin then
23:     if HARQ feedback = ACK then
24:         if N = Nmin then
25:             ΔC = min{ΔC + Cstepup, ΔCmax}
26:             if ΔC = ΔCmax then
27:                 L = L + 1
28:             end if
29:         else if N > Nmin then
30:             N = N/2
31:         end if
32:     else if HARQ feedback = NACK then
33:         if N = Nmax then
34:             the current channel condition is very bad
35:         else if N < Nmax then
36:             N = 2N
37:         end if
38:     end if
39: else if L = Lmax then
40:     if HARQ feedback = ACK then
41:         if N = Nmin then
42:             the current channel condition is very good

(continued)



Table 1. (continued)
43:         else if N > Nmin then
44:             N = N/2
45:         end if
46:     else if HARQ feedback = NACK then
47:         ΔC = max{ΔC − Cstepdown, ΔCmin}
48:         if ΔC = ΔCmin then
49:             L = L − 1
50:         end if
51:     end if
52: end if
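The pseudo-code of Table 1 translates almost line-for-line into the following sketch (our own rendering with illustrative names; like the pseudo-code itself, it leaves N unclamped in the steps where Table 1 does):

```python
def inner_loop(bler, n):
    """Inner loop (Table 1, lines 2-10): adapt the repetition number N
    from the BLER measured over the last period T."""
    if bler < 0.07:          # good channel: halve the repetitions
        return n // 2
    elif bler < 0.13:        # medium channel: one more repetition
        return n + 1
    else:                    # bad channel: double the repetitions
        return 2 * n


def outer_loop(L, n, feedback, dc, L_min=4, L_max=12, n_min=2, n_max=10,
               c_up=0.2, c_down=1.8, dc_max=5.0, dc_min=-5.0):
    """Outer loop (Table 1, lines 11-52): adapt the MCS level L and, at
    the MCS boundaries, the repetition number N.  Returns (L, n, dc)."""
    if L_min < L < L_max:                       # lines 11-21
        if feedback == "ACK":
            dc = min(dc + c_up, dc_max)
        elif feedback == "NACK":
            dc = max(dc - c_down, dc_min)
        if dc == dc_max:                        # fast upgrade (FUG)
            L += 1
        elif dc == dc_min:                      # emergency downgrade (EDG)
            L -= 1
    elif L == L_min:                            # lines 22-38
        if feedback == "ACK":
            if n == n_min:
                dc = min(dc + c_up, dc_max)
                if dc == dc_max:
                    L += 1
            elif n > n_min:
                n //= 2
        elif feedback == "NACK":
            if n < n_max:                       # n == n_max: channel very bad
                n *= 2
    elif L == L_max:                            # lines 39-52
        if feedback == "ACK":
            if n > n_min:                       # n == n_min: channel very good
                n //= 2
        elif feedback == "NACK":
            dc = max(dc - c_down, dc_min)
            if dc == dc_min:
                L -= 1
    return L, n, dc
```

Note that the equality tests `dc == dc_max` and `dc == dc_min` mirror the pseudo-code's "if ΔC = ΔCmax/ΔCmin"; they only trigger because the `min`/`max` clamps pin ΔC exactly at its bounds.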

– Similarly, when the MCS level reaches the maximum value (Lmax), then once again, based on the type of feedback received by the transmitter (ACK or NACK) and the position of the current repetition number N with respect to the predefined minimum value Nmin, a new MCS level is defined as a decrease by one, or is maintained, accordingly.

It is important to note that the validity of our proposed EEACC approach rests on the accuracy of our channel condition assessment. Some key characteristics of this channel condition assessment have been empirically defined and standardized under 3GPP standard Release 13. The parameter ΔC(t), as used within our proposed approach, plays the role of a channel characteristic compensation value. In other terms, it is the amount of channel noise and interference that needs to be suppressed from the currently assessed channel conditions in order to move the classification to the bad, medium or good condition as defined in the standard. The Cstepdown and Cstepup parameters are, therefore, the exact channel coding scheme decrease or increase needed to achieve a ΔC(t) channel compensation.

5 Performance Evaluation

Evaluation Setup. In order to assess and validate the performance of the proposed EEACC approach, simulations are carried out under the NB-IoT network conditions summarized by the set-up parameters in Table 2. Furthermore, it is important to note that a 4:1 Multiple Input Multiple Output (MIMO) scheme with the Alamouti decoding technique, as well as an eNB antenna design, is considered in these simulations. Because the study is carried out in the context of a Narrowband Internet of Things (NB-IoT) network deployed on an LTE cellular network, the Quadrature Phase Shift Keying (QPSK) LTE modulation scheme is considered for our study. Normally, eNodeBs in an LTE network are built to support the QPSK, 16-QAM and 64-QAM modulation techniques in the downlink direction. But considering that the end nodes (NB-IoT motes) are often computation-limited



devices, most of the time microcontroller-based (sensor nodes, smart water meters, etc.), the choice of modulation scheme is often oriented towards QPSK.

Obtained Results and Discussion. The obtained results are as follows. First, the BLER performance of our proposed adaptive channel coding approach (EEACC) is assessed against the NB-IoT Link Adaptation (NBLA) scheme as well as the traditional repetition scheme, as the transmission power is varied. The obtained results are illustrated in Fig. 2. For a fair performance evaluation and comparison between the three considered schemes (the NBLA, the traditional repetition approach and our proposed EEACC approach), all three systems use the same block size. As per the 3GPP TS 36.213 standard [32], an NB-IoT device can select a downlink Transport Block Size (TBS) on the MAC layer from 2 bytes (16 bits) up to 85 bytes (680 bits). For our simulation, we use an average block size of 44 bytes.

Table 2. Key simulation parameters

System bandwidth: 200 kHz
Carrier frequency: 900 MHz
Subcarrier spacing: 15 kHz
Channel estimation for NPDCCH: Sequential channel estimation in the presence of random phase noise [8]
Interference rejection combiner: MRC
Number of Tx antennas: 1
Number of receive antennas: 2
Frequency offset: 200 Hz
Lmin: 4
Lmax: 12
Nmin: 2
Nmax: 10
Time offset period: 2.5 µs
Network deployment model: Mesh network (Meshnet)
Channel model: Additive white Gaussian noise (AWGN)
Node mobility: Static nodes (no mobility considered)
Fading: No fading channel considered
LTE modulation scheme: Quadrature phase shift keying (QPSK)
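As a point of reference for this setup, a minimal Monte Carlo estimate of the BLER of uncoded QPSK over AWGN for the 44-byte block size can be written as follows (an illustrative simplification of the simulated system: no channel coding, repetitions, or channel estimation are modelled, and the function name is our own):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_bler(snr_db, n_blocks=200, block_bits=352):
    """Monte Carlo BLER of uncoded QPSK over AWGN for 44-byte blocks.

    snr_db is Es/N0 in dB (two bits per unit-energy QPSK symbol)."""
    snr = 10 ** (snr_db / 10)
    sigma = np.sqrt(1 / (2 * snr))          # noise std per real dimension
    block_errors = 0
    for _ in range(n_blocks):
        bits = rng.integers(0, 2, block_bits)
        # Gray-mapped QPSK: even bits -> I rail, odd bits -> Q rail.
        sym = ((1 - 2 * bits[0::2]) + 1j * (1 - 2 * bits[1::2])) / np.sqrt(2)
        rx = sym + sigma * (rng.standard_normal(sym.size)
                            + 1j * rng.standard_normal(sym.size))
        det = np.empty(block_bits, dtype=int)
        det[0::2] = rx.real < 0             # hard decision per rail
        det[1::2] = rx.imag < 0
        block_errors += int(np.any(det != bits))
    return block_errors / n_blocks
```

At the deeply negative SNRs of Fig. 2, this uncoded baseline fails on essentially every block, which is precisely why repetition and adaptive coding are needed in NB-IoT.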



Fig. 2. BLER performance comparison as power is varied (BLER vs. SNR in dB; curves: EEACC, NBLA, traditional repetition)

As can be clearly observed in Fig. 2, the Block Error Rate (BLER) of the traditional repetition approach, the Narrowband IoT Link Adaptation (NBLA) approach and our proposed Energy Efficient Adaptive Channel Coding (EEACC) approach all decrease as the Signal-to-Noise Ratio (SNR) increases. As the slope of the EEACC curve shows, the BLER of the EEACC decreases much faster than that of the others beyond an SNR of −14 dB. This can be explained by the EEACC's repeated-transmission capability coupled with its smart MCS selection, both adapted to the channel conditions. The EEACC's transmissions are already quite reliable, to the extent that an increase in transmission power PTx, which results in an increase in SNR (assuming that for a single period T the channel conditions remain more or less the same), simply reduces the number of possible transmission failures even further. It is also important to observe that, although for SNR values from −18 dB to −14 dB all the BLER values are quite similar, the BLER of the NBLA and traditional repetition approaches is noticeably higher than that of the proposed EEACC approach. Moreover, the BLER curves of the NBLA and the traditional repetition approaches remain close to each other as the transmission power is increased, while that of the EEACC drops significantly faster, to as low as 0.18% at an SNR of −4 dB. Secondly, the average energy consumption of the NB-IoT network is computed as the number of NB-IoT nodes is varied, for each of the three approaches, namely the traditional repetition, the NBLA and the proposed EEACC approach. The obtained results are depicted in Fig. 3.



Fig. 3. Average energy consumption variation with increased scalability (average energy consumption in J vs. number of IoT nodes; curves: EEACC, NBLA, traditional repetition)

As can clearly be observed from Fig. 3, the EEACC is more energy efficient than the NBLA and traditional repetition approaches. The average energy consumption of all three approaches increases as the number of NB-IoT nodes in the network grows. For a total of 800 NB-IoT nodes within a cell, the proposed EEACC approach consumes 44.01% less energy than the NBLA approach and 49.51% less energy than the traditional repetition approach. Thirdly, the behaviour of the BLER performance as the number of repeated transmissions is increased is also assessed. The aim of this particular performance evaluation is to assess the impact of increasing the transmission repetition on the transmission reliability of the NB-IoT network. The transmission power has therefore been held constant at 0.1 W, which translates into a constant SNR = −10 dB,

Fig. 4. BLER performance behaviour with increased transmission repetition number (BLER vs. repetition number N; curves: traditional repetition, NBLA, EEACC)



assuming that the channel conditions do not change significantly during this particular simulation period. As depicted in Fig. 4, when the transmission repetition number is doubled (N = 2), the BLER is almost half of what it was for an SNR of −10 dB in Fig. 2. In other words, doubling the number of transmission repetitions almost halves the probability of a failed block transmission. It can also be noticed that as the number of transmission repetitions is increased, the BLER is reduced for all three approaches. It is important to note that the BLER of the traditional repetition approach and that of the NBLA remain close to each other as the number of transmission repetitions is increased, while that of the EEACC reduces significantly, to as low as 0.18% as the repetition number N reaches its preset maximum value Nmax = 10. Lastly, the average transmission delay on a link basis is evaluated as the number of NB-IoT nodes within a cell is scaled up. The obtained results are as follows.

Fig. 5. Average transmission delay with increased scalability (average propagation delay in ms vs. number of IoT nodes; curves: NBLA, EEACC, traditional repetition)

As can be noticed in Fig. 5, despite its increased computational intelligence compared to the NBLA and traditional repetition approaches, the EEACC still achieves lower latency on a link basis. This is mainly because, thanks to the advancement of modern processors, the computational energy and time are insignificant compared to the propagation delay in a wireless network of object nodes such as sensor nodes. Once again, it can be noticed that the latency performance of the NBLA and the traditional repetition approaches is quite close, although the NBLA still performs with lower latency than the traditional repetition approach.
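The near-halving of the BLER observed in Fig. 4 when the repetition number doubles can be illustrated with a simple analytical model (our own approximation, not the paper's simulator): ideal combining of N repetitions scales the effective SNR by a factor of N, and the QPSK bit error probability over AWGN then shrinks accordingly.

```python
import math

def qpsk_ber(snr_db):
    """QPSK bit error probability over AWGN, with snr_db taken as Eb/N0."""
    ebn0 = 10 ** (snr_db / 10)
    return 0.5 * math.erfc(math.sqrt(ebn0))

def bler_with_repetitions(snr_db, n_rep, block_bits=352):
    """BLER for a 44-byte (352-bit) block when n_rep repetitions are
    ideally combined, i.e. the effective SNR grows linearly with n_rep."""
    ber = qpsk_ber(snr_db + 10 * math.log10(n_rep))  # combining gain ~ N
    return 1.0 - (1.0 - ber) ** block_bits
```

Under this idealized model, every doubling of N buys roughly 3 dB of effective SNR, which is the qualitative mechanism behind the monotone BLER reduction seen for all three schemes in Fig. 4.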


6 Conclusion

This research work consists of the development of a novel 2-D adaptive channel coding and link adaptation technique aimed at minimizing the energy consumption of the NB-IoT system. The obtained results demonstrate that the proposed EEACC approach outperforms the traditional repetition approach, and exhibits better performance in terms of energy, scalability, latency and reliability than the existing improved version of the traditional repetition approach, the NBLA.

Acknowledgment. This work is supported in part by the National Research Foundation of South Africa (Grant Number: 90604). Opinions, findings and conclusions or recommendations expressed in any publication generated by the NRF supported research are those of the author(s) alone, and the NRF accepts no liability whatsoever in this regard. The authors would also like to thank the Telkom Centre of Excellence (CoE) for their support.

References

1. Atzori, L., Iera, A., Morabito, G.: The internet of things: a survey. Comput. Netw. 54(15), 2787–2805 (2010)
2. Miao, Y., Li, W., Tian, D., Hossain, M., Alhamid, M.: Narrow band internet of things: simulation and modelling. IEEE Internet Things J. 5, 2304–2314 (2017)
3. Li, S., Xu, L.D., Zhao, S.: The internet of things: a survey. Inf. Syst. Frontiers 17(2), 243–259 (2015)
4. Xu, L.D., He, W., Li, S.: Internet of things in industries: a survey. IEEE Trans. Ind. Inf. 10(4), 2233–2243 (2014)
5. Introduction of NB-IoT in 36.331, 3GPP RP-161248, 3GPP TSG-RAN Meeting 72, Ericsson, Nokia, ZTE, NTT DOCOMO Inc., Busan, South Korea, June 2016
6. Sharif, A., Vidyasagar, V., Ahmad, R.F.: Adaptive channel coding and modulation scheme selection for achieving high throughput in wireless networks. In: Proceedings of the IEEE 24th International Conference on Advanced Information Networking and Applications Workshops (AINAW), Perth, WA, Australia, pp. 200–207, 20–23 April 2010
7. Migabo, M.E., Djouani, K., Kurien, A.M., Olwal, T.O.: A stochastic energy consumption model for wireless sensor networks using GBR techniques. In: Proceedings of the IEEE African Conference (AFRICON), Addis Ababa, Ethiopia, pp. 1–5, 14–17 September 2015. https://doi.org/10.1109/AFRCON.2015.7331987
8. Rusek, F., Hu, S.: Sequential channel estimation in the presence of random phase noise in NB-IoT systems. In: Proceedings of the IEEE 28th Annual International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC), Montreal, QC, Canada, 8–13 October 2017. https://doi.org/10.1109/PIMRC.2017.8292588
9. Yu, C., Yu, L., Wu, Y., He, Y., Lu, Q.: Uplink scheduling and link adaptation for narrowband internet of things systems. IEEE Access 5, 1724–1734 (2017)
10. Pollet, T., Bladel, M., Moeneclaey, M.: BER sensitivity of OFDM systems to carrier frequency offset and Wiener phase noise. IEEE Trans. Commun. 43(2), 191–193 (1995)



11. El Soussi, M., Zand, P., Pasveer, F., Dolmans, G.: Evaluating the performance of eMTC and NB-IoT for smart city applications. arXiv:1711.07268v1 [cs.IT], pp. 1–18, November 2017
12. Chakrapani, A.: Efficient resource scheduling for eMTC/NB-IoT communications in LTE Rel. 13. In: Proceedings of the IEEE Conference on Standards for Communications and Networking (CSCN), Helsinki, Finland, 18–20 September 2017. https://doi.org/10.1109/CSCN.2017.8088600
13. Chen, M., Miao, Y., Hao, Y., Hwang, K.: Narrow band internet of things. IEEE Access 5 (2017). https://doi.org/10.1109/ACCESS.2017.2751586
14. Zarei, S.: LTE: channel coding and link adaptation. In: Seminar on Selected Chapters of Communications Engineering, pp. 1–14, Erlangen, Germany (2009)
15. Spajic, V.: Narrowband internet of things. J. Mech. Autom. Ident. Tech. (JMAIT) 2(1), 1–6 (2017). Vip Mobile
16. Tabbane, S.: Internet of things: a technical overview of the ecosystem. In: Proceedings of the Regional Workshop for Africa on Developing the ICT Ecosystem to Harness Internet-of-Things (IoT), Mauritius, pp. 1–6, 28–30 June 2017
17. Ratasuk, R., Mangalvedhe, N., Kaikkonen, J., Robert, M.: Data channel design and performance for LTE narrowband IoT. In: Proceedings of the IEEE 84th Vehicular Technology Conference (VTC-Fall), pp. 1–5, September 2016
18. Inoue, T., Vye, D.: Simulation speeds NB-IoT product development. Microwave J. China 10(1), 38–44 (2018)
19. Wibowo, F.X.A., Bangun, A.A.P., Kurniawan, A.: Multimedia broadcast multicast service over single frequency network (MBSFN) in LTE based femtocell. In: Proceedings of the 2011 International Conference on Electrical Engineering and Informatics, pp. 1–5 (2011)
20. Alexiou, A., Asimakis, K., Bouras, C., Kokkinos, V., Papazois, A., Tseliou, G.: Reliable multicasting over LTE: a performance study. In: IEEE Symposium on Computers and Communications (ISCC) 2011, pp. 603–608 (2011)
21. Bouras, C., Alexiou, A., Papazois, A.: Adopting forward error correction for multicasting over cellular networks. In: European Wireless Conference (EW) 2010, pp. 361–368 (2010)
22. Cornelius, J.M., Helberg, A.S.J., Hoffman, A.J.: An improved error correction algorithm for multicasting over LTE networks. University of North West thesis (2014)
23. Havinga, P.J.M., Smit, G.J.M.: Energy efficient wireless networking for multimedia applications. Wirel. Commun. Mob. Comput. 1, 165–184 (2001). https://doi.org/10.1002/wcm.9
24. Li, Q., Wu, Y., Feng, S., Zhang, P., Zhou, Y.: Cooperative uplink link adaptation in 3GPP LTE heterogeneous networks. In: Proceedings of the IEEE Vehicular Technology Conference (VTC Spring), pp. 1–5, June 2013
25. Singh, M.P., Kumar, P.: An efficient forward error correction scheme for wireless sensor network. Procedia Tech. 4, 737–742 (2012). https://doi.org/10.1016/j.protcy.2012.05.120
26. Alzubi, J.A., Alzubi, O.A., Chen, T.M.: Forward Error Correction Based on Algebraic-Geometric Theory. Springer (2014). ISBN 978-3319082929
27. Chase, D.: Class of algorithms for decoding block codes with channel measurement information. IEEE Trans. Inform. Theory 18(1), 170–182 (1972)
28. Roca, V., Cunche, M., Lacan, J., Bouabdallah, A., Matsuzono, K.: Reed-Solomon Forward Error Correction (FEC) Schemes for FECFRAME, IETF FECFRAME Working Group, draft-roca-fecframe-rs-00.txt (Work in Progress), March 2009



29. Walther, F.: Energy modelling of MICAz: a low power wireless sensor node. Technical report, University of Kaiserslautern, February 2006. http://www.eit.unkl.de/wehn/files/reports/micazpowermodel.pdf. Accessed 31 May 2018
30. Migabo, M.E., Djouani, K., Kurien, A.M., Olwal, T.O.: A stochastic energy consumption model for wireless sensor networks using GBR techniques. AFRICON 2015, 1–5 (2015)
31. Lee, J., Lee, J.: Prediction-based energy saving mechanism in 3GPP NB-IoT networks. Sensors (Switzerland) 17(9), 2008 (2017). https://doi.org/10.3390/s17092008
32. Evolved Universal Terrestrial Radio Access (E-UTRA); Physical Channels and Modulation, 3GPP TS 36.211 (2016). http://www.3gpp.org/ftp/Specs/archive/36 series/36.211/36211-d20.zip
33. Helmersson, K., Englund, E., Edvardsson, M., Edholm, C.: System performance of WCDMA enhanced uplink. In: IEEE 61st Vehicular Technology Conference, Stockholm, Sweden, pp. 1–5 (2005)
34. Evolved Universal Terrestrial Radio Access (E-UTRA); Multiplexing and Channel Coding, 3GPP TS 36.212 (2016). http://www.3gpp.org/ftp/Specs/archive/36 series/36.211/36212-d20.zip
35. Massam, P., Bowden, P., Howe, T.: Narrow band transceiver, 9 January 2013. EP Patent 2,092,682. http://www.google.com/patents/EP2092682B1?cl=pt-PT
36. Maldonado, P.A., Ameigeiras, P., Prados-Garzon, J., Navarro-Ortiz, J., Lopez-Soler, J.M.: Narrowband IoT data transmission procedures for massive machine-type communications. IEEE Netw. J. 31(6), 8–15 (2017). https://doi.org/10.1109/MNET.2017.1700081
37. Raza, U., Kulkarni, P., Sooriyabandara, M.: Low power wide area networks: an overview. IEEE Commun. Surv. Tutor. 19, 855–873 (2017)

An Internet of Things and Blockchain Based Smart Campus Architecture

Manal Alkhammash1,2(B), Natalia Beloff2, and Martin White2

1 Jazan University, Jazan, Kingdom of Saudi Arabia
2 Sussex University, Brighton, UK

{ma979,n.beloff,m.white}@sussex.ac.uk

Abstract. Rapid development in science and information technologies, such as the Internet of things, has led to a growth in the number of studies and research papers on smart cities in recent years, and more specifically on the construction of smart campus technologies. This paper will review the concept of a smart campus, discuss the main technologies deployed, and then propose a novel framework for a smart campus. The architecture of this new smart campus approach will be discussed with particular consideration of security and privacy systems, the Internet of things, and blockchain technologies.

Keywords: Smart campus · Internet of things · Blockchain · Security · Privacy

1 Introduction

Information and communications technology (ICT) development is a never-ending process, which has led to a growth in the number of studies and research papers on smart cities in recent years. The concept of a smart city is not only about constructing traditional infrastructure, such as a transportation system, but also involves ICT infrastructure in order to improve quality of life and enhance the profile of the city [1]. Therefore, the term ‘smart city’ can be generally defined as dynamically integrating the physical and the digital worlds, in which different data resources are automatically gathered in real time [1–3]. By utilising high-speed networks, the changes in the physical world can be captured and transferred to data centres so that they can be stored and processed [4]. This means that in order to capture the necessary data, there need to be significant numbers of sensors at diverse locations that can capture this ‘big data’. In addition, the Cloud needs to be utilised in order to store and analyse the data. Consequently, there are many areas that can be developed under the intelligent city framework to achieve the overall goal of improving citizens’ quality of life. There have been many contributions and research papers in different areas to develop smart systems, such as medical and health care [5–8], supply chain management [9, 10], traffic [11, 12], and education systems [13–15], that together can build smart cities. Since a smart campus constitutes an essential element of a smart city, and the concept of the smart campus comes from the notion of smart cities [16], many researchers have focused their attention on developing smart campuses, trying to address the topical

© Springer Nature Switzerland AG 2020
K. Arai et al. (Eds.): SAI 2020, AISC 1229, pp. 467–486, 2020. https://doi.org/10.1007/978-3-030-52246-9_34


M. Alkhammash et al.

question of ‘how to develop an intelligent campus’ by applying the same ideas and foundations of intelligent cities to propose the smart campus [17]. Therefore, the aim of this paper is to study different technologies and to design a novel architecture for a smart campus in order to develop an intelligent campus. Such a smart and intelligent campus architecture (or framework) is likely to exploit the Internet of things (IoT), blockchain, and smart contracts as part of its many technology solutions. The paper is structured as follows. Section two addresses a brief description of the smart campus concept. Section three provides a brief background of the smart campus and delineates the main areas of the campus. Section four reports some issues related to a previous generic smart campus architecture and proposes a new one, which is discussed in depth in the following section. Finally, section six provides conclusions and future work.

2 Smart Campus Concept

Traditionally, a campus can be defined as a land or an area where different buildings constitute an educational establishment. A campus often includes classrooms, libraries, student centres, residence halls, dining halls, parking, etc. Nowadays, campuses have adopted advanced technologies, such as virtual learning environments [18, 19] and timetabling systems [20, 21], in order to provide high-quality services for stakeholders (e.g. academics, students, administrators, and service functions) on campus and to monitor and control facilities. These developments should be evolving constantly in order to increase efficiency, cut operational costs, reduce effort, lead to better decision-making, and enhance the student experience [22]. Thus, the term ‘smart campus’ can be defined as a place where digital infrastructure can be developed and that has the ability to gather information, analyse data, make decisions, and respond to changes occurring on campus without human intervention [22, 23]. The authors in [24] define a smart campus as an environment where the structure of ambient learning spaces – application context based on virtual spaces – integrates social and digital services into physical learning resources. If we think of a smart campus as a holistic framework, it encompasses several themes, including but not limited to automated security surveillance and control, intelligent sensor management systems, smart building management, communication for work, cooperation and social networking, and healthcare. Several innovations have been proposed for smart campuses, ranging from developing a whole framework using technologies such as mobile technologies, blockchain, the IoT, and the Cloud to assist learning, to enhancing security systems utilising technologies such as ZigBee and radio frequency identification (RFID) [25–28].

3 Smart Campus Background

Many studies and architectural plans with different goals have been undertaken on the subject of the smart campus [29]. This smart campus research largely breaks down into the following areas: teaching and learning, data analysis and services, building management and energy use on campus, campus data mining, water and waste management use on campus, campus transportation, and campus security.



3.1 Smart Campus Learning Environments

Much research has been focused on constructing smart campuses by developing suitable technologies and applications that involve teaching and learning. Therefore, the common purpose of designing and developing a smart campus has often been from a learning and educational perspective. The authors in [27] developed a novel holistic environment for a smart campus known as iCampus. The aim of their research is to propose a beginning-to-end lifecycle within the knowledge ecosystem in order to enhance learning. Atif and Mathew designed a framework for a smart campus that integrates a campus social network within a real-world educational facility [30]. The study’s goal was to provide a social community where knowledge could be shared between students, teachers, and the campus’s physical resources. Further, [1] proposed a model of a smart campus to enable stakeholders on the campus to shape and understand their learning futures within the learning ecosystem. Based on cloud computing and the IoT, [31] presented the concept of a smart campus and demonstrated some issues related to intelligence application platforms after establishment. However, these approaches were focused only on proposing a smart campus using IoT technology.

3.2 Smart Campus Data Analysis and Service Orientation

Other research has considered the development of smart campuses based on data analysis. According to [32], a smart campus should be able to gather data from a crowd and analyse it using crowdsourcing technologies in order to deliver services of added value. In 2011, [33] explained a prototype smart campus implementation that uses semantic technologies in order to integrate heterogeneous data. However, some researchers have envisioned smart campuses from social networking aspects. For instance, [34] elaborated upon an architectural system that can be deployed on campus in order to support social interaction by using service-oriented specifications.
This depends upon their proposed social network platform (WeChat) and an examination of its architecture, functions, and features. Xiang et al. developed a smart campus framework based on information dissemination [17]. However, these approaches did not address blockchain technology in order to eliminate centralisation.

3.3 Building Management and Energy Efficiency on Smart Campuses

Several of the current initiatives developing smart campuses have been based on high energy efficiency perspectives. In order to decrease the energy consumption of buildings, monitoring and controlling environmental conditions is essential, such as controlling both natural and artificial lighting, humidity, and temperature. An example of this is a project undertaken at the University of Brescia in Italy in 2015 that aimed to enhance energy efficiency inside buildings by monitoring lighting, temperature, and electrical equipment using control systems, automation, and grid management. The project progressed in stages towards this goal. First of all, it aimed to reduce the buildings’ energy consumption by analysing possible actions. Then it implemented different measures and evaluated their efficiency. Simultaneously, in order



to enhance users’ awareness of energy consumption, a system for monitoring operational conditions was also developed. Finally, the project evaluated the energy balance between consumption and generation, renewable energy production, and energy reduction. The outcome displayed a significant energy consumption reduction of 37.3% while improving the buildings’ thermal properties [35]. In addition, [36] proposed and implemented a web-based system, known as CAMP-IT, to manage energy in campus buildings. The system aimed to optimise the operation of energy systems in order for buildings to achieve the goal of reducing energy consumption while at the same time enhancing the quality of the indoor environment in terms of visual comfort and air quality. The model collected, controlled, and analysed the energy load for each building and for the campus as a whole. The results showed a reduction in energy consumption of nearly 30%. Again, these approaches did not study the integration of blockchain into the proposed architectures.

3.4 Smart Campus Data Mining

Additionally, some researchers have focused on applying interest mining, which is based on location, context awareness, proximity, and user profiles as well as other related information, to assist users in meeting their needs within the campus environment. In 2014, [37] studied web log mining, an essential technique in web data mining for determining users’ characteristic interests, by developing a reliable and efficient method of data pre-processing. In 2010, [38] proposed a data-mining method for e-learning systems to identify users’ interests and obtain information about learners’ logs and knowledge background. Along these lines, the model would be able to automatically recommend resources that may be of interest to individual students. However, blockchain technology could be used to protect user profiles and preferences.
3.5 Water Management on Smart Campuses

Regarding water and waste management, since these are considered expensive and important services on smart campuses, several research studies have proposed campus management systems for them in order to reduce their environmental and financial impact [22]. Sustainable water management rests on three important pillars: water harvesting, water recycling, and water consumption reduction [39]. Different approaches have been proposed to manage water; some focus on controlling and monitoring the water level and water consumption on campus. For instance, [40] developed a water monitoring system to reduce water consumption on campus. The system used a three-dimensional map of the campus and a geographical information system (GIS) to display the water pipelines on an electronic map with detailed status information in real time. The model can therefore monitor water directly from the pipelines; detect any problems that occur in the equipment, such as leaks; and analyse water consumption. In 2015, [41] developed a system suitable for medium-sized campuses to monitor the water balance in real time. The design used an ultrasound level sensor, a cloud software stack, and communication links, and carefully considered industrial design. To be able to

An Internet of Things and Blockchain Based Smart Campus Architecture

471

monitor the water, the system measured the water level in tanks by sending ultrasound pulses to the water's surface. By timing the reflection, the sensors can estimate the distance and calculate the tank volume. Building on this work, [42] developed an automatic water distribution system for large campuses so that each tank on the campus would have enough water to meet local needs. The authors utilised ultrasonic ranging sensors, which are suitable for measuring water levels in large tanks, and a wireless network using sub-GHz radio frequency to connect sensors across long distances for further analysis. Moreover, many other experiments have proved efficient for developing water management systems, and they can be implemented on smart campuses to reduce water consumption [43]. For example, [44] developed a smart water meter that can provide users with real-time readings, analyse their consumption data, and present it in visual graphs to improve readability. Simultaneously, the system monitors consumption and alerts the user to unusual water usage. However, these approaches did not address blockchain technology.

3.6 Waste Management on Smart Campuses

Similarly, numerous studies have been devoted to developing waste management systems. The authors in [45, 46] stated that research in this area has generally focused on equipping waste trucks and bins with attached sensor devices to collect and analyse real-time data. This information can be used for several purposes, for example, for developing an efficient cleaning timetable and preventing bins from overfilling. Ebrahimi et al. [47] in 2017 investigated the existing waste and recycling infrastructure on the Western Kentucky University campus to determine whether it provided adequate service, using spatial techniques such as GIS to track, recognise, and visualise waste and recycling bins over a large-scale area.
They used spatial information for analysis and decision making to reduce the solid waste stream and improve the university's recycling stream. Furthermore, they drew an accurate roadmap for a suitable waste management plan for the campus. Although most papers use different techniques for waste management systems on smart campuses, these systems are still at a primary stage and lack a generic model.

3.7 Smart Campus Transportation

Recently, the global positioning system (GPS) has become the most common method for streaming the location of, and tracking, a moving object such as a vehicle on the road. To improve the accuracy of GPS, external information is needed, such as Wi-Fi, digital imaging, and computer vision [48]. The authors in [49] developed a tracking system for buses using GPS devices that reported the buses' locations every ten seconds. The location was sent from the server via SMS. The system also had safety features, such as the ability to send alerts or emergency reports when a vehicle crashed or was stolen. Other studies have tracked the location of a college bus using a mobile phone and Google Maps [50, 51]. Saad et al. [48] developed a real-time monitoring system for a university bus that used a GPS service to send the location of the bus to a cloud database every second. The system could also analyse the data to estimate the bus's arrival time. However, these approaches did not use blockchain technology to improve system security.
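A minimal sketch of the kind of arrival-time estimation mentioned above, assuming simple great-circle geometry and a recent average speed. The coordinates and speed model are illustrative, not taken from [48]:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def eta_seconds(bus_pos, stop_pos, avg_speed_mps):
    """Estimated arrival time: remaining distance over recent average speed."""
    if avg_speed_mps <= 0:
        return float("inf")  # bus is stationary; no meaningful estimate
    return haversine_m(*bus_pos, *stop_pos) / avg_speed_mps
```

For example, a bus about 1 km south of a stop (0.009 degrees of latitude) moving at 10 m/s yields an estimate of roughly 100 seconds; a production system would additionally smooth speeds over a window and account for route geometry rather than straight-line distance.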


3.8 Smart Campus Security

Many mobile applications have been developed for campus safety. Some, such as EmergenSee and CampusSafe, allow users to contact campus security guards, whereas others, such as Guardly and CircleOf6, allow friends to contact each other [52]. These applications allow user locations, photos, and situation descriptions to be shared with campus security guards. In addition, [53] also proposed a smart campus framework covering several aspects, of which security was a notable one. They pointed out that a smart system can reduce burglaries by detecting breaking glass or other distinctive sounds and then alerting security to the location. The system may also be able to reduce drug or alcohol abuse by alerting public safety to the presence of alcohol. Therefore, a smart campus can be described as an environment able to provide a suitable infrastructure for delivering the required services in light of contextual awareness. In addition, it is a well-structured place that can deliver large amounts of information to its users, drawing on their profiles and locations in order to best address their needs. Consequently, the desirable characteristics of a smart campus are accurate context awareness, ubiquitous access to networks, efficient and optimal utilisation of many varied resources, and the use of objective principles as a basis for making smart decisions or predictions.

3.9 Summary

All the above approaches and implementations are useful and contribute to building smart campuses; however, they rely on IoT technologies with a centralised system architecture, which can lead to many issues, as discussed in the next section. Next, we describe a new architecture that incorporates a distributed design exploiting blockchain and smart contracts to overcome some of these prevalent issues.

4 Smart Campus Concept

Developing an architecture for a smart campus that incorporates advanced technologies, such as IoT and blockchain, is a complicated and difficult task, since such a system involves a large variety of devices and objects, associated services, and link layer technologies. Many different smart campus architectures have been developed with different aims [22, 29]. However, most of these frameworks contain three essential layers that interact with each other. First is the perception layer, which contains physical technologies, such as sensors, that collect all kinds of data from the surrounding environment. Second is the network layer, which contains all the communication networks responsible for receiving and transmitting data. Third is the application layer, which is responsible for supporting business and personalised services and for interacting with individual users. Figure 1 shows a generic illustration of this layered approach. The perception layer sits at the lowest level of the architecture and accommodates sensors to extract and gather data from physical


Fig. 1. A generic smart campus layered architecture [54]

devices. In the middle of the architecture, the network layer is utilised to aggregate, filter, and transmit data. The top layer uses the cloud or servers to store and analyse smart campus data. There are several problems with this generic architecture since it relies on a conventional IoT architecture. IoT systems rely on centralised computing and storage platforms, such as cloud platforms, which are a convenient starting point for joining, managing, and controlling a massive number of different objects and devices as well as providing the required authentication and identification for various IoT devices. However, a centralised system architecture suffers from several limitations, which Atlam and Wills [55] identified as follows. First, a centralised system has privacy vulnerabilities because data is collected from different devices and then stored on a centralised platform, which can be breached. Second, security is a major concern for any system, since processing and storing data through a centralised platform makes it an easy target for attacks such as distributed denial of service (DDoS) and denial of service (DoS). In addition, the devices in an IoT system are heterogeneous in nature, while a centralised platform uses a single operating system to connect to the various devices; in this case, a centralised platform could prevent some objects from connecting to the system. Lastly, scalability is another issue for a centralised platform, since the number of connected devices keeps increasing. This is especially a problem for large organisations, such as campuses, that are distributed across different areas. According to Piekarska and Halpin [56], there are also concerns about the operating efficiency and scale of centralised IoT systems given these increasing demands.
Recently, blockchain technology has been applied in various areas beyond the cryptocurrency domain, since it has multiple desirable features, such as decentralisation, support for integrity, resiliency, autonomous control, and anonymity [57]. Blockchain eliminates a central authority by using a distributed ledger; its decentralisation provides more efficient operation and control of communication among all participating nodes. It also eliminates the single point of failure that arises if a centralised platform goes down,

[Fig. 2 depicts the proposed framework: physical, communication, platform, data, business, and application layers, together with the blockchain, a security system, an IoT gateway, cloud deployment options (private/hybrid/public; IaaS/PaaS/SaaS), application protocols (CoAP, MQTT, HTTP, AMQP), and communication standards such as IEEE 802.11 (Wi-Fi), IEEE 802.15.1 (Bluetooth), IEEE 802.15.4 (ZigBee), IEEE 802.16 (WiMAX), and 2G/3G/4G mobile networks.]

Fig. 2. A new smart campus framework
which could lead to the failure of a whole system [55]. Therefore, blockchain can be an efficient technology for handling the issues related to centralised IoT, particularly security. Thus, we propose a more detailed smart campus architectural framework that combines IoT and blockchain technology, as shown in Fig. 2. Our smart campus framework consists of the blockchain and six layers:

1. physical, which includes objects such as campus sensors and devices;
2. communication, which includes the communication protocols and IoT gateway;
3. platform, which is a cloud component, the cloud currently being considered an ideal technique for storing and analysing large volumes of data as well as for running services;
4. data, which is used to store campus data, including real-time events;
5. business, which produces high-level reports and analysis; and
6. application, which provides services to the end user for connecting to and controlling the smart campus environment.

In addition, this framework has a security system that ensures the secure transfer of trusted data from the physical layer through the communication, platform, data, and business layers to the final application layer. The following sub-sections describe each layer in more detail.

4.1 Physical Layer

The first layer, the physical layer, includes devices and sensors to detect data in the physical environment, such as motion, temperature, humidity, locations, and attendance. When the sensors sense the physical campus environment, the parameters are then


converted to data signals to be handled in the cloud for analysis. On the way, such data may pass through brokerage protocols, such as MQTT, to suitable blockchain-based distributed storage in the data layer. In the physical layer, actuators operate in the opposite way: they convert data signals into physical actions [58], perhaps as a response to sensor data stored on the blockchain that is subsequently analysed and results in an actuator event. The devices in this layer are hardware components connected to the upper layers of the architecture either wirelessly or by wires.

4.2 Communication Layer

The communication layer is sometimes known as the network layer or transmission layer [59, 60]. The different data sources provided by the perception layer need to be connected to the upper architecture layers so that the collected data can be handled. Devices and sensors use protocols and adequate communication technology to connect to the Internet, and the diverse data sources in a smart environment lead to diverse communication technologies. For example, Wi-Fi/IEEE 802.11 utilises radio waves to allow smart devices to exchange data and communicate within a 100 m range, without a router in some ad hoc configurations [61]. The IEEE 802.15.1 standard uses short-wavelength radio to exchange data between smart devices while minimising power; its Bluetooth Low Energy (BLE) variant operates for a longer period of time within a 100 m range, and BLE has recently been considered a suitable technology to support IoT applications [62]. In addition, the IEEE 802.15.4 protocol is a specification for low message throughput, low cost, low data rate, and low power consumption, making it a good candidate for machine-to-machine (M2M) communication, wireless sensor networks, and IoT. This standard underpins the ZigBee protocol, which adds more reliable communication and a high level of security [61].
Therefore, the main objectives of this layer are to transmit data to and from different objects through gateways to integrated networks. Biswas and Muthukkumarasamy [63] discussed using blockchain in a smart city to provide a secure communication platform. They argued that blockchain should be integrated at the network layer to provide the privacy and security of transmitted data, and recommended that transaction data be grouped into blocks using the TeleHash protocol for broadcast in the network.

4.3 Platform Layer

Generally, a smart environment based on IoT draws on a large number of data sources, including actuators and sensors, that produce big data, from which knowledge must be extracted by performing complex computations, applying data mining algorithms, and managing services and task allocation [64]. Thus, cloud computing presents a suitable technology and a powerful computational resource for IoT to process, compute, and store big data. In addition, blockchain is used to avoid a centralised system architecture.

4.4 Data Layer

The data layer represents the system's database and handles the processing of data. A huge amount of data, so-called 'big data', is stored in this layer. The previous layer uses


this layer to generate useful information. In the case of a smart campus, a blockchain with a decentralised structure is needed to add security and privacy to the data.

4.5 Business Layer

The business layer relies on middleware technology, which manages the system's services and activities. It is responsible for building flowcharts, graphs, and business models as well as for analysing, monitoring, evaluating, designing, and developing smart systems. Based on big data analysis, the business layer can support decision making, visualise the outcomes for the user, and operate the controlling actuators.

4.6 Application Layer

This layer can consist of many different application types and services required by many different end users. For example, in a smart campus, this layer can provide data related to air humidity and temperature measurements. Therefore, the application layer's main objectives are to provide high-quality intelligent services to stakeholders [65, 66] and to allow users to interact with the system and visualise the data via an interface. In addition, the application layer has several protocols to choose from. For instance:

• Constrained Application Protocol (CoAP) is a one-to-one communication protocol inspired by the Hypertext Transfer Protocol (HTTP). CoAP is suitable for smart devices and IoT technology because it is thin and lightweight and causes as little traffic as possible [58].
• Message Queue Telemetry Transport (MQTT) is a messaging protocol responsible for connecting networks and smart devices with middleware and applications [61]. Several applications use MQTT, such as monitoring and social media notifications [58]. It provides an ideal messaging protocol for M2M and IoT communications because of its suitability for low-bandwidth networks, low power consumption, and low cost.
• Advanced Message Queuing Protocol (AMQP) is an open standard protocol that supports reliable transport and communication and focuses on message-oriented environments.
• Data Distribution Service (DDS) is a publish–subscribe protocol for real-time communication [65].

The application layer is responsible for providing high reliability and excellent quality of service to the applications. There is therefore a variety of communication protocols, each of which can work in a different scenario and with a different device manufacturer.

4.7 Blockchain

Blockchain is a distributed ledger technology that implements transactions with a decentralised digital database. A transaction is verified by a network of computers before it


is added to the ledger. Blockchain allows parties to exchange assets in real time without going through intermediaries [67]. Blockchain technology is a peer-to-peer (P2P) distributed ledger technology that records contracts, transactions, and agreements [63, 68]. In other words, blockchain verifies data after receiving it from the physical layer and then constructs it into a transaction. It should be stated that the details of blockchain technology and how it works are outside the scope of this paper; for more information about blockchain principles, [69] and [70] are helpful. To decide which type of blockchain to use in our framework, the types are compared below. Blockchain technology is commonly classified into three types: public, private, and consortium [71]. The first type is also called a permissionless blockchain because no entity's permission is needed to join the network; examples include Bitcoin [72] and Ethereum [73]. Anyone can engage and participate by downloading the blockchain and executing the code as well as by sending transactions to the network. A public blockchain is therefore fully decentralised: all transactions and ledgers are shared and verified by all nodes, and there is no need for a central authority. To append blocks, peers in the network have to solve a proof-of-work puzzle, which requires time and power. This means the chain is not centralised, and once data is validated and added to the ledger it cannot be altered; the ledger and its transactions are therefore immutable. A private blockchain, in contrast, designates its participants in advance and controls who may write, read, and take part in the consensus process. In other words, it is a permission-based chain, and only those who are authorised can join the network. This type of blockchain is useful for organisations or groups of individuals that share a ledger privately.
Thus, malicious nodes cannot enter the network without permission. Specific nodes or services can be removed or added as needed, which gives the network better scalability. Since a private blockchain is controlled by a single trusted node and has fewer authorised participants than a public blockchain, it maintains its ledger much faster and processes more transactions per block. Furthermore, private blockchains support many consensus methods, such as practical Byzantine fault tolerance, proof of elapsed time, and proof of stake. Private blockchains are widely used in environments that need more security and privacy, such as companies and the banking sector; Corda [74] is an example of a private blockchain. In addition, a consortium blockchain is a hybrid that combines private and public blockchains and is classified as permission-based [71]. In this type, the participants engage in writing to and reading from the blockchain across organisations, and preselected nodes control the consensus process. In other words, several institutions govern the blockchain, unlike a private blockchain, which is operated by a single node. Such a hybrid blockchain shares many advantages with a private blockchain, such as privacy and ledger efficiency, as well as higher scalability and faster transactions. In addition, a consortium blockchain is easier to implement and more energy-efficient than a public blockchain [71, 75]. To summarise, all blockchain types are decentralised P2P networks in which all nodes share a verified ledger; all provide ledger immutability, and users of every type maintain a replica of the ledger. However, the main difference


between public and private blockchains is authorisation: a public blockchain allows any user to participate in the network. Private and consortium blockchains are also more efficient for IoT networks, since they both have faster network response times and lower computational requirements. While the public blockchain has proved over the years to be suitable and efficient for cryptocurrencies, it is not as effective for IoT applications because of its bandwidth and high computational requirements [76]. In our architecture, we suggest using a consortium blockchain, for example, the Hyperledger Fabric blockchain platform [77–79], for its many key features. Hyperledger blockchains are widely used by businesses and enterprises. Fabric is designed to support pluggable implementations of components, delivering high degrees of confidentiality, resilience, scalability, and low latency. Hyperledger has a modular architecture and can be used very flexibly. In addition, modular consensus protocols are used, which permit users to choose trust models and tailor the system to particular use cases. The platform runs smart contracts, or chaincode: programmable code that allows participants to write their own scripts and execute agreements without a middleman [80].
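To make the ledger-immutability property discussed in this section concrete, the following is a minimal, illustrative sketch of a hash-linked chain of blocks. It is not Hyperledger Fabric code; the block fields and validation routine are deliberate simplifications:

```python
import hashlib
import json

def block_hash(block):
    """Hash a block's contents (excluding its own 'hash' field) deterministically."""
    payload = {k: v for k, v in block.items() if k != "hash"}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

def append_block(chain, transactions):
    """Append a block that commits to the previous block's hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    block = {"index": len(chain), "prev_hash": prev_hash, "tx": transactions}
    block["hash"] = block_hash(block)
    chain.append(block)
    return block

def is_valid(chain):
    """A chain is valid if every block's hash matches its contents and links back."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False
        prev = chain[i - 1]["hash"] if i > 0 else "0" * 64
        if block["prev_hash"] != prev:
            return False
    return True
```

Tampering with any committed transaction changes that block's hash and breaks the link to every later block; replicating this validation across many independent nodes is what makes the ledger effectively immutable.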

5 Smart Campus Exploiting the Internet of Things, Blockchain, and Security Requirements

The main reason for developing blockchain in 2008 was to address the problem of stakeholders' trust in various use cases, in both financial and non-financial fields [81, 82]. It provides security for transactions by using several cryptographic mechanisms, such as signatures, asymmetric cryptography, and hashing. A lot of research has explored whether blockchain technology meets the need for more secure, trusted, and immutable data by adopting blockchain into existing software, such as in the financial industry [83] and healthcare [84–86]. However, integrating blockchain technology into educational institutions is still in its early stages and needs more research. We therefore provide a discussion of the security requirements for the proposed smart campus framework, since security is the main concern in most recent blockchain applications. We study this aspect in more detail in the following sub-sections, covering authentication and privacy in addition to the CIA triad of confidentiality, integrity, and availability.

5.1 Authentication

Authentication is one of the key security aspects: it is the process of verifying a peer's identity so that peers can use a system and communicate with each other [87]. Many studies have focused on user authentication, with the majority of cases looking at data leaks and identity theft. The authentication mechanisms used in most current applications vary from a single factor, for example, a password or user ID, to multi-factor authentication, such as a smart card or a biometric characteristic. These traditional methods are not effective in providing appropriate protection and can cause various issues and damage; for example, passwords have recently been easily and


frequently hacked [88]. Multi-factor authentication relies on centralisation or on trusting third-party services, which, as discussed previously, carry high security risks. Recently, blockchain has been used to protect several IoT applications against illegitimate access without the need for centralised services. For example, Cha et al. [89] designed a blockchain gateway, integrating blockchain into an IoT gateway to securely protect user preferences when connecting to IoT devices. This approach can raise the level of authentication between users and connected devices. In addition, Sanda and Inaba [90] used blockchain technology with a Wi-Fi network to authenticate connected users and protect the network from malicious usage; in their implementation, the blockchain was used to encrypt communication and secure the network. Blockchain therefore has the benefit of increasing the security of authentication.

5.2 Privacy

Privacy is an essential aspect of most systems. Many researchers have taken advantage of blockchain technology to increase the level of privacy in the IoT environment and to protect private individual data from being revealed [91]. For example, Kianmajd et al. [92] presented a framework that integrates blockchain to preserve users' privacy while they use community resources, highlighting that the decentralised environment of the blockchain can be used to increase the privacy of users' data. In addition, Zyskind et al. [93] structured a personal data management platform to provide privacy for users. The study proposed a protocol integrated with a blockchain to produce 'an automated trustless access-control manager'. The platform achieved privacy by encrypting the data and storing pointers to it in the ledger rather than sending the data itself to the network.
Thus, personal data can be secured and controlled by the user rather than entrusted to a third party.

5.3 Confidentiality, Integrity, and Availability (CIA)

Data confidentiality means protecting data from unauthorised access. Since blockchain uses cryptographic mechanisms, it offers confidentiality and protects data, such as bank account [81] and personal data [94], from parties that do not have permission. Data integrity is another security aspect, concerned with assuring and preserving the consistency, reliability, and accuracy of data [95]. In other words, the data stored in the database should be protected from change throughout its lifecycle. Through the use of various cryptographic mechanisms, blockchain technology provides data integrity and promises to protect data from unauthorised change [96, 97]. Banerjee et al. [98] combined blockchain with IoT device firmware to maintain the integrity of shared data. Moreover, Liu et al. [99] implemented a blockchain-based data integrity service framework to verify the integrity of IoT data without the need for a third party.


Data availability is an important property of any system and means ensuring that the required data is available and accessible when needed [100]. One of the benefits of blockchain technology, with its decentralised structure and distributed ledger, is that it is resistant to outages [101].
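As a small, self-contained illustration of the integrity checks discussed above (independent of any blockchain platform; the key handling is deliberately simplified and illustrative), a sensor reading can be stored together with a keyed digest and verified later:

```python
import hashlib
import hmac
import json

SECRET_KEY = b"demo-key"  # illustrative only; real systems need proper key management

def seal(reading, key=SECRET_KEY):
    """Attach a keyed SHA-256 digest so later tampering can be detected."""
    data = json.dumps(reading, sort_keys=True).encode()
    return {"reading": reading, "tag": hmac.new(key, data, hashlib.sha256).hexdigest()}

def verify(record, key=SECRET_KEY):
    """Recompute the digest and compare in constant time."""
    data = json.dumps(record["reading"], sort_keys=True).encode()
    expected = hmac.new(key, data, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["tag"])
```

A blockchain-based integrity service generalises this idea: instead of a shared secret, digests are committed to a ledger replicated across nodes, so no single third party has to be trusted.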

6 Conclusion

Recently, many researchers have focused on developing smart and intelligent environments in many fields, such as smart cities, hospitals, and homes, most of which rely on IoT systems. Privacy and security have attracted particular research interest, since they are considered the critical issues and challenges for connected IoT devices. This paper surveyed a number of schemes and frameworks for smart campuses proposed in the literature, as an example of IoT, and addressed the related security and privacy issues. It presented an overview of the smart campus concept, including its architectures and enabling technologies, such as IoT, cloud computing, and blockchain, with the aim of improving the quality of life on campuses. It studied eight varied smart campus domains and identified the outstanding problems in each. In addition, the paper discussed the generic framework of a smart campus and its limitations. Furthermore, we proposed a new smart campus framework combining IoT and blockchain to mitigate the IoT issues in previous architectures, particularly in relation to security and privacy, since blockchain technology has multiple relevant properties, such as autonomous and decentralised control, support for integrity, and resiliency. Finally, this study discussed the security requirements for the proposed smart campus framework.

References 1. Kwok, L.: A vision for the development of i-campus. Smart Learn. Environ. 2, 2 (2015) 2. Szabo, R., Farkas, K., Ispany, M., Benczur, A.A., Batfai, N., Jeszenszky, P., Laki, S., Vagner, A., Kollar, L., Sidlo, C., Besenczi, R., Smajda, M., Kover, G., Szincsak, T., Kadek, T., Kosa, M., Adamko, A., Lendak, I., Wiandt, B., Tomas, T., Nagy, A.Z., Feher, G.: Framework for smart city applications based on participatory sensing. In: Proceedings of the 4th IEEE International Conference on Cognitive Infocommunications, CogInfoCom 2013, pp. 295– 300 (2013) 3. Caragliu, A., Del Bo, C., Nijkamp, P.: Smart cities in Europe. In: 3rd Central European Conference on Regional Science 0732, pp. 1–15 (2015) 4. Perera, C., Liu, C.H., Jayawardena, S., Chen, M.: A survey on internet of things from industrial market perspective. IEEE Access 2, 1660–1679 (2015) 5. Pramanik, M.I., Lau, R.Y.K., Demirkan, H., Azad, M.A.K.: Smart health: big data enabled health paradigm within smart cities. Expert Syst. Appl. 87, 370–383 (2017) 6. Catarinucci, L., De Donno, D., Mainetti, L., Palano, L., Patrono, L., Stefanizzi, M.L., Tarricone, L.: An IoT-aware architecture for smart healthcare systems. IEEE Internet Things J. 2, 515–526 (2015)

An Internet of Things and Blockchain Based Smart Campus Architecture

M. Alkhammash et al.


Towards a Scalable IOTA Tangle-Based Distributed Intelligence Approach for the Internet of Things

Tariq Alsboui(B), Yongrui Qin, Richard Hill, and Hussain Al-Aqrabi

School of Computing and Engineering, University of Huddersfield, Huddersfield, UK
{tariq.alsboui,y.qin2,r.hill,h.al-aqrabi}@hud.ac.uk

Abstract. Distributed Ledger Technology (DLT) brings a set of opportunities for the Internet of Things (IoT), leading to innovative solutions for existing components at all levels of existing architectures. The IOTA Tangle has the potential to overcome current technical challenges identified for the IoT domain, such as data processing, infrastructure scalability, security, and privacy. Scalability is a serious challenge that influences the deployment of IoT applications. We propose a Scalable Distributed Intelligence Tangle-based approach (SDIT), which aims to address the scalability problem in IoT by adapting the IOTA Tangle architecture. It allows the seamless integration of new IoT devices across different applications. In addition, we describe an offloading mechanism to perform proof-of-work (PoW) computation in an energy-efficient way. A set of experiments has been conducted to demonstrate the feasibility of the Tangle in achieving better scalability while maintaining energy efficiency. The results indicate that our proposed solution provides highly scalable and energy-efficient transaction processing for IoT DLT applications when compared with an existing DAG-based distributed ledger approach.

Keywords: Scalability · Distributed Ledger Technology (DLT) · IOTA Tangle · Internet of Things (IoT) · Distributed Intelligence (DI)

1 Introduction

Internet of Things (IoT) applications connect everyday objects to the Internet and enable the gathering and exchange of data to increase the overall efficiency of a common objective [1]. It is estimated that there will be approximately 125 billion devices connected to the Internet by 2030 [2–4]. Consequently, most IoT applications are required to be highly scalable and energy efficient, so that they are capable of dynamically responding to a growing number of IoT devices [5]. IoT applications have a number of common elements: (1) sensing, to perceive the environment; (2) communication, for efficient data transfer between objects; and (3) computation, which is usually performed to generate useful information from the raw data.

© Springer Nature Switzerland AG 2020
K. Arai et al. (Eds.): SAI 2020, AISC 1229, pp. 487–501, 2020. https://doi.org/10.1007/978-3-030-52246-9_35


Distributed Intelligence (DI) is an approach that could address the challenges presented by the proliferation of IoT applications. DI is a sub-discipline of artificial intelligence that distributes processing, enables collaboration between smart objects, and mediates communications, thus supporting IoT system optimisation and the achievement of goals [6]. This definition is the basis for the research described in this article.

Contribution: We propose a system architecture for IoT, called the Scalable Distributed Intelligence Tangle-based Approach (SDIT). This research addresses some of the technical challenges presented by the IoT, whilst also supporting the necessary proof-of-work (PoW) mechanism in an energy-efficient way. The key contributions are summarised as follows:

• A Tangle-based architecture that manages resources and enables the deployment of IoT applications, with the primary motivation being scalability.
• A task offloading mechanism for performing the proof-of-work (PoW) on powerful IoT devices, to minimise energy consumption when resources (such as power) are constrained.
• A set of experimental results that verify the effectiveness and contribution of the proposed approach.

The ultimate objective of this paper is to design and develop a scalable and energy-efficient IOTA Tangle-enabled IoT intelligent architecture to support DI. The proposed approach differs from other solutions by using a Tangle-based DLT, with the primary motivations of being energy efficient and scalable enough to accommodate the growth of the IoT while taking resource constraints into consideration. This work outlines the design of a scalable system that can be used in various IoT applications. The IOTA Tangle is used to achieve scalability, and a higher-level node is responsible for performing the proof-of-work (PoW) to minimise the energy consumption of IoT devices. The initial idea can be found in our previous positioning work [5].
The remainder of this paper is organised as follows: Sect. 2 describes distributed ledger technology from the perspective of IOTA. Section 3 presents the suitability of the IOTA Tangle for IoT. Section 4 discusses how our work differs from other closely related work. In Sect. 5 we present our proposed SDIT system architecture. The performance of the proposed implementation is evaluated in Sect. 6. Finally, we conclude this paper and present interesting future directions in Sect. 7.

2 Distributed Ledger Technology

Distributed Ledger Technology (DLT) can be divided into three main types based on the data structure used for the ledger: blockchain (BC) [7], the IOTA Tangle (a DAG) [8], and Hashgraph [9]. BC is a distributed, decentralised, and immutable ledger for storing transactions and sharing data among all network participants [10]. Hashgraph is considered an alternative to BC and is used to replicate state machines; it guarantees Byzantine fault tolerance under asynchrony and decentralisation, requires no proof-of-work (PoW), reaches eventual consensus with probability one, and offers high speed in the consensus process [11]. BC has been criticised for its cost, energy consumption, and lack of scalability. To overcome these limitations, the IOTA Tangle has been introduced as a decentralised data storage architecture and consensus protocol based on a Directed Acyclic Graph (DAG). Each node in the DAG represents a transaction, and the connections between transactions represent transaction validation [8].

BC technology has recently started to receive attention from both academia and industry, since it offers a wide range of potential benefits to areas beyond cryptocurrency (in particular the IoT), thanks to unique characteristics such as immutability, reliability, fault tolerance, and decentralisation [12]. It is predicted that BC will transform IoT ecosystems by enabling them to be smart and more efficient. According to the International Data Corporation (IDC), 20% of IoT deployments will employ a basic level of BC-enabled services [13]. This number will continue to increase, as the adoption of BC in the IoT is still in its early stages. Overall, BC is considered an effective solution to be integrated with the IoT in order to address some of the IoT technical challenges [14]. BC is potentially able to overcome some IoT issues, such as privacy and security [15]. However, building energy-efficient and scalable IoT applications remains a challenge. Firstly, all BC consensus mechanisms, in either private or public BC, require all fully participating nodes to retain copies of all transactions recorded in the history of the BC, which comes at the cost of scalability [12].
Furthermore, IoT devices have limited computational, memory, and networking resources, which poses a problem when using BC-based architectures. Some IoT devices will not be able to engage in performing proof-of-work (PoW) consensus operations due to their limited computational power and battery life. Also, IoT devices do not always come with the storage space required to hold a complete copy of the BC [16]. With the IOTA Tangle, transactions are attached directly to the ledger without the need to wait for miners, as each new transaction approves two previous transactions, called tips. Hence, the Tangle is more efficient than traditional BC under a well-designed consensus mechanism [17].
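The tip-approval mechanism described above can be illustrated with a toy model. The sketch below is a simplified, hypothetical illustration rather than the real IOTA protocol: it substitutes SHA-256 with a small leading-zero target for IOTA's trinary minimum-weight-magnitude PoW, and uniform random tip selection for the weighted random walk. It does show the structural point: issuers validate two earlier transactions themselves, and concurrent issuers widen the DAG instead of queueing for a single chain.

```python
import hashlib
import random

DIFFICULTY = 2  # hex leading zeros; a stand-in for IOTA's minimum weight magnitude

def pow_nonce(payload: str) -> int:
    """Brute-force a nonce until the hash meets the toy difficulty target."""
    nonce = 0
    while not hashlib.sha256(f"{payload}:{nonce}".encode()).hexdigest().startswith("0" * DIFFICULTY):
        nonce += 1
    return nonce

class Tangle:
    """Toy DAG ledger: each new transaction approves up to two tips."""
    def __init__(self):
        self.approvals = {"genesis": []}  # tx id -> ids of the txs it approved
        self.tips = {"genesis"}           # txs not yet approved by anyone

    def attach_round(self, tx_ids):
        """Attach a batch of transactions 'concurrently': every issuer in the
        round selects tips from the same snapshot, so the DAG widens."""
        snapshot = sorted(self.tips)
        for tx_id in tx_ids:
            chosen = random.sample(snapshot, k=min(2, len(snapshot)))
            pow_nonce(f"{tx_id}:{chosen}")  # the issuer pays PoW; no miners, no fees
            self.approvals[tx_id] = chosen
            self.tips -= set(chosen)        # approved txs stop being tips
            self.tips.add(tx_id)            # the new tx becomes a tip

tangle = Tangle()
tangle.attach_round(["tx0", "tx1", "tx2"])  # all three approve the genesis tip
tangle.attach_round(["tx3", "tx4"])         # each approves two of the three tips
print(len(tangle.approvals))  # 6 transactions recorded, including genesis
```

Because validation work is done by the issuers rather than by a separate set of miners, throughput in this model grows with the number of participants, which is the scalability argument made for the Tangle above.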

3 The IOTA Tangle Suitability for IoT

The IOTA Tangle is intuitively understandable, and the benefits it offers can be employed to realise a DI approach. It allows a wide range of prospective modifications to fit specific goals, and the scalability and flexibility essential for IoT can be obtained with IOTA technology. IOTA can facilitate IoT interactions in the form of transactions. The following are the potential benefits of, and motivations for, integrating IOTA technology into the IoT infrastructure:


• Scalability: IoT demands a scalable infrastructure to cope with the increasing number of IoT applications. A private IOTA Tangle network offers high scalability due to the unique design of its decentralised, Tangle-based consensus architecture, in which users are also validators, and it has no scaling limitations.
• Decentralization: in centralised network architectures, the exchange of data is validated and authorised by central third-party entities, which leads to higher costs for centralised server maintenance. In an IOTA Tangle-based architecture, nodes exchange transactions with each other without relying on a central entity. Therefore, any participant who wants to exchange transactions on the Tangle must actively engage in consensus operations.
• Security and privacy: one of the most critical technical challenges of the IoT relates to network security and privacy. To ensure confidentiality and data protection, IOTA provides Masked Authenticated Messaging (MAM), a second-layer data communication protocol that makes it possible to transmit and access encrypted data streams over the Tangle. MAM offers three modes to control visibility of, and access to, channels: public, private, and restricted. Consequently, it can encrypt, authenticate, and broadcast data to the IOTA network.
• Zero-fee transactions: IOTA does not require miners, as IOTA participants perform the proof-of-work (PoW) themselves. The transaction cost amounts to the electricity required to validate two previous transactions. This means that all network participants contribute their computational power to maintain the network, thus removing transaction fees. The Tangle mechanism allows IOTA to operate fee-free, making the network even more distributed.
• Energy-efficiency: IoT devices have limitations in terms of power consumption, and applications have to be developed to maximise energy efficiency in order to extend device and network lifetime. IOTA technology enables proof-of-work (PoW) to be outsourced to a more powerful device, reducing the energy consumption of constrained IoT devices.
• Resiliency: IoT applications require integrity in the data being transmitted and analysed, so the IoT infrastructure must be resilient against data leaks and breakage (i.e., it needs offline capability). An IOTA network stores replicas of records across IOTA peers. This assists in maintaining data integrity and, together with the offline Tangle capability, provides additional resilience for the IoT infrastructure.
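The PoW-outsourcing idea in the energy-efficiency point above can be sketched as follows. This is a hypothetical toy rather than IOTA's actual API: `do_pow`, `verify`, and the SHA-256 leading-zero target are illustrative stand-ins. The point is the asymmetry: the helper node performs the expensive nonce search, while the constrained device spends only a single hash to verify the result before broadcasting.

```python
import hashlib

DIFFICULTY = 3  # toy target: the hex digest must start with this many zeros

def do_pow(payload: bytes) -> int:
    """Expensive nonce search; runs on a powerful helper node."""
    nonce = 0
    while not hashlib.sha256(payload + nonce.to_bytes(8, "big")).hexdigest().startswith("0" * DIFFICULTY):
        nonce += 1
    return nonce

def verify(payload: bytes, nonce: int) -> bool:
    """Cheap check (one hash); this is all the constrained device computes."""
    return hashlib.sha256(payload + nonce.to_bytes(8, "big")).hexdigest().startswith("0" * DIFFICULTY)

class ConstrainedDevice:
    """Battery-powered sensor that delegates PoW before issuing a transaction."""
    def __init__(self, helper):
        self.helper = helper  # e.g. an RPC stub to a mains-powered gateway

    def issue(self, payload: bytes):
        nonce = self.helper(payload)    # offload the energy-hungry search
        if not verify(payload, nonce):  # trust, but verify the helper's answer
            raise ValueError("helper returned an invalid nonce")
        return payload, nonce           # ready to broadcast to the Tangle

device = ConstrainedDevice(helper=do_pow)
tx, nonce = device.issue(b"temperature:23.5C")
print(verify(tx, nonce))  # True
```

In a deployment, `helper` would be a network call to a gateway or edge node; because verification is a single hash, delegating costs the sensor almost nothing while removing the brute-force search from its battery budget.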

4 Related Work

Recently, DI has gained new momentum among researchers seeking to overcome the technical challenges of the IoT [18–23]. A distributed dataflow programming model to enable DI is proposed in [18]. The system consists of fog devices that are classified according to their computing resources into edge IO (input/output) and compute
nodes. The input nodes are capable of brokering communications and data transfer to compute nodes. The compute nodes are responsible for processing the data arriving from the input devices. The assignment of logical nodes to physical devices is decided by the system designer based on the capabilities of the nodes. The proposed architecture achieves scalability and mobility, and can cope with heterogeneity. However, privacy, offline capability, and resource efficiency are not considered in the design, and the approach is not suitable for time-critical applications that require fast responses. An approach named PROTeCt (Privacy aRchitecture for integratiOn of internet of Things and Cloud computing) to enable DI is presented in [19]. The proposed approach consists of IoT devices and a cloud platform. The IoT devices are responsible for sensing and for implementing a cryptographic mechanism, i.e., a symmetric algorithm that ensures privacy before the data is transmitted to the cloud. Similarly, in [24], the authors present an approach based on Mobile Cloud Computing to support DI. The main idea is to merge sensing and processing at different levels of the network by sharing the application's workload between the server side and the smart things, using a cloud computing platform when needed. The proposed approach enables real-time monitoring and analysis of all the data acquired by networked devices and provides flexibility in executing the application by drawing on resources from cloud computing services. However, these approaches are neither scalable nor suitable for time-critical applications. Furthermore, the resiliency of the system (i.e., offline capability), multi-party authentication for data security [25], and the fusion of data from external devices are not considered in their designs.
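The PROTeCt idea of encrypting on the constrained device before anything reaches the cloud can be sketched as follows. To keep the example dependency-free, a hash-based XOR keystream stands in for the symmetric cipher — this toy construction is an assumption for illustration only; a real deployment would use an authenticated cipher such as AES-GCM, and [19] does not specify this scheme.

```python
import hashlib
import secrets

# Toy sketch: the IoT device symmetrically encrypts a reading before it
# leaves for the cloud. The hash-based XOR keystream below is illustrative
# only (NOT from [19]) -- use an authenticated cipher like AES-GCM in practice.

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive a pseudo-random keystream of the requested length."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """XOR the plaintext with the keystream; applying it twice decrypts."""
    return bytes(a ^ b for a, b in zip(plaintext, keystream(key, nonce, len(plaintext))))


key, nonce = secrets.token_bytes(16), secrets.token_bytes(12)
reading = b"temp=21.5C"
ciphertext = encrypt(key, nonce, reading)          # this is what the cloud sees
print(encrypt(key, nonce, ciphertext) == reading)  # True: XOR is its own inverse
```

The point of the pattern is that the cloud only ever handles ciphertext, so a compromised or curious cloud provider cannot read the raw sensor data.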
More advanced approaches are proposed in [20,21,23,26], which rely on fog computing to enable DI. For example, in [20] the authors apply two techniques, device-driven and human-driven intelligence, to reduce energy consumption and latency. The approach relies on machine learning (ML) to detect user behaviour and adaptively adjusts the sampling rate of sensors and the resource schedules (timeslots in the MAC layer) between sensor nodes. Furthermore, an algorithm is designed to offload local tasks among a cluster of fog nodes, which may further reduce energy demands and system latency. However, the approach is not scalable, interactions and information sharing among sensor nodes are not explicitly defined, and it lacks mechanisms for privacy and offline processing. An architecture composed of three layers is proposed in [26]. The approach employs several technologies to achieve DI, with each layer responsible for managing a specific task. The first layer consists of IoT and sensor devices, which measure and capture environmental data. The second layer comprises fog nodes, which provide an offloading path for the data captured from a group of IoT devices. The third layer is the cloud, which manages computing resources and data, and provides overall control and monitoring of the application. The proposed approach leads to a reduction in


T. Alsboui et al.

energy consumption and latency. However, cooperation amongst the physical devices is not provided, and issues related to privacy [27] and offline capability are not considered. In [21], the authors present a novel three-tier architecture to support DI, in which IoT components such as sensors, mobile phones, vehicles, base stations, network connections, and management elements are connected in a multi-tier distributed schema with different levels of intelligence: the group-of-devices tier, the regional tier, and the global tier. The group-of-devices tier consists of IoT devices and is responsible for managing the distributed services (data) generated by sensors. The regional tier is made up of fog colony nodes that act as intermediate nodes responsible for data preprocessing and integration. The global tier consists of cloud data centres responsible for further data processing. The proposed approach is robust and reduces the cost of maintaining the fog computing paradigm. However, it lacks scalability, resource utilisation mechanisms, and privacy, which are considered major challenges in IoT. Furthermore, it uses a predetermined static orchestration, which can cause system failure when nodes deplete their energy. Another architecture, called Distributed Internet-like ArchiTecture (DIAT), is proposed in [22] to support DI. The architecture is divided into three layers: the virtualization of physical objects (VO), the corresponding virtual object layer (CVOL), and the service layer (SL), each of which has its own functionalities and responsibilities. The virtualization of physical objects layer provides a semantic description of the capabilities and features of the associated real-world objects. The second layer is responsible for communicating and coordinating tasks coming from the VO layer.
Finally, the service layer (SL) is responsible for creating and managing services, and it handles the various requests from users. The proposed architecture is scalable and interoperable, and privacy is considered. However, other IoT technical challenges, i.e., offline capability and the conservation of IoT resources, are not supported. Another approach is introduced in [23], where the authors develop a four-layer architecture to achieve DI. The first layer is the cloud, consisting of a data center that provides wide-area monitoring and centralized control. The second layer comprises intermediate computing nodes that are responsible for identifying dangerous events and acting upon them. The third layer comprises low-power, high-performance edge devices connected to groups of sensors; these devices handle the raw data coming from the sensors and perform analysis in a timely manner. Finally, the fourth layer consists of sensor nodes distributed to monitor the environment. The advantages of this approach are optimal responses in real time and low latency. However, IoT-related issues such as energy efficiency, scalability, and privacy [28] are not considered in the design. Most recently, a DAG-based scalable transactive smart home infrastructure is proposed in [17]. The approach adopts the IOTA Tangle to build an IoT smart home infrastructure focusing on scalability and security. A network of 40 nodes is established and divided into three main parts: smart homes, the


Tangle of inter-house transactions (TXs), and smart devices in the homes. Each smart home contains an always-online computation device (the "Home Node") with pre-installed firmware and a corresponding Tangle reference implementation. All home nodes are connected to their neighbours via TCP/UDP protocols for communication and for synchronizing the distributed ledger. However, the approach scales only to a small number of nodes, and it would consume a lot of energy since all nodes are required to perform the proof-of-work. Also, other IoT-related issues such as offline capability are not considered in the design. Furthermore, since the approach relies on a coordinator, full decentralization is not achieved. The DAG-based smart home approach is similar to the SDIT approach proposed in this paper: both utilise the IOTA Tangle with different numbers of nodes to achieve scalability, whereas our approach focuses more on the energy efficiency of constrained IoT devices while maintaining a decentralized architecture.
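The tiered device/fog/cloud offloading pattern shared by the architectures surveyed above ([20,21,23,26]) amounts to placing each task on the lowest tier that can meet its constraints. The dispatcher below sketches that decision; the tier names, thresholds, and parameters are illustrative assumptions, not values taken from any of the cited systems.

```python
# Toy dispatcher for the multi-tier (device / fog / cloud) offloading pattern
# common to the surveyed architectures. Thresholds are illustrative only.

def choose_tier(latency_budget_ms: int, cpu_demand: float) -> str:
    """Place a task on the lowest tier that can meet its constraints.

    cpu_demand is a normalized load estimate in [0, 1]."""
    if cpu_demand <= 0.1:        # trivial work stays on the sensor/device tier
        return "device"
    if latency_budget_ms < 100:  # heavy but time-critical work goes to nearby fog
        return "fog"
    return "cloud"               # heavy, delay-tolerant work tolerates the WAN hop


print(choose_tier(50, 0.5))    # fog: heavy and time-critical
print(choose_tier(5000, 0.9))  # cloud: heavy but delay-tolerant
```

The surveyed systems differ mainly in whether this placement is static (criticised above for [21]) or adapts at runtime to device energy and network conditions.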

5

SDIT: A Scalable Distributed Intelligence Tangle-Based Approach

In this section, we present our proposed Scalable Distributed Intelligence Tangle-based (SDIT) approach, which aims to tackle scalability, energy efficiency, and decentralisation by adopting the IOTA Tangle technology.

5.1

SDIT: System Architecture

Figure 1 illustrates an abstract view of the proposed system architecture. The architecture is divided into three main parts: IoT devices, the Tangle that processes transactions (txs), and a PoW-enabled server. Each IoT device is connected to its neighbour nodes via TCP/IP protocols for communication, and interaction with the Tangle takes the form of transactions. The Tangle is responsible for managing, collecting, and processing the transactions. The PoW-enabled server has rich resources and is mostly responsible for performing all of the computations on behalf of the IoT devices. This is a critical task so as to minimise energy consumption.
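The division of labour just described — the device does only the cheap work of assembling its transaction, while the PoW-enabled server searches for the nonce — can be sketched as below. The function names, message format, and difficulty value are assumptions for illustration; in the IOTA node API this delegation corresponds to the attachToTangle endpoint, which the sketch does not call.

```python
import hashlib

# Minimal sketch of SDIT's PoW outsourcing: a constrained device only
# assembles its transaction, while a resource-rich server finds the nonce.
# Names and the difficulty value are illustrative, not the IOTA wire format.

DIFFICULTY = 3  # leading zero hex digits; stands in for IOTA's minWeightMagnitude


def device_prepare(sensor_value: float, parents: tuple) -> str:
    """Cheap work the IoT device can afford: serialize the transaction."""
    return f"value={sensor_value};parents={parents}"


def server_attach(tx: str) -> tuple:
    """Expensive nonce search, delegated to the PoW-enabled server."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{tx};nonce={nonce}".encode()).hexdigest()
        if digest.startswith("0" * DIFFICULTY):
            return nonce, digest
        nonce += 1


tx = device_prepare(21.5, ("tipA", "tipB"))   # runs on the constrained device
nonce, digest = server_attach(tx)             # runs on the PoW-enabled server
print(digest.startswith("000"))               # True: the nonce meets the target
```

Verification is a single hash, so the device (or any peer) can cheaply check the server's result before broadcasting, which is what makes the delegation safe for constrained hardware.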