Proceedings of the 8th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT’18), Vol.1 [1st ed.] 978-3-030-21004-5;978-3-030-21005-2

This two-volume book presents an unusually diverse selection of research papers, covering all major topics in the fields of information and communication technologies and related sciences.


Language: English · Pages: XI, 507 [513] · Year: 2020


Table of contents:
Front Matter ....Pages i-xi
Front Matter ....Pages 1-1
Meaning Negotiation (Dalila Djoher Graba, Nabil Keskes, Djamel Amar Bensaber)....Pages 3-13
Framework for Managing the New General Data Protection Regulation Related Claims (Sarah Bouslama, Safa Bhar Layeb, Jouhaina Chaouachi)....Pages 14-23
Automatic Processing of Oral Arabic Complex Disfluencies (Labiadh Majda, Bahou Younès, Mohamed Hédi Maâloul)....Pages 24-38
A Fuzzy Querying Using Cooperative Answers and Proximity Measure (Aicha Aggoune)....Pages 39-49
The Role of Named Entities in Linking News Articles During Preservation (Muzammil Khan, Arif Ur Rahman, Muhammad Ullah, Rashid Naseem)....Pages 50-58
Development of Supplier Selection Model Using Fuzzy DEMATEL Approach in a Sustainable Development Context (Oussama El Mariouli, Abdellah Abouabdellah)....Pages 59-71
Software Effort Estimation Using an Optimal Trees Ensemble: An Empirical Comparative Study (Abdelali Zakrani, Ali Idri, Mustapha Hain)....Pages 72-82
Automatic Classification and Analysis of Multiple-Criteria Decision Making (Ahmed Derbel, Younes Boujelbene)....Pages 83-93
An Incremental Extraction and Visualization of Ontology Instance Summaries with Memo Graph (Fatma Ghorbel, Elisabeth Métais, Fayçal Hamdi, Nebrasse Ellouze)....Pages 94-104
Choosing the Right Storage Solution for the Corpus Management System (Analytical Overview and Experiments) (Damir Mukhamedshin, Dzhavdet Suleymanov, Olga Nevzorova)....Pages 105-114
Requirements Imprecision of Data Warehouse Design Fuzzy Ontology-Based Approach - Fuzzy Connector Case (Abdelmadjid Larbi, Mimoun Malki)....Pages 115-122
Quantitative Prediction of Toxicity of Substituted Phenols Using Deep Learning (Latifa Douali)....Pages 123-130
Exploring ISBSG R12 Dataset Using Multi-data Analytics (Ghazi Alkhatib, Khalid Al-Sarayrah, Alain Abram)....Pages 131-143
A New Biomedical Text Summarization Method Based on Sentence Clustering and Frequent Itemsets Mining (Oussama Rouane, Hacene Belhadef, Mustapha Bouakkaz)....Pages 144-152
Front Matter ....Pages 153-153
An Implementation of InfluxDB for Monitoring and Analytics in Distributed IoT Environments (Maurizio Giacobbe, Chakib Chaouch, Marco Scarpa, Antonio Puliafito)....Pages 155-162
Publish a Jason Agent BDI Capacity as Web Service REST and SOAP (Hantanirina Felixie Rafalimanana, Jean Luc Razafindramintsa, Alain Josué Ratovondrahona, Thomas Mahatody, Victor Manantsoa)....Pages 163-171
FPGA Implementation of a Quantum Cryptography Algorithm (Jaouadi Ikram, Machhout Mohsen)....Pages 172-181
Health Recommender Systems: A Survey (Hafsa Lattar, Aïcha Ben Salem, Henda Hajjami Ben Ghézala, Faouzi Boufares)....Pages 182-191
Distributed Architecture of an Intrusion Detection System Based on Cloud Computing and Big Data Techniques (Rim Ben Fekih, Farah Jemili)....Pages 192-201
An Affective Tutoring System for Massive Open Online Courses (Mohamed Soltani, Hafed Zarzour, Mohamed Chaouki Babahenini, Chaouki Chemam)....Pages 202-211
Rationality Measurement for Jadex-Based Applications (Toufik Marir, Hadjer Mallek, Sihem Oubadi, Abd El Heq Silem)....Pages 212-221
A Continuous Optimization Scheme Based on an Enhanced Differential Evolution and a Trust Region Method (Hichem Talbi, Amer Draa)....Pages 222-233
Strided Convolution Instead of Max Pooling for Memory Efficiency of Convolutional Neural Networks (Riadh Ayachi, Mouna Afif, Yahia Said, Mohamed Atri)....Pages 234-243
Ear Recognition Based on Improved Features Representations (Hakim Doghmane, Hocine Bourouba, Kamel Messaoudi, El Bey Bournene)....Pages 244-260
Some Topological Indices of Polar Grid Graph (Atmani Abderrahmane, Elmarraki Mohamed, Essalih Mohamed)....Pages 261-270
Deep Elman Neural Network for Greenhouse Modeling (Latifa Belhaj Salah, Fathi Fourati)....Pages 271-280
Front Matter ....Pages 281-281
High Efficiency Multiplierless DCT Architectures (Yassine Hachaïchi, Sonia Mami, Younes Lahbib, Sabrine Rjab)....Pages 283-293
Signature of Electronic Documents Based on the Recognition of Minutiae Fingerprints (Souhaïl Smaoui, Mustapha Sakka)....Pages 294-302
Person Re-Identification Using Pose-Driven Body Parts (Salwa Baabou, Behzad Mirmahboub, François Bremond, Mohamed Amine Farah, Abdennaceur Kachouri)....Pages 303-310
High Securing Cryptography System for Digital Image Transmission (Mohamed Gafsi, Sondes Ajili, Mohamed Ali Hajjaji, Jihene Malek, Abdellatif Mtibaa)....Pages 311-322
A Novel DWTTH Approach for Denoising X-Ray Images Acquired Using Flat Detector (Olfa Marrakchi Charfi, Naouel Guezmir, Jérôme Mbainaibeye, Mokhtar Mars)....Pages 323-331
Recent Advances in Fire Detection and Monitoring Systems: A Review (Rafik Ghali, Marwa Jmal, Wided Souidene Mseddi, Rabah Attia)....Pages 332-340
Superpixel Based Segmentation of Historical Document Images Using a Multiscale Texture Analysis (Emna Soyed, Ramzi Chaieb, Karim Kalti)....Pages 341-351
Palm Vein Biometric Authentication Using Convolutional Neural Networks (Samer Chantaf, Alaa Hilal, Rola Elsaleh)....Pages 352-363
Indoor Image Recognition and Classification via Deep Convolutional Neural Network (Mouna Afif, Riadh Ayachi, Yahia Said, Edwige Pissaloux, Mohamed Atri)....Pages 364-371
Automatic USCT Image Processing Segmentation for Osteoporosis Detection (Marwa Fradi, Wajih Elhadj Youssef, Ghaith Bouallegue, Mohsen Machhout, Philippe Lasaygues)....Pages 372-381
An Efficient Approach to Face and Smile Detection (Alhussain Akoum, Rabih Makkouk, Rafic Hage Chehade)....Pages 382-389
Front Matter ....Pages 391-391
The Role of Virtual Reality in the Training for Carotid Artery Stenting: The Perspective of Trainees (Daniela Mazzaccaro, Bilel Derbel, Rim Miri, Giovanni Nano)....Pages 393-399
Bio-Inspired EOG Generation from Video Camera: Application to Driver’s Awareness Monitoring (Yamina Yahia Lahssene, Mokhtar Keche, Abdelaziz Ouamri)....Pages 400-409
A Memory Training for Alzheimer’s Patients (Fatma Ghorbel, Elisabeth Métais, Fayçal Hamdi, Nebrasse Ellouze)....Pages 410-419
Visual Exploration and Analysis of Bank Performance Using Self Organizing Map (Mouna Kessentini, Esther Jeffers)....Pages 420-434
A Pattern Methodology to Specify Usable Design and Security in Websites (Taheni Filali, Med Salim Bouhlel)....Pages 435-448
Multi-agents Planner for Assistance in Conducting Energy Sharing Processes (Bilal Bou Saleh, Ghazi Bou Saleh, Mohammad Hajjar, Abdellah El Moudni, Oussama Barakat)....Pages 449-462
Morocco’s Readiness to Industry 4.0 (Sarah El Hamdi, Mustapha Oudani, Abdellah Abouabdellah)....Pages 463-472
Anti-screenshot Keyboard for Web-Based Application Using Cloaking (Hanaa Mohsin, Hala Bahjat)....Pages 473-478
Fall Prevention Exergame Using Occupational Therapy Based on Kinect (Amina Ben Haj Khaled, Ali Khalfallah, Med Salim Bouhlel)....Pages 479-493
An Assistance Tool to Design Interoperable Components for Co-simulation (Yassine Motie, Alexandre Nketsa, Philippe Truillet)....Pages 494-503
Back Matter ....Pages 505-507


Smart Innovation, Systems and Technologies 146

Med Salim Bouhlel · Stefano Rovetta (Editors)

Proceedings of the 8th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT’18), Vol.1

Smart Innovation, Systems and Technologies Volume 146

Series Editors
Robert J. Howlett, Bournemouth University and KES International, Shoreham-by-Sea, UK
Lakhmi C. Jain, Faculty of Engineering and Information Technology, Centre for Artificial Intelligence, University of Technology Sydney, Broadway, NSW, Australia

The Smart Innovation, Systems and Technologies book series encompasses the topics of knowledge, intelligence, innovation and sustainability. The aim of the series is to make available a platform for the publication of books on all aspects of single and multi-disciplinary research on these themes in order to make the latest results available in a readily-accessible form. Volumes on interdisciplinary research combining two or more of these areas are particularly sought. The series covers systems and paradigms that employ knowledge and intelligence in a broad sense. Its scope is systems having embedded knowledge and intelligence, which may be applied to the solution of world problems in industry, the environment and the community. It also focuses on the knowledge-transfer methodologies and innovation strategies employed to make this happen effectively. The combination of intelligent systems tools and a broad range of applications introduces a need for a synergy of disciplines from science, technology, business and the humanities. The series will include conference proceedings, edited collections, monographs, handbooks, reference books, and other relevant types of book in areas of science and technology where smart systems and technologies can offer innovative solutions. High quality content is an essential feature for all book proposals accepted for the series. It is expected that editors of all accepted volumes will ensure that contributions are subjected to an appropriate level of reviewing process and adhere to KES quality principles. ** Indexing: The books of this series are submitted to ISI Proceedings, EI-Compendex, SCOPUS, Google Scholar and Springerlink **

More information about this series at http://www.springer.com/series/8767

Med Salim Bouhlel · Stefano Rovetta

Editors

Proceedings of the 8th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT’18), Vol.1


Editors
Med Salim Bouhlel, SETIT Lab, University of Sfax, Sfax, Tunisia
Stefano Rovetta, DIBRIS, University of Genoa, Genoa, Italy

ISSN 2190-3018   ISSN 2190-3026 (electronic)
Smart Innovation, Systems and Technologies
ISBN 978-3-030-21004-5   ISBN 978-3-030-21005-2 (eBook)
https://doi.org/10.1007/978-3-030-21005-2

© Springer Nature Switzerland AG 2020
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

This book collects selected and revised papers that were presented at the international conference SETIT 2018, the Conference on the Sciences of Electronics, Technologies of Information and Telecommunications, covering all topics in the fields of information and communication technologies and related sciences. The conference was held jointly in Genoa, Italy, and Hammamet, Tunisia, from December 18 to 20, 2018.

The aim of this conference series, a major international event, is to bring together researchers and developers from both academia and industry to report on the latest scientific and theoretical advances in their respective areas, fostering a cross-disciplinary dissemination that would otherwise be made difficult by the extreme specialization of each field. This is a recent trend that characterizes very successful events and publications, and encourages scholars and professionals to overcome disciplinary barriers. In today's information-centered world, the relevance of hardware, software, and telecommunications cannot be overestimated; even fields traditionally involved only marginally, like factory automation and production engineering, are currently discovering the value of data and information, with such trends as Industry 4.0 or the exponential growth of machine learning and AI, and with all the consequent technological developments that are needed to support them.

Both theoretical advances and interesting applications were submitted to the conference, as it gave special emphasis to interdisciplinary works at the intersection of two or more of the covered areas. The papers included in this collection were selected after their presentation at the conference and were carefully revised. In addition to this scientific production per se, the event also had another important role, providing an occasion for exchanging experiences and for introducing many young scientists in their training phase to an international scientific community, giving them opportunities for networking and professional growth.


We are therefore grateful to the contributors of this collection, and to all participants, for their cooperation, interest, enthusiasm, and lively interactions, which helped make the conference not only a scientifically stimulating event, but also a memorable experience.

March 2019

Med Salim Bouhlel Stefano Rovetta

Contents

Information Processing
Meaning Negotiation (Dalila Djoher Graba, Nabil Keskes, and Djamel Amar Bensaber) 3
Framework for Managing the New General Data Protection Regulation Related Claims (Sarah Bouslama, Safa Bhar Layeb, and Jouhaina Chaouachi) 14
Automatic Processing of Oral Arabic Complex Disfluencies (Labiadh Majda, Bahou Younès, and Mohamed Hédi Maâloul) 24
A Fuzzy Querying Using Cooperative Answers and Proximity Measure (Aicha Aggoune) 39
The Role of Named Entities in Linking News Articles During Preservation (Muzammil Khan, Arif Ur Rahman, Muhammad Ullah, and Rashid Naseem) 50
Development of Supplier Selection Model Using Fuzzy DEMATEL Approach in a Sustainable Development Context (Oussama El Mariouli and Abdellah Abouabdellah) 59
Software Effort Estimation Using an Optimal Trees Ensemble: An Empirical Comparative Study (Abdelali Zakrani, Ali Idri, and Mustapha Hain) 72
Automatic Classification and Analysis of Multiple-Criteria Decision Making (Ahmed Derbel and Younes Boujelbene) 83
An Incremental Extraction and Visualization of Ontology Instance Summaries with Memo Graph (Fatma Ghorbel, Elisabeth Métais, Fayçal Hamdi, and Nebrasse Ellouze) 94
Choosing the Right Storage Solution for the Corpus Management System (Analytical Overview and Experiments) (Damir Mukhamedshin, Dzhavdet Suleymanov, and Olga Nevzorova) 105
Requirements Imprecision of Data Warehouse Design Fuzzy Ontology-Based Approach - Fuzzy Connector Case (Abdelmadjid Larbi and Mimoun Malki) 115
Quantitative Prediction of Toxicity of Substituted Phenols Using Deep Learning (Latifa Douali) 123
Exploring ISBSG R12 Dataset Using Multi-data Analytics (Ghazi Alkhatib, Khalid Al-Sarayrah, and Alain Abram) 131
A New Biomedical Text Summarization Method Based on Sentence Clustering and Frequent Itemsets Mining (Oussama Rouane, Hacene Belhadef, and Mustapha Bouakkaz) 144

Computer Science
An Implementation of InfluxDB for Monitoring and Analytics in Distributed IoT Environments (Maurizio Giacobbe, Chakib Chaouch, Marco Scarpa, and Antonio Puliafito) 155
Publish a Jason Agent BDI Capacity as Web Service REST and SOAP (Hantanirina Felixie Rafalimanana, Jean Luc Razafindramintsa, Alain Josué Ratovondrahona, Thomas Mahatody, and Victor Manantsoa) 163
FPGA Implementation of a Quantum Cryptography Algorithm (Jaouadi Ikram and Machhout Mohsen) 172
Health Recommender Systems: A Survey (Hafsa Lattar, Aïcha Ben Salem, Henda Hajjami Ben Ghézala, and Faouzi Boufares) 182
Distributed Architecture of an Intrusion Detection System Based on Cloud Computing and Big Data Techniques (Rim Ben Fekih and Farah Jemili) 192
An Affective Tutoring System for Massive Open Online Courses (Mohamed Soltani, Hafed Zarzour, Mohamed Chaouki Babahenini, and Chaouki Chemam) 202
Rationality Measurement for Jadex-Based Applications (Toufik Marir, Hadjer Mallek, Sihem Oubadi, and Abd El Heq Silem) 212
A Continuous Optimization Scheme Based on an Enhanced Differential Evolution and a Trust Region Method (Hichem Talbi and Amer Draa) 222
Strided Convolution Instead of Max Pooling for Memory Efficiency of Convolutional Neural Networks (Riadh Ayachi, Mouna Afif, Yahia Said, and Mohamed Atri) 234
Ear Recognition Based on Improved Features Representations (Hakim Doghmane, Hocine Bourouba, Kamel Messaoudi, and El Bey Bournene) 244
Some Topological Indices of Polar Grid Graph (Atmani Abderrahmane, Elmarraki Mohamed, and Essalih Mohamed) 261
Deep Elman Neural Network for Greenhouse Modeling (Latifa Belhaj Salah and Fathi Fourati) 271

Image and Video
High Efficiency Multiplierless DCT Architectures (Yassine Hachaïchi, Sonia Mami, Younes Lahbib, and Sabrine Rjab) 283
Signature of Electronic Documents Based on the Recognition of Minutiae Fingerprints (Souhaïl Smaoui and Mustapha Sakka) 294
Person Re-Identification Using Pose-Driven Body Parts (Salwa Baabou, Behzad Mirmahboub, François Bremond, Mohamed Amine Farah, and Abdennaceur Kachouri) 303
High Securing Cryptography System for Digital Image Transmission (Mohamed Gafsi, Sondes Ajili, Mohamed Ali Hajjaji, Jihene Malek, and Abdellatif Mtibaa) 311
A Novel DWTTH Approach for Denoising X-Ray Images Acquired Using Flat Detector (Olfa Marrakchi Charfi, Naouel Guezmir, Jérôme Mbainaibeye, and Mokhtar Mars) 323
Recent Advances in Fire Detection and Monitoring Systems: A Review (Rafik Ghali, Marwa Jmal, Wided Souidene Mseddi, and Rabah Attia) 332
Superpixel Based Segmentation of Historical Document Images Using a Multiscale Texture Analysis (Emna Soyed, Ramzi Chaieb, and Karim Kalti) 341
Palm Vein Biometric Authentication Using Convolutional Neural Networks (Samer Chantaf, Alaa Hilal, and Rola Elsaleh) 352
Indoor Image Recognition and Classification via Deep Convolutional Neural Network (Mouna Afif, Riadh Ayachi, Yahia Said, Edwige Pissaloux, and Mohamed Atri) 364
Automatic USCT Image Processing Segmentation for Osteoporosis Detection (Marwa Fradi, Wajih Elhadj Youssef, Ghaith Bouallegue, Mohsen Machhout, and Philippe Lasaygues) 372
An Efficient Approach to Face and Smile Detection (Alhussain Akoum, Rabih Makkouk, and Rafic Hage Chehade) 382

Human-Machine Interaction
The Role of Virtual Reality in the Training for Carotid Artery Stenting: The Perspective of Trainees (Daniela Mazzaccaro, Bilel Derbel, Rim Miri, and Giovanni Nano) 393
Bio-Inspired EOG Generation from Video Camera: Application to Driver’s Awareness Monitoring (Yamina Yahia Lahssene, Mokhtar Keche, and Abdelaziz Ouamri) 400
A Memory Training for Alzheimer’s Patients (Fatma Ghorbel, Elisabeth Métais, Fayçal Hamdi, and Nebrasse Ellouze) 410
Visual Exploration and Analysis of Bank Performance Using Self Organizing Map (Mouna Kessentini and Esther Jeffers) 420
A Pattern Methodology to Specify Usable Design and Security in Websites (Taheni Filali and Med Salim Bouhlel) 435
Multi-agents Planner for Assistance in Conducting Energy Sharing Processes (Bilal Bou Saleh, Ghazi Bou Saleh, Mohammad Hajjar, Abdellah El Moudni, and Oussama Barakat) 449
Morocco’s Readiness to Industry 4.0 (Sarah El Hamdi, Mustapha Oudani, and Abdellah Abouabdellah) 463
Anti-screenshot Keyboard for Web-Based Application Using Cloaking (Hanaa Mohsin and Hala Bahjat) 473
Fall Prevention Exergame Using Occupational Therapy Based on Kinect (Amina Ben Haj Khaled, Ali Khalfallah, and Med Salim Bouhlel) 479
An Assistance Tool to Design Interoperable Components for Co-simulation (Yassine Motie, Alexandre Nketsa, and Philippe Truillet) 494

Author Index 505

Information Processing

Meaning Negotiation

Dalila Djoher Graba, Nabil Keskes, and Djamel Amar Bensaber

LabRi-SBA Lab., École Supérieure en Informatique (ESI-SBA), Sidi Bel Abbes, Algeria
{d.graba,n.keskes,d.amarbensaber}@esi-sba.dz

Abstract. Nowadays, the web has become a widely used technology in our community. It allows us to work collaboratively and to share knowledge. The pragmatic web is the most recent extension of the web (the semantic web), which facilitates the exploitation and interpretation of data by machines. It rests on three important components: the context, the community, and meaning negotiation. Meaning negotiation is the most important component of the pragmatic web and the one on which we focus our attention. It plays an important role in exchanges and resolves conflicts in cooperative activities between people. The knowledge (context) of each party in the community of users is heterogeneous, which makes meaning negotiation complicated. This paper realizes a meaning negotiation scenario based on ontology merging in the geopolitical domain, which reduces and simplifies the process and improves the semantics of the data.

Keywords: Meaning negotiation · Ontology · Pragmatic web · Contextual ontology · Domain ontology

1 Introduction

Nowadays, the world is undergoing a radical change; everything has become digital. In the 21st century, the web has become a pillar technology of information sharing in our universal culture. This technology allows us to work collaboratively and to share knowledge in a given domain. It provides an unlimited amount of information in different fields: scientific research, commerce, etc. At the beginning of 2010, the semantic web appeared to provide a more efficient use of information by enriching knowledge repositories with meaningful and structured contents. However, the more widely the semantic web is used by humans, the more difficult social interaction becomes to achieve. The human factor in the semantic web is a largely unresolved problem. The pragmatic web appeared to overcome the limits of the semantic web and to increase human collaboration. In the pragmatic web, meaning negotiation is the process by which agents agree on the meaning of a set of terms while using the semantic layer (ontologies). The problem of meaning negotiation lies at the intersection of two domains: Artificial Intelligence (AI) and Knowledge Representation (KR).

However, there are different ways to represent knowledge (logic, ontology, etc.), hence the problem of heterogeneity. So, in order to have a powerful meaning negotiation, it is necessary to present the knowledge in a clear and unambiguous way. The model in [1] is a basic example of meaning negotiation, but the realization of this process is very complex. This model was adopted by the authors of [2], who merge the contextual ontology with the domain ontology in a case study, to improve meaning negotiation and to simplify the process of [1]. However, we cannot yet say that the hypothesis of [2] is valid for other domain ontologies. Our work is to realize a meaning negotiation scenario based on the model of [1], because of the advantages that we can draw from it (improvement of the collaboration between the individuals in the context, improvement of the semantics, etc.). The idea is to use the contribution of [2] in order to simplify and reduce the scenario of [1] in the geopolitical domain. The remainder of the paper is organized as follows. Section 2 presents some definitions of basic terms. Section 3 presents related work in the domain and synthesizes it in a comparative table. Section 4 briefly introduces our work and gives an example. The last section summarizes the paper and outlines directions for future research.

2 Background

2.1 The Pragmatic Web

This recent web is a new version of the semantic web. According to [3], the pragmatic web describes how and why people exploit information using a set of tools, practices, and theories. It deals not only with the meaning of information, but also with the social interaction that it brings. The pragmatic web encourages the emergence of communities of interest and practice that develop their own consensus knowledge to normalize their representation [4]. It is about making web technology serve people as they collaborate in their messy, real world and in their evolving domains of interaction [5]. To conclude, the pragmatic web is the extension of the semantic web that allows a community to negotiate in order to develop its common knowledge.

2.2 Meaning Negotiation

Meaning negotiation is a learning process that receives a set of textual or verbal discourse from a community sharing the same interest. Each object in the learning process can be characterized by rules, comments, news, features, or the collaborative design and development of a team's products [6]. In other words, the authors in [7] define meaning negotiation as a process in which a communication medium is used to allow agents to reach agreement starting from a set of different preferred objects. Figure 1 shows that the meaning negotiation process in the pragmatic web can be extended by the use of the semantic layer, where the meaning of the concept is selected.


Fig. 1. The Meaning Negotiation process in the pragmatic web.

The Classification of Automated Negotiation. The most powerful mechanism for dealing with conflicts between agents is automated negotiation. We classify automated negotiation into four classes: game-theoretical, auction-based, heuristic-based, and argumentation-based approaches.

– Game-theoretical approaches derive from the branch of economics that analyzes strategic interactions between self-interested agents. In this class, the negotiation process is defined as a game between the participants to determine the optimal strategy.
– Auction-based approaches are defined as a structured negotiation mechanism by which an economic agent puts several agents in competition. This class is a specific case of the game-theoretical approaches.
– Heuristic-based approaches are approximate methods used for the resolution of optimization problems [8]. They represent empirical rules that solve the problem quickly and find an approximate solution (a minimal sketch of such a concession rule is given after this list).
– Argument-based approaches aim to exchange additional information over and above the proposals. This information (the argument) can take different forms to explain explicitly the opinion of the agent and to identify the area of the negotiation space.
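As a purely illustrative sketch, and not part of the approach described in this paper, the following Java fragment shows a heuristic, time-dependent concession rule between two agents negotiating over a single numeric issue. The agent names reuse the MatMaker/CLAW scenario cited later from [1]; the preferred and reservation values, the linear concession rule, and the acceptance test are all assumptions made for the example.

```java
/**
 * Illustrative sketch of a heuristic (time-dependent concession) negotiation.
 * The concession rule and all numeric values are assumptions for illustration only.
 */
public class ConcessionNegotiation {

    /** A negotiating agent with a preferred value and a reservation value. */
    static final class Agent {
        final String name;
        final double preferred;    // the value the agent would ideally like
        final double reservation;  // the worst value the agent still accepts

        Agent(String name, double preferred, double reservation) {
            this.name = name;
            this.preferred = preferred;
            this.reservation = reservation;
        }

        /** Heuristic rule: concede linearly from preferred to reservation as time runs out. */
        double offerAt(int round, int maxRounds) {
            double t = (double) round / maxRounds;   // normalized time in [0, 1]
            return preferred + t * (reservation - preferred);
        }

        /** Accept any offer at least as close to the preference as the reservation value. */
        boolean accepts(double offer) {
            return Math.abs(offer - preferred) <= Math.abs(reservation - preferred);
        }
    }

    public static void main(String[] args) {
        Agent seller = new Agent("MatMaker", 1.0, 0.4);  // prefers a broad interpretation
        Agent buyer  = new Agent("CLAW",     0.0, 0.6);  // prefers a narrow interpretation
        int maxRounds = 10;

        for (int round = 0; round <= maxRounds; round++) {
            double sellerOffer = seller.offerAt(round, maxRounds);
            double buyerOffer  = buyer.offerAt(round, maxRounds);
            // Agreement as soon as one side's current offer is acceptable to the other.
            if (buyer.accepts(sellerOffer) || seller.accepts(buyerOffer)) {
                System.out.println("Agreement reached at round " + round);
                return;
            }
        }
        System.out.println("No agreement within " + maxRounds + " rounds");
    }
}
```

With the values above the two concession curves cross around round 7, which is when the loop reports an agreement; a game-theoretical or argument-based protocol would replace this simple rule with a richer exchange.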

2.3 Context

The context is an important component of meaning negotiation. It characterizes the situation of the individual community member in order to return the most relevant information during the meaning negotiation process. More concretely, the context is defined as the set of parameters external to the application environment [9]. To conclude, the context is the environment that surrounds and contains the entity. This entity (a set of concepts) is considered relevant to the interaction between users in meaning negotiation.

Contextual Ontology. In the 21st century, ontologies have been proposed as models to structure information in Knowledge Representation (KR). They represent a practical solution for communication and interoperability of information [10]. Contextualization allows the partitioning of ontologies according to their context in different domains. Contextual ontologies are an explicit specification of a contextual conceptualization [11]. In other words, contextual ontologies vary according to the context, in which each concept is characterized by a set of properties.
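As a small illustrative aside (not taken from this paper or from [11]), the idea that one concept carries different property sets in different contexts can be pictured in a few lines of Java, where the active context acts as a key selecting the relevant properties. The concept and property names are invented for the example.

```java
import java.util.List;
import java.util.Map;

/**
 * Minimal illustration of a contextualized concept: the same concept
 * ("Territory") exposes different properties depending on the context.
 * All names here are assumptions made for the sketch.
 */
public class ContextualConceptDemo {

    record ContextualConcept(String name, Map<String, List<String>> propertiesByContext) {
        List<String> propertiesIn(String context) {
            return propertiesByContext.getOrDefault(context, List.of());
        }
    }

    public static void main(String[] args) {
        ContextualConcept territory = new ContextualConcept(
            "Territory",
            Map.of(
                "cartography", List.of("boundary", "area", "mapScale"),
                "geopolitics", List.of("sovereignty", "dependencyOf", "status")
            ));

        // The interpretation negotiated between users depends on the active context.
        System.out.println(territory.propertiesIn("cartography"));
        System.out.println(territory.propertiesIn("geopolitics"));
    }
}
```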

3 Related Work

This section is divided according to the context representation used: logic, ontology, cognitive, and hybrid models.

3.1 Ontology Context Model

These approaches use ontologies to model the context. The first approach [12] presented an algorithm for automated meaning negotiation. It allows semantic interoperability between the local ontology and the heterogeneous ontologies of different autonomous communities. The authors used a semantic dictionary in order to disambiguate the meaning and eliminate the irrelevant concepts in the context. They focused on merging concept labels, leaving aside their relations. In 2005, De Moor [1] established a pragmatic model to place ontology in context and operationalize the pragmatics of the Web. The author elaborates a meaning negotiation scenario between a cat mat seller (MatMaker) and an association of cat lovers (CLAW). This scenario presents the basic model of meaning negotiation that we take into consideration in our work. It is very difficult to find a good granularity of context at the pragmatic level with a domain ontology. Later, De Moor [13] addressed the ambiguities of communication in meaning negotiation between communities based on the DOGMA framework in order to reach agreement. The negotiation process becomes very complex when the community of practice is large, in the step of interaction with the ontology engineering layer. The authors in [14] implemented three protocols for ontology negotiation in an Internet news system. These protocols implement normal communication, ontology alignment, and a transition between these ontologies. The authors in [15] developed a formal framework that provides a negotiation strategy. This framework compares the whole contexts of two background domain theories. The process can be very long if the similarity distance (SD) remains high and orphans still exist. Finally, the paper [16] integrated ontology negotiation into multi-agent communication. This system implements algorithms that compute the translation between ontologies. For successful communication, it allows agents to share and communicate factual and terminological knowledge in the same domain.

3.2 Logical Context Model

The logical system is one of the bases of meaning negotiation protocols on the Web, and the mappings between logical systems are the key to developing these protocols. Farrugia [17] used protocols based on a logical system. These protocols allow agents to interoperate on the Web. In order to agree, each agent takes into consideration the logical system of the other. The approach in [18] presented a general model of multi-agent systems, where agents discuss a point of view in order to agree on a common angle. The knowledge of each agent is represented by two sets, a fixed one (stub) and a flexible one (flex) that can move to a more descriptive or specific state. This system increases the complexity of negotiation processes, mostly in scenarios with several actors.

3.3 Cognitive Model

These models use different algorithms and techniques to learn tasks from examples. Artificial neural networks are designed based on biological networks. These parallel computing networks can learn, store and recall information [19]. Among these networks, we find the self-organizing map, or SOM. The authors in [20] verified that agents were able to build a common emerged lexicon by using a self-organizing map (SOM). A vector that contains the subject characteristics is sent to the agents. The agents look in the neighborhood of the BMU (Best Matching Unit) for words that match the subject. To record whether a word has successfully expressed a meaning in a previous step, an incremental value for word-node pairs is assigned to each agent. The use of multiple SOMs, one for each domain, becomes more complex.
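For readers unfamiliar with the SOM step mentioned above, the core operation in [20], finding the Best Matching Unit for a subject vector, reduces to a nearest-neighbour search over the map nodes. The short Java sketch below shows only this step, with invented map sizes and weights; it is not the simulation of [20].

```java
/**
 * Illustrative Best Matching Unit (BMU) lookup for a self-organizing map.
 * The map dimensions and weight values are placeholders, not data from [20].
 */
public class SomBmuLookup {

    /** Returns the index of the map node whose weight vector is closest to the input. */
    static int findBmu(double[][] nodeWeights, double[] input) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int i = 0; i < nodeWeights.length; i++) {
            double dist = 0.0;
            for (int d = 0; d < input.length; d++) {
                double diff = nodeWeights[i][d] - input[d];
                dist += diff * diff;              // squared Euclidean distance
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Three map nodes with 2-dimensional weight vectors (toy values).
        double[][] nodes = { {0.1, 0.9}, {0.5, 0.5}, {0.9, 0.1} };
        double[] subject = {0.8, 0.2};    // feature vector describing the subject
        System.out.println("BMU index: " + findBmu(nodes, subject));   // prints 2
    }
}
```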

3.4 Hybrid Model

The authors in [2] implemented the conceptual model of meaning negotiation [1] based on the multi-agent model presented in [18], to improve and optimize the process of meaning negotiation. They presented a case study in which they merge a part of the semantic ontology with the individual context ontology. This study presents results in a single domain only; the approach cannot be validated with a single domain ontology. It is very difficult to find a matching between ontologies, especially when the ontologies are too large.

3.5 Synthesis

The different approaches to meaning negotiation in the pragmatic web are classified and compared in Table 1. We compare the approaches using five criteria (model of context, negotiation class, technical means, strengths, weaknesses). The ontological representation is more widely used than the logical one, which leads us to adopt the ontological representation to define our own approach. The table also summarizes the different techniques and algorithms used, such as the self-organizing map based on neural networks, similarity measures, multi-agent systems, etc. The argument-based negotiation class is compatible with the logical contextual model. This paper is in line with the different works in this field, and more particularly builds on the work of [1, 2], which represents the basic model of meaning negotiation, in order to minimize the problems mentioned previously in this document.


Table 1. Comparative table between the different approaches (for each approach: model of context and negotiation class, then weaknesses, strengths, and technical means)

[17] Logic; argument-based.
Weakness: the approach did not deal with the context of the agents.
Strengths: use of different or common languages to represent the agents' conceptualizations; merging of the logical systems of the agents.
Technical: logical system.

[12] Ontology; argument-based.
Weakness: the merge does not deal with the labels of the relations, but only with the labels of the concepts.
Strengths: eliminates irrelevant concepts in a context; disambiguates the meaning.
Technical: matching matrix, WordNet.

[1] Ontology; auction-based.
Weakness: difficulty of finding a good context granularity at the pragmatic level; complex processes.
Strengths: decreases the ambiguity of semantic data; improves collaboration between individuals.
Technical: web service, pragmatic pattern.

[13] Ontology; auction-based.
Weakness: the negotiation process becomes very complex when the community of practice is large; the ontology engineering process is very complex.
Strengths: improves collaboration and communication processes; reduces ambiguities of communication; achieves the agreement.
Technical: lexicon based, server commitment.

[15] Ontology; argument-based.
Weakness: the process can be very long if the semantic distance remains very high or the domain theories are very large.
Strengths: uses several similarity metrics; applies revision of some propositional substitutions.
Technical: background domain theory, similarity measure.

[20] Self-organizing map (SOM); game-theoretic.
Weakness: the approach does not deal with the case of using multiple maps for each domain.
Strengths: a common emerged lexicon was built during the simulations.
Technical: SOM, observation game.

[14] Ontology; argument-based.
Weakness: the protocols will become complex if multiple agents interact.
Strengths: communication without loss of information; soundness of information.
Technical: description logic, common vocabulary.

[18] Logic; argument-based.
Weakness: the process for multi-party scenarios is very complex.
Strengths: a consistent and adequate multi-agent system.
Technical: Eggs/Yolk model, multi-agent system.

[16] Ontology; argument-based.
Strengths: allows agents to exchange factual and terminological knowledge.
Technical: multi-agent system.

[2] Hybrid model (logic and ontology); auction-based.
Weakness: difficulty of finding a dependency between ontologies.
Strengths: improves and optimizes the meaning negotiation process.
Technical: web service, pragmatic pattern.
4 Introduction to Our Work

The approach in [2] was developed to achieve a merging between the individual context ontology and the domain ontology. Our work is to test the proposal of [2] by using geopolitical domain ontologies for the example. We implemented the meaning negotiation process of [1, 2] in a multi-agent system in the geopolitical domain. For this, we used Java (J2EE) and the JADE framework, which simplifies the implementation of a multi-agent system. To merge the ontologies, we implemented a simple method. This method uses two similarities, a syntactic and a semantic one. (1) The first similarity compares two strings of characters using cosine similarity. (2) The second similarity uses the semantic dictionary Power Thesaurus to link the concepts: in the semantic dictionary, we find the synonyms of the concept of ontology 1, using the user votes, and then we look for syntactic similarities between the results returned by the dictionary and ontology 2, again using the cosine measure. If the two previous conditions are verified, we merge the concepts. In the Power Thesaurus dictionary, users can add new concept synonyms; for relevance, each synonym is rated by user votes.
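To make the merging rule above more concrete, the Java sketch below reproduces its two tests: a cosine similarity between the two labels, and a second cosine test between the label of ontology 2 and the synonyms returned for the label of ontology 1. It is only an illustration under stated assumptions: the synonym lookup is a stub standing in for the Power Thesaurus query (the real service, its API and the vote-based ranking are not shown), the character-bigram representation and the 0.8 threshold are assumptions, and the two tests are combined with a disjunction, whereas the paper does not fully specify how the conditions are combined.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch of the two-test merging rule: merge two concept labels when the labels
 * themselves are cosine-similar, or when a synonym of the first label is
 * cosine-similar to the second one. Threshold, bigram representation and the
 * synonym stub are assumptions made for illustration.
 */
public class ConceptMergeSketch {

    static final double THRESHOLD = 0.8;   // assumed similarity threshold

    /** Cosine similarity between character-bigram frequency vectors of two labels. */
    static double cosine(String a, String b) {
        Map<String, Integer> va = bigrams(a), vb = bigrams(b);
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Integer> e : va.entrySet()) {
            dot += e.getValue() * vb.getOrDefault(e.getKey(), 0);
            na += e.getValue() * e.getValue();
        }
        for (int count : vb.values()) nb += count * count;
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    /** Character-bigram counts of a lower-cased label. */
    static Map<String, Integer> bigrams(String s) {
        Map<String, Integer> v = new HashMap<>();
        String t = s.toLowerCase().replaceAll("\\s+", " ").trim();
        for (int i = 0; i + 1 < t.length(); i++) {
            v.merge(t.substring(i, i + 2), 1, Integer::sum);
        }
        return v;
    }

    /** Stub standing in for the Power Thesaurus lookup (vote-based ranking not modelled). */
    static List<String> lookupSynonyms(String label) {
        if (label.equalsIgnoreCase("country")) return List.of("state", "nation");
        return List.of();
    }

    /** Merging decision: syntactic test first, then the synonym-based test. */
    static boolean shouldMerge(String labelFromOntology1, String labelFromOntology2) {
        if (cosine(labelFromOntology1, labelFromOntology2) >= THRESHOLD) return true;
        for (String synonym : lookupSynonyms(labelFromOntology1)) {
            if (cosine(synonym, labelFromOntology2) >= THRESHOLD) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(shouldMerge("Cocos Island", "Cocos Islands")); // true: labels are cosine-similar
        System.out.println(shouldMerge("country", "nation"));             // true via the synonym test
        System.out.println(shouldMerge("country", "island"));             // false: neither test passes
    }
}
```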

4.1 Example

In this section, we give an example of meaning negotiation in the geopolitical domain between a cartographer and geopoliticians. The cartographer proposes to design a map that contains an external territory, the Cocos Islands, a dependent territory of Australia.

Fig. 2. All the ontologies needed.

First, we define all the ontologies: the cartographer ontologies, the geopolitician ontologies, the domain ontologies, and the broker ontologies (see Fig. 2). Then, we implement the model of the meaning negotiation process of [1]. Figure 3 illustrates the 16 steps of the scenario before using the merge.

Fig. 3. The meaning negotiation before the merge.

The next step in the work is to merge the domain ontology with the geopolitician ontology. After merging the ontologies, we re-implemented the meaning negotiation process of [1] according to the idea proposed in [2].


Fig. 4. The meaning negotiation after the merge.

We observe that the number of steps in the scenario decreases from 16 to 12 (see Fig. 4). We conclude that merging the ontologies can reduce the meaning negotiation process in the geopolitical domain.

5 Conclusion

This paper studies and compares the approaches in the field of meaning negotiation and classifies them according to their contextual representation. It collects the different definitions of the basic concepts existing in the field and proposes some new definitions. It also briefly introduces our work and gives an example in the geopolitical domain. In the future, our contribution will be extended by the use of 30 domain ontologies. We will try to improve and validate the ontology merging (domain with contextual ontologies) and to generalize the model of [2] to any domain. For this, a benchmark of 30 semantic ontologies will be used. To validate this proposition, we will use the Kolmogorov-Smirnov statistical test. The Internet of Things (IoT) is a new paradigm that provides multiple services between objects. These smart objects are interconnected in the internet network in a simple and transparent way [21]. It will be interesting to use our approach in the IoT domain.

References

1. De Moor, A.: Patterns for the pragmatic web. In: International Conference on Conceptual Structures, pp. 1–18. Springer, Heidelberg (2005)


2. Keskes, N., Rahmoun, A.: Meaning negotiation based on merged individual context ontology and part of semantic web ontology. Int. J. Inf. Commun. Technol. 11(3), 352–368 (2017)
3. Paschke, A.: Pragmatic web 4.0. Towards an active and interactive semantic media web. W3C Aspect of Semantic Technologies (2013)
4. Singh, M.P.: The pragmatic web: preliminary thoughts. In: Proceedings of the NSF-OntoWeb Workshop on Database and Information Systems Research for Semantic Web and Enterprises, pp. 82–90 (2002)
5. Dimaio, P.: The missing pragmatic link in the semantic web. Bus. Intell. Advisory Serv. 8(7) (2008)
6. Mustapha, S.S.: CoP sensing framework on web-based environment. In: Web-Based Support Systems, pp. 333–357. Springer, London (2010)
7. Warglien, M., Gärdenfors, P.: Meaning negotiation. In: Applications of Conceptual Spaces, pp. 79–94. Springer, Cham (2015)
8. Jmii, H., Meddeb, A., Chebbi, S.: An approach for improving voltage stability by combination of SVC and TCSC. In: 2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), pp. 134–141. IEEE (2016)
9. Chaari, T., Laforest, F., Flory, A.: Adaptation des applications au contexte en utilisant les services web. In: Proceedings of the 2nd French-Speaking Conference on Mobility and Ubiquity Computing, pp. 111–118. ACM (2005)
10. Abioui, H., Idarrou, A., Bouzit, A., et al.: Multi-ontology based semantic annotation review. In: 2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), pp. 189–193. IEEE (2016)
11. Arrar, A.: The role of contextual ontologies in enterprise modeling. World Academy of Science, Engineering and Technology, Int. J. Comput. Electric. Autom. Control Inf. Eng. 4(9) (2010)
12. Magnini, B., Serafini, L., Speranza, M.: Using NLP techniques for meaning negotiation. In: Proceedings of VIII Convegno AI*IA, Siena, Italy, pp. 11–13 (2002)
13. De Moor, A.: Ontology-guided meaning negotiation in communities of practice. In: Proceedings of the Workshop on the Design for Large-Scale Digital Communities at the 2nd International Conference on Communities and Technologies (C&T 2005), Milano, Italy (2005)
14. Van Diggelen, J., Beun, J., Dignum, F., Van Eijk, R.M., Meyer, J.J.: Ontology negotiation: goals, requirements, and implementation. Int. J. Agent-Oriented Softw. Eng. 1(1), 63–90 (2007)
15. Ermolayev, V., Keberle, N., Matzke, W.E., Vladimirov, V.: A strategy for automated meaning negotiation in distributed information retrieval. LNCS, vol. 3729, p. 201 (2005)
16. Souza, M., Moreira, A., Vieira, R., et al.: Integrating ontology negotiation and agent communication. In: International Experiences and Directions Workshop on OWL, pp. 56–68. Springer (2015)
17. Farrugia, J.: Logical systems: towards protocols for web-based meaning negotiation. In: Meaning Negotiation, Papers from the AAAI Workshop, pp. 56–59 (2002)
18. Burato, E., Cristani, M., Viganò, L.: Meaning negotiation as inference. arXiv preprint arXiv:1101.4356 (2011)
19. Kutucu, H., Hakan, H., Almryad, A.: An application of artificial neural networks to assessment of the wind energy potential in Libya. In: 2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), pp. 405–409. IEEE (2016)


20. Lindh-Knuutila, T., Honkela, T., Lagus, K.: Simulating meaning negotiation using observational language games. In: Symbol Grounding and Beyond, pp. 168–179. Springer, Heidelberg (2006)
21. Benkerrou, H., Heddad, S., Omar, M.: Credit and honesty-based trust assessment for hierarchical collaborative IoT systems. In: 2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), pp. 295–299. IEEE (2016)

Framework for Managing the New General Data Protection Regulation Related Claims

Sarah Bouslama (1), Safa Bhar Layeb (1), and Jouhaina Chaouachi (2)

(1) ENIT, UR-OASIS, University of Tunis El Manar, 1002 Tunis, Tunisia
[email protected]
(2) IHEC, LR-ECSTRA, University of Carthage, 2016 Carthage, Tunisia

Abstract. The new General Data Protection Regulation (GDPR) came into force in May 2018, compelling service companies and financial institutions to comply with the new standards of data protection. Thus, moving toward protection and confidentiality is a need and an obligation at the same time. Within such firms, Information Systems departments seek to develop effective tools for management, monitoring and decision support that respect the standards and requirements of the GDPR. In this work, we have designed, developed and implemented, within a private financial organization, an Information and Decision Support System (DSS) to manage customer claims related to the GDPR. Our DSS includes a process for managing, following up and reporting complaints according to the standards of the new data protection regulation.

Keywords: Information system · Claim management · GDPR · Data Protection · Monitoring · Reporting

1 Introduction

Technological innovations, the rise of privacy concerns, and economic changes have tremendously contributed to changing the classical ground rules within a wide range of service fields. In this context, the relational approach is still considered a smart strategic means to develop a lasting relationship with customers [1]. This paradigm has taken on a large scale during the last decades, and most organizations are making an effort to find the best way to build this relationship. Moreover, one of the major foundations of the relational approach is customer satisfaction. To this end, a company should ensure its commitment to its customers in order to earn their trust and commitment. Mostly, service companies handle the collection and the use of their customers' personal data, whether to manage particular risks or to provide appropriate commercial offers. Nowadays, the management of personal data is getting complicated as data is growing in terms of intensity, speed of transfer and accessibility, because of technological advances linked to social networks, cloud storage, etc. Dealing with this phenomenon, how can such firms strike the right balance by exploiting data while ensuring privacy protection, as well as providing customers with the possibility to access their data and to claim in case of an attack or a withdrawal of consent, for example? What access could such firms guarantee? To whom? And under what conditions?


In this context, the new European regulation on the protection of personal data, called the GDPR (General Data Protection Regulation), was published in the Official Journal of the European Union and entered into force on May 25, 2018 to respond to these issues and ensure the protection of personal data. Meanwhile, service companies should also comply with the requirements of national legislation on the protection of personal data. Tunisia, like many countries around the world, should remain an area of trust and a destination for the relocation of European investments. This strategic position improves the employability of young people and foreign currency inflows. This is a boon for each country that seeks to become a hub for data processing. The objective of our work is to develop, at a lower cost, a flexible framework that ensures the automation and improvement of the management of customer claims concerning the acquisition of their new rights relating to the GDPR, while providing decision-makers with dashboards enabling them to evaluate these claims and to interact effectively in the process of deciding on customers' rights. This framework was implemented within a Tunisian private bank. Without loss of generality, the solution is flexible, configurable and scalable, provides a level of service in line with the needs of the GDPR, and remains open to future developments in the field of personal data security. The remainder of this paper is organized as follows. Section 2 introduces the GDPR. In Sect. 3, we present some related works. Our contribution model is described in Sect. 4. Section 5 presents the adopted methodology. Section 6 illustrates some interfaces of the developed framework. Finally, Sect. 7 draws conclusions and provides avenues for future research.

2 The General Data Protection Regulation (GDPR)

The General Data Protection Regulation is the new European regulation on data protection. It came into force on May 25, 2018 and affects any company that deals with personal data processing [2]. This regulation is mainly based on three ambitious objectives:
• Unifying the regulation of data protection,
• Empowering the concerned companies,
• Strengthening people's rights: right of access, right to be forgotten, right of portability, etc.
Chawki Gaddes, the president of the National Instance of Protection of Personal Data in Tunisia, recently stated: "What will change is that the EU will protect itself more: that is to say that the data processed in Europe might not be transferred elsewhere, except to countries with sufficient protection of personal data. Thus, at the cutoff date, the EU will specify to which country the data transfer is allowed and to which others it is not. That's what we are working on now to ensure we are not on this new blacklist, with which we will only be allowed to cooperate by resorting to complex procedures," he explained (www.businessnews.com.tn, February 28, 2018). Accordingly, most countries would be on the European blacklist regarding the protection of personal data. There are even European member countries that do not yet respond to the new deal. With the law currently being adopted, Tunisia is one of the few countries meeting the European criteria in this area and will have legislation in line with the European regulation. Indeed, the regulation also provides the right to be forgotten (erasure of data after a certain time), dereferencing, which forces search engines to delete a link at the request of a data subject, the portability of data, etc. It is worth mentioning that this new regulation affects all companies having data exchanges with the European Union, such as banks, call centers, insurance institutions and clinics, to quote just a few. Thus, public or private structures will no longer be able to process European personal data if they do not have a Data Protection Officer (DPO); otherwise, they risk fines of up to 4% of their previous year's turnover. This DPO will be responsible for the mapping of processing operations, which will allow him to know which procedure to follow for each processing operation done internally. Another prerequisite: before handling the personal data of a specific person, the concerned structures should ask him for an explicit, clear, written and well-signified consent.

3 Related Works

Over the last few years, claims management has been catching the interest of both practitioners and researchers. Studies conducted on this subject are mainly based on customers' behavior and expectations, their loyalty and satisfaction, and on customer relationship management and marketing in general. As far as the automation of claims management is concerned, the majority of studies are based on data mining techniques, which are mainly used to collect data about customer complaints in order to extract classification rules and then classify each claim into a particular cluster (e.g. [3]). These techniques allow intervention within the claims process and ultimately in the decision support process in response to these claims. Many studies were conducted to propose unifying frameworks in several fields (e.g. [4, 5]). In the following, we detail major works on claim management systems. In [6], Chtitia provided a review of processes allowing claims detection based on past claims, thereby embodying a proactive character. She proposed appropriate solutions to improve current processes so that the experience-feedback mechanism is integrated within a proactive claims management process. The approach used is the REX approach, based on two sub-processes: exploitation, based on past claims, and capitalization, based on the identification of information, documents, tips or persons useful for the processing of complaints. Furthermore, [7] investigated the automatic processing of claims from beneficiaries of the Families Allowance Cash (FAC) of Rhone, France. They proposed some improvements of the current process and the integration of feedback within the claims management process for proactive management. The main approaches used are data mining tools such as analysis of multiple matches, ascending hierarchical classification, and latent Dirichlet allocation. Recently, Carneiro et al. [8] explored the detection of credit fraud by combining manual and automatic classification while comparing different machine learning methods. The goals were the design and implementation of a fraud detection system and the combination of automatic and manual classifiers. The authors also used some data mining techniques, namely support vector machines, logistic regression, and random forests. Finally, Suryotrisongko et al. [9] developed a public claim service web application using a Spring Boot microservice architecture, deployed in a cloud environment.

We notice that the overwhelming majority of the claims management literature is based on data mining approaches, while only few works investigate practical perspectives of the decision-making process. It is also noteworthy that this trend is mostly due to the fact that most claims management studies were conducted in the fields of marketing, communication and/or legal sciences. Thus, the shortage of specialized tools for complaints management presents a tremendous limitation for its applicability. Today, professionals declare an obvious and urgent need for the automation of the claims process, from the collection of claims to the decisions related to their responses, especially under the new General Data Protection Regulation. Actually, there are some tools that support the entire claims management process or a specific part of it, including commercial, open source and free software, usually called Customer Relationship Management (CRM) tools. We mention Microsoft Dynamics CRM and the SAP enterprise resource planning suite among the most popular CRM software (e.g. [10]). Despite their performance and robustness, some professionals still express their interest in improvements to increase their flexibility and ease of use. Moreover, these commercial solutions entail relatively high costs, while the monitoring of specific types of customer claims is not necessarily aligned with the GDPR rights, namely the right of access, the right of opposition, and the right of portability. Let us recall here that the inventory management, planning and resource management modules in the mentioned solutions could be useless compared to the needs of some companies. More precisely, some professional software packages provide a complete business management solution in which customer complaint management is only one module that should be restructured to ensure it is consistent with the GDPR context.

4 Modeling

In order to comply with GDPR standards and to manage customer claims regarding this new regulation, we propose the following six phases to model the GDPR claim management process:
1. Dissemination phase: This phase essentially consists of broadcasting searchable information about the GDPR content to customers. Each customer should be informed of the privileges provided by this new law regarding the protection of his or her personal data. In terms of claim management, this step has a basic form regarding customer experience because it is simply informative. But in the GDPR context, this phase embodies an essential value, given the new law, the different rights and the new topic.
2. Collection phase: Within this phase, the received complaints are collected. The collection frequency and the amount of collected claims are justified by the Data Protection Officer (DPO).
3. Treatment phase: During this phase, the DPO uses the proposed framework to input the necessary data for each submitted claim.
4. Transfer phase: This phase is dedicated to sending technical reports according to the appropriate customer rights, as well as litigation cases, to working group managers to help them make the right decisions.
5. Study phase: This is an in-depth study phase for the requests received from customers. It consists of examining, case by case, each customer's features, namely the economic, professional and family situation, as well as his data in the information system, to finally reach a decision on his claim and grant him the desired right.
6. Response phase: In this last step, the DPO must inform the customer about the decision taken (Fig. 1).

Fig. 1. The proposed model for the GDPR claim management process

5 Methodology

5.1 Functional Requirement Analysis

The analysis of functional requirements was conducted by examining the proposed model and looking at some similar applications and studies related to decision support systems. The actor's needs obviously constitute the functional requirements. In our case, the actor is the user of this software, namely the DPO. Thus, we derived the following functional requirements:
1. Manage rights requests: Add a rights request, Add an access right, Add an opposition right, Add a portability right, Add a right of consent withdrawal.
2. Change rights requests: Modify an access right, Modify an opposition right, Modify a portability right, Modify a right of consent withdrawal.


3. Search for rights requests: Search for a rights request by ID, Search for a rights request by receipt date.
4. View the application report: Consult the report of requests for the right of access, Consult the report of requests for the right of opposition, Consult the report of requests for the right of portability, Consult the report of requests for the right of consent withdrawal.
5. Export a report as a PDF file: Export a report of requests for the right of access, Export a report of requests for the right of opposition, Export a report of requests for the right of portability, Export a report of requests for the right of consent withdrawal.
6. Manage litigation cases: Add a litigation case, Modify a litigation case, Delete a litigation case, Consult the list of litigation cases.

5.2 Designing Use Case

Based on the detailed functional requirement analysis described in Sect. 5.1, we derive the use case diagram displayed in Fig. 2, which shows the interaction of the DPO with the system.

Fig. 2. Use case diagram

5.3 Decision Support System Development

Once the functional requirements and the use case are established, we used the Java programming language to code and implement the system within a Spring Boot framework, which provides a RESTful Web Service together with an adequate database connection. Thus, a Tomcat server can be conveniently embedded and executed. It is worth mentioning that Spring Boot also eases dependency configuration. Moreover, the front-end development is done using the TypeScript programming language, and the object-relational mapping software Hibernate is used for persistence.
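To make this stack concrete, the following minimal sketch shows how such a RESTful endpoint could look in Spring Boot with Spring Data JPA. It is not the authors' code: the entity and repository names (AccessRightRequest, AccessRightRequestRepository) and the URL paths are hypothetical.

    // Hypothetical Spring Boot REST endpoint for access-right requests (illustrative only).
    import org.springframework.data.jpa.repository.JpaRepository;
    import org.springframework.web.bind.annotation.*;
    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;
    import java.util.List;

    @Entity
    class AccessRightRequest {
        @Id @GeneratedValue public Long id;
        public String customerId;      // public fields keep the sketch short
        public String receiptDate;
    }

    interface AccessRightRequestRepository extends JpaRepository<AccessRightRequest, Long> {}

    @RestController
    @RequestMapping("/api/access-rights")
    class AccessRightController {
        private final AccessRightRequestRepository repository;

        AccessRightController(AccessRightRequestRepository repository) {
            this.repository = repository;
        }

        @PostMapping                       // add an access-right request (requirement 1)
        AccessRightRequest add(@RequestBody AccessRightRequest request) {
            return repository.save(request);
        }

        @GetMapping("/{id}")               // search a request by its identifier (requirement 3)
        AccessRightRequest findById(@PathVariable Long id) {
            return repository.findById(id).orElse(null);
        }

        @GetMapping                        // consult the list of requests for the reports (requirement 4)
        List<AccessRightRequest> findAll() {
            return repository.findAll();
        }
    }

With such a setup, the embedded Tomcat server and the database access layer require little explicit configuration, which matches the convenience mentioned above.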

5.4 System Architecture

Our system has been implemented on the basis of a three-tier architecture that offers user-friendly, web-based interfaces coupled with secure access controls, as shown in Fig. 3.

Fig. 3. System Architecture

6 Interfaces

In this section, we present some interfaces of the proposed decision support system. As it was implemented within a Tunisian private bank, and as French is the language commonly used in this company, the interfaces were designed in French in order to facilitate the work of the bank employees [11]. Let us begin with Fig. 4, which shows the authentication interface. Through this interface, the Data Protection Officer accesses the system after authentication by entering his login and password (Fig. 5).

Fig. 4. Authentication interface

Fig. 5. Menu interface


After authentication, the DPO is directed to the home page, where he can choose either to register the rights of the customers via the “Rights exercises” icon, to consult the reports on rights requests via the “Reports” icon, or to manage “Litigation”. By choosing, for example, to add a right of access in the menu interface, the DPO is directed to an add form page, as shown in Fig. 6.

Fig. 6. Access right interface

If the DPO clicks on the “Litigation” icon on the homepage, he is directed to the form displayed in Fig. 7. The officer has a menu on the left side where he can choose either to add a litigation case (received from a customer) or to consult the list of litigation cases that already exist in the system database.

Fig. 7. Add litigation interface


To consult the reports on the different types of rights, the DPO can click on the Report button and is directed to the desired report after having passed through a menu, as shown in Fig. 8. If he chooses to consult the access rights report, for example, he is directed to the interface illustrated in Fig. 9.

Fig. 8. Report interface

Fig. 9. Example of report in PDF

Providing ready-to-use PDF reports helps the managers to have a dashboard-like vision. This data visualization is mandatory in order to make the right decisions just in time.
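The paper does not state which library produces these PDF reports; the sketch below uses Apache PDFBox only as one plausible way to export a simple textual report, and the method and file names are hypothetical.

    // Hypothetical PDF export of a rights-request report using Apache PDFBox.
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;
    import org.apache.pdfbox.pdmodel.PDPageContentStream;
    import org.apache.pdfbox.pdmodel.font.PDType1Font;
    import java.io.IOException;
    import java.util.List;

    public class ReportExporter {

        // Writes a title followed by one line per request summary into a single-page PDF.
        public void export(List<String> requestSummaries, String fileName) throws IOException {
            try (PDDocument document = new PDDocument()) {
                PDPage page = new PDPage();
                document.addPage(page);
                try (PDPageContentStream content = new PDPageContentStream(document, page)) {
                    content.beginText();
                    content.setFont(PDType1Font.HELVETICA, 11);
                    content.setLeading(15f);
                    content.newLineAtOffset(50, 750);
                    content.showText("Report - access right requests");
                    for (String line : requestSummaries) {
                        content.newLine();
                        content.showText(line);
                    }
                    content.endText();
                }
                document.save(fileName);
            }
        }
    }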

7 Conclusion

To take into account the challenges of a rapidly digitizing world and the rising privacy risks for personal data, the new General Data Protection Regulation (GDPR) applies from May 25, 2018. Accordingly, the protection of European data subjects' rights is improved, and the requirements on companies that process personal data are clarified. Therefore, any company that processes personal data should ensure the lawfulness of its processing. Regarding this challenging issue, we have modeled and developed an information and decision support system to manage customer claims related to the GDPR. Our flexible, configurable and scalable framework includes a process of management, monitoring and reporting of claims, compliant with the standards of the new data protection regulation. As an open solution, it is a customizable model poised for future developments in the field of personal data security.

References
1. Hall, D.T.: The Career Is Dead–Long Live the Career. A Relational Approach to Careers. The Jossey-Bass Business & Management Series. Jossey-Bass Inc. Publishers, 350 Sansome Street, San Francisco, CA 94104 (1996)
2. Tobin, P., McKeever, M., Blackledge, J., Whittington, M., Duncan, B.: UK financial institutions stand to lose billions in GDPR fines: how can they mitigate this? In: The British Accounting and Finance Association Scottish Area Group Conference, BAFA, Ed., Aberd (2017)
3. Caron, F., Vanthienen, J., Baesens, B.: Comprehensive rule-based compliance checking and risk management with process mining. Decis. Support Syst. 54(3), 1357–1369 (2013)
4. Khalfallah, N., Ouali, S., Kraiem, N.: A proposal for a variability management framework. In: 2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), pp. 113–120. IEEE, December 2016
5. Ouzayd, F., Tamir, M., Rhouma, Z.B.: Toward making decision model of drugs supply chain: case of Moroccan university hospital center. In: 2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), pp. 338–343. IEEE, December 2016
6. Chtita, F.: Processus proactif de gestion des réclamations basé sur l'approche de retour d'expérience. Doctoral dissertation, École Polytechnique de Montréal (2016)
7. Loudcher, S., Velcin, J., Forissier, V., Broilliard, C., Simonnot, P., Rhône-Alpes-CNAF, C.N.E.D.I., du Rhône, C.A.F.: Analyse des réclamations d'allocataires de la CAF: un cas d'étude en fouille de données. In: EGC, pp. 449–460 (2013)
8. Carneiro, N., Figueira, G., Costa, M.: A data mining based system for credit-card fraud detection in e-tail. Decis. Support Syst. 95, 91–101 (2017)
9. Suryotrisongko, H., Jayanto, D.P., Tjahyanto, A.: Design and development of backend application for public complaint systems using microservice spring boot. Procedia Comput. Sci. 124, 736–743 (2017)
10. Achargui, A., Zaouia, A.: Hosted, cloud and SaaS, off-premises ERP systems adoption by Moroccan SMEs: a focus group study. In: 2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), pp. 344–348. IEEE, December 2016
11. Filali, T., Chettaoui, N., Bouhlel, M.S.: Towards the automatic evaluation of the quality of commercially-oriented Web interfaces. In: 2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), pp. 237–245. IEEE, December 2016

Automatic Processing of Oral Arabic Complex Disfluencies
Labiadh Majda, Bahou Younès, and Mohamed Hédi Maâloul

Sfax University, Sfax, Tunisia
[email protected], [email protected], [email protected]
Hail University, Hail, Saudi Arabia

Abstract. The present work is part of the realization of the Arabic vocal server SARF (Bahou 2014). In this paper, we propose a numerical learning-based method for the processing of oral Arabic complex disfluencies. From a pretreated and semantically labeled utterance, the proposed method delimits and labels the conceptual segments of a spontaneous Arabic oral utterance. Then, from a segmented utterance, it detects and delimits the disfluent segments in order to correct them. The result of the implementation of this method is the Complex Disfluencies Processing Module (CDPM). In the evaluation of our CDPM, we found satisfactory results with an F-measure equal to 91.9%. After integrating the CDPM into the SARF system, we achieved an improvement of 11.88% in acceptable understanding and of 3.77% in error rate. This improvement proves the effectiveness of the numerical learning-based method in the processing of oral Arabic complex disfluencies.

Keywords: Arabic speech understanding, Complex disfluencies processing, Spontaneous Arabic speech, Numerical learning-based method

1 Introduction

In this paper, we are interested in the automatic understanding of spontaneous Arabic speech. The objective of this understanding is to solve the problems related to the spontaneity of the interaction and to the errors produced by the speech recognition system. Among these problems, we focus on the phenomenon of Arabic complex disfluencies, which are due to the spontaneous nature of oral productions. Complex disfluencies usually appear in the dialogue as a correction of a segment of the utterance by another one, or as a repetition of a segment in the same utterance. According to our research, few studies have treated this phenomenon in spontaneous Arabic speech; among them, we quote the SARF (Serveur vocal Arabe des Renseignements sur le transport Ferroviaire) system (Bahou 2014). Thus, our work aims to improve this system. Indeed, our goal is to propose a numerical learning-based method, instead of the symbolic one, for the processing of complex disfluencies in the SARF system. This method is implemented in our Complex Disfluencies Processing Module (CDPM), which we propose, study and evaluate in this paper.


The rest of this paper is organized into six sections. The first section is devoted to the presentation of existing complex disfluencies processing approaches. In the second section, we illustrate the method that we propose for Arabic complex disfluencies processing. An illustrative simulation example of our method is given in the third section. In the fourth section, we present the CDPM evaluation results. In the fifth section, we present a comparison between the initial version of the SARF system and the one improved by the integration of our CDPM. The sixth section is reserved for the conclusion and the perspectives of this work.

2 Related Works

In this section, we dwell on the methods of complex disfluencies processing. To address them, we have chosen to classify them into three approaches, which we present as follows: the symbolic approach, the numerical learning-based approach and the hybrid approach.

2.1 Symbolic Approach

The symbolic approach is based on a syntactic and/or semantic analysis of the utterance to be treated (Bahou et al. 2017a). Several systems use this approach. Indeed, as part of the understanding of spoken human language in English, Ferreira et al. (2004) developed a model for processing disfluencies in utterances, such as repetitions, repairs and also breaks (uh, um). This model is based on the Tree-Adjoining Grammar (TAG) formalism. Also, in order to detect disfluencies (repairs, fillers and self-interruption points) in conversational speech in English, Lease et al. (2006) extended this into a model based on a TAG of speech repair. They also enhanced this system with a small set of manually-built deterministic rules for detecting fillers, and showed that repair and filler predictions could be combined to detect self-interruption points. As part of speech processing, Kaushik et al. (2010) developed algorithms to effectively suppress the pauses and repetitions of spontaneous speech. As part of the understanding of spontaneous Arabic speech, Bahou et al. (2010) proposed a method for Arabic disfluencies processing. This method is composed of three steps, namely conceptual segmentation, delimitation of the disfluent segments, and detection and correction of the disfluencies. The disadvantage of symbolic methods is that human expertise is necessary and costly. Also, the symbolic approach is adapted to limited applications in a specific language.

2.2 Numerical Learning-Based Approach

The numerical learning-based approach extracts a set of knowledge from a large amount of data. The numerical approach is generally more robust with regard to the difficulties of spontaneous oral utterances. In addition, these methods are easily transferable from one application or from one language to another. Among the works carried out with this approach, we quote, in particular, the work of Kim (2004).


Kim experimented with methods to automatically detect the limits of utterances, disfluencies and conversation changes in spontaneous speech transcriptions. Kim's system has a two-step architecture in which the utterance limits and speech-interrupt locations are predicted first, and the disfluent regions and conversation changes are identified afterwards. Similarly, Qader et al. (2017) presented an innovative formalization of revisions, repetitions and pauses in order to allow the automatic addition of disfluencies in the input utterance of a speech-synthesis system. The purpose of this work is to make synthetic speech signals more spontaneous and expressive. Then, Honal (2003) presented a system that automatically corrects disfluencies (repetition, false start, edit term, speech marker, filled pause and interjection) in spontaneous dialogues. Indeed, Honal adopted some statistical machine-translation ideas for the task of disfluency correction. Honal's system and statistical models are trained on texts in which the disfluencies are annotated manually. The system can then perform the task of disfluency correction on arbitrary sentences. In parallel and similarly, we find the work of Christodoulides and Avanzi (2015), who presented a detailed annotation scheme and a modular automatic detection system for disfluencies such as pauses, repetitions and false starts, targeting the semi-automatic annotation of these phenomena in manually-transcribed data of a spoken corpus in French. We note that our method is based on this approach. In fact, we concluded, after some research, that this approach had not been addressed for the automatic understanding of spontaneous Arabic speech. The disadvantage of this approach is that it requires a large volume of annotated corpora.

2.3 Hybrid Approach

The hybrid approach combines both the symbolic and the numerical learning-based approaches in order to take advantage of their benefits and avoid their disadvantages. Among the systems based on this approach, we quote, as part of the automatic assessment of children reading aloud, a European-Portuguese database of recorded utterances and pseudo-words that has been collected and in which several types of disfluencies have been identified. In order to automatically process these utterances, Proença et al. (2017) developed a method based on two steps, namely a segmentation step and a disfluency-detection step. Some of the most common disfluencies are targeted for automatic detection, such as false starts, repetitions and bad predictions. Then, Constant and Dister (2012) presented an automatic procedure allowing the recognition of compound words in a transcribed corpus of spoken French while detecting four types of disfluencies, which are hesitations, immediate self-corrections, repetitions and primers of morphemes.

3 Proposed Method for Oral Arabic Complex Disfluencies Processing

The proposed method is a numerical learning-based method whose mission is to deal with the problem of complex disfluencies. This method is integrated in the SARF system. In fact, we aim to improve the literal understanding module of the SARF system by applying a numerical learning-based method.


From a pretreated and semantically-labeled utterance, the proposed method delimits and labels the Conceptual Segments (CS) of the utterance. Then, from a segmented utterance, it detects and delimits the disfluent segments and corrects them. Figure 1 describes the steps of our method.

Fig. 1. Steps of the proposed method (Labiadh et al. 2018a)

In what follows, we first focus on the conceptual segmentation that is deployed and, secondly, we present the complex disfluencies processing.

3.1 Conceptual Segmentation

Conceptual segmentation consists of two phases, namely the delimitation of conceptual segments and the labeling of conceptual segments (Bahou et al. 2017b). Figure 2 shows the conceptual segmentation step.


Fig. 2. Conceptual segmentation step (Boughariou et al. 2017)

3.1.1 Delimitation of Conceptual Segments
The first phase is to extract the conceptual segments composing the utterance without resorting to labeling. In fact, any word can have one of the following classifications: “New” (if the word is a triggering index of a new segment), “Upstream” (if the word belongs to the previous segment) or “Isolated” (if the word does not belong to any segment). In this phase, we chose the J48 learning algorithm. To justify our choice, we tested a set of algorithms.

Table 1. Comparison of different algorithms for the delimitation of CS
Algorithm     Delimitation of conceptual segments rate
BayesNet      0.728
NaiveBayes    0.734
SVM           0.841
PART          0.835
J48           0.857

According to Table 1, we conclude that the J48 algorithm gives the best result, with an F-measure equal to 0.857.

3.1.2 Labeling of Conceptual Segments
The second phase is to detect the type of each conceptual segment extracted in the previous phase.


Thus, each conceptual segment is represented by its learning vector. In this phase, we also chose the J48 learning algorithm. To justify our choice, we tested a set of algorithms. Table 2 shows the test results for each algorithm adopted.

Table 2. Comparison of different algorithms for the labeling of CS
Algorithm     Labeling of conceptual segments rate
BayesNet      0.835
NaiveBayes    0.868
SVM           0.883
PART          0.884
J48           0.885

According to Table 2, we conclude that the J48 algorithm again gives the best results.
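The paper does not name the learning toolkit used for these experiments; the following sketch assumes a Weka-style workflow in which the learning vectors are exported to an ARFF file (the file name segments.arff is hypothetical) and J48 is evaluated by 10-fold cross-validation.

    // Sketch assuming Weka: train and evaluate a J48 decision tree on the learning vectors.
    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import java.util.Random;

    public class SegmentClassifierSketch {
        public static void main(String[] args) throws Exception {
            // One learning vector per word (delimitation) or per segment (labeling).
            Instances data = DataSource.read("segments.arff");
            data.setClassIndex(data.numAttributes() - 1);   // class: New / Upstream / Isolated, or the CS label

            J48 tree = new J48();
            Evaluation evaluation = new Evaluation(data);
            evaluation.crossValidateModel(tree, data, 10, new Random(1));
            System.out.println("Weighted F-measure: " + evaluation.weightedFMeasure());
        }
    }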

3.2 Complex Disfluencies Processing

The complex disfluencies processing is composed of two phases, namely the detection and delimitation of disfluent segments and the correction of disfluent segments. Figure 3 describes the proposed phases for oral Arabic complex disfluencies processing.

Fig. 3. Complex disfluencies processing step

3.2.1 Detection and Delimitation of Disfluent Segments


This phase is essentially based on a numerical learning-based approach to detect complex disfluencies (complex self-corrections and complex repetitions) in spontaneous oral Arabic utterances. It consists in detecting and delimiting the disfluent segments in an utterance. In fact, each combination must be classified as “None” (if there is no disfluency), “Complex-Self-Correction” (if there is a segment corrected by another one), or “Complex-Repetition” (if there is a repeated segment in the same utterance). To do that, first, all the possible combinations between the conceptual segments are treated. Secondly, in order to classify these combinations, the following set of learning criteria is used (a feature-extraction sketch is given after the list):
• Num_Segments: Number of segment components.
• Pos_Combination: Combination position.
• ExistMarkRectif: If there is a rectification marker.
• If_Identique: If CS1 = CS2.
• Segment_Label: Segment label.
• If_identique_label: If the label of CS1 is the same as the label of CS2.
• If_Identique_Part: If both segments contain the same type of labels.
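To make these criteria concrete, the following sketch shows how a pair of conceptual segments could be encoded as a learning vector. It is not the authors' code: the ConceptualSegment type, the numeric encoding of the label and the interpretations of Num_Segments and If_Identique_Part are assumptions.

    // Hypothetical encoding of the learning criteria for one combination (CS1, CS2).
    import java.util.List;

    record ConceptualSegment(String label, List<String> words) {}

    class CombinationFeatures {

        // Builds one learning vector for a pair of conceptual segments; the list of
        // rectification markers and the words between the two segments are assumed inputs.
        static double[] encode(ConceptualSegment cs1, ConceptualSegment cs2,
                               int combinationPosition,
                               List<String> wordsBetween, List<String> rectificationMarkers) {
            boolean markerBetween = wordsBetween.stream().anyMatch(rectificationMarkers::contains);
            return new double[] {
                cs1.words().size() + cs2.words().size(),          // Num_Segments (assumed: total number of components)
                combinationPosition,                              // Pos_Combination
                markerBetween ? 1 : 0,                            // ExistMarkRectif
                cs1.words().equals(cs2.words()) ? 1 : 0,          // If_Identique (CS1 = CS2)
                cs1.label().hashCode(),                           // Segment_Label (a real system would use a nominal attribute)
                cs1.label().equals(cs2.label()) ? 1 : 0,          // If_identique_label
                sameLabelType(cs1.label(), cs2.label()) ? 1 : 0   // If_Identique_Part (same type of labels)
            };
        }

        // Assumption: two labels share the same type when their last part matches (e.g. "..._CS").
        static boolean sameLabelType(String first, String second) {
            String[] a = first.split("_");
            String[] b = second.split("_");
            return a[a.length - 1].equals(b[b.length - 1]);
        }
    }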

These criteria are applied to all combinations in order to obtain the learning vectors. The learning vectors obtained form, in fact, an input file for the learning algorithm. For this purpose, we applied several algorithms to test the results. Our choice fell on the SVM algorithm. To justify this choice, Table 3 shows the result of each algorithm.

Table 3. Comparison of different algorithms for the detection and delimitation of disfluent segments
Algorithm     Detection and delimitation of disfluent segments rate
NBTree        0.970
BFTree        0.971
J48           0.972
SVM           0.974
NaiveBayes    0.944
FT            0.971
NBTree        0.970

After the detection and delimitation phase, in which we test for the existence of disfluencies in the treated utterance, we proceed to the correction phase.

3.2.2 Correction of Disfluent Segments
This phase consists in correcting the utterance that contains the disfluencies, more precisely the disfluent segments. For the correction, the disfluent segments are described, according to an annotation similar to that proposed by Bear et al. (1992), as follows:


• Reparandum (the segment that will be corrected later).
• “Optional” Interregnum (e.g. the rectification marker).
• Repair (the part that corrects the Reparandum).
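A minimal sketch of the correction rule, not the authors' implementation: once a combination is classified as disfluent, the words of the Reparandum and of the optional Interregnum are dropped and the Repair is kept.

    // Hypothetical correction step: drop the Reparandum and the Interregnum, keep the Repair.
    import java.util.ArrayList;
    import java.util.List;

    class DisfluencyCorrector {

        // Removes the words from reparandumStart (inclusive) up to repairStart (exclusive),
        // i.e. the Reparandum and any Interregnum; the Repair and the rest are kept as is.
        static List<String> correct(List<String> words, int reparandumStart, int repairStart) {
            List<String> corrected = new ArrayList<>(words.subList(0, reparandumStart));
            corrected.addAll(words.subList(repairStart, words.size()));
            return corrected;
        }
    }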

4 Example

In order to explain the steps of the proposed method, we propose the utterance (1) as an example.

(1) [twns AlY rfAhp drjp swsp AlY rfAhp drjp Alsfr vmn km]¹
How much is the price of travel comfort class to Sousse comfort class to Tunis

4.1 Conceptual Segmentation

• Delimitation of conceptual segments: After applying the delimitation of conceptual segments phase, we obtained the conceptual segmentation of the utterance (1), shown by the utterance (2).

• Labeling of conceptual segments: After the delimitation of conceptual segments phase, we detect the labels of each conceptual segment; hence, the utterance (3) shows the result of the labeling of the conceptual segments of the utterance (2).

¹ The translated Arabic example is based on the Buckwalter transliteration.

4.2 Complex Disfluencies Processing

• Detection and delimitation of disfluent segments: The possible combinations between the conceptual segments of the utterance (3) are as follows:
Combination 1: CS1+CS2
Combination 2: CS1+CS3
Combination 3: CS1+CS4
Combination 4: CS1+CS5
Combination 5: CS2+CS3
Combination 6: CS2+CS4
Combination 7: CS2+CS5
Combination 8: CS3+CS4
Combination 9: CS3+CS5
Combination 10: CS4+CS5

The result of this phase classifies the combination as a complex disfluency (i.e. a complex repetition) and classifies the combination as a complex disfluency (i.e. complex self-correction). The utterance (4) is the result of detecting and delimiting the disfluent segments of the utterance (3).


• Correction of disfluent segments The corrections of complex disfluencies of the utterance (4) are as follows: – Complex self-correction:

– Complex repetition:

Subsequently, the Reparandum will be replaced by the Repair. The utterance (5) represents the utterance (4) corrected.


5 Evaluation of the Proposed CDPM

In order to properly evaluate our CDPM, we used some standard evaluation measures, namely Precision, Recall and F-measure. The Precision measure represents the number of correctly corrected disfluencies in relation to the number of disfluencies found by the system. The Recall measure represents the number of correctly corrected disfluencies in relation to the number of disfluencies to be found. We obtained satisfactory results, with an F-measure equal to 91.9%. Table 4 illustrates the obtained results.

Table 4. CDPM evaluation results
         Recall     Precision    F-measure
CDPM     89.14%     95%          91.9%
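As a sanity check on the reported scores, the F-measure is the harmonic mean of precision and recall, which is consistent with the values of Table 4:

F = (2 × Precision × Recall) / (Precision + Recall) = (2 × 0.95 × 0.8914) / (0.95 + 0.8914) ≈ 0.919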

The failure cases of our module are mainly due to errors in the semantic labeling of the conceptual segments of the utterance. Indeed, an erroneous labeling of conceptual segments can cause an error in the detection and the delimitation of disfluent segments. In this case, the CDPM will not be able to detect the disfluencies. Consider the following example:

twns AlY SfAqs mn EAdY qTAr sryE qTAr sfr vmn mA hw
What is the price of the trip by fast train normal train from Sfax to Tunis

In the utterance above, the segment “[sryE qTAr] (fast train)” is labeled “Train_Type_CS” while the segment “[EAdY qTAr] (normal train)” is labeled “Train_Rank_CS”. For this reason, the CDPM cannot detect the complex self-correction disfluency in this utterance. Moreover, our CDPM does not handle the enumeration case in the utterance. In this case, the enumeration structure becomes very close to the self-correction. For example:

AyAb *hAb t*krp w*hAb t*krp t*krtyn Hjz Oryd
I want to reserve two tickets, one-way ticket and return ticket

The utterance above presents an enumeration of two types of tickets, “[*hAb t*krp] (one-way ticket)” and “[AyAb *hAb t*krp] (return ticket)”, which the CDPM considers as a case of complex self-correction.

6 Integration of CDPM in SARF System

Our goal is to improve the literal understanding module of the SARF system. Figure 4 illustrates, in this sense, the initial version and the improved version of the literal understanding module.

Fig. 4. Initial version and the improved one of the literal understanding module


We have integrated our CDPM into the literal understanding module of the SARF system. After this integration, we compared the initial version with the improved one. We further specify that the SARF evaluation is computed taking into account the three scores proposed by (Bahou 2014):
• Complete Understanding (CU): the number of utterances with complete and correct semantic frames. In other words, a complete understanding generates semantic frames filled with all the information contained in the utterances.
• Incomplete Understanding (IU): the number of utterances with incomplete semantic frames (omission of information).
• Erroneous Understanding (EU): the number of utterances with incorrect semantic frames (bad identification of the type of semantic frame). That is, a misunderstanding generates incorrect semantic frames.
Thus, the rate (CU + IU) indicates the number of utterances with an acceptable literal understanding, and the rate (IU + EU) indicates the error rate of the literal understanding. According to Table 5, we found an improvement of 11.88% in the acceptable understanding and of 3.77% in the error rate. This improvement is mainly due to the application of a numerical learning-based approach.

Table 5. Results of the comparison between the two versions
                     CU        IU        EU        CU + IU    IU + EU
Initial version      75.53%    6.91%     17.55%    82.44%     24.46%
Improved version     79.29%    15.03%    5.66%     94.32%     20.69%

7 Conclusion

In this paper, we proposed a numerical learning-based method for the processing of oral Arabic complex disfluencies in order to improve the literal understanding of the SARF system. This method is composed of two main steps, namely the conceptual segmentation of the utterance and the complex disfluencies processing. We also presented our CDPM, which is the implementation of this method in the Java programming language. In the evaluation of the CDPM, we obtained satisfactory results, with an F-measure equal to 91.9%, an improvement of 11.88% in the rate of acceptable understanding and a reduction of 3.77% in the error rate generated by the SARF system. As prospects of this work, we plan to test our method on various applications. We also intend to increase the size of the learning corpus in order to check its convenience and to detect failures of our method.


References Bahou, Y.: Compréhension Automatique de la Parole Arabe Spontanée: Intégration dans un Serveur Vocal Interactif. PhD Thesis, Faculté des Sciences économiques et de Gestion de Sfax (2014) Bahou, Y., Maâloul, M.H., Abbassi, H.: Hybrid approach for conceptual segmentation of spontaneous Arabic oral utterances. In: 3rd International Conference on Arabic Computational Linguistics (ACLing’17), Dubai, UAE (2017a) Bahou, Y., Maâloul, M.H., Boughariou, E.: Towards the supervised machine learning and the conceptual segmentation technique in the spontaneous Arabic speech understanding. In: 3rd International Conference on Arabic Computational Linguistics (ACLing’17), Dubai, UAE (2017b) Bahou, Y., Masmoudi, A., Hadrich-Belguith L.: Traitement des disfluences dans le cadre de la compréhension automatique de l’oral arabe spontané. 28èmes Journées d’Études sur la Parole (JEP’10), Mons, Belgique (2010) Bear, J., Dowding, J., Shriberg, E.: Integrating multiple knowledge sources for detection and correction of repairs in human-computer dialog. In: Proceedings of the 30th Annual Meeting of the Association for Computational Linguistics (ACL’92), Newark, Deleware (1992) Boughariou, E., Bahou, Y., Maâloul, M.H.: Application d’une méthode numérique à base d’apprentissage pour la segmentation conceptuelle de l’oral arabe spontané. In: Proceedings of the 29th International Business Information Management Association (IBIMA’17), Vienna, Austria (2017) Christodoulides, G., Avanzi, M.: Automatic Detection and Annotation of Disfluencies in Spoken French Corpora. Institute for Language & Communication, University of Louvain, Belgium DTAL, Faculty of Modern & Medieval Languages, University of Cambridge, UK (2015) Constant, M., Dister, A.: Les disfluences dans les mots composés. Journées sur l’Analyse des Données Textuelles (JADT’12), Belgium (2012) Ferreira, F., Lau, E., Bailey, K.: Disfluencies, language comprehension, and Tree Adjoining Grammars. Cogn. Sci. 28(5), 721–749 (2004) Honal, M.: Correction of Disfluencies in spontaneous speech using a noisy-channel approach. PhD thesis, University of Karlsruhe, Karlsruhe, Germany (2003) Kaushik, M., Trinkle, M., Hashemi-Sakhtsari, A.: Automatic detection and removal of disfluencies from spontaneous speech. In: School of Electrical and Electronic Engineering. The University of Adelaide, Adelaide, South Australia. C3I Division, Defense Science and Technology Organization, Edinburgh, South Australia (2010) Kim, J.: Automatic detection of sentence boundaries, disfluencies, and conversational fillers in spontaneous speech. PhD thesis, University of Washington (2004) Labiadh, M., Bahou, Y., Maâloul, M.H.: Complex disfluencies processing in spontaneous Arabic speech. In: Proceeding of International Workshop on Language Processing and Knowledge Management (LPKM’18), Sfax, Tunisia (2018a) Labiadh, M., Bahou, Y., Maâloul, M.H.: Numerical learning-based method for the automatic processing of complex disfluencies in spontaneous Arabic speech. In: Proceeding of Joint Conference on Computing (JCCO’18), Hammamet, Tunisia (2018b) Lease, M., Johnson, M., Charniak, E.: Recognizing disfluencies in conversational speech. IEEE Trans. Audio, Speech, Lang. Process. 14(5), 1566–1573 (2006)


Proença, J., Lopes, C., Tjalve, M., Stolcke, A., Candeias, S., Perdigão, F.: Automatic evaluation of reading aloud performance in children. Department of Electrical and Computer Engineering. University of Coimbra, Portugal, (2017) Qader, R., Lecorvé, G., Lolive, D., Sébillot, P.: Ajout automatique de disfluences pour la synthèse de la parole spontanée: formalisation et preuve de concept. Traitement Automatique du Langage Naturel (TALN’17), Orléans, France (2017)

A Fuzzy Querying Using Cooperative Answers and Proximity Measure
Aicha Aggoune
Department of Computer Science, LabSTIC Laboratory, University of 8th May 1945, PB 401, 24000 Guelma, Algeria
[email protected]

Abstract. The fuzzy queries represent a solution to enhance the bipolar behavior of the classical querying in relational databases (RDB) in order to deal with the empty answer problem. This problem is defined by the execution of queries that do not return any answer. So, the fuzzy queries provide the user with some alternative data when no response satisfies his or her query. The aim of this paper is to present a solution to the empty answer problem based on cooperative answers and the proximity measure for improving fuzzy querying in RDB. The proximity measure is based on the use of the Hausdorff distance as a similarity measure between a failing query (a query with an empty answer) and some other successful ones whose answers are not empty. Our idea is to assign the closest query's answers to the one that failed as cooperative answers. We propose four gradual operators based on usual ones between predicates for enhancing fuzzy queries. The experimental results show that our approach is a promising way for improving fuzzy queries in relational databases.

Keywords: Fuzzy queries, Cooperative answers, Empty answer problem, Proximity measure, Gradual operators

1 Introduction

In response to a query, classical database systems return a list of tuples that exactly match the query. When searching for desired data, users might be confronted with the empty answer problem, where the query asked does not return any answer [1]. Fuzzy queries have the main advantage of bridging part of this problem. They emerged in the database field in 1977 with Tahani [2]. Using fuzzy set theory in the database field makes it possible to create a new formalism of queries that is better suited to handling imprecise requests than classical querying [1–3]. This new formalism of query allows using fuzzy predicates in which user preferences can be expressed [4]. The result of a fuzzy query is a set of approximate answers returned to users in ranked order [5]. Accordingly, the expression of fuzzy queries is done through an adaptable query language able to satisfy the user's needs more closely. In fact, fuzzy languages are extensions of the structured query language SQL that allow getting imprecise information from the database [6].


SQLf and FQUERY are two principal languages for building fuzzy queries based on fuzzy conditions, which involve fuzzy predicates (speed, low, near, etc.), gradual operators (similar to, significantly higher, etc.), linguistic modifiers (very, somewhat, really, etc.) and quantifiers [7, 8]. A query in the SQLf fuzzy language has the following syntax [7]:

SELECT [distinct] [n|t|n,t] <fuzzy predicates>
FROM <relations>
WHERE <fuzzy conditions>

Fuzzy queries thus extend the bipolar behavior of classical querying (data exactly satisfying the search criteria, or nothing) in order to return a set of approximate answers rather than a set of exact ones [3]. In fact, fuzzy queries allow dealing with the empty answer problem when there are no data satisfying the user's needs. However, despite the advanced possibilities of fuzzy queries for dealing with the empty answer problem, relational database management systems may still not return approximate answers to the user. Fuzzy queries with the empty answer problem in relational databases have received a great deal of attention from many researchers [9–11]. Some researchers address the failing fuzzy query by query relaxation, query reformulation or the addition of new constraints [12–15]. These approaches are called approaches guided by a query. Other approaches, guided by a workload, focus on measuring the proximity between the failing query and a workload containing previously submitted queries whose results are non-empty. This proximity measure aims at retrieving the closest query from the workload, whose answers are used as cooperative answers to the failing query [16–20]. In our previous work, we made a comparative study between the first category of approaches, guided by the query, and the second one, guided by the workload [21]. This study is summarized in Table 1.

Table 1. Comparison between related work [21].
Approaches guided by a query          Approaches guided by a workload
Centred on the initial query          Centred on previous queries
Modification of initial query         Without modification of initial query
Relaxation measuring                  Proximity measuring
Combinatory explosion                 The proximity guarantees approximate answers
Success rate about 41%                Success rate about 62%
Medium complexity                     Low complexity
Reduced memory space                  Large memory space

Other interesting studies related to this topic have focused on identifying imprecise information from the relational database using clustering algorithms such as Fuzzy C-Means clustering, Fuzzy C-Medoids clustering and Mountain clustering [22–24]. A. Kowalczyk-Niewiadomy and A. Pelikant [25] proposed a novel fuzzy clustering algorithm that allows the automatic generation of fuzzy sets based on the existing data distribution for more expressive fuzzy queries. However, these studies do not take into account the problem of the empty answer during fuzzy query processing.


This paper presents an approach based on cooperative answers and a proximity measure for improving fuzzy querying in order to deal with the empty answer problem. We attempt to enhance workload-based approaches by proposing an approach that is more efficient in terms of performance for detecting the query closest to the failing fuzzy query, while decreasing both response time and memory space. The main idea of this work is to focus only on the query conditions included in the “where” and “having” clauses rather than on the full query. Our approach is based on an efficient proximity measure that allows us to identify the closest queries by measuring the Hausdorff distance between queries. Thus, we propose four gradual operators, based on usual ones between predicates, for building and handling fuzzy queries in the traditional SQL language so that it can support such imprecise expressions. Finally, we assign the closest query's answers to the one that failed as cooperative answers, having the same importance degree, which is defined by the greatest proximity value obtained between the failing query and the closest one. Following this introduction, the next section illustrates our proximity measure between fuzzy queries. Section 3 shows how proximity can be used for improving fuzzy querying in order to deal with the empty answer problem. In Sect. 4, we summarize the experimental results with different analyses. The final section draws conclusions and suggests further research.

2 Proximity Measure Between Fuzzy Queries

According to the complexity of the fuzzy query, we distinguish two types of queries: atomic queries, containing one fuzzy predicate, and compounded queries, which are composed of several fuzzy predicates related by conjunction or disjunction operators. In this topic, and according to the related work, we adopt the following definitions: “The notation P is used to denote the fuzzy predicate, based on fuzzy logic, of an attribute A, and which is represented by a membership function denoted by μP whose values are from the interval [0, 1]” [26]. “Let μP be a membership function of the fuzzy predicate P; we denote by μP(x) the degree of truth that x is an element of the attribute A” [27].

2.1 Proximity Measure Between Atomic Queries

Let P and P′ be two fuzzy predicates. The proximity between the failing fuzzy query Q = P and a fuzzy query Q′ = P′ from the workload is inversely proportional to the Hausdorff distance plus one. Our proximity measure is presented as follows [21]:

Prox(Q, Q′) = \frac{1}{1 + d_{2H}(Q, Q′)}    (1)

The Hausdorff distance between Q and Q′ is obtained by calculating the distance between the fuzzy predicates involved in these two queries. We distinguish two kinds of predicates: fuzzy predicates modeled by discrete fuzzy sets and those modeled by continuous fuzzy sets. The following equation presents the Hausdorff distance between two fuzzy predicates modeled by discrete fuzzy sets, where T = {t1, t2, …, tm} is the set of the distinct membership values of P and P′ [28]:

d_{2H}(P, P′) = \frac{\sum_{i=1}^{m} t_i \, d_H(P_{t_i}, P′_{t_i})}{\sum_{i=1}^{m} t_i}    (2)

where P_{t_i} (resp. P′_{t_i}) stands for the t_i-level cut of P (resp. P′) and d_H(P_{t_i}, P′_{t_i}) is the maximum of the distances from an element in one level cut to its nearest neighbor in the other. The distance d_H(P_{t_i}, P′_{t_i}) is given by the following equation [29]:

d_H(P_{t_i}, P′_{t_i}) = \max\left\{ \sup_{u \in P_{t_i}} \inf_{v \in P′_{t_i}} d(u, v), \; \sup_{v \in P′_{t_i}} \inf_{u \in P_{t_i}} d(u, v) \right\}    (3)

where d(u, v) is the Euclidean distance between u and v. Hence, the more similar the queries are, the closer they are and the smaller the distance between them is. In the case of fuzzy predicates modeled by continuous fuzzy sets, formula (2) is modified as follows [28]:

d_{2H}(P, P′) = \frac{\int_0^1 t \, d_H(P_t, P′_t) \, dt}{\int_0^1 t \, dt} = 2 \int_0^1 t \, d_H(P_t, P′_t) \, dt    (4)

2.2 Proximity Measure Between Compounded Queries

In our previous work, we restricted compounded fuzzy queries to the conjunctive ones, which take the following form: Q = P1 ∧ … ∧ Pk, where the symbol ‘∧’ stands for the “and” operator from SQL and Pi (i = 1, …, k) is a fuzzy predicate pertaining to the attribute Ai. Therefore, we distinguished three possible cases of the relationship between two compounded queries [21, 30]:
• Q and Q′ cover exactly the same attributes,
• Q′ covers all attributes specified in Q,
• Q′ does not cover all attributes specified in Q.
More details on this topic can be found in [21]. The proximity measure between compounded queries is generally calculated according to the following algorithm.

Algorithm Proximity measure (Q, Q')
  Let T = Array of k real;
  Begin
    For i = 1 to k do
      Begin
        Compute Dist(Pi, P'i);    /* Dist is the Hausdorff distance */
        Compute Prox(Pi, P'i);
        T[i] := Prox(Pi, P'i);
      End for;
    Prox(Q, Q') := Min i=1..k T[i];
    Return Prox(Q, Q');
  End


The algorithm presented above explains how to calculate the proximity measure between two conjunctive queries Q and Q′. This measure is based on the use of the Hausdorff distance for computing the proximity between each pair of predicates composing these queries (see the loop block). The final proximity is the minimum of the proximity measures between predicates. In the case of disjunctive fuzzy queries, we apply the same process but, instead of taking the smallest proximity value among the proximity measures between fuzzy predicates, we select the greatest one.
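The following Java sketch is a minimal illustration of Eqs. (1)–(3) and of the conjunctive algorithm above. It is not the paper's implementation; it assumes that each discrete fuzzy predicate is given as a map from its membership values t_i to the corresponding t_i-level cuts (arrays of reals), with both predicates defined on the same set T.

    // Minimal sketch of the proximity computation of Sect. 2 (illustrative only).
    import java.util.List;
    import java.util.Map;

    public class ProximitySketch {

        // Crisp Hausdorff distance between two level cuts (Eq. 3), with d(u, v) = |u - v|
        // for numerical attribute domains.
        static double hausdorff(double[] cut, double[] otherCut) {
            return Math.max(directed(cut, otherCut), directed(otherCut, cut));
        }

        private static double directed(double[] from, double[] to) {
            double sup = 0.0;
            for (double u : from) {
                double inf = Double.POSITIVE_INFINITY;
                for (double v : to) {
                    inf = Math.min(inf, Math.abs(u - v));
                }
                sup = Math.max(sup, inf);
            }
            return sup;
        }

        // Weighted distance between two discrete fuzzy predicates (Eq. 2). Both maps are
        // assumed to be defined on the same set T of membership values.
        static double d2H(Map<Double, double[]> p, Map<Double, double[]> pPrime) {
            double numerator = 0.0, denominator = 0.0;
            for (Map.Entry<Double, double[]> level : p.entrySet()) {
                double t = level.getKey();
                numerator += t * hausdorff(level.getValue(), pPrime.get(t));
                denominator += t;
            }
            return numerator / denominator;
        }

        // Proximity between two atomic queries (Eq. 1).
        static double prox(Map<Double, double[]> p, Map<Double, double[]> pPrime) {
            return 1.0 / (1.0 + d2H(p, pPrime));
        }

        // Proximity between two conjunctive queries: the minimum over the predicate pairs,
        // as in the algorithm above (the maximum would be taken for disjunctive queries).
        static double proxConjunctive(List<Map<Double, double[]>> q, List<Map<Double, double[]>> qPrime) {
            double min = 1.0;
            for (int i = 0; i < q.size(); i++) {
                min = Math.min(min, prox(q.get(i), qPrime.get(i)));
            }
            return min;
        }
    }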

3 Improving Fuzzy Querying

In this paper, we extend our previous approach for dealing with the empty answer problem in fuzzy queries by proposing gradual operators between predicates in order to express the user's need in traditional SQL. On the other hand, we associate with our approach a new method of fuzzy query preparation, which detects the fuzzy conditions in the “where” and “having” clauses in order to reduce the memory space and simplify the measuring of proximity between queries. First, let us recall our previous approach before presenting its extension. We denote by W(D) a workload of the database D, where Q′ ∈ W(D) is a previous query successfully executed by the system. Let |D(Q)| denote the number of domains of the attributes specified in Q. We present our previous approach according to the following items [21]:
1. Partition the workload W(D) into three subsets according to the three cases previously cited.
2. We distinguish three cases:
• If the first subset is not empty, estimate, for each of its elements Q′, the proximity Prox(Q, Q′) and rank the queries in descending order,
• Else, if the second subset is not empty, we apply the first point to it,
• Else (the third subset is not empty), we apply the first point to it.
3. Choose Qapp, the closest query to Q, and assign its answers the same membership degree, which takes as its value the proximity measure between Q and Qapp, named Prox(Q, Qapp).
In this paper, we propose four gradual operators used to simplify the expression and handling of fuzzy queries. The main advantage of these operators is to facilitate the expression of fuzzy queries using traditional SQL operators. In the following, we present our gradual operators: more or less equal, at most, at least, best.

More or less equal is represented through three operators, greater (>), less (<) and equal (=), as: attribute > predicate − a, or attribute < predicate + a, or attribute = predicate. Thus, the result of the operator OR is the union of the results of its operands. For example, the condition processor more or less equal speed is interpreted by processor > speed − a, or processor < speed + a, or processor = speed. The terms predicate − a (resp. predicate + a) can be obtained in the following way: according to the definition mentioned in [27], if the membership degree of a fuzzy predicate is represented by the four values of a trapezoidal curve (A, B, α, β), then predicate − a is interpreted as (A, B, α − a, β) (resp. predicate + a = (A, B, α, β + a)).

At most is defined by the equal and less operators as follows: “attribute at most predicate” is (attribute = predicate) or (attribute < predicate + a). Hence, the result of this operation is the union of the results of attribute = predicate and attribute < predicate + a.

At least is interpreted by the equal and greater operators as follows: “attribute at least predicate” is represented by (attribute = predicate) OR (attribute > predicate − a). Also, the result of this operator is the union of the answers of these two operators.

Best represents functional groups according to some qualitative criteria, using the Max and Count aggregate functions; for example, show the best CPU (Central Processing Unit) is expressed by the following query:

SELECT * FROM CPU WHERE Best CPU

The query presented above is interpreted in traditional SQL as follows:

SELECT * FROM CPU
WHERE CPU.Processor = (SELECT Max(CPU.Processor) FROM CPU)
  AND CPU.Size_CM IN (SELECT Max(CPU.Size_CM) FROM CPU)
  AND CPU.Size_HD IN (SELECT Max(CPU.Size_HD) FROM CPU);
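To clarify the trapezoidal representation used above, the sketch below evaluates the membership degree of a predicate given by (A, B, α, β); the convention that A and B bound the core and that α and β are the left and right spreads is an assumption, and the gradual operators then only produce new parameter tuples such as (A, B, α, β + a), as reconstructed in the text.

    // Membership degree of a trapezoidal fuzzy predicate (A, B, alpha, beta):
    // assumed convention: core [A, B], left spread alpha, right spread beta.
    public class TrapezoidalPredicate {

        final double coreStart, coreEnd, leftSpread, rightSpread;   // A, B, alpha, beta

        TrapezoidalPredicate(double coreStart, double coreEnd, double leftSpread, double rightSpread) {
            this.coreStart = coreStart;
            this.coreEnd = coreEnd;
            this.leftSpread = leftSpread;
            this.rightSpread = rightSpread;
        }

        // mu(x): 1 inside the core, linear on the slopes, 0 outside the support.
        double membership(double x) {
            if (x >= coreStart && x <= coreEnd) return 1.0;
            if (x >= coreStart - leftSpread && x < coreStart) {
                return (x - (coreStart - leftSpread)) / leftSpread;
            }
            if (x > coreEnd && x <= coreEnd + rightSpread) {
                return ((coreEnd + rightSpread) - x) / rightSpread;
            }
            return 0.0;
        }

        // "predicate + a" as reconstructed above: widen the right spread by the tolerance a.
        TrapezoidalPredicate plusTolerance(double a) {
            return new TrapezoidalPredicate(coreStart, coreEnd, leftSpread, rightSpread + a);
        }
    }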

The interpretation of these gradual operators in traditional SQL has been saved, so we can apply them as usual operators. The user can adjust the parameters of the “best” gradual operator according to the application. Furthermore, to enhance the proximity measurement between the failing query and the queries of the workload, we propose a method to prepare fuzzy queries in order to identify the closest query more easily:
1. For each fuzzy query, transform it into a set of terms using a segmentation technique;
2. Select the terms presented after the “where” clause, without the “group by” or “having” clauses or the set operators (union, intersect, minus);
3. Select the terms presented after the “having” clause, without the “group by” clause or the set operators;
4. For each term obtained in the 2nd and 3rd steps, extract the conditions and operators;
5. Save all conditions and operators.
Our global algorithm to deal with the empty answer problem in relational databases is described in the following algorithm.


Algorithm Dealing empty answer(Q)
  Let W(D) = {Q'1, …, Q'n};
  Begin
    Execute Q;
    If answers is non-empty then
      Begin
        If Q is not in W(D) then
          Begin
            Fuzzy query preparation;
            Update W(D);
          End
        Else exit;
      End
    Else
      Compute Proximity measure(Q, Q');
      Select closest query Qapp;
      Return answer of Qapp;
  End

The algorithm begins by executing a fuzzy query Q and checking whether it has answers or not. If Q has answers, we then verify whether it already exists in the workload. If Q has answers and does not exist in the workload, the fuzzy query preparation method is executed before uploading it into the workload; otherwise, we apply our solution for dealing with the empty answer problem. The first step calculates the proximity measure between the failing query Q and each element of the workload. After that, we select the closest query Qapp, identified by the highest proximity value. The answers of the query closest to Q are returned as cooperative answers having the same importance degree, represented by the proximity value between Q and Qapp.
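As a simplistic illustration of the preparation steps listed above (not the authors' parser), the sketch below extracts the condition terms that follow the WHERE and HAVING clauses before a query is stored in the workload; the regular expression and the splitting on AND/OR are rough assumptions.

    // Naive extraction of WHERE/HAVING condition terms for the fuzzy query preparation step.
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class FuzzyQueryPreparation {

        private static final Pattern CONDITIONS = Pattern.compile(
                "(?i)\\b(where|having)\\b(.*?)(?=\\bgroup by\\b|\\bhaving\\b|\\border by\\b|\\bunion\\b|\\bintersect\\b|\\bminus\\b|$)",
                Pattern.DOTALL);

        // Returns the raw condition fragments found after WHERE and HAVING.
        static List<String> extractConditions(String query) {
            List<String> terms = new ArrayList<>();
            Matcher matcher = CONDITIONS.matcher(query);
            while (matcher.find()) {
                for (String term : matcher.group(2).split("(?i)\\band\\b|\\bor\\b")) {
                    if (!term.isBlank()) {
                        terms.add(term.trim());
                    }
                }
            }
            return terms;
        }

        public static void main(String[] args) {
            System.out.println(extractConditions(
                    "SELECT * FROM CPU WHERE processor is fast AND Ram memory is very small"));
        }
    }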

4 Experimental Results

To evaluate our extended approach, we used the same database, which contains 3500 records of some technical characteristics of CPUs, and a workload of 1000 past queries, which have been prepared by the fuzzy query preparation method. According to our previous results in [21], we observed that the execution time increases with both the number of predicates of a failing query and the number of approximate queries. We also made a comparative study with two other approaches, and the results showed the good performance of our proposal for dealing with the empty answer problem. The following figure presents the result obtained by comparing the performance of our extended approach with the previous one.


Fig. 1. Comparison results between our extended approach and previous one.

The experiment is based on the use of the same failing query Q and an updated workload. Q = Find the CPU with a fast processor and a very small central memory size. From Fig. 1, we can see that the execution time (the time spent dealing with the empty answer problem) of the extended approach has been improved compared to the previous one. In fact, using the fuzzy query preparation method allows decreasing both the response time and the memory space. Thus, using the query conditions for measuring proximity, rather than the full query, allows identifying the closest query more easily. On the other hand, it is necessary to validate our four gradual operators (More or less equal, At least, At most and Best) in order to confirm the effectiveness of our extended approach in terms of avoiding the empty answer problem. Indeed, adapting the parameters of the gradual operators to the user's preferences makes it possible to minimize the empty answer problem. The following table presents the results for 100 users (Table 2).

Fuzzy queries Find best CPU Find CPU with the price is more or less equal 50000 and processor is at leat fast For each designation, select CPU with Ram memory is more or less equal very small For each designation, select CPU with Ram memory is more or less equal to 2 go and fast processor. Find at most expensive CPU Find CPU where memory is at least 300 go. Find CPU where memory is very small Find CPU where price is less expensive and processor is at least medium. Find CPU with large size memory and Ram memory.

Answers 156–400 100–234 80–150 120–130 300–400 80–150 360–390 450–470 389–400

A Fuzzy Querying Using Cooperative Answers and Proximity Measure

47

We attempt to estimate the precision and recall metrics according to two cases: • Fuzzy querying for the regular database, • Fuzzy querying addressed to databases containing fuzzy predicates. The precision is the ratio of the number of approximate retrieved records AR to the total number of retrieved records A. The recall is the ratio of the number of approximate retrieved records AR to the total number of records, which judged approximately by the user AJ. Thus, the following formulas illustrate these metrics: Precision ¼ AR =A

ð5Þ

Recall ¼ AR =AJ

ð6Þ

The precision measured independently from the recall and the opposite is not significant. To review the results effectively, we calculate the pair measures to each retrieved records, which means that the interpolated curve of precision according to the recall is decreasing. The following table represents the precision and recall for the first height tuples in two cases quoted above, for which the collection contains five and six best approximate answers (AJ Þ respectively (Table 3). Table 3. The results of recall and precision according to two cases. Row of A Fuzzy query in case 1 AJ Recall Precision 1 – 1.0 0.35 2 – 0.68 0.51 4 – 0.75 1 5 0.40 0.51 6 0.38 0.51 7 0.27 0.51 8 0.25 0.51

Fuzzy query in case 2 AJ Recall Precision – 1.0 0.36 – 1.0 0.50 – 1.0 0.65 – 0.80 0.64 – 0.78 0.70 0.5 0.57 0.4 0.50

The results presented above show that more the curve is higher; more the system is performing well. In this experiment, we need a high recall and reasonable precision. It is easy to see, that this experiment gives better results for improving fuzzy querying.

5 Conclusion The approach based on cooperative answers and proximity measure for improving fuzzy querying is proposed. It contributes to deal with the empty answer problem by assigning to the failing query a set of cooperative answers of closest queries, which are more convenient than an empty answer. We assign these cooperative answers the same importance degree represented by the proximity value between these two queries.

48

A. Aggoune


The Role of Named Entities in Linking News Articles During Preservation

Muzammil Khan, Arif Ur Rahman, Muhammad Ullah, and Rashid Naseem

City University of Science and Information Technology, Peshawar, KP, Pakistan ([email protected], {m.ullah,rashid}@cusit.edu.pk)
Bahria University, Islamabad, Pakistan ([email protected])

Abstract. In recent years, the World Wide Web has become a platform for online news publication. Many sources, i.e. television channels, magazines, and newspapers, started publishing digital versions of news articles online, reaching a vast audience through a variety of devices. The number of news articles available can be very large, and recommendation systems can help to recommend relevant news to readers by filtering news articles based on some predefined criteria or similarity measure, i.e. a collaborative filtering or content-based filtering approach. The paper presents a named-entity-based similarity measure for linking digital news stories published in various newspapers during the preservation process in a digital news stories archive, to ensure future accessibility. The study compares the similarity of news articles based on human judgment with a similarity value computed automatically using the proposed technique. The results are generalized by defining a threshold value based on multiple experimental results using datasets of different sizes.

Keywords: News archiving · News preservation · Linking news · Similarity measure

1 Introduction

The news generation in the digital environment is no longer a periodic process with a fixed single output like the printed newspaper. News is instantly generated and updated online in a continuous fashion. However, because of different reasons like the short lifespan of digital information and the speed at which information is generated, it has become vital to preserve digital news for the long term. Digital preservation includes various actions to ensure that digital information remains accessible and usable as long as it is considered important [2]. Libraries and archives preserve newspapers by carefully digitizing collections, as newspapers are a good source of knowing history. Many approaches have been developed to preserve digital information, like the model migration approach for database preservation and the preservation of research data [4,20].


The lifespan of news stories published online varies from one newspaper to another, i.e. from one day to a month or even more. Although a newspaper may be backed up and archived by the news publisher or by national archives, it will be difficult to access particular information published in various newspapers about the same news. The issue becomes more complicated if a story is to be tracked through an archive of many newspapers, which requires different access technologies. To facilitate the accessibility of news articles preserved from multiple sources, some mechanism needs to be adopted for linking the archived digital news articles. The Common Ratio Measure for Stories (CRMS) technique [12] was introduced to manipulate the terms appearing in news articles in order to link digital news stories during the preservation process in the Digital News Stories Archive (DNSA) [10]. A news article contains many kinds of terms, i.e. nouns (also known as named entities), verbs, adverbs, etc. The term Named Entity (NE) was introduced at the 6th Message Understanding Conference (6th MUC) and is a frequently used concept in Natural Language Processing (NLP) applications [1,7]. NEs play an important role in information management in many domains, and Named Entity Recognition (NER) is the process of identifying, extracting and classifying NEs from textual content. For example, NER is vital in different areas like opinion mining, populating ontologies, semantic annotation, localization, personalization, question answering, classifying news content, designing efficient search algorithms, powering content recommendation, customer support, research article customization, etc., and it is used in many other application domains [8,16,21]. Keeping in view the role and importance of NEs in textual content, it was decided to examine how vital the NEs appearing in news articles are and how they can be utilized for linking digital news articles during the preservation process in the DNSA. In this article, a new NE-based linking mechanism is introduced to link news stories in the DNSA. The approach is empirically analyzed and the results are compared to draw conclusive arguments.

2 Background

News readers read about a happening or an issue from various sources in order to get a broader perspective and diverse viewpoints that help them better understand the world around them, and sometimes to authenticate the information itself by comparing similar news from multiple news sources. This article is a continuation of the Digital News Story Preservation (DNSP) framework studies [9,13]. In DNSP, an extraction tool, the Digital News Stories Extractor (DNSE), was developed to extract news articles from multiple online news sources, normalize them to a common format and create the DNSA [10]. Since the DNSA holds news articles from multiple sources, a mechanism is needed that helps the reader to read a set of relevant news stories about an event or issue. The DNSA therefore needs an efficient mechanism to link the digital stories and recommend them to readers. The CRMS technique was introduced to facilitate the linking mechanism by manipulating the terms appearing in the news articles [12], but other terms like nouns, verbs and adverbs may also play an important role in similarity computation among news articles. Named entities play a very important role in information management, and the identification and extraction of NEs from textual information is still a challenging task. Therefore, the utility of NEs for linking news articles is considered here, and they are manipulated in a way that helps to compute the similarity between news articles. NEs have been used for different purposes in the literature. For example: linking multiple versions of a news story into a group using salient NEs to reduce the reader's cognitive load [6]; a rule-based NE recognizer in a semantic retrieval architecture for Turkish news videos, previously annotated with corresponding named entities in textual news transcripts [15]; an NE-based news event tracking technique for patterns like When, Where, What and Who and their relationship with news categories such as Economy, Politics, Entertainment and Sports [17]; duplicate news detection using NEs [22]; and the post-click recommendation system "TULIP" for information retrieval, in which the semantic similarity of NEs extracted from the news body is combined with a lexical similarity function [14]. There are some corpus management systems, i.e. Sketch Engine, Manatee, EXMARaLDA, etc., and some language-specific corpus management systems, e.g. the Tatar corpus management system for the Turkish language [18]. NEs may help to classify and index textual documents for many purposes in different languages, e.g. Arabic [5] or the closely related Urdu, and in other textual collections like the newspaper collections that exist around the world in many different languages [11]. In this article we propose a similarity measure based on weighted named entities in news articles for the horizontal linkage of news during preservation and the creation of the DNSA.

3 Similarity Based on Named Entities in Stories

News articles are linked in the DNSA based on the similarity between them. The terms present in the news articles play a very important role in identifying relevant news from the archive. Nouns, verbs, adverbs, etc. are different kinds of terms that, in combination, represent the news content. In academic journals, nouns are considered to be the main key phrases [19], but other terms like verbs and adverbs also play a vital role in representing news articles [3,23]. Therefore, the measure "Similarity based on Named Entities in Stories (SNES)", built on the named entities shared by the news articles, is introduced.

3.1 Algorithm

The SNES algorithm pseudo-code is given as follows:

Step 1: News article pre-processing. Filter non-news content and extract the news article from the news web page.

Step 2: Compute term frequencies.
2.1 Tokenize the news articles.
2.2 Remove stop words.
2.3 Calculate the term frequency of each term in the news articles.

Step 3: Compute CT (Common Terms), UT (Uncommon Terms) and TT (Total Terms).
3.1 Compute the CT count: select all terms common to both news articles, with their frequencies,
CT = Σ_{i=1}^{n} (tf1 + tf2)_{Wi} = (tf1 + tf2)_{W1} + (tf1 + tf2)_{W2} + ... + (tf1 + tf2)_{Wn}
where Wi is the i-th term (word) common to both news articles, tf1 is the term frequency of Wi in one article, tf2 is its term frequency in the other article, and n is the total number of common terms in both articles.
3.2 Compute the UT count: select all terms appearing in only one of the two articles, with their frequencies,
UT = Σ_{j=1}^{m} (tf1 ∨ tf2)_{Wj} = (tf1 ∨ tf2)_{W1} + (tf1 ∨ tf2)_{W2} + ... + (tf1 ∨ tf2)_{Wm}
where m is the total number of uncommon terms in both articles.
3.3 Compute the TT count: the total number of terms in both news stories is TT = UT + CT.

Step 4: Identify named entities (if any).
4.1 Extract the named entities common to both stories.
4.2 Normalize them: Σ_{i=1}^{n} (TF of named entities) / news articles length (TT).
4.3 Assign an absolute weight to the normalized value computed in 4.2: [Σ_{i=1}^{n} (TF of named entities) / TT] + n, where n is the number of common named entities.

Step 5: Compute SNES as the normalized CT plus the weighted normalized named entities computed in 4.3:
SNES = (CT/TT) + [Σ_{i=1}^{n} (TF of named entities) / TT + n]


If there are no common named entities between the news articles, then the similarity of the news depends directly on the common ratio CT/TT, and the maximum value of SNES will be 1. Similarly, if the maximum possible number of named entities is in common, that is, the TF of the common named entities equals the news articles length (TT), then the maximum value of SNES will be n + 2, where n is the number of common named entities. For example, let ns1 (news 1), ns2 and ns3 be news articles. For ns1 & ns2, CT/TT = 100/160 with no common named entities, so SNES = (100/160) + [(0/160) + 0] = 0.625. For ns1 & ns3, CT/TT = 100/160 with two common named entities, so SNES = (100/160) + [(8/160) + 2] = 2.675, which means that ns1 & ns3 are more similar than ns1 & ns2; in this case the SNES value can increase up to 4, that is, n + 2.
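To make the computation concrete, the following is a minimal sketch of the SNES calculation described above. It assumes tokenization, stop-word removal and named-entity extraction have already been performed (token lists and NE sets are passed in directly); all function and variable names are illustrative and not taken from the original paper.

```python
from collections import Counter

def snes(tokens_a, tokens_b, entities_a, entities_b):
    """Similarity based on Named Entities in Stories (SNES), as sketched in Sect. 3.1."""
    tf_a, tf_b = Counter(tokens_a), Counter(tokens_b)

    common = set(tf_a) & set(tf_b)
    uncommon = set(tf_a) ^ set(tf_b)

    # Step 3: common (CT), uncommon (UT) and total (TT) term counts, frequencies included
    ct = sum(tf_a[w] + tf_b[w] for w in common)
    ut = sum(tf_a.get(w, 0) + tf_b.get(w, 0) for w in uncommon)
    tt = ct + ut

    # Step 4: common named entities normalized by the total term count TT, plus n
    common_entities = entities_a & entities_b
    n = len(common_entities)
    ne_tf = sum(tf_a.get(e, 0) + tf_b.get(e, 0) for e in common_entities)
    weighted_ne = (ne_tf / tt + n) if (tt and n) else 0.0

    # Step 5: SNES = CT/TT + weighted common named entities
    return (ct / tt if tt else 0.0) + weighted_ne

# Toy usage: two articles sharing two (single-token) named entities
a = ["earthquake", "Baluchistan", "damage", "Quetta", "damage"]
b = ["earthquake", "Baluchistan", "injured", "Quetta"]
print(snes(a, b, {"Baluchistan", "Quetta"}, {"Baluchistan", "Quetta"}))
```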

4 Evaluation

The proposed approach SNES for similarity between news articles based on common named entities is analyzed on different sets of news articles. An overview of the datasets is summarized in Table 1.

Table 1. Overview: Datasets of News Articles Used for Evaluation
S.No | No. of News Articles/Set | No. of Sets | No. of Newspapers | Similarity Observed During Selection | By News Reader | By Expert | By Proposed Measures
1 | 3 | 3 | 3 | Yes | No | No | Yes
2 | 10 | 3 | 3 | Yes | Yes | Yes | Yes
3 | 30 | 1 | 9 | Yes | No | No | Yes
4 | 215 | 1 | 3 | No | No | No | Yes
5 | 5.3 k | 1 | 10 | No | No | No | Yes

4.1 Results and Discussion

For each set of news articles, the similarity is computed in two ways, that is, empirically (user-based) and automatically (using the SNES algorithm). Tables 2, 3 and 4 summarize the values computed for the evaluation, and the similarities between news articles are compared.

4.2 Precision and Recall

To measure the precision and recall of the proposed similarity measure SNES, an experiment is performed on a dataset of 30 news articles, extracted from nine different news sources and divided into six different topics by grouping similar news on the same topic, as overviewed in Table 5. The similarity is observed during the selection of the news articles for the experiment. The performance of SNES on the 30 news articles dataset is shown in Table 6.

Table 2. Similarity Comparison (Likert Scale) with SNES for Set 1
News1 | News2 | Readers Mean | Expert News2 | Expert Value | SNES News2 | SNES Value
ns1 | ns3 | 4.7 | ns3 | 5 | ns3 | 12.722
ns1 | ns8 | 4.3 | ns8 | 5 | ns8 | 9.54
ns1 | ns5 | 3.6 | ns5 | 4 | ns7 | 8.452
ns1 | ns10 | 2.9 | ns10 | 4 | ns10 | 8.416
ns1 | ns7 | 2.7 | ns7 | 4 | ns5 | 7.335
ns1 | ns9 | 2.6 | ns9 | 4 | ns9 | 4.266
ns1 | ns4 | 2.4 | ns4 | 4 | ns4 | 4.251
ns1 | ns6 | 1.4 | ns6 | 1 | ns2 | 4.193
ns1 | ns2 | 1.3 | ns2 | 1 | ns6 | 3.195

Table 3. Similarity Comparison (Likert Scale) with SNES for Set 2
News1 | News2 | Readers Mean | Expert News2 | Expert Value | SNES News2 | SNES Value
ns1 | ns5 | 4.6 | ns5 | 5 | ns5 | 10.439
ns1 | ns9 | 4.3 | ns9 | 4 | ns7 | 6.208
ns1 | ns10 | 3.4 | ns10 | 4 | ns8 | 5.233
ns1 | ns7 | 1.7 | ns7 | 1 | ns2 | 4.212
ns1 | ns6 | 1.6 | ns6 | 1 | ns6 | 4.167
ns1 | ns2 | 1.4 | ns2 | 1 | ns10 | 3.213
ns1 | ns3 | 1.4 | ns3 | 1 | ns9 | 3.173
ns1 | ns8 | 1.4 | ns8 | 1 | ns3 | 2.109
ns1 | ns4 | 1.2 | ns4 | 1 | ns4 | 1.048

Table 4. Similarity Comparison (Likert Scale) with SNES for Set 3
News1 | News2 | Readers Mean | Expert News2 | Expert Value | SNES News2 | SNES Value
ns1 | ns3 | 4.4 | ns3 | 5 | ns3 | 35.901
ns1 | ns7 | 4.1 | ns2 | 5 | ns2 | 24.601
ns1 | ns2 | 3.7 | ns5 | 5 | ns4 | 21.563
ns1 | ns4 | 3.7 | ns7 | 4 | ns5 | 2.593
ns1 | ns5 | 3.6 | ns4 | 4 | ns7 | 18.582
ns1 | ns8 | 3.5 | ns8 | 4 | ns8 | 15.447
ns1 | ns9 | 3.4 | ns9 | 4 | ns6 | 12.388
ns1 | ns10 | 2.9 | ns10 | 4 | ns10 | 11.365
ns1 | ns6 | 2.4 | ns6 | 4 | ns9 | 8.301

Table 5. News Articles Distribution in 30 News Articles Dataset
S.No | Topic | No. of News
Topic 1 | Disruptive passenger in PIA at Heathrow London | 6
Topic 2 | Trump Travel Ban | 5
Topic 3 | CPEC | 5
Topic 4 | Nurses Protest in Karachi | 4
Topic 5 | Earthquake in Baluchistan | 5
Topic 6 | LoC Ceasefire Violation | 5

Table 6. Precision and Recall for SNES
S.No | Topic | Precision | Recall
Topic 1 | Disruptive passenger in PIA at Heathrow London | 100% | 100%
Topic 2 | Trump Travel Ban | 80% | 100%
Topic 3 | CPEC | 80% | 100%
Topic 4 | Nurses Protest in Karachi | 60% | 100%
Topic 5 | Earthquake in Baluchistan | 80% | 100%
Topic 6 | LoC Ceasefire Violation | 80% | 100%

Fig. 1. Performance Comparison of SNES for Sports News

The effectiveness of each introduced text-based similarity measure has been evaluated individually by considering datasets of different sizes (small to large). To further evaluate the measures, the dataset is enlarged to 5.3k articles and evaluated based on the type (category) of news articles. The results are compared with two well-known text-based similarity measures, i.e. the Cosine Similarity Measure (CSM) and the Extended Jaccard Coefficient (EJC), as well as with CRMS. SNES produced good results for sports news articles because of the dominance of named entities in sports news compared to other news categories, as shown in Fig. 1.
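For reference, the two baseline measures mentioned above can be computed on the same term-frequency representation as SNES. The sketch below is a generic illustration of CSM and EJC on token lists, not code taken from the DNSP framework.

```python
import math
from collections import Counter

def cosine_similarity(tokens_a, tokens_b):
    # CSM on term-frequency vectors
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def extended_jaccard(tokens_a, tokens_b):
    # EJC (Tanimoto coefficient) on term-frequency vectors
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    denom = sum(v * v for v in a.values()) + sum(v * v for v in b.values()) - dot
    return dot / denom if denom else 0.0
```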

5 Conclusion and Future Work

The Similarity Measure based on Named Entities in Stories (SNES) is a content-based technique for linking news articles during the preservation process, from the original source to the digital news stories archive, to ensure the future accessibility of the archived content. SNES produced good results on different datasets compared to the CRMS, CSM and EJC measures and will help to extract relevant news articles from the very large corpus of news articles archived in the DNSA in the future, especially in the sports news domain. Work is currently ongoing to design alternative content-based similarity measures using various features, such as named entities appearing in news headings, weighted terms and the position of terms used in the heading of news articles. The DNSA will also be extended to archive normalized Urdu-language news articles, and similarity measures will be developed for cross-lingual news article linkage in the archive.

References
1. Borrega, O., Taulé, M., Antònia Martí, M.: What do we mean when we speak about named entities. In: Proceedings of Corpus Linguistics (2007)
2. Burda, D., Teuteberg, F.: Sustaining accessibility of information through digital preservation: a literature review. J. Inf. Sci. 39(4), 442–458 (2013)
3. Chun, D.: On indexing of key words. Acta Editologica 16(2), 105–106 (2004)
4. da Silva, J.R., Ribeiro, C., Lopes, J.C.: A data curation experiment at U. Porto using DSpace (2011)
5. El Bazzi, M.S., Mammass, D., Zaki, T., Ennaji, A.: A graph based method for Arabic document indexing. In: 2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), pp. 308–312. IEEE (2016)
6. Escoter, L., Pivovarova, L., Du, M., Katinskaia, A., Yangarber, R., et al.: Grouping business news stories based on salience of named entities. In: 15th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of Conference, Volume 1: Long Papers. Association for Computational Linguistics (2017)
7. Grishman, R., Sundheim, B.: Message understanding conference-6: a brief history. In: COLING 1996 Volume 1: The 16th International Conference on Computational Linguistics, vol. 1 (1996)


8. Gupta, S.: Named entity recognition: applications and use cases. https://towardsdatascience.com/named-entity-recognition-applications-and-use-cases-acdbf57d595e. Accessed 10 Aug 2018
9. Khan, M.: Using Text Processing Techniques for Linking News Stories for Digital Preservation. PhD thesis, Faculty of Computer Science, Preston University Kohat, Islamabad Campus, HEC Pakistan (2018)
10. Khan, M., Ur Rahman, A., Daud Awan, M., Alam, S.M.: Normalizing digital news-stories for preservation. In: 2016 Eleventh International Conference on Digital Information Management (ICDIM), pp. 85–90. IEEE (2016)
11. Khan, M., Ur Rahman, A., Awan, M.D.: Exploring the digital world of newspapers. Sci. Technol. J. (Ciencia e Tecnica Vitivinicola), Portugal 32(6), 430–449 (2017)
12. Khan, M., Ur Rahman, A., Awan, M.D.: Term-based approach for linking digital news stories. In: Italian Research Conference on Digital Libraries, pp. 127–138. Springer (2018)
13. Khan, M., Ur Rahman, A.: Digital news story preservation framework. In: Proceedings of Digital Libraries: Providing Quality Information: 17th International Conference on Asia-Pacific Digital Libraries, ICADL, p. 350. Springer (2015)
14. Koushkestani, A.: Using Named Entities in Post-click News Recommendation. Dalhousie University, Halifax, Nova Scotia (2016)
15. Küçük, D., Yazici, A.: Employing named entities for semantic retrieval of news videos in Turkish. In: ISCIS, pp. 153–158 (2009)
16. Marrero, M., Urbano, J., Sánchez-Cuadrado, S., Morato, J., Gómez-Berbís, J.M.: Named entity recognition: fallacies, challenges and opportunities. Comput. Stand. Interfaces 35(5), 482–489 (2013)
17. Mohd, M.: Named entity patterns across news domains. In: BCS IRSG Symposium: Future Directions in Information Access, pp. 30–36 (2007)
18. Nevzorova, O., Mukhamedshin, D., Galieva, A., Gataullin, R.: Corpus management system: semantic aspects of representation and processing of search queries. In: 2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), pp. 285–290. IEEE (2016)
19. Pu, P., Chen, L., Hu, R.: A user-centric evaluation framework for recommender systems. In: Proceedings of the Fifth ACM Conference on Recommender Systems, pp. 157–164. ACM (2011)
20. Ur Rahman, A., David, G., Ribeiro, C.: Model migration approach for database preservation. In: International Conference on Asian Digital Libraries, pp. 81–90. Springer (2010)
21. Toujani, R., Akaichi, J.: Fuzzy sentiment classification in social network Facebook' statuses mining. In: 2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), pp. 393–397. IEEE (2016)
22. Uyar, E.: Near-duplicate news detection using named entities. Bilkent University, Department of Computer Engineering (2009)
23. Xindong, W., Gong-Qing, W., Xie, F., Zhu, Z., Xue-Gang, H.: News filtering and summarization on the web. IEEE Intell. Syst. 25(5), 68–76 (2010)

Development of Supplier Selection Model Using Fuzzy DEMATEL Approach in a Sustainable Development Context

Oussama El Mariouli and Abdellah Abouabdellah

Industrial Engineering Laboratory MOSIL, ENSA, Ibn Tofail University, Kenitra, Morocco ([email protected], [email protected])

Abstract. The selection of sustainable suppliers (SSS) plays a major role in the process of securing a sustainable supply chain (SSC). In this article, we develop a new mathematical model using a hybrid fuzzy decision making trial and evaluation laboratory (Fuzzy DEMATEL) approach. This model is used to measure suppliers' sustainability performance in industrial companies; it will help decision-makers and managers to choose the best supplier that meets economic, environmental and social requirements. Our methodology begins with the selection of the most relevant criteria from the literature and international standards. We then use the hybrid Fuzzy DEMATEL method to classify the selected criteria and calculate their weights, and we propose an equation to compute the scenario of weights assigned to the three pillars of sustainable development (SD). Next, we present a mathematical model that calculates the sustainability index of each supplier. Finally, we apply our model in a Moroccan company.

Keywords: Fuzzy · DEMATEL · Supplier selection · Criteria · Sustainable development

 Supplier selection  Criteria 

1 Introduction With the appearance of the sustainable development notion at our days, industrial companies are forced to adopt the strategy of integrating SD policy into their supply chain (SC), downstream (purchase) upstream (distribution) to remain competitive and open up to new national and international markets. The first step of this approach starts by choosing the best suppliers who respect the three economic, environmental and social dimensions of SD. The adoption of the sustainability approach has involved the addition of other economic (deadlines, flexibility …), environmental (recycling, waste …) and social (human rights, condition of work …) criteria to the classic criteria (quality cost and time) in the purchase decision. The supplier selection process has therefore become a complex and multi-choice process of criteria that depends on several qualitative and quantitative criteria. © Springer Nature Switzerland AG 2020 M. S. Bouhlel and S. Rovetta (Eds.): SETIT 2018, SIST 146, pp. 59–71, 2020. https://doi.org/10.1007/978-3-030-21005-2_6


In the literature there is a considerable body of research dealing with the issue of supplier selection (SS), but most of it does not include all three pillars of SD in the selection: it focuses on the environmental dimension and neglects the social dimension [1, 2]. This research has presented different methods and techniques for SS [3–5]. In this paper we develop a new mathematical model that measures a supplier's sustainability score in order to choose the best among them. This model results from merging the DEMATEL approach with fuzzy set theory.

2 Literature Review
In the last ten years, several researchers have been interested in selecting the best supplier who respects the rules of SD. In this section we present the different mathematical approaches and decision-making techniques for SSS found in the literature (see Table 1).

Table 1. Approach and technique for SSS.
Authors: Amindoust et al. (2012) [6]; Ghadimi et al. (2014) [7]; Arabsheybani et al. (2018) [8]; Awasthi et al. (2018) [9]; Azimifard et al. (2018) [10]; Bai et al. (2010) [11]; Buyukozkan et al. (2011) [12]; Fallahpour et al. (2017) [13]; Goren (2018) [14]; Govindan et al. (2012) [15]; Azadi et al. (2014) [16]; Hatami-Marbini et al. (2016) [17]; Izadikhah et al. (2017) [18]; Jauhar et al. (2017) [19]; Kannan et al. (2017) [20]; Luthra et al. (2016) [21]; Orji et al. (2014) [22]; Pandey et al. (2016) [23]; Sarkis et al. (2014) [24]; Shabanpour et al. (2017) [5]; Shi et al. (2014) [25]; Yousefi et al. (2016) [26]
Approaches/techniques used: Fuzzy inference system; Fuzzy MOORA method and FMEA technique; Fuzzy AHP-VIKOR; AHP and TOPSIS; Grey system and rough set theory; Fuzzy ANP; Fuzzy AHP and Fuzzy TOPSIS; Fuzzy DEMATEL and Taguchi; Fuzzy TOPSIS; Fuzzy DEA; DEA, DE and MODE; Fuzzy Delphi, ISM, ANP and COPRAS-G; AHP-VIKOR; Fuzzy DEMATEL and TOPSIS; Fuzzy GP; Bayesian framework; GP and DEA; DEA

All of these approaches and techniques help select the suppliers, but they are not easy to apply. In the next part we present a simple mathematical model that allows selecting the best supplier.


3 The Proposed Approach
Our research methodology (see Fig. 1) is articulated around three phases, detailed below:
• The first phase: identification of the SD criteria used to measure supplier performance.
• The second phase: calculation of the weights of the criteria for the three dimensions of SD using the Fuzzy DEMATEL method, and choice of a scenario to determine the coefficients of the economic, environmental and social dimensions.
• The last phase: development of the equation that measures the performance of the suppliers, taking the SD criteria into account in the evaluation and selection.

Fig. 1. The proposed approach (Step 1: identify the criteria for SD; Step 2: calculate the weights of the criteria and of the SD dimensions; Step 3: calculate the dimension indicators and the sustainable performance of the suppliers, then choose the best supplier).

3.1 Step 1: Identify the Criteria for SD

The traditional approach to supplier selection has been based solely on conventional economic criteria (cost, quality, time), but with the emergence of the SD concept, companies are forced to add new dimensions (environmental and social) to the supplier selection process in order to meet the requirements of stakeholders (NGOs, consumers, the public …). To define these new criteria, we have analyzed the bibliography and international standards [3]. Table 2 presents the result of this analysis.

Table 2. Criteria of SD [3].
Economic criteria: Innovation capacity (C1); Production capacity (C2); Technical and technological capacity (C3); Cost (C4); Deadlines (C5); Reliability (C6); Financial (C7); Flexibility (C8); Delivery (C9); Quality (C10); Reactivity (C11); Customer references (C12)
Environmental criteria: Waste (C13); Emissions (C14); Environmental label (C15); Pollution (C16); Program (C17); Recycling (C18); Respect of environmental ethical rules (C19); Use of resources (C20); Toxic or dangerous substances (C21)
Social criteria: Human rights (C22); Jobs and wealth (C23); Training, support and education (C24); Health and security at work (C25); Condition of work (C26)

3.2 Step 2: Calculate Weights of Criteria and of SD Dimensions

Calculate Weights of Criteria. The DEMATEL approach was presented by the Battelle Memorial Institute (Center for Science and Technology) [27]. It is used to construct the interrelationships between criteria and factors [28] based on the experts' point of view, and to visualize the impact-relation map (IRM) of the criteria. DEMATEL is regarded as one of the best approaches for finding the impact of a criterion on the other criteria [29]. Fuzzy DEMATEL is the extension of the classical DEMATEL method that integrates fuzzy set theory; fuzzy set theory is considered one of the best ways to handle the vagueness and uncertainty of human judgments. To calculate the weights of the criteria we used the following Fuzzy DEMATEL process:

• Phase 1: Construct the Direct Relation Matrix Z^k. In this step, experts are asked to complete a questionnaire giving the impact of criterion i on criterion j using 5 linguistic variables: Very high influence (VH), High influence (H), Low influence (L), Very low influence (VL), No influence (NO) [30]. The linguistic-scale Direct Relation Matrix Z^k = [z^k_ij] is formed from these variables, after which the variables are converted to fuzzy numbers. Z^k is a non-negative n × n matrix with zero diagonal (z^k_ii = 0), where z^k_ij represents the impact of criterion i on criterion j:

Z^k = [z^k_ij] =
[ 0     z_12   ...   z_1n
  z_21  0      ...   z_2n
  ...   ...    ...   ...
  z_n1  z_n2   ...   0   ]        (1)


• Phase 2: Construct the Initial Direct Relation Matrix H^k. The matrix H^k is computed using the Converting Fuzzy data into Crisp Scores (CFCS) process. Let z_ij = (a^n_ij, b^n_ij, c^n_ij) denote the triangular fuzzy degree of relationship between criteria i and j, where n = (1, 2, 3, …, p) indexes the questionnaires. The CFCS algorithm is as follows:

Standardization:
xc^n_ij = (c^n_ij − min a^n_ij) / Δ_min^max        (2)
xb^n_ij = (b^n_ij − min a^n_ij) / Δ_min^max        (3)
xa^n_ij = (a^n_ij − min a^n_ij) / Δ_min^max        (4)

Calculate the normalized right (cs) and left (as) values:
xcs^n_ij = xc^n_ij / (1 + xc^n_ij − xb^n_ij)        (5)
xas^n_ij = xb^n_ij / (1 + xb^n_ij − xa^n_ij)        (6)

Calculate the total normalized crisp values:
x^n_ij = [xas^n_ij (1 − xas^n_ij) + xcs^n_ij · xcs^n_ij] / (1 − xas^n_ij + xcs^n_ij)        (7)

Calculate the crisp values:
h^n_ij = min a^n_ij + x^n_ij · Δ_min^max        (8)

Final crisp values:
h_ij = (1/p) (h^1_ij + h^2_ij + h^3_ij + … + h^p_ij)        (9)

The Initial Direct Relation Matrix is:

H^k =
[ 0     h_12   ...   h_1n
  h_21  0      ...   h_2n
  ...   ...    ...   ...
  h_n1  h_n2   ...   0   ]        (10)


• Phase 3: Construct the Normalized Fuzzy Direct Relation Matrix N. The normalized direct relation matrix N is computed as:

N = H^k / ( max_{1≤i≤n} Σ_{j=1}^{n} h_ij ),   i, j = (1, 2, 3, …, n)        (11)

• Phase 4: Construct the Total Relation Matrix T. In this step we calculate the total relation matrix T as:

T = N (I − N)^(−1)        (12)

where I is the n × n identity matrix.

• Phase 5: Define the Cause and Effect Relationships. We add up the values of each row and column of the total relation matrix T, where r_i is the sum of the i-th row and c_j is the sum of the j-th column. The r_i and c_j values represent both the direct and indirect impacts between the criteria:

T = [t_ij]_{n×n},   i, j = (1, 2, 3, …, n)        (13)
r_i = Σ_{1≤j≤n} t_ij,   for all i        (14)
c_j = Σ_{1≤i≤n} t_ij,   for all j        (15)

If (r_i − c_j) is positive, the criterion belongs to the cause group; if (r_i − c_j) is negative, it belongs to the effect group. The cause-effect graph is obtained by plotting (r_i + c_j) on the horizontal axis and (r_i − c_j) on the vertical axis.

• Phase 6: Weights of Criteria. The weight of each criterion is obtained by:

w_i = sqrt( (r_i + c_j)^2 + (r_i − c_j)^2 )        (16)
W_i = w_i / Σ_{i=1}^{n} w_i        (17)
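As an illustration of the computation chain in Eqs. (2)–(17), the sketch below defuzzifies the experts' triangular fuzzy judgements with CFCS, averages them, and derives the criteria weights. The triangular fuzzy numbers attached to the linguistic scale, and the choice to take the CFCS normalization bounds over each expert matrix, are assumptions made for the example; they are not values or conventions prescribed by the paper.

```python
import numpy as np

# Assumed triangular fuzzy numbers (a, b, c) for the linguistic scale (illustrative only)
FUZZY_SCALE = {
    "NO": (0.0, 0.0, 0.25), "VL": (0.0, 0.25, 0.5), "L": (0.25, 0.5, 0.75),
    "H": (0.5, 0.75, 1.0), "VH": (0.75, 1.0, 1.0),
}

def cfcs(low, mid, up):
    """Converting Fuzzy data into Crisp Scores, Eqs. (2)-(8), applied element-wise."""
    delta = up.max() - low.min()
    xl = (low - low.min()) / delta                        # Eq. (4)
    xm = (mid - low.min()) / delta                        # Eq. (3)
    xu = (up - low.min()) / delta                         # Eq. (2)
    xrs = xu / (1 + xu - xm)                              # Eq. (5)
    xls = xm / (1 + xm - xl)                              # Eq. (6)
    x = (xls * (1 - xls) + xrs * xrs) / (1 - xls + xrs)   # Eq. (7)
    return low.min() + x * delta                          # Eq. (8)

def fuzzy_dematel_weights(expert_matrices):
    """expert_matrices: list of n x n grids of linguistic labels, one per questionnaire."""
    crisp = []
    for m in expert_matrices:
        low = np.array([[FUZZY_SCALE[v][0] for v in row] for row in m])
        mid = np.array([[FUZZY_SCALE[v][1] for v in row] for row in m])
        up = np.array([[FUZZY_SCALE[v][2] for v in row] for row in m])
        crisp.append(cfcs(low, mid, up))
    H = np.mean(crisp, axis=0)                    # Eq. (9): initial direct-relation matrix
    np.fill_diagonal(H, 0.0)
    N = H / H.sum(axis=1).max()                   # Eq. (11): normalization
    T = N @ np.linalg.inv(np.eye(len(H)) - N)     # Eq. (12): total-relation matrix
    r, c = T.sum(axis=1), T.sum(axis=0)           # Eqs. (13)-(15): row and column sums
    w = np.sqrt((r + c) ** 2 + (r - c) ** 2)      # Eq. (16)
    return w / w.sum()                            # Eq. (17): normalized criteria weights

# Toy usage with two experts and three criteria
m1 = [["NO", "H", "VH"], ["L", "NO", "H"], ["VL", "L", "NO"]]
m2 = [["NO", "VH", "H"], ["L", "NO", "VH"], ["NO", "L", "NO"]]
print(fuzzy_dematel_weights([m1, m2]))
```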


Calculate Weights of SD Dimensions. To calculate the weights of the SD dimensions, we choose a weighting scenario such that:

W_eco + W_env + W_soc = 1,   with W_eco ≥ 0, W_env ≥ 0, W_soc ≥ 0        (18)

For example: W_eco = W_env = W_soc = 1/3, or W_eco = 1/2 and W_env = W_soc = 1/4.

3.3 Step 3: Calculate the Economic Dimension Indicator

In this part we present our mathematical model, which allows decision makers to choose the most sustainable supplier.

• Decision variables
I_eco: the economic index.
I_env: the environmental index.
I_soc: the social index.
W_i: the weight of criterion C_i.
W_eco: the weight of the economic dimension.
W_env: the weight of the environmental dimension.
W_soc: the weight of the social dimension.

• The objective function
The objective function we have developed, which measures the sustainability of the suppliers, must be maximized. Each criterion is multiplied by its weight to obtain the indicators of the three dimensions of SD:

I_eco = ( Σ_{i=1}^{12} W_i C_i ) / 12,   i = (1, 2, …, 12)        (19)
I_env = ( Σ_{i=13}^{21} W_i C_i ) / 9,   i = (13, 14, …, 21)        (20)
I_soc = ( Σ_{i=22}^{26} W_i C_i ) / 5,   i = (22, 23, …, 26)        (21)

The sustainability index of each supplier can then be modeled as follows:

MaxP = (W_eco I_eco + W_env I_env + W_soc I_soc) / 3        (22)

The maximum value of MaxP is one.


• Constraints
The sums of the criteria weights within each SD dimension in Eqs. (23) to (25) must each be equal to one:

Σ_{i=1}^{12} W_i = 1        (23)
Σ_{i=13}^{21} W_i = 1        (24)
Σ_{i=22}^{26} W_i = 1        (25)

The sum of W_eco, W_env and W_soc must be equal to 1:

W_eco + W_env + W_soc = 1        (18)

The criterion values must be bounded between zero and one:

0 ≤ C_i ≤ 1,   for all i ∈ {1, 2, …, 26}        (26)
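To make the model concrete, the sketch below is a direct transcription of Eqs. (18)–(26) as written above. The criterion scores and weights in the usage example are invented placeholder values, not the company data analyzed in Sect. 4.

```python
def sustainability_index(c, w, w_eco=1/3, w_env=1/3, w_soc=1/3):
    """Supplier sustainability score MaxP from Eqs. (19)-(22).

    c: dict {criterion index 1..26 -> score in [0, 1]}                 (Eq. 26)
    w: dict {criterion index 1..26 -> weight}, each block summing to 1 (Eqs. 23-25)
    """
    assert abs(w_eco + w_env + w_soc - 1) < 1e-9          # Eq. (18)
    assert all(0.0 <= c[i] <= 1.0 for i in c)             # Eq. (26)

    i_eco = sum(w[i] * c[i] for i in range(1, 13)) / 12   # Eq. (19)
    i_env = sum(w[i] * c[i] for i in range(13, 22)) / 9   # Eq. (20)
    i_soc = sum(w[i] * c[i] for i in range(22, 27)) / 5   # Eq. (21)
    return (w_eco * i_eco + w_env * i_env + w_soc * i_soc) / 3  # Eq. (22)

# Illustrative example: uniform weights inside each block, arbitrary scores
w = {**{i: 1 / 12 for i in range(1, 13)},
     **{i: 1 / 9 for i in range(13, 22)},
     **{i: 1 / 5 for i in range(22, 27)}}
c = {i: 0.6 for i in range(1, 27)}
print(sustainability_index(c, w))   # the supplier with the largest value is selected
```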

4 Data Analysis
In enterprise X, we asked three decision makers to evaluate the influence between the criteria (see Table 2) with scores according to the linguistic scale in Table 3.

Table 3. Linguistic scale.
Score | Linguistic scale
4 | Very high influence
3 | High influence
2 | Low influence
1 | Very low influence
0 | No influence

For example, the Direct Relation Matrix obtained from the questionnaire of decision maker (DM) number 1 for the economic dimension of SD is shown in Table 4.

Table 4. Direct Relation Matrix of the economic dimension from DM number 1 (rows and columns are the economic criteria)
C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12

C1 0 NO NO NO NO NO VH NO NO NO NO NO

C2 VH 0 VH VL NO NO NO NO NO NO NO NO

C3 H NO 0 VH NO NO L NO NO L L NO

… … … … … … … … … … … … …

C8 H L VH NO VL NO NO 0 NO NO VL NO

C9 L NO NO L H NO NO NO 0 NO NO NO

C10 H VH VH VH VL VH NO VL NO 0 NO NO

C12 L H H L L VH NO L VH VH H 0

Applying the equations from (2) to (12), we obtain the Total Relation Matrix (see Table 5):

Table 5. The T matrix of the economic dimension (rows and columns are the economic criteria)
C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12

C1 5,6/100 4,6/100 5,3/100 5,6/100 4,4/100 4,6/100 14,7/100 1,9/100 2,5/100 2,5/100 4,3/100 3,1/100

C2 19,9/100 4,5/100 18,3/100 10,2/100 5/100 4,3/100 7,7/100 4,9/100 3,3/100 3,6/100 4,3/100 2,8/100

C3 22,4/100 9,1/100 10,7/100 18,8/100 12,3/100 8,9/100 13/100 7,3/100 7,6/100 8,9/100 7,9/100 5,6/100

… … … … … … … … … … … … …

C9 13,1/100 6,9/100 10/100 9,4/100 14,5/100 6,2/100 4/100 3,8/100 2,8/100 3,5/100 3,7/100 2,9/100

C10 21,3/100 22,2/100 28,5/100 22,8/100 15,5/100 19,9/100 9,5/100 12,9/100 10/100 7,9/100 6,4/100 6,6/100

C11 29,1/100 22,2/100 25,8/100 15,9/100 21,1/100 18,6/100 8,6/100 16,1/100 7,9/100 7,2/100 6,6/100 5,9/100

C12 20,5/100 20,3/100 26,3/100 17,6/100 16,5/100 21,3/100 7,9/100 8,8/100 16,5/100 20/100 11,8/100 5,8/100

After obtaining all the decision makers' questionnaires, we applied the fuzzy DEMATEL method and obtained the weight of each criterion within each sustainable development dimension. Applying Eqs. (19) to (21), we get the three indices measuring the economic, environmental and social dimensions (see Table 6).

Table 6. Index of environmental, economic and social dimensions
Dimension | C | W | S1 | S2 | S3 | S4 | S5
Economic | C1 | 0,101 | 0,500 | 0,600 | 0,700 | 0,800 | 0,900
Economic | C2 | 0,079 | 0,200 | 0,400 | 0,600 | 0,800 | 1,000
Economic | C3 | 0,106 | 0,500 | 0,600 | 0,700 | 0,800 | 0,900
Economic | C4 | 0,115 | 0,910 | 0,920 | 0,930 | 0,940 | 0,950
Economic | C5 | 0,095 | 0,167 | 0,200 | 0,250 | 0,333 | 0,500
Economic | C6 | 0,098 | 0,500 | 0,600 | 0,700 | 0,800 | 0,900
Economic | C7 | 0,061 | 0,917 | 0,929 | 0,941 | 0,950 | 0,980
Economic | C8 | 0,053 | 0,333 | 0,500 | 0,500 | 1,000 | 1,000
Economic | C9 | 0,051 | 0,005 | 0,006 | 0,007 | 0,010 | 0,011
Economic | C10 | 0,082 | 0,710 | 0,770 | 0,790 | 0,870 | 0,920
Economic | C11 | 0,079 | 0,500 | 0,500 | 1,000 | 1,000 | 1,000
Economic | C12 | 0,080 | 0,850 | 0,880 | 0,900 | 0,980 | 1,000
Economic | Ieco | | 0,528376 | 0,595915 | 0,691638 | 0,701738 | 0,858031
Environment | C13 | 0,112 | 0,333 | 0,500 | 0,500 | 1,000 | 1,000
Environment | C14 | 0,115 | 0,333 | 0,333 | 0,500 | 0,500 | 1,000
Environment | C15 | 0,133 | 0,000 | 0,000 | 1,000 | 1,000 | 1,000
Environment | C16 | 0,137 | 0,333 | 0,333 | 0,500 | 0,500 | 1,000
Environment | C17 | 0,130 | 0,333 | 0,333 | 0,500 | 0,500 | 1,000
Environment | C18 | 0,074 | 0,000 | 0,000 | 1,000 | 1,000 | 1,000
Environment | C19 | 0,154 | 0,333 | 0,333 | 0,500 | 0,500 | 0,500
Environment | C20 | 0,074 | 0,333 | 0,333 | 0,333 | 0,333 | 0,500
Environment | C21 | 0,071 | 0 | 0 | 0 | 1,000 | 1,000
Environment | Ienv | | 0,240426 | 0,25913 | 0,555642 | 0,682642 | 0,886
Social | C22 | 0,232 | 0,333 | 0,333 | 0,500 | 0,500 | 1,000
Social | C23 | 0,129 | 0,333 | 0,333 | 0,500 | 0,500 | 1,000
Social | C24 | 0,198 | 0,333 | 0,333 | 0,500 | 0,500 | 1,000
Social | C25 | 0,191 | 0,200 | 0,240 | 0,250 | 0,290 | 0,300
Social | C26 | 0,250 | 0,5 | 0,5 | 1,000 | 1,000 | 1,000
Social | Isoc | | 0,349347 | 0,356987 | 0,57725 | 0,58489 | 0,8663

In this case, enterprise X chose the following dimension-weight scenario: W_eco = W_env = W_soc = 1/3.

The final supplier sustainability scores were used to choose the suppliers in enterprise X (see Table 7); supplier number 5 is identified as the most sustainable supplier, with MaxP = 0,29.


Table 7. Indices of the supplier sustainability
Supplier | Index economic | Index environmental | Index social | Sustainable performance | Rank
S1 | 0,528376 | 0,240426 | 0,349347 | 0,12423878 | 5
S2 | 0,595915 | 0,25913 | 0,356987 | 0,13467022 | 4
S3 | 0,691638 | 0,555642 | 0,57725 | 0,20272556 | 3
S4 | 0,701738 | 0,682642 | 0,58489 | 0,21880778 | 2
S5 | 0,858031 | 0,886 | 0,8663 | 0,29003678 | 1

5 Conclusion
The problem of selecting suppliers that respect SD requirements has become an important topic for companies. In this article, we presented a mathematical model based on a Fuzzy DEMATEL multi-criteria decision-making approach, designed to help managers measure supplier performance. We used fuzzy set theory to handle the imprecision and uncertainty of the decision makers' judgments arising from the study of the cause-effect relationships between criteria. Our model is simple to apply and can be used in any industrial enterprise to select the best sustainable supplier.

References 1. Zimmer, K., Fröhling, M., Schultmann, F.: Sustainable supplier management – a review of models supporting sustainable supplier selection, monitoring and development. Int. J. Prod. Res. (2015). https://doi.org/10.1080/00207543.2015.1079340 2. Vahidi, F., Torabi, S.A., Ramezankhani, M.J.: Sustainable supplier selection and order allocation under operational and disruption risks. J. Clean. Prod. (2017). https://doi.org/10. 1016/j.jclepro.2017.11.012 3. El Mariouli, O., Abouabdellah, A.: Model for assessing the economic, environmental and social performance of the supplier. In: 4th IEEE International Conference on Logistics Operations Management (GOL’2018), Lehavre, France (2018) 4. Sureeyatanapas, P., Sriwattananusart, K., Niyamosothath, T., Setsomboon, W., Arunyanart, S.: Supplier selection towards uncertain and unavailable information: an extension of TOPSIS method. Oper. Res. Perspect. (2018). https://doi.org/10.1016/j.orp.2018.01.005 5. Shabanpour, H., Yousefi, S., Saen, R.F.: Future planning for benchmarking and ranking sustainable suppliers using goal programming and robust double frontiers DEA. Transp. Res. Part D: Transp. Environ. 50(January), 129–143 (2017). http://dx.doi.org/10.1016/j.trd.2016. 10.022 6. Amindoust, A., Ahmed, S., Saghafinia, A., Bahreininejad, A.: Sustainable supplier selection: a ranking model based on fuzzy inference system. Appl. Soft Comput. 12(2012), 1668–1677 (2012) 7. Ghadimi, P., Heavey, C.: Sustainable supplier selection in medical device industry: toward sustainable manufacturing. Procedia CIRP 15(2014), 165–170 (2014) 8. Arabsheybani, A., Paydar, M.M., Safaei, A.S.: An integrated fuzzy MOORA method and FMEA technique for sustainable supplier selection considering quantity discounts and supplier’s risk. J. Clean. Prod. (2018). https://doi.org/10.1016/j.jclepro.2018.04.167

70

O. El Mariouli and A. Abouabdellah

9. Awasthi, A., Govindan, K., Gold, S.: Multi-tier sustainable global supplier selection using a fuzzy AHP-VIKOR based approach. Int. J. Prod. Econ. (2017). https://doi.org/10.1016/j. ijpe.2017.10.013 10. Azimifard, A.: Resources Policy (2018). https://doi.org/10.1016/j.resourpol.2018.01.002 11. Bai, C., Sarkis, J.: Integrating sustainability into supplier selection with grey system and rough set methodologies. Int. J. Prod. Econ. 124(2010), 252–264 (2010) 12. Buyukozkan, G., Cifci, G.: A novel fuzzy multi-criteria decision framework for sustainable supplier selection with incomplete information. Comput. Ind. 62(2011), 164–174 (2011) 13. Fallahpour, A., Udoncy Olugu, E., Nurmaya Musa, S., Yew Wong, K., Noori, S.: A decision support model for sustainable supplier selection in sustainable supply chain management. Comput. Ind. Eng. (2017). http://dx.doi.org/10.1016/j.cie.2017.01.005 14. Goren, H.G.: A decision framework for sustainable supplier selection and order allocation with lost sales. J. Clean. Prod. (2018). https://doi.org/10.1016/j.jclepro.2018.02.211 15. Govindan, K., et al.: A fuzzy multi criteria approach for measuring sustainability performance of a supplier based on triple bottom line approach. J. Clean. Prod. (2012). https://doi.org/10.1016/j.jclepro.2012.04.014 16. Azadi, M., et al.: A new fuzzy DEA model for evaluation of efficiency and effectiveness of suppliers in sustainable supply chain management context. Comput. Oper. Res. (2014). http://dx.doi.org/10.1016/j.cor.2014.03.002i 17. Hatami-Marbini, A., Agrell, P.J., Tavana, M., Khoshnevis, P.: A flexible cross-efficiency fuzzy data envelopment analysis model for sustainable sourcing. J. Clean. Prod. (2016). https://doi.org/10.1016/j.jclepro.2016.10.192 18. Izadikhah, M., Saen, R.F., Ahmadi, K.: How to assess sustainability of suppliers in volume discount context? A new data envelopment analysis approach. Transp. Res. Part D 51, 102– 121 (2017) 19. Jauhar, S.K., Pant, M.: Integrating DEA with DE and MODE for sustainable supplier selection. J. Comput. Sci. http://dx.doi.org/10.1016/j.jocs.2017.02.011 20. Kannan, D.: Role of multiple stakeholders and the critical success factor theory for the sustainable supplier selection process. Int. J. Prod. Econ. http://dx.doi.org/10.1016/j.ijpe. 2017.02.020 21. Luthra, S., Govindan, K., Kannan, D., Mangla, S.K., Garg, C.P.: An integrated framework for sustainable supplier selection and evaluation in supply chains. J. Clean. Prod. (2016). https://doi.org/10.1016/j.jclepro.2016.09.078 22. Orji, I.J., Wei, S.: A decision support tool for sustainable supplier selection in manufacturing firms. J. Ind. Eng. Manag. (2014). http://dx.doi.org/10.3926/jiem.1203 23. Pandey, P., Shah, B.J., Gajjar, H.: A fuzzy goal programming ap-proach for selecting sustainable suppliers. Benchmarking: Int. J. 24(5) (2017). https://doi.org/10.1108/BIJ-112015-0110 24. Sarkis, J., Dhavale, D.G.: Supplier selection for sustainable operations: a triple-bottom-line approach using a Bayesian framework. Int. J. Prod. Econ. (2014). http://dx.doi.org/10.1016/ j.ijpe.2014.11.007i 25. Shi, P., Yan, B., Shi, S., Ke, C.: A decision support system to select suppliers for a sustainable supply chain based on a systematic DEA approach. Inf. Technol. Manag. (2014). https://doi.org/10.1007/s10799-014-0193-1 26. Yousefi, S., Shabanpour, H., Fisher, R., Saen, R.F.: Evaluating and ranking sustainable suppliers by robust dynamic data envelopment analysis, Measurement (2016). http://dx.doi.org/ 10.1016/j.measurement.2016.01.032 27. 
Fontela, E., Gabus, A.: World Problems, an Invitation to Further Thought with-in the Framework of DEMATEL. Battelle Geneva Research Centre, Geneva (1972)


28. Fontela, E., Gabus, A.: DEMATEL, innovative methods. Rep. No. 2, “Structural analysis of the world problematique (methods)”, Battelle Geneva Research Institute (1974) 29. Wu, W.W., Lee, Y.T.: Developing global managers’ competencies using the fuzzy DEMATEL method. Expert Syst. Appl. 32(2), 499–507 (2007) 30. Li, R.J.: Fuzzy method in group decision making. Comput. Math Appl. 38, 91–101 (1999)

Software Effort Estimation Using an Optimal Trees Ensemble: An Empirical Comparative Study

Abdelali Zakrani, Ali Idri, and Mustapha Hain

ENSAM, Hassan II University, Casablanca, Morocco ({abdelali.zakrani,mustapha.hain}@univh2c.ma)
ENSIAS, Mohammed V University, Rabat, Morocco ([email protected])

Abstract. Since information systems have become the heartbeat of many organizations, the investment in software is growing rapidly and consuming a significant portion of the company budget. In this context, both software engineering practitioners and researchers are more interested than ever in accurately estimating the effort and the quality of software products under development. Accurate estimates are desirable, but no technique has been shown to be successful at effectively and reliably estimating software development effort. In this paper, we propose the use of an optimal trees ensemble (OTE) to predict software development effort. The ensemble employed is built by combining only the top ranked trees, one by one, from a set of random forests. Each included tree must decrease the unexplained variance of the ensemble for software development effort estimation (SDEE). The effectiveness of the OTE model is compared with other techniques such as regression trees, random forest, RBF neural networks, support vector regression and multiple linear regression in terms of the mean magnitude of relative error (MMRE), MdMRE and Pred(l) obtained on five well-known datasets, namely ISBSG R8, COCOMO, Tukutuku, Desharnais and Albrecht. According to the results obtained from the experiments, the proposed ensemble of optimal trees outperformed almost all the other techniques. Moreover, the OTE model statistically outperformed the other techniques in at least one dataset.

Keywords: Software development effort estimation · Optimal trees ensemble · Random forest · Regression trees · Multiple linear regression · RBF neural networks · Support vector regression · Accuracy evaluation

1 Introduction In today’s software industry, companies are systematically and continuously seeking to strengthen their competitiveness in order to survive in a highly competitive environment. One of the main factors to achieve this goal is to allocate software project resources efficiently and schedule activities appropriately. In this respect, predicting software development effort is critical.



Numerous techniques have been considered for software effort estimation, including traditional techniques such as use case points [1] and, more recently, machine learning techniques such as MLP neural networks [2], radial basis function (RBF) neural networks [3], random forest (RF) [4], fuzzy analogy (FA) [5] and support vector regression (SVR) [6]. Machine learning methods employ data from historical projects to construct a regression model that is then used to estimate the effort of future software projects. However, no single method has been found to be entirely stable and reliable for all cases. Furthermore, the performance of any method depends generally on the characteristics of the dataset employed to construct the model (dataset size, outliers, categorical attributes and missing values). More recently, there is a trend to overcome the weakness of single methods by using bootstrap aggregating (bagging) methods [7]. These bagging paradigms try to construct multiple learners to improve the accuracy of models used in SDEE. Many studies have demonstrated the effectiveness of ensemble techniques over single techniques. For instance, Elish in [8] used multiple additive regression trees (MART) to estimate software effort and compared their performance with that of linear regression, RBF and SVR models on a NASA dataset. The MART model outperformed the others in terms of MMRE and Pred(0.25). Idri et al. [9] used two types of homogeneous ensembles based on single Classical Analogy or single Fuzzy Analogy. Their results obtained over seven datasets showed that classical and fuzzy analogy ensembles outperform single techniques in terms of the four performance measures. In this context, Zakrani et al. [10] adapted the method proposed by Khan et al. [11] for classification and regression to software effort estimation. The underlying assumption of this method is that combining only the strong regression trees can lead to a stronger ensemble and a more stable model in SDEE. The results obtained by the ensemble of optimal trees showed a significant improvement over the classical tree model. The present work aims to examine further the performance of the previous model. The main contributions of the present paper are twofold: (1) to develop a new model for software effort estimation using an optimal trees ensemble; (2) to evaluate the effectiveness of the proposed model by comparing it with recent methods used and investigated in the literature for SDEE, namely (i) regression trees (RT), (ii) random forest (RF), (iii) RBF neural networks, (iv) support vector regression, and (v) multiple linear regression (MLR). This paper is organized as follows. Section 2 presents the five other SDEE techniques evaluated and Sect. 3 gives an overview of the proposed tree ensemble. In Sect. 4, we present a brief description of the datasets, the accuracy measures, the validation method used in this empirical evaluation and the adopted statistical tests. The experiments and results are discussed in Sect. 5. Finally, Sect. 6 concludes the paper.

2 SDEE Techniques
This section introduces the SDEE techniques compared with the proposed technique, the optimal trees ensemble (OTE), in this study.

2.1 Multiple Linear Regression

MLR is a statistical method used to explore and model the relationship between variables, where the aim is to find an approximate function between the dependent variable (effort) and the independent variables (project attributes). In SDEE, multiple linear regression has been among the first techniques used to model the complex relationship between effort and project attributes [12].

2.2 Support Vector Regression

Support vector machine (SVM) is a relatively new intelligent method applied in many fields, such as linear and nonlinear regression, classification problems, and pattern recognition. Support vector regression (SVR) is a specific type of SVM and has an excellent nonlinear fitting ability and stable performance on small datasets. The first application of SVR to software effort prediction was realized by Oliveira [6]. For the experiments in this paper, we used two different versions of SVR: ε-SVR and ν-SVR. The key parameters that should be carefully adjusted are the penalty factor C, which controls the trade-off between error minimization and margin maximization; the value of ε, which shapes the regression function by managing the number of support vectors; the parameter γ of the radial basis function (RBF) kernel; and ν, which controls the number of support vectors in the ν-SVR version. Unfortunately, there is no guideline on how to choose the optimal values for these parameters. In addition, searching for proper values for these parameters is time-consuming and requires a great number of experimental tests. Therefore, we conducted several experiments with different values. For instance, the best performance on the Albrecht dataset was achieved when employing ν-SVR with the following values: ν = 0.5, cost = 0.255 and γ = 0.142.
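As an illustration of how such a model can be fitted (not the authors' exact experimental pipeline), scikit-learn's NuSVR can be configured with the ν, C and γ values quoted above for the Albrecht dataset; the feature matrix below is synthetic placeholder data.

```python
import numpy as np
from sklearn.svm import NuSVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Placeholder project data: rows = projects, columns = effort drivers (e.g. size measures)
X = np.random.rand(24, 7)
y = np.random.rand(24) * 100          # synthetic effort values in person-months

# nu-SVR with an RBF kernel, using the parameter values quoted above
model = make_pipeline(
    StandardScaler(),
    NuSVR(kernel="rbf", nu=0.5, C=0.255, gamma=0.142),
)
model.fit(X, y)
print(model.predict(X[:3]))
```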

2.3 Radial Basis Function Neural Networks

Radial basis function neural networks (RBFNN) are attractive thanks to their fast learning and simplicity. They have been employed in many fields such as: function approximation, pattern recognition, and software engineering [3]. An RBF Neural Network is composed of three layers. An input layer which contains the input neurons (effort drivers); a hidden layer in which each neuron calculates its output by means of a radial basis function, generally a Gaussian function, and an output layer which constructs a linear weighted sum of hidden neuron outputs and provides the response of the network (effort). Since the RBF neural network is used to estimate the software effort, a regression problem, it has only one output neuron. The main parameters that influence the RBFNN performance are: (1) the number of hidden neurons, (2) the number of training epochs, (3) the learning rate and (4) the momentum. Therefore, to find the best configuration, we carried out several simulations with different values for these parameters. As example, for Albrecht dataset, we executed the RBFNN model with the following values: learning rate (L) = {0.01}, momentum (M) = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8}, number of hidden neurons (H) = {2….16}, number of training epochs (N) = {1000, 1500, 2000, 2500, 3000}. Only the best results achieved are reported in this paper.

2.4 Regression Trees

Classification and Regression Trees (CART), developed by Breiman et al. [13], is a statistical method that can choose from a great number of independent variables (project attributes) those that are most significant in influencing the dependent variable (effort) to be explained. This is performed by building a tree structure, which divides the data into mutually exclusive classes, each as pure or homogeneous as possible regarding their dependent variable. This tree begins with a root node containing all the data (historical projects), which are divided into nodes by recursive binary splitting. Each split is achieved using a simple rule based on a single independent variable. In this study, the parameters chosen for CART algorithm came from the work of Zakrani et al. [4]. 2.5

2.5 Random Forest

Random forest (RF) is an ensemble learning method that aggregates a large number of decision trees, thus enabling it to reduce the variance obtained as opposed to that generated when using only one decision tree. The use of random forest in SDEE requires the determination of a set of parameters such as: the number of trees constituting the forest (ntree), the number of variables chosen randomly at the level of each node (mtry), the size of the sample ‘in bag’ (sampsize) and the maximum number of nodes of each tree (maxnodes). We used in the current study the best configuration as that found in [4].
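As a hedged illustration of how these parameters map onto a common implementation, the sketch below uses scikit-learn's RandomForestRegressor; the correspondence is ntree → n_estimators, mtry → max_features, sampsize → max_samples and maxnodes → max_leaf_nodes, and the values shown are placeholders rather than the configuration reported in [4].

```python
# Placeholder data: 24 projects described by 7 effort drivers.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
X, y = rng.random((24, 7)), rng.random(24) * 100

rf = RandomForestRegressor(
    n_estimators=500,     # ntree: number of trees in the forest
    max_features=3,       # mtry: attributes drawn at random at each node
    max_samples=0.8,      # sampsize: size of the 'in bag' sample (fraction)
    max_leaf_nodes=16,    # maxnodes: maximum number of leaves per tree
    random_state=42,
).fit(X, y)
print(rf.predict(X[:3]))
```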

3 The Proposed Model: Optimal Trees Ensemble

As the number of trees in a random forest is often very large, significant work has been done on minimizing this number, not only to reduce the computational cost but also to improve the predictive performance [10], since the overall estimation error of a random forest is highly associated with the strength of the individual trees and their diversity in the forest. In recent work [11], Khan et al. proposed a further refinement of random forest: a tree selection method based on the trees' individual accuracy and diversity, using the unexplained variance. The resulting ensemble is referred to as the optimal trees ensemble (OTE). In this paper we investigate the use of OTE in SDEE. To this end, we partition the training data L = (X, Y) randomly into two non-overlapping subsets, LB = (XB, YB) and LV = (XV, YV). Next, we grow T regression trees on T bootstrap samples from the first subset LB = (XB, YB). While doing so, we select a random sample of p < d features from the entire set of d project attributes, which inculcates additional randomness in the trees. Owing to bootstrapping, some observations are left out of the samples; these are called out-of-bag (OOB) observations. They take no part in the training of the tree and are used to estimate the unexplained variance of each tree built on a bootstrap sample. Trees are then ranked in ascending order with respect to their unexplained variances and the top ranked M trees are chosen. The selection and combination of trees are carried out as follows:


1. Starting from the two top ranked trees, successive ranked trees are added one by one to see how they perform on the independent validation data, LV = (XV, YV). This is done until the last, Mth, tree is added.
2. Select the kth tree, k = 1, 2, 3, …, M, if its inclusion in the ensemble satisfies the criterion given by Eq. 1:

UnExpVar^{+k} < UnExpVar^{−k}    (1)

where UnExpVar^{−k} is the unexplained variance of the ensemble not having the kth tree and UnExpVar^{+k} is the unexplained variance of the ensemble with the kth tree included. The major steps of this algorithm are detailed as follows [11]:
Step 1: Take T bootstrap samples from the given subset of the training data LB = (XB, YB).
Step 2: Grow regression trees on all the bootstrap samples using the random forest technique.
Step 3: Choose the M trees with the smallest individual prediction errors on the training data.
Step 4: Add the M selected trees one by one and keep a tree only if it improves performance on the validation data, LV = (XV, YV), as measured by the unexplained variance.
Step 5: Combine the retained trees and average their outputs on the testing dataset.
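The following sketch is our reading of the steps above, not the authors' code: it ranks the trees of a fitted scikit-learn random forest by their individual error on the training subset LB (the paper uses out-of-bag observations) and then greedily keeps a tree only if adding it lowers the unexplained variance on the validation subset LV.

```python
# Sketch of the OTE selection loop (an interpretation, not the authors' implementation).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

def unexplained_variance(y_true, y_pred):
    # fraction of the target variance the ensemble fails to explain
    return np.var(y_true - y_pred) / np.var(y_true)

rng = np.random.default_rng(1)
X = rng.random((80, 7))
y = X @ rng.random(7) * 100 + rng.normal(0, 5, 80)        # placeholder projects
X_B, X_V, y_B, y_V = train_test_split(X, y, test_size=0.5, random_state=1)

rf = RandomForestRegressor(n_estimators=200, random_state=1).fit(X_B, y_B)

# Step 3: rank trees by their individual error (here: error on the LB subset)
errors = [np.mean((t.predict(X_B) - y_B) ** 2) for t in rf.estimators_]
ranked = [rf.estimators_[i] for i in np.argsort(errors)]
M = 50                                   # number of top ranked trees considered

# Step 4: greedy selection on the validation subset LV
selected = ranked[:2]                    # start from the two top ranked trees
best = unexplained_variance(y_V, np.mean([t.predict(X_V) for t in selected], axis=0))
for tree in ranked[2:M]:
    candidate = selected + [tree]
    uev = unexplained_variance(y_V, np.mean([t.predict(X_V) for t in candidate], axis=0))
    if uev < best:                       # keep the tree only if it helps (Eq. 1)
        selected, best = candidate, uev

# Step 5: average the retained trees on new data
predict_ote = lambda X_new: np.mean([t.predict(X_new) for t in selected], axis=0)
print(len(selected), best)
```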

4 Experimental Design

This section presents a description of the datasets, the accuracy measures, the validation method, and the statistical tests used in this empirical evaluation.

4.1 Datasets Description

The data employed in the current study come from five datasets namely, Tukutuku, ISBSG R8, COCOMO, Albrecht and Desharnais. Table 1 displays the summary statistics for these datasets. • The Tukutuku dataset includes 53 Web applications. Each Web project is described by 9 numerical features. In fact, each project offered to the Tukutuku database was initially described by more than 9 software features, but some of them were combined together [3]. • The ISBSG R8 repository is a multi-organizational dataset containing more than 2,000 projects collected from different organizations in different countries [14]. To decide on the number of software projects, and their descriptions, a data preprocessing study was already conducted by [15], the objective of which was to select data (projects and attributes), in order to retain projects with high quality. The selected historical projects are used in this study.


• The COCOMO'81 dataset includes 63 software projects [12], each characterized by 14 attributes. Since the original COCOMO'81 dataset contains a relatively small number of software projects, and in order to conduct a robust experimental study, we artificially generated, from the original COCOMO'81 dataset, three other datasets of 63 software projects each. The union of the four datasets constitutes the artificial COCOMO'81 dataset employed in this paper.
• The Albrecht dataset [16] is a well-known dataset employed by many recent works [5, 17]. It contains 24 projects characterized by 7 attributes and developed using third-generation languages. Eighteen of the 24 projects were written in COBOL, four in PL1, and two in DMS languages.
• The Desharnais dataset was collected by [18]. Although it is quite old, it is one of the largest published datasets available and is therefore still used in many recent studies, such as [5, 17]. It contains 81 projects, each described by nine attributes, all belonging to one Canadian software company. Four of the 81 projects contain missing values and were left out of further investigation.

Table 1. Description statistics of the selected datasets

Datasets    | # of software projects | # of attributes | Effort Min | Effort Max | Effort Mean | Effort Median | Skewness | Kurtosis
ISBSG (R8)  | 151 | 6  | 24  | 60 270 | 5 039  | 2 449 | 4.17 | 21.10
COCOMO      | 252 | 13 | 6   | 11 400 | 683.4  | 98    | 4.39 | 20.50
TUKUTUKU    | 53  | 9  | 6   | 5 000  | 414.85 | 105   | 4.21 | 20.17
DESHARNAIS  | 77  | 8  | 546 | 23 940 | 4 834  | 3 542 | 2.04 | 5.30
ALBRECHT    | 24  | 7  | 0.5 | 105.20 | 21.88  | 11.45 | 2.30 | 4.67

4.2 Evaluation Methods and Criteria

We use three measures to evaluate and compare the accuracy of the effort prediction models. The first is the magnitude of relative error (MRE), the most widely used measure for assessing effort prediction models, computed as follows:

MRE = |Effort_actual − Effort_estimated| / Effort_actual    (2)

The MRE value is computed for each project, whereas the mean magnitude of relative error (MMRE) averages it over N projects:

MMRE = (1/N) Σ_{i=1}^{N} MRE_i    (3)

The commonly used threshold for MMRE is 0.25, which specifies that, on average, the estimation error of the established prediction models should be less than 25%.


The second measure is Pred(l), which denotes the percentage of projects whose MRE is less than or equal to l. This criterion is frequently employed in SDEE studies and expresses the proportion of projects estimated within a given level of accuracy. It is defined by the following equation:

Pred(l) = k / N    (4)

where N denotes the number of projects and k is the number of projects whose MRE is less than or equal to l. In the current study, we use l = 0.25 since it is the most commonly used value. The Pred(0.25) value indicates the effort prediction models that are, in general, accurate, while the MMRE value is actually conservative, with a bias against overestimates [17]. For this reason, the median of MRE, MdMRE, is also employed as a third measure in this study, because it is less sensitive to large individual estimates (Eq. 5):

MdMRE = median(MRE_i)    (5)
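A minimal sketch of the three criteria (Eqs. 2–5) is given below; the effort arrays are placeholders, not data from the study.

```python
# Accuracy criteria for effort prediction: MRE, MMRE, MdMRE and Pred(l).
import numpy as np

def mre(actual, estimated):
    return np.abs(actual - estimated) / actual

def mmre(actual, estimated):
    return mre(actual, estimated).mean()

def mdmre(actual, estimated):
    return np.median(mre(actual, estimated))

def pred(actual, estimated, l=0.25):
    # proportion of projects with MRE <= l
    return np.mean(mre(actual, estimated) <= l)

actual = np.array([100.0, 250.0, 80.0])      # placeholder efforts
estimated = np.array([110.0, 300.0, 60.0])
print(mmre(actual, estimated), mdmre(actual, estimated), pred(actual, estimated))
```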

4.3 Statistical Testing

Even though the accuracy measures can indicate, in a clear and graphical way, whether some effort prediction techniques are better than others, it is still necessary to check whether the observed differences are statistically significant. Hence, we employed the Mann–Whitney test at the 0.05 significance level to verify the significance of the differences between the absolute errors of the SDEE methods. This significance test was adopted because the distribution of the absolute errors is not normal.
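As a short illustration (placeholder error values, not the study's data), the test can be run with SciPy as follows:

```python
# Mann-Whitney U test on the absolute errors of two SDEE models, alpha = 0.05.
import numpy as np
from scipy.stats import mannwhitneyu

abs_err_ote = np.array([12.0, 30.0, 8.0, 45.0, 20.0])   # placeholder absolute errors
abs_err_rt  = np.array([25.0, 60.0, 18.0, 90.0, 41.0])

stat, p_value = mannwhitneyu(abs_err_ote, abs_err_rt, alternative="two-sided")
print("significant" if p_value < 0.05 else "not significant", p_value)
```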

4.4 Validation Method

A 30% holdout validation method was employed to assess the generalization ability of the estimation models. The datasets were split randomly into two non-overlapping sets: a training set containing 70% of the data and a testing set composed of the remaining 30%. The aim of holdout validation is to test a model on data different from those on which it was developed, which gives a less biased estimate of learning performance than an all-in validation method. Table 2 shows the sizes of the training and testing sets.

Table 2. Training and testing sets.

Datasets    | # of projects in training set | # of projects in testing set
ISBSG (R8)  | 106 | 45
COCOMO      | 176 | 76
TUKUTUKU    | 37  | 16
DESHARNAIS  | 77  | 24
ALBRECHT    | 24  | 8
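A minimal sketch of such a 70/30 split, assuming scikit-learn and placeholder project data:

```python
# 70% training / 30% testing holdout split.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X, y = rng.random((77, 8)), rng.random(77) * 1000    # placeholder dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
print(len(X_train), len(X_test))
```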


5 Results of the Experimental Studies

Once the six models were trained on the training sets, we compared the generalization capability of the proposed ensemble with that of the other five SDEE models on the testing sets. The evaluation was based on the MMRE, MdMRE and Pred(0.25) measures. The empirical results are shown in Figs. 1, 2, and 3. It can be seen from these three figures that the proposed ensemble of optimal trees performs better, in terms of MMRE, MdMRE and Pred(0.25), than the other five models on all datasets, except for RBFNN on the Albrecht dataset. As shown in Fig. 1, the proposed model always generates a smaller MMRE than the other SDEE techniques. For the optimal trees ensemble, the lowest MMRE was obtained on Albrecht, whereas the highest MMRE was obtained on the ISBSG R8 dataset. These MMRE values are not surprising, since the Albrecht dataset exhibits the lowest non-normality while ISBSG has the highest non-normality according to the kurtosis coefficient (see Table 1). From the chart in Fig. 2, we observe that all models produced much lower MdMRE than MMRE values, especially for the ISBSG, COCOMO and Tukutuku datasets. This is due to the fact that the MMRE measure is extremely sensitive to individual predictions with excessively large MREs [19], which is, in turn, a result of the presence of outliers in these datasets (kurtosis > 20, as shown in Table 1). Looking at Fig. 3, it is apparent that the ensemble of optimal trees yields, in general, the highest values of Pred(0.25). The proposed model presents a notable increase of 13.5 on average over the RT and RF based models. The best improvement was obtained on the Tukutuku dataset with +25%, followed by the Albrecht dataset with +14.25%.

Fig. 1. Comparison of MMRE values for the six SDEE models.


Fig. 2. Comparison of MdMRE values for the six SDEE models.

Fig. 3. Comparison of Pred(0.25) values for the six SDEE models.

In order to statistically verify the results obtained, we employed the Mann-Whitney test based on absolute errors, at the significance level of 0.05. The results of this statistical test are given in Table 3.

Table 3. Statistical significance (Mann-Whitney U Test) of the results over all datasets.

Dataset     | SVR-RBF | RF    | RBFN  | RT    | MLR
ISBSG R8    | 0.168   | 0.258 | 0.045 | 0.199 | 0.438
COCOMO      | 0.041   | 0.027 | 0.000 | 0.000 | 0.000
TUKUTUKU    | 0.096   | 0.058 | 0.217 | 0.023 | 0.003
DESHARNAIS  | 0.049   | 0.063 | 0.067 | 0.005 | 0.111
ALBRECHT    | 0.109   | 0.078 | 0.813 | 0.054 | 0.406


As can be seen from Table 3:
• For the COCOMO dataset: the optimal trees ensemble statistically outperformed the other five SDEE methods.
• For the ISBSG R8 dataset: the OTE statistically outperformed RBFNN. Nevertheless, the difference in OTE performance compared with RT, RF, SVR and MLR was not significant.
• For the Tukutuku dataset: the OTE statistically outperformed RT and MLR. However, the difference in OTE performance compared with RBFN, RF and SVR was not significant.
• For the Desharnais dataset: the OTE significantly outperformed SVR and RT. On the contrary, the difference in OTE performance compared with RBFN, RF and MLR was not significant.
• For the Albrecht dataset: the p-values indicate that the difference in OTE performance compared with the five SDEE methods is not significant (p-value > 0.05).

6 Conclusion and Future Work

In this paper, we have empirically investigated the use of a novel trees ensemble for software effort estimation. This ensemble is built by combining only the top ranked trees of the generated random forest whose inclusion decreases the unexplained variance of the ensemble. The proposed model was then compared to regression trees, random forest, RBF neural networks, MLR and SVR models using a 30% holdout validation method over five datasets, namely COCOMO, ISBSG R8, Tukutuku, Desharnais and Albrecht. The accuracy measures employed were MMRE, MdMRE and Pred(0.25). The results indicated that the ensemble of optimal trees outperforms almost all the other techniques. Moreover, the OTE model statistically outperformed each of the other techniques in at least one dataset. In the light of these empirical results, we can conclude that the ensemble of optimal trees is a promising technique for software development effort estimation. As future work, we plan to replicate this study using new datasets and a leave-one-out validation method.

References 1. Kusumoto, S., Matukawa, F., Inoue, K., Hanabusa, S., Maegawa, Y.: Estimating effort by use case points: method, tool and case study. In: Proceedings of the 10th International Symposium on Software Metrics, 2004, Chicago, Illinois, USA, pp. 292–299 (2004) 2. de A. Araújo, R., Oliveira, A.L.I., Meira, S.: A class of hybrid multilayer perceptrons for software development effort estimation problems. Expert. Syst. Appl. 90, 1–12 (2017) 3. Zakrani, A., Idri, A.: Applying radial basis function neural networks based on fuzzy clustering to estimate web applications effort. Int. Rev. Comput. Softw. 5(5), 516–524 (2010) 4. Zakrani, A., Namir, A., Hain, M.: Investigating the use of random forest in software cost estimation. Procedia Comput. Sci. 148, 343–352 (2019)


5. Idri, A., Abnane, I.: Fuzzy analogy based effort estimation: an empirical comparative study. In: 17th IEEE International Conference on Computer and Information Technology, CIT 2017. Institute of Electrical and Electronics Engineers Inc. (2017) 6. Oliveira, A.L.I.: Estimation of software project effort with support vector regression. Neurocomputing 69(13–15), 1749–1753 (2006) 7. Sehra, S.K., et al.: Research patterns and trends in software effort estimation. Inf. Softw. Technol. 91, 1–21 (2017) 8. Elish, M.O.: Improved estimation of software project effort using multiple additive regression trees. Expert Syst. Appl. 36(7), 10774–10778 (2009) 9. Idri, A., Hosni, M., Abran, A.: Improved estimation of software development effort using Classical and Fuzzy Analogy ensembles. Appl. Soft Comput. J. 49, 990–1019 (2016) 10. Zakrani, A., Moutachaouik, H., Namir, A.: An ensemble of optimal trees for software development effort estimation. In: The 3rd International Conference on Advanced Information Technology, Services and System, Mohammedia, Morocco (2018) 11. Khan, Z., et al.: An ensemble of optimal trees for class membership probability estimation. In: Wilhelm, A., Kestler, H. (eds.) Analysis of Large and Complex Data, pp. 395–409. Springer, Cham (2014) 12. Boehm, B.W.: Software Engineering Economics, p. 768. Prentice Hall PTR, Englewood Cliffs, NJ (1981) 13. Breiman, L., et al.: Classification and Regression Trees. Wadsworth, Belmont (1984) 14. ISBSG, International Software Benchmarking Standards Group, Data Release 8 Repository, Data Release 8 Repository (2003). http://www.isbsg.org 15. Amazal, F.A., Idri, A., Abran, A.: Software development effort estimation using classical and fuzzy analogy: a cross-validation comparative study. Int. J. Comput. Intell. Appl. 13(3), 1450013 (2014) 16. Albrecht, A.J., Gaffney Jr., J.E.: Software function, source lines of code, and development effort prediction: a software science validation. IEEE Trans. Softw. Eng. SE-9(6), 639–648 (1983) 17. Idri, A., Abnane, I., Abran, A.: Evaluating Pred(p) and standardized accuracy criteria in software development effort estimation. J. Softw.: Evol. Process 30(4), e1925 (2018) 18. Desharnais, J.-M.: Analyse statistique de la productivitie des projets informatique a partie de la technique des point des fonction. University of Montreal (1989) 19. Foss, T., et al.: A simulation study of the model evaluation criterion MMRE. IEEE Trans. Software Eng. 29(11), 985–995 (2003)

Automatic Classification and Analysis of Multiple-Criteria Decision Making

Ahmed Derbel and Younes Boujelbene

Faculty of Economics and Management of Sfax, Sfax University, 3018 Sfax, Tunisia
[email protected]

Abstract. As part of the automatic decision-making process, we propose to highlight the importance of business intelligence and its contribution to management and decision-making in companies. Automatic multi-criteria analysis sets up a complete computing chain that automates all the classic steps of multi-criteria decision-making. The automatic multi-criteria decision relies mainly on two learning techniques. Unsupervised classification is used to find two compact and well-separated groups in a dataset. Supervised classification is a learning method for automatically generating rules from a learning database. Both techniques are needed to produce classification procedures that are comprehensive and automatic for the user. In this context, we will focus on showing how business intelligence, particularly through data mining and integrated software packages, can be an important decision-support tool for companies.

Keywords: Business intelligence · Multi-criteria decision making · Data analysts · Data scientists

1 Introduction

Every day, we are faced with situations where decisions are taken that make a difference in several areas. Some decisions are easy to make, while others are complex and difficult to address. Decision making is a key to success in any discipline, so we need a mechanism that guarantees swift decision making. Multi-criteria decision analysis (MCDA), or multi-criteria decision making (MCDM), has emerged as a branch of operations research aimed at facilitating the resolution of such issues. Multi-criteria decision analysis is able to evaluate, rank, choose or reject a set of actions and can be exercised over several applications. MCDA is particularly based on the evaluation of a set of criteria using scores, values and intensities of preference. From the operational research point of view, there are two main schools of thought in multi-criteria decision support, known as the American and the European approach. The methods of the first family (American school) are based on complete aggregation. In general, complete aggregation considers the criteria as comparable and combines them into a mathematical form called a utility or aggregation function. The methods of complete aggregation are used to calculate a score for each attribute; the score calculation is obtained by the decision maker, whose goal is to assign confidence to actions, rank alternatives in decision


making and rank the importance of each alternative once this is complete. For example, if we take two actions, therefore we can compare them by specifying the relative importance of one action over the other. In the complete aggregation, the ranking is clear with an easy interpretation of the parameters. The main methods belonging to this approach are: MAUT (Multiple Attribute Utility Theorem), AHP (Analytic Hierarchy Process), GP (Multi-Criterion Mathematical Goal Programming), etc. The methods of the second family (Francophone school) correspond to a constructive approach, allowing developing binary relations (generally neither complete nor transitive) of overclassification based on the preferences of the decision-maker. Once all the actions are compared in this way, a synthesis of all the binary relations is developed in order to bring results in the decision-making processes. This approach contains several methods, the main ones being: ELECTRE, PROMETHEE, ORESTE, QUALIFLEX, etc. Like all other methods, the MCDA model has created limits at several levels. On the one hand, the analyze duration is often one of the most limiting factors to making sound decisions in a multi-criteria assessment. MCDA methods are often based on slow and iterative processes, which may require significant long-term evaluation plans. On the other hand, the complexity of mathematical aggregation increases the likelihood of coming to erroneous conclusions or to lead the analysis into confusion. In this research, we developed a new way that brings an intelligent and creative approach to making an analytic and automatic decision. We proposed a method to improve the performance of the MCDA model. It is a method which is very different, very practical and very conceptual in order to develop automatic multi-criteria decision analysis. We also tested our proposal to implement them in a practical way which is best suited to specific application, especially in the case of the classification of public transport operators in Tunisia urban.

2 Research Methodology The MCDA methods are used for relative comparisons between individuals. This comparison is done through a relational model of preference and aggregation (complete and partial). The aggregation is used to obtain a score on an individual, and that according to the profile of the decision makers, in order to be able make a ranking. To improve the classical models of decision-making, we introduced smart concepts to make decision making more and more sophisticated. The developed technique of automatic classification is used to extract relevant synthetic information and come up with a solution that actually improves the performance of MCDA methods. For this reason, we used two categories of statistical analysis methods: descriptive and predictive approaches. Descriptive methods have focused on the analysis of an important set of data, and predictive methods aim at obtaining information about a set of labeled data. In the fundamental aspect, we used two classification techniques, supervised classification and unsupervised classification. Unsupervised classification is used to find two compact and well-separated groups in a set of data, and therefore it is necessary to assign a class label to each observation. The supervised classification is a machine task consisting of learning a prediction function from annotated examples. The methodology used is based on several steps. The first step consists of extracting the scores through the MCDA method. This technique is managed by the unsupervised learning method, and more


specifically by (CAH) method. Supervised learning is used to automate decision-making when we have considered it interesting to add individuals or update existing data. Automatic classification in this case is established to classify and predict the label of a new data from the previous model knowledge learned, as indicated in Fig. 1.

Fig. 1. The automatic decision method is applied to classify the alternatives; we used two machine learning techniques (unsupervised and supervised classification) to ensure this objective

It was constructed around a practical case used to test and enrich our methodology. We used an application developed by [1], which refers to the ranking of public transport operators in Tunisia on the basis of MCDA method (see Fig. 2). The goal is not only to find the operators who are not performing, but also to facilitate decisionmaking. Indeed, the processing of information and the analysis of data appear to make a good decision by the public authority and more specifically to facilitate administrative tasks. When public transport decision makers in Tunisia have decided to integrate a new attribute or update the database. The managers in this case do not need to repeat the procedure of the MCDA method, and it will not be necessary to treat again the work. The role of classification is therefore to reproduce an automatic decision and to exclude any human intervention.

Fig. 2. The automatic decision consists of transforming performance scores obtained by the MCDA method into a classification


3 Unsupervised Classification: CAH Unsupervised classification is a mathematical method of data analysis that facilitates grouping into multiple distributions. Individuals grouped within the same class (intraclass homogeneity) are subjected to a similar process, while heterogeneous classes have a dissimilar profile (inter-class heterogeneity). Clusters can be considered as classes or groups of similar entities separated by other clusters with non-common features. In our case, we have classified the public transport operators in Tunisia, so that the operators must belong to one of the two classes generated by the classification. We have a set of operators that we denote by X={x1, x2, …, xN} characterized by a set of descriptors (D). The objective of unsupervised classification is to find the groups K= {C1, C2, …, Ck} and verify which elementary operators (x) belong to each cluster. This means for determining a function noted by (Y) that associates each element of (X) with one or more elements of (C). There are several unsupervised classification algorithms to give results on the problem of the data classification. Subsequently, we present two categories of unsupervised classifications, ascending hierarchical classification (CAH) and nonhierarchical classification (K-means). We chose the (CAH) method for the following reasons. The (CAH) method offers a clear and simple approach to facilitate structuring of information and gives a high visibility in the area of multi-criteria analysis. The hierarchical classification is based on three principles: The dendrogram: the (X) partitions made at each stage of the (CAH) algorithm can be visualized via a tree called a dendrogram. On one axis appears the individuals to be grouped and on the other axis are indicated the differences corresponding to the different levels of grouping, this is done graphically by means of branches and nodes. The dendrogram, or hierarchical tree, shows not only the links between classes, but also the height of the branches which indicates the level of proximity. Indeed, this technique is based on the measurement of a distance between clusters. And again, there is the choice, depending on the options selected and depending on the different methods of aggregation. The cut-off point of a dendrogram: it is the configuration of the dendrogram, a predefined number of clusters make it possible to trace a break at a certain level of aggregation. This method determines the number of classes retained for subsequent events. To select a partition of the population, simply cut the dendrogram is obtained at a certain height. An analysis of the shape of the dendrogram may give us an indication of the number of classes can be selected [2]. Estimation the number of clusters (k): from the CAH method, the number of classes is not necessarily known a priori. Different techniques exist, and one of the most common is based on information criteria such as BIC (Bayesian Information Criterion) or AIC (Aikake Information Criterion). Homogeneity (intra-class distance) and separation (inter-class distance) are the most common technique used for estimating the number of classes. The silhouette criterion is considered a relevant measure for assessing the quality of partitioning. We chose to use this criterion as a concrete measure aimed at ensuring better consideration of both the homogeneity and heterogeneity of classes. 
Let a(i) be the average of the dissimilarities (or distances) of observation (i) to all the other observations within the same class; the smaller a(i) is, the better the assignment of (i) to its class. Let b(i) be the lowest


average dissimilarity of observation (i) to the observations of each other class. The silhouette of the ith observation is then given by:

s(i) = (b(i) − a(i)) / max{a(i), b(i)}, i.e. s(i) = 1 − a(i)/b(i) if a(i) < b(i); s(i) = 0 if a(i) = b(i); s(i) = b(i)/a(i) − 1 if a(i) > b(i)    (1)

The silhouette of an element (i) lies between −1 and 1. If s(i) is close to 1, the data are correctly grouped, with strong inter-class variability and low intra-class variability; if s(i) is close to −1, by the same logic, the data are grouped with the neighboring class; if s(i) is close to 0, the data lie on the boundary of two classes. When running ascending hierarchical classification, the dendrogram is illustrated in the form of a tree: the level of similarity is measured along the vertical axis and the various public transport operators are listed along the horizontal axis. The graph of the CAH method illustrates the dispersion (intra- and inter-class) between the observations and makes it possible to verify that the classes are sufficiently individualized. This procedure is repeated until all observations are fully merged. The dendrogram (Fig. 3) grouped the public transport operators in Tunisia into two classes. The first class, in red, includes the operators of Sfax, Kairouan, Beja, Tunis, Jendouba, Nabeul, Kef and Kasserine. The second class, in blue, includes the operators of Medenine, Sahel, Bizerte, Gabes and Gafsa. The reliability of the classification was established by silhouette analysis. The analysis showed that the average value of the classification into two groups (K = 2, with s(i) = 0.6) is the best way to illustrate the dispersion of the data, which implies that the classification into two classes provides group-level reliability. Moreover, individuals with large silhouette indices are well grouped by a strong distribution structure, and the silhouette indices are all positive, as indicated in Fig. 4.
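A minimal sketch of this CAH step, assuming SciPy/scikit-learn and placeholder operator scores (the real MCDA scores are not reproduced here): Ward agglomerative clustering, a dendrogram cut into two classes, and the mean silhouette as a quality check.

```python
# Ascending hierarchical clustering (CAH) with a k = 2 cut and silhouette check.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
scores = np.vstack([rng.normal(0.3, 0.05, (8, 4)),    # placeholder: one group of operators
                    rng.normal(0.7, 0.05, (5, 4))])   # placeholder: the other group

Z = linkage(scores, method="ward")                 # build the hierarchy
labels = fcluster(Z, t=2, criterion="maxclust")    # cut the dendrogram into two classes

print("mean silhouette:", silhouette_score(scores, labels))
# dendrogram(Z)  # with matplotlib, draws a tree like the one in Fig. 3
```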

Fig. 3. The results of CAH classification showed that there are two types of clusters, in blue the operators have a high-performance and in red the operators are not performing

Fig. 4. The silhouette analysis indicates that the classification of public operators is well grouped by a positive distribution.


4 Supervised Classification

In the supervised context, we already have examples whose data are associated with class labels, denoted K = {Cyes; Cno}. Supervised classification is used to assign a new observation to one of the available classes. Among the supervised methods, we consider k-nearest neighbors, decision trees and naive Bayes classifiers; in the rest of this section, we present these three supervised classification methods with detailed examples.
Decision tree: The decision tree is a recent data mining method that has been widely studied and applied in the supervised classification domain for the purpose of predicting a qualitative decision using variables of all data types (qualitative and/or quantitative). The decision tree is based on a hierarchical representation that manages a sequence of tests to predict the outcome of the classification; the different possible classification decisions are located at the ends of the branches (the leaves of the tree). In some areas of application, it is important to produce user-understandable classification procedures. Decision trees respond to this constraint and graphically represent a set of well-designed and clearly interpretable rules. The operating principle is as follows: a decision tree is a graphical, tree-shaped representation used to develop the classification procedure. For each node, we choose the variable that best separates the individuals according to the categories of the other variables; here the score evaluation criterion is the maximum information gain. Let (S) be a sample, and {S1, …, Sk} the partition of (S) according to the classes of the target attribute. Entropy is defined as follows:

Ent(S) = − Σ_{i=1}^{k} (|S_i| / |S|) × log(|S_i| / |S|)    (2)

The information gain makes it possible to evaluate locally the attribute that brings the most information to the result to be predicted. This function is expressed as follows:

Gain_Ent(p, T) = Ent(S_p) − Σ_{j=1}^{2} P_j × Ent(S_pj)    (3)
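For illustration, the sketch below computes both quantities in plain Python on a tiny hypothetical sample (the attribute names and labels are invented, not the operator data of the study).

```python
# Entropy (Eq. 2) and information gain (Eq. 3) of splitting on one attribute.
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attribute_index):
    base = entropy(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute_index], []).append(label)
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in partitions.values())
    return base - remainder

rows = [("high", "old"), ("high", "new"), ("low", "old"), ("low", "new")]
labels = ["Yes", "Yes", "No", "Yes"]
print(information_gain(rows, labels, 0))
```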

These criteria will calculate values for every attribute. The values are sorted, and attributes are placed in the tree by following the order, i.e., the attribute with a high value (in case of information gain) is placed at the root. The process also stops automatically if the elements of a node have the same value for the target variable. Naive Bayesian Classifier: The naive Bayesian classification is a simple type of probabilistic classification based on Bayes’ theorem with strong independence of hypotheses (so-called naive). Depending on the nature of each probabilistic model, naive Bayesian classifiers can be effectively trained in a supervised learning context to classify a set of observations [3]. The algorithm is generally based on the first stage of

"apprenticeship" (training) on the data to perform the classification. During the learning phase, the algorithm elaborates classification rules on this data set that will then be used for testing and prediction. Given a set of variables X = {x1, x2, …, xd}, we want to calculate the posterior probability of the event (Yj) among a set of possible classes Y = {c1, c2, …, cd}. In more common terminology, (X) represents the predictors and (Y) represents the variable to predict (the attribute that has K modalities). The Bayes rule is defined as follows:

P(Y = c | X = x) ∝ P(X = x | Y = c) × P(Y = c) / P(X = x), i.e. posterior ∝ (likelihood × prior) / evidence    (4)

Thanks to the Bayes rule above, we assign a new observation (X) to the class (Yj) which has the highest posterior probability:

ŷ(x) = y_j, where y_j = argmax_j P(Y = c_j) × P(X(x) | Y = c_j)    (5)

K nearest neighbor method (k-NN or k-ppv): The k nearest neighbor method (kNN) is used to classify target points based on their distance from a learning sample. The k-ppv method is an automatic classification approach, and is a generalization of inductive classification methods. The general principle of the k-NN method is as follows, given a properly labeled learning base, the k-NN classifier determines the class of a new object by assigning it the majority class of (x) objects in the database. In this context, we have a learning database consisting of (N) input-output pairs. To estimate the output associated with a new input (x), the method of k nearest neighbors consists of taking into account the (k) training samples whose input are closest to the new input (x), according to a predetermined distance. Generally, the determination of similarity is based on the Euclidean distance. The algorithm illustrates a decision-making based on the search for one or more similar cases. Indeed, the algorithm looks for the k nearest neighbors of the new case and predicts the most frequent answer by classifying the target points according to their distance from points in the learning base [4].
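To make the three families concrete, the sketch below (illustrative only; the operator data are random placeholders) trains a naive Bayes classifier, a decision tree and a k-NN classifier on the same labelled scores and prints their test accuracy.

```python
# Comparing the three supervised methods on placeholder MCDA scores.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.random((117, 4))                                  # 117 observations, 4 criteria
y = np.where(X.mean(axis=1) > 0.5, "Yes", "No")           # "Yes" = performing operator
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, clf in [("Bayes naive", GaussianNB()),
                  ("Decision tree", DecisionTreeClassifier(criterion="entropy")),
                  ("k-NN", KNeighborsClassifier(n_neighbors=5))]:
    clf.fit(X_tr, y_tr)
    print(name, clf.score(X_te, y_te))
```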

5 Classification Performance Indicators Evaluation of classifiers is often an inevitable step in order to test classification performance. The results of a supervised classification must always be validated. Not only this step makes it possible to verify that the model presents a good capacity of generalization, but also makes it possible to compare the results of several techniques and to privilege the most adapted methods in the MCDA application. There are many statistical techniques to evaluate the performance of classifiers. We presented the confusion matrix because it is the most effective method applied in the field of data analysis [5]. In this regard, we constructed a prediction model based on three learning methods (decision tree, nearest neighbor and Bayesian approach method) to classify the performance of public transport operators. From three supervised classification methods, several ratios can measure the performance of classifier such as recall, precision,


error rate and F − 1. In our application, we calculated with the MCDA method the performance scores for each operator and for each year from 2007 to 2015, so that we obtained 117 observations. We applied the automatic multi-criteria decision analysis method illustrated in Fig. 1 and, using the confusion matrix, we are able to find the most suitable supervised learning technique to be applied directly in the MCDA procedure.
Confusion matrix: it allows visualization of the performance of an algorithm. Each row of the matrix represents the instances of a predicted class while each column represents the instances of an actual class (or vice versa). It is a special kind of contingency table, with two dimensions ("actual" and "predicted") and identical sets of "classes" in both dimensions (each combination of dimension and class is a variable in the contingency table). In this article, we use a confusion matrix to predict two classes: a "Yes" class indicates that the public transport operator performs well, and a "No" class indicates a non-performing operator, as shown in Table 1. To measure reliability, it is customary to distinguish four types of classified elements: true positives (TP), elements of the class "Yes" correctly predicted; true negatives (TN), elements of the class "No" correctly predicted; false positives (FP), elements of the class "Yes" wrongly predicted; and false negatives (FN), elements of the class "No" wrongly predicted.

Table 1. Confusion matrix

Actual class | Bayes naive (Pred. No / Yes) | Decision tree (Pred. No / Yes) | k-NN (Pred. No / Yes)
No           | 43 / 4                       | 44 / 3                         | 32 / 15
Yes          | 2 / 68                       | 3 / 67                         | 18 / 52

From the confusion matrix, we can derive four measures.
Precision: it expresses the proportion of predicted positive data points that are actually positive.

Precision = TP / (TP + FP)    (6)

Recall: it measures the ability of a model to find all the relevant cases within a dataset.

Recall = TP / (TP + FN)    (7)


The error rate: it estimates the probability of misclassification.

Error rate = (FN + FP) / Number of observations    (8)

F − 1: it is a measure of a test's accuracy and represents the harmonic mean of precision and recall; an F − 1 score reaches its best value at 1 (perfect precision and recall) and its worst at 0.

F − 1 = 2 × (Precision × Recall) / (Precision + Recall)    (9)
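A short sketch of Eqs. 6–9 is given below. The counts are taken from the "Bayes naive" cells of Table 1, assuming "Yes" is the positive class; since the paper does not state its convention, the output need not reproduce Table 2 exactly.

```python
# Precision, recall, error rate and F-1 from a 2x2 confusion matrix.
TP, TN, FP, FN = 68, 43, 4, 2   # example counts (assumed "Yes" = positive)

precision = TP / (TP + FP)
recall = TP / (TP + FN)
error_rate = (FN + FP) / (TP + TN + FP + FN)
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, error_rate, f1)
```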

The results demonstrated that the Bayes classification provided better predictive performance, comparable to other techniques whose effectiveness is recognized. We can also see that this method can be used to classify and predict the performance of public transport operators with an error rate of 5%. The naive Bayesian network also showed a high accuracy of 97% compared to other classification series. A classification system for perfect public transport operators will therefore provide 100% precision and recall. The classification of k-NN is far from performing, when compared to other classification methods, a low accuracy rate of 74% with a very limited recall rate of 77%, while the Bayesian classification algorithm is accurate with a very high score of precision, and more performance at the level of the recall information (the Bayesian classification found 95% of the possible answers compared to 94% of decision trees, as indicated in Table 2.

Table 2. The classification of performance indicators

Method        | Precision | Recall | Error rate | F − 1
Bayes naïve   | 0.971     | 0.957  | 0.029      | 0.958
Decision tree | 0.957     | 0.944  | 0.051      | 0.957
k-NN          | 0.743     | 0.776  | 0.282      | 0.759

As we have seen, the Bayes naive provides a useful assessment on several crucial problems, and can be applied successfully in the MCDA application. Despite the relatively simplistic independence assumptions, the naive Bayesian classifier has several properties that make it very practical in real cases. In particular, the dissociation of class conditional probabilities between the different characteristics results in the fact that each probability law can be independently estimated as a one-dimensional probability law. This avoids many problems from the scourge of the dimension, which gives an immediate advantage in terms of computability [6]. The naive Bayes classifier works like the MCDA model, both methods are based on the independence of variables, which gives it a high compatibility in the level of the automatic classification.


6 Conclusion The processing of information and the analysis of data appear to make a good decision by the public authority and more specifically to facilitate administrative tasks. When public transport decision makers in Tunisia have decided to integrate a new operator, the managers do not need to repeat the procedure of the MCDA method or to duplicate the work. The role of machine learning is therefore to reproduce an automatic classification decision on the database stored in the system and to exclude any human intervention. This is an interesting procedure that allows us to monitor the traceability of public transport operators and identify failures via a permanent crisis management mechanism. This strategy makes it possible to distinguish between operators (performers and non-performers) and to find the operators who are very gathered, it is also an intuitive way to compare performance between different operators. The use of machine learning and predictive analysis involves calculating future trends and opportunities for making recommendations. In this context, the Bayes classifier proves in practice to be well adapted in the context of the automatic decision making, and to the advantage of being extremely efficient in terms of decision-making [7]. Despite a simplistic conception, this model show, in many real situations, a predictability that is surprisingly superior to other competing models, such as decision trees k-NN. The Bayesian classification is also very easy to program, its implementation is of even greater significance, the estimation of its parameters and the construction of the model are very fast on databases of small or medium size, either in number of variables or in the number of observations. The limit of our proposal that the size of the sampling is very limited, 117 observations are not able to release an effective and automated model [8]. The next step is to prune the database, and we able to integrate other methods such as: Boosting, Random Forests, Artificial Neural Network and Support Vector Machine for operating an automatic usage decision.

References 1. Boujelbene, Y., Derbel, A.: The performance analysis of public transport operators in Tunisia using AHP method. Procedia Comput. Sci. 73, 498–508 (2015) 2. Loussaief, S., Abdelkrim, A.: Machine learning framework for image classification. In: 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), pp. 58–61 (2016) 3. Jiang, L., Wang, S., Li, C., Zhang, L.: Structure extended multinomial naive Bayes. Inf. Sci. 329, 346–356 (2016) 4. Peng, X., Cai, Y., Li, Q., Wang, K.: Control rod position reconstruction based on K-Nearest Neighbor Method. Ann. Nucl. Energy 102, 231–235 (2017) 5. Silva-Palacios, D., Ferri, C., Ramírez-Quintana, M.J.: Improving performance of multiclass classification by inducing class hierarchies. Procedia Comput. Sci. 108, 1692–1701 (2017) 6. Tsangaratos, P., Ilia, I.: Comparison of a logistic regression and Naïve Bayes classifier in landslide susceptibility assessments: the influence of models complexity and training dataset size. CATENA 145, 164–179 (2016)


7. Derbel, A., Boujelbene, Y.: Road congestion analysis in the agglomeration of Sfax using a Bayesian model. In: Lecture Notes in Computer Science book series LNCS 11277, pp. 131– 142 (2018) 8. Derbel, A., Boujelbene, Y.: Bayesian network for traffic management application: estimated the travel time. In: 2nd World Symposium on Web Applications and Networking (WSWAN) (2015)

An Incremental Extraction and Visualization of Ontology Instance Summaries with Memo Graph

Fatma Ghorbel, Elisabeth Métais, Fayçal Hamdi, and Nebrasse Ellouze

CEDRIC Laboratory, Conservatoire National des Arts et Métiers (CNAM), Paris, France
[email protected], {metais,faycal.hamdi}@cnam.fr
MIRACL Laboratory, University of Sfax, Sfax, Tunisia
[email protected]

Abstract. In the context of a prosthesis for Alzheimer's patients, we want to show the family and entourage tree of the patient from their saved data, structured based on the PersonLink ontology. The generated graph ought to be accessible and readable to this particular user. In our previous work, we proposed our ontology visualization tool called Memo Graph. It aims to offer an accessible visualization to Alzheimer's patients. In this paper, it is extended to address the readability requirement. The second version is based on our approach called IKIEV. It extracts instance summarizations from a given ontology and generates a set of "summary instance graphs" from the most crucial data (middle-out browsing method). The extraction and visualization processes are undertaken incrementally. First, an "initial summary instance graph" is generated, then permitting iteratively the visualization of supplementary key-instances as required. This tool is integrated in the memory prosthesis to visualize data structured using PersonLink. We discuss the reassuring results of the usability evaluation of our IKIEV approach.

Keywords: Ontology visualization · Readable visualization · Ontology summarization · Key-instances · Alzheimer's patient

1 Introduction

In the VIVA project, we are developing a memory prosthesis to help Alzheimer's patients to palliate problems related to memory loss. It is called Captain Memo. Personal data of the patient are structured semantically using PersonLink [1], which is a multicultural and multilingual OWL 2 ontology for storing, modelling and reasoning about interpersonal relationships. Among the services offered by Captain Memo, one aims to "remember things about people" via the generation of the family and entourage tree. This graph is generated from the patient's stored data. Hence, there is a need to integrate an ontology visualization tool in Captain Memo.

http://viva.cnam.fr/ (« Vivre à Paris avec Alzheimer en 2030 grâce aux nouvelles technologies »).



Alzheimer’s patient presents own characteristics that are different from non-expert users. Some of these characteristics are related to Alzheimer’s (e.g., attention and concentration deficit) and other characteristics are linked to the aging process (e.g., sight loss). These characteristics impair this particular user to interact with graphs offered by standard ontology visualization tools targeting non-expert users e.g., Alzheimer’s patient has difficulty to read small nodes and loses concentration when reading dense and crowded graphs. Hence, there is a need to integrate in Captain Memo an ontology visualization tool that generates an instance graph which has the particularity to be accessible and readable to Alzheimer’s patient. Several ontology visualization tools have been proposed in the last two decades. However, to the best of our knowledge, there is no existing tool that is proposed to be used by Alzheimer’s patient. In [2], we proposed a tool, called Memo Graph, which aims to offer accessible ontology visualizations to Alzheimer patient. It is proposed mainly to be integrated in Captain Memo. In the present paper, we propose an extension of our early work that addresses the readability requirement. The aim is to alleviate the generated graphs. The new version of Memo Graph is based on our IKIEV approach (acronym for Incremental KeyInstances Extraction and Visualization). It extracts and visualizes, in an incremental way, instance summarizations of a given ontology to offer concise and readable overviews and support a middle-out navigation method, starting from the most important instances. The remainder of the present paper is structured as follows. In Sect. 2, we focus on related work, focusing on ontology visualization and summarization. Section 3 presents our first version of Memo Graph. Section 4 presents our extension of Memo Graph that is based on our IKIEV approach. Section 5 details the evaluation results. In Sect. 6, we present the conclusions and we propose some future research directions.

2 Related Work The present work is closely related to the two following research areas: (1) ontology visualization and (2) ontology summarization. 2.1

Ontology Visualization

Only very few ontology visualization tools target non-expert users e.g., OWLeasyViz [3], WebVOWL [4] and ProtégéVOWL [4]. However, they are not designed to be used by Alzheimer’s patient. The current tools overlook the importance of the issues related to the Human–Computer Interaction filed. Most tools do not take into account the understandability requirement. OWLGrEd [5], ezOWL [6] and VOM2 offer UML-based visualizations. A major drawback of these tools is that they require knowledge about UML. Thus, they are understandable

2 http://thematix.com/tools/vom.


only for expert users. Besides, almost all tools use technical jargon. For instance, WebVOWL and ProtégéVOWL, targeting users less familiar with ontologies, use some Semantic Web words. SOVA3, GrOWL [7], WebVOWL and ProtégéVOWL aim to offer understandable visualizations by defining notations using different symbols, colors, and node shapes for each ontology key-element. However, the notations proposed by SOVA and GrOWL contain many abbreviations and symbols from the Description Logic. As a consequence, the generated visualizations are not suitable for Alzheimer’s patients. WebVOWL and ProtégéVOWL visualize only the schema of the ontology. Most tools overlook the importance of the readability requirement. According to [4], the current generated visualizations are hard to read for casual users. This problem becomes worse with Alzheimer’s patient. For instance, SOVA, GrOWL, IsaViz4 and RDF Gravity5 require the loading of the entire resulting graph in the limited space provided by the computer screen which generates an important number of nodes and a large number of crossing edges. Without applying any filter technique, the generated graphs appear crowded, which have a bad impact on its readability. According to [4], all RDF visualizations are hard to read due to their large size. KC-Viz [8] aims to offer a readable visualization of the TBox of the ontology to expert users. It is based on an approach that summarizes the schema of the ontology. Only few tools aim for a comprehensive ontology visualization. For instance, OWLViz6, OntoTrack [9] and KC-Viz visualize merely the class hierarchy of the ontology. OntoViz Tab [10], TGViz [11] and OntoRama [12] show only inheritance relationships between the graph nodes. Besides, most tools do not offer a clear visual distinction between the different ontology key-elements. This issue has a bad impact on the understandability of generated visualizations. For example, there is no visual distinction between datatype and object properties visualized by RDF Gravity. TGViz and NavigOWL [13] use a plain node-link diagram where all links and nodes appear the same except for their color. Most ontology visualization tools are implemented as Protégé plug-in. Thus, they cannot be integrated in Captain Memo. 2.2

Ontology Summarization

Ontology summarization is the process of extracting important knowledge from a given ontology to produce a reduced version [14]. It helps users to easily make sense of a given ontology. However, a “well” summary is not a trivial task [15]. Numerous approaches have been proposed to identify relevant concepts in ontologies [16]. In [17], the authors introduce an ontology summarization approach. It is based on a number of criteria, drawn from the lexical statistics (i.e., popularity), the network topology (i.e., density and coverage) and the cognitive science (i.e., natural

3 http://protegewiki.stanford.edu/wiki/SOVA.
4 https://www.w3.org/2001/11/IsaViz/overview.html.
5 http://semweb.salzburgresearch.at/apps/rdf-gravity/.
6 http://protegewiki.stanford.edu/wiki/OWLViz.


categories). The natural categories criteria include: the name simplicity measure which favors entities that are labeled with simple names and penalizes compounds; and the basic level measure which measures how an entity is “central” in the taxonomy of the given ontology. Density is computed based on the number of instances, sub-classes and properties. Coverage is based on the dissemination of important concepts in the ontology. Popularity is based on the number of hits returned in the search by the concept name on Yahoo. This approach ignores the graph’s complexity and focuses only on hierarchical relations [15]. In [18], the authors propose an approach to summarize ontologies. Two measures are used. The centrality measure is calculated using the number of relationships between the concepts. The frequency is used as a distinguishing criterion if the ontologies to be summarized are merged ontologies. In [16], the authors introduce an algorithm to produce a personalized ontology summary. It is based on a set of relevance measures and it allows taking into consideration of the users’ opinion. Troullinou et al. [15] introduce an approach to summarize RDF/S knowledge bases. They exploit the semantics of the knowledge base and the associated graph’s structure. Different measures are used. All the mentioned approaches focus only on summarizing the schema. They do not define measures to identify relevant instance in ontology.

3 Background: Our Ontology Visualization Tool Memo Graph Memo Graph is a tool that visualizes ontology. It aims to offer an accessible graph to Alzheimer’s patient. The graph design is based on our 146 guidelines for designing user interfaces dedicated to people with Alzheimer, presented in [19]. The generated graph has the adequate size of nodes and text. An auditory background is added to help users in their interactions. For instance, if they position the cursor on the keyword search field, they are informed that they can search a given element in the graph via an input field. We provide the traditional and speech-to-text modalities. We use a facile-to-understand wording. For instance, we do not use Semantic Web vocabulary. Graph nodes are identified using both photos and labels. The photo facilitates the comprehension. It can be automatically added from Google if it is not given by the user. Nodes representing classes are slightly larger than nodes representing instances. Our visualisation tool offers the interaction techniques detailed by Shneiderman [20]: zoom, overview, details-on-demand, filter, history, relate and extract. Two other interactions are also supported: keyword search and animation. We evaluated the accessibility of the generated graph with 22 Alzheimer’s patients. The results are promising. However, we noticed that they lose concentration when reading dense entourage/family graph. Attempting to address this issue, we extend Memo Graph to be based on our IKIEV approach.


4 Extending Memo Graph: Our IKIEV Approach In this paper, we extend Memo Graph. Its second version is based on our IKIEV approach. It tends to avoid problems related to dense and non-legible instance graph by limiting the number of visible nodes and preserving the most important ones. It allows an incremental extraction and visualization of instance summaries of the ontology – incremental being the operative word. Initially, it generates an “initial summary instance graph” of N0 key-instances with the associated properties, then allowing iteratively the visualization of supplementary key-instances as required (key-instances are visualized as nodes and properties are visualized as labeled edges). For each iteration i, it extracts and visualizes Ni = Ni-1 + Ai key-instances; where Ai represents the number of additional key-instances compared to the previous iteration. N0 and Ai are set by the user. Figure 1 summarizes our IKIEV approach.

Ontology

N0 key-instances

Summary extractor

Summary extractor

Summary of ABox

Summary of ABox

Graph Generator

Graph Generator

Iteration 0

NI-2 + AI-1 key-instances

N0 + A1 key-instances

Summary extractor



NI-1 + AI key-instances Summary extractor

Summary of ABox

Summary of ABox

Graph Generator

Graph Generator

Iteration 1 Iteration (I – 1)

Iteration I

Fig. 1. Our IKIEV approach.

4.1

Measures Determining Key-Instances

We propose three measures to elicit key-instances in ontology: Class Centrality, Property Centrality and Hits Centrality. The measures are categorized in global measures and local measures. Global measures reflect the importance of the instance with respect to all instances. Local measures consider its importance with respect to its surroundings. Each measure is obtained by combining the global measure and the local measure associated with a specific weight. Our aim is to insure that the instances

An Incremental Extraction and Visualization

99

locally significant have a higher score, even if they do are not significant based on the global measures. Let O = (C, P, N) be an ontology. C is a finite set of classes ci; P is a finite set of datatype and object properties pi (ci) with ci 2 C and N is a finite set of instances ni. Class Centrality. We define the Class Centrality measure to estimate the importance of the class ck 2 C. It is an adaptation of the density and relative cardinality class summarization measures defined respectively in [8] and [15]. The importance of a class depends on the number of the associated properties, the instances it has and the direct sub-classes it has and its super classes. Class Centralityðck Þ ¼ Global Class Centrality ðck Þ  WGCC þ Local Class Centrality ðck Þ  WLCC Global Class Centrality ðck Þ ¼ fCC ðck Þ = Max f8ci 2 C ! fCC ðci Þg

ð1Þ ð1:2Þ

Local Class Centrality ðck Þ ¼ fCC ðck Þ = Max f8ci 2 Nearest ðck Þ ! fCC ðci Þg ð1:3Þ fCC (ck) is a weighted aggregation on the number of super classes, sub-classes, properties and individuals of ck. The function Nearest (ck) returns the class ck and the associated sub-classes and super-classes. In our experiment, we use WGCC = 0.5 and WLCC = 0.5. Property Centrality. We define the Property Centrality measure to estimate the importance of a given property pk (ck) 2 P. Property Centralityðpk Þ ¼ Global Property Centrality ðpk Þ  WGPC þ Local Property Centrality ðpk Þ  WLPC Global Property Centrality ðpk Þ ¼ NIP ðpk Þ = Max f8pi 2 P ! NIP ðpi Þg

ð2Þ ð2:1Þ

Local Property Centrality ðpk Þ ¼ NIP ðpk Þ =Max f8pi 2 Properties ðck Þ ! NIP ðpi Þg ð2:2Þ NIP (pk) returns the number of instantiations of pk (ck). Properties (ck) returns the datatype properties of ck and its outgoing object properties. In our experiment, we use WGCC = 0.5 and WLCC = 0.5. Hits Centrality. The Hits Centrality measure identifies the instances that are commonly visited by the users. It is an adaptation of the popularity summarization measure defined in [8]. Hits Centrality ðnk Þ ¼ Global Hits Centralityðnk Þ  WGHC þ Local Hits Centralityðnk Þ  WLHC

ð3Þ

100

F. Ghorbel et al.

Global Hits Centrality ðnk Þ ¼ GH ðnk Þ  WG þ MGH ðnk Þ  WM = Max f8ni 2 N ! GH ðni Þ  WG þ MGH ðni Þ:  :WM g ð3:4Þ Local Hits Centrality ðnk Þ ¼ GH ðnk Þ  WG þ MGH ðnk Þ  WM = Max f8ni 2 Instances ðck Þ ! GH ðni Þ  WG þ MGH ðni Þ  WM g ð3:2Þ Instances (ck) returns the associated instances of ck. GH (nk) returns the number of hits returned when querying Google with the name of nk as a keyword. MGH (nk) returns the number of hits of nk recorded in previous sessions by the user in Memo Graph (we use the click-tracking tool). In our experiment, we use WGHC = 0.5, WLHC = 0.5, WG = 0.2 and WM = 0.8. Importance Score. All mentioned measures are used in an overall value, called Importance Score to estimate the importance of an instance nk. Importance Score ðnk Þ ¼ Class Centrality  X ð ck Þ  Property Centrality ðpk Þ = M  WC þ  WP þ Hits Centrality ðnk Þ  WH

ð4Þ

M is the number of incoming/outgoing object properties and datatype properties. In our experiment, we use WC = 0.3, WP = 0.3 and WH = 0.4. We rank instances based on their Importance Score. A higher score for an instance means that it is more adequate for the summary. 4.2

General Algorithm

The general algorithm of our IKIEV approach is given below.

Algorithm. 2. The general algorithm of our IKIEV approach.

An Incremental Extraction and Visualization

101

5 Experimentation The algorithm described in the previous section is implemented using the J2EE platform as a semantic web application. We use the JENA API7. 5.1

Application to Captain Memo

We integrated the proposed tool in Captain Memo to show the family and entourage tree of given patient from data saved using PersonLink. Figure 2 shows a summary instance graph generated using Memo Graph. It shows 10 key-instances.

Fig. 2. A family/entourage tree created based on Memo Graph.

5.2

Evaluation

In [16–18], the evaluation is done by comparing the ontology summaries produced by the proposed approach against the summaries generated by human experts. In this direction, we evaluate the usability of our approach IKIEV in determining the keyinstances. This evaluation is done in the context of Captain Memo. A total of 12 Alzheimer’s patients {P1… P12} and their associated caregivers {C1… C12} were recruited. All caregivers are first-degree relatives. Let us consider {KB1 … KB12}, where KBi represents knowledge base associated to Pi and structured using PersonLink. In our experiment, the number of the key-instances is set as 10. Three scenarios are proposed: – “Golden standard scenario”: Each caregiver Ci is requested to identify the 10 closest relatives of the patient Pi. The last ones formed the “gold standard” GSi. 7

https://jena.apache.org/.

102

F. Ghorbel et al.

– “IKIEV scenario @ 2 weeks”: For each KBi, we associate a summary Si@2 based on our IKIEV approach. The summaries are generated after 2 weeks of using and interacting with the resulting graph. – “IKIEV scenario @ 10 weeks”: For each KBi, we associate a summary Si@10 based on our IKIEV approach. The summaries are generated after 10 weeks of using and interacting with the resulting graph. We compare the generated summaries against the golden standard ones. We use the Precision evaluation metrics. PRi@2 (|Si@2 \ GSi| /|Si@2|) and PRi@10 represent respectively, the Precision associated to “IKIEV scenario @ 2 weeks” and “IKIEV scenario @ 10 weeks”. Figure 3 shows the results.

100 90 80 70 60 50 40 30 20 10 0

IKIEV scenario @ 2 weeks IKIEV scenario @ 10 weeks

1

2

3

4

5

6

7

8

9

10

11

12

Fig. 3. Evaluation’s results.

All entities of KBi are instances of the same class (Person). Thus, the Class Centrality measure has no influence on determining key-instances. The overall mean of the precision associated to “IKIEV scenario @ 10 weeks” is better than the overall mean of the precision associated to “IKIEV scenario @ 2 weeks”. This difference is explained by the fact that the Hits Centrality measure is improved from one navigation session to another.

6 Conclusion This paper introduced an extension of Memo Graph to offer readable instance visualizations. It is based on our IKIEV approach. It allows an incremental extraction and visualization of instance summaries of the ontology. To determinate the relevance of a given instance, we are based on the relevance of its associated class and properties as well as the history of its user hits. The proposed tool is integrated in the prototype of Captain Memo to generate the family/entourage tree of the Alzheimer’s patient from their personal data structured using the PersonLink ontology. We evaluated the usability of our IKIEV approach in determining key-instances. The results are promising.

An Incremental Extraction and Visualization

103

As consequential effects, Memo Graph can be used by expert users to offer readable visualizations not only of small-scale inputs, but also for the large-scale ones thanks to our IKIEV approach. Future work will be devoted to redefine the Hits Centrality measure to take into account the history of the user hits when using social networks. Moreover, we plan to extend our IKIEV approach to allow an incremental extraction and visualization of summaries of the ontology’s schema.

References 1. Herradi, N., Hamdi, F., Métais, E., Ghorbel, F., Soukane, A.: PersonLink: an ontology representing family relationships for the CAPTAIN MEMO memory prosthesis. In: ER 2015 (Workshop AHA), pp. 3–13. Series Lecture Notes in Computer Science (2015) 2. Ghorbel, F., Ellouze, N., Métais, E., Hamdi, F., Gargouri, F., Herradi, N.: MEMO GRAPH: an ontology visualization tool for everyone. In: KES 2016, pp. 265–274. Procedia Computer Science, York, United Kingdom (2016) 3. Catenazzi, N., Sommaruga, L., Mazza, R.: User-friendly ontology editing and visualization tools: the OWLeasyViz approach. In: International Conference Information Visualisation (2013) 4. Lohmann, S., Negru, S., Haag, F., Ertl, T.: Visualizing ontologies with VOWL. In: Semantic Web, pp. 399–419 (2016) 5. Bārzdiņš, J., Bārzdiņš, G., Čerāns, K., Liepiņš, R., Sproģis, A.: OWLGrEd: a UML style graphical notation and editor for OWL 2. In: OWL: Experiences and Directions Workshop (2010) 6. Chung, M., Oh, S., Kim, K., Cho, H., Cho, H.K.: Visualizing and authoring OWL in ezOWL. In: International Conference on Advanced Communication Technology (2005) 7. Krivov, S., Williams, R., Villa, F.: GrOWL: a tool for visualization and editing of OWL ontologies. In: Web Semantics: Science, Services and Agents on the World Wide Web (2007) 8. Motta, E., Mulholland, P., Peroni, S., Aquin, M., Gomez-Perez, J.M., Mendez, V., Zablith, F.: A novel approach to visualizing and navigating ontologies. In: International Semantic Web Conference, pp. 470–486. Springer, Berlin, Heidelberg (2011) 9. Liebig, T., Noppens, O.: OntoTrack: a semantic approach for ontology authoring. In: International Semantic Web Conference (2004) 10. Singh, G., Prabhakar, T.V., Chatterjee, J., Patil, V.C., Ninomiya, S.: OntoViz: visualizing ontologies and thesauri using layout algorithms. In: Fifth International Conference of the Asian Federation for Information Technology in Agriculture (2006) 11. Harith, A.: TGVizTab: an ontology visualisation extension for Protégé. In: Knowledge Capture (2003) 12. Eklund, P., Nataliya, R., Green, S.: OntoRama: browsing RDF Ontologies using hyperbolicStyle browser. In: First International Symposium on Cyber Worlds (2002) 13. Hussain, A., Latif, K., Tariq Rextin, A., Hayat, A., Alam, M.: Scalable visualization of semantic nets using power-law graph. In: Applied Mathematics and Information Sciences (2014) 14. Zhang, X., Cheng, G., Qu, Y.: Ontology summarization based on RDF sentence graph, pp. 707–716 (2007) 15. Troullinou, G., Kondylakis, H., Daskalaki, E., Plexousakis, D.: Ontology understanding without tears: the summarization approach. Semant. Web J. 8, 797–815 (2017)

104

F. Ghorbel et al.

16. Queiroz-Sousa, P.O., Salgado, A.C., Pires, C.E.: A method for building personalized ontology summaries. J. In-Form. Data Manag. 4, 236 (2013) 17. Peroni, S., Motta, E., Aquin, M.: Identifying key concepts in an ontology, through the integration of cognitive principles with statistical and topological measures. In: Semantic Web Journal, pp. 242–256. Berlin, Germany (2008) 18. Pires, C.E., Sousa, P., Kedad, Z., Salgado, A.C.: Summarizing ontology-based schemas in PDMS. In: Data Engineering Workshops (ICDEW), pp. 239–244 (2010) 19. Ghorbel, F., Métais, E., Ellouze, N., Hamdi, F., Gargouri, F.: Towards accessibility guidelines of interaction and user interface design for Alzheimer’s disease patients. In: Tenth International Conference on Advances in Computer-Human Interactions (2017) 20. Shneiderman, B.: The eyes have it: a task by data type taxonomy for information visualizations. In: Proceedings of the 1996 IEEE Symposium on Visual Languages (1996)

Choosing the Right Storage Solution for the Corpus Management System (Analytical Overview and Experiments) Damir Mukhamedshin(&), Dzhavdet Suleymanov, and Olga Nevzorova The Tatarstan Academy of Sciences, Kazan, Russia [email protected], [email protected], [email protected]

Abstract. Corpus management systems are widely used to solve the problems of human-computer interaction. There are many developments associated with the management of language corpora, for example, Sketch Engine [1], Manatee [2], EXMARaLDA [3], etc. We developed the system which considers certain specific features of Turkic languages on the one hand and has new search functions and components from the other hand. The corpus management system “Tugan Tel” (http://tugantel.tatar) is specifically designed to work with the National Corpus of Tatar and can be used to work with both the linguistic corpora of Turkic languages and the corpora of other languages. The corpus management system developed by the authors allows searching of lexical units, morphological and lexical searching, searching of syntactic units, searching of the n-gram, named entity extraction and others. The semantic model of the Tatar language data representation is the core of the system. Storage and processing of corpus data, searching in corpus data are performed using open source tools (MariaDB DBMS, Redis data storage). There are three basic stages of corpus management search engine development: the data model development, the system architecture development, and the database architecture development. The issues of collecting and processing of corpus data should also be considered. The main task of our research is the identification and description of solutions for the corpus data storage, collection, and processing. The developed data model can be used for supervised and unsupervised document classification, as well as in corpus exploring. The proposed solutions have been implemented in the corpus management system which is currently used for data representation and processing for the National Corpus of Tatar “Tugan Tel”. Keywords: Corpus manager Data processing

 Corpus data  Data storage  Search engine 

© Springer Nature Switzerland AG 2020 M. S. Bouhlel and S. Rovetta (Eds.): SETIT 2018, SIST 146, pp. 105–114, 2020. https://doi.org/10.1007/978-3-030-21005-2_10

106

D. Mukhamedshin et al.

1 Introduction Developing corpus management search engine architecture and database implies analyzing all the necessary functionality and testing various storage systems. Correct, fast enough and optimal in terms of resource consumption search engine performance depends entirely on a proper system and database architecture. One of the main goals of the development of the corpus management system “Tugan Tel” is to expand search capabilities for the Tatar National Corpus database. The corpus manager of the Tatar language is actively used in humanitarian and educational applications, as well as in applications belonging to computational linguistics sphere, which are used for researches of the Tatar language. Ready-made solutions are often used for developing national corpuses, which has its pros and cons. In particular, such solutions are usually proprietary.

2 Storage Systems for the Corpus Management System 2.1

Storage Systems Choice Criteria

First of all, it is necessary to determine a number of criteria that are important for choosing a storage system for a corpus management system. As the most important criteria authors identify the following: • • • • • •

Performance Functionality Cost Compatibility with other software Completeness of documentation Development prospects

The authors analyzed the information in public sources in order to evaluate the existing software solutions that can be used in corpus management systems according to the above mentioned criteria. 2.2

NoSQL Solutions

The fastest data storage systems are key-value storage systems, which are most often used to cache data in RAM and quickly access to them. In most cases, such systems use hash tables, less often trees and other methods of storing and accessing data. These storage systems often support distributed work on several machines, which can positively affect the speed even with large amounts of cached data. Memcached, memcacheDB. Memcached [4] is a storage system of keys and values in memory for small pieces of arbitrary data. Its simple design facilitates rapid deployment and ease of development, and solves many problems that can be encountered with large amount of cached data. Memcached API is available for most popular programming languages.

Choosing the Right Storage Solution

107

MemcacheDB [5] is a distributed key-value storage system and is designed for persistent data. It uses the memcache protocol, so any Memcached client can connect to it. MemcacheDB uses Berkley DB as a storage, so it supports most functions, including transactions and replication. Developers present the results of experiments [6], according to which the average performance in one thread is 18868 writes per second and 44444 reads per second. The experiments were performed on the Dell 2950III server. Memcached and MemcacheDB are free software; they are poorly documented and receive almost no further improvement. Redis. Redis [7] is a high-performance, non-relational distributed data storage system. Unlike Memcached, Redis stores data constantly, so it looks like MemcacheDB. Redis allows storing both strings and arrays, to which one can apply atomic pop/push operations, make selections from arrays, perform sorting of elements, obtain joins and intersections of arrays. The performance of Redis on an entry-level Linux server is 110000 SET requests per second and 81000 GET requests per second [8]. On the Intel Core i5, 4 GB RAM, 7200 RPM SATA, the Redis showed the following results: 120000 SET requests per second, 270000 GET requests per second. Redis is a free software that has full official documentation in English. This project is being actively developed. FoundationDB. FoundationDB [9] is a NoSQL database with a shared nothing architecture. The main database provides an ordered key-value store with transactions. Transactions can read or write multiple keys stored on any machine in the cluster that fully supports the ACID properties. The performance of FoundationDB in typical load mode by random operations in the database over time (90% of reads and 10% of write operations) is an average of 890000 operations per second [10]. The peak initial performance is about 1450000 operations per second. Experiments were performed on an Ethernet cluster of 24 machines with the configuration E3-1240v1 CPU, 16 GB RAM, 2x200 GB SATA SSD. The data that was used for experiments was in the form of 2 billion keyvalue pairs with 16-bit keys and random 8-100-bit values. It was stored with triple replication. The Community license allows FoundationDB to be used free of charge with the restriction of 6 running processes on the production cluster. 2.3

Solutions Based on Search Systems

Sphinx. Sphinx [11] is a search system consisting of an indexer and a full-text search module that contains the lemmatization function for many languages, is used in projects such as Avito and supports distributed storage of data. The functionality of Sphinx allows the use of additional fields. Sphinx supports the API and can work with the help of an additional module for the MySQL DBMS. According to the results of performance testing that was published in public sources, the reading speed for full-text search is 1800 requests per seconds. Sphinx is a free software; it has full official documentation in English and is being actively developed.

108

D. Mukhamedshin et al.

ElasticSearch. ElasticSearch [12] is a search engine that supports full-text search for structured data and has an advanced syntax that allows to perform complex search queries, works with the JSON format and supports distributed data storage. ElasticSearch supports work on the HTTP protocol, which excludes any problems of compatibility with other software. ElasticSearch is a free software, has full official documentation in English and is being actively developed. 2.4

Relational Databases

MariaDB (MySQL). MariaDB [13] is a relational DBMS, a branch from MySQL. It has a number of advantages over MySQL, in particular an improved XtraDB database driver that shows the performance gain at high loads. The performance of MariaDB, claimed in public sources, is 3000 transactions containing read and write operations, per second. MariaDB supports slave-master mode, which allows distributing the load to several servers. MariaDB is free, has full documentation in the English and other languages. The project is in active development with a big community of developers around MariaDB and MySQL, which contributes to the accumulation of a large number of use-cases in a particular task.

3 Experiments To assess the performance of storage systems for the corpus management systems, we performed experiments on an entry-level machine. The technical characteristics of this computer are 4 CPU cores (2.7 GHz each), 4 GB DDR3 RAM, 20 GB HDD (5400RPM), 1 GB SSD for swap. This computer was running under the completely free Debian 7.5 operating system. Data was randomly generated for each data storage system. We created an inverse index of text from randomly selected word forms with random morphological properties in the form of a binary string or a set of values “0” and “1”, depending on the functionality of the data storage system. The documents were not stored in the storage system. To accurately assess performance, the speed of performing write and read operations over time is important. Approximating functions were used to estimate the predicted speed. 3.1

Memcached Storage

Approximately the same recording speed with small losses over time and with an increase in the amount of data stored were detected for the Memcached storage. The maximum writing speed was 7176.17 word forms per second, and the minimum speed was 3283,086 words per second. The results of the experiment are shown in Fig. 1.

Choosing the Right Storage Solution

109

Wordforms per second

Wordforms wriƟng 9000

10000000

8000 8000000

7000 6000

6000000

5000 4000

4000000

3000 2000

2000000

1000 0

0 Fig. 1. The write speed in Memcached storage

There is also a slight decrease in the speed of processing queries over time (Fig. 2) when search queries are performed. The maximum speed of performing search queries was 20,782 requests per second, and the minimum speed was 13,249 requests per second.

Search queries

Search queries per second

14000

35

12000

30

10000

25

8000

20

6000

15

4000

10

2000

5

0

0 0

253

504

0

253

504

Fig. 2. The search queries speed in Memcached storage

110

D. Mukhamedshin et al.

3.2

MemcacheDB Storage

The write speed changed little over time when data were written in the MemcacheDB storage. Thus, the maximum data writing speed was 745.08 words per second, and the minimum speed was 560.663 word forms per second. The results of the experiments are shown in Fig. 3.

Wordforms

Wordforms per second

7000000

1200

6000000

1000

5000000

800

4000000 600 3000000 400

2000000 1000000

200

0

0 Fig. 3. The write speed in MemcacheDB storage

There is a sharp decrease in speed when performing searches, but then the speed is restored. This is due to the fact that most of the data are stored on the HDD. The maximum speed of search requests was 7.906 requests per second, the minimum speed was 1.76 requests per second. The results of the experiments are shown in Fig. 4. Search queries per second

Search queries 30000 25000 20000 15000 10000 5000 0

9 8 7 6 5 4 3 2 1 0 Fig. 4. The search queries speed in MemcacheDB storage

Choosing the Right Storage Solution

3.3

111

MariaDB (MySQL) DBMS and Redis Storage

The next solution for the experiments was to become Redis storage, but the authors offer a more reliable solution that allows solving the problem of reverse search. This is a pair of MariaDB DBMS and Redis storage. The data architecture in this case is developed in such way that the MariaDB DBMS stores collections of documents, contexts and the main index of the corpus data, divided into several sections to ensure fast execution of reverse search queries. The Redis storage stores the main indexes of wordforms and lemmas, indicating their occurrence in one or other section of the main index, which allows optimizing executing of database queries in such way as to speed up the execution of direct search queries [14]. This data architecture does not have a high write speed, and the speed decreases with time. Thus, the maximum write speed was 1593.893 wordforms per second, the minimum speed – 557.968 wordforms per second. The results of the experiment are shown in Fig. 5.

Wordforms

Wordforms per second

6000000

3000

5000000

2500

4000000

2000

3000000

1500

2000000

1000

1000000

500

0

0 0

2520 5040 7556

0

2520 5040 7556

Fig. 5. The write speed in MariaDB DBMS and Redis storage

When executing search queries, there was also a decrease in speed over time. The maximum speed of query execution was 20.656 wordforms per second, the minimum speed – 14.114 wordforms per second (Fig. 6).

112

D. Mukhamedshin et al.

Search queries 20000

Search queries per second 30 25

15000 20 10000

15 10

5000 5 0

0 0

251 502

0

251 502

Fig. 6. Search queries speed in MariaDB DBMS and Redis storage

3.4

Experiments with Other Solutions

The authors also tested other solutions described in Sect. 2. For example, the Redis storage (this time not in a pair with MariaDB) showed a high write speed – the maximum speed of 5340.173 wordforms per second at the beginning of the experiment and 3791.981 wordforms per second at the end of experiment. As for the speed of search queries execution, it was also quite high – the maximum speed was 20.953 queries per second and the minimum – 14.056 queries per second. Another solution, based on the NoSQL database FondationDB, showed a stable write speed of 697.113 wordforms per second, which did not change much over time. The speed of search queries execution for this experiment was 9.665 queries per second at the beginning of the experiment (maximum) and 5.838 queries per second at the end of the experiment (minimum). Sphinx search engine showed the maximum write speed of all tested solutions – 9317.926 wordforms per second at the beginning of the experiment and 6814.857 wordforms per second at the end of the experiment. But when executing search queries, the speed turned out to be the slowest – from 0.631 to 1.062 queries per second. Experiments with another search engine ElasticSearch showed the following results. During writing, the average speed was 3546.570 wordforms per second, during the execution of search queries the maximum speed was 16.135 queries per second, and the minimum speed – 11.592 queries per second.

Choosing the Right Storage Solution

113

4 Conclusion The solutions described in this article are widely used for solving various problems. Some solutions are good for data caching, others are good for retrieving data by complex queries. The main task of this article is to identify the most optimal solution for storing and processing of corpus data. The maximum speed when executing random search queries was achieved using the Redis storage. In general, this solution fully meets the criteria of performance, functionality, cost, compatibility, completeness of documentation and development prospects. A slightly lower speed of query execution was shown by MariaDB + Redis pair. These solutions are also optimal according to the criteria of performance, functionality, cost, compatibility, completeness of documentation and development prospects. Note that the maximum speed of search queries execution between the Redis storage and the MariaDB + Redis pair is only 1.4% in favor of Redis storage. MariaDB’s functionality as a relational database can be used to solve various statistical tasks and tasks that require the execution of non-standard data selecting without requiring the development of additional applications and limited only to the MySQL query language. Thus, for the corpus manager system, the MariaDB + Redis pair and the data architecture described in Sect. 3.3 were chosen. Already after the development of the corpus manager, performance testing was conducted, the results of which confirmed the correctness of the choice of the solution on real corpus data. The results can be seen in Fig. 7.

Performance testing results 100%

98.71%

80% 60% 40% 20% 1.16%

0.12%

0.01%

0.05 - 0.1 sec.

0.1 - 1 sec.

1 - 2.5 sec.

0% < 0.05 sec.

Fig. 7. Corpus manager system performance testing results on direct search queries

114

D. Mukhamedshin et al.

Performance testing of the corpus manager system showed that the time required to process and execute the search query does not exceed 0.05 s in 98.71% of cases for lexical (direct) search, in 77.71% of cases for morphological (reverse) search and in 98.08% of cases for lexical-morphological search. In many respects, such results were achieved due to the use of MariaDB + Redis pair as a data storage and processing solution.

References 1. Kilgarriff, A., Baisa, V., Bušta, J., Jakubíček, M., Kovář, V., Michelfeit, J., Suchomel, V.: The sketch engine: ten years on. Lexicography 1(1), 7–36 (2014) 2. Rychlý, P.: Manatee/bonito-a modular corpus manager. In: 1st Workshop on Recent Advances in Slavonic Natural Language Processing, pp. 65–70, December 2007 3. Schmidt, T., Wörner, K.: EXMARaLDA – creating, analyzing and sharing spoken language corpora for pragmatics research. Pragmat.-Q. Publ. Int. Pragmat. Assoc. 19(4), 565 (2009) 4. Memcached: A distributed memory object caching system. https://memcached.org/. Accessed 30 June 2018 5. MemcacheDB: Wikipedia. https://en.wikipedia.org/wiki/MemcacheDB. Accessed 30 June 2018 6. MemcacheDB: Bauman National Libriary. https://en.bmstu.wiki/MemcacheDB. Accessed 30 June 2018 7. Nelson, J.: Mastering Redis. Packt Publishing Ltd, Birmingham (2016) 8. How fast is Redis? – Redis. https://redis.io/topics/benchmarks. Accessed 30 June 2018 9. FoundationDB | Home. https://www.foundationdb.org/. Accessed 30 June 2018 10. Performance: FoundationDB 5.2. https://apple.github.io/foundationdb/performance.html. Accessed 30 June 2018 11. Sphinx | Open Source Search Engine. http://sphinxsearch.com/. Accessed 30 June 2018 12. Gormley, C., Tong, Z.: Elasticsearch: The Definitive Guide: A Distributed Real-Time Search and Analytics Engine. O’Reilly Media, Inc., Newton (2015) 13. Bartholomew, D.: Getting Started with MariaDB. Packt Publishing Ltd, Birmingham (2013) 14. Nevzorova, O., Mukhamedshin, D., Gataullin, R.: Developing corpus management system: architecture of system and database. In: Proceedings of the 2017 International Conference on Information and Knowledge Engineering. CSREA Press, United States of America, pp. 108– 112 (2017)

Requirements Imprecision of Data Warehouse Design Fuzzy Ontology-Based Approach - Fuzzy Connector Case Abdelmadjid Larbi1(&) and Mimoun Malki2 1

ENERGARID Laboratory, SimulIA Team, Tahri Mohamed Bechar University, Bechar, Algeria [email protected] 2 LabRI-SBA Laboratory, ESI, Sidi Bel Abbes, Algeria [email protected]

Abstract. Imprecision in decision systems can negatively affect the data warehouse (DW) quality during a bad interpretation case. In order to evaluate the imprecision expression in decisional requirements and differently to our previous paper, we present an ontological solution using our GLMR ontology model for fuzzy connector evaluation in a query-based requirement. For simplification reasons, we present in this paper only “and if possible” fuzzy connector case. We will propose a new solution combining two recent existent solutions. Although the fuzzy connector assessment was already treated but, according to our knowledge, never in the decision need context and proposing an ontological solution. The preliminary tests of our ontological solution are encouraging. Keywords: Requirement expression  Data warehouse  Design  Imprecision  Fuzzy ontology  Fuzzy connector  Quality  Evaluation

1 Introduction Few works, as [2] can notice, deal with the problems related to the DW design quality. In a previous literature review [3] on the decision-making requirements vagueness, we found that none of the works takes into account this vagueness problem. We have already presented, in later work [4], the vagueness study of the query-based requirements where the fuzziness is presented either in the predicate operator/value. In this work, we focus the study on the fuzzy connector. This assumes that the requirement includes, in this case, at least two predicates linked by a fuzzy connector, called a fuzzy bipolar query. The rest of this article is organized as follows: The second section presents the background. The third section presents the related works to the research topic. Our proposed approach is presented in the section four. The section five presents the preliminary tests. The section six concludes the paper and cites some perspectives.

© Springer Nature Switzerland AG 2020 M. S. Bouhlel and S. Rovetta (Eds.): SETIT 2018, SIST 146, pp. 115–122, 2020. https://doi.org/10.1007/978-3-030-21005-2_11

116

A. Larbi and M. Malki

2 Background First, we define first the fuzzy bipolar query, before introducing the fuzzy connector, which will be evaluated using ontological solution. 2.1

Fuzzy Bipolar Query

The authors in [5] define a fuzzy bipolar query as a query implying fuzzy bipolar conditions: a fuzzy conditions particular case. The authors in [6] define a bipolar including two components: A constraint represents a mandatory condition while a wish represents an optional condition. Example: “Acceptable bus have less than 3 ( < qt cX þ dY  > 2 > : eX þ fY\T

ð1Þ ð2Þ ð3Þ

X is the number of people who take care of the household garbage collection work, Y is the distance traveled at each evacuation, a financial charge per person, b financial charge per kilometer, c_t is the total charge, it is the quantity garbage collected per person, d is the amount of garbage collected per distance traveled, q_t is the total amount of garbage, e time is given per person then f is the time given in relation to the distance traveled and T is Time exempted total. In (1) we will minimize the evacuation charge, in (2) we will maximize garbage collected and (3) we minimize the evacuation time. This resolution is done automatically by Jason-RS and Jason-WS by selecting, composing and then providing reusable Web Service between agent and other applications. The Fig. 2 represents the architecture corresponding to the case study. The different agents automatically select the Web Services provided by the “Voirie” and “Le Relais” entities. Then, the agents process the information provided by the Web Service which is only the resolution of the constraint in the Eq. (1) (2) (3). Then the agents compose or orchestrate the Web Services to optimize the waste evacuation. And finally, the final result is exposed by the agent as a Web Service. This scenario consists of 04 Web Service and two BDI agents. So in this case there are two entities that take care of the disposal of household waste that will know the number of staff needed and the distance to go to evacuate waste. The proposed approach therefore allows to couple the BDI agent with the Web Services type REST and SOAP. In this case, the BDI agent is able to publish these capabilities as a MAS agent and these capabilities in the Web browser as Web Services. In term of performance, after many test, the following Table 1 gives a summary of duration of the task by the Jason-RS and Jason-WS from browser. After many test of performance, we can evaluate that all the maximum value of the duration task is 1000 ms so 1 s.

Publish a Jason Agent BDI Capacity as Web Service

169

Fig. 2. JASON-RS, JASON-WS micro service architecture on waste disposal Table 1. Jason-RS and Jason-WS task duration. Task WS consumer WS provider WS PLC WS orchestration

Duration (millisecond) 200 478–1000 428–1000 475–1100

Application client MAS console Browser or other client application Browser or other client application Browser or other client application

5 Conclusion In this paper, we proposed an approach allows running Web Services (REST and SOAP) pairs with Agent Jason BDI in a Java SE environment without using a modern Web-App server or application server, or developing different middleware. Because the development of a middleware requires a lot of time. And deploying an agent inside a server is a tedious task [2]. So in our strategy, we reused the existing Java frameworks called Non-Blocking Input Output or NIO APIs such as Grizzly and Netty to run the Web Services and Jason BDI Agent pairs in a Java SE environment. In this case, Grizzly takes the role of a server, and Netty provides the communication between the developed Web Service and the BDI agent Jason. The proposed strategy describes the operation of agents and Web Services called Jason-RS or Jason-WS to build an intelligent SOA application as a Multi-Agent System based only on Java SE. SOA require a service to handle multifaceted interfaces so Jason-RS and Jason-WS play an important role not only the both permit to design an orchestration of services but also we can serve it in complex workflow of goal oriented activities in term of Enterprise Architecture. In particular, different Agents collaborated together in the Jason

170

H. F. Rafalimanana et al.

environment with agents that provide both different types of Web Services such as REST and SOAP, some composite Web Services invokers, and other agents are only consumers of Web Services. As a result, agents can encapsulate business logic, activities, task controls, and business processes [7] and publish their capabilities as Web Services. For the future work we will exploit the intelligence of the agent in order to make MAS principle in the internet of thing [13, 14]. So we need a knowledge in the IOT network and especially the possibility of its complexity [15]. There is also an interesting perspective of this work like exploiting the deep learning in the BDI agent or other deep neural network system [16, 17].After this exploitation, we contemplate to publish this deep intelligent capacity as a web service in order to make it flexible.

References 1. El Falou, M., Bouzid, M., Mouaddib, A.-I., Vidal, T.: A distributed multi-agent planning approach for automated web services composition. WIAS 10(4), 423–445 (2012) 2. Mitrovic, D., Ivanovic, M., Badica,C.: Jason agents in Java EE environments 2013. In: 17th International Conference on System Theory, Control and Computing (ICSTCC). Romania (2013) 3. Mitrovic, D., Ivanovic, M., Vidakovic, M., Al-Dahoud, A.: Developing software agents using enterprise Javabeans (2016) 4. Greenwood, D., Lyell, M., Mallaya, A., Suguri, H.: The IEEE FIPA approach to integrating software agents and web services. In Proc. Of Autonomy Agents and Multi-Agent System (AAMAS-207). Hawaii (2007) 5. Ma, Bo, et al.: Design of bdi agent for adaptive performance testing of web services. In: Quality Software (QSIC), 10th International Conference on. IEEE (2010) 6. Ciortea, A., Boissier, O., Zimmermann, A., Florea, A., M.: Give agents some REST: a resource-oriented abstraction layer for internet-scale agent environments. In: AAMAS 17 Proceedings of the 16th Conference on Autonomy Agents and MultiAgent Systems, pp 1502–1504, Brazil (2017) 7. Piunti, M., Ricci, A., Santi, A.: SOA / WS applications using cognitive agents working in Cartago environments (2009) 8. Leon, F., Badica, C.: Freight brokering system architecture based on web services and agents. In: 7th International Conference, IESS 2016, Bucharest, Proceedings (pp. 537–546). Springer, Romania (2016) 9. Vadivelou, G.: Multi-agent system integration in enterprise environments using web services (2017) 10. Micsik, A., Pallinger, P., Klein, A.: SOA based Message transport for the Jade multi-agent platform. Hungary (2009) 11. Seghrouchni, A.E.S., Haddad, S., Melitti, T.: Interoperability of multi-agent systems using web services (2004) 12. Maurer, N., Wolfthal, M., A.: Netty in Action (2017) 13. Benkerrou, H., Heddad, S., Omar, M.: Credit and honesty-based trust assessment for hierarchical collaborative IoT systems. In: Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), 2016 7th International Conference on. IEEE (2016)

Publish a Jason Agent BDI Capacity as Web Service

171

14. Kasmi, M., Bahloul, F., Tkitek, H.: Smart home based on internet of things and cloud computing. In: 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT). IEEE (2016) 15. Mokhlissi, R., Lotfi, D., Marraki, M.E.: A theoretical study of the complexity of complex networks. In: 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT). IEEE (2016) 16. Alvarez, N., Noda, I.: Inverse reinforcement learning with BDI agents for pedestrian behavior simulation (2018) 17. Zhu, Li: A novel social network measurement and perception pattern based on a multi-agent and convolutional neural network. Comput. Electr. Eng. 66, 229–245 (2018)

FPGA Implementation of a Quantum Cryptography Algorithm Jaouadi Ikram1(&) and Machhout Mohsen2 1

2

National Engineers School of Tunis, Communications Systems Department, University of Tunis El Manar, Tunis, Tunisia [email protected] Sciences Faculty of Monastir, Electronics and Microelectronics Laboratory, Tunis, Tunisia [email protected]

Abstract. Quantum cryptography is a process for developing a perfectly secret encryption key that can be used with any classical encryption system. This paper presents a study of the EPR state protocol, the first continuous variable quantum key distribution protocol. We propose an algorithm for this protocol and subsequently its implementation on FPGA (Field-Programmable Gate Array). For the implementation, we used Xilinx’s ISE System Edition tool as Software and Xilinx’s Artix7 Nexys4 DDR board as hardware. Keywords: Communication protocol  QKD  Security  Secret key  FPGA platform  EPR paradox  Bell’s inequality  Quantum cryptography

State of the art: This work was developed as part of the doctoral research work, in collaboration between the doctoral school of engineering sciences and techniques (EDSTI) within the National School of Engineers of Tunis (ENIT) and the Electronic and Microelectronics Research Unit within the Monastir Faculty of Science (FSM). The purpose of this work was to propose a prototype of a continuous variable quantum key distribution over an FPGA network. We chose the EPR protocol to rely on the properties of the variables involved in establishing a perfectly secret key.

1 Introduction The first means of communication put in place by humans are accompanied by a need for confidentiality in the information transmission. The first cryptographic systems appear around 200 before J.C [1]. Today, most classical cryptographic systems rely on mathematical algorithms whose safety and robustness to cracking have not been formally demonstrated. The computational complexity poses very little resistance to the increase in computational power of computer systems. In 1900, based on the quantum theory, Planck [2] showed that the emission and absorption of light can only be in whole energy packets. He thus defined Planck’s constant h, which quantifies the energy exchanges between light and material. In 1905, Einstein was the first to introduce the quantification of radiant energy by expressing © Springer Nature Switzerland AG 2020 M. S. Bouhlel and S. Rovetta (Eds.): SETIT 2018, SIST 146, pp. 172–181, 2020. https://doi.org/10.1007/978-3-030-21005-2_17

FPGA Implementation of a Quantum Cryptography Algorithm

173

light as grains, then explaining the photoelectric effect. In 1926, Newton then proposed the term photon. In 1925, Heisenberg unified the various approaches under the “Matrix Mechanics”, the base of quantum mechanics, which was enriched in 1926 by the Schrödinger approach. The central purpose of this approach was a complex-valued wave function, thus satisfying the equation that now bears his name [3]. The beginnings of quantum cryptography appeared in the late sixties in Stephen Wiesner’s (unpublished) article [4], in which he explains the importance of Heisenberg’s uncertainty principle in coding currencies in order to protect them from forgery. He then proposes the use of a multiplexer quantum channel interspersing two messages so that reading one of them makes the other unreadable. In 1979, Charles H. Bennett and Gilles Brassard returned to this work to design a secret key distribution system based on quantum mechanics principles [5]. In 1983, Wiesner’s article was finally published, the photon would now be used for information transmission and not for storage. Quantum cryptography is not a new cryptographic process. Indeed, it does not directly allow the communication of intelligible messages but allows (mainly) the cryptographic key distribution, which often leads to designate the quantum key distribution (QKD) by the more general term of quantum cryptography. It, therefore, appears as a complement to classical cryptography, it meets the need for private key distribution. The safety of this method is based on the laws of quantum mechanics and is considered unconditionally safe [6, 7]. Quantum entanglement, a quantum mechanics astonishing phenomenon, revealed by Einstein and Schrödinger in the 1930s, assumes that two particles, even distant ones, of a physical system have dependent quantum states. Any measurement of one of these two particles affects the other. These entangled states seem to contradict the locality principle. Quantum entanglement was the base of two famous thought experiments proposed in 1935: the Schrödinger’s cat experiment [3] by Erwin Schrödinger, and the EPR experiment of Einstein, Podolsky, and Rosen [8]. Einstein, Podolsky, and Rosen then concluded that quantum mechanics is incomplete. They were based on the fact that any quantum state measurement performed at a position A can’t influence the measurement result of this state at a position B (locality hypothesis) and that a quantum state has defined values regardless of its measure (realism hypothesis). They ended by saying that the description of a quantum system can only be completed with the use of hidden variables [9] whose role is to predetermine the measurement result of the quantum states and subsequently solve the EPR paradox. In 1964 and in order to quantify the debate between quantum mechanics and the notion of hidden variables, J. Bell introduced a set of inequalities to check for any local and realistic theory [10]. The Bell’s inequality test implies that, at a time t, two detection systems A and B simultaneously perform measurements on the two elements of an entangled quantum states pair. The Bell’s inequalities verification is an important step in the processing of information. These are relationships that entangled state measurements must respect. Several quantum cryptography protocols are based on the principle of entangled photons, their security is based on their properties. In 1991, Ekert proposed a QKD protocol combining the EPR paradox and Bell’s inequalities [11]. 
It was known as the EPR protocol.

174

J. Ikram and M. Mohsen

2 EPR Protocol Description It is a protocol whose states are correlated or entangled. In this protocol, the term EPR pairs is used to denote a pair of states emitted at a time t. The EPR pairs may be pairs of particles separated at great distances. This protocol also uses Bell’s inequality verification for spy detection. The EPR protocol can be described as follows: • At each instant t, an EPR pair is created. The first photon of this pair is transmitted to Alice while the second is transmitted to Bob. On their part, Alice and Bob, each randomly and with equal probabilities select their operator from Ai and Bi with i 2 f1; 3g. Depending on the chosen measurement operator, Alice and Bob proceed to measure their received photons respectively. They reserve their measurement results as well as their choice of measurement operators (see Table 1). Table 1. Measurement bases of Alice and Bob Alice A1 ¼ Z A2 ¼ X

Bob B1 ¼ Z B2 ¼ Xpþffiffi2Z

pffiffi A3 ¼ Zpþffiffi2X B3 ¼ ZX 2

With Z ¼ j0ih0j  j1ih1j and X ¼ j0ih1j þ j1ih0j • In this step, Alice and Bob carry out a public discussion via a conventional channel to determine the set of bits for which they have used the same measurement operators. Each separates his bit sequence into two groups. The first group named “Raw Key” contains the set of bits measured with the same measurement operator ðAi ¼ Bi Þ. The second group named “Rejected Key” contains the rest of the bits, the bits for which Alice and Bob didn’t use the same measurement operator. • Unlike other quantum communication protocols such as BB84 and B92, for the EPR protocol, nothing is discarded. Indeed, the set “Rejected Key” is used to check the presence of the spy Eve and this through the test of Bell inequalities. If these inequalities are violated, it is a sign of intrusion. Alice and Bob then proceed to a discussion through the public channel to compare their rejected keys. We will match to the set fA1 ; A2 ; A3 g the set fa; b; cg. Similarly for the set fB1 ; B2 ; B3 g the set fa; b; cg. Let P(a,b) be the probability that two corresponding bits of the rejected keys of Alice and Bob are respectively Alice’s measurement result by the operator A1 and Bob’s measurement result by the operator B2 . According to the same reasoning we search for Pða; cÞ and Pðb; cÞ.

FPGA Implementation of a Quantum Cryptography Algorithm

175

Bell’s inequality can be noted by the following equation: Pða; bÞ  Pða; cÞ  1 þ Pðb; cÞ

ð1Þ

If the inequality is satisfied, no intrusion is detected and the communication is safe. Else, the system indicates the presence of the spy. Recall that the quantum noncloning theorem makes remarkable every movement of the spy. • We take back the “Raw Key” in this step and always via the public channel. In this step, which is common between various protocols, both interlocutors estimate the error rate QBER (Quantum Bit Error Rate) on their “Raw Key” sequences. They then correct the transmission errors to ensure that the generated key is secret. Several error correction algorithms are used [12]. To amplify the confidentiality of the key, the two interlocutors can apply the parity check by adding a parity bit to their keys [13]. The EPR protocol uses an authenticated public channel, so the spy can’t pretend to be one of the two legitimate actors. Authentication is obviously possible by an appropriate algorithm. Therefore, the spy can’t perform an impersonation attack (Eve listening to the quantum and classical channels and pretending to be Bob). In this paper, we did not deal with the behavior of the spy, because his modeling is subject to two major difficulties. On the one hand, the particles transmitted to Alice and Bob by the EPR source are propagated on two different quantum channels, so the spy must make a global attack, rejected hypothesis. On the other hand, the spy must be equipped with the same technology (Hardware) as the two interlocutors and behave in the same way, an experimentally impossible requirement.

3 FPGA Implementation of the Protocol For the EPR protocol implementation on FPGA, we used as hardware the Artix7Nexys4 DDR card from Xilinx [14] with a clock frequency of 100 MHz. Both interlocutors are linked to the card via a USB link. We simulated the behavior of the entangled photon source using a pseudo-RNG (Random Number Generator). We have also integrated the EPR source into Alice, so only one quantum channel is used. The pseudo-RNG is also used for the generation of polarization bases for both interlocutors instead of a true RNG. For pseudo-random sequences generation, we used the LFSR (Linear Feedback Shift Register) algorithm. Performed electronically, in the particular case of a sequence of 0 and 1, it is a shift register with linear feedback, which means that the incoming bit is the result of an exclusive OR (XOR) between several bits of the register. The recurrent sequence produced by an LFSR is necessarily periodic from a certain rank. LFSRs are used in cryptography to generate sequences of pseudorandom numbers [15]. This algorithm was originally proposed in various algorithms (the Berlekamp– Massey algorithm, Fibonacci algorithm, Galois algorithm). In our implementation, we opted to choose the Fibonacci algorithm which strictly applies the definition of an LFSR. Since we are working with continuous variables, we implemented a 32-bit

176

J. Ikram and M. Mohsen

LFSR algorithm. So, we’ll have 232 (4 294 967 296) possible combinations of random bits (Appendix 1 is the simulation of the 32-bits LFSR). However, it should be noted that the use of LFSRs in their original configuration became vulnerable to mathematical attacks (a result demonstrated by the Berlekamp– Massey algorithm) [16, 17]. Therefore, an LFSR should never be used by itself as a key flow generator despite the fact that LFSRs are still used for their very low implementation costs. 3.1

Implementation

Our system architecture is composed as follows: the two interlocutors are related to the same FPGA with a USB link. The software behavior of each of the two interlocutors is controlled by a PC ensuring the exchanges with the FPGA which is synchronized. The two PCs are connected by an Ethernet link representing the authenticated classic channel. For the quantum channel, it is represented by an optical fiber. We have integrated the EPR source at the transmitter, only one detector is needed at the receiver. It’s a balanced homodyne detector. Our system behavior can be described as follows: • At each moment t, a pair of entangled photons is generated, one photon is for Alice and the other is transmitted to Bob via the quantum channel. • At each moment t, for Alice as for Bob, the FPGA generates a random sequence corresponding to the choice of the measurement operators. • Simultaneously, the two interlocutors proceed to measure each photon received by the corresponding base. They then save their base choices as well as the measurement results in memory. Here we used FIFO memories (First In First Out). The role of the quantum channel stops at this level. • Via a classic channel, the two actors proceed to reconciliation. For continuous variable protocols such as the EPR protocol, the reconciliation phase is preceded by a discretization of the data, that is to say that we only take a part of received data. This is called slice reconciliation [18]. Bob then sends to Alice his base choices for this extracted slice. They then estimate the error rate QBER (Quantum Bit Error Rate). The recovery of base choices is made by access to the FIFO memory. • Remember that for the EPR protocol nothing is useless. The reconciliation phase ends by dividing the data into two groups: the raw keys (the raw keys of Alice and Bob corresponding to the conforming base choices) and the Rejected Keys (the strings corresponding to the rest of the data). The results are obviously saved in memory. • Rejected Keys strings are used to verify Bell’s inequality. • We take back with the raw keys to correct the transmission errors in order to reduce the error rate. For this phase, we used the algorithm of choosing randomly two identical blocks of the two raw keys. Alice and Bob estimate the error rate on these blocks before deleting them. If the error rate is still high, they repeat the process, if not they can generate the final key whose size will be smaller than that of the raw keys.

FPGA Implementation of a Quantum Cryptography Algorithm

177

• One last step is to increase the confidentiality of the key. We have applied for this purpose a parity check algorithm, we added to the final key the parity bit. The final key will then be an unconditional security. (Appendix 2 presents the simulation of the EPR protocol starting form a distribution of 232 bits to generate a secret key of 15bits). Of course, since our communication protocol is multi-step, it was necessary to implement a FSM (Finite State Machine) that controls transition from a step to another in our system. Every states represents the current step of our protocol. Once data are received, we can move to the next task. 3.2

Results and Discussion

Simulation results are necessary to verify the success of the simulation experience. In this part, we are considering to present our system simulation results. In this paper, our main goal was to calculate the QBER of the EPR protocol. We implemented a system working with 232 bits exchanged between Alice and Bob. The following graph shows the evolution of QBER and the secure key rate during the evolution of the protocol (Fig. 1).

Fig. 1. Plot of secure key rates versus bit error rates of the EPR protocol (axes: Secure Key Rate and QBER; plotted QBER values: 28, 25, 23, 18, 15, 8)

As said before, in this paper we are working with continuous variables, but we need to discretize them once we reach the reconciliation step; we used slice reconciliation in our protocol for error correction. In each step of our simulation, we record the sifted key generation rate (bits per second) and evaluate the QBER of the sifted key. The revealed error can be quantified as the ratio of the number of wrong bits to the total number of bits in some subset of the key. As shown in the curve, the bit error rate decreases steadily, starting from a value of 28% and reaching about 8%. Sifted keys with errors cannot be considered secure before error correction and privacy amplification are performed to extract the final secure keys. If the QBER value is too high, no secure keys can be extracted. However, the generation of the final secure key


is at the expense of the key generation rate. As explained previously, for error correction we apply a method of extracting sub-blocks from the raw key; these sub-blocks are subsequently deleted. The size of the final key is therefore smaller than the size of the raw key. The curve shows that the secure key rate ranges from 0% to 20%. From a performance point of view, we can present the following table (Table 2):

Table 2. Usage of the hardware logic blocks
Logic block            Used
LUT–Flip-Flop pairs    7561
LUT                    7399
FF registers            513
IO buffers               39
32-bit RAM memories       4

The values shown in this table reflect the sequential logic of our system. Indeed, each step is highly dependent on the previous one, hence the need for an FSM (Finite State Machine) to control this sequence of events and the transition from one state to another. The use of Look-Up Tables (LUTs) and Flip-Flops is due to the logical operators (XOR, comparator, etc.) used during the implementation of the system. A simplified sketch of such a state machine is given below.
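As an illustration of this control logic, the following Python sketch models the protocol sequencing as a simple state machine; the state names mirror the steps described above and are our own labels, not the actual VHDL entities.

```python
from enum import Enum, auto

class Step(Enum):
    DISTRIBUTE = auto()       # EPR pairs generated and measured
    RECONCILE = auto()        # basis comparison over the classical channel
    BELL_TEST = auto()        # rejected keys used to check Bell's inequality
    ERROR_CORRECT = auto()    # block comparison on the raw keys
    PRIVACY_AMPLIFY = auto()  # parity bit appended to the final key
    DONE = auto()

# Each state advances only once the data of the current step has been received,
# mirroring the FSM that sequences the hardware modules.
ORDER = [Step.DISTRIBUTE, Step.RECONCILE, Step.BELL_TEST,
         Step.ERROR_CORRECT, Step.PRIVACY_AMPLIFY, Step.DONE]

def next_step(current: Step, data_ready: bool) -> Step:
    if not data_ready or current is Step.DONE:
        return current
    return ORDER[ORDER.index(current) + 1]
```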

4 Conclusion

We have successfully simulated the EPR protocol, the first quantum key distribution protocol with entangled variables. We worked on an FPGA platform, using the Nexys4 DDR board. We did not address the eavesdropping strategy, and we used a pseudo-random number generator instead of a true one. In principle we are working with continuous variables but, as explained, we resort to discrete variables once we reach the reconciliation phase. Our system demonstrated a low QBER of 8%, so we can say that it is able to generate a final secure key. Currently, we are working not only to improve our system, but also to combine QKD with classical cryptography, in order to demonstrate the importance of QKD in increasing the security of any cryptographic protocol. Quantum key distribution protocols using entangled variables take advantage of quantum mechanics principles, especially quantum entanglement, to ensure the unconditional security of the communication process, even in the presence of a spy.

Acknowledgments. This work is supported by the Electronics and Microelectronics Laboratory, Sciences Faculty of Monastir, Tunisia (code: LR99ES30) and the National Engineering School of Tunis, Tunisia, Communication Systems Department. The first author thanks Mr. Tayari Lassaad, Master in computer science of industrial systems at the Higher Institute of Technological Studies, Gabes, Tunisia, who provided the Nexys4 board.

Appendix 1

See Fig. 2.

Fig. 2. Simulation of the 32-bit LFSR with the ISim tool

Appendix 2

See Fig. 3.

Fig. 3. Simulation of the EPR protocol with the ISim tool. We estimated the length of the final key to be 15 bits.

References 1. Kahn, D.: The Codebreakers: A Comprehensive History of Secret Communication from Ancient Times to the Internet, Revised and Updated. Scribner, New York (1996) 2. Smolka, J.: PLANCK MAX - (1858–1947). Encyclopædia Universalis [en ligne]. http:// www.universalis.fr/encyclopedie/max-planck/. Accessed 7 Dec 2017 3. Schrödinger, E.: Quantum physics and representation of the world. Le Seuil, coll. “Science Points”, 1992, 184 pp. French translation of two popular articles: The current situation in quantum mechanics (1935), Science and humanism - the physics of our time (1951) 4. Wiesner, S.: Conjugate coding. Sigact News 15(1), 78–88 (1983). Written around 1969– 1970, this article was published only in 1983 5. Bennett, C.H., Brassard, G.: Quantum cryptography: public-key distribution and coin tossing. In: Proceedings of the IEEE International Conference on Computers, Systems and Signal Processing, pp. 175–179. IEEE (1984) 6. Renner, R., Gisin, N., Kraus, B.: Information-theoretic security proof for quantum-keydistribution protocols. Phys. Rev. A 72, 012332 (2005) 7. Christandl, M., Renner, R., Ekert, A.: A generic security proof for quantum key distribution (2004). E-print quant-ph/0402131 8. Einstein, A., Podolsky, B., Rosen, N.: Can quantum-mechanical description of physical reality be considered complete? Phys. Rev. Lett. 47, 777–780 (1935) 9. Bell, J.S.: Introduction to the hidden variable question. In: Proceedings of the International School of Physics Enrico Fermi, Course IL, Foundations of Quantum Mechanics, pp. 171– 181 (1971) 10. Bell, J.S.: On the Einstein Podolsky Rosen paradox. Physics 1, 195 (1964) 11. Artur, K.: Ekert: quantum cryptography based on Bell’s theorem. Phys. Rev. Lett. 67(6), 661–663 (1991) 12. Brassard, G., Salvail, L.: Secret-key reconciliation by public discussion. In: Advances in Cryptology-Eurocrypt’93. Number 765 in Lecture Notes in Computer Science, pp. 411–423. Springer, New York (1993) 13. Bennett, C.H., Brassard, G., Crépeau, C., Maurer, U.M.: Generalized privacy amplification. IEEE Trans. Inf. Theory 41(6), 1915–1935, November 1995. http://www.crypto.ethz.ch/ *maurer/publications.html 14. https://reference.digilentinc.com/reference/programmable-logic/nexys-4-ddr/referencemanual 15. Liang, W., Jing, L.: A cryptographic algorithm based on linear feedback shift register. In: 2010 International Conference on Computer Application and System Modeling (ICCASM), 22–24 October 2010 16. Koutsoupia, M., Kalligeros, E., Kavousianos, X.: LFSR-based test-data compression with self-stoppable seeds. In: Design, Automation & Test in Europe Conference & Exhibition, 20–24 April 2009, pp. 1482–1487 17. Reeds, J.A., Sloane, J.A.: Shift register synthesis (Modulo m). SIAM J. Comput. 4, 505–513 (1985) 18. Van Assche, G., Cardinal, C., Cerf, N.J.: Reconciliation of a quantum-distributed Gaussian key. IEEE Trans. Inf. Theory 50(2), 394–400 (2004)

Health Recommender Systems: A Survey

Hafsa Lattar1(B), Aïcha Ben Salem1(B), Henda Hajjami Ben Ghézala1(B), and Faouzi Boufares2(B)

1 RIADI Laboratory, National School of Computer Science, Manouba University, 2010 La Manouba, Tunisia
[email protected], [email protected], [email protected]
2 Laboratory LIPN-UMR 7030-CNRS, University Paris 13, Sorbonne Paris Cité, Villetaneuse, France
[email protected]

Abstract. With the large amount of data becoming available on the Internet and the resulting information overload problem, it is becoming essential to use recommender systems (RS). RSs help users to extract the relevant information that interests them and to improve the quality of their decisions. These systems have proven their effectiveness in several domains, such as e-commerce and e-learning. Furthermore, they play a very important role in the field of medicine, in which even a small piece of discovered knowledge can save thousands of lives. In this paper, we present a state of the art of RS approaches, their applications in general, and their applications in the medical field.

Keywords: Recommender systems · Health recommender systems · Quality decision · Machine learning · Artificial intelligence

1 Introduction

The amount of data is increasing over time due to the diversity of its sources. This evolution gives rise to the phenomenon of information overload. This phenomenon makes users unable to understand, filter, and discover new knowledge from this excess of information, and consequently makes the quality of their decisions increasingly low. In order to help users take advantage of this treasure and to increase the quality of their decisions, the presence of an RS is essential. In the literature, the recommendation has several definitions that go in the same direction. Lerato et al. [1] consider the recommendation as an application that filters personalized information, based on user preferences, from a large amount of obtained information. According to Adomavicius and Tuzhilin [2], the recommendation is a process of estimating ratings for yet unrated items in order to recommend the highest-rated ones. There are several algorithms for executing the recommendation process. RSs have proven their effectiveness in several domains, for example movies (e.g. Netflix), friend suggestions (e.g. Facebook), and books (e.g. Amazon). In the field of medicine, RSs have


achieved remarkable results, and they continue to progress and achieve outcomes that were impossible before. In this article, we focus in particular on health RSs. The paper is organized as follows: in Sect. 2 we present the recommendation algorithms, and in Sect. 3 we present the applications of these algorithms in the medical field. We conclude and propose some perspectives in Sect. 4.

2 Recommendation Algorithms

The recommendation is a process that provides a user with the information that interests him in a personalized manner, thereby reducing the time spent searching for the relevant information he needs and making his life easier. We distinguish two types of RSs: traditional ones and modern ones. Traditional RSs fall into two categories: (i) systems based on the item's and/or the user's content, and (ii) systems based on the collaboration between users. Modern RSs take into account additional information on items and/or users.

2.1 Traditional Recommendation Approaches

As said before, traditional recommender approaches are based on the content of items and/or users. Here we present four approaches: the content-based (CB) approach, the collaborative filtering (CF) approach, the demographic-based (DB) approach, and the hybrid approach.

Fig. 1. Traditional recommender approaches

Content-Based Approach [2,8]. This approach consists of recommending to a user items that are similar to those previously rated highly by the same user (Fig. 1a). The item recommendation process is thus based on the similarity between the user profile and the item profiles, following the idea that items with similar attributes will be rated similarly. The user profile is a structured representation that contains representative data on the user's tastes (how the user rated items), preferences and needs. These data can be obtained from users explicitly or implicitly and are represented by descriptors. The item profile contains the item features, which are obtained by exploiting and analyzing the item's data content. The


similarity process is applied to relate items to the user profile and find the best matching items (items with a high degree of similarity) to be recommended; a minimal sketch of this matching step is given after Table 1. Table 1 illustrates some works that used the CB approach and the techniques used to calculate the similarity. This approach can recommend new items never evaluated by any user. Also, a user of this approach is totally independent of other users. Despite these advantages, this approach suffers from the difficulty of analyzing the content of some document types, from the new-user problem (it is hard to recommend relevant items to a user without any ratings), and from the novelty problem (over-specialization: a user is only recommended items similar to those he has already rated).

Table 1. Content-based approaches
Shu et al. [9], 2018 — Approach: predict personalized learning resources for students. First, a convolutional neural network (CNN) is trained on the existing learning resources and the students' historical data; the generated CNN is then used to predict the rating of a new learning resource. — Techniques: model-based technique using a CNN classifier.
Wang et al. [10], 2018 — Approach: recommend computer science journals and conferences to authors. A softmax regression classifier is built from the documents' feature vectors and their categories; this classifier is used to generate an ordered list of journals and conferences based on the manuscript's abstract. — Techniques: model-based technique adopting a softmax regression classifier.
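The following Python sketch illustrates the matching step described above; it is a generic content-based scorer with hypothetical data structures, not code from the surveyed systems.

```python
import numpy as np

def content_based_scores(item_features, liked_item_ids):
    """item_features: dict item_id -> feature vector (np.array).
    liked_item_ids: items the user rated highly.
    Returns candidate items ranked by similarity to the user profile."""
    # User profile = mean of the feature vectors of the liked items.
    profile = np.mean([item_features[i] for i in liked_item_ids], axis=0)

    def cosine(u, v):
        return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

    scores = {i: cosine(profile, f) for i, f in item_features.items()
              if i not in liked_item_ids}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```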

Collaborative-Based Approach [3,11]. This approach consists of recommending to a user items that are liked by users who are similar to him (Fig. 1b), following the idea that users who have rated the same (or almost the same) items in common have similar tastes. There are two CF recommendation approaches: the memory-based approach and the model-based approach. The memory-based (or heuristic-based) approach generates predictions from the user–item matrix containing the item ratings given by users, using user-based or item-based techniques. The memory-based approach has several disadvantages, the main ones being time consumption and similarity-calculation problems; these disadvantages gave birth to the model-based approach. The model-based approach builds a model to make predictions from the user–item matrix using data mining and machine learning algorithms. Table 2 illustrates some CF approaches and the techniques used; a minimal sketch of a memory-based prediction is given after the table. Although the CF approach does not require item and user analysis as the CB approach does, it suffers from several problems: sparsity (only a small subset of the items available in the database are rated compared to the number of items that need to be predicted), scalability (the system is not able to make computations for millions of users and items), and the cold-start problem (the system is not able to recommend items because of users without any past ratings (new user) and items with few ratings (new item)).

Table 2. Collaborative-based approaches
Bobadilla et al. [7], 2012 — Approach: RS based on the concept of significance. They do not take all items and users into consideration to find the k-neighborhood; they only take the significant ones, using significance-based similarity metrics and quality metrics. — Techniques: memory-based approach using Pearson correlation, cosine and mean-squared metrics.
Nilashi et al. [12], 2018 — Approach: movie RS based on the CF approach, addressing the sparsity and scalability problems. The system clusters the users' ratings on movies using the expectation maximization (EM) algorithm, then computes semantic similarity from the movie ontology repository. — Techniques: model-based approach using the EM algorithm and a dimensionality reduction technique.
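As a minimal illustration of the memory-based variant (a generic user-based predictor with Pearson correlation, not the method of the cited works), consider the following Python sketch.

```python
import numpy as np

def predict_rating(ratings, target_user, target_item, k=5):
    """ratings: 2-D array (users x items), 0 meaning 'not rated'.
    Memory-based, user-based prediction of one missing rating."""
    target = ratings[target_user]
    sims = []
    for u, row in enumerate(ratings):
        if u == target_user or row[target_item] == 0:
            continue
        common = (target > 0) & (row > 0)
        if common.sum() < 2:
            continue
        sim = np.corrcoef(target[common], row[common])[0, 1]
        if not np.isnan(sim):
            sims.append((sim, row[target_item]))
    # Weighted average of the k most similar neighbours' ratings.
    neighbours = sorted(sims, key=lambda s: s[0], reverse=True)[:k]
    if not neighbours:
        return target[target > 0].mean() if (target > 0).any() else 0.0
    num = sum(s * r for s, r in neighbours)
    den = sum(abs(s) for s, _ in neighbours)
    return num / den
```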

Demographic-Based Approach [3,8]. This approach consists of recommending to a user items similar to those rated highly by users who have similar demographic information (sex, age, country, etc.). It is based on the principle that users sharing certain demographic information will also have common preferences (Fig. 1c). Table 3 presents some demographic-based filtering works and the machine learning techniques used.

Table 3. Demographic-based approaches
Korfiatis and Poulos [13], 2013 — Approach: demographic RS for online travel based on online consumer reviews. The system uses a user-defined hierarchy of service-quality indicator importance and applies a classification of travel types. — Techniques: k-means clustering algorithm.
Zhao et al. [14], 2016 — Approach: demographic-based product RS using a social media platform (Twitter). The system first identifies users' purchase intents from their tweets in near real time, then recommends products by matching the users' demographic information with the product demographic information. — Techniques: support vector machine algorithm.


Hybrid Approach. As explained in the previous sections, each filtering algorithm has advantages and limitations. To avoid some of these limitations and to benefit from the advantages of each one, the hybrid approach [2,4,5,8,15] is applied. This approach combines several algorithms. The combination can be made in different ways: (i) run each algorithm separately, then combine their scores (see the sketch after Table 4), (ii) execute an algorithm by introducing the characteristics of another, or (iii) build a unified model for multiple algorithms. Some works that used the hybrid approach are given in Table 4.

Table 4. Hybrid approaches
Hendrik et al. [16], 2015 — Approach: semantic hybrid book RS. The proposed system overcomes the over-specialization problem using an ontology as a semantic web technology; the results of the used approaches are combined using a mixed strategy. — Techniques: content-based and collaborative-based approaches.
Pandya et al. [17], 2016 — Approach: hybrid RS that overcomes the sparsity problem by combining the k-means clustering algorithm with the Eclat algorithm for rule generation. — Techniques: content-based and collaborative-based approaches.
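As one possible realisation of combination strategy (i), the following sketch simply merges the score lists of two recommenders with fixed weights; it is a hypothetical illustration, not the method of the works in Table 4.

```python
def weighted_hybrid(cb_scores, cf_scores, w_cb=0.4, w_cf=0.6):
    """cb_scores, cf_scores: dict item_id -> score from each recommender.
    Returns items ranked by the weighted combination of both scores."""
    items = set(cb_scores) | set(cf_scores)
    combined = {i: w_cb * cb_scores.get(i, 0.0) + w_cf * cf_scores.get(i, 0.0)
                for i in items}
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
```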

2.2 Modern Recommendation Approaches

Modern recommendation systems take into account additional information on items and/or users. We list different approaches such as the context-aware-based approach, the semantic-based approach, and the social network-based approach.

Context-Aware-Based Approach [4–6,18]. This approach uses, in addition to the user's ratings of the items, contextual information (e.g. time, location, weather) about the environment and the situation the user is in when consuming items, in order to recommend items to users in specific circumstances. The context can be introduced in several phases of an RS: contextual pre-filtering, contextual post-filtering, and contextual modeling (a minimal pre-filtering sketch is given at the end of this subsection). Table 5 lists works that used the context-aware-based approach and the context information used.

Table 5. Context-aware-based approaches
Xu et al. [19], 2015 — Approach: recommend the top-m travel locations to users, based on the users' travel histories and contextual constraints, by exploiting geotagged photos. — Context information: weather and location.
Shi et al. [20], 2017 — Approach: RS based on item-grain context clustering. The context is extracted from each item in cluster form using the k-means algorithm; these clusters are subsequently introduced into the factorization model to improve the prediction (in the context modeling phase). — Context information: time, location, and status.

Semantic-Based Approach. Items and users in the previously explained algorithms are represented using only tags and keywords, and analyzed using classical analysis techniques. To make intelligent recommendations, the semantic-based approach [4,21] introduces the notion of semantics for understanding and structuring items and users, using semantic analysis techniques and semantic web technologies. Semantic-based RSs use a knowledge base to improve the comprehension and the representation of items and users and the correspondence between them. According to the type of knowledge base, we distinguish two categories of systems: ontological-based systems and conceptual-based systems. Some works that used the semantic-based approach, with the data structure used, are presented in Table 6.

Table 6. Semantic-based approaches
Musto et al. [22], 2017 — Approach: semantics-aware movie RS that exploits linked open data to enrich item representations with new and relevant features. — Techniques: graph-based features.
Duran et al. [23], 2017 — Approach: semantic RS for recommending educational interactive digital television programs supporting students' learning and teaching processes, based on educational competencies as context information. — Techniques: an ontology.

Social Network-Based Approach [5,9]. This approach enriches the users' experiences through their interaction with other users using social network tools, and it overcomes the collaborative sparsity problem. Capdevila et al. [24] proposed a hybrid RS for a popular location-based social network (Foursquare); the proposed system combines the text review content and the sentiment to generate personalized suggestions. Pereira et al. [25] propose a social RS for recommending educational resources to users; the users' profiles and the educational context are extracted from Facebook social network interactions and linked data, and the recommendations are then generated according to the extracted information.
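To make the contextual pre-filtering phase mentioned above concrete, the sketch below (with hypothetical field names) keeps only the ratings observed in the current context before any classical recommender is applied.

```python
def contextual_prefilter(ratings, context):
    """ratings: list of dicts such as
    {'user': 'u1', 'item': 'i3', 'rating': 4, 'time': 'evening', 'location': 'home'}.
    context: dict of required context values, e.g. {'time': 'evening'}.
    Only ratings matching the current context are kept; a classical
    recommender is then trained on the filtered data."""
    return [r for r in ratings
            if all(r.get(key) == value for key, value in context.items())]
```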


We note that traditional approaches have given very good results. We have the content-based approach, which is based on the analysis of item and user content; the collaborative-based approach, which introduces other similar users into the process; and the demographic-based approach, which is based on users' demographic information. Each approach presents some drawbacks, and to overcome these problems a hybridization of these approaches is applied. Traditional approaches are in most cases based only on information about users and/or items; therefore, they do not use the large amount of available information. In order to improve their performance, other approaches have emerged, called modern approaches, such as the context-aware, semantic, and social network-based approaches. These approaches use, in addition to the information used in the basic approaches, additional information such as context and semantics. In the next section, we present the applications of the discussed approaches in medicine.

3 Health Recommendation Algorithms

RSs have been successfully used in several fields; in medicine they are called Health Recommender Systems (HRS). HRSs are a specialization of RSs [26], so they cannot be defined clearly and precisely [27]. According to Sezgin and Ozkan [28], HRSs are complementary tools in the decision-making process in all healthcare services. As Wiesner and Pfeifer elucidate in their paper [26], these systems are used by two kinds of end-users: health professionals and patients. In general, the goals of an HRS are to allow effective decision making [29], to supply users with medical information [26], and to reduce the cost of healthcare [27]. Despite the challenges and the difficulties of the medical domain, research on HRS is in constant evolution. This evolution has affected several medical application fields, such as drug recommendation, therapy recommendation, patient lifestyle improvement, and diagnostic recommendation. In the following paragraphs, we present some work in these different medical fields.

Drug Recommendation: In [31], the authors proposed a hybrid RS to recommend drugs for supporting practitioners' decision making in clinical prescription, by integrating an artificial neural network and case-based reasoning. The proposed framework clusters the drugs based on their remedy functions for symptoms, then relates the patient's needs to different drugs, includes patient features extracted from free text in medical records, and analyzes the effectiveness of the drug over a time period. Hu et al. [33] proposed a cloud-based clinic system integrated with an RS that provides both drug recommendation and decision support in diagnosis. They used the cloud to store the large amount of patient data produced by several hospitals, which allows exchanging patient data between them in order to recommend to one hospital the drugs widely used in many other hospitals, so as to optimize its pharmacy stock for treating a certain disease.

Therapy Recommendation: Graber et al. [32] proposed a neighborhood-based CF method in a clinical decision support system for recommending optimal personalized systemic therapy to patients, using the patients' therapy history and characteristics. They also extend the data-driven approach with a set of evidence-based contraindication rules to exclude inappropriate therapies from the recommendation list. The same researchers, in another work [34], proposed and compared collaborative and hybrid demographic-based recommender approaches to develop a therapy decision system; the proposed system recommends the best systemic therapy for a given patient using the two approaches.

Patient Lifestyle Improvement: The researchers in [37] proposed a hybrid RS to help people who want to quit smoking through a mobile application; this application sends the patients motivating messages that help them change their behaviours. To increase adults' physical activity, the authors in [36] proposed a cyber-physical RS that recommends exergames to users; the exergames are discovered by learning the types of games liked by the active user.

Diagnostic Recommendation: Hu et al. [35] proposed a top-N gene-based CF algorithm based on the gene interest of patients, to recommend to the patient the top-N genes that could be the cause of liver cancer. In [38], the authors proposed a hybrid RS for heart disease diagnosis based on deep learning methods; they used the Multiple Kernel Learning method to separate parameters between heart disease patients and normal individuals, and an Adaptive Neuro-Fuzzy Inference System to classify heart disease and healthy patients.

The collaborative-based algorithm is the most used in the medical domain, because it takes advantage of the information and the experiences of other similar users. For example, in order to cure the disease of a given patient, the doctor will try several treatments to find the best one; in the case of a new similar patient, the doctor can then directly and quickly apply the best treatment. However, this algorithm does not take into account the items' content (e.g. prescription content) or the patients' demographic data; for that reason it is usually combined with content-based and demographic-based algorithms (hybridization).

4 Conclusion

RSs are powerful tools to address the information overload problem, and they are meant to help users in different domains. In this paper, we have presented an overview of existing RSs. Some RSs are based on the content, collaborative and demographic approaches and their hybridization; others take into account supplementary information, as in the social, knowledge, context, and semantic-based approaches. We then focused our study on the application of these approaches in medicine. The medical field is known for its challenges and difficulties, which are due to the sensitivity of the domain itself and to the complexity of medical data. Despite these barriers, RSs are applied in different medical cases. In our future work, we will test and compare some recommendation approaches for predicting diseases and their causes (disease origin) in order to avoid them.


References 1. Lerato, M., Esan, O.A., Ebunoluwa, A.-D., Ngwira, S., Zuva, T.: A survey of recommender system feedback techniques, comparison and evaluation metrics (2015) 2. Adomavicius, G., Tuzhilin, A.: Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Trans. Knowl. Data Eng. 17, 734–749 (2005) 3. Pazzani, M.J.: A framework for collaborative, content-based and demographic filtering. Artif. Intell. Rev. 13, 393–408 (1999) 4. Asanov, D.: Algorithms and Methods in Recommender Systems. 7. Berlin Institute of Technology, Berlin, Germany (2011) 5. Lu, J., Wu, D., Mao, M., Wang, W., Zhang, G.: Recommender system application developments: a survey. Decis. Support. Syst. 74, 12–32 (2015) 6. Ricci, F., Rokach, L., Shapira, B., Kantor, P.B. (eds.): Recommender Systems Handbook. Springer, Boston (2011) 7. Bobadilla, J., Hernando, A., Ortega, F., Guti´errez, A.: Collaborative filtering based on significances. Inf. Sci. 185, 1–17 (2012) 8. Bobadilla, J., Ortega, F., Hernando, A., Guti´errez, A.: Recommender systems survey. Knowl. Based Syst. 46, 109–132 (2013) 9. Shu, J., Shen, X., Liu, H., Yi, B., Zhang, Z.: A content-based recommendation algorithm for learning resources. Multimed. Syst. 24, 163–173 (2018) 10. Wang, D., Liang, Y., Xu, D., Feng, X., Guan, R.: A content-based recommender system for computer science publications. Knowl. Based Syst. 157, 1–9 (2018) 11. Su, X., Khoshgoftaar, T.M.: A survey of collaborative filtering techniques. Adv. Artif. Intell. 2009, 1–19 (2009) 12. Nilashi, M., Ibrahim, O., Bagherifard, K.: A recommender system based on collaborative filtering using ontology and dimensionality reduction techniques. Expert Syst. Appl. 92, 507–520 (2018) 13. Korfiatis, N., Poulos, M.: Using online consumer reviews as a source for demographic recommendations: a case study using online travel reviews. Expert Syst. Appl. 40, 5507–5515 (2013) 14. Zhao, W.X., Li, S., He, Y., Wang, L., Wen, J.-R., Li, X.: Exploring demographic information in social media for product recommendation. Knowl. Inf. Syst. 49, 61–89 (2016) 15. Burke, R.: Hybrid recommender systems: survey and experiments. User Model. User-Adap. Inter. 12, 331–370 (2002) 16. Hendrik, H., Azzakiy, K., Utomo, A.B.: Semantic hybrid recommender system. Adv. Sci. Lett. 21, 3363–3366 (2015) 17. Pandya, S., Shah, J., Joshi, N., Ghayvat, H., Mukhopadhyay, S.C., Yap, M.H.: A novel hybrid based recommendation system based on clustering and association mining (2016) 18. Adomavicius, G.: Context-aware recommender systems. In: Recommender Systems Handbook, pp. 217–253. Springer, Boston (2011) 19. Xu, Z., Chen, L., Chen, G.: Topic based context-aware travel recommendation method exploiting geotagged photos. Neurocomputing 155, 99–107 (2015) 20. Yilong, S., Hong, L., Yuqiang, M.H.: Context-Aware Recommender Systems Based on Item-Grain Context Clustering. Springer, Heidelberg (2017) 21. Peis, E., Morales-del-Castillo, J.M., Delgado-L´ opez, J.A.: Semantic recommender systems. Analysis of the state of the topic (2008)


22. Musto, C., Lops, P., de Gemmis, M., Semeraro, G.: Semantics-aware recommender systems exploiting linked open data and graph-based features. Knowl. Based Syst. 136, 1–14 (2017) 23. Duran, D., Chanch´ı, G., Arciniegas, J.L., Baldassarri, S.: A semantic recommender system for iDTV based on educational competencies. In: Applications and Usability of Interactive TV, pp. 47–61. Springer, Cham (2017) 24. Capdevila, J., Arias, M., Arratia, A.: GeoSRS: a hybrid social recommender system for geolocated data. Inf. Syst. 57, 111–128 (2016) 25. Pereira, C.K., Campos, F., Str¨ oele, V., David, J.M.N., Braga, R.: BROAD-RSI – educational recommender system using social networks interactions and linked data. J. Internet Serv. Appl. 9, 7 (2018) 26. Wiesner, M., Pfeifer, D.: Health recommender systems: concepts, requirements, technical basics and challenges. Int. J. Environ. Res. Public Health 11, 2580–2607 (2014) 27. Calero Valdez, A., Ziefle, M., Verbert, K., Felfernig, A., Holzinger, A.: Recommender systems for health informatics: state-of-the-art and future perspectives. In: Machine Learning for Health Informatics, pp. 391–414 (2016) 28. Sezgin, E., Ozkan, S.: A systematic literature review on Health Recommender Systems (2013) 29. Bateja, R., Dubey, S.K., Bhatt, A.: Health recommender system and its applicability with MapReduce framework. In: Soft Computing: Theories and Applications, pp. 255–266. Springer, Singapore (2018) 30. Lopez-Nores, M., Blanco-Fernandez, Y., Jose, J.P.-A., Rebeca, P.D.-R.: Propertybased collaborative filtering: a new paradigm for semantics-based, health-aware recommender systems (2010) 31. Zhang, Q., Zhang, G., Lu, J., Wu, D.: A framework of hybrid recommender system for personalized clinical prescription (2015) 32. Graber, F., Malberg, H., Zaunseder, S., Beckert, S., Kuster, D., Schmitt, J., Klinik, S.A., Dermatologie, P.: Application of recommender system methods for therapy decision support. In: IEEE 18th International Conference on e-Health Networking, Applications and Services (2016) 33. Hu, S., Lu, L., Jin, X., Jiang, Y., Zheng, H., Xu, Q., Cai, F., Meng, Y., Zhang, C.: The recommender system for a cloud-based electronic medical record system for regional clinics and health centers in China (2017) 34. Gr¨ oßer, F., Malberg, H., Zaunseder, S., Beckert, S., K¨ uster, D., Schmitt, J., Abraham, S.: Neighborhood-based collaborative filtering for therapy decision support. In: 2nd International Workshop on Health Recommender Systems (2017) 35. Hu, J., Sharma, S., Gao, Z., Chang, V.: Gene-based Collaborative filtering using recommender system. Comput. Electr. Eng. 65, 332–341 (2018) 36. Agu, E., Claypool, M.: Cypress: a cyber-physical recommender system to discover smartphone exergame enjoyment. In: Proceedings of the ACM Workshop on Engendering Health with Recommender Systems (2016) 37. Hors-Fraile, S., Benjumea, F.J.N., Hern´ andez, L.C., Ruiz, F.O., Fernandez-Luque, L.: Design of two combined health recommender systems for tailoring messages in a smoking cessation app (2016). arXiv preprint: arXiv:1608.07192 38. Manogaran, G., Varatharajan, R., Priyan, M.K.: Hybrid recommendation system for heart disease diagnosis based on multiple kernel learning with adaptive neurofuzzy inference system. Multimed. Tools Appl. 77(4), 4379–4399 (2018)

Distributed Architecture of an Intrusion Detection System Based on Cloud Computing and Big Data Techniques

Rim Ben Fekih1(&) and Farah Jemili2

1 Higher Institute of Computer Science and Telecom (ISITCom), University of Sousse, Sousse, Tunisia
[email protected]
2 Modeling of Automated Reasoning Systems (MARS) Research Lab LR17ES05, Higher Institute of Computer Science and Telecom (ISITCom), University of Sousse, Sousse, Tunisia
[email protected]

Abstract. An efficient intrusion detection system (IDS) requires more than just a good machine learning (ML) classifier. Current IDSs offer a limited perspective in handling alert databases: these databases must be local and structured in order to be referenced by the IDS, which is an obsolete approach for dealing with advanced attacks and intrusions on distributed systems. With the emergence of big data, cyber-attacks have become a concerning issue worldwide. In situations where data security is paramount, swiftness becomes an obligation in processing and analytic operations. In that respect, cloud-computing services can deal efficiently with big data issues; they offer the storage and distributed analysis featured in our paper. To handle a large scale of alert data, we propose a new distributed IDS model that solves data storage problems, combines multiple heterogeneous sources of alert data, and makes data treatment much faster than local IDSs. For this purpose, this paper presents an IDS approach using Databricks as a cloud environment and Spark as a big data analysis tool.

Keywords: Intrusion detection · Spark · Cloud · Databricks · DBFS · Machine learning · Naïve Bayes

1 Introduction

Since the appearance of computer networks, hackers have endeavored to penetrate networks to steal valuable information or disrupt computer resources. Intrusion detection systems (IDSs) have played a critical role in ensuring the safety of networks for all users, but the nature of this role has changed in recent history [1]. Previous well-known events have shown that IDSs would be more effective in real time, and the analysis of cyber threats could be improved by correlating security events from numerous heterogeneous sources, as mentioned by Zuech et al. [2]. However, big data challenges and the exponential growth of data lead to heavy data storage and analysis limitations. To face those limitations, the power of cloud computing technology was proposed by previous


researchers as a solution to speed up computation and to access an overwhelming capacity of data storage. In recent years, from a computation-tools perspective, Spark, an open source software framework, has been used to perform advanced analysis. In addition, Spark orchestrated in a cloud environment can achieve even better results in terms of rapid analysis and storage in a cloud-based distributed file system, which provides scalable and reliable data storage for managing large quantities of alert data to determine the presence of attacks or malicious activities. For this purpose, we propose an IDS approach based on cloud computing and big data techniques. We use three intrusion detection datasets, NSL-KDD, MAWILab and DARPA'99, which we combine in order to cover a variety of intrusions, to obtain a single homogeneous dataset, and finally to obtain a realistic intrusion detection rate. We choose Databricks as a unified cloud environment and the Databricks File System (DBFS) to load our datasets. We follow an ML pipeline model: load, process and train the model with a Naïve Bayes machine-learning algorithm to obtain the classification rate of each attack type. The remainder of this paper is organized as follows. Section 2 discusses related work and background. Section 3 describes Databricks and its functionalities, while Sect. 4 describes the intrusion detection datasets used in our research. In Sect. 5, the proposed approach is elaborated. The experimental results are presented in Sect. 6. Finally, Sect. 7 provides the conclusion of the paper and offers perspectives for future research.

2 Related Work and Background

The purpose of this section is to present a brief background on IDSs, an insight into the big data challenges facing intrusion detection, and how big data technologies and cloud computing can be utilized to address them. An IDS is a mechanism for detecting abnormal or suspicious activities on an analyzed target (a network or a host). In 1994, a study by Frank [3] showed that big data is a major challenge for intrusion detection; he also focused on improving detection accuracy by adopting data mining techniques and feature selection to achieve real-time detection. Several researchers have utilized big data technologies to treat analysis problems. For example, Akbar et al. [4] proposed a system based on big data analytics to maintain security across heterogeneous data and to correlate it from different sources using a hybrid strategy. Reghunath [5] designed a real-time intrusion detection system based on anomaly detection, which evaluates data and issues alert messages based on abnormal behavior; the idea is to automatically store and monitor the log against an existing intrusion dictionary when a real-time cyber-attack occurs. Essid and Jemili [6] combined alert datasets (KDD99, DARPA) and removed their redundancy; in addition, they applied MapReduce operations with Hadoop to obtain a single dataset. Their main goal was to improve the detection rate and decrease false negatives. On the other hand, Elayni and Jemili [7] added a third dataset and worked in a local environment with MapReduce under MongoDB.


They aimed to merge unstructured and heterogeneous datasets and to improve the intrusion detection rate. As discussed above regarding big data challenges, the systems in [4–7] are limited by their local architecture. Among these limits we find limited storage capacity, processing-speed problems, and limited access to stored data (e.g. data stored on local disks). Consequently, the cloud becomes an inevitable alternative for unlocking the potential of big data. Cloud computing influences data management and processing [1]: not only does it move infrastructure and computation to the network, it also supplies software and hardware resource management with reduced costs. Furthermore, it has led to a large emergence of programming frameworks such as Hadoop, Spark, and Hive for complex and large datasets. Using these tools, numerous studies have been performed in cloud environments. Esteves et al. [8] used cloud computing for distributed k-means clustering; the authors chose a large dataset to simulate big data challenges, and tests were executed using Mahout and Hadoop to solve data-intensive problems while running on Amazon EC2 for the computation tasks. In our work, we chose Databricks as a cloud environment to process and manage data. More details about Databricks are given in Sect. 3.

3 About Databricks

Databricks [9] was founded by the team that created Apache Spark™, the most active open source project in today's big data ecosystem. Databricks is a cloud-based data platform designed to ease the creation and deployment of advanced analysis solutions; it also provides the Databricks Community Edition as a free version. Its most important characteristic is that Databricks provides a unified ecosystem with orchestrated Apache Spark [10] for implementation, development and scaling. In addition, it provides easy and swift access to data, with ingestion of non-traditional data storage based on cloud computing. Databricks integrates with Amazon S3 for storage: S3 buckets can be mounted into the Databricks File System (DBFS) and the data can be read into a Spark application as if it were on the local disk [11].

3.1 Apache Spark

Apache Spark was developed at the University of California at Berkeley by the AMPLab and is today a project of the Apache Foundation, as an open source big data framework. We chose to program with Spark instead of Hadoop because [12, 13]:
• Spark is faster than Hadoop (up to 100× faster in memory and 10× faster for disk access), because Spark reduces read/write iterations to disk and stores intermediate data in memory.
• Spark is easier to program.
• Spark is able to process, with low latency, real-time streams coming from different sources that generate millions of events per second (Twitter, Facebook), unlike Hadoop, which is designed for batch-mode processing.


• Spark acts as its own flow scheduler (due to in-memory computation).
• Spark provides MLlib, the Apache Spark machine-learning library, whose goal is to make ML practical, scalable and easy.

3.2 Distributed Processing with Fast Data Access

Thanks to parallel distributed processing, Spark simplifies big data implementation and analytics. MapReduce is a great solution for one-pass computations, but it is less efficient for use cases that require multi-pass computations and algorithms (it is slow due to replication and disk storage). Spark presents several advantages compared to other technologies like Hadoop and Storm; for example, Spark enhances MapReduce with in-memory data storage, making the treatment less costly and much faster [14].

3.3 ML Pipeline

Machine learning pipelines are mainly inspired by the scikit-learn project [15]. The basic concepts of a pipeline are the following:
• DataFrame: DataFrames are used as learning datasets and can contain heterogeneous data types.
• Transformer: an algorithm that can transform attributes into predictions.
• Estimator: an algorithm that produces a Transformer from a DataFrame (e.g. a learning algorithm).
An ML application can chain several steps to form a workflow or Pipeline (e.g. a text classification application: divide the text of each document into words, convert the words of each document into a numeric vector, and make predictions after learning a model from the dataset). These steps (Fig. 1) are executed in order to transform an input DataFrame through each stage; a minimal PySpark sketch of such a pipeline is given after Fig. 1.

Fig. 1. Machine learning pipeline stages
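The following PySpark sketch shows what such a pipeline could look like for the word-count-style example above; the column names (`doc`, `label`) and the choice of stages are illustrative assumptions, not the exact pipeline of our system.

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer, HashingTF
from pyspark.ml.classification import NaiveBayes

spark = SparkSession.builder.appName("pipeline-sketch").getOrCreate()

# Transformer: split each document into words.
tokenizer = Tokenizer(inputCol="doc", outputCol="words")
# Transformer: convert the words of each document into a numeric feature vector.
hashing_tf = HashingTF(inputCol="words", outputCol="features")
# Estimator: learns a model (a Transformer) from the training DataFrame.
nb = NaiveBayes(featuresCol="features", labelCol="label")

pipeline = Pipeline(stages=[tokenizer, hashing_tf, nb])
# model = pipeline.fit(training_df)        # training_df: DataFrame with 'doc' and 'label'
# predictions = model.transform(test_df)   # adds a 'prediction' column
```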


4 Intrusion Detection Datasets

To build our intrusion detection system, we chose the NSL-KDD [16], DARPA'99 [17] and MAWILab [19] datasets.

4.1 NSL-KDD

NSL-KDD contains four attack classes: DoS, Probe, U2R, and R2L:
• Denial of service attack (DoS): the goal of this attack is to render a service unavailable and subsequently prevent legitimate users of a service from using it. It may include flooding a network to prevent its operation, disrupting connections between two machines, or preventing access to a particular service.
• Probing attack: the kind of attack in which the attacker scans a machine or network device to determine weaknesses or vulnerabilities that can be exploited later to compromise the system. This technique is commonly used to gather information.
• User to Root attack (U2R): an exploit class in which the attacker gets access to a system and exploits a certain vulnerability to obtain root access.
• Remote to Local attack (R2L): this kind of attack occurs when an attacker who has the ability to send packets to a machine on a network, but does not have an account on that machine, exploits a certain vulnerability to obtain local access as a user of that machine.

4.2 DARPA Dataset

The DARPA'99 traces were generated by MIT Lincoln Labs [17]. This dataset is the one most frequently used in various intrusion detection studies. DARPA is grouped into files recorded over five weeks: three weeks for training and two weeks for testing. We chose to train our model with records from the first week of the DARPA'99 dataset, in particular because it contains SSH and NOTSSH connections [18].

4.3 MAWILab Dataset

MAWILab [19] is an available database for anomaly detection. This database classifies anomalies according to a taxonomy [20] containing 11 identifying labels: DoS, network scan ICMP, network scan UDP, network scan TCP, multi points, HTTP, alpha flow, IPv6 tunneling, port scan, unknown and other. In addition, it is updated daily, so we chose to train our model with the latest datasets, those recorded during December 2016 and all available recordings during 2017, which we merged into a single dataset.

5 Proposed Approach

In this section we present our proposed approach (Fig. 2). We use the pandas and scikit-learn APIs for the transformations described below.


Fig. 2. ML architecture as a distributed system

Our first objective is to load the intrusion detection datasets NSL-KDD, MAWILab and DARPA'99 into DBFS. Then, our work is divided into three major steps:
• Extract and transform data.
• Normalize features and data.
• Train and evaluate the model.

5.1 Extract and Transform Data

In this phase, we eliminate the redundancy and join the alert datasets; a minimal sketch of both operations follows below.

Remove Redundancy. We remove duplicates from the proposed alert datasets so that the classifiers will not be biased towards the more frequent records and the detection rate performance will increase. This step is realized with the DataFrame.drop_duplicates method, which returns a DataFrame without redundant lines.

Join Datasets. There are four methods for joining datasets: left, right, outer, or inner join (inner by default). We use a full outer join to completely merge the datasets, via the DataFrame.merge(right, how='outer') method, where right is the DataFrame to merge with and the how parameter specifies the type of join, in our case a full outer join. This function merges two tables, making the join according to one or more columns in common.
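A minimal pandas sketch of these two operations is given below; the file paths and the implied common columns are assumptions for illustration, not the exact schemas of the three datasets.

```python
import pandas as pd

# Hypothetical paths to alert datasets already uploaded to DBFS.
nsl_kdd = pd.read_csv("/dbfs/datasets/nsl_kdd.csv")
mawilab = pd.read_csv("/dbfs/datasets/mawilab.csv")

# Remove redundancy: drop duplicated alert records from each dataset.
nsl_kdd = nsl_kdd.drop_duplicates()
mawilab = mawilab.drop_duplicates()

# Join datasets: full outer join on the columns the two DataFrames share.
merged = nsl_kdd.merge(mawilab, how="outer")
```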

5.2 Features and Data Normalization

Before the normalization step, some of our features are textual, and we need them to be numerical in order to train our model. Therefore, we convert these features to numerical values, in an order consistent with the feature meanings. Then, we split our data into two DataFrames: the first contains the class (attack type), and the second contains the other features. Finally, we normalize each feature to have unit variance; a small scikit-learn sketch of this step follows.
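The sketch below illustrates this step with scikit-learn; the file path and the column names are assumptions for illustration (the default ordinal encoding is alphabetical, whereas our implementation orders the codes by feature meaning).

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

merged = pd.read_csv("/dbfs/datasets/merged_alerts.csv")  # hypothetical path to the joined data

# Split the merged data into the class (attack type) and the other features.
y = merged["attack_type"]                                 # hypothetical label column name
X = merged.drop(columns=["attack_type"])

# Convert textual features (protocol, service, flags, ...) to numerical codes.
text_cols = X.select_dtypes(include="object").columns
X[text_cols] = OrdinalEncoder().fit_transform(X[text_cols])

# Normalize each feature to have unit variance.
X = StandardScaler().fit_transform(X)
```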

5.3 Train and Evaluate Model

We use the training dataset to fit and evaluate our model; the test dataset is then used to make predictions. This step gives us an idea of the performance and robustness of the model. We chose to train our model with the Naïve Bayes algorithm, specifically Bernoulli Naïve Bayes. Before evaluating the model and retrieving the intrusion detection errors and accuracy values, we can tune its parameters in order to improve the results. Parameter tuning is the task of adjusting the parameters of a learning or prediction system: the idea is to split the data into k sets and train multiple models with different parameters. Finally, we proceed to testing and we keep the parameters that give the best results (a sketch of this step is shown below).
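The following scikit-learn sketch illustrates this training and tuning step on the normalized data from the previous sketch; the grid of alpha values is an assumption (chosen to include the values reported in Table 2), not the exact search we ran.

```python
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import BernoulliNB

# X, y: normalized features and attack labels from the previous sketch.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# k-fold search over the smoothing parameter alpha (the parameter tuned in Table 2).
search = GridSearchCV(BernoulliNB(), {"alpha": [0.0, 1.0, 100.0, 10_000.0]}, cv=5)
search.fit(X_train, y_train)

print("best alpha:", search.best_params_["alpha"])
print("held-out accuracy:", search.best_estimator_.score(X_test, y_test))
```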

6 Experimentation and Results

In our experiments, as stated above, we use Databricks as a cloud environment to upload the datasets and analyze them with the Naïve Bayes algorithm. The experimentation environment is the Databricks Community Edition, which gives access to Amazon EC2 with Spark nodes already configured but provides only 6 GB of storage, limiting the size of the cluster available for the experiment.

6.1 The Naïve Bayes Classifier

The Naïve Bayes classifier is a simple probabilistic Bayesian method based on Bayes' theorem with a strong (naive) independence assumption between attributes: the probability of one attribute does not affect the probability of another. Taking into account a series of n attributes, Naïve Bayes makes 2n! independent assumptions.
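For reference, the standard naive Bayes decision rule implied by this independence assumption can be written as:

```latex
\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c),
\qquad
P(x_1,\dots,x_n \mid c) = \prod_{i=1}^{n} P(x_i \mid c).
```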

6.2 Evaluation

After merging our intrusion detection datasets, we obtain a voluminous database. Table 1 gives the number of records in the training and testing datasets.

Table 1. Number of records
                      Train     Test
Number of instances   845 721   351 296

Before computing the intrusion detection rate, we tune our parameters to increase detection accuracy. This operation consists of tuning the parameters of a learning or prediction system in order to improve the results. It is commonly done by training multiple models using different parameters on one set of data and then testing those models on another held-out set of data. Table 2 shows the results before and after tuning.

Table 2. Tuning parameters
           Parameter (alpha)   Training   Test
Original   0                   0.82       0.85
Tuned      10 000              0.96       0.97

Table 3 shows the detection rate of our approach. The lower performance of our intrusion detection system on some classes may be explained by the low proportion of those attacks.

Table 3. Detection rate
Connection type   Detection rate   False positive rate
Normal            95%              –
NOTSSH            97%              0%
Suspicious        100%             0%
Anomalous         100%             0%
Probe             61%              1%
DoS               76%              0.09%
U2R               3%               0.07%
R2L               83%              0.01%
SSH               100%             2%

It should be clarified that DoS and Probe attacks are well classified by most machine learning algorithms. However, the U2R attack category presents poor detection rates, as this type of attack is embedded in the data packets themselves; consequently, its detection becomes a difficult task. In addition, Table 3 presents the performance of our model in terms of false positive rates. It is true that the datasets analyzed by Essid and Jemili [6] are combined with MapReduce, which can be set within a distributed architecture; however, the analysis itself is done with Weka, on a local system, which suffers from slow execution times, as we show in the next paragraph (Table 4).

Table 4. Execution times of Weka and of our system
            Weka (sec)   Our system (sec)
KDD         165          9.36
DARPA'99    26           5
MAWILab     –            4.54
Total       191          18.9


To be most effective while using Weka, we would still have to check for intrusions on every available dataset, one execution at a time. We solve this issue by fusing these datasets, since Spark can handle them this way with small execution times as well (Table 5).

Table 5. Runtime per operation
Operations                           Spark in cloud (sec)
Eliminate redundancy of 3 datasets   1.21
Join datasets                        1.35
Train model                          2.25
Inference                            2.80
Total                                7.61

7 Conclusion

In this paper we achieved a successful combination of cloud computing, Spark and intrusion detection datasets to build a distributed IDS, obtaining several benefits. After merging the NSL-KDD, MAWILab and DARPA'99 datasets, we implemented the Naïve Bayes algorithm to train our model. The main achievements of our work on intrusion detection are the storage of datasets in the cloud, which gives us a distributed system, and the use of Spark's power to join and analyze large and heterogeneous intrusion datasets. The Naïve Bayes classifier shows good performance, especially when dealing with intrusions having a high number of records. However, one remaining issue in our approach is the speed of data analysis with Spark: the proposed IDS architecture uses only one cluster. In future work, we will perform our dataset analysis with several clusters to achieve faster results. In addition, we will develop our approach with other classifiers to get better results.

References 1. Keegan, N., Ji, S.-Y., Chaudhary, A., Concolato, C., Yu, B., Jeong, D.H.: A survey of cloudbased network intrusion detection analysis. Hum.-Centric Comput. Inf. Sci. 6(1), 19 (2016) 2. Zuech, R., Khoshgoftaar, T.M., Wald, R.: Intrusion detection and big heterogeneous data: a survey. J. Big Data 2(1), 3 (2015) 3. Frank, J.: Artificial intelligence and intrusion detection: current and future directions. In: Proceedings of the 17th National Computer Security Conference, vol. 10, pp. 1–12 (1994) 4. Akbar, S., Srinivasa Rao, T., Ali Hussain, M.: A hybrid scheme based on big data analytics using intrusion detection system. Indian J. Sci. Technol. 9, 33 (2016) 5. Reghunath, K.: Real-time intrusion detection system for big data. Int. J. Peer to Peer Netw. (IJP2P) 8(1) (2017)


6. Essid, M., Jemili, F.: Combining intrusion detection datasets using MapReduce. In: Proceedings of the 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC 2016); 10/2016 - Budapest, Hungary 7. Elayni, M., Jemili, F., Using MongoDB databases for training and combining intrusion detection datasets. In: Lee, R. (ed.) Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, pp. 17–29. Springer International Publishing. ISBN: 978-3-319-62048-0. https://doi.org/10.1007/978-3-319-62048-0_2 8. Esteves, R.M., Pais, R., Rong, C.: K-means clustering in the cloud—a mahout test. In: Proceedings of the 2011 IEEE Workshops of International Conference on Advanced Information Networking and Applications, WAINA ’11. IEEE Computer Society, pp. 514– 519 (2011) 9. Ghodsi, A.: The databricks unified analytics platform (2017). https://databricks.com/ 10. Zaharia, M., Xin, R.S., Wendell, P., Das, T., Armbrust, M., Dave, A., Meng, X, Rosen, J., Venkataraman, S., Franklin, M.J., and others. Apache Spark: a unified engine for big data processing. Commun. ACM 59(11), 56–65 (2016) 11. Brugier, R.: A tour of Databricks Community Edition: a hosted Spark service (2016). https:// web.cs.dal.ca/*riyad/Site/Download.html 12. Stockinger, K.: Brave New World: Hadoop vs. Spark. Datalab Seminar (2015) 13. Pan, S.: The Performance Comparison of Hadoop and Spark. St. Cloud State University, St. Cloud (2016) 14. Nathon: Apache Spark setup. nathontech (2015). https://nathontech.wordpress.com/2015/11/ 16/apache-spark-setup/ 15. David Cournapeau: Scikit-learn (2017). http://scikit-learn.org/stable/ 16. University of New Brunswick: UNB datasets (2017). http://www.unb.ca/cic/datasets/nsl. html 17. Lincoln Laboratory: DARPA 99 (2017). https://web.cs.dal.ca/*riyad/Site/Download.html 18. DARPA 99 Homepage. https://web.cs.dal.ca/*riyad/Site/Download.html 19. Fontugne, R., Borgnat, P., Abry, P., Fukuda, K.: MAWILab: combining diverse anomaly detectors for automated anomaly labeling and performance benchmarking. In: ACM CoNEXT ’10, p. 12. Philadelphia, PA (2010) 20. Mazel, J., Fontugne, R., Fukuda, K.: A taxonomy of anomalies in backbone network traffic. In: Proceedings of 5th International Workshop on TRaffic Analysis and Characterization (TRAC 2014), pp. 30–36. http://www.fukuda-lab

An Affective Tutoring System for Massive Open Online Courses

Mohamed Soltani1, Hafed Zarzour1(&), Mohamed Chaouki Babahenini2, and Chaouki Chemam3

1 LIM Laboratory, Department of Computer Science, University of Souk Ahras, 41000 Souk Ahras, Algeria
[email protected], [email protected]
2 University of Mohamed Khider, Biskra, Algeria
3 Department of Computer Science, University of El-Tarf, 36000 El-Taref, Algeria

Abstract. In recent years, the concept of the Massive Open Online Course (MOOC) has been widely regarded as a new, innovative and creative model for free online learning with large-scale participation from the most prestigious universities around the world. On the other hand, intelligent tutoring systems (ITS) have been developed to support one of the most successful educational forms, individual teaching. Recent research demonstrates that emotions can influence human behavior and learning; as a result, a new generation of ITS was born, the Affective Tutoring System (ATS). However, there is no study showing the importance of using ATSs in MOOCs. Therefore, this paper presents a novel approach for developing an affective tutoring system for MOOCs, called ATS-MOOCs. Such a system can easily help students to improve their learning performance by recognizing their affective states and then adapting the MOOC content accordingly. A prototype was developed and a case study is presented to demonstrate the feasibility of the proposed approach.

Keywords: Massive open online course · MOOC · Affective tutoring system · Emotion detection · Emotional awareness · Intelligent tutoring system · Facial expression

1 Introduction

Intelligent Tutoring Systems (ITS) are computer-based learning environments, descended from Intelligent Computer-Assisted Learning, designed to provide immediate, personalized instruction or feedback to students. They were developed to overcome the limitations of Computer-Assisted Learning by using artificial intelligence to give the system more flexibility and interactivity, so that it can adapt to the specific needs of learners by evaluating and diagnosing their problems and providing them with the necessary help [1]. The underlying idea is to simulate the behavior of a human tutor in his capacity as both an expert pedagogue and an expert in the domain. Thus, just like a tutor, software of this type has the potential to lead the learner through a task and provide pertinent


feedback on their actions (personalized content, feedback, navigation, etc.). ITSs need to place the learner at the center of the learning process [2]. Adaptation is made possible by the information integrated into the traditional architecture of an ITS, as shown in Fig. 1. Such an architecture is composed of the following models:
– The domain model, which allows the system to know what the learner must learn about the subject taught and how to use the application in an optimal way. This model must be defined by one or more domain experts.
– The learner model, which captures the state of the learner's knowledge at a specific moment.
– The pedagogical model, which makes teaching choices according to the students' behavior and the learner model.
– An interface module, which transmits and decodes information between the system and the user and vice versa.

Fig. 1. ITS traditional architecture.

The domain of ITS has inherited ideas from learning theories such as cognitivism and constructivism, focusing on the processes through which learners study. Recently, many researchers have added a new component, emotion, to the cognitive process, because studies have shown that there is an important relationship between emotions and learning. Since the publication of the book "Affective Computing" by Picard [3], affective computing has been considered a branch of artificial intelligence focused on designing intelligent environments. Such environments can reproduce some intelligent behaviors, including recognizing and explaining human emotions. Nowadays, the idea of affective computing is used in the development of new ITSs, opening a new era of learning environments [4]. The inclusion of emotional information in ITSs has given rise to a new generation called the Affective Tutoring System (ATS). An ATS can be defined as an ITS with the ability to take into consideration the knowledge state as well as the affective state of the students, with the intention of individualizing the learning [4, 5]. Generally, the following additional components are introduced into the architecture of the ITS:


– Emotional recognition, which includes the detection and analysis of the learner's specific characteristics, such as facial expression, voice and gestures, followed by the application of a classification tool to identify the emotion.
– An emotional or affective response module, viewed as a part of the pedagogical module [6], which reasons about the current situation and the learners' emotional state [7].

On the other hand, the MOOC has been introduced as a new, innovative and creative model for free online learning with large-scale participation, offered by the most prestigious universities around the world. However, no study has yet shown the importance of using an ATS in MOOCs. Therefore, this paper presents a novel approach for developing an affective tutoring system for MOOCs, called ATS-MOOCs. Such a system can help students improve their learning performance by recognizing their affective states and then adapting the MOOC content accordingly.

The remainder of this paper is organized as follows. In Sect. 2, we present some related work. In Sect. 3, we describe the main components of the proposed approach. In Sect. 4, we present a case study showing both the implementation and the use of the proposed approach. Finally, Sect. 5 provides our conclusions and possible future work.

2 Related Work

2.1 Emotion and Education

Emotion [8] is an important concept in the field of teaching and learning, and more specifically in new online teaching systems. It plays a vital role in how a student learns or understands new things. Each emotional state can be negative or positive. For example, a teacher can capture the students' disgust in order to adapt the learning content to that emotional state. A negative emotion among students can prompt the instructor to make the course more interesting, while, for instance, a state of fear could encourage some students to study more [9]. Many studies have highlighted the importance of emotions in the context of learning [10–12]. Based on these studies, it can be inferred that managing the emotional state of the learner, as well as of the teacher, has an effect on learning. For example, an educator can provide immediate emotional feedback to the students, which helps ensure emotional safety [13]. Good emotional feedback leads to an improvement in the students' emotional state [14]. Hence, it is important for instructors to give their students the most appropriate emotional feedback [15]. For instance, applause, which is considered an emotional reaction, can reduce the negative emotional states of learners [16].

2.2 MOOC

Progress in technology has influenced the development of online learning systems by bridging the geographical distance between teachers, learners and organizations, which helps increase learners' knowledge in various fields. Nowadays,


a huge number of universities around the world are trying to meet this high level of demand using MOOCs. The notion of MOOC was coined in 2008 by George Siemens and Stephen Downes after completing the online course CCK08 [17]. Many MOOC e-learning platforms have emerged as an alternative to traditional learning methods. There are now several platforms that provide MOOCs from the most prestigious universities, such as Coursera with 10.5 million registered users, edX with 3 million and Udacity with 1.5 million [18]. The xMOOCs and the cMOOCs are the heirs of the OpenCourseWare movement and of the MOOCs of George Siemens and Stephen Downes, respectively [19, 20].

3 Proposed Approach

The architecture of our ATS-MOOCs includes five functional layers, each with a number of subcomponents. These layers are: (1) Student, (2) Interface, (3) Network, (4) Tools & Applications and (5) Data, as shown in Fig. 2.

Fig. 2. Architecture of the proposed solution.

3.1 Student Layer

At the top of the architecture, we find the student layer, corresponding to the set of learners who follow the online MOOC by accessing the course according to the calendar set by the teacher.

3.2 Interface Layer

This is the learner's interface, through which he/she can interact with the MOOC via different devices, such as a laptop or a smartphone. Regardless of their type, these devices are considered the virtual workspace of the students. In addition to the webcam embedded in the devices, this layer includes two other components: the adaptive course and the pedagogical agent.

3.3 Network Layer

Our system uses the Internet as the communication medium to deliver the courses to the learners and then retrieve data from them.

3.4 Tools and Applications Layer

The tools and applications layer is the most interesting part of the overall architecture. It consists of the online MOOC, which uses a webcam, and an emotional recognition tool. The emotional recognition tool is used for:
– analyzing the students' facial expressions;
– classifying their emotional states.

The facial emotion recognition tool interprets the emotional state of learners during their interactions with the MOOC content. This triggers timely feedback, for example making the learners aware of their emotional states or adapting the course. Through this mechanism, we can retrieve factors influencing both the learners' performance and their learning process [21–23]. The tool is able to distinguish the following emotions: happiness, neutrality, sadness, surprise, anger, fear, scorn and disgust. These states are grouped into two categories: positive and negative emotions.

The rules engine component handles the didactic rules and triggers the relevant ones to provide feedback by adapting the course content according to a specific didactic approach based on well-defined rules (a minimal sketch of such a rule is given below). For example, following a negative emotional state, the engine will choose a course whose objective is one of:
– Familiarization: it aims to accustom the learner to manipulating a concept;
– Clarification: it corresponds to the need to clarify or elucidate a concept;
– Reinforcement: it corresponds to the consolidation of the concept.
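The paper does not give the concrete form of these rules, so the following Java fragment is only a hedged illustration of how a rules engine of this kind could map a detected emotion category to one of the three remediation objectives; the type names, the CONTINUE objective and the use of a failed-attempts counter are assumptions, not the authors' actual rule base.

```java
// Illustrative sketch only: a minimal rule mapping from the detected emotional
// state to the adaptation objectives described in the text. Enum values and the
// selection policy are assumptions.
enum EmotionCategory { POSITIVE, NEGATIVE }
enum CourseObjective { CONTINUE, FAMILIARIZATION, CLARIFICATION, REINFORCEMENT }

class DidacticRulesEngine {
    /**
     * Chooses the next course objective from the emotion category and the
     * number of failed attempts on the current concept (hypothetical inputs).
     */
    CourseObjective chooseObjective(EmotionCategory category, int failedAttempts) {
        if (category == EmotionCategory.POSITIVE) {
            return CourseObjective.CONTINUE;          // no adaptation needed
        }
        if (failedAttempts == 0) {
            return CourseObjective.FAMILIARIZATION;   // first contact with the concept
        } else if (failedAttempts == 1) {
            return CourseObjective.CLARIFICATION;     // the concept needs to be elucidated
        } else {
            return CourseObjective.REINFORCEMENT;     // consolidate the concept
        }
    }
}
```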


Figure 3 shows the flow chart of the learning process and the emotion recognition. Many studies have shown the importance of using a pedagogical agent, especially when providing feedback to learners, to improve not only their motivation and affective states but also their metacognitive skills [24, 25]. The web service component sends the feedback about the course content presented to the learner and transmits the behavior of the pedagogical agent in the current situation. At this level, the learners receive feedback based on their facial expressions.

3.5 Dataset Layer

The dataset layer consists of the repositories that store information about the learners' profiles as well as their emotions. Other kinds of data are also stored, such as statistics for the analysis of emotions and the evaluation of the educational content.

Fig. 3. Flow chart of learning process and emotion recognition.


4 A Case Study

A prototype of ATS-MOOCs was developed to conduct a computer science MOOC. The subject of the selected MOOC was "Introduction to algorithms". The MOOC was divided into ten lessons according to a specific agenda defined by an experienced teacher. For the emotional recognition of the learner, we used a tool that extracts the facial expression from images captured by the webcam during the learning activity. We then classified the emotions using the API provided by Microsoft, named Project Oxford, which allowed us to build a more personalized application. In the recognition stage, the system identifies the students' faces and then measures their emotions according to the following affective states: happiness, sadness, surprise, anger, fear, scorn, disgust or neutrality. For the pedagogical agent, we used another technology developed by Microsoft, called Microsoft Agent. The pedagogical agent gives additional information to the learners about any topic and informs them about their emotional state.
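The paper does not detail how the per-emotion scores returned by the recognition service are turned into the positive/negative categories used by the rules engine. The sketch below shows one hedged way to do it; the map-of-scores input shape, the threshold-free "dominant emotion" rule and the choice of which emotions count as positive are assumptions, not the actual Project Oxford response format or the authors' grouping.

```java
import java.util.Map;
import java.util.Set;

// Hedged sketch: pick the dominant affective state from per-emotion scores and
// map it to the positive/negative grouping mentioned in Sect. 3.4. The input
// format and the POSITIVE set are assumptions.
class EmotionClassifier {
    static final Set<String> POSITIVE = Set.of("happiness", "neutrality", "surprise");

    static String dominantEmotion(Map<String, Double> scores) {
        return scores.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse("neutrality");
    }

    static boolean isNegative(Map<String, Double> scores) {
        return !POSITIVE.contains(dominantEmotion(scores));
    }
}
```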

5 Conclusion and Discussion

In this study, we have presented ATS-MOOCs, an affective tutoring system for MOOCs. It aims at guiding learners during their learning process in order to improve their learning performance. The proposed system adopts an efficient mechanism to recognize the affective states of students from their facial expressions and then adapts the MOOC content accordingly. A prototype was developed and a case study was presented to demonstrate the feasibility of the proposed approach. Moreover, our system operates according to the proposed architecture, based on feedback generated by combining the emotional recognition and the adaptation engine, which applies well-defined rules to ensure:
– the possibility of reaching an optimal emotional state for obtaining better learning results [26, 27];
– the recognition of all negative emotions, so as to provide feedback that minimizes their negative effects on students, which can help increase learners' motivation [28];
– the reduction of the risks of uncertainty and of unnecessary interventions of the ITS, together with improved adaptation capacities of the system [29];
– the automatic generation of more effective decisions that imitate those provided by human teachers [30];
– the promotion of learner involvement and confidence [31–35], in order to reduce the abandonment rate observed in traditional MOOCs.

Future studies are encouraged to integrate the functionalities provided by our system into existing MOOC platforms such as edX and Coursera, and to use Linked Data as an educational technology to improve the students' learning performance [36, 37].


References 1. Xhakaj, F., Aleven, V., McLaren, B.M.: Effects of a dashboard for an intelligent tutoring system on teacher knowledge, lesson plans and class sessions. Artifi. Intell. Educ., 582–585 (2017) 2. Bradáč, V., Kostolányová, K.: Intelligent tutoring systems. E-Learn., E-Educ., Online Training, 71–78 (2016) 3. Picard, R.W.: Affective Computing. MIT Press, Cambridge, MA (1997) 4. Petrovica, S., Anohina-Naumeca, A., Ekenel, H.K.: Emotion recognition in affective tutoring systems: collection of ground-truth data. Procedia Comput. Sci. 104, 437–444 (2017) 5. Khalfallah, J., Slama, J.B.H.: Facial expression recognition for intelligent tutoring systems in remote laboratories platform. Procedia Comput. Sci. 73, 274–281 (2015) 6. Kaklauskas, A., Kuzminske, A., Zavadskas, E.K., Daniunas, A., Kaklauskas, G., Seniut, M., Raistenskis, J., Safonov, A., Kliukas, R., Juozapaitis, A., Radzeviciene, A., Cerkauskiene, R.: Affective tutoring system for built environment management. Comput. Educ. 82, 202– 216 (2015) 7. Thompson, N., McGill, T.J.: Genetics with Jean: the design, development and evaluation of an affective tutoring system. Educ. Technol. Res. Dev. 65(2), 279–299 (2016) 8. Stillman, S.B., Stillman, P., Martinez, L., Freedman, J., Jensen, A.L., Leet, C.: Strengthening social emotional learning with student, teacher, and schoolwide assessments. J. Appl. Dev. Psychol. 55, 71–92 (2017) 9. Staus, N.L., Falk, J.H.: The role of emotion in informal science learning: testing an exploratory model. Mind, Brain, Educ. 11(2), 45–53 (2017) 10. García-Peñalvo, F.J., Hermo, V.F., Blanco, Á.F., Sein-Echaluce, M.: Applied educational innovation MOOC. In: Proceedings of the Second International Conference on Technological Ecosystems for Enhancing Multiculturality - TEEM’14 (2014) 11. Kim, C., Hodges, C.B.: Effects of an emotion control treatment on academic emotions, motivation and achievement in an online mathematics course. Instr. Sci. 40(1), 173–192 (2011) 12. Munoz-Merino, P.J., Fernandez Molina, M., Munoz-Organero, M., Delgado Kloos, C.: Motivation and emotions in competition systems for education: an empirical study. IEEE Trans. Educ. 57(3), 182–187 (2014) 13. Feidakis, M., Caballé, S., Daradoumis, T., Jiménez, D.G., Conesa, J.: Providing emotion awareness and affective feedback to virtualised collaborative learning scenarios. Int. J. Cont. Eng. Educ. Life-Long Learn. 24(2), 141–167 (2014) 14. Bahreini, K., Nadolski, R., Westera, W.: FILTWAM - a framework for online affective computing in serious games. Procedia Comput. Sci. 15, 45–52 (2012) 15. Jennings, P.A.: CARE for teachers: a mindfulness-based approach to promoting teachers’ social and emotional competence and well-being. In: Handbook of Mindfulness in Education, pp. 133–148 (2016) 16. Reguera-Alvarado, N., de Fuentes, P., Laffarga, J.: Does board gender diversity influence financial performance? evidence from Spain. J. Bus. Ethics 141(2), 337–350 (2015) 17. Fini, A.: The technological dimension of a massive open online course: the case of the CCK08 course tools. Int. Rev. Res. Open and Distrib. Learn. 10(5) (2009) 18. Sanchez-Gordon, S., Lujan-Mora, S.: Adaptive content presentation extension for open edX. Enhancing MOOCs accessibility for users with disabilities. In: 2015 8th International Conference on Advances in Computer-Human Interactions, February 2015


19. Fidalgo-Blanco, Á., Sein-Echaluce, M.L., García-Peñalvo, F.J.: From massive access to cooperation: lessons learned and proven results of a hybrid xMOOC/cMOOC pedagogical approach to MOOCs. Int. J. Educ. Technol. High. Educ. 13(1), 24 (2016) 20. Soltani, M., Zarzour, H., Babahenini, M.C.: Facial emotion detection in massive open online courses. In: World Conference on Information Systems and Technologies, pp. 277–286 (2018) 21. Leony, D., Parada Gélvez, H.A., Munoz-Merino, P.J., Pardo Sánchez, A., Delgado Kloos, C.: A generic architecture for emotion-based recommender systems in cloud learning environments (2013) 22. Feidakis, M., Daradoumis, T., Caballe, S.: Endowing e-learning systems with emotion awareness. In: 2011 Third International Conference on Intelligent Networking and Collaborative Systems, November 2011 23. Bahreini, K., Nadolski, R., Westera, W.: FILTWAM and voice emotion recognition. In: Games and Learning Alliance, pp. 116–129 (2014) 24. Bahreini, K., Nadolski, R., Westera, W.: FILTWAM - a framework for online affective computing in serious games. Procedia Comput. Sci. 15, 45–52 (2012) 25. Domagk, S.: Do pedagogical agents facilitate learner motivation and learning outcomes? J. Media Psychol. 22(2), 84–97 (2010) 26. Azevedo, R., Landis, R.S., Feyzi-Behnagh, R., Duffy, M., Trevors, G., Harley, J.M., Bouchet, F., Burlison, J., Taub, M., Pacampara, N., Yeasin, M., Rahman, A.K.M.M., Tanveer, M.I., Hossain, G.: The Effectiveness of Pedagogical Agents’ Prompting and Feedback in Facilitating Co-adapted Learning with MetaTutor. Lecture Notes in Computer Science, pp. 212–221. Springer, Berlin, Heidelberg (2012) 27. Ochs, M., Frasson, C.: Optimal Emotional Conditions for Learning with an Intelligent Tutoring System. Lecture Notes in Computer Science, pp. 845–847. Springer, Berlin, Heidelberg (2004) 28. Tarimo, W.T., Hickey, T.J.: Fully integrating remote students into a traditional classroom using live-streaming and TeachBack. In: 2016 IEEE Frontiers in Education Conference (FIE), October 2016 29. Kaklauskas, A., Kuzminske, A., Zavadskas, E.K., Daniunas, A., Kaklauskas, G., Seniut, M., Raistenskis, J., Safonov, A., Kliukas, R., Juozapaitis, A., Radzeviciene, A., Cerkauskiene, R.: Affective tutoring system for built environment management. Comput. Educ. 82, 202– 216 (2015) 30. Agnieszka, L.: Affect-awareness framework for intelligent tutoring systems. In: 2013 6th International Conference on Human System Interactions (HSI), June 2013 31. Lin, H.-C.K., Wu, C.-H., Hsueh, Y.-P.: The influence of using affective tutoring system in accounting remedial instruction on learning performance and usability. Comput. Hum. Behav. 41, 514–522 (2014) 32. Boujlaleb, L., Idarrou, A., Mammass, D.: The impact of perspective communities on information flow in social networks. In: 2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), pp. 184–188 (2016) 33. Jaouedi, N., Boujnah, N., Htiwich, O., Bouhlel, M.: Human action recognition to human behavior analysis. In: 2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), pp. 263–266 (2016) 34. Al-Janabi, S., Al-Shourbaji, I.: A smart and effective method for digital video compression. In: 2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), pp. 532–538 (2016)


35. Al-Janabi, S., Al-Shourbaji, I.: A hybrid image steganography method based on genetic algorithm. In: 2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), pp. 398–404 (2016) 36. Zarzour, H., Sellami, M.: A linked data-based collaborative annotation system for increasing learning achievements. Educ. Tech. Res. Dev. 65(2), 381–397 (2017) 37. Zarzour, H., Sellami, M.: An investigation into whether learning performance can be improved by CAALDT. Innov. Educ. Teach. Int. 55(6), 625–632 (2018)

Rationality Measurement for Jadex-Based Applications

Toufik Marir 1,2, Hadjer Mallek 2, Sihem Oubadi 1,2, and Abd El Heq Silem 3

1 Research Laboratory on Computer Science's Complex Systems (ReLa(CS)2), University of Oum El Bouaghi, Oum El Bouaghi, Algeria
[email protected], [email protected]
2 Department of Mathematics and Computer Science, University of Oum El Bouaghi, Oum El Bouaghi, Algeria
[email protected]
3 Faculty of Sciences of Tunis, University of Tunis El Manar, LIPAH-LR11ES14, 2092 Tunis, Tunisia
[email protected]

Abstract. Nowadays, measurement has become a primordial technique in any software project. By measurement, we mean the process of assigning a value to an attribute. However, measurement must take into account the specificities of novel software paradigms. Hence, we propose in this paper some metrics to measure the rationality of agents. Despite the importance of rationality as a reasoning characteristic, no existing measure targets it. The proposed metrics are applied to the Jadex platform, which is one of the best-known agent platforms. In addition, a tool was developed to compute the proposed metrics automatically. The developed tool is based mainly on aspect-oriented programming.

Keywords: Rationality · Measurement · Multi-agent systems · Jadex · Aspect-oriented programming · AspectJ

1 Introduction

Software development, as an engineering activity, should be based on metrics and measurement in order to make rational decisions and avoid subjective ones [1]. Measurement consists of the assignment of a value to an attribute of an object or an event [2]. Consequently, the measurement activity increases our ability to understand and master the studied object. However, measurement in software engineering is a hard activity because of the abstract nature of software [1]. The community has therefore paid it special attention by proposing different metrics, such as complexity [3], software quality [4, 5] and quality of experience [6]. Obviously, it is very important to propose metrics that reflect the specific characteristics of new software paradigms. Consequently, multi-agent systems (as one of these paradigms) require their own metrics. Indeed, several works have proposed specific metrics for agent-based software [7]. These works targeted common characteristics of


software, like complexity [8], quality [9] and architecture [10], as well as specific characteristics of multi-agent systems, like autonomy [11], social ability [12] and pro-activity [13]. We think that proposing new metrics for multi-agent systems is a very promising area because of the current application fields of this software paradigm. Rationality is an important agent characteristic [14]; however, no study has addressed its measurement. In our opinion, the lack of specific metrics for this characteristic is due to the research community considering it an optional agent characteristic compared to the fundamental ones (like autonomy, social ability and pro-activity). Most works proposed in this field have targeted only the fundamental characteristics [11–13]. However, we think that this field has reached a maturity level allowing the study of more optional ones. The aim of this paper is to measure the rationality of agents developed using the Jadex platform [15]. Moreover, we developed a prototype tool that calculates these metrics automatically. This paper is structured as follows: Sect. 2 presents related works. Then, we give in Sect. 3 an overview of the Jadex platform, followed by the proposed metrics and the developed tool. Finally, we present our conclusions and some future works.

2 Related Work

Proposing new metrics for multi-agent systems is in full evolution [7]. We can distinguish mainly two kinds of metrics proposed in this field. First, some works proposed new metrics for common characteristics of software while taking into account the specificities of multi-agent systems; quality [9], complexity [8] and architecture [10] are examples of these characteristics. Second, several works targeted the measurement of specific attributes of multi-agent systems. In this context, the works proposed by Alonso et al. [11–13] are the most prominent. In a series of works, the authors proposed metrics for the most important characteristics of multi-agent systems, like autonomy, social ability and pro-activity. In each work, the targeted characteristic is divided into its main attributes, and these attributes are measured using a set of metrics. Despite the works presented in this field, we can remark that several characteristics of multi-agent systems are omitted. In fact, we think that the research community first targeted the fundamental characteristics of such systems. We also believe that this software paradigm has reached a maturity level allowing some characteristics considered optional to be addressed. In this work we address one of these characteristics, called rationality.

3 Jadex Platform

Jadex [15] is an extension of the JADE platform that allows implementing the BDI (Belief-Desire-Intention) model [16]. The latter is a model conceived to explain complex human behavior in a simple way. Thus, the BDI model is based on mental attitudes


(beliefs, desires and intentions) that are modeled as possible world states. Beliefs represent the state of the environment as perceived by the agent. Desires, on the other hand, are the world states that the agent wants to reach, and intentions represent the desires that the agent has started to realize. Consequently, Jadex can be seen as a reasoning engine that selects the suitable plan based on the beliefs and the goals of the agent. Figure 1 shows the abstract architecture of the Jadex platform. It is composed of three main components [15]:

Fig. 1. The main components of Jadex platform.

• Beliefs: contrary to other BDI systems that represent beliefs using some kind of first-order logic or relational model (like Jason and JACK), Jadex adopted the object-oriented paradigm for belief representation in order to simplify the development task. In fact, adopting a software-engineering perspective to describe agents can be considered one of the goals of the Jadex project team. Hence, beliefs are stored as simple facts (called beliefs) or as sets of facts (called belief sets).
• Goals: they represent the desires of an agent. After creation, a goal can be in the active, option or suspended state. An option is a new goal created and adopted by an agent (added to the goalbase of the agent); after goal deliberation, the agent can decide to pursue this goal (activate it) or suspend it if its context is not valid. Moreover, we distinguish four types of goals: perform, achieve, query and maintain goals. Perform goals are those defined by a specific plan; an agent reaches this kind of goal when it executes the related plan. An achieve goal is specified by the desired state of the world without specifying the related plan; it is up to the agent to find the plan that reaches this desired state. Query goals are similar to achieve goals, but they are related to a state of the agent instead of a state of the world. Finally, in the case of a maintain goal, the aim of the agent is to preserve a desired state; the agent will execute an adequate plan to re-establish this state if it changes.
• Plans: the behaviors of agents are specified by plans. Each plan is composed of two parts (called the head and the body). The first part (the head) specifies the goals related to


the plan, the events it handles and the preconditions for its execution. The body of a plan represents the actions to be executed when the plan is selected. In addition to these three main components, Jadex introduces the capabilities component, a grouping mechanism for the above elements (beliefs, goals and plans) that ensures reusability. Jadex provides a hybrid approach to developing multi-agent systems. It is based on an existing agent programming language (based on Java) with an XML extension to specify the static aspects of the agent. Hence, the static aspects of an agent are specified in an Agent Definition File (ADF), which includes the description of beliefs, goals, plan heads and their initial values. Conversely, the dynamic aspects of an agent, which are the plan bodies, are written in Java using an API that gives access to the BDI facilities.
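As a hedged illustration of this split, the following Java fragment sketches what a plan body registered in the ADF could look like; it assumes the Jadex 2.x BDI API (package and method names may differ across Jadex versions), and the plan and belief names are purely illustrative.

```java
// Hedged sketch of a Jadex plan body (assuming the Jadex 2.x BDI API); the
// belief name "attempts" and the plan itself are illustrative only.
import jadex.bdi.runtime.Plan;

public class CountingPlan extends Plan {
    @Override
    public void body() {
        // read a fact from the beliefbase
        int attempts = (Integer) getBeliefbase().getBelief("attempts").getFact();

        // ... the domain-specific actions of the plan would go here ...

        // update the belief with a new fact
        getBeliefbase().getBelief("attempts").setFact(attempts + 1);
    }

    @Override
    public void passed() {
        // called by the platform when the plan has finished successfully
    }

    @Override
    public void failed() {
        // called when the plan execution fails
    }
}
```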

4 The Proposed Metrics and the Developed Tool

Obviously, understanding a concept is the cornerstone of proposing metrics to evaluate it. However, formalizing a unanimous definition of ambiguous concepts (like rationality, intelligence or quality) is a hard task. One of the reasons that make rationality ambiguous is that it is studied by various fields, from philosophy, sociology and economics to applied sciences like artificial intelligence [16]. Naturally, each field studies the concept from its own point of view. For example, a rational behavior is viewed in philosophy and sociology as a behavior that emerges from, and is influenced by, a reasoning process [18]. In economics, however, rationality is defined as using a reasoning process to reach a goal in an optimized way [19]; hence, rationality in economics implies enhancing profits and minimizing costs. The rationality of agents can be studied from the artificial intelligence point of view, because multi-agent systems emerged from the interaction between artificial intelligence, distributed systems and software engineering. Russell and Norvig [17] define a rational agent as an agent which executes the adequate behavior, an adequate behavior being one that brings the agent closer to its goal. Moreover, an ideally rational behavior is a rational one that minimizes the cost of reaching the goal [17]. In fact, taking into account the trade-off between cost and other pertinent attributes is the subject of several recent studies [20]. Also, a rational agent can always justify its decisions because they are the result of deliberative reasoning [21]. Choosing the adequate behavior should be based on the current state of the agent and its environment; this state is represented by the agent's beliefs. In addition, an agent should be able to update its beliefs according to the environment state. For this reason, a rational agent has perception abilities that allow it to react to its environment. Finally, we can conclude that the rationality of an agent is determined by the following characteristics:


• The existence of a reasoning process that justifies the decisions of the agent.
• A set of actions that allows the agent to choose the adequate action depending on the current situation.
• A set of beliefs that represents the agent's state and its environment, to ensure a suitable decision.
• A minimal use of the agent's resources.
• A successful execution of the plans resulting from the reasoning process.

4.1 The Proposed Metrics for the Rationality of Jadex Agents

After specifying the main characteristics of a rational agent, we present in this section some metrics to measure them. The rationality can thus be measured using the following metrics:
• The average size of plans: as already explained, rationality strongly depends on the cost of reaching the goal. To reach its goal, an agent must execute plans, and the average size of plans gives an idea about resources such as execution time and memory space. Since plans are coded as Java classes in the Jadex platform, we calculate the average size of plans as the average number of instructions used to code them.
• The average number of plans: a rational agent is one that selects the suitable plan to reach its goal. Having a choice of plans is a key factor, because an agent cannot be considered rational if it has no choice, even if the executed plan is adequate. Consequently, we propose to calculate the number of plans relative to the number of goals as a measure of rationality. While plans are coded as Java classes, in the Jadex platform the goals are described in the agent description (ADF).
• The ratio of beliefs per goal: beliefs are a fundamental element of the reasoning process, which is itself a key factor of rational behavior. On the other hand, beliefs are among the elements that use memory resources. Consequently, we calculate the number of beliefs described in the agent relative to the number of its goals as a measure of rationality.
• The number of executed plans: an agent can use various resources depending on the application, so it is difficult to propose generic metrics that cover all the used resources. However, resource usage is closely related to the executed plans. Hence, we propose the number of executed plans as a measure of rationality that reflects resource usage; it can be obtained by counting the executions of the body() method.
• The average execution time of plans: although we proposed to measure rationality using the size of plans, this measure alone is not enough. The size of plans assumes that all instructions are equal during execution, but in practice instructions do not have the same execution time, because of loops and alternative instructions. Hence, we propose the average execution time of plans as an indicator of used resources. This metric is calculated as the total execution time of the body() method divided by the number of body() executions.


• The number of belief accesses: accessing beliefs, by reading or updating them, is an indicator of reasoning. In fact, the reasoning process consists of using the current state of beliefs to generate new beliefs or actions by means of a set of plans. Moreover, updating the beliefs' state can also occur because the environment state has changed; it is then an indicator that updated information about the environment is used during reasoning, which enhances rationality. Beliefs in the Jadex platform are accessed through two functions, setFact() and getFact(); we therefore count the number of executions of these functions as a measure of rationality.
• The ratio of achieved plans: we explained above that rationality is related to the success of plans. Hence, we propose the ratio of achieved plans as a measure of rationality. This ratio is obtained by dividing the number of achieved plans by the number of executed ones. An achieved plan in the Jadex platform is recognized by the execution of the passed() method.

A small sketch after this list illustrates how these ratios can be combined once the raw counts are available.
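This is only an illustrative aggregation of the metric definitions above; the field names are hypothetical, and the counts are assumed to come from the ADF analysis and the runtime instrumentation described in the next subsection.

```java
// Illustrative sketch: combining the raw counts behind the proposed metrics.
// Field names are hypothetical; the counts would come from ADF parsing and
// from runtime instrumentation of the agent.
class RationalityMetrics {
    long totalPlanInstructions;   // summed over all plan classes
    long numberOfPlans;
    long numberOfGoals;           // extracted from the ADF
    long numberOfBeliefs;         // extracted from the ADF
    long executedPlans;           // executions of body()
    long achievedPlans;           // executions of passed()

    double averageSizeOfPlans()   { return (double) totalPlanInstructions / numberOfPlans; }
    double averageNumberOfPlans() { return (double) numberOfPlans / numberOfGoals; }
    double beliefsPerGoal()       { return (double) numberOfBeliefs / numberOfGoals; }
    double ratioOfAchievedPlans() { return executedPlans == 0 ? 0.0 : (double) achievedPlans / executedPlans; }
}
```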

4.2 The Prototype Tool and a Case Study

In order to calculate the proposed metrics automatically, a tool was developed as part of this work. This tool measures both static and dynamic metrics. Static metrics (like the average size of plans) are calculated from the code of the application under evaluation. Dynamic metrics (like the ratio of achieved plans), on the other hand, require the execution of the application. Dynamic metrics are calculated using aspect-oriented programming [22]. This relatively new paradigm was proposed to enhance the modularity of software by separating crosscutting concerns from core concerns. The basic element in aspect-oriented programming is the aspect. An aspect includes several entities, such as pointcuts, join points and advices. A join point represents a well-defined point in the execution of a program at which an aspect will be integrated. A pointcut, in contrast, specifies join points in the aspect: it can represent a method execution, a call of a method or an update of an attribute. Finally, an advice is the specification of the code that will be executed by the aspect. Marir et al. [23] demonstrated the advantages of using aspect-oriented programming to collect dynamic metrics in agent-based applications; it provides simplicity, reusability and extensibility [23]. Figure 2 shows the principle of this software paradigm.

Fig. 2. The principle of aspect-oriented programming.


The static metrics are obtained by analyzing the application's code. Since a Jadex-based application is composed of two kinds of files (an XML file describing the agent (ADF) and Java files describing the plans), computing the static metrics consists in analyzing both of them. Figure 3 presents the architecture of the developed tool. We used an open-source API (called JDOM) [24] to represent and manipulate the ADF files in order to calculate the static metrics. Moreover, the tool includes a library of aspects developed using AspectJ [25]. These aspects pick out the execution of the essential events related to the proposed metrics (like the execution of the body() and passed() methods).

Fig. 3. The abstract architecture of the developed tool.
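To make the idea concrete, the fragment below sketches, in annotation-style AspectJ (plain Java syntax), the kind of aspect such a library could contain. The pointcut expressions and the jadex.bdi.runtime.Plan type used in them are assumptions for illustration, not the exact aspects shipped with the tool.

```java
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Before;

// Hedged sketch of a metric-collecting aspect in annotation-style AspectJ.
// Pointcut signatures are assumptions.
@Aspect
public class RationalityProbe {
    private static long executedPlans;
    private static long achievedPlans;
    private static long beliefAccesses;

    // every execution of a plan body counts as an executed plan
    @Before("execution(* jadex.bdi.runtime.Plan+.body())")
    public void onBodyExecution() { executedPlans++; }

    // every execution of passed() counts as an achieved plan
    @Before("execution(* jadex.bdi.runtime.Plan+.passed())")
    public void onPlanPassed() { achievedPlans++; }

    // every read or update of a belief fact
    @Before("call(* *.getFact()) || call(* *.setFact(..))")
    public void onBeliefAccess() { beliefAccesses++; }

    public static double ratioOfAchievedPlans() {
        return executedPlans == 0 ? 0.0 : (double) achievedPlans / executedPlans;
    }
}
```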

In this paper, we applied the proposed metrics to the Mars World application [26]. This application is composed of three agents whose mission is to search for resources existing in their environment; the found resources are then transferred to the agents' homebase. Thanks to its perception capabilities, the sentry agent is responsible for finding resources. If the sentry agent finds that a resource can be exploited, it calls the second agent (the production agent) to produce ore from the found resource. Finally, the carry agent transfers the produced ore to the homebase. Figure 4 shows the interface of this application.

Fig. 4. The interface of Mars World application.


After executing this application with our tool, we can display the results of the different metrics. Figure 5 shows the results of the average size of plans for the three agents. For example, the average size of plans of the sentry agent is 24, because it has three plans whose sizes are 18, 39 and 15 instructions. The developed tool also calculates the dynamic metrics. Figure 6 presents the number of belief accesses for the agents of Mars World. We can see that this metric starts from zero for all the agents and increases continuously as the application runs.

Fig. 5. The results of the average size of plans metric.

Fig. 6. The results of the number of belief accesses metric.

5 Conclusion

Nowadays, measurement has become a primordial technique in any software project. Among other goals, it allows improving the quality of software and of the process of developing it. Many metrics have been proposed to evaluate the different factors of software, like quality and complexity. Moreover, the proposal of metrics is in full evolution because of the continuous emergence of new paradigms. For example, multi-agent systems are a relatively new software paradigm which has drawn the attention of the research community, with specific metrics proposed for its different attributes, like autonomy, reactivity and pro-activity. Although rationality is an important characteristic of intelligent agents, proposing metrics for it has been omitted so far.


Hence, we proposed in this paper some metrics that allow measuring the rationality of agents. Naturally, proposing these metrics required analyzing this concept first. Moreover, we presented a tool, developed as part of this project, that calculates the proposed metrics. This tool uses aspect-oriented programming to assess the dynamic metrics, and it analyzes the code of the application (XML and Java files) to assess the static ones. As future work, we propose to extend this work to assess other attributes, like flexibility and intelligence. In addition, we think that the proposed metrics will be more beneficial if they are generic, so we propose to generalize them to be suitable for other multi-agent platforms.

References 1. Pressman, R.S., Maxim, B.R.: Software Engineering: A Practitioner’s Approach, 6th edn. McGraw-Hill Higher Education, New York (2005) 2. ISO, ISO/IEC 9126-1: Software Engineering – Product Quality – Part 1: Quality Model, International Organization for Standardization, Geneva, Switzerland (2001) 3. McCabe, T.J.: A complexity measure. IEEE Trans. Softw. Eng. SE 2(4), 308–320 (1976) 4. McCall, J.A., Richards, P.K., Walters, G.F.: Factors in Software Quality, vol. 1, ADA 049014. National Technical Information Service, Springfield, VA (1977) 5. Filali, T., Chettaoui, N., Bouhlel, M.S.: Towards the automatic evaluation of the quality of commercially-oriented Web interfaces. In: 2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT). IEEE (2016) 6. Torjemen, N., em Zhioua, G., Tabbane, N.: QoE model based on fuzzy logic system for offload decision in HetNets environment. In: 2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT). IEEE (2016) 7. Dumke, R., Mencke, S., Wille, C.: Quality Assurance of Agent-Based and Self-Managed Systems. CRC Press, Boca Raton (2010) 8. Marir, T., Mokhati, F., Bouchelaghem-Seridi, H., Tamrabet, Z.: Complexity measurement of multi-agent systems. In: German Conference on Multiagent System Technologies, 23 September 2014, pp. 188–201. Springer, Cham 9. Marir, T., Mokhati, F., Bouchlaghem-Seridi, H., Acid, Y., Bouzid, M.: QM4MAS: a quality model for multi-agent systems. Int. J. Comput. Appl. Technol. 54(4), 297–310 (2016) 10. García-Magariño, I., Cossentino, M., Seidita, V.: A metrics suite for evaluating agentoriented architectures. In: Proceedings of the 2010 ACM Symposium on Applied Computing, 22 March 2010, pp. 912–919. ACM 11. Alonso, F., Fuertes, J.L., Martinez, L., Soza, H.: Towards a set of measures for evaluating software agent autonomy. In: Eighth Mexican International Conference on Artificial Intelligence, MICAI 2009, 9 November 2009, pp. 73–78. IEEE (2009) 12. Alonso, F., Fuertes, J.L., Martínez, L., Soza, H.: Measuring the social ability of software agents. In: Sixth International Conference on Software Engineering Research, Management and Applications, SERA’08, 20 August 2008, pp. 3–10. IEEE (2008) 13. Alonso, F., Fuertes, J.L., Martínez, L., Soza, H.: Measures for evaluating the software agent pro-activity. Computer and Information Sciences, pp. 61–64. Springer, Dordrecht (2011) 14. Verschure, P.F.M.J., Althaus, P.: A real-world rational agent: unifying old and new AI. Cogn. Sci. 27(4), 561–590 (2003)


15. Pokahr, A., Braubach, L., Lamersdorf, W.: Jadex: a BDI reasoning engine. Multi-Agent Programming, pp. 149–174. Springer, Boston (2005) 16. Rao, A.S., Georgeff, M.P.: Modeling rational agents within a BDI-architecture. KR 91, 473– 484 (1991) 17. Russell, Stuart J., Norvig, Peter: Artificial Intelligence: A Modern Approach. Pearson Education Limited, Malaysia (2016) 18. https://en.oxforddictionaries.com/definition/rationality 19. https://www.ecnmy.org/learn/you/choices-behavior/what-is-rationality/ 20. Dhib, E., Boussetta, K., Zangar, N., Tabbane, N.: Resources allocation trade-off between cost and delay over a distributed cloud infrastructure. In: 2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT). IEEE (2016) 21. Kabachi, N.: Modélisation et Apprentissage de la Prise de Décision dans les Organisations Productives: Approche Multi-Agents, Thèse de doctorat de l’Université Jean Monnet et de l’Ecole Nationale Supérieure des Mines de Saint-Etienne (1999) 22. Kiczales, G., Lamping, J., Mendhekar, A., Maeda, C., Lopes, C., Loingtier, J.M., Irwin, J.: Aspect-oriented programming. In: European Conference on Object-Oriented Programming, 9 June 1997, pp. 220–242. Springer, Berlin, Heidelberg 23. Marir, T., Mokhati, F., Bouchelaghem-Seridi, H., Benaissa, B.: Dynamic metrics for multiagent systems using aspect-oriented programming. In: German Conference on Multiagent System Technologies, 27 September 2016, pp. 58–72. Springer, Cham 24. http://www.jdom.org/ 25. Laddad, R.: AspectJ in Action. Manning Publications, New York (2009) 26. https://sourceforge.net/projects/jadex/files/jadex/2.4/

A Continuous Optimization Scheme Based on an Enhanced Differential Evolution and a Trust Region Method

Hichem Talbi and Amer Draa

MISC Laboratory, Abdelhamid Mehri Constantine 2 University, Ali Mendjeli, Algeria
[email protected], [email protected]

Abstract. We present a scheme based on differential evolution and local search to solve continuous optimization problems. Improvements have been made to the basic differential evolution algorithm: we have opted for an approach with three types of population regeneration and parameter value adaptation. These improvements offer a better exploration of the search space and avoid getting stuck in local optima. In order to intensify the search in the neighbourhood of the solutions obtained by the differential evolution algorithm, a local search is carried out from time to time using a trust-region method. The proposed scheme was tested on the COmparing Continuous Optimizers (COCO) platform, and the obtained results showed the superiority of the proposed approach over state-of-the-art algorithms.

Keywords: Continuous optimization · Differential evolution · Local search · Trust-region · Exploration · Exploitation

1 Introduction

Many real-world problems can be formulated as optimization problems. Optimization is the process of finding, in a given search space, the values of variables for which a given function, called the objective function, takes an optimal (minimal or maximal) value. The exhaustive exploration of the whole search space is very often beyond the processing capacities of current computers, which has made exact methods unusable for the majority of optimization problems. Approximate methods have thus emerged as a better alternative; they do not, of course, guarantee optimal solutions, but they can nevertheless reach acceptable solutions, close to the optimum, in a reasonable time. Approximate methods are often based on metaheuristics, i.e. generic heuristics that can be adapted to different types of problems. Population-based metaheuristics have been particularly effective in exploring large multidimensional search spaces; among them, one can mention genetic algorithms, ant colonies and particle swarms. Genetic algorithms belong to a more general class of algorithms called evolutionary algorithms. Differential evolution, proposed by Storn and Price [1], belongs to this class of methods and has proved to be one of the most efficient optimization methods [2].


We present in this paper a hybrid scheme based on differential evolution and local search for solving optimization problems in the continuous domain. This hybridization aims to benefit from the advantages of both methods. Differential evolution, because of its exploratory power, can offer good starting points for intense local search processes, accelerating convergence towards the best solutions in the neighbourhood of these points. An alternation between differential evolution and local search thus gives a better chance of converging towards better solutions. A careful analysis of the behaviour of differential evolution made it possible to understand the role of the different parameters and the effect of their variation on the efficiency and quality of the optimization process. We have opted for a mutation and crossover strategy with variable parameters, to obtain variable exploration and exploitation aptitudes throughout the generations of the search process. To escape from local optima, we have integrated three types of population regeneration. The first one concerns individuals: we set a maximal age after which an individual is replaced by a newly generated one. The second is a systematic, controlled disturbance of the whole population when its diversity is judged too low. The last is a total regeneration of the population if no improvement of the best solution has been observed for a very long period. To test the efficiency of the proposed approach, we considered the optimization of the set of functions included in the reference Black-Box Optimization Benchmarking (BBOB) suite of the COCO platform [3, 4]. These functions have different properties and belong to different classes; they are good representatives of real-world problems at their different difficulty levels. The rest of the article is organized as follows. Section 2 reviews the basic notions of differential evolution and its variants. The proposed approach is presented in Sect. 3. Section 4 exposes the experimental results, which are analysed and compared to those of state-of-the-art algorithms. The main conclusions and perspectives are presented in Sect. 5.

2 Differential Evolution

Differential evolution is a stochastic search metaheuristic inspired by the genetic algorithm, into which it incorporates a geometric technique. Starting from the original algorithm proposed by Storn and Price [1], several research works have been carried out to propose more robust and more efficient variants.

2.1 Basic Differential Evolution

The basic algorithm proposed in [1] uses a population of size NP. Each individual in the population is a vector of size D, equal to the dimension of the search space. The i-th individual of generation G is represented by:

$$X_i^{(G)} = \left\{ x_{i,1}^{(G)}, x_{i,2}^{(G)}, \ldots, x_{i,D}^{(G)} \right\}, \quad i = 1, 2, \ldots, NP \qquad (1)$$

At each generation G+1 and for each individual $X_i^{(G)}$, a mutant vector $V_i^{(G+1)}$ is generated from three randomly-selected individuals $X_{r1}^{(G)}$, $X_{r2}^{(G)}$ and $X_{r3}^{(G)}$ according to Eq. (2), which uses the mutation factor F, generally defined in the range (0,1):

$$V_i^{(G+1)} = X_{r1}^{(G)} + F \cdot \left( X_{r2}^{(G)} - X_{r3}^{(G)} \right) \qquad (2)$$

From $V_i^{(G+1)}$ and $X_i^{(G)}$, a differential crossover is applied according to Eq. (3) to generate the trial vector $U_i^{(G+1)} = \left\{ u_{i,1}^{(G+1)}, u_{i,2}^{(G+1)}, \ldots, u_{i,D}^{(G+1)} \right\}$:

$$u_{i,j}^{(G+1)} = \begin{cases} v_{i,j}^{(G+1)} & \text{if } r < CR \\ x_{i,j}^{(G)} & \text{otherwise} \end{cases} \qquad (3)$$

where r is a randomly generated number in the interval (0,1) and CR is the crossover rate, belonging to the same interval. After applying the mutation and crossover operations, a selection that maintains the population size is made. It is a greedy selection that chooses, as indicated in Eq. (4), between the parent vector and the trial one, according to the values obtained for both vectors of the objective function f to minimize:

$$X_i^{(G+1)} = \begin{cases} U_i^{(G+1)} & \text{if } f\!\left(U_i^{(G+1)}\right) \le f\!\left(X_i^{(G)}\right) \\ X_i^{(G)} & \text{otherwise} \end{cases} \qquad (4)$$
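As a concrete illustration of the loop defined by Eqs. (1)–(4), the following self-contained Java sketch implements the basic DE/rand/1/bin scheme exactly as described above; it is illustrative only and is not the implementation used in this paper.

```java
import java.util.Random;
import java.util.function.Function;

// Minimal sketch of basic differential evolution (Eqs. (1)-(4)); not the
// authors' implementation.
public class BasicDE {
    public static double[] optimize(Function<double[], Double> f, int D, int NP,
                                    double F, double CR, double lo, double hi,
                                    int generations, long seed) {
        Random rnd = new Random(seed);
        double[][] X = new double[NP][D];
        double[] fit = new double[NP];
        for (int i = 0; i < NP; i++) {                    // random initial population, Eq. (1)
            for (int j = 0; j < D; j++) X[i][j] = lo + rnd.nextDouble() * (hi - lo);
            fit[i] = f.apply(X[i]);
        }
        for (int g = 0; g < generations; g++) {
            for (int i = 0; i < NP; i++) {
                int r1, r2, r3;                           // three distinct indices, all different from i
                do { r1 = rnd.nextInt(NP); } while (r1 == i);
                do { r2 = rnd.nextInt(NP); } while (r2 == i || r2 == r1);
                do { r3 = rnd.nextInt(NP); } while (r3 == i || r3 == r1 || r3 == r2);
                double[] u = new double[D];
                for (int j = 0; j < D; j++) {
                    double v = X[r1][j] + F * (X[r2][j] - X[r3][j]);  // mutation, Eq. (2)
                    u[j] = (rnd.nextDouble() < CR) ? v : X[i][j];     // binomial crossover, Eq. (3)
                }
                double fu = f.apply(u);
                if (fu <= fit[i]) { X[i] = u; fit[i] = fu; }          // greedy selection, Eq. (4)
            }
        }
        int best = 0;
        for (int i = 1; i < NP; i++) if (fit[i] < fit[best]) best = i;
        return X[best];
    }
}
```

For instance, a call like BasicDE.optimize(x -> x[0]*x[0] + x[1]*x[1], 2, 50, 0.5, 0.9, -5, 5, 200, 42L) minimizes a two-dimensional sphere function.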

2.2 Differential Evolution Variants

The basic DE algorithm uses three control parameters: the size of the population NP, the mutation factor F and the crossover rate CR. In addition to the work that has been done to adapt these parameters, as in [5–7], one can find in [2] further works offering different alternatives, such as:
– organizing the population into sub-populations with possible inter-population communication according to different schemes,
– proposing different mutation strategies,
– proposing other crossover mechanisms,
– proposing different ways of selection,
– hybridizing differential evolution with other techniques.

The first point concerns the topology of the population: individuals are distributed over two or more subpopulations of identical or different sizes, behaving in the same or in different ways [8, 9]. In addition to the original mutation strategy, there are other geometric strategies that take into consideration more randomly-selected individuals to generate the mutant vector, as in Eq. (5), or that use some kind of guidance by the current best individual, as in Eqs. (6), (7) and (8) [7].

DE/rand/2:
$$V_i^{(G+1)} = X_{r1}^{(G)} + F \cdot \left( X_{r2}^{(G)} - X_{r3}^{(G)} \right) + F \cdot \left( X_{r4}^{(G)} - X_{r5}^{(G)} \right) \qquad (5)$$

DE/best/1:
$$V_i^{(G+1)} = X_{best}^{(G)} + F \cdot \left( X_{r1}^{(G)} - X_{r2}^{(G)} \right) \qquad (6)$$

where best is the index of the best solution of generation G.

DE/best/2:
$$V_i^{(G+1)} = X_{best}^{(G)} + F \cdot \left( X_{r1}^{(G)} - X_{r2}^{(G)} \right) + F \cdot \left( X_{r3}^{(G)} - X_{r4}^{(G)} \right) \qquad (7)$$

DE/current-to-best/2:
$$V_i^{(G+1)} = X_i^{(G)} + F \cdot \left( X_{best}^{(G)} - X_i^{(G)} \right) + F \cdot \left( X_{r1}^{(G)} - X_{r2}^{(G)} \right) \qquad (8)$$

For the crossover operation, in addition to the so-called binomial crossover defined in Eq. (3), there are other crossover types, such as the exponential crossover. In the latter, two numbers n and L are chosen randomly, and L genes starting from position n of the mutant vector are used to build the trial vector [2]. Concerning selection, one can proceed differently to find a good balance between keeping the best solutions and avoiding the collapse of the diversity necessary to keep improving the current results. Hybridization is a very popular way to improve the performance of different techniques. Differential evolution has been combined with methods such as the biogeography-based optimization algorithm [10], ant colonies [11], BFO (Bacterial Foraging Optimization) [12], FFA (Firefly Algorithm) [13] and local search [14].
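For illustration, the following Java fragment sketches an exponential crossover as just described; the wrap-around of the copied segment and the use of CR to stop the copy are common conventions assumed here, not details taken from the paper.

```java
import java.util.Random;

// Hedged sketch of exponential crossover: copy a block of L consecutive genes
// (starting at a random position n, with wrap-around) from the mutant into the
// trial vector; the block length is drawn geometrically using CR.
class ExponentialCrossover {
    static double[] apply(double[] parent, double[] mutant, double CR, Random rnd) {
        int D = parent.length;
        double[] trial = parent.clone();
        int n = rnd.nextInt(D);   // random starting position
        int L = 0;                // length of the copied segment
        do {
            trial[(n + L) % D] = mutant[(n + L) % D];
            L++;
        } while (rnd.nextDouble() < CR && L < D);
        return trial;
    }
}
```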

3 The Proposed Approach

To improve the performance of the differential evolution algorithm, a scheme integrating local search and population regeneration has been developed (Algorithm 1). It is an alternation of the two methods, in which each one transmits the result of its work to the other to serve as a starting point. The differential evolution transmits two solutions to the local search method: the best solution and the worst solution of the last generation. This is motivated by the hope that the neighbourhood of that weaker solution, after differential evolution, might contain a better solution than the one contained in the neighbourhood of the best. The step-by-step analysis of the algorithm confirmed this hypothesis: the frequency of occurrence of this phenomenon was significant. For the regeneration of the population, which diversifies the search, we keep the best solution obtained from the local search so as not to restart from scratch.


Algorithm 1: The general scheme of the proposed approach
Inputs: the function f to minimize, the dimension D, the lower and upper bounds of the search space {min1, ..., minD} and {max1, ..., maxD}
generate a population X composed of NP individuals
while (budget not consumed) and (target value not attained) do
  • apply the enhanced differential evolution and keep the best individual Xbest and the last generation's worst individual Xworst
  • apply a local search starting from Xbest, then from Xworst, using the trust-region method, to obtain possibly a new Xbest
  • generate a new population X composed of NP individuals including Xbest
end while

The improved differential evolution algorithm is detailed in Algorithm 2. The choices made can be summarized in the following points (a hedged sketch of the modified mutation is given after this list):
• The modified DE/current-to-best/2 mutation strategy allows individuals to be guided by their "leaders" while trying to find new and interesting areas by following trajectories linking randomly-selected individuals. Considering the mean of the five best current individuals helps to avoid premature convergence into local optima. The factors F1 and F2 of the two components of this movement have been dissociated, to obtain different combinations between reinforcing the search around the best individuals and reinforcing the exploration of new areas.
• The parameters F1, F2 and CR are randomly generated at each iteration within the intervals [0.2, 0.53], [0.4, 0.9] and [1 − 2/D, 1], respectively. These intervals were fixed empirically. One can notice that, unlike F1 and F2, the CR value depends on the problem dimension. In fact, intensive testing showed different performances of the algorithm for different problem dimensions: for larger dimensions, near-to-one values of CR make the algorithm more efficient, whereas for small dimensions somewhat smaller values of CR can provide a better optimization behaviour.
• To accelerate convergence while maintaining some diversity, we opted, in the selection phase, for choosing the best NP·R individuals (R being a fixed rate) among the set of parents and generated offspring, to which we add some less good individuals that might be useful in exploring the search space. The selection does not take into consideration the individuals who have exceeded a fixed maximal age.
• If the diversity of the population, measured through the standard deviation of the genes composing the selected best NP·R individuals, is lost at a given moment, a Gaussian disturbance of the current population is applied.
• If no improvement of the best solution has been observed for a long time, we stop changing the current population and move on to the local search stage before regenerating the whole population.
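The exact form of the modified mutation is given in Algorithm 2; the fragment below is only a hedged reading of the description above, assuming that the best individual of DE/current-to-best/2 is replaced by the mean of the five best individuals and that F1 weights the guidance term while F2 weights the exploration term.

```java
// Hedged sketch of the modified DE/current-to-best/2 mutation described above.
// The exact combination used in Algorithm 2 may differ; this only illustrates
// the dissociated factors F1 (guidance) and F2 (exploration) and the mean of
// the five best individuals.
class ModifiedMutation {
    static double[] mutateCurrentToMeanOfBest(double[] xi, double[][] fiveBest,
                                              double[] xr1, double[] xr2,
                                              double F1, double F2) {
        int D = xi.length;
        double[] mean = new double[D];
        for (double[] b : fiveBest)
            for (int j = 0; j < D; j++) mean[j] += b[j] / fiveBest.length;
        double[] v = new double[D];
        for (int j = 0; j < D; j++)
            v[j] = xi[j] + F1 * (mean[j] - xi[j]) + F2 * (xr1[j] - xr2[j]);
        return v;
    }
}
```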


The algorithm chosen for local search is based on trust regions [15]. As summarized in Algorithm 3, the trust-region method approximates the objective function within a certain region, often by a quadratic model. At each iteration k, the following trust-region subproblem is solved:

\min_{p} \; m_k(p) = f_k + g_k^{T} p + \tfrac{1}{2}\, p^{T} B_k p        (13)

where f_k, g_k and B_k denote respectively the value of the objective function at X_k, its gradient and its Hessian (or an approximation of it). The trust region is then adjusted using the reduction ratio computed from Eq. (14):

\rho_k = \frac{f(X_k) - f(X_k + p_k)}{m_k(0) - m_k(p_k)}        (14)

If the reduction ratio is satisfactory, the radius Δ_k of the trust region is increased so as to exploit the current model further in the next iteration. Otherwise, the region is shrunk and the validity of the model is reassessed [15].
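For the local-search step, the following sketch shows how such a bounded trust-region refinement could be run in practice. SciPy's 'trust-constr' solver is used here purely as an illustrative stand-in for the trust-region method of [15]; the iteration budget and tolerance are assumptions.

```python
import numpy as np
from scipy.optimize import minimize, Bounds

def trust_region_search(f, x0, lower, upper, max_iter=200):
    """Refine a DE solution with a trust-region solver (illustrative sketch)."""
    res = minimize(f, np.asarray(x0, dtype=float),
                   method="trust-constr",
                   bounds=Bounds(np.asarray(lower, float), np.asarray(upper, float)),
                   options={"maxiter": max_iter, "gtol": 1e-10})
    return res.x, res.fun, res.nfev
```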

4 Experimental Results

To evaluate the performance of the proposed approach, we have used the COCO (Comparing Continuous Optimizers) platform, which makes it possible to statistically compare different optimization algorithms on a benchmark of 24 functions belonging to different classes, namely: separable (f1–f5), moderate (f6–f9), poorly conditioned (f10–f14), multimodal (f15–f19), and weakly-structured multimodal functions (f20–f24).


The algorithms are run over 15 instances of each function for dimensions 2, 3, 5, 10, 20 and 40 (dimension 40 is optional) [3, 4]. The proposed approach has been compared to a representative set of the best state-of-the-art algorithms:
• the basic DE [16];
• the Multistart Bipop Covariance Matrix Adaptation Evolution Strategy (RCMAES) [17];
• the hybrid DE-Simplex algorithm [17];
• the adaptive differential evolution with optional external archive (JADE) [18];
• the restarting variant of L-SHADE (RLSHADE) [19];
• the hybrid DE-BFGS (Broyden–Fletcher–Goldfarb–Shanno) [17];
• Particle Swarm Optimization (PSO) [20].
For the choice of parameters, after long test series and a careful analysis of the evolution of the optimization process, the following values were chosen empirically:
• NP = 250
• R = 90%
• MaxAge = 120
• MaxNoChange = 25 · (20 + D)
• σ = 10⁻⁸ · (maxᵢ − minᵢ)

For comparison, we use the Expected Running Time (ERT) described in [20] by the developers of the benchmarking platform [3, 4]. This measure depends on a given target function value, ft = fopt + Δf, and is computed over all relevant trials as the number of function evaluations executed during each trial while the best function value has not yet reached ft, summed over all trials and divided by the number of trials that actually reached ft. For reasons of space, we visualize only the performance of the different algorithms for dimension 20 (the largest dimension for which data is available for all the algorithms). Figure 1 illustrates the obtained results graphically; the numerical details are given in Table 1. The "best 2009" data plot corresponds to the best ERT observed during BBOB 2009 for each single instance, i.e., it is not an actual competing algorithm. We notice from Fig. 1 that our DE-fmin algorithm outperforms all the other algorithms when all 24 functions are considered: it reaches the target function value in about 83% of the instances. It performs significantly better than the others on the multimodal functions (f15–f19), with almost 70% of the target values reached. For the first three function categories (f1–f14), our algorithm solved all the instances, i.e. it reached a 100% success rate.
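The ERT computation just described can be written compactly as follows. This is an illustrative helper, not part of the COCO platform; the per-trial input arrays are assumed.

```python
import numpy as np

def expected_running_time(evals_to_target, reached_target):
    """ERT as described in the text: total function evaluations spent before
    the target ft was reached (over all trials), divided by the number of
    trials that actually reached ft."""
    evals_to_target = np.asarray(evals_to_target, dtype=float)
    reached_target = np.asarray(reached_target, dtype=bool)
    n_success = reached_target.sum()
    if n_success == 0:
        return np.inf          # target never reached in any trial
    return evals_to_target.sum() / n_success

# Example use: 15 trials, evaluations consumed per trial and success flags
# ert = expected_running_time(evals_per_trial, success_flags)
```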


Fig. 1. Bootstrapped empirical cumulative distribution of the number of objective function evaluations divided by dimension (FEvals/D) for 51 targets with target precision in 10^[−8..2] for all functions and subgroups in different dimensions. As reference algorithm, the best algorithm from BBOB 2009 is shown as a light thick line with diamond markers.

The numerical version of the results (Table 1) confirms the good behaviour of the proposed algorithm. The table gives:
• the average running time (aRT, in number of function evaluations) divided by the corresponding best aRT reached in BBOB-2009;
• the corresponding reference aRT (the BBOB-2009 best aRT);
• the aRT and, in braces, as dispersion measure, the half difference between the 10 and 90 %-tiles of bootstrapped run lengths for each algorithm and target;


Table 1. Numerical results for D = 20.

• the number of successes, #succ, which gives, for each function, the number of instances for which the algorithm reaches an objective function value within 10⁻⁸ of the best solution obtained by all the candidates in the BBOB 2009 contest;
• the results in bold are the best, and those marked by a star are statistically significantly the best.
Further descriptions of the functions and explanations on how to interpret the graphs and the table are given on the official website of the COCO platform [4].


5 Conclusion

In this paper, we have presented an optimization scheme combining an improved version of the differential evolution algorithm and a local method based on trust regions. The key choices that led to very good results are: (1) the use of a mutation strategy that accelerates the exploitation of the neighbourhood of the best current solutions while trying to discover new areas of the search space; (2) the controlled random adaptation of the mutation and crossover parameter values, which gives the algorithm more improvement opportunities; (3) the three population-regeneration mechanisms, which maintain a good diversity and prevent the search from getting stuck in local optima; and (4) taking advantage of the effectiveness of the local search technique to quickly exploit the neighbourhood of the best solutions found by the differential evolution phase. Among the limitations of the proposed approach, one can cite the difficulty of adjusting the few static parameters so as to handle, with the same efficiency, problems of different sizes or belonging to different classes. It would be interesting to implement mechanisms to dynamically adjust these parameters, especially the threshold beyond which the algorithm decides to regenerate the population.

References
1. Storn, R., Price, K.: Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. J. Global Optim. 11, 341–359 (1997)
2. Das, S., Mullick, S.S., Suganthan, P.N.: Recent advances in differential evolution: an updated survey. Swarm Evol. Comput. 27, 1–30 (2016)
3. Hansen, N., Auger, A., Finck, S., Ros, R.: Real-parameter black-box optimization benchmarking: experimental setup (2015). http://coco.lri.fr/downloads/download15.03/bbobdocexperiment.pdf
4. Official COCO website (2017). http://coco.gforge.inria.fr/
5. Qin, A.K., Suganthan, P.N.: Self-adaptive differential evolution algorithm for numerical optimization. In: IEEE Congress on Evolutionary Computation, pp. 1785–1791, Edinburgh (2005)
6. Brest, J., Greiner, S., Boskovic, B., Mernik, M., Zumer, V.: Self-adapting control parameters in differential evolution: a comparative study on numerical benchmark problems. IEEE Trans. Evol. Comput. 10(6), 646–657 (2006)
7. Draa, A., Bouzoubia, S., Boukhalfa, I.: A sinusoidal differential evolution algorithm for numerical optimization. Appl. Soft Comput. 27, 99–126 (2015)
8. Wu, G., Rammohan, M., Suganthan, P.N., Wang, R., Huangke, C.: Differential evolution with multi-population based ensemble of mutation strategies. Inf. Sci. 329, 329–345 (2016)
9. Ge, Y., Yu, W., Lin, Y., Gong, Y., Zhan, Z., Chen, W., Zhang, J.: Distributed differential evolution based on adaptive mergence and split for large-scale optimization. IEEE Trans. Cybern. 48(7), 2166–2180 (2018)
10. Gong, W., Cai, Z., Zhang, J., Jia, L., Li, H.: A generalized hybrid generation scheme of differential evolution for global numerical optimization. Int. J. Comput. Intell. Appl. 10(1), 35–65 (2011)
11. Chang, L., Liao, C., Lin, W., Chen, L.L., Zheng, X.: A hybrid method based on differential evolution and continuous ant colony optimization and its application on wideband antenna design. Prog. Electromagnet. Res. 122, 105–118 (2012)


12. Biswal, B., Behera, H.S., Bisoi, R., Dash, P.K.: Classification of power quality data using decision tree and chemotactic differential evolution based fuzzy clustering. Swarm Evol. Comput. 4, 12–24 (2012)
13. Abdullah, A., Deris, S., Anwar, S., Arjunan, S.N.V.: An evolutionary firefly algorithm for the estimation of nonlinear biological model parameters. PLoS One 8(3), e56310 (2013)
14. Lee, C.H., Kuo, C.T., Chang, H.H.: Performance enhancement of the differential evolution algorithm using local search and a self-adaptive scaling factor. Int. J. Innov. Comput. Inf. Control 8(4), 2665–2679 (2012)
15. Conn, A.R., Gould, N.I.M., Toint, P.L.: Trust-Region Methods. Society for Industrial and Applied Mathematics, Philadelphia (2000)
16. Posik, P., Klems, V.: Benchmarking the differential evolution with adaptive encoding on noiseless functions. In: Proceedings of the 14th Annual Conference Companion on Genetic and Evolutionary Computation, GECCO ’12, pp. 189–196. ACM, New York, NY, USA (2012)
17. Voglis, C., Piperagkas, G.S., Parsopoulos, K.E., Papageorgiou, D.G., Lagaris, I.E.: MEMPSODE: an empirical assessment of local search algorithm impact on a memetic algorithm using noiseless testbed. In: Proceedings of the 14th Annual Conference Companion on Genetic and Evolutionary Computation, GECCO ’12 (2012)
18. Zhang, J., Sanderson, A.C.: JADE: adaptive differential evolution with optional external archive. IEEE Trans. Evol. Comput. 13(5), 945–958 (2009)
19. Tanabe, R., Fukunaga, A.: Tuning differential evolution for cheap, medium, and expensive computational budgets. In: 2015 IEEE Congress on Evolutionary Computation (CEC), pp. 2018–2025 (2015)
20. El-Abd, M., Kamel, M.S.: Black-box optimization benchmarking for noiseless function testbed using particle swarm optimization. In: Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation, GECCO ’09, pp. 2269–2274 (2009)

Strided Convolution Instead of Max Pooling for Memory Efficiency of Convolutional Neural Networks

Riadh Ayachi1, Mouna Afif1, Yahia Said1,2, and Mohamed Atri1

1 Laboratory of Electronics and Microelectronics (ElE), Faculty of Sciences of Monastir, University of Monastir, Monastir 5000, Tunisia
[email protected]
2 Electrical Engineering Department, College of Engineering, Northern Border University, Arar, Saudi Arabia

Abstract. This paper describes a new optimization technique for the embedded implementation of convolutional neural networks (CNN); only the inference of convolutional neural networks is discussed. It is known that both pooling layers and strided convolutions can be used to summarize the data. The proposed technique therefore replaces only the max pooling layers with strided convolution layers, using the same filter size and stride as the old pooling layers, in order to reduce the model size and improve the accuracy of a CNN. A pooling layer has no parameters, whereas a convolution layer has weights and biases to optimize, so the CNN can learn how to summarize the data. Replacing max pooling layers with strided convolution layers thus enhances the CNN accuracy and reduces the model size. This technique is proposed in order to build a CNN accelerator for real-time applications and embedded implementation. The proposed optimization is applied to some state-of-the-art CNN models and the obtained results are compared with the original ones. The proposed optimization is shown to reduce the memory occupation of the models and to achieve an accuracy enhancement, enabling the implementation of convolutional neural network models in embedded systems.

Keywords: Deep learning · Convolutional neural networks · Strided convolution · Memory efficiency



1 Introduction

A single fast glance at an image is sufficient for a human to analyse and describe an immense amount of detail about the visual scene [1]. However, this is a very hard task for a computer and needs a lot of computation resources and effort. Deep learning brings such intelligence to computers: it is based on artificial neural networks inspired by the human brain. In this work, convolutional neural networks in particular are discussed. CNNs are the most used deep learning models in computer vision tasks such as image recognition [19, 20], object detection [17, 18] and natural language processing [21].


The main goal of this work is to replace only the max pooling layers with strided convolution layers in an existing CNN model, for memory efficiency in an embedded implementation, rather than to generate a new CNN model. This work introduces convolutional neural networks and explains why they are the most used models for computer vision tasks. The proposed approach is then applied to some state-of-the-art models to demonstrate its efficiency; it can be applied to any CNN model to reduce the memory occupation and enhance the accuracy. The proposed approach is a key enabler for implementing deep learning models on embedded systems with limited on-chip memory and limited computation resources. In the second section of this paper, a brief introduction to the fundamentals of convolutional neural networks (CNN) is provided. Related works on convolution methodology in CNNs are presented in the third section. The proposed optimization is explained in detail in the fourth section. In the fifth section, the proposed approach is evaluated on some state-of-the-art models to demonstrate its efficiency. Finally, the paper is concluded and future works are mentioned.

2 Convolutional Neural Networks (CNN)

CNNs [5] are deep learning models inspired by the human brain. They were introduced by Kunihiko Fukushima in 1980 [6] and improved by Yann LeCun et al. in 1998 [7]. CNNs are feed-forward, sparsely connected neural networks with multiple layers. The neurons of the same layer are not connected to each other; a hidden neuron is only connected to a local patch in the layer below it. The input and the output of a CNN layer are called feature maps, and the weights connected to a patch are called a filter. This is also inspired by biological systems, where a cell is sensitive to a small sub-region of the input space, called a receptive field; many cells are tiled to cover the entire visual field. It also allows hidden neurons at different locations to share the same weights, which means that a visual pattern (e.g. a particular edge) captured by a filter may appear at different locations, giving translation invariance. This explains the importance of CNNs in computer vision tasks. A CNN is composed of six types of layers: the input layer, convolutional layers, non-linearity layers, pooling layers, fully connected layers and the output layer. A simple CNN illustration is shown in Fig. 1. The input feature map is an image that can be a grayscale or a colour (RGB) image; the output feature map provides a prediction on the input image. A convolutional layer is the result of the convolution of the input feature maps with 2D filters, as illustrated in Fig. 2. The kth convolutional layer converts n^{k−1} 2D input feature maps into m^{k} 2D output feature maps: each output map y_j^k is the sum of all input maps x_i^k convolved with the kernel connection weights w_{ij}^k, plus a configurable bias term b_j^k. The process is summarized in Eq. (1):

y_j^k = \sum_{i} \left( x_i^k * w_{ij}^k \right) + b_j^k        (1)
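A minimal numpy sketch of the forward pass of Eq. (1) may help fix the notation. The array shapes and the use of 'valid' cross-correlation are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.signal import correlate2d

def conv_layer_forward(x, w, b):
    """Illustrative forward pass of Eq. (1): each output map is the sum of the
    input maps filtered with its kernels, plus a bias.
    x: (n_in, H, W) input maps, w: (n_in, n_out, kh, kw) kernels, b: (n_out,)."""
    n_in, n_out, kh, kw = w.shape
    H, W = x.shape[1], x.shape[2]
    y = np.zeros((n_out, H - kh + 1, W - kw + 1))
    for j in range(n_out):
        for i in range(n_in):
            # 'valid' cross-correlation, as commonly used in CNN frameworks
            y[j] += correlate2d(x[i], w[i, j], mode="valid")
        y[j] += b[j]
    return y
```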


Fig. 1. Convolutional neural network

Fig. 2. Convolutional layer

The non-linearity layer is a point-wise non-linear function applied to each entry of the feature map. There are multiple choices for the activation function, such as the rectified linear unit (ReLU) and tanh. Hinton et al. [6] showed that the ReLU performs better than other activation functions: it makes the responses of neurons sparse and robust to data corruption, so even if an image is heavily corrupted it will not generate large negative values in the output feature maps. The ReLU is expressed by Eq. (2):

\mathrm{ReLU}(net_j) = \max(0, net_j)        (2)

The pooling layer is generally used for dimensionality reduction to facilitate the manipulation of feature maps. It is also used to reduce the number of parameters to learn and to avoid overfitting. The pooling layer is illustrated in Fig. 3. It can be used to decrease the resolution of the feature maps, making the output feature maps less sensitive to input distortions. The most used pooling types in CNN models are max pooling and average pooling. After several layers of convolution and pooling, a pixel on a feature map corresponds to a larger receptive field in the input image; therefore, local features are extracted in the bottom convolution and pooling layers, while global features are extracted in the higher, fully connected layers.


Fig. 3. Max pooling layer

The fully connected layer is a vector in which every neuron is connected to all the neurons of the previous layer. Fully connected layers are used to learn global features from the image and contain the largest number of parameters in the network. The output layer is a fully connected layer that provides a prediction about the whole input image or a region of it. For a classification problem, the output is the image class, and the softmax function is used to provide the probability of each class. If the output is a high-dimensional vector, a regression problem is solved using a linear regression function to produce the outputs.

3 Related Works

Convolutions have been used in artificial neural networks for at least 25 years. In the late 1980s, LeCun et al. [7] introduced a popular CNN for digit recognition applications. In convolutional neural networks, 3D filters are used, whose most important dimensions are height, width and the number of channels. When CNN models are applied to images, the number of channels in the first input layer is 3 (RGB), and in each subsequent layer Li the filters have the same number of channels as Li−1 has filters. The early work by LeCun et al. [7] used 5 × 5 × channels filters. More recently, the VGG Net [3] architectures use 3 × 3 filters, while other models such as the Network-in-Network architecture [8] and the GoogLeNet family [2] use 1 × 1 filters in some layers. Much work has focused on the design space exploration of CNNs and on developing automated approaches for finding a CNN architecture that delivers the highest accuracy with minimum model complexity, by reducing the number of parameters. These automated approaches include Bayesian optimization [9], simulated annealing [11], randomized search [12] and genetic algorithms [10]. To their credit, each of these approaches produces a new convolutional neural network architecture that achieves higher accuracy than a representative baseline. Albelwi et al. [13] proposed the use of the Nelder–Mead Algorithm (NMA) to build an automated framework that determines the optimal hyper-parameters of a CNN; the framework can increase the network depth and shrink the kernel sizes and the pooling stride.


Becherer et al. [14] proposed a parameter fine-tuning technique to enhance the performance of CNNs, showing that fine-tuning the parameters is better than using random parameters. This paper introduces a new optimization to be applied to existing CNN models in order to reduce the model size and enhance the accuracy.

4 Efficiency of Strided Convolution Instead of Max Pooling

As mentioned above, pooling layers are used to reduce dimensionality. The proposed optimization aims to replace only max pooling layers with convolutional layers using the same pooling stride and kernel size; a strided convolution thus provides the same dimensionality reduction as max pooling. In this section, the proposed optimization technique is detailed. Assume that a pooling function p is applied on a feature map f with three dimensions (w, h, n), where w is the width, h is the height and n is the number of channels. The pooling function (p-norm) with pooling size k and stride s applied on the feature map f, denoted p(f), is a three-dimensional array given by Eq. (3):

p_{i,j,u}(f) = \left( \sum_{h=0}^{k} \sum_{w=0}^{k} \left| f_{g(h,w,i,j,u)} \right|^{d} \right)^{1/d}        (3)

The mapping function g from positions in p to positions in f, with respect to the stride, is given by Eq. (4):

g(h, w, i, j, u) = (s \cdot i + h,\; s \cdot j + w,\; u)        (4)

The p-norm order d defines the pooling method: if d → ∞, max pooling is performed. If the stride s is greater than the kernel size k, the pooling regions do not overlap; in typical architectures, however, pooling regions may overlap, e.g. with a stride of 2 and a kernel size of 3. Now assume that a convolution function c is applied on a feature map f with three dimensions (w, h, n). The convolution applied to f, c(f), is given by Eq. (5):

c_{i,j,n}(f) = \sigma\left( \sum_{h=0}^{k} \sum_{w=0}^{k} \sum_{u=1}^{m} \theta_{h,w,u,n} \, f_{g(h,w,i,j,u)} \right)        (5)

where θ are the kernel (convolution) weights, σ(·) is the activation function and n ∈ [0, m] indexes the output feature maps of the convolution layer. Comparing the convolution function with the pooling function, the convolution is the more general form: both depend on the same elements of the previous feature map. The pooling function is thus a feature-wise convolution in which the activation function is the p-norm; a feature-wise convolution is one where θ = 1 if n = u and θ = 0 otherwise. Pooling layers are used for several reasons. First, they make the feature map responses more invariant to data corruption. Second, they are used to reduce the feature map dimension.


Third, pooling makes the optimization of the loss function easier because, unlike a convolutional layer where features are mixed, it has no parameters to learn. Assuming the pooling layer is used for dimensionality reduction, max pooling can be replaced by a convolution without any loss in spatial dimensionality reduction. The proposed approach therefore replaces max pooling layers with convolution layers using the same pooling configuration (pooling size and stride) and producing the same number of output feature maps. To illustrate this, assume that all the filter connection weights are equal to one and consider a 4 × 4 feature map. If max pooling is applied with stride 2 and a 2 × 2 kernel (see Fig. 4), each 2 × 2 sub-matrix is replaced by its maximum value and the output matrix is 2 × 2. If a convolution operator with the same kernel size and stride is applied on the same feature map (see Fig. 5), a convolution is computed instead of taking the maximum, and the same output dimension is guaranteed. An illustrative code sketch of this replacement is given below.
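The following PyTorch sketch illustrates the replacement described above. It is not the authors' Caffe implementation; the channel count and input size are arbitrary, but it shows that a 2 × 2 max pooling with stride 2 and a 2 × 2 convolution with stride 2 produce the same spatial reduction while the convolution keeps learnable weights.

```python
import torch
import torch.nn as nn

# Illustrative only: swap a 2x2/stride-2 max pooling for a 2x2/stride-2
# convolution that keeps the same number of feature maps, so the spatial
# reduction is identical but the summarization is now learned.
channels = 64
x = torch.randn(1, channels, 56, 56)          # example feature map

max_pool = nn.MaxPool2d(kernel_size=2, stride=2)
strided_conv = nn.Conv2d(channels, channels, kernel_size=2, stride=2)

print(max_pool(x).shape)      # torch.Size([1, 64, 28, 28])
print(strided_conv(x).shape)  # torch.Size([1, 64, 28, 28]) - same reduction
```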

Fig. 4. Max pooling example

Fig. 5. Convolution example

Down-sampling is an important step in CNNs. Kong et al. [15] showed that down-sampled models can achieve better performance and can be trained faster. As mentioned above, down-sampling can be performed either with a pooling layer or with a strided convolution layer, and each layer type has its pros and cons. A pooling layer is parameter-less, so training, and possibly inference, can be faster than with a strided convolution, which has parameters to optimize during back-propagation, whereas pooling operations only reroute the gradient. Pooling is also better suited to tasks such as semantic segmentation, which need up-sampling based on the inverse of pooling to recover the spatial information that pooling discards. Indeed, the pooling layer is designed to lose spatial information: it is used to detect whether an object is present in the image, whatever its position.


A strided convolution, in contrast, can be very useful when positional information is important for the desired task. For an embedded implementation, strided convolution is a better choice than max pooling; here, only the inference of the CNN is considered. First, most CNN designs alternate convolution layers and max pooling layers, with average pooling usually placed just before the fully connected layers. Replacing max pooling layers by strided convolution layers therefore makes it easier to build a hardware engine around a single functionality, namely convolution. If an average pooling layer is used, it can be executed in the software engine, since the fully connected layers are already executed there because of the heavy use of external memory for storing parameters. Second, because the hardware engine then has a single functionality, using strided convolution enables the data-reuse technique, which reduces traffic between the external memory and the hardware engine, resulting in time, energy and memory-occupation savings. Third, strided convolution can speed up CNN inference through fast convolution algorithms.

5 Experiments and Results

In this section, the proposed optimization is applied to some state-of-the-art deep learning models to prove its efficiency; three deep learning models were chosen. All tests are performed on an HP Z840 workstation with an Intel Xeon E5 processor and an Nvidia Tesla K40c general-purpose graphical processing unit (GPGPU), and all the models are derived from the publicly available C++ Caffe framework. The ImageNet dataset was used for training and testing. The first model is VGG [3], one of the best-known deep learning models, from Oxford University; it won the localization challenge of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014 and obtained a classification error rate of 7.325% with an ensemble of multiple CNNs. VGG evaluated multiple network configurations on ImageNet and reported the top-5 error rates of single CNNs. The number of weighted layers (i.e. convolution and fully connected layers) ranges from 11 to 19. When the number of weighted layers was increased from 11 to 16, the top-5 error rate dropped significantly from 10.1%; there was little further improvement when the number of weighted layers was increased to 19. VGG adopted a small filter size (3 × 3), on the assumption that a large filter (e.g. 5 × 5) can be approximated by two layers with a small filter size (3 × 3) and fewer weights. The second model is GoogLeNet [2], which won both the image classification and the object detection tasks, without external training data, in ILSVRC 2014. It cascades inception modules, making the CNN very deep (more than 20 layers). Supervised signals are added directly at multiple layers, instead of only at the top layer, in order to propagate errors more effectively; this reduced the top-5 classification error rate to 6.656% in ILSVRC 2014. To increase the depth of the network effectively, GoogLeNet introduced inception modules and stacked them on top of each other. In a conventional CNN, all the filters of a convolution layer have the same size; to exploit the sparse structure of the CNN in an optimal way, a convolution layer is replaced with an inception module, which includes a set of filters of mixed sizes.


The feature maps generated by filters of different sizes are concatenated and used as the input of the next inception module. The last model is SqueezeNet [16], a deep learning model optimized for embedded implementation. SqueezeNet begins with a simple convolution layer, followed by 8 Fire modules, and ends with a final convolution layer followed by a global average pooling layer. SqueezeNet achieves a 50× reduction in the number of parameters and in model size compared to AlexNet, while meeting or exceeding the top-1 and top-5 accuracy of AlexNet; it achieves a top-5 error of 15.66% in the ILSVRC 2012 classification challenge. The proposed approach was applied to the chosen models: for VGG and GoogLeNet, all max pooling layers are replaced with strided convolution layers; for SqueezeNet, the max pooling layers of the Fire modules are replaced but the average pooling layer is kept. The training process is slower and the models need about 10–12% more multiply-accumulate operations, but the main contribution of this optimization concerns the inference of the CNN. The new sizes are reported in Table 1. For some models, an embedded implementation remains impossible even after applying the optimization, because the model topology yields a very large number of parameters; nevertheless, the proposed approach reduces the size of any CNN-based model. SqueezeNet in particular obtains a significant reduction in model size, which makes it possible to fit the model in an embedded system with limited memory, such as an FPGA.

Table 1. The models' sizes before and after applying the proposed approach
Network                  VGG Net   GoogLeNet   SqueezeNet
Original model size      528 MB    51.1 MB     4.7 MB
Size after our approach  493 MB    42.6 MB     3.2 MB

Another benefit of replacing max pooling by strided convolution is the improvement in accuracy, with a top-5 error reduction of 0.5 to 1.9 percentage points for the tested models, as shown in Table 2. This enhancement is due to the new parameter representation learned by the overlapping strided convolutions.

Table 2. Top-5 error reduction after replacing max pooling with strided convolution
Network                   VGG Net   GoogLeNet   SqueezeNet
Original top-5 error (%)  8.1       9.2         19.7
New top-5 error (%)       6.6       8.7         17.8

The tests confirm the proposed optimization technique: it achieves a significant reduction in model size and, in addition, enhances the accuracy of the models. The proposed technique can therefore be used to improve a wide range of embedded applications.


6 Conclusion

CNNs are the most used deep learning models in computer vision applications, and they need substantial optimization, especially for low-power embedded implementation. The proposed approach shows that the model size can be reduced to facilitate embedded implementation, which in most cases targets real-time applications. Many other optimizations can also be applied to make convolutional neural networks fit embedded platforms with limited memory; one important optimization is provided in this paper. The proposed optimization leads to a significant reduction of both the model size and the top-5 error. CNNs can be enhanced further by applying techniques enabled by this optimization, such as the data-reuse technique and fast convolution algorithms.

References
1. Fei-Fei, L., Iyer, A., Koch, C., Perona, P.: What do we perceive in a glance of a real-world scene? J. Vis. 7(1), 10 (2007)
2. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
3. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition (2014). arXiv preprint arXiv:1409.1556
4. Iandola, F.N., Han, S., Moskewicz, M.W., Ashraf, K., Dally, W.J., Keutzer, K.: SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size (2016). arXiv preprint arXiv:1602.07360
5. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998)
6. Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. In: NIPS 2012 (2012)
7. LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D.: Backpropagation applied to handwritten zip code recognition. Neural Comput. 1(4), 541–551 (1989)
8. Chen, T., Li, M., Li, Y., Lin, M., Wang, N., Wang, M., Xiao, T., Xu, B., Zhang, C., Zhang, Z.: MXNet: a flexible and efficient machine learning library for heterogeneous distributed systems (2015). arXiv:1512.01274
9. Snoek, J., Larochelle, H., Adams, R.P.: Practical Bayesian optimization of machine learning algorithms. In: NIPS (2012)
10. Stanley, K.O., Miikkulainen, R.: Evolving neural networks through augmenting topologies. Evol. Comput. 10(2), 99–127 (2002)
11. Ludermir, T.B., Yamazaki, A., Zanchettin, C.: An optimization methodology for neural network weights and architectures. IEEE Trans. Neural Netw. 17(6), 1452–1459 (2006)
12. Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13, 281–305 (2012)
13. Albelwi, S., Mahmood, A.: Automated optimal architecture of deep convolutional neural networks for image recognition. In: 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA), December 2016, pp. 53–60. IEEE


14. Becherer, N., Pecarina, J., Nykl, S., Hopkinson, K.: Improving optimization of convolutional neural networks through parameter fine-tuning. Neural Comput. Appl., 1–11 (2017)
15. Kong, C., Lucey, S.: Take it in your stride: do we need striding in CNNs? (2017). arXiv preprint arXiv:1712.02502
16. Iandola, F.N., Han, S., Moskewicz, M.W., Ashraf, K., Dally, W.J., Keutzer, K.: SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size (2016). arXiv preprint arXiv:1602.07360
17. Gazzah, S., Mhalla, A., Essoukri Ben Amara, N.: Vehicle detection on a video traffic scene: review and new perspectives. In: 2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), Hammamet, pp. 448–454 (2016). https://doi.org/10.1109/setit.2016.7939912
18. Dahmane, K., Amara, N.E.B., Duthon, P., Bernardin, F., Colomb, M., Chausse, F.: The Cerema pedestrian database: a specific database in adverse weather conditions to evaluate computer vision pedestrian detectors. In: 2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), December 2016, pp. 472–477. IEEE
19. Trimech, I.H., Maalej, A., Amara, N.E.B.: 3D facial expression recognition using nonrigid CPD registration method. In: 2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), December 2016, pp. 478–481. IEEE
20. Jaouedi, N., Boujnah, N., Htiwich, O., Bouhlel, M.S.: Human action recognition to human behavior analysis. In: 2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), December 2016, pp. 263–266. IEEE
21. Jdira, M.B., Imen, J., Kaïs, O.: Study of speaker recognition system based on feed forward deep neural networks exploring text-dependent mode. In: 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (2016)

Ear Recognition Based on Improved Features Representations

Hakim Doghmane1, Hocine Bourouba1, Kamel Messaoudi2, and El Bey Bournene3

1 Université 8 Mai 1945 Guelma, BP 401, 24000 Guelma, Algeria
[email protected], [email protected]
2 Mohamed Cherif Messaadia University, 41000 Souk-Ahras, Algeria
3 LE2I Laboratory, Burgundy University, BP 47 870, Dijon, France

Abstract. This paper presents an ear recognition framework based on an improved representation of the multi-bag-of-visual-features model, which relies on Binarized Statistical Image Features, a clustering algorithm and the spatial pyramid histogram decomposition method. The following steps enhance the recognition accuracy. Firstly, the Binarized Statistical Image Features descriptor is used to capture the texture information of the ear image. Secondly, the multi bag-of-visual-features dictionary is learned from the training image responses in the feature space, using the K-means algorithm. Thirdly, the spatial pyramid histogram with horizontal decomposition is applied to obtain local ear feature descriptors. Next, the obtained histograms are normalized. Then, the global representation of the ear image is obtained by concatenating all the histograms calculated at each level. After that, the discriminant representation of the ear image is constructed using kernel Fisher discriminant analysis. Finally, the k-nearest neighbour and support vector machine classifiers are used for ear identification. The experimental results achieve average rank-1 recognition accuracies of 97.81%, 97.91% and 99.20% on the publicly available IIT-Delhi-1, IIT-Delhi-2 and USTB-1 databases, respectively. This shows that the proposed approach provides a significant performance improvement over the state of the art in terms of accuracy.

Keywords: Ear recognition · Spatial pyramidal histogram (SPH) · Multi bag of features · KNN · SVM



1 Introduction

Biometrics deals with the automatic recognition of individuals according to their physiological or behavioural characteristics, and any biometric system can operate in one of two modes: verification or identification. Recently, the human ear, as an emerging biometric modality, has attracted the attention of researchers due to its stable structure, which is invariant with age, and its shape, which, unlike the face, does not vary with changes in facial expression [1]. However, pose variations and occluding objects such as hair, earrings and earphones are the main factors that affect the quality of ear images [2]. It is therefore preferable to develop a system capable of dealing with this type of problem.


Usually, a human ear recognition system consists of three stages: automatic ear localization, feature representation, and classification. The first stage detects and locates the ear in the profile face image. Different 2D ear detection techniques have been proposed in the literature. In [3], a method of ear localization against an arbitrary background was proposed; it is based on the extraction of edges from the outer helix of the ear image using the Canny edge detector. An automatic ear detection approach based on the AdaBoost algorithm and Haar features was proposed in [4]; it is very fast in terms of computational cost and, in addition, is robust to some occlusions and rotation variations. An automated 2D ear detection method for complex backgrounds was proposed in [5], using asymmetric Haar features for the feature representation; it has two steps: cascaded classifier training (offline step) using the GentleAdaBoost algorithm, and ear detection (online step). The second stage is based on the construction of the feature vector. In recent decades, many feature extraction techniques have been proposed; they can be organized into three categories: global, local and hybrid techniques. The first category exploits the entire ear image and is very sensitive to variations in illumination and pose, so a preprocessing step is desirable [6]; it includes different methods of linear and non-linear projection. In [7], the Principal Component Analysis (PCA) and Independent Component Analysis (ICA) subspace methods were applied to the USTB-1 ear database, showing that the recognition rates obtained with PCA are lower than those obtained with ICA. The goal of the second category is to extract local features, which are more discriminating and invariant under certain acquisition conditions. Local texture descriptors have been suggested for ear recognition [8]: the performance obtained with the Binarized Statistical Image Features (BSIF) descriptor was better than that obtained with the Local Binary Pattern (LBP) and Local Phase Quantization (LPQ) descriptors. Furthermore, it has been shown that the human ear can be viewed as composed of two regions, rigid and semi-rigid [9]; as a result, using only the first region improves the recognition accuracy. The Speeded Up Robust Features (SURF) transform has been used to extract robust ear features [10], which are invariant to scale variation and other geometrical transforms. Based on a multi-scale analysis framework, a novel feature representation, MS-BSIF (Multi-Scale Binarized Statistical Image Features), was proposed, in which the feature extraction of the BSIF descriptor is extended to the multi-scale space [11]. The third category combines the two categories described above: its principal idea lies in extracting feature vectors using local techniques and reducing their dimensionality with one of the aforementioned subspace methods. In [12], the feature vector is constructed with a local grey-level orientation approach and the Sparse Representation Classification (SRC) algorithm is used for the classification step. A new hybrid method was suggested in [13], using the SURF transform with the Linear Discriminant Analysis (LDA) subspace method and a neural network for the classification stage. Other hybrid techniques were described and evaluated in [14].


In this paper, a novel representation for 2D ear recognition is proposed, based on Improved Multi-Bag-of-Features Histograms (IMBFH). They are built using the following steps: (i) extraction of the local texture image, (ii) quantization of the descriptor images using an unsupervised clustering algorithm, and (iii) spatial pyramid histogram (SPH) decomposition. The IMBFH features are then projected into the Kernel Fisher Discriminant Analysis (KFDA) [15] subspace. This projection provides Discriminant IMBFH (D-IMBFH) feature vectors characterized by more relevant information and smaller dimensions. The proposed method comprises the following steps. First, to reduce the effect of varying lighting and noise, a pre-treatment step is applied to the raw images. Second, the multi bag-of-features dictionary is learned from the BSIF filter responses of the training images, using the K-means algorithm; the labelled images are then constructed for the training and testing sets. Third, the spatial pyramid histogram with horizontal decomposition is applied to the labelled images. Next, the obtained histograms are normalized and the global representation of the ear image is obtained by concatenating all the local feature descriptors. Afterwards, the discriminant representation of the ear image is constructed using kernel Fisher discriminant analysis (KFDA). This projection yields the Discriminant Improved Multi-Bag-of-Features Histogram (D-IMBFH) feature vectors, which have small dimensions and therefore minimize the computation cost of the classification step. Finally, the obtained D-IMBFH representation is used for classification. The rest of the paper is organized as follows: Sect. 2 describes the proposed feature extraction approach; Sect. 3 reports and discusses the results of the ear identification experiments; finally, Sect. 4 presents some conclusions and future works.

2 The Proposed Method

In this part of the present work, we describe our ear representation system. This paper proposes a novel ear representation method that explores not only the local texture properties but also the micro-structure information among different image patches, which increases the descriptive power of the ear representation and further improves the ear recognition rate. The paper therefore presents a new attempt to combine the BSIF descriptor, the Multi Bag of Features (MBF) model and the Spatial Pyramid Histogram (SPH) method. The implementation scheme of the proposed method is based on five stages, as shown in Fig. 1: (i) preprocessing, (ii) feature extraction using the BSIF filter, (iii) construction of the visual words, (iv) quantization of the descriptors using the visual words, and (v) histogram pooling using the spatial pyramid histogram, followed by normalization. The global histogram representation is then fed into a classifier for ear recognition. As shown in Fig. 1, our approach consists mainly of three phases:
• the learning phase;
• the testing phase;
• the matching phase.

Fig. 1. Synoptic outline of the proposed method (offline training phase: preprocessing, local descriptor extraction, construction of the visual vocabularies with K-means, label image construction, spatial pyramid histogram, KFDA, feature vector database; online test phase: preprocessing, local descriptor extraction, assignment of descriptors to visual words, label image construction, spatial pyramid histogram, KFDA, K-NN or SVM classifier, identification result).

2.1 The Learning Phase

This phase is performed in offline mode. As shown in Fig. 2, the ear images of the learning set are pre-processed with a median filter and then normalized to zero mean and unit standard deviation. Next, the BSIF filter is used to extract the local texture information. The resulting images are then quantized with the K-means algorithm in order to build the visual words of the different subject classes. Afterwards, the labelled images of the training set are constructed by assigning the descriptors to visual words. Thereafter, the spatial pyramid histogram with L levels of horizontal decomposition is built: all histograms obtained at each level l (l = 0, …, L) are normalized in the range [0, 1] and concatenated into a large global feature histogram.


To further reduce the high dimensionality of this large global feature histogram and to make the features more discriminant, thereby enhancing the discriminating ability of the proposed method, the Kernel Fisher Discriminant Analysis (KFDA) technique is applied. As a result, only a few features are used, which reduces the computational cost while ensuring high recognition performance. The following subsections describe the main steps of the proposed method.

The BSIF filter
The Binarized Statistical Image Features (BSIF) descriptor is a local image descriptor that transforms an image with a set of linear filters in order to extract texture information [16]. These filters have two parameters: l (the filter window size) and n (the bit-string length). The BSIF filters are learned from 13 natural images using the ICA method. Figures 3 and 4 illustrate the BSIF filters and their ear image responses, respectively, obtained with l = 11 and n = 12. The filter response S_i of an image patch X of size l × l is obtained by:

S_i = \sum_{u,v} W_i(u,v)\, X(u,v) = w_i^{T} x        (1)

where the vectors w_i and x contain the pixels of W_i and X respectively, (·)^T denotes the transpose, and W_i is the ith learnt filter. The binarized feature response b_i is obtained by setting b_i = 1 if S_i > 0 and b_i = 0 otherwise. The BSIF image response r is built using the following binary coding:

r = \sum_{i=0}^{n-1} b_i \, 2^{i}        (2)
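A hedged sketch of the BSIF coding of Eqs. (1)–(2) is given below. The ICA-learnt filter bank is assumed to be available as an l × l × n array (e.g. the 11 × 11 × 12 bank used here), and SciPy correlation stands in for the linear filtering step.

```python
import numpy as np
from scipy.signal import correlate2d

def bsif_response(image, filters):
    """Illustrative BSIF coding: filter the image with n learnt linear filters,
    binarize each response, and pack the bits into an integer code per pixel."""
    n = filters.shape[2]
    code = np.zeros(image.shape, dtype=np.int64)
    for i in range(n):
        s_i = correlate2d(image, filters[:, :, i], mode="same")  # Eq. (1)
        code += (s_i > 0).astype(np.int64) << i                  # Eq. (2)
    return code
```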

The multi-bag of features histogram representation
The bag of words is one of the most used methods to represent the content of an image [17]. The approach was originally used for text document categorization and texture classification. In computer vision, the bag-of-words approach treats an image as a histogram of local features (visual words) [18–20]. It is mainly based on clustering the local descriptors in the feature space with a clustering algorithm such as K-means. The proposed approach can be defined as follows. Given a training dataset X = [x1, x2, …, xN]^T ∈ R^{N×d}, where N is the number of pixels of each subject class in the training set and xi is the d-dimensional BSIF feature vector of pixel i (all raw images of the database having been filtered by the median filter and normalized to zero mean and unit standard deviation), an unsupervised learning algorithm such as K-means is used to learn the bag-of-features dictionary of each subject class, denoted W = [v1, v2, …, vK]^T ∈ R^{K×d}, where vj (j = 1, …, K) is a visual word and K is the number of clusters.


Fig. 2. Application scheme for ear identification based on the D-IMBFH representation scheme: (a) ear image, (b) BSIF filter, (c) BSIF image responses, (d) Construction of visual words dictionaries, (e) Labeled images, and (f) D-IMBFH feature.

Fig. 3. Learned BSIF filters of size 11 × 11 × 12.

The dictionary W should be much more compact than X (K ≪ N). Using the K-means algorithm for each subject class k = 1, 2, …, C, the visual dictionary Wk is learned; all visual dictionaries are then concatenated into one dictionary X = {W1, W2, …, WC}. After that, the label images of the training and testing sets are constructed from the dictionaries X. Some label images are illustrated in Fig. 5.


Fig. 4. Some BSIF image descriptors.

According to the Euclidean distance, the label image is obtained by labelling each pixel with the closest visual word in the dictionary:

T(x_i) = \arg\min_{j} \; d(x_i, v_j), \quad i = 1, \ldots, N, \; j = 1, \ldots, K        (3)

where d(xi, vj) is the Euclidean distance between the BSIF feature vector xi of pixel i and the jth visual word vj in the dictionary, and T(xi) is the label (visual word index) assigned to pixel i.
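As an illustration of the per-class dictionary learning and of the labelling rule of Eq. (3), the following scikit-learn sketch may be helpful; the variable names and data layout are assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_dictionaries(class_features, K=300, seed=0):
    """Learn one K-word visual dictionary per subject class from its BSIF
    feature vectors ('class_features' maps a class id to an (N, d) array)."""
    return {c: KMeans(n_clusters=K, random_state=seed).fit(feats).cluster_centers_
            for c, feats in class_features.items()}

def label_image(bsif_features, dictionary):
    """Assign each pixel to its closest visual word (Eq. (3)).
    bsif_features: (H*W, d) array; dictionary: (K_total, d) concatenated words."""
    # Squared Euclidean distances between every pixel descriptor and every word
    d2 = ((bsif_features[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)        # visual word label per pixel
```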

Fig. 5. Some label ear images.


After that, the spatial pyramid histogram with horizontal decomposition is used to construct a histogram over each region of the label image. The visual-word probability distribution histogram of each region is calculated by the following equation:

h(j) = \frac{1}{L} \sum_{i=0}^{L} \delta\left[ T(x_i),\, j \right], \quad j = 1, \ldots, K        (4)

where L is the number of pixels in the considered region of the label image, h is the normalized visual word histogram and δ(·) is defined as:

\delta\left[ T(x_i),\, j \right] = \begin{cases} 1 & \text{if } T(x_i) = j \\ 0 & \text{otherwise} \end{cases}        (5)

The final histogram H is the concatenation of the histograms obtained over all the regions:

H = \left[\, h^{(1)}, h^{(2)}, \ldots, h^{(w)} \,\right]        (6)

where h(B) is the normalized histogram vector of dimension K × C for region B (B = 1, …, w).

The spatial pyramid histogram
In order to introduce spatial information into the multi bag-of-features representation, the spatial pyramid histogram (SPH) approach [21] is used. The SPH of level L is built as follows: first, the histogram of the entire label image is computed at level 0. Then, at level 1, the label image is divided into two equal regions using horizontal decomposition, and a histogram is calculated for each region. The process is repeated by recursively subdividing each region at level l and computing histograms in each region until the desired level L is reached. In total, (2^{L+1} − 1) histograms are obtained for l = 0, …, L. Finally, all these histograms are concatenated into one large vector (a code sketch of this decomposition is given below).
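The following sketch illustrates the horizontal SPH decomposition described above, using per-region histograms as in Eq. (4). The band boundaries and the normalization detail are illustrative choices.

```python
import numpy as np

def spatial_pyramid_histogram(label_img, n_words, levels=3):
    """Illustrative SPH with horizontal decomposition: at level l the label
    image is cut into 2**l horizontal bands, a normalized visual-word
    histogram is computed per band, and all (2**(levels+1) - 1) histograms
    are concatenated into the final descriptor."""
    histograms = []
    H = label_img.shape[0]
    for level in range(levels + 1):
        n_bands = 2 ** level
        edges = np.linspace(0, H, n_bands + 1, dtype=int)
        for b in range(n_bands):
            band = label_img[edges[b]:edges[b + 1], :].ravel()
            hist = np.bincount(band, minlength=n_words).astype(float)
            hist /= max(hist.sum(), 1.0)          # normalize to [0, 1]
            histograms.append(hist)
    return np.concatenate(histograms)              # final descriptor H
```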

2.2 The Testing Phase

This phase is performed in online mode. The preprocessing and feature extraction steps of the testing phase are the same as those performed during the learning phase. The label test images are constructed by assigning the descriptors of the test images to the visual words generated in the learning phase. Then, the spatial pyramid histogram with horizontal decomposition is applied, yielding a sequence of histograms that are concatenated to form the final feature vector representing the test ear image (see Fig. 7).

2.3 The Matching Phase

Given a new query (test) ear image Y, we first compute the proposed representation. Then, the obtained feature vector of the input ear is matched against all the stored templates, and the most similar one is taken as the matching result.


For the matching phase, three cases are considered (Fig. 6):

Fig. 6. The principle of horizontal decomposition into sub-regions (levels 0 to 3).

• A nearest-neighbour classifier is used with the Chi-square distance to compare two IMBFH histograms. The Chi-square distance between two histograms H1 and H2 of length N is defined as:

d_{\chi^2}(H_1, H_2) = \sum_{l=1}^{N} \frac{\left( H_1(l) - H_2(l) \right)^2}{H_1(l) + H_2(l)}        (7)

• A nearest-neighbour classifier is used with the cosine similarity S(·) to compare two D-IMBFH feature vectors. The similarity S(x, y) between two feature vectors x and y of length N is defined as:

S(x, y) = \frac{x \cdot y}{\lVert x \rVert_2 \, \lVert y \rVert_2}        (8)

• An SVM classifier with a Radial Basis Function (RBF) kernel is used on the IMBFH feature vectors.
A sketch of these matching measures is given below.
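The matching measures of Eqs. (7) and (8), together with a simple 1-NN matcher, can be sketched as follows; the small epsilon added to the Chi-square denominator is an assumption made to avoid division by zero.

```python
import numpy as np

def chi_square_distance(h1, h2, eps=1e-12):
    """Chi-square distance between two histograms (Eq. (7))."""
    h1, h2 = np.asarray(h1, float), np.asarray(h2, float)
    return np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def cosine_similarity(x, y):
    """Cosine similarity between two feature vectors (Eq. (8))."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))

def nearest_neighbour(query, gallery, labels, measure=chi_square_distance):
    """1-NN matching: return the label of the most similar stored template."""
    scores = np.array([measure(query, g) for g in gallery])
    best = scores.argmin() if measure is chi_square_distance else scores.argmax()
    return labels[best]
```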

3 Experimental Results and Discussions

In this section, the proposed method is extensively evaluated to demonstrate its effectiveness. The following experiments use the IIT-Delhi-1, IIT-Delhi-2 [22] and USTB-1 [23] databases. All these databases were collected with different capture devices, under different illumination conditions and with translation, rotation and resolution variations.


These conditions allow us to evaluate the performance of the proposed method in various environmental conditions.

Fig. 7. The final histogram obtained from one label image (spatial pyramid histogram decomposition of the label image into per-level histograms, followed by histogram concatenation into the global histogram).

It should be noted that the important parameters of the proposed method were tuned empirically to achieve the maximum recognition rate. The proposed approach depends on two parameters: the cluster number (K) of the K-means algorithm and the level (L) of the spatial pyramid histogram with horizontal decomposition. Consequently, in the first set of experiments the cluster number K is varied from 100 to 500 with a fixed level L = 0, only on the IIT-Delhi-1 database, to determine the optimal value of K. The second set consists of varying the level L using the best value of K obtained in the first set. The last set compares the proposed method, using the best values of K and L, with some existing ear image descriptors. In all experiments:
• three permutations are conducted and the average rate is reported;
• the BSIF filter of size 11 × 11 × 12 is used;
• two situations are investigated: in the first, two images per subject are used as the training set and the remaining images as the testing set; in the second, only one image per subject is used for training and the remaining images for testing.
It should be noted that in all tables the highest Identification Rate (IR) is listed in bold type.

3.1 Experimental Results in the IIT-Delhi Database

The Indian Institute of Technology Delhi (IIT-Delhi) ear database has two versions [21]: the first (IIT-Delhi-1) contains 493 ear images acquired from 125 different subjects, and the second (IIT-Delhi-2) contains 793 ear images captured from 221 different subjects. For both versions (Fig. 8):

Fig. 8. Certain normalized ear images of IIT Delhi database.

• Each subject has at least three ear images.
• The resolution of each ear image is 272 × 204 pixels.
• There are significant scale, translational and rotational variations.
• All subjects are in the age group 14–58 years.

In addition, a version with normalized and cropped ear images of 180 × 50 pixels is provided along with the original images.

IIT-Delhi-1 database
In order to find the optimal parameters that yield the best results, we start by exploring the value of the cluster number K in experiment #1 and the level parameter L in experiment #2.

Experiment #1
The first experiment is conducted only on the IIT-Delhi-1 database to explore the effect of the parameter K on recognition performance and to find its optimal value. In the first case, two ear images per subject are used for training and the remaining images are used as test images to evaluate the recognition performance of the proposed method; in the second case, only one image per subject is used for the training set and the remaining images for the testing set. The results in Figs. 9 and 10 show the influence of the choice of K on the recognition system for the D-IMBFH and IMBFH feature vectors. For a fixed level L = 0, the recognition rate increases slowly with the cluster number K. The highest recognition rate in Fig. 9 is 97.53%, obtained with K = 300 for the D-IMBFH features.


Fig. 9. Ear recognition rate using two images per subject in the training set.

From Fig. 9, it can be seen that the D-IMBFH representation gives the best recognition rate compared to IMBFH for the two classifiers, KNN and SVM. Figure 10 shows the limit of the KFA reduction method: since a single image per subject class is used, the intra-class information is not available. The D-IMBFH representation therefore gives a lower recognition rate than IMBFH without reduction.

Experiment #2
In this case, we study the influence of the level L of the spatial pyramid histogram method on the ear identification system. We use the optimal value K = 300 obtained in the previous experiment and vary the level L from 0 to 3 to evaluate the performance of the proposed method. Tables 1 and 2 compare the recognition rates using one and two images in the training set, respectively. As can be seen from these tables, the best recognition rates are obtained with level L = 3 of horizontal decomposition. These results demonstrate the effect of the spatial pyramid histogram (SPH) method, which makes it possible to capture the spatial information of the local patterns.

Fig. 10. Ear recognition rate using one image per subject in the training set.


Table 1. The effect of decomposition level (L) on accuracy rates (one image per subject).

Level L   IMBFH + K-NN   D-IMBFH + K-NN   IMBFH + SVM
0         93.48          90.22            90.04
1         93.21          91.39            90.94
2         93.84          92.12            91.57
3         94.11          92.66            92.03

Table 2. The effect of decomposition level (L) on accuracy rates (two images per subject).

Level L   IMBFH + K-NN   D-IMBFH + K-NN   IMBFH + SVM
0         96.84          97.53            94.10
1         96.57          97.62            94.51
2         96.84          97.73            94.65
3         96.71          97.81            94.79

Experiment #3
In this sub-section, a comparative study with some recent work applied to the IIT-Delhi-1 ear database is performed to demonstrate the effectiveness of the proposed method. Table 3 summarizes the identification rates of the proposed method and of several recently proposed algorithms. As can be seen from Table 3, the proposed method achieves the best performance on the IIT-Delhi-1 database, except for [28]. However, the feature size of [28] is larger than ours. Thus, it can be concluded that the proposed method achieves much better performance than the other approaches.

IIT-Delhi-2 database
Experiment #1
In this case, we explore the effect of the parameter L on the recognition rate. We change the level L from zero to three with K = 300 to discover its impact on the recognition performance of our method. Tables 4 and 5 compare the recognition rates using one and two images in the training set, respectively. As can be seen from these tables, the recognition rate increases with the SPH decomposition level L. Therefore, the best recognition rates are obtained with level L = 3 of horizontal decomposition. These results demonstrate the effect of the spatial pyramid histogram (SPH) method on the recognition system: it makes it possible to capture the spatial information of the local patterns. Thus, the combination of the multi-bag-of-features model and the spatial pyramid histogram decomposition improves the accuracy rate.

Experiment #2
In order to demonstrate the effectiveness of the proposed method, we compare it with some existing methods applied to the IIT-Delhi-2 ear database. Note also that the same protocol as the comparative methods is used to carry out the corresponding experiment. In Table 6, we summarize the identification rates of seven comparative methods and of the proposed method on the IIT-Delhi-2 ear database. We can clearly see that the identification rate of D-IMBFH is the highest among the compared methods.


Table 3. Comparison of some related work on the IIT-Delhi-1 database.

Ref    Feature extraction                                        Classifier              Recognition rates (%)
[8]    BSIF descriptor                                           KNN                     97.26
[9]    Improved BSIF descriptor                                  KNN                     97.39
[12]   Sparse representation of local gray level orientations    Sparse representation   97.07
[24]   Non linear curvelet features                              KNN                     97.77
[25]   2-D quadrature filter                                     Hamming distance        96.53
[26]   Orthogonal log-Gabor filter pair                          KNN                     96.27
[27]   Local principal independent components                    Inner product           97.60
[28]   Geometric measurements                                    SVM                     99.60
Our    D-IMBFH                                                   KNN                     97.81
Our    IMBFH                                                     KNN                     96.71

Table 4. The effect of decomposition level L (one image per subject).

Level L   IMBFH + K-NN   D-IMBFH + K-NN   IMBFH + SVM
0         88.46          83.92            84.44
1         89.69          86.01            85.72
2         90.50          87.70            86.42
3         91.37          88.53            87.04

Table 5. The effect of decomposition level L (two images per subject).

Level L   IMBFH + K-NN   D-IMBFH + K-NN   IMBFH + SVM
0         95.92          97.06            89.26
1         95.98          97.82            90.41
2         96.23          97.88            91.02
3         96.47          97.91            92.43

This is mainly due to the robustness and strong inter-class discrimination of our approach.

Experimental results in the USTB-1 database
The University of Science and Technology Beijing (USTB-1) database contains 185 images captured from 60 subjects. The images are 8-bit grayscale and were taken under different lighting and rotation conditions. Each subject has at least three ear images. Furthermore, the cropped images have a size of 150 × 80 pixels [23].

Experiment #1
In this case, we explore the effect of the parameter L on the recognition rate. We change the level L from zero to three with K = 300 to discover its impact on the recognition performance of our method.


Table 6. Comparison of some related work on the IIT-Delhi-2 database.

Ref    Feature extraction                                        Classifier              Recognition rates (%)
[8]    BSIF descriptor                                           KNN                     97.34
[9]    Improved BSIF descriptor                                  KNN                     97.63
[12]   Sparse representation of local gray level orientations    Sparse representation   97.73
[24]   Non linear curvelet features                              KNN                     96.22
[25]   2-D quadrature filter                                     Hamming distance        96.08
[26]   Orthogonal log-Gabor filter pair                          KNN                     95.93
[27]   Local principal independent components                    Inner product           97.20
Our    D-IMBFH                                                   KNN                     97.91
Our    IMBFH                                                     KNN                     96.47

Table 7. The effect of decomposition level L (one image per class).

Level L   IMBFH + K-NN   D-IMBFH + K-NN   IMBFH + SVM
0         95.70          89.52            87.10
1         95.97          91.13            88.98
2         96.25          92.17            89.47
3         96.78          92.58            90.39

Table 8. The effect of decomposition level L (two images per class).

Level L   IMBFH + K-NN   D-IMBFH + K-NN   IMBFH + SVM
0         97.40          97.40            90.62
1         97.92          98.96            90.62
2         98.23          99.01            90.73
3         98.23          99.20            90.75

Tables 7 and 8 compare the recognition rates using one and two images in the training set, respectively. It can be seen that the accuracy rate increases with the SPH level L, so the best recognition rates are obtained with level L = 3 of horizontal decomposition. These results demonstrate the effect of the spatial pyramid histogram (SPH) method, which makes it possible to capture the spatial information of the local patterns.

Experiment #2
We conduct this experiment to demonstrate the effectiveness of the proposed method through a performance comparison with different existing methods on the USTB-1 ear database. Table 9 shows that the highest recognition rate, 99.20%, is obtained using the D-IMBFH features. In addition, their feature size is smaller than that of the comparative methods.


Table 9. Comparison of some related work on the USTB-1 database.

Ref    Feature extraction         Classifier   Rec (%)
[8]    BSIF descriptor            KNN          98.46
[9]    Improved BSIF descriptor   KNN          98.97
Our    D-IMBFH                    KNN          99.20
Our    IMBFH                      KNN          98.23

4 Conclusion

A novel feature representation for ear recognition is proposed in this work. It is based on the combination of the multi-bag-of-features model and the spatial pyramid histogram approach. The main idea of the proposed method is to extract discriminant histogram-based features for ear recognition using the multi-bag of features (IMBF), the spatial pyramid histogram (SPH) of horizontal decomposition and the KFDA method for feature reduction. The experimental results show that the proposed approach is effective and superior to the state of the art in terms of accuracy. Our future work will focus on two main ideas: (i) the ear encoding step, by using other encoding methods to improve the accuracy, and (ii) the ear authentication step, by evaluating the performance of the proposed approach on larger datasets.

References 1. Singh, D., Singh, S.K.: A survey on human ear recognition system based on 2D and 3D ear images. Open J. Inf. Secur. Appl. 1(2) (2014) 2. Nanni, L., Lumini, A.: Fusion of color spaces for ear authentication. Pattern Recognit. 42(9), 1906–1913 (2009) 3. Ansari, S., Gupta, P.: Localization of ear using outer helix curve of the ear. In: Proceedings of the International Conference on Computing: Theory and Applications (ICCTA’07) (2007) 4. Islam, S.M.S., Bennamoun, M., Davies, R.: Fast and fully automatic ear detection using cascaded Adaboost. In: IEEE Workshop on Applications of Computer Vision (2008) 5. Yuan, L., Zhang, F.: Ear detection based on improved Adaboost algorithm. In: Proceedings of the Eighth International Conference on Machine Learning and Cybernetics, Boading, vol. 4, pp. 2414–2417 (2009) 6. Emeršič, Ž., Štruc, V., Peer, P.: Ear recognition: more than a survey. Neurocomput. J. 255, 26–39 (2017) 7. Zhang, H., Mu, Z.: Compound structure classifier system for ear recognition. In: Proceedings of the IEEE International Conference on Automation and Logistics (ICAL), pp. 2306–2309 (2008) 8. Benzaoui, A., Hadid, A., Boukrouche, A.: Ear biometric recognition using local texture descriptors. J. Electron. Imaging 23(5), 0530081–05300812 (2014) 9. Benzaoui, A., Adjabi, I., Boukrouche, A.: Experiments and improvements of ear recognition based on local texture descriptors. J. Opt. Eng. 56(4), 0431091–04310913 (2017) 10. Prakash, S., Grupta, P.: An efficient ear recognition technique invariant to illumination and pose. Telecommun. Syst. 52(3), 1435–1448 (2013)


11. Doghmane, H., et al.: A novel discriminant multi-scale representation for ear recognition. Int. J. Biom. 11(1), 50–66 (2019) 12. Kumar, A., Chan, T.-S.T.: Robust ear identification using sparse representation of local texture descriptors. Pattern Recogn. 46(1), 73–85 (2013) 13. Galdamez, P., Ganzalez, A., Ramon, M.: Ear recognition using a hybrid approach based a neural networks. In: Proceedings of the International Conference on Information Fusion, pp. 1–6 (2014) 14. Pflug, A., Paul, P.N., Busch, C.: A comparative study on texture and surface descriptors for ear biometrics. In: Proceedings of the International Carnhan Conference on Security Technology, pp. 1–6. IEEE (2014) 15. Huang, H., Liu, J., Feng, H., He, T.: Ear recognition based on uncorrelated local Fisher discriminant analysis. Neurocomputing 74, 3103–3113 (2011) 16. Kannala, J., Esa, R.: Bsif: binarized statistical image features. In: Proceedings IEEE International Conference on Pattern Recognition (ICPR), pp 1363–1366, Tsukuba, Japan (2012) 17. Foncubierta-Rodríguez, A., Depeursinge, A., Müller, H.: Using multiscale visual words for lung texture classification and retrieval. Medical Content-Based Retrieval for Clinical Decision Support, pp. 69–79. Springer, Berlin, Heidelberg (2012) 18. Li, Z., Imai, J.I., Kaneko, M.: Robust face recognition using block-based bag of words. In: 20th International Conference on Pattern Recognition (ICPR), pp. 1285–1288. IEEE (2010) 19. Triggs, B., Jurie, F.: Creating efficient codebooks for visual recognition. ICCV 1, 604–610 (2005) 20. Bourouba, H., Doghmane, H., Benzaoui, A., Boukrouche, A.: Ear recognition based on multi-bags-of-features histogram. In: Proceedings of the 3rd International Conference on Control, Engineering and Information Technology (CEIT), pp. 1–6 (2015) 21. Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 2169–2178. IEEE Computer Society (2006) 22. Kumar, A.: IIT Delhi ear image database version1.0. New Delhi, India (2007). http://www4. comp.polyu.edu.hk/*csajaykr/IITD/Database_Ear.htm 23. Mu, Z.: USTB ear image database. Beijing, China (2009). http://www1.ustb.edu.cn/resb/en/ index.htm 24. Basit, A., Shoaib, M.: A human ear recognition method using non-linear curvelet feature subspace. Int. J. Comput. Math. 91(3), 616–624 (2014) 25. Chan, T.S., Kumar, A.: Reliable ear identification using 2-D quadrature filters. Pattern Recognit. Lett. 33(14), 1870–1881 (2012) 26. Kumar, A., Wu, C.: Automated human identification using ear imaging. Pattern Recognit. 45(3), 956–968 (2012) 27. Mamta, M.H.: Robust ear based authentication using local principal independent components. Expert Syst. Appl. 40(16), 6478–6490 (2013) 28. Omara, I., et al.: A novel geometric feature extraction method for ear recognition. Expert Syst. Appl. 65, 127–135 (2016)

Some Topological Indices of Polar Grid Graph

Atmani Abderrahmane(1), Elmarraki Mohamed(1), and Essalih Mohamed(1,2)

(1) LRIT - CNRST URAC n°29, Faculty of Sciences Rabat IT Center, Mohammed V University, Rabat, Morocco
[email protected]
(2) LPSSII, The Safi's Graduate School of Technology, Cadi Ayyad University in Marrakesh, Marrakesh, Morocco
[email protected]

Abstract. For over 100 years, chemists, chemoinformaticians and biochemists have explored the relationship between the chemical structure and the biological activity of a molecule using statistical tools (QSAR and QSPR studies) and have tried to predict them, as well as other measurable properties. The use of graph theory and topological indices has become indispensable for this purpose, and this exploration can be done in several ways. In this article, we develop some bases of the so-called molecular indices in order to optimize their calculation.

Keywords: Degree based TI · Distance based TI · QSAR/QSPR · Topological indices (TI) · Molecular descriptor



1 Introduction

In the 1990s and 2000s, the drug manufacturing process was long and technically inefficient in terms of success rate. What makes the experiments expensive is the identification of candidates, which requires much more extensive data on the mechanisms of the target protein in order to obtain safe and valid tests. Computational screening is one of the processes that can handle large compound databases and subsequently evaluate and model their properties as well as their molecular activities. This allows us to determine which candidates are most likely to bind to a biological target, or to estimate the physicochemical properties of a molecule. Quantitative structure-activity relationship (QSAR) studies require numerical molecular descriptors associated with the structural formulas, which are discrete entities. In a QSAR or QSPR study, an attempt is made to relate the structure of a molecule to a biological activity or a property by means of a statistical tool. Such relationships can be codified as follows: QSAR or QSPR = f(molecular structure) = f(molecular descriptor) [14]. Among the challenges of computing applied to chemistry, or chemoinformatics, is to represent the candidate elements in a simple and effective way in order to predict their activity, and also to use them to extract similarity, starting from information contained in already known compounds. During years of


research, a multitude of results have shown that the best way to represent the information contained in a molecular structure is to assemble it into real numbers called topological indices. Topological indices are numbers associated with constitutional formulas by mathematical operations on the graphs representing these formulas. The necessity of using tools such as topological indices originates in the fact that physico-chemical properties are expressed as numbers and thus have a metric enabling scientists to make comparisons and correlations. By contrast, chemical structures, even expressed in the mathematical form of graphs, are discrete entities. In order to evaluate quantitatively the degree of similarity or dissimilarity of chemical structures, or to find correlations between structures and properties (QSAR or QSPR), one needs to translate structures into numbers [14, 15].

The first attempts to model the activity and properties of molecules were initiated in the 1860s, after Crum Brown and Fraser discovered that the biological activity of a molecule is fundamentally rooted in its chemical constitution [16]. It was however in the 1960s that "group contribution" models were developed, which mark the true beginnings of QSAR modeling. Subsequently, the development of new modeling techniques, first linear and then nonlinear, led to the discovery of many methods; they are based, in principle, on finding a relationship between a set of real numbers, the molecular descriptors, and the property or activity one wants to predict. Today, more than 6000 kinds of indices are used to extract and quantify the different physico-structural-chemical characteristics of molecules. They can be obtained empirically, but calculated descriptors are preferred because they serve one of the objectives of modeling: these indices have the ability to predict without going through the synthesis of the molecules. Before modeling, it is essential to compute a multitude of different descriptors, since the mechanisms that determine the activity of a molecule or one of its properties are often unknown. One must then select among these indices those that are most relevant for modeling.

There are three types of descriptors. The one-dimensional descriptors are accessible from the empirical formula of the molecule and describe the general properties of the compound. The two-dimensional descriptors, also known as topological descriptors or connectivity indices, are determined from the structural representation of the molecule; topological indices are numerical parameters of a graph that characterize its topology and are usually invariant properties (examples: the Wiener index, the Randic index). The three-dimensional descriptors of a molecule are basically a composition of transformations extracting a set of geometric properties describing a cloud of points called atoms [16].

Using the tools of graph theory [20, 21], set theory and statistics, one attempts to identify the structural features involved in structure-activity and structure-property relationships. One of the main tasks is to partition a molecular property and to recombine its fragmentary values by additive models. The topological characterization of chemical structures makes it possible to classify molecules [17] and to model unknown structures with the desired properties [13]. Before starting the article, some basic definitions of graph theory are needed.


A graph is a set of points, called vertices, connected by lines or arrows, called edges. The set of edges between vertices forms a network. Different types of networks are studied according to their topology and properties. Trees are a simpler subcategory of graphs, particularly important and widely studied, especially in computer science. Edges can be oriented or unoriented. If the edges are oriented, the relation goes in one direction only and is therefore asymmetrical, and the graph is said to be directed. Otherwise, if the edges are unoriented, the relation goes in both directions and is symmetrical, and the graph is said to be undirected. In mathematics, the set of nodes is most often denoted V, while E denotes the set of edges. In the general case, a graph can have multiple edges, that is to say that several different edges connect the same pair of points. In addition, an edge can be a loop, that is to say, connect a point only to itself. A graph is simple if it has neither multiple edges nor loops; it can then be defined simply by a pair G = (V, E), where E is a set of pairs of elements of V [18, 19].

The diameter D(G) is the largest element of the set of distances between any two vertices of a graph G [7, 8]:

D(G) = max{ d(u, v) : (u, v) ∈ V(G)² }

To begin the explanation of the content of our article, we consider that our graph G is connected and simple. We denote by d_G(k) the number of pairs of vertices whose distance is k. Then, for a real number λ, the Wiener-type invariant W_λ(G) of G is [3]:

W_λ(G) = Σ_{k ≥ 1} d_G(k) k^λ

The Wiener index is defined as the sum of the numbers of edges in the shortest paths between all pairs of non-hydrogen atoms in a molecule. It is the oldest topological index related to molecular branching. The Wiener index was originally defined only for acyclic graphs [1, 2, 5, 6, 14]:

W(G) = Σ_{{u,v} ⊆ V(G)} d(u, v)

We also have the Wiener index of a vertex u in G:

w(u, G) = Σ_{v ∈ V(G)} d(u, v)

The second index is the hyper-Wiener index, denoted by WW(G) [9, 12]:

WW(G) = (1/2) (W_1(G) + W_2(G))

with W_1 and W_2 the Wiener-type invariants of G obtained for λ = 1 and λ = 2, respectively.


The third index is the degree distance index. DD(G) is defined as [9]:

DD(G) = Σ_{{u,v} ⊆ V(G)} (deg(u) + deg(v)) d(u, v)

A pair of topological indices, denoted by the symbols M_1 and M_2, was introduced many years ago. They have had different names in the literature, such as the Zagreb group indices, the Zagreb group parameters and the Zagreb indices [12, 13]:

M_1(G) = Σ_{u ∈ V(G)} (deg(u))²

Articles [3, 4] and [9] have treated these topological indices by simplifying them according to d_G(k) and D(G), with a theorem stating that, for a simple, planar, connected and undirected graph G (with N vertices and M edges) whose diameter is greater than 2, we have:

W(G) = N(N − 1) − M + Σ_{k=3}^{D(G)} (k − 2) d_G(k)        (1)

WW(G) = (1/2) ( 3N(N − 1) − 4M + Σ_{k=3}^{D(G)} (k² + k − 6) d_G(k) )        (2)

DD(G) = Σ_{u ∈ V(G)} w(u, G) deg(u)        (3)

The Wiener polarity index of a graph G, denoted by W_P(G), is defined as the number of unordered pairs of vertices that are at distance 3 in G. Among the most used topological indices, the properties of W_P(G) have been widely studied for a multitude of graphs and have been the subject of sustained attention in recent years [22].

W_P(G) = |{ {u, v} : d_G(u, v) = 3, u, v ∈ V(G) }|        (4)

If D(G) = 2, then [3, 10, 11]:

W(G) = N(N − 1) − M        (5)

WW(G) = (3/2) N(N − 1) − 2M        (6)

DD(G) = 4M(N − 1) + M_1(G)        (7)

In chemistry, diagrams are generally used to illustrate the chemical formula as a graph, and in our research we have tried to work on realistic and usable cases, namely connected, simple, undirected and planar graphs. So let us take the polar grid graph as a case study.
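As a worked illustration of the four indices defined above, the following sketch (our own, assuming the networkx library is available; it is not part of the paper) computes W, WW, DD and W_P directly from the distance distribution of a graph, using the wheel graph that corresponds to the first-level polar graph P_n of the next section.

```python
import networkx as nx
from itertools import combinations

def topological_indices(G):
    dist = dict(nx.all_pairs_shortest_path_length(G))
    pairs = list(combinations(G.nodes, 2))
    d = [dist[u][v] for u, v in pairs]                 # distances of all unordered pairs
    W  = sum(d)                                        # Wiener index, Eqs. (1)/(5)
    WW = 0.5 * sum(k + k * k for k in d)               # hyper-Wiener index, Eqs. (2)/(6)
    DD = sum((G.degree(u) + G.degree(v)) * dist[u][v]  # degree distance, Eqs. (3)/(7)
             for u, v in pairs)
    WP = sum(1 for k in d if k == 3)                   # Wiener polarity index, Eq. (4)
    M1 = sum(deg * deg for _, deg in G.degree())       # first Zagreb index
    return W, WW, DD, WP, M1

# Example: nx.wheel_graph(9) is a hub plus an 8-cycle, i.e. the polar graph P_n with n = 8
print(topological_indices(nx.wheel_graph(9)))
```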


2 Main Results

In this section, the algorithm that calculates the various topological indices of the polar graph, namely the Wiener, hyper-Wiener, degree distance and Wiener polarity indices, is presented using Eqs. 1, 2, 3 and 4.

2.1 First Level of the Polar Grid Graph

We start by introducing the wheel graph, or polar graph of first degree, as the first level in this paper. It is composed of n + 1 vertices, denoted v_{x=0,y=1} for the center of the polar graph and v_{x=1,y}, y = 1, ..., n, for the others, with m = 2n edges.

Lemma 1.1: Let P_n be the polar graph (see Fig. 1) with D(P_n) = 2. The number of vertices, edges and faces of P_n is N = n + 1, m = 2n and f = n + 1, respectively; then:

Fig. 1. The polar graph Pn

d_{P_n}(k) = { ... ;  n(n − 2) for k = 3 ;  n(n − 7)/2 for k = 4 }

deg(v_{x,y}) = { n for x = 0 and y = 1 ;  4 for x = 1 and y = 1, ..., n ;  3 for x = 2 and y = 1, ..., n }

w(v_{x,y}; P²_n) = D_{P²_n}(v_{x,y}) = { ... }

d_{P³_n}(k) = { 6n for k = 1 ;  n(n + 13)/2 for k = 2 ;  n² + 4n for k = 3 ;  n(3n − 7)/2 for k = 4 ;  n² − 6n for k = 5 ;  n(n − 11)/2 for k = 6 }

deg(v_{x,y}) = { n for x = 0 and y = 1 ;  4 for x = 1 and y = 1, ..., n ;  4 for x = 2 and y = 1, ..., n ;  3 for x = 3 and y = 1, ..., n }

w(v_{x,y}; P³_n) = D_{P³_n}(v_{x,y}) = { ... ;  12n − 34 for x = 2 and y = 1, ..., n ;  15n − 53 for x = 3 and y = 1, ..., n }

Proof: Same proof as Lemma 1.1.

Theorem 1.3: Let P³_n be the polar graph with D(P³_n) = 6; then:

W(P³_n) = 51n² − 245n

WW(P³_n) = (n/2)(357n − 2075)

DD(P³_n) = n(135n − 339)

W_P(P³_n) = n² + 4n

Proof: In order to prove these formulas, we rely on Lemma 1.3 and the basic Eqs. (1), (2), (3) and (4).

3 Conclusion

Scientific research in the field of chemoinformatics is increasingly producing results that lead to more efficient and reliable QSAR/QSPR models, as topological indices find new uses, as our article has shown. The indices deployed here are basic ones, and future work will consider other indices with more relevance and more flexibility.


References 1. Graovac, A., Pisanski, T.: On the Wiener index of a graph. J. Math. Chem. 8, 53–62 (1991) 2. Sabine, N.: The Wiener index of a graph. Thesis of Graz University of Technology (2010) 3. Essalih, M., El Marraki, M., Alhagri, G.: Some topological indices of spider’s web planar graph. Appl. Math. Sci. 6(63), 3145–3155 (2012) 4. Essalih, M.: L’étude des indices topologiques, leurs applications en QSAR/QSPR et leurs corrélations aux représentations moléculaires «Plerograph» et «Kenograph». PhD thesis, Faculty of Sciences, Rabat (2013) 5. Wiener, H.: Structural determination of paraffin boiling points. J. Am. Chem. Soc. 69(1), 17–20 (1947) 6. Gutman, I., Yeh, Y.N., Lee, S.L., Luo, Y.L.: Some recent results in the theory of the Wiener number. Indian J. Chem. 32A, 651–661 (1993) 7. West, D.B.: Introduction to Graph Theory, 2nd edn. Prentice Hall, Upper Saddle River (2002) 8. Xu, J.: Theory and Application of Graphs. Kluwer Academic Publishers, Dordrecht/Boston/London (2003) 9. El Marraki, M., Essalih, M., Alhagri, G.: Calculation of some topological indices of graphs. J. Theor. Appl. Inf. Technol. 30(2), 122–128 (2011) 10. Schmuck, N.S.: The Wiener index of a graph. PhD thesis, Graz University of Technology (2010) 11. Gutman, I., Yeh, Y.N., Chen, J.C.: On the sum of all distances in graphs. Tamkang J. Math. 25, 83–86 (1994) 12. Gutman, I., Das, K.C.: The first Zagreb indices 30 years after. MATCH 50, 83–92 (2004) 13. Gutman, I., Trinajstic, N.: Graph theory and molecular orbitals, Total ´ p−electron energy of alternant hydrocarbons. Chem. Phys. Lett. 17, 535–538 (1972) 14. Devillers, J., Balaban, A.T.: Topological indices and related descriptors in QSAR and QSPR. Gordan and Breach Science Publishers, Singapore (1999) 15. Dehmer, M., Varmuza, K., Bonchev, D.: Statistical Modelling of Molecular Descriptors in QSAR/QSPR. Quantitative and Network Biology, vol. 2. Wiley-Blackwell, Weinheim (2012) 16. Brown, A.C., Frazer, T.: On the connection between chemical constitution and physiological action. Trans. R. Soc. Edinb. 25, 151–203 (1868–69) 17. Mili, F., Hamdi, M.: A hybrid evolutionary functional link artificial neural network for data mining and classification. In: 2012 6th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), pp. 917–924. Sousse (2012) 18. Zardi, H., Romdhane, L.B.: MEP — a robust algorithm for detecting communities in large scale social networks. In: 2012 6th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), pp. 13–19. Sousse (2012) 19. Ben Abdenneji, S.F., Lavirotte, S., Tigli, J.Y., Rey, G., Riveill, M.: Adaptations interferences detection and resolution with graph-transformation approach. In: 2012 6th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), pp. 36–43. Sousse (2012) 20. El Bazzi, M.S., Mammass, D., Zaki, T., Ennaji, A.: A graph based method for Arabic document indexing. In: 2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), Hammamet, pp. 308–312 (2016)


21. El Ghazi, A., Ahiod, B.: Random waypoint impact on bio-inspired routing protocols in WSN. In: 2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), Hammamet, pp. 326–331 (2016) 22. Lei, Hui, Li, Tao, Shi, Yongtang, Wang, Hua: Wiener polarity index and its generalization in trees. MATCH Commun. Math. Comput. Chem. 78, 199–212 (2017)

Deep Elman Neural Network for Greenhouse Modeling

Latifa Belhaj Salah(1) and Fathi Fourati(2)

(1) Control and Energy Management Laboratory (CEM-Lab), University of Gabes, Gabes, Tunisia
[email protected]
(2) Control and Energy Management Laboratory (CEM-Lab), University of Sfax, Sfax, Tunisia
[email protected]

Abstract. In this work, we propose to use a recurrent deep learning method to model a complex system. We have chosen the Deep Elman neural network with different structures and sigmoidal activation functions. The emphasis of the paper is on comparing modeling results on a greenhouse and demonstrating the abilities of the Deep Elman neural network in the modeling step. For this, we used training and validation datasets. Simulation results proved the ability and the efficiency of the Deep Elman neural network with two hidden layers.

Keywords: Greenhouse · Elman neural network · Recurrent neural network · Modeling

1 Introduction

In recent years, deep neural networks have enjoyed great success in many domains [1], such as speech recognition [2–4], vehicle detection applications [5], renewable energy [6], fault detection and computer vision [7, 8]. Deep learning is defined as automatic learning that uses many layers of information in order to extract and transform supervised or unsupervised features. A standard neural network (NN) is composed of many processors named neurons, each producing a sequence of real-valued activations. Environment sensors activate input neurons, and other neurons are activated from previous neurons through weighted connections. Learning consists in finding the weights that allow the NN to present the desired behavior correctly. Depending on the problem and on the way the neurons are connected, such behavior can require long chains of computation steps to transform the network activation [9, 10]. Both recurrent NNs (RNN) and feed-forward NNs (FNN) have known a lot of success in many domains. RNNs are more powerful than FNNs: they can create and process memories of input pattern sequences [11]. In [12], the authors propose to use a Deep Elman neural network (ENN) for acoustic modeling. They showed the effectiveness of this approach compared to the long short-term memory (LSTM) networks method and the simplified LSTM networks method [13, 14].


In this paper, we show the performance of a Deep Elman RNN used to model a complex system. The paper is organized as follows: in Part 2, we describe the Elman neural network. In Part 3, we present Elman deep learning with the back-propagation algorithm. In Part 4, we describe the considered greenhouse. In Part 5, we present the simulation results of the modeling step. Finally, in Part 6, a conclusion and prospects are given.

2 Elman Neural Network

The Elman network has found a lot of success in several domains such as financial prediction and the identification of dynamic systems. The Elman network is a type of recurrent network. The difference between this network and the feed-forward neural network is the presence of a context layer in the Elman network. The main role of this layer is to memorize the previous hidden unit activations. The network can be trained with the back-propagation algorithm, and the activation functions of the hidden units can be linear or non-linear [15–17]. Figure 1 and Table 1 describe the Elman neural network architecture.

Fig. 1. Elman neural network architecture.

Table 1. Description of the Elman neural network architecture

Parameter : Function
I_l(k) : the l th input unit
O_m(k) : the m th network output
I^1_i(k) : the input of the i th unit of the first hidden layer
I^{11}_i(k) : the output of the i th unit of the first hidden layer
X^c_j(k) : the output of the j th unit of the first context layer
I^2_{ii}(k) : the input of the ii th unit of the second hidden layer
I^{22}_{ii}(k) : the total output of the ii th unit of the second hidden layer
X^{c2}_{jj}(k) : the output of the jj th unit of the second context layer
I^{n-1}_{i..i}(k) : the input of the i..i th unit of the (n-1) th hidden layer
I^{(n-1)(n-1)}_{i..i}(k) : the output of the i..i th unit of the (n-1) th hidden layer
X^{c(n-1)}_{j..j}(k) : the output of the j..j th unit of the (n-1) th context layer
I^n_{i..ii}(k) : the input of the i..ii th unit of the n th hidden layer
I^{nn}_{i..ii}(k) : the output of the i..ii th unit of the n th hidden layer
X^{cn}_{j..jj}(k) : the output of the j..jj th unit of the n th context layer
I^s_m(k) : the input of the m th output layer unit
w^O_{i..ii,m}(.) : the weights of the links between the n th hidden layer and the output layer
w^I_{l,i}(.) : the weights of the links between the input layer and the first hidden layer
w^{c1}_{j,i}(.) : the weights of the links between the first context layer and the first hidden layer
w^{c2}_{jj,ii}(.) : the weights of the links between the second context layer and the second hidden layer
w^{I2}_{i,ii}(.) : the weights of the links between the first hidden layer and the second hidden layer
w^{cn}_{j..jj,i..ii}(.) : the weights of the links between the n th context layer and the n th hidden layer
w^{In}_{i..i,i..ii}(.) : the weights of the links between the (n-1) th hidden layer and the n th hidden layer

3 Learning of Elman Neural Network

The learning of the Elman network needs to minimize the squared error criterion defined as:

J_k = (1/2) Σ_{m=1}^{ns} (VS_m(k) − O_m(k))²        (1)

where VS_m(k) is the desired output. To adjust the Elman neural network connection weights, we use the back-propagation algorithm in order to emulate the dynamics of the process.

3.1 Network with One Single Hidden Layer

The general weight adaptation in the gradient method is:

Δw = −ε ∂J_k/∂w        (2)

The following equations present the adjustment of the weight vectors:

Δw^O_{i,m}(k) = −ε ∂J_k/∂w^O_{i,m}(k)        (3)

Δw^I_{l,i}(k) = −ε ∂J_k/∂w^I_{l,i}(k)        (4)

Δw^c_{j,i}(k) = −ε ∂J_k/∂w^c_{j,i}(k)        (5)

3.2 Network with Many Hidden Layers

The following equations present the adjustment of the weight vectors:

Δw^O_{i..ii,m}(k) = −ε ∂J_k/∂w^O_{i..ii,m}(k)        (6)

Δw^I_{l,i}(k) = −ε ∂J_k/∂w^I_{l,i}(k)        (7)

Δw^c_{j,i}(k) = −ε ∂J_k/∂w^c_{j,i}(k)        (8)

Δw^{I2}_{i,ii}(k) = −ε ∂J_k/∂w^{I2}_{i,ii}(k)        (9)

Δw^{c2}_{jj,ii}(k) = −ε ∂J_k/∂w^{c2}_{jj,ii}(k)        (10)

Δw^{cn}_{j..jj,i..ii}(k) = −ε ∂J_k/∂w^{cn}_{j..jj,i..ii}(k)        (11)

Δw^{In}_{i..i,i..ii}(k) = −ε ∂J_k/∂w^{In}_{i..i,i..ii}(k)        (12)
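To make the update rules concrete, here is a minimal NumPy sketch (our own illustration, not the authors' code) of a single-hidden-layer Elman network trained by gradient descent on the criterion J_k of Eq. (1). Layer sizes follow Table 2 (8 inputs, 4 hidden and context units, 2 outputs); the output layer is taken as linear here, which is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out, eps = 8, 4, 2, 0.2
W_in  = rng.normal(0, 0.1, (n_hid, n_in))   # input   -> hidden weights (w^I)
W_c   = rng.normal(0, 0.1, (n_hid, n_hid))  # context -> hidden weights (w^c)
W_out = rng.normal(0, 0.1, (n_out, n_hid))  # hidden  -> output weights (w^O)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(I_k, VS_k, context):
    """One gradient-descent step, Delta w = -eps * dJ_k/dw (Eqs. (2)-(5))."""
    global W_in, W_c, W_out
    h = sigmoid(W_in @ I_k + W_c @ context)   # hidden activations
    O = W_out @ h                             # network output O(k)
    e = VS_k - O                              # desired minus obtained output
    W_out += eps * np.outer(e, h)
    delta_h = (W_out.T @ e) * h * (1.0 - h)   # back-propagated hidden error
    W_in  += eps * np.outer(delta_h, I_k)
    W_c   += eps * np.outer(delta_h, context)
    return h                                  # becomes the next context state

# toy usage over a short random sequence of (input, desired output) samples
context = np.zeros(n_hid)
for I_k, VS_k in zip(rng.normal(size=(10, n_in)), rng.normal(size=(10, n_out))):
    context = train_step(I_k, VS_k, context)
```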

4 Considered System

We have to model a greenhouse, which is a multi-input, multi-output (MIMO) process characterized by disturbances and uncertainty.


The following quantities define the greenhouse functioning [18–21]:
• Measurable but not controllable inputs: Te ("external temperature"), He ("external hygrometry"), Rg ("global radiation"), Vv ("wind speed").
• Measurable and controllable inputs: Ch ("heating"), Ov ("sliding shutter"), Br ("sprayer"), Om ("curtain").
• Outputs: Ti ("internal temperature"), Hi ("internal hygrometry").

We choose an Elman neural network with several hidden and context layers to model the greenhouse.

5 Simulation Results

The three considered neural network structures are:
• a network with one single hidden and context layer,
• a network with two hidden and context layers,
• a network with three hidden and context layers.

Table 2 describes the characteristics of the deep Elman neural networks used for the modeling of the greenhouse.

Table 2. Deep Elman neural network characteristics

Parameter                                      One hidden/context layer   Two hidden/context layers   Three hidden/context layers
Number of input units n1                       8                          8                           8
Number of output units ns                      2                          2                           2
Number of units of first context layer nc1     4                          4                           4
Number of units of first hidden layer nh1      4                          4                           4
Number of units of second context layer nc2    -                          4                           4
Number of units of second hidden layer nh2     -                          4                           4
Number of units of third context layer nc3     -                          -                           4
Number of units of third hidden layer nh3      -                          -                           4
Iterations                                     10000                      10000                       20000
Learning coefficient ε                         0.2                        0.2                         0.4


The criterion (13) is considered in order to compare the three neural network structures:

J_t = Σ_{k=1}^{nb} Σ_{i=1}^{ns} |y^m_i(k) − y_i(k)|        (13)

where nb is the operating interval, k is the sample time, ns is the number of outputs, y^m_i(k) is the i th output of the neural model at time k and y_i(k) is the i th output of the system at time k. For the greenhouse, we have a database of the parameters which describe the functioning of the greenhouse during one day. This data file is composed of 1440 lines, where the sampling time is one minute. The obtained database is divided into two parts of 720 rows each; one part is used for the learning step and the other part for the validation step. The input vector is I(k) = [Ch(k), Ov(k), Om(k), Br(k), Te(k), He(k), Vv(k), Rg(k)]^T and the output vector is O(k) = [Ti(k), Hi(k)]. Figures 2, 3 and 4 present the evolution of the criterion J_k for the greenhouse.
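A small sketch of this evaluation protocol follows (our own illustration; the file name and column order are hypothetical): the 1440 one-minute samples are split into a 720-row training part and a 720-row validation part, and the criterion J_t of Eq. (13) is computed on the validation outputs.

```python
import numpy as np

data = np.loadtxt("greenhouse_day.csv", delimiter=",")   # hypothetical 1440 x 10 file
inputs, outputs = data[:, :8], data[:, 8:]               # [Ch,Ov,Om,Br,Te,He,Vv,Rg] and [Ti,Hi] (assumed order)
train_I, valid_I = inputs[:720], inputs[720:]
train_O, valid_O = outputs[:720], outputs[720:]

def criterion_Jt(y_model, y_system):
    """Sum of absolute output errors over the operating interval, Eq. (13)."""
    return np.abs(y_model - y_system).sum()

# y_model would be produced by the trained Elman model on valid_I:
# Jt = criterion_Jt(y_model, valid_O)
```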

Fig. 2. The error evolution in the case of one hidden layer network.

Fig. 3. The error evolution in the case of two hidden layers network.


Fig. 4. The error evolution in the case of three hidden layers network.

Here, nb = 720. Figures 5 and 6 describe the real internal climate evolution, represented by continuous lines, and the outputs of the neural model with one hidden and context layer, represented by dashed lines, using the validation part of the database.

Fig. 5. The internal temperature evolution.

Fig. 6. The internal hygrometry evolution.


Here Jt = 88.7962. Figures 7 and 8 describe the real internal climate evolution, represented by continuous lines, and the outputs of the neural model with two hidden and context layers, represented by dashed lines, using the validation part of the database.

Fig. 7. The internal temperature evolution.

Fig. 8. The internal hygrometry evolution.

Here Jt = 82.4499. Figures 9 and 10 describe the real internal climate evolution, represented by continuous lines, and the outputs of the neural model with three hidden and context layers, represented by dashed lines, using the validation part of the data file.

Fig. 9. The internal temperature evolution.


Fig. 10. The internal hygrometry evolution.

Here Jt = 90.5219. From these results and figures, we can see that the network with two hidden and context layers has the lowest total error value. This Elman neural structure is the most efficient for the modeling and validation steps.

6 Conclusion

In this work, we trained a Deep Elman neural network for modeling a greenhouse. Through simulation results, we observed that the Elman neural network successfully emulates the dynamic behavior of the process. We showed that an Elman network with two context and hidden layers is better suited and more efficient than the other Elman neural network structures. The obtained model will be used for the control task.

References 1. Achanta, S., Gangashetty, S.V.: Deep Elman recurrent neural networks for statistical parametric speech synthesis. Speech Commun. 93, 31–42 (2017) 2. Chen, D., Mak, B.K.: Multitask learning of deep neural networks for low-resource speech recognition. IEEE/ACM Trans. Audio Speech Lang. Process. 23, 1172–1183 (2015) 3. Islam, S.M.S., Rahman, S., Rahman, M.M., Dey, E.K., Shoyaib, M.: Application of deep learning to computer vision: a comprehensive study. In: Proceedings of International Conference on Informatics, Electronics and Vision, pp. 592–597 (2016) 4. Makrem, B.J., Imen, J., Kaïs, O.: Study of speaker recognition system based on Feed Forward deep neural networks exploring text-dependent mode. In: 2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), pp. 355–360, Hammamet (2016) 5. Gazzah, S., Mhalla, A., Amara, N.E.B.: Vehicle detection on a video traffic scene: review and new perspectives. In: 2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), pp. 448–454, Hammamet (2016)


6. Kutucu, H., Almryad, A.: An application of artificial neural networks to assessment of the wind energy potential in Libya. In: 2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), pp. 405–409, Hammamet (2016) 7. Kruger, N., Janssen, P., Kalkan, S., Lappe, M., Leonardis, A., Piater, J., Rodriguez-Sanchez, A.J., Wiskott, L.: Deep hierarchies in the primate visual cortex: what can we learn for computer vision. IEEE Trans. Pattern Anal. Mach. Intell. 35, 1847–1871 (2013) 8. Psaltis, D., Sideris, A., Yamamura, A.A.: A multilayer neural network controller. IEEE Control Syst. Mag. 8, 17–21 (1988) 9. Jaeger, H.: The “echo state” approach to analysing and training recurrent neural networks. Technical Report GMD Report 148, German National Research Center for Information Technology (2001) 10. Jaeger, H.: Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication. Science 304, 78–80 (2004) 11. Elman, J.L.: Finding structure in time. Cogn. Sci. 14, 179–211 (1990) 12. Siegelmann, H.T., Sontag, E.D.: Turing computability with neural nets. Appl. Math. Lett. 4, 7–80 (1991) 13. Fan, Y., Qian, Y., Xie, F.L., Soong, F.K.: TTS synthesis with bidirectional LSTM based recurrent neural networks. In: Proc. INTERSPEECH, pp. 1964–1968 (2014) 14. Wu, Z., King, S.: Investigating gated recurrent networks for speech synthesis. In: Proc. ICASSP, pp. 5140–5144 (2016) 15. Pham, D.T., Liu, X.: Training of Elman networks and dynamic system modelling. Int. J. Syst. Sci. 27, 221–226 (1996) 16. Deng, L., Hinton, G., Kingsbury, B.: New types of deep neural network learning for speech recognition and related applications: an overview. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 26–31 (2013) 17. Baghernezhad, F., Khorasani, K.: Computationally intelligent strategies for robust fault detection, isolation, and identification of mobile robots. Neurocomputing 171, 335–346 (2016) 18. Yan, A., Wang, W., Zhang, C., Zhao, H.: A fault prediction method that uses improved casebased reasoning to continuously predict the status of a shaft furnace. Inf. Sci. 259, 169–281 (2014) 19. Huang, H.B., Huang, X.R., Li, R.X., Lim, T.C., Ding, W.P.: Sound quality prediction of vehicle interior noise using deep belief networks. Appl. Acoust. 113, 149–161 (2016) 20. Souissi, M.: Modélisation et commande du climat d’une serre agricole. Ph.D. Thesis, University of Tunis, Tunis (2002) 21. Zen, H., Sak, H.: Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis. In: Proc. ICASSP, pp. 4470–4474 (2015)

Image and Video

High Efficiency Multiplierless DCT Architectures

Yassine Hachaïchi(1,2), Sonia Mami(3,4), Younes Lahbib(1,4), and Sabrine Rjab(1)

(1) ENICarthage, University of Carthage, Tunis, Tunisia
{Yassine.Hachaichi,younes.lahbib}@enicarthage.rnu.tn, [email protected]
(2) Research Laboratory Smart Electricity & ICT, SEICT, LR18ES44, National Engineering School of Carthage, University of Carthage, Tunis, Tunisia
(3) Faculté des Sciences de Tunis, Université de Tunis El Manar, Tunis, Tunisia
[email protected]
(4) Research Laboratory LAPER UR-17-ES11, Université de Tunis El Manar, Tunis, Tunisia

Abstract. In this paper, we propose a new multiplierless Cordic Loeffler DCT (CLDCT) architecture based on the Taylor expansion. The new architectures rely on an enhanced choice of rotation angles. A suitable selection of the considered function, on the one hand, and of the order of approximation of the Taylor expansion, on the other hand, has led to a low-power and high-precision scale-free DCT. Compared to classical architectures, we improve the image quality whilst also reducing the power consumption. Our architectures have a PSNR closer to that of the Loeffler DCT, to which we compare in terms of image quality. The enhancement of PSNR reaches up to 12.89 dB in comparison with the CLDCT and 14.61 dB when compared to the BinDCT.

Keywords: CORDIC · Scale free · Taylor expansion · High precision · Low power

1 Introduction

The DCT (Discrete Cosine Transform) has become an unavoidable transform technique in digital signal processing. Its scope covers in particular image and video compression [1] and feature extraction [2]. However, it has a high computational complexity since it requires many multiplications and additions. To deal with this constraint, many solutions have been proposed. The proposed suggestions include the use of the Cordic algorithm exploited in the Cordic DCT [3] and the Cordic based Loeffler DCT [4–6], the binary representation of the constant coefficient multiplications [7,8] and the BinDCT [9]. Almost all the proposed architectures tried to replace the multiplications by shift and Add/Sub gates. Most of these works are presented as multiplierless architectures. However, this is not really the case. In fact, generally the internal multiplications are substituted by shift

and add operations but the scaling factor is not. Even in the rare cases when the scaling factor is substituted, we have a significant loss of precision, since many approximations are performed. In this paper, we propose scale-free DCT architectures based on a new choice of rotation angles and an accurate Taylor expansion. The proposed architectures provide both low power and high precision. Our contributions in this work are: – A new architecture based on an enhanced choice of rotation angles. – An improved architecture based on a generic precision Cordic based Loeffler DCT architecture. – A multiplierless DCT architecture based on an efficient Taylor expansion. – A high quality architecture which provides the closest results to the Loeffler based DCT, the reference in terms of precision and accuracy. In the rest of the paper, we will first give a state of the art of multiplierless DCT architectures. We will next give a background of the Cordic and the DCT, respectively in Sects. 3 and 3.1. The Generic precision DCT architectures are presented in Sect. 3.3. Section 4 details the proposed architectures. In Sect. 5, we analyze and discuss the experimental results before concluding our work in Sect. 6.

2 Related Works

The DCT architecture requires the use of arithmetic operations, namely additions, subtractions and multiplications. These operations, and particularly the multiplications, are costly in power and area and have a high computational complexity. Many works have been carried out in order to avoid the multiplications by approximating them with shift and add/sub operations. The disadvantage of these approximations is that they cause a serious degradation of the transform accuracy. In [4,5], Sun et al. proposed the flow graph of the Cordic Loeffler DCT based on the well-known Cordic algorithm. They claim that the architecture they propose, namely the Cordic based Loeffler DCT, consumes 16% less energy when compared to the Loeffler based DCT [10], but image quality degradation occurred. It is necessary to notice that only the internal multiplications were taken into consideration in the proposed graph; the multiplication based compensation factor stage still remains. The DCT architecture proposed in [11] is based on Angle Recoding CORDIC. For the scale-free factor technique, the authors simply proposed to merge the factors with the quantization matrix during the image/video compression process. However, this solution is not general, since the DCT is used in several domains other than compression and decompression and the quantization step does not always exist. However, some works really eliminate the scaling factor, including [7,8,12,13].


In [12], the authors presented a CORDIC algorithm which completely eliminates the scaling factor. They based their work on an appropriate selection of the order of approximation of the sine and cosine Taylor series. In order to implement the scale factor using shift and add operators, the authors proposed to use the third order expansion and to approximate 3! = 6 by 2³ = 8. Moreover, they assumed that the elementary angle is equal to 2^−i. This can be considered as a supplementary approximation since the elementary angle is actually equal to arctan(2^−i). These approximations lead to a significant loss of accuracy. In [13], a scaling-free Cordic algorithm is proposed. This algorithm is based on both the sine and cosine Taylor series and the leading-one bit detection algorithm. In this case, the authors also chose to use the third order expansion, but they approximate 3! = 6 by 2² = 4. The leading-one bit detection has been used to reduce the number of required iterations. In our case, we do not need such an algorithm since the Cordic angles are fixed and their parameters are extracted from [6]. In [7], the authors presented a fast DCT using a multiplierless method. This method consists in representing the constant coefficients of the multiplications using an unsigned 12-bit precision, leading to shift/add based multiplications. The paper [8] also adopted the same technique but applied it to a 16-point DCT. Contrary to these works, we propose in this paper a DCT architecture which completely eliminates the scaling factor whilst also meeting the accuracy requirement by refining our approximations.

3 Cordic-Based Loeffler DCT Background

In this section, we present the architectures on which we have based our work.

3.1 Cordic Algorithm

Cordic [14,15] is a class of hardware-based algorithms introduced in order to approximate transcendental functions. These methods use only shifts and adders. The idea in the conventional Cordic is to expand any angle into a sum of micro-rotation angles of arctangent radix, as shown in Eqs. 1 and 2:

θ = Σ_i σ_i θ_i,  where σ_i = ±1        (1)

θ_i = arctan(2^−i)

(2)

A rotation of the vector by an angle θ_i changes the coordinates from (x_i, y_i) to (x_{i+1}, y_{i+1}). After an elementary rotation, the obtained vector is given by Eq. 4:

[x_{i+1}, y_{i+1}]^T = [[cos(θ_i), −σ_i sin(θ_i)], [σ_i sin(θ_i), cos(θ_i)]] [x_i, y_i]^T        (3)

[x_{i+1}, y_{i+1}]^T = K_i [[1, −σ_i 2^−i], [σ_i 2^−i, 1]] [x_i, y_i]^T        (4)


where K_i = cos(θ_i). In Eq. 1, we use only shift and add operations in order to perform the given rotation. To complete this operation, the output of the rotations must be multiplied by a scale factor. This constant is given by Eq. 5:

K = ∏_i K_i        (5)
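The following small Python sketch (our illustration, not the authors' RTL) shows the conventional CORDIC rotation of Eqs. (1)-(5): the angle is decomposed into micro-rotations arctan(2^−i) applied with shifts and adds only, and the compensation factor K is the product of the cos(θ_i) terms of the iterations actually used.

```python
import math

def cordic_rotate(x, y, angle, iterations):
    """Rotate (x, y) by 'angle' using the micro-rotations listed in 'iterations'."""
    K = 1.0
    for i in iterations:
        theta_i = math.atan(2.0 ** -i)
        sigma = 1 if angle >= 0 else -1
        x, y = x - sigma * (y * 2.0 ** -i), y + sigma * (x * 2.0 ** -i)
        angle -= sigma * theta_i
        K *= math.cos(theta_i)          # compensation factor K of Eq. (5)
    return K * x, K * y                 # scaling by K compensates the shift-add gain

# e.g. the 3*pi/16 rotation of the Cordic Loeffler DCT uses iterations 1 and 3,
# giving K = cos(arctan(2^-1)) * cos(arctan(2^-3)) = 0.8875, as analysed in Sect. 3.2
print(cordic_rotate(1.0, 0.0, 3 * math.pi / 16, [1, 3]))
```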

3.2 Conventional Cordic-Based DCT Architecture

Equations of the one-dimension, 8 points DCT are:   7  (2i + 1)tπ 1 x(i) cos X(t) = C(t) 2 16 i=0

(6)

√ C(t) =

if t=0 1 otherwise 2 2

(7)

x(i) represents the input data while X(t) represents the 1-D DCT transformed output data. The 1-D DCT transform is represented as follows. ⎛ ⎞ ⎞⎛ ⎞ ⎛ X0 x0 + x7 A1 A1 A1 A1 ⎜ X2 ⎟ 1 ⎜ A2 A3 −A3 −A2 ⎟ ⎜ x1 + x6 ⎟ ⎜ ⎟ ⎟⎜ ⎟ ⎜ (8) ⎝ X4 ⎠ = 2 ⎝ A1 −A1 −A1 A1 ⎠ ⎝ x2 + x5 ⎠ X6 A3 −A2 A2 −A3 x3 + x4 ⎛ ⎞ ⎞⎛ ⎞ ⎛ X1 x0 − x7 A4 A5 A6 A7 ⎜ X3 ⎟ 1 ⎜ A5 −A7 −A4 −A6 ⎟ ⎜ x1 − x6 ⎟ ⎜ ⎟ ⎟⎜ ⎟ ⎜ (9) ⎝ X5 ⎠ = 2 ⎝ A6 −A4 A7 A5 ⎠ ⎝ x2 − x5 ⎠ X7 A7 −A6 A5 −A4 x3 − x4 π  3π   3π  π where A , A , A , A = cos = sin = cos = cos 1 2 3 4 4 8 8 16 , A5 =  3π  π  3π  cos 16 , A6 = sin 16 and A7 = sin 16 . The resulting 1-D DCT equation is given by a rotation matrix spanned into CORDIC iterations. The Cordic algorithm performs the rotation of the angles occurring in the DCT transform. Now, if we analyse the unfolded Cordic blocks, we can note that: – For the Cordic of the angle 3π/8, the rotation iterations are 0, 1 and 4. So the compensation factor is K = cos(θ0 ) × cos(θ1 ) × cos(θ4 ) = 0.6312. This factor can be merged with the term 1/2 × C(t) (Eq. 7) since this cordic block is directly connected to the output. – For the Cordic of the angle π/16, the rotation iterations are 3 and 4. So the compensation factor is K = cos(θ3 )×cos(θ4 ) = 0.9903. In this case, the factor K is close to 1, so it is neglected.


– For the Cordic of the angle 3π/16, the rotation iterations are 1 and 3. So the compensation factor is K = cos(θ1 )×cos(θ3 ) = 0.8875. In this case, the factor K can not be neglected since it is not close to 1 and can not be shifted to the output because of the data dependency. So K is represented by shift and add operators and is connected to the cordic block of 3π/16. For the purposes of notation, all the operators needed to perform the 3π/16 rotation are called cordic block. 3.3

Generic Precision Cordic Based Loeffler DCT Architecture

In [6], authors proposed an algorithm which generates the cordic parameters of any given angle giving the required precision degree. This algorithm has been exploited to determine a new table of parameters of the Cordic based Loeffler DCT architecture which provides more accurate results. Seven architectures have been presented namely P1 to P7 which correspond to the Cordic based Loeffler DCT architecture with a precision degree of respectively 10−1 to 10−7 . The proposed architectures are scaled. In the next section, we propose a multiplierless architecture based on P1, P2 and P3 using the Taylor expansion.

4

The Proposed Accurate CLDCT Architecture

We give in the rest of this section our main contributions: First, we choose to change the rotation angles. Then, we propose to totally remove multiplications by using the Taylor expansion. 4.1

The New Cordic Based Loeffler Architecture

The idea behind changing the rotation angles used in the classic DCT emerges from two remarks: – First, we plan to remove the multiplications by the use of Taylor expansion. It is well known that the Taylor expansion of a function f around the point x = 0 is more accurate as the value is close to 0, so the error incurred in approximating f (which is in o(hn ) converges to 0. In our case and considering the cosinus and sinus functions, we can affirm that the error is smaller the more the angle is lower than 1. This is the case of 3π/16 and π/16 but not for 3π/8. – Second, we remark that 3π/8 = π/2 − π/8. According to the trigonometric identities, we have cos(π/2 − x) = sin(x) and sin(π/2 − x) = cos(x). So we can deduct that cos(3π/8) = sin(π/8) and sin(3π/8) = cos(π/8). The value of the angles π/8, 3π/16 and π/16 are perfectly lower than 1. According to the Eq. 8, we can slightly modify the values B and C in the first matrix. We obtain B = cos(π/8) and C = sin(π/8). Equation 3 becomes Eq. 10 then 11:


  y sin(θ ) −σi cos(θ ) x σi cos(θ ) sin(θ )      x cos(θ ) −σi sin(θ ) −σi xi+1 = σi yi+1 y σi sin(θ ) cos(θ ) 

xi+1 yi+1





=

(10) (11)

Where x =y, y =x and θ = π/8. In summary, if we change the angle 3π/8 by the angle π/8, we only have to switch the inputs and negate one of the outputs as shown in Fig. 1.

Fig. 1. The insertion of the rotation angle π/8 in the DCT graph

4.2

The Taylor Expansion Based DCT

As said previously, the compensation factor K is the product of all the K_i, as shown in Eq. 5. If we replace K_i by its value, we find Eq. 12:

K = \prod_i \cos(\arctan(2^{-i})) \quad (12)

Now, we will not use the Taylor expansion of the function cos(x) but that of the function cos(arctan(x)). We find Eq. 13 for the third order Taylor expansion and Eq. 14 for the fifth order. The expansion of the sine terms is determined by Eq. 15:

\cos(\arctan(x)) = 1 - \frac{x^2}{2} + o(x^3) \quad (13)

\cos(\arctan(x)) = 1 - \frac{x^2}{2} + \frac{3x^4}{8} + o(x^5) \quad (14)

\sin(\arctan(x)) = x \times \cos(\arctan(x)) \quad (15)

Now, we consider the third and the fifth order terms of the Taylor expansion. The results are summarized in Tables 1 and 2, respectively. The parameter table for the different angles is extracted from [6]. Take, for example, the cordic of the angle π/8 with the precision P2: we find that K_P2(π/8) = (1 − 2^−3) × (1 − 2^−9). Regarding the hardware, this compensation factor can be implemented using only shifts and adders, as seen in Fig. 2.
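The short sketch below (an illustration of ours, reusing Eqs. 13–14) compares the exact factor with its third- and fifth-order shift-and-add forms for this π/8 / P2 example:

```python
# Compare the exact compensation factor with its third- and fifth-order
# Taylor (shift-and-add) approximations for the pi/8 cordic at precision P2,
# whose rotation iterations are 1 and 4 (see Table 1).
import math

iterations = [1, 4]

exact, taylor3, taylor5 = 1.0, 1.0, 1.0
for i in iterations:
    exact *= math.cos(math.atan(2.0 ** -i))
    taylor3 *= 1 - 2.0 ** -(2 * i + 1)                                           # Eq. (13)
    taylor5 *= 1 - 2.0 ** -(2 * i + 1) + 2.0 ** -(4 * i + 2) + 2.0 ** -(4 * i + 3)  # Eq. (14)

print(exact, taylor3, taylor5)  # the fifth-order form is the closest to the exact value
```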


Table 1. The third order Taylor expansion of the compensation factor for the cordic of each angle

Angle  | Precision | Iterations | TE third order
π/8    | P1        | 1          | K_P1 = 1 − 2^−3
π/8    | P2        | 1, 4       | K_P1 = 1 − 2^−3; K_P2 = K_P1 × (1 − 2^−9)
π/8    | P3        | 1, 4, 7    | K_P1 = 1 − 2^−3; K_P2 = K_P1 × (1 − 2^−9); K_P3 = K_P2 × (1 − 2^−15)
3π/16  | P1        | 1, 3       | K_1 = 1 − 2^−3; K_P1 = K_1 × (1 − 2^−7)
3π/16  | P2        | 1, 3       | K_1 = 1 − 2^−3; K_P1 = K_1 × (1 − 2^−7)
3π/16  | P3        | 1, 3, 10   | K_1 = 1 − 2^−3; K_P2 = K_1 × (1 − 2^−7); K_P3 = K_P2 × (1 − 2^−21)
π/16   | P1        | 2          | K_P1 = 1 − 2^−5
π/16   | P2        | 2, 4, 6    | K_P1 = 1 − 2^−5; K = K_P1 × (1 − 2^−9); K_P2 = K × (1 − 2^−13)
π/16   | P3        | 2, 4, 6, 9 | K_P1 = 1 − 2^−5; K = K_P1 × (1 − 2^−9); K_P2 = K × (1 − 2^−13); K_P3 = K_P2 × (1 − 2^−19)

According to the required accuracy, some terms can be neglected. In our case, we kept only the powers higher than −16.

Fig. 2. The implementation of the cordic π/8 with the third order Taylor expansion of its compensation factor


Table 2. The fifth order Taylor expansion of the compensation factor for the cordic of each angle

Angle  | Precision | Iterations | TE fifth order
π/8    | P1        | 1          | K_P1 = 1 − 2^−3 + 2^−6 + 2^−7
π/8    | P2        | 1, 4       | K_P1 = 1 − 2^−3 + 2^−6 + 2^−7; K_P2 = K_P1 × (1 − 2^−9 + 2^−18 + 2^−19)
π/8    | P3        | 1, 4, 7    | K_P1 = 1 − 2^−3 + 2^−6 + 2^−7; K_P2 = K_P1 × (1 − 2^−9 + 2^−18 + 2^−19); K_P3 = K_P2 × (1 − 2^−15 + 2^−30 + 2^−31)
3π/16  | P1        | 1, 3       | K_1 = 1 − 2^−3 + 2^−6 + 2^−7; K_P1 = K_1 × (1 − 2^−7 + 2^−14 + 2^−15)
3π/16  | P2        | 1, 3       | K_1 = 1 − 2^−3 + 2^−6 + 2^−7; K_P1 = K_1 × (1 − 2^−7 + 2^−14 + 2^−15)
3π/16  | P3        | 1, 3, 10   | K_1 = 1 − 2^−3 + 2^−6 + 2^−7; K_P2 = K_1 × (1 − 2^−7 + 2^−14 + 2^−15); K_P3 = K_P2 × (1 − 2^−21 + 2^−42 + 2^−43)
π/16   | P1        | 2          | K_P1 = 1 − 2^−5 + 2^−10 + 2^−11
π/16   | P2        | 2, 4, 6    | K_P1 = 1 − 2^−5 + 2^−10 + 2^−11; K = K_P1 × (1 − 2^−9 + 2^−18 + 2^−19); K_P2 = K × (1 − 2^−13 + 2^−26 + 2^−27)
π/16   | P3        | 2, 4, 6, 9 | K_P1 = 1 − 2^−5 + 2^−10 + 2^−11; K = K_P1 × (1 − 2^−9 + 2^−18 + 2^−19); K_P2 = K × (1 − 2^−13 + 2^−26 + 2^−27); K_P3 = K_P2 × (1 − 2^−19 + 2^−38 + 2^−39)

Considering the third- and fifth-order Taylor expansions and the three precisions P1, P2 and P3 extracted from [6], we obtain six different architectures. The computational complexity of these architectures, including the compensation factors, is summarized in Table 3.

5 Experiments

The different architectures have been implemented on a Spartan-6 FPGA using Xilinx System Generator (XSG). The power consumption is measured with the XPower Analyzer at a 100 MHz clock frequency and a 1 V supply voltage. The power consumption of the different proposed architectures is shown in Table 3. To ensure a fair comparison, we also implemented the conventional architectures under the same conditions (Fig. 3). As shown, the power consumption of the proposed architectures is satisfactory. In fact, since we propose here a totally scale-free solution, we observe a decrease in the power consumption of P1TE3 (445 mW) and P2TE3 (480 mW) in comparison with all the classical architectures, including the BinDCT (491 mW), which needs relatively low power. The power consumption of P1TE5, P2TE5 and P3TE3 is lower than that of the Cordic based Loeffler DCT and, obviously, of the Loeffler DCT. Concerning the architecture P3TE5, its power is barely higher than that of the Cordic based Loeffler DCT, but we will see later its contribution in terms of quality.

Fig. 3. The implementation of the cordic π/8 with the fifth order Taylor expansion of its compensation factor

Table 3. The power consumption and the computational complexity of the new architectures in comparison with the conventional ones.

Architectures | Add/Sub | Shifts | Multipliers | Power (mW)
LDCT          | 29      | 0      | 19          | 554
CLDCT         | 38      | 16     | 8           | 521
BinDCT        | 36      | 17     | 8           | 491
P1 [6]        | 34      | 12     | 8           | 484
P1TE3         | 53      | 40     | 0           | 445
P1TE5         | 69      | 56     | 0           | 501
P2 [6]        | 40      | 18     | 8           | 488
P2TE3         | 65      | 52     | 0           | 480
P2TE5         | 81      | 68     | 0           | 517
P3 [6]        | 46      | 24     | 8           | 497
P3TE3         | 73      | 60     | 0           | 495
P3TE5         | 89      | 76     | 0           | 530

The efficiency of our architectures has been proven by comparing them to the other quoted architectures. We made this comparison considering a JPEG2000 compression chain [16] applied to standard test images.


Table 4 summarizes the quality comparison of the proposed DCT architectures with the other conventional DCT architectures. The results consider high-to-low quality compression using the Lena image.

Table 4. PSNR quality for the Lena image in JPEG2000 for different quantization qualities

Quality factor | LDCT  | CLDCT | BinDCT | P1TE3 | P1TE5 | P2TE3 | P2TE5 | P3TE3 | P3TE5
95             | 44.23 | 36.98 | 26.94  | 41.38 | 41.67 | 43.51 | 44.03 | 43.55 | 44.07
90             | 39.72 | 36.02 | 26.85  | 38.52 | 38.65 | 39.45 | 39.65 | 39.46 | 39.67
85             | 37.14 | 35.11 | 26.78  | 36.5  | 36.58 | 36.98 | 37.09 | 36.99 | 37.09
80             | 35.46 | 34.30 | 26.65  | 35    | 35.08 | 35.36 | 35.43 | 35.36 | 35.43
75             | 34.36 | 33.71 | 26.57  | 34.04 | 34.1  | 34.28 | 34.33 | 34.29 | 34.33
70             | 33.61 | 33.18 | 26.48  | 33.33 | 33.38 | 33.54 | 33.59 | 33.55 | 33.59
65             | 32.94 | 32.75 | 26.39  | 32.76 | 32.8  | 32.88 | 32.92 | 32.89 | 32.92
60             | 32.40 | 32.32 | 26.27  | 32.25 | 32.28 | 32.34 | 32.38 | 32.34 | 32.38
55             | 31.92 | 31.89 | 26.25  | 31.8  | 31.82 | 31.87 | 31.9  | 31.87 | 31.9
50             | 31.48 | 31.46 | 26.16  | 31.34 | 31.36 | 31.44 | 31.46 | 31.44 | 31.46
Average        | 35.32 | 33.77 | 26.53  | 34.69 | 34.77 | 35.16 | 35.28 | 35.17 | 35.28

Table 4 clearly shows the enhancement provided by the proposed architectures. Among them, P3TE5 provides the PSNR closest to that of the Loeffler DCT, which is a reference in terms of image quality.

6 Conclusion

In this paper we introduced different high-quality architectures based on a new choice of the rotation angles and on the Taylor expansion of the compensation stage. From the empirical results, we observe a significant enhancement of the PSNR value (reaching 7.09 dB for P3TE5) in comparison with the Cordic based Loeffler architecture, and a substantial decrease in the power consumption (attaining 14.6% for P1TE3). The results obtained make our architectures adequate for high-precision applications. As a perspective, we can combine our results with recent improvements in DCT architectures. We conjecture that this will give architectures with a more efficient quality/consumption trade-off.

References

1. Al-Janabi, S., Al-Shourbaji, I.: A smart and effective method for digital video compression. In: 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), pp. 532–538, December 2016


2. Benati, N., Bahi, H.: Spoken term detection based on acoustic speech segmentation. In: 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), pp. 267–271, December 2016
3. Jeong, H., Kim, J., Cho, W.K.: Low-power multiplierless DCT architecture using image correlation. IEEE Trans. Consum. Electron. 50(1), 262–267 (2004)
4. Sun, C.-C., Ruan, S.-J., Heyne, B., Goetze, J.: Low-power and high quality Cordic-based Loeffler DCT for signal processing. IET Circ. Devices Syst. 1(6), 453–461 (2007)
5. Sun, C.-C., Donner, P., Götze, J.: VLSI implementation of a configurable IP core for quantized discrete cosine and integer transforms. Int. J. Circ. Theory Appl. 40(11), 1107–1126 (2012)
6. Mami, S., Saad, I.B., Lahbib, Y., Hachaichi, Y.: Enhanced configurable DCT Cordic Loeffler architectures for optimal Power-PSNR trade-off. J. Signal Process. Syst. 90(3), 371–393 (2018)
7. El Aakif, M., Belkouch, S., Chabini, N., Hassani, M.M.: Low power and fast DCT architecture using multiplier-less method. In: Faible Tension Faible Consommation (FTFC), pp. 63–66, June 2011
8. Jeske, R., et al.: Low cost and high throughput multiplierless design of a 16 point 1-D DCT of the new HEVC video coding standard. In: Programmable Logic (SPL), Bento Gonçalves, RS, March 2012
9. Dang, P.P., Chau, P.M., Nguyen, T.Q., Tran, T.D.: BinDCT and its efficient VLSI architectures for real-time embedded applications. J. Image Sci. Technol. 49(2), 124–137 (2005)
10. Loeffler, C., Lightenberg, A., Moschytz, G.S.: Practical fast 1-D DCT algorithms with 11 multiplications. In: Proceedings of ICASSP, Glasgow, UK, vol. 2, pp. 988–991, May 1989
11. Hoang, T.-T., Nguyen, H.-T., Nguyen, X.-T., Pham, C.-K., Le, D.-H.: High-performance DCT architecture based on angle recoding CORDIC and scale-free factor. In: The Sixth International Conference on Communications and Electronics (ICCE), July 2016
12. Aggarwal, S., Meher, P.K., Khare, K.: Area-time efficient scaling-free CORDIC using generalized micro-rotation selection. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 20(8), 1542–1546 (2012)
13. Mokhtar, A.S.N., Reaz, M.B.I., Chellappan, K., Mohd Ali, M.A.: Scaling free CORDIC algorithm implementation of sine and cosine function. In: Proceedings of the World Congress on Engineering, WCE 2013, vol. II, July 2013
14. Meher, P.K., Valls, J., Juang, T.-B., Sridharan, K., Maharatna, K.: CORDIC circuits. In: Arithmetic Circuits for DSP Applications. Wiley (2017)
15. Hachaïchi, Y., Lahbib, Y.: An efficient mathematically correct scale free CORDIC, June 2016. https://hal.archives-ouvertes.fr/hal-01327460. (submitted)
16. International Organization for Standardization: ITU-T Recommendation T.81. In: ISO/IEC IS 10918-1, October 2017. http://www.jpeg.org/jpeg/

Signature of Electronic Documents Based on the Recognition of Minutiae Fingerprints Souhaïl Smaoui(&) and Mustapha Sakka Higher Institute of the Technological Studies of Sfax, Sfax, Tunisia [email protected], [email protected]

Abstract. This work presents a new approach to securing electronic documents. This approach has the advantage of integrating several hybrid technologies, such as biometrics based on the recognition of digital fingerprints, PDF417 coding, encoding techniques, and the electronic signature of documents. It uses all of these techniques to reinforce the security of the signature and consequently to guarantee the authentication of the signatory. Authentication is a task requested by several fields to ensure the security and integrity of information. In our approach we chose to use digital fingerprint recognition techniques in order to ensure a high level of security and confidentiality. For this purpose we prepared a database containing a list of digital fingerprints for a set of people. The classification was made with a Multi-Layer Perceptron (MLP) neural network. Keywords: Authentication

 Digital fingerprints  Minutiae  Neural network

1 Introduction

A great variety of documents are created, exchanged or managed every day. The most common example is the secure transfer of documents, also called document signature or digital signature. To present our work we chose the following structure: the next section presents the problematics of this research subject. Sections 3 and 4 are devoted to the state of the art on digital fingerprint recognition; these works are classified into two main approaches, whose drawbacks and advantages are discussed. Section 5 focuses on the main principle and the process of our approach, detailing the various stages that we used during our work. Section 6 is dedicated to the presentation of the experimental results. The last section comprises the conclusions, some reflections on possible openings and an overview of our future work.



2 Problematics

During the exchange of information or documents, certain applications use security standards like cryptography and the coding of information. However, these solutions sometimes leave gaps concerning the authentication of the sender of the document and the confidentiality of the transactions: how can one make sure of the originality of the messages and ensure their transmission without running risks? The difficulty of the solutions suggested for this problem comes from the combination of methods of authentication, coding, encoding and biometrics. It is in this context that we present our document signature solution. This solution rests on a hybrid crypto-biometric approach based on the recognition of digital fingerprints. The recognition of digital fingerprints is a relatively complex task when it is an automatic treatment performed by a machine deprived of any intelligent and human reasoning. This task can fail under new circumstances such as darkness, intense luminosity, variation of position, etc. A change in the fingerprint acquisition conditions generates an enormous modification of the data stored about its owner, which in turn affects the performance of the system. The recognition of digital fingerprints is not obvious for various reasons such as the various orientations of prints (Fig. 1), the variations of positions (Fig. 2), the physical variations (Fig. 3), and a variation of the luminosity that can complicate this task (Fig. 4).

Fig. 1. Various orientations of prints.

Fig. 3. Physical variation.

Fig. 2. The variation of positions.

Fig. 4. Variation of luminosity


To resolve these problems, many techniques have been taken into consideration.

3 Techniques of Recognition of Digital Fingerprints

In the literature, one can count two main families of fingerprint recognition methods: geometrical and global methods [13, 14].

3.1 Global Methods

These methods are founded primarily on pixel values, and the image is treated as a block. Global methods are generally based on a training phase in which techniques like Support Vector Machines (SVM), neural networks, etc. can be used. Although these methods are simple and show a high recognition rate, they can be reproached for the slowness of the training phase [4, 5]. Within the global methods, we can distinguish stochastic methods, parametric methods and non-parametric methods. The stochastic methods are based on sweeping the image from top to bottom. The characteristics appear in a natural order, so they can be modeled in a practical way using a Hidden Markov Model (HMM) [11]. This model initially encounters problems of visual perception and identification of images. The parametric methods make an assumption concerning the analytical form of the searched probability distribution and estimate the distribution parameters from the given data. Non-parametric methods, on the contrary, do not make any assumption on the distribution of the training data [8, 11].

3.2 Geometrical Methods

These methods make it possible to obtain the "signature" of the print. From a pre-processed image of the print, various algorithms make it possible to extract a data structure (or signature) [1]. The selected signature used to characterize the print is based on a sufficient and reliable set of minutiae, approximately 14. It is then possible to identify a print among several million specimens. Generally, each minutia occupies approximately 16 bytes (Fig. 5).

Fig. 5. Extraction and digitalization of the minutiae
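As an illustration only (the actual field layout of such a record is not specified by the authors; the fields and sizes below are assumptions of ours), a 16-byte minutia record could be packed as follows:

```python
# Illustrative 16-byte record for one minutia: coordinates, orientation,
# type and quality. This is a hypothetical layout, not the paper's format.
import struct

MINUTIA_FMT = "<HHfBBxxI"  # x, y, angle (radians), type, quality, padding, reserved
assert struct.calcsize(MINUTIA_FMT) == 16

def pack_minutia(x, y, angle, mtype, quality, reserved=0):
    return struct.pack(MINUTIA_FMT, x, y, angle, mtype, quality, reserved)

record = pack_minutia(120, 87, 1.047, 1, 90)  # type 1 = ridge ending (assumed)
print(len(record))  # 16
```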


During the extraction process, initially 100 minutiae are detected on average, among which approximately 60% correspond to false minutiae that are identified during a specific process. We then extract about forty real minutiae from the print. This number of minutiae is definitely higher than 10, which increases reliability. Moreover, this figure is far from the total of detected minutiae: only the most reliable ones are preserved and the erroneous minutiae, which could have deteriorated the behavior of the system, are eliminated [10]. As a result, this system identifies the owner of a fingerprint with reasonable accuracy and rejects it in case of questionable output. The system presented is divided into three stages [15]: pre-treatment of the fingerprint image, extraction of fingerprint features, and classification for decision [16]. This approach has two drawbacks: physical and position variation.

4 Adopted Approach

Our approach, represented by Fig. 7, starts with the registration of the applicant's signature, as it contains the private and public key pair, acting at first as insurance for the registration of the customer. After authentication, the customer can carry out interventions on the document such as the coding of the document, which is ensured by the integration of a 2D PDF417 barcode [3] in the content of the signed document. The PDF417 code contains the identity and the digital fingerprint of the signing customer (Fig. 6).

Fig. 6. Bar code PDF417

The signature of the document is ensured by the secret key of the customer freed after the phase of decision. It electronically guarantees, after signature, that the document is non-modifiable and that it is signed by only one customer who is the only person in charge. The phase of checking the signature and coding is ensured by the applicant since it has the public key of each customer and consequently it can ensure that such customer is the signatory of the document. Figure 7 represents the process of the formulation of the method.


Fig. 7. Process of signature of documents

For the recognition phase, we chose a non-parametric approach for which there exist only two alternatives: the non-parametric estimation of the density function and the non-parametric estimation of the ranking function. The first approach has been frequently used in the literature. The second includes classification methods based on neural networks [2, 6], induction graphs [7], etc. We wanted to find a good compromise between processing time and relevance of the analysis. This is why we chose a data mining technique based on a neural network. In our field, the most recommended neural network is the PMC (multi-layer perceptron), which enables us to save time for both the training and the classification of the prints. Among the strengths of the PMC is that it is possible to detect the moment when the training algorithm is no longer able to improve, which makes it possible to optimize the training time. For the development of our recognition solution, we proceeded in three stages. The first stage is devoted to the constitution of our corpus and to the preparation of the data for the training phase. The fingerprint image must be processed at a resolution of 500 dpi so that there are at least 10 pixels between the ridges. Equally important, this resolution is necessary for the minutiae extraction stage, since the central point of a digital fingerprint changes considerably from one individual to another. We pass the normalized image through a bank of Gabor filters. Each filter is built by generating a filter image for six angles (0, π/6, π/3, π/2, 2π/3 and 5π/6) (Fig. 8).

Fig. 8. Filter of Gabor
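As an illustration (a sketch of ours, not the authors' code; kernel size, sigma, wavelength and gamma are assumed values, and the file name is hypothetical), such a six-orientation Gabor bank can be built with OpenCV as follows:

```python
# Build a bank of Gabor filters at the six orientations listed above and apply
# them to a grayscale fingerprint image; keep the strongest response per pixel.
import cv2
import numpy as np

thetas = [0, np.pi / 6, np.pi / 3, np.pi / 2, 2 * np.pi / 3, 5 * np.pi / 6]
kernels = [cv2.getGaborKernel(ksize=(21, 21), sigma=4.0, theta=t,
                              lambd=10.0, gamma=0.5, psi=0) for t in thetas]

image = cv2.imread("fingerprint.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file
responses = [cv2.filter2D(image, cv2.CV_32F, k) for k in kernels]
enhanced = np.max(np.stack(responses), axis=0)
```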


We obtained at the end of the treatment the diagram of Fig. 9.

Fig. 9. Extraction and digitalization of the minutiae

The second stage consists in finding a prediction model. We show in this stage the importance of the neural network for such a recognition application. The last stage is about model validation. Thus, the construction of the training base is a significant component of a data mining process. We work on a fingerprint database from the University of Bologna in order to test our print recognition process, which is useful for authentication and consequently for the digital signature of documents. The corpus is made up of 320 fingerprints from 20 people. For each person, 16 prints of size 92*112 were taken with various poses and luminosities. To reduce the data to be processed by the PMC during the training phase, we resized the images using the bi-cubic method. This method has the advantage of using a low-pass filter, which passes the low frequencies and attenuates the high ones before the interpolation in order to decrease aliasing. Reducing the size of the base makes it possible to accelerate the training process and to increase the reliability of the obtained classification. During the implementation, we chose to reduce the size of the data by 40% compared to the initial state. This threshold ensures high performance both in the training phase and in the test phase. Let us note that the reduction threshold was determined after a series of intensive experiments. Our objective is to identify the best model through the use of data mining techniques. We propose to classify the prints with a supervised approach using a PMC-type neural network. The training is carried out on 80% of the print base; the rest of the base is reserved for the evaluation of the classifier. To launch the training process, we prepared the input vectors corresponding to the prints which we aim to learn, as well as the necessary neurons. Thereafter, the network seeks the classification model by propagating the classification results towards the output neurons by means of the hidden neurons, which are one of the strong points of the PMC and are able to support any type of nonlinear function. In our approach we chose the sigmoid activation function, which is the most common in the construction of artificial neural networks.


An example of sigmoid activation function is:

\varphi(v) = \frac{1}{1 + \exp(-v)}

Our learning technique is based on the gradient backpropagation algorithm. We used the TRAINCGP method suggested by Polak and Ribière [9], which is characterized by its optimized algorithm compared to the standard methods. Let us remember that the number of hidden layers plays a significant role in the neural network structure; for this reason, we applied a series of tests in order to determine a threshold, which is articulated around the square root of the number of data per input vector. Additionally, the experiments carried out show a degradation of the classification performance when using a number of hidden layers lower than this threshold, whereas with a number higher than this threshold we noted a slowdown of the training phase without any enhancement of the performance. The final step in our approach is the validation and evaluation of the model, which is described in the next section.
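A minimal sketch of such a supervised MLP set-up (our own illustration with scikit-learn; it does not reproduce the authors' actual implementation, e.g. the TRAINCGP conjugate-gradient training, and the data below are placeholders):

```python
# Train a multi-layer perceptron with a sigmoid (logistic) activation on
# flattened fingerprint feature vectors, using an 80/20 train/test split.
# X and y are placeholders standing in for the real corpus (320 prints, 20 classes).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((320, 64 * 64))      # assumed resized/flattened print images
y = np.repeat(np.arange(20), 16)    # 20 people, 16 prints each

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
mlp = MLPClassifier(hidden_layer_sizes=(64,), activation="logistic", max_iter=500)
mlp.fit(X_tr, y_tr)
print("test accuracy:", mlp.score(X_te, y_te))
```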

5 Experiments

This section shows the various experiments carried out with the logsig transfer function in order to find the most relevant and significant prediction model for our application. Our experimentation is based on a training base of 288 images and a test base of 32 images (Table 1).

Table 1. Results of experiments.

Time of training | 203 s
Time of test     | 7 s
Rate of success  | 96%
Rate of failure  | 4%

It is worth mentioning that our proposed approach reaches 96% correct fingerprint recognition, which is an important rate, especially in comparison with several proposed techniques. In fact, Bana and Kaur [17] proposed a KNN classifier to recognize fingerprints with a rate of 70%; Uma Maheswari and colleagues [18] improved the rate to 94.39% using minutiae features and an SVM classifier; Gowthami [19] reached almost the same rate, 94.32%, by exploring zone-based linear binary patterns with a neural network classifier; and Thai and colleagues [20] proposed to use a standardized fingerprint model in order to improve the recognition rate. Hence, we can conclude that our approach is promising for recognizing fingerprints with a high level of precision.


6 Conclusion and Prospects

Throughout this work we introduced our approach to document signature, which is mainly based on fingerprint recognition using a data mining approach. The results show that our approach is practical and effective. However, the performance of our method can be enhanced by integrating new biometric techniques. We also intend to study other classification methods, namely Bayesian networks, induction graphs, SVM, etc.

References 1. Isobe, Y., et al.: A Proposal for authentication system using a smart card with fingerprints, information processing society of Japan SIG Notes 99-CSEC-4 99(24), 55–60 (1999) 2. Wu, J.K.: Neural networks and simulation methods. Editeur CRC Press dec 1993, ISBN 08247-9181-9 3. AIM Europe Uniform Symbology Specification PDF417, published by AIM Europe, 1994 [Spécification relative à la symbologie PDF 417 uniforme de l’AIM Europe, publiée par l’AIM Europe, 1994] 4. Labati, R.D., Piuri, V., Scotti, F.: A neural-based minutiae pair identification method for touch-less fingerprint images. Computational Intelligence in Biometrics and Identity Management (CIBIM), 2011 IEEE Workshop, 96–102 5. Kristensen, T., Borthen, J., Fyllingsnes, K.: Comparison of neural network based fingerprint classification techniques, 1043–1048 6. Djarah, D.: Application des réseaux de neurones pour la gestion d’un système de perceptron pour un robot mobile d’intérieur. Thèse préparer au laboratoire d’électronique avancée(LEA) Batna 7. Zighed, D.A., Rakotomala, R.: A method for non arborescent induction graphs. Technical report, Laboratory ERIC, University of Lyon2 (1996) 8. Dreyfus, G., Personnaz, L.: Perceptrons, past and present. Organisation des systèmes intelligents (1999) 9. Grippo, L., Lucidi, S.: Convergence conditions, line search algorithms and trust region implementations for the Polak-Ribière conjugate gradient method. Optim. Methods Softw. 20(1), 71–98 (2005) 10. Maio, D., Maltoni, D., Cappelli, R., Wayman, J.L., Jain, A.K.: FVC2004: Third Fingerprint Verification Competition, Proc. International Conference on Biometric Authentication (ICBA), 1–7, Hong Kong, July (2004) 11. Samaria, F.S., Harter, A.C.: Parameterization of a stochastic model for humanface identification. IEEE Comput. Soc. Press, 138–180 (1994) 12. Sakka, M., Smaoui, S.: Signature de documents électroniques basée sur la reconnaissance des empreintes digitales, JRST 2015 Sfax-Tunisie. 2015 Sfax-Tunisie 13. Loussaief, S., Adelkrim, A.: Machine learning framework for image classification, SETIT, 58–61 (2016) 14. Manel, B.S., Karim, S.E., Bouhlel, M.S.: Anomaly detection in hyperspectral images based spatial spectral classification, SETIT, 166–170 (2016) 15. Zouari, J., Hamdi, M.: Enhanced fingerprint fuzzy vault based on distortion invariant minutiae structures, SETIT, 491–495 (2016)


16. Smari, K., salim Bouhlel, M.: Gesture recognition system and finger tracking with kinect: Steps, SETIT, 544–548 (2016) 17. Bana, S., Dr. Kaur, D.: Fingerprint recognition using image segmentation. Int. J. Adv. Eng. Sci. Technol. (IJAEST), 12–23 (2011) 18. Maheswari, S.U., Dr. Chandra, E.: A novel fingerprint recognition using minutia features. Int. J. Adv. Eng. Sci. Technol. (IJEST) 19. Gowthami, A.T., Dr. Mamatha, H.R.: Fingerprint recognition using zone based linear binary patterns. Procedia Computer Science 58, 552–557 (2015) 20. Thai, L.H., Tam, H.N.: Fingerprint recognition using standardized fingerprint model. IJCSI Int. J. Comput. Sci., 11–17 (2010)

Person Re-Identification Using Pose-Driven Body Parts Salwa Baabou1,4(&), Behzad Mirmahboub2, François Bremond3, Mohamed Amine Farah4, and Abdennaceur Kachouri4 1

University of Gabes, National Engineering School of Gabes, Gabes, Tunisia [email protected] 2 Pattern Analysis and Computer Vision (PAVIS), Italian Institute of Technology, Genoa, Italy 3 INRIA Sophia Antipolis Mediterranee, Biot, France 4 University of Sfax, National Engineering School of Sfax Laboratory of Electronics and Information Technology (LETI), Sfax, Tunisia

Abstract. The topic of Person Re-Identification (Re-ID) is currently attracting much interest from researchers due to the various possible applications such as behavior recognition, person tracking and safety purposes at public places. General approach is to extract discriminative color and texture features from images and calculate their distances as a measure of similarity. Most of the work consider whole body to extract descriptors. However, human body maybe occluded or seen from different views that prevent correct matching between persons. We propose in this paper to use a reliable pose estimation algorithm to extract meaningful body parts. Then, we extract descriptors from each part separately using LOcal Maximal Occurrence (LOMO) algorithm and Cross-view Quadratic Discriminant Analysis (XQDA) metric learning algorithm to compute the similarity. A comparison between state-of-the-art Re-ID methods in most commonly used benchmark Re-ID datasets will be also presented in this work. Keywords: Person Re-Identification (Re-ID) LOMO features  XQDA algorithm

 Pose-driven body parts 

1 Introduction

The emergence of person Re-ID is related to the increasing demand for public safety and the widespread deployment of large camera networks. From this perspective, the task of person Re-ID is the process of recognizing and identifying a person across several non-overlapping camera views. The images in the two cameras are called the "probe" and "gallery" sets, in which we look for probe images among the gallery images. It has important applications in surveillance systems and can reduce human labor and errors of human matching. However, the major challenge of person Re-ID is how to make a correct match between two images of the same person under intensive appearance changes such as lighting, pose and viewpoint changes. The problem of localizing keypoints or parts of the human body is known as human pose estimation, which consists in finding or extracting body


parts of individuals. However, this task presents a set of challenges of its own: (i) an image may contain one or more persons that can appear at any time or position; (ii) the interaction between persons may lead to complex interference, which makes the association of parts difficult; (iii) achieving real-time performance is a challenge due to the runtime complexity, which increases with the number of persons present in the image or scene, i.e. the more people there are, the greater the computational cost. In the literature, there are many approaches that focus on extracting the body parts of individuals [1–3]. The main contribution of this paper is extracting body parts from iLIDS-VID [10], PRID-2011 [11], MARS [12] and our own dataset called CHU-Nice using OpenPose [6], a pose estimation algorithm that detects 15 body joints. From these body parts, we extract LOMO features and then compute the similarity using the XQDA algorithm [4]. Then, we compare our work to some recent state-of-the-art Re-ID approaches. Our paper is organized as follows: Section II is the core of the paper: it presents body parts extraction; we extract the features from those body parts using the LOMO method and then compute the similarity between those descriptors using the XQDA algorithm. In Section III, we present some commonly used Re-ID datasets that we use to evaluate and compare our approach to some recent Re-ID approaches. Finally, we draw the conclusion.

2 Related Work Person Re-ID is considered as a retrieval problem that aims to identify and recognize an unknown person (probe) among a set of unknown persons (gallery) between nonoverlapped camera views. There are two principal categories of person Re-ID approaches: i) methods based on modeling the signature using the extraction of features from different human body parts, or ii) methods based on matching function by learning a model that minimizes intra-class variance and maximizes inter-class variance of signatures. Moreover, authors in [16] proposed a division of the body into three regions which are the head, the torso and the legs. Besides, in [15], authors presented a pre-learned model to detect the disposition of different body parts. Khan et al. [14] introduced a three stripes division of the body (head, torso and legs) with the size of 16, 29 and 55 respectively. Another division method [7] presented six parts of the human body which are the head, upper and lower legs, upper and lower torso and feet. These cited approaches consider that the different body parts are informative to learn a robust appearance signature model of persons.

3 Proposed Method

Figure 1 illustrates our proposed framework, which consists of five steps. We begin by detecting the body joints of persons using a pose estimation algorithm. Then, we extract the body parts. From these, we extract LOMO features and we evaluate the similarity by computing the distances between the extracted descriptors using the XQDA metric learning algorithm.

Fig. 1. An overview of our proposed framework

3.1 Body Parts Extraction

OpenPose [5] is a pose estimation algorithm that detects 15 body joints, as shown in Fig. 2(a). An example of a pose estimation result on the MARS dataset [12] is shown in Fig. 2(b). We used the joint positions to define 12 body parts, as shown in Fig. 2(c). Our idea is to extract an image descriptor from each part and calculate their distances separately. The distance between two images can then be computed as a weighted average of all distances between body parts (Fig. 2).

Fig. 2. Human body joints and parts (a) Body joints that are detected with pose estimation algorithm (b) An example image from MARS dataset with estimated pose (c) Different body parts that we defined for feature extraction.

LOMO [4] is a descriptor for person Re-ID that divides each image into horizontal bands and finds the maximal bins of color and texture histograms in each stripe. We modified this code to use it on body parts. After extracting the LOMO features, the next step is to compare the probe feature vector x_i with the gallery feature vector x_j, find their similarity and calculate their distance in order to find a correct match between gallery and probe images. Different metric learning methods have been proposed in the literature, but in our work we use the XQDA metric learning algorithm, as sketched below.
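A minimal sketch of the part-based matching idea (our own illustration; the per-part weights and the generic part_dist function stand in for the LOMO + XQDA pipeline described in the next subsections):

```python
# Combine per-part distances into a single probe/gallery distance by a
# weighted average; parts occluded or missed by the pose estimator get skipped.
import numpy as np

def image_distance(probe_parts, gallery_parts, part_dist, weights):
    """probe_parts/gallery_parts: dict part_name -> feature vector (or None if missing)."""
    num, den = 0.0, 0.0
    for name, w in weights.items():
        p, g = probe_parts.get(name), gallery_parts.get(name)
        if p is None or g is None:
            continue  # skip undetected parts
        num += w * part_dist(p, g)
        den += w
    return num / den if den > 0 else np.inf

# Example with a plain Euclidean part distance as a placeholder for XQDA.
euclid = lambda a, b: float(np.linalg.norm(a - b))
```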

3.2 LOMO Feature Extraction

LOcal Maximal Occurrence (LOMO) [4] is a feature extraction method that aims to compensate illumination variations and viewpoint changes between two cameras. In fact, by applying Retinex algorithm [18], it pre-processes person images in order to provide an image that is efficient to human observation of the scene especially in shadowed regions. Figure 3(a) shows an example of original and processed images of the same person across two cameras. Person Re-ID is easier when using Retinex images instead of original images. After adjusting the illumination, HSV color histogram and Scale Invariant Local Ternary Pattern (SILTP) [19] texture descriptor are extracted from images.

Fig. 3. An illustration of the LOMO feature extraction method [14]

Figure 3(b) shows the LOMO scheme to address viewpoint changes between two cameras. A sliding window of size 10 × 10, with a step of 5 pixels, locates local patches in 128 × 48 images. All sub-windows at the same horizontal position are checked and the maximum values among all corresponding bins are selected to produce only one histogram for each row. The above feature extraction procedure is repeated for two additional scales by downsampling the original image using 2 × 2 local average pooling operations (producing 64 × 24 and 32 × 12 image sizes) to consider multi-scale information. All the computed local maximal occurrences are concatenated to get the final descriptor, which has (8 × 8 × 8 color bins + 3^4 × 2 SILTP bins) × (24 + 11 + 5 horizontal groups) = 26,960 dimensions.

3.3 XQDA Algorithm

Usually a low dimensional space is preferred for classification especially because of the large dimension of original features.


The authors in [4] proposed the Cross-view Quadratic Discriminant Analysis (XQDA) algorithm, which is an extension of the KISSME method [17]. The idea is to learn a lower dimensional subspace W onto which the original features are mapped, together with a distance function that measures the similarity in that subspace across different cameras. For this purpose, the distance function is modified as:

d_W(x_i, x_j) = (x_i - x_j)^T W \left( \Sigma_I'^{-1} - \Sigma_E'^{-1} \right) W^T (x_i - x_j) \quad (1)

where \Sigma_I' = W^T \Sigma_I W and \Sigma_E' = W^T \Sigma_E W. Therefore, a kernel matrix M_W = W\left( \Sigma_I'^{-1} - \Sigma_E'^{-1} \right) W^T is to be learned. On the other hand, the goal is to find a projection direction w that increases the extra-personal variance \sigma_E(w) and decreases the intra-personal variance \sigma_I(w). Since \sigma_E(w) = w^T \Sigma_E w and \sigma_I(w) = w^T \Sigma_I w, maximizing the objective \sigma_E(w)/\sigma_I(w) amounts to maximizing the Generalized Rayleigh Quotient:

J(w) = \frac{w^T \Sigma_E w}{w^T \Sigma_I w} \quad (2)

Thus, the largest eigenvalue of \Sigma_I^{-1}\Sigma_E is the maximum value of J(w), and the solution corresponds to the associated eigenvector w_1.
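The sketch below (our own simplified illustration, not the authors' code) shows how such a projection and distance can be obtained from intra- and extra-personal covariance matrices with a generalized eigenvalue solver:

```python
# Given intra-personal (Sigma_I) and extra-personal (Sigma_E) covariance
# matrices, keep the eigenvectors of the generalized problem
# Sigma_E w = lambda * Sigma_I w with the largest eigenvalues as W,
# then build the kernel matrix used in Eq. (1).
import numpy as np
from scipy.linalg import eigh

def learn_xqda_like(sigma_I, sigma_E, dim):
    eigvals, eigvecs = eigh(sigma_E, sigma_I)        # generalized eigenproblem
    order = np.argsort(eigvals)[::-1][:dim]          # largest eigenvalues first
    W = eigvecs[:, order]                            # d x dim projection
    sI = W.T @ sigma_I @ W
    sE = W.T @ sigma_E @ W
    M = W @ (np.linalg.inv(sI) - np.linalg.inv(sE)) @ W.T
    return W, M

def distance(xi, xj, M):
    d = xi - xj
    return float(d @ M @ d)
```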

4 Datasets and Performance Evaluation

4.1 Datasets

The commonly used datasets for person Re-ID are summarized in Table 1 [6]. We evaluated our work using four challenging datasets: PRID-2011, iLIDS-VID, MARS and our own dataset CHU-Nice.

Table 1. Summary of some widely used datasets from image- and video-based person Re-ID [6]

Datasets       | #ID   | #Image | #Distractors | #Camera
ViPER          | 632   | 1,264  | 0            | 2
iLIDS          | 119   | 476    | 0            | 2
GRID           | 1025  | 1,275  | 775          | 8
CAVIAR         | 72    | 610    | 22           | 2
PRID2011       | 934   | 1,134  | 732          | 2
CUHK01         | 971   | 3,884  | 0            | 2
CUHK02         | 1,816 | 7,264  | 0            | 10 (5 pairs)
CUHK03         | 1,467 | 13,164 | 0            | 10 (5 pairs)
RAiD           | 43    | 1,264  | 0            | 4
PRID450S       | 450   | 900    | 0            | 2
Market-1501    | 1,501 | 32,668 | 0            | 6
ETHZ           | 148   | 148    | 0            | 1
3DPES          | 192   | 1,000  | 0            | 8
iLIDS-VID      | 300   | 600    | 0            | 2
MARS           | 1,261 | 20,715 | 0            | 6
DukeMTMC-reID  | 1,812 | 36,441 | 408          | 8
DukeMTMC4ReID  | 1,852 | 46,261 | 439          | 8

• PRID-2011 [11] is a multi-shot dataset captured by two cameras in an outdoor environment. Camera A captures 385 persons and camera B captures 749 persons. Only the first 200 persons are common to the two cameras. Each person in each camera has 5–675 consecutive frames.
• iLIDS-VID [10] consists of 300 person identities, and each identity has 2 image sequences, totaling 600 sequences whose lengths vary from 23 to 192 frames. Both the testing and training sets have 150 identities.
• MARS [12] (Motion Analysis and Re-identification Set) is an extended version of the Market-1501 dataset. It contains 1261 persons with about 1.19 million images and 3248 distractors.
• CHU-Nice is a dataset collected from the hospital of Nice (CHU), France. Most of the recruited persons were elderly, aged 65 and above. It contains 615 videos with 149,365 frames. It is also an RGB-D dataset, i.e. it provides RGB + depth images.

4.2 Performance Evaluation

The Cumulative Matching Characteristic (CMC) curve is a common metric for evaluating the performance of person Re-Identification algorithms. It gives the percentage of correct matches that are located below a specific rank. However, when multiple ground truths exist in the gallery, and inspired by the assumption that a perfect Re-ID system should be able to return all true matches to the user, the mean Average Precision (mAP) has been proposed for evaluation. The latter indicates whether most of the matched gallery images are ranked high in the Re-Identification output or not. In the case of the Market-1501 dataset, mAP and CMC are used together for evaluation. In our case, we also used CMC and mAP together as evaluation metrics for our experiments. In Table 2, we compare our proposed method ([5] + LOMO + XQDA [4]) with some other methods in the context of video-based datasets (iLIDS-VID, PRID-2011, MARS and CHU-Nice), as we are proposing our new multi-shot Re-ID dataset CHU-Nice. Three descriptors are compared, i.e. BoW [7], HOG3D [8] and LOMO, with the XQDA metric learning algorithm, which is evaluated.
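For reference, a rank-k CMC curve can be computed from a probe/gallery distance matrix as in the following sketch (our own illustration, assuming a single ground-truth gallery match per probe):

```python
# Compute CMC rank-k accuracies from a probe x gallery distance matrix.
# dist[i, j] is the distance between probe i and gallery j; gt[i] is the
# index of the correct gallery match of probe i.
import numpy as np

def cmc(dist, gt, max_rank=20):
    ranks = np.zeros(max_rank)
    order = np.argsort(dist, axis=1)               # gallery sorted by distance
    for i, correct in enumerate(gt):
        position = int(np.where(order[i] == correct)[0][0])
        if position < max_rank:
            ranks[position] += 1
    return np.cumsum(ranks) / len(gt)              # CMC curve; index 0 = rank-1

# Example: cmc(dist, gt)[0] gives the rank-1 accuracy reported in Table 2.
```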


Table 2. Rank-1 accuracy (%) in comparison with some approaches on the four databases. Average pooling is used for iLIDS-VID and max-pooling for the PRID-2011, MARS and CHU-Nice datasets. Best results are highlighted in bold.

Methods                      | iLIDS-VID | PRID-2011 | MARS | CHU-Nice
BoW + XQDA [5, 8]            | 14.0      | 31.8      | 30.6 | –
HOG3D + XQDA [5, 9]          | 16.1      | 21.7      | 2.6  | –
LOMO + XQDA [14]             | 53.0      | –         | 30.7 | 36.2
Ours ([6] + LOMO + XQDA [5]) | 54.8      | –         | 32.7 | 40.6

From the above results, we note that our proposed approach achieves the best Rank-1 accuracy on the evaluated datasets (for example, 54.8% on the iLIDS-VID dataset). However, we believe that research on both image- and video-based person Re-ID still has to improve in the future, especially with the emergence of large-scale datasets and the great success of Convolutional Neural Networks (CNN) in computer vision.

5 Conclusion

In this paper we proposed to use a reliable pose estimation algorithm to extract meaningful body parts, to extract LOMO descriptors from each part separately, and then to compute the distances between those descriptors using the XQDA metric learning algorithm as a measure of similarity. Preliminary experiments show some potential of using pose estimation for Re-ID, but not yet the accuracy of a global signature. One shortcoming of our work may be that we relied on the LOMO descriptor, which is essentially designed for the whole image. A suitable descriptor, such as deep features, should be designed for body parts. With a proper descriptor, part-based Re-Identification is promising to cope with the problem of pose and viewpoint variations. This work can also be extended to detect mid-level features or attributes (such as gender, long hair, jeans, t-shirt, etc.) that are more reliable than low-level descriptors (such as gradients and histograms).

References 1. Bulat, A., Tzimiropoulos, G.: Human pose estimation via convolutional part heatmap regression. In ECCV (2016) 2. Ramakrishna, V., Munoz, D., Hebert, M., Bagnell, J.A., Sheikh, Y.: Pose machines: articulated pose estimation via inference machines. In ECCV (2014) 3. Wei, S.-E., Ramakrishna, V., Kanade, T., Sheikh, Y.: Convolutional pose machines. In CVPR (2016) 4. Liao, S., Hu, Y., Zhu, X., Li, S.Z.: Person re-identification by local maximal occurrence representation and metric learning. In: CVPR (2015) 5. Cao, Z., Simon, T., Wei, S.-E., Sheikh, Y.: Realtime multi-person 2D pose estimation using part affinity fields. In “CVPR” (2017)


6. Gou, M.: Person re-identification datasets (2017). http://robustsystems.coe.neu.edu/sites/ robustsystems.coe.neu.edu/files/systems/projectpages/reiddataset.html 7. Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., Tian, Q.: Scalable person reidentification: a benchmark. In: CVPR (2015) 8. Klaser, A., Marszalek, M., Schmid, C.: A spatio-temporal descriptor based on 3dgradients. In: BMVC (2008) 9. Fendri, E., Frikha, M., Hammami, M.: Adaptive person re-identification based on visible salient body parts in large camera network. Comput. J. 60(11), 1590–1608 (2017) 10. Wang, T., Gong, S., Zhu, X., Wang, S.: Person re-identification by videoranking. In: Computer VisionECCV 2014, pp. 688703. Springer (2014) 11. Hirzer, M., Beleznai, C., Roth, P.M., Bischof, H.: Person re-identification by descriptive and discriminative classification. In: Image Analysis, pp. 91102 (2011) 12. Zheng, L., Bie, Z., Sun, Y., Wang, J., Su, C., Wang, S., Tian, Q.: Mars: a video benchmark for large-scale person re-identification. In: European Conference on Computer Vision, ECCV, pp. 868–884, Springer (2016) 13. Cheng, D.S., Cristani, M.: Person re-identification by articulated appearance matching. In: Person Re-Identification, pp. 139–160. Springer (2014) 14. Khan, A., Zhang, J., Wang, Y.: Appearance-based re-identification of people in video. 2010 Int. Conf. Digital Image Computing: Techniques and Applications (DICTA), pp. 357– 362. IEEE (2010) 15. Jaouedi, N., Boujnah, N., Htiwich, O., Bouhlel, M.S.: Human action recognition to human behavior analysis. In: 2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), pp. 263–266. IEEE (2016) 16. Das, A., Chakraborty, A., Roy-Chowdhury, A.K.: Consistent re-identification in a camera network. European Conf. Comput. Vision, 330–345. Springer (2014) 17. Koestinger, M., Hirzer, M., Wohlhart, P., Roth, P.M., Bischof, H.: Largescale metric learning from equivalence constraints. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2288–2295. IEEE (2012) 18. Jobson, D.J., Rahman, Z.-U., Woodell, G.A.: A multiscale retinex for bridging the gap between color images and the human observation of scenes. IEEE Trans. Image Process. 6(7), 965–976 (1997) 19. Liao, S., Zhao, G., Kellokumpu, V., Pietikinen, M., Li, S.Z.: Modeling pixel process with scale invariant local patterns for background subtraction in complex scenes. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1301–1306. IEEE (2010)

High Securing Cryptography System for Digital Image Transmission Mohamed Gafsi1(&), Sondes Ajili1, Mohamed Ali Hajjaji1,2, Jihene Malek1,3, and Abdellatif Mtibaa1,4 1

Electronics and Micro-Electronics Laboratory, University of Monastir, Monastir, Tunisia [email protected] 2 Higher Institute of Applied Sciences and Technology of Kasserine, University of Kairouan, Kairouan, Tunisia 3 Higher Institute of Applied Sciences and Technology of Sousse, University of Sousse, Sousse, Tunisia 4 National Engineering School of Monastir, University of Monastir, Monastir, Tunisia

Abstract. In this paper, we propose a cryptography system for digital image high securing. Our method is asymmetric utilizing RSA algorithm, which requires a public key for encryption and a private key for decryption. However, the image is encrypted with a combination between the AES-256 CTR mode algorithm and the SHA-2 function. Our algorithm is evaluated by several tools and tests mainly selected from the image cryptography community using many types of standard non compressed images. The experimental and analytical results demonstrate that our encryption scheme provides a high robustness and security. It can resist the most known cryptanalysis attacks. Very good results are obtained, which allows us to confirm the high performance and efficiency of our algorithm for image high protection, which can be used in several domains like military and community privacy. Keywords: RSA

 AES  CTR encryption mode  SHA-2

1 Introduction

Digital communication networks have nowadays attained formidable technological progress. This communication allows data transfer, exchange and storage, as well as easy and fast exchanges for many services and applications around the world. Nevertheless, free access to this communication by anyone has increased spying, hacking, falsification, and the illegal copying and use of digital multimedia documents such as images. As a result, the protection of image content has lately become an essential issue. In the literature, many approaches have been proposed to ensure image integrity, confidentiality, authentication and copyright by processing content that can be exchanged securely through public digital networks [1–3]. These services can be provided by cryptography. Cryptography, in short, is the art of writing in a coded language. The basic characteristics of cryptography are to make


data incomprehensible to others who have no right to read, take, use or copy it. This technique has proved efficient for data protection. In this context, our study aims to present a secure cryptography system for image protection. Our method is based on Rivest-Shamir-Adleman (RSA), the Advanced Encryption Standard (AES), and a Secure Hash Algorithm (SHA-2). Our main objective consists in enhancing the protection and robustness of images against cryptanalysis attacks, while respecting the running time for real-time image processing applications. This study is organized as follows: in Sect. 2, a review of related work on recent encryption algorithms is given. In Sect. 3, a preliminary study of AES, RSA, SHA-2, and the counter encryption mode (CTR) is presented. The proposed encryption algorithm is given in Sect. 4. The evaluation and analysis of the proposed algorithm are given in Sect. 5. Section 6 concludes the work.

2 Related Work In this section we recall diverse reviews of ciphering algorithms designed for digital images protection. In [4] Yong Zhang et al. proposed an image encryption system based on AES-128. Thus, the plain image was decomposed into block with size 128-bit, so, the first block was permuted by an initial vector. However, the blocks were encrypted sequentially using the AES-128 in CBC mode (Cipher Block Chaining). In [5] Unal Cavusoglu et al. suggested an image encryption algorithm using a chaos system based S-box. However, a chaos system was used to generate three sequences of a random numbers, then, an S-box was created. As a consequence, the image was encrypted using a XOR operation and a Sub-byte function. In [6] Shelza Suri et al. put forward an encryption approach to encrypt two images based on chaos and AES. There by, a first image was encrypted utilizing the AES and a second image was encrypted utilizing chaotic system. As a result, the obtained images were combined using Cramer’s rule. In [7] Akram Belazi et al. suggested an encryption system for image protection. Firstly, DWT was applied to the original image in order to acquire the approximation and detail sub-bands. Then, the approximation coefficients were encrypted using a block permutation based on chaotic tent map. Next, an S-box substitution method based on chaotic system and linear fractional transform is used to substitute the permuted band. Finally, the IDWT was applied to construct the ciphered image. In [8] the authors proposed an encryption algorithm for color image protection. It was based on elliptic curve and AES. Thus, a random numbers are generated using an elliptic curve, so, these numbers are used for generating three maskers to encrypt the three components red, blue and green of the image.

3 Preliminary Study

Our cryptography system uses AES, RSA, SHA-2 and the CTR encryption mode to implement an encryption algorithm for images.

3.1 AES

AES is one of the best known encryption algorithms for data protection. Invented in 1998 and approved in 2000 by NIST, AES has been widely deployed due to its high performance [9, 10]. In fact, AES provides high security and is fast and easy to implement. Technically, AES is a symmetric block encryption algorithm. Keys comprise 128, 192 or 256 bits, with 10, 12 and 14 encryption rounds, respectively. In particular, AES-256 proceeds with the plain data decomposed into blocks of 256 bits and a key of 256 bits as well. Each block undergoes a sequence of four transformations repeated fourteen times in order to acquire the cipher block. A detailed study of AES can be found in [11].

3.2 RSA

RSA is one of the most practical cryptosystems for secure data exchange. The algorithm was patented by the Massachusetts Institute of Technology (MIT) in 1983 and has been widely used since then [12]. RSA provides a high level of security, wide portability and ease of use. Technically, RSA is an asymmetric encryption/decryption algorithm. It uses a pair of keys consisting of a public key Ka to encrypt and a private key Kb to decrypt data. RSA uses two prime numbers, p and q, of N-bit length to generate both keys. The encryption is performed only with the public key using the following equation:

C_i = M_i^{\,e} \bmod n \quad (1)

where M_i is the message, C_i is the corresponding cipher message, and (n, e) is the public key of the destination. Data decryption is performed only with the private key using this equation:

M_i = C_i^{\,d} \bmod n \quad (2)

where M_i is the decrypted message, C_i is the cipher message, and (n, d) is the private key of the destination. The main advantage of this algorithm is that only one key, the public key, needs to be shared for data encryption by anyone in the RSA community. A detailed study of RSA can be found in [13].
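A toy numerical sketch of Eqs. (1)–(2) (illustrative only, with deliberately tiny primes and no padding; real deployments use large primes and proper padding schemes):

```python
# Toy RSA key generation, encryption and decryption following Eqs. (1)-(2).
p, q = 61, 53
n = p * q                      # 3233
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent, coprime with phi
d = pow(e, -1, phi)            # private exponent (modular inverse), 2753

message = 65                   # e.g. one byte of a key to be wrapped
cipher = pow(message, e, n)    # Eq. (1): C = M^e mod n
recovered = pow(cipher, d, n)  # Eq. (2): M = C^d mod n
print(cipher, recovered == message)  # 2790 True
```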

3.3 SHA-2

Hash functions take data D of finite, arbitrary length as input and produce an output data digest Dd of a fixed length of bits.

Table 1. SHA-2 functional characteristics.

SHA-2   | Max input message length | Data digest size | Block size processing | Digest round number
SHA-256 | 2^64                     | 256              | 512                   | 64
SHA-384 | 2^64                     | 384              | 1024                  | 80
SHA-512 | 2^128                    | 512              | 1024                  | 80


In cryptography, these functions are used for several services such as data integrity and authentication, password protection, pseudo-random number generation, digital signature and others. In particular, the SHA-2 family is widely used due to its high performance. Technically, these functions process the input data block by block over multiple rounds to finally generate a data digest. Table 1 shows the functional characteristics of the SHA-2 family; their description can be found in [14]. These functions enable data integrity, in the sense that a tiny change in the input data, even of one bit, will cause a very significant change in the output. As a consequence, each piece of data has its own digest.

3.4 CTR Encryption Mode

Counter-based encryption (CTR) is an encryption mode based on a counter function. Technically, the value of a counter function is encrypted by an encryption algorithm and the result is added by a bitwise XOR operation to the original data block, producing an encrypted data block of the same size. The encryption architecture is detailed in Fig. 1. The counter is a non-redundant function: a sequence used for encryption will not be used again for C_N encryptions. Assuming that the counter function produces, each time, an N-bit count value, the total number of count values is:

C_N = 2^N \quad (3)

Thus, by encrypting these values, we obtain C_N distinct encryption keys. This mode has various advantages: it is very fast, it has no error propagation, the counter values can be pre-computed, and it allows random access to any data block for both encryption and decryption. A detailed study of the CTR encryption mode can be found in [15].

Fig. 1. General architecture of CTR encryption mode.

4 Proposed Cryptography System

In this section, we describe our cryptography system for digital image protection. The algorithm is an asymmetric encryption technique, so it involves a public key for encryption and a private key for decryption. As a consequence, each place disposes of an encryption and a decryption system. The first system is used for image encryption processing, while the second system is used for plain image extraction.

4.1 Encryption System

The encryption system is used for image encryption processing. It requires a plain image and the public key of the destination. The system performs a set of organized mechanisms in order to acquire the ciphered image and the ciphered key. The general architecture is presented in Fig. 2. Hence, this mechanism enables encrypting the image using a combination between an SHA-2 algorithm and AES-256 CTR mode.

Fig. 2. General architecture of encryption system.

AES requires an initial key Ki for key expansion; Ki is generated by SHA-256. The image is decomposed into blocks of 32 bytes and encrypted in CTR mode: a counter function generates 256-bit count values, which are then encrypted by the AES-256 mechanism. The initial key Ki of AES is itself encrypted using the RSA algorithm, which provides the asymmetric part of our scheme. RSA produces a key Ka for encryption and a key Kb for decryption. The key Ka of the destination is used at the sender side to encrypt the initial key with Eq. (1). At the destination, the private key Kb is used to decrypt the AES initial key with Eq. (2) and finally to decrypt the received image.
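A minimal sketch of the key flow of Fig. 2 is given below. It mirrors the described structure (SHA-256-derived key Ki, CTR-style block encryption, RSA-wrapped key) rather than the authors' exact implementation: the keystream generator is a hash-based stand-in for AES-256, and the seed value is a placeholder.

```python
import hashlib

def encrypt_image(image_bytes, rsa_public_key, seed=b"session-seed"):
    """Sketch of the key flow of Fig. 2 (not the authors' exact implementation).

    1. Derive the AES initial key Ki with SHA-256.
    2. Encrypt the image block by block in CTR fashion.
    3. Protect Ki with the destination public key Ka = (n, e), as in Eq. (1);
       n must be large enough (e.g. a 2048-bit modulus) to hold the 256-bit key.
    """
    n, e = rsa_public_key
    ki = hashlib.sha256(seed).digest()                 # 256-bit initial key Ki

    # Stand-in keystream: SHA-256(Ki || counter). In the paper each counter
    # block is encrypted with AES-256; any block cipher keyed by Ki fits here.
    def keystream(counter):
        return hashlib.sha256(ki + counter.to_bytes(32, "big")).digest()

    blocks = [image_bytes[i:i + 32] for i in range(0, len(image_bytes), 32)]
    cipher_image = b"".join(
        bytes(k ^ b for k, b in zip(keystream(c), block))
        for c, block in enumerate(blocks)
    )
    cipher_key = pow(int.from_bytes(ki, "big"), e, n)  # RSA-wrapped Ki
    return cipher_image, cipher_key
```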

4.2 Decryption System

This system is used to recover the plain image. In practice, it applies the inverse of the encryption algorithm; the general architecture is depicted in Fig. 3. The system starts by decrypting the encrypted key with RSA, which recovers the plain AES key. Next, the system performs AES-256 CTR decryption to extract the plain image.


Fig. 3. General architecture of decryption system.

5 Evaluation and Security Analysis

To validate our proposed encryption algorithm, we evaluate it with a set of selected tools on several standard images that are widely used in the image cryptography community. This section covers statistical analysis, key analysis, and algorithm speed.

5.1 Statistical Analysis

For high security, the plain image and its corresponding encrypted image must have little or no statistical similarity. Statistical analysis can be performed using the image entropy (E), the normalized correlation (NC), the peak signal-to-noise ratio (PSNR), and the correlation coefficient (ρ), defined respectively as follows:

E(I) = -\sum_{i=1}^{N} P(I_i)\,\log_2 P(I_i)    (4)

NC = \frac{\sum_{i,j}^{m,n}(x_{ij}-\bar{x})(y_{ij}-\bar{y})}{\sqrt{\sum_{i,j}^{m,n}(x_{ij}-\bar{x})^2\,\sum_{i,j}^{m,n}(y_{ij}-\bar{y})^2}}    (5)

where \bar{x} and \bar{y} are the means of the variables x and y, respectively.

PSNR = 10\,\log_{10}\left(\frac{L^2}{MSE}\right)    (6)

where L is the peak gray-scale value and MSE is the mean square error.

\rho(x, y) = \frac{Cov(x, y)}{\sigma_x\,\sigma_y}    (7)

where Cov(x, y) is the covariance and \sigma_x, \sigma_y are the standard deviations of the variables x and y, respectively.
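For reference, the metrics of Eqs. (4), (6) and (7) can be computed with NumPy as sketched below for 8-bit gray-scale images; Eq. (5) amounts to the same correlation computed over the two whole images.

```python
import numpy as np

def entropy(img):
    """Shannon entropy of an 8-bit gray-scale image, Eq. (4)."""
    hist = np.bincount(img.ravel(), minlength=256) / img.size
    hist = hist[hist > 0]
    return -np.sum(hist * np.log2(hist))

def psnr(img1, img2, peak=255.0):
    """Peak signal-to-noise ratio between two images, Eq. (6)."""
    mse = np.mean((img1.astype(float) - img2.astype(float)) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

def corr_coef(x, y):
    """Correlation coefficient of two pixel sequences, Eq. (7)."""
    x, y = x.ravel().astype(float), y.ravel().astype(float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / (x.std() * y.std())
```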


We evaluate the algorithm on all selected images. For each image, we produce the corresponding ciphered image with the encryption system. We then analyze it by computing the PSNR, NC, and entropy values (Table 2) and by displaying its histogram before and after encryption (Table 3).

Table 2. PSNR, NC, and E values of the encrypted images.

Image     PSNR    NC        E
Lena      6.143   0.0027    7.99935
Peppers   6.762   −0.0019   7.99929
Baboon    6.109   −0.0134   7.99931

As a result, it is visually clear that there is no relationship between the original image and the corresponding encrypted image. Moreover, the latter has a uniform gray-scale distribution, which demonstrates the effectiveness of our cipher algorithm in equalizing the gray levels of the plain image. Furthermore, the entropy value of each encrypted image is close to 8, i.e., the probability of accidental information leakage is nearly zero. In addition, the PSNR and NC values are very low, i.e., the images are greatly different.

Table 3. Histogram of the original image and its corresponding encrypted image.

(Columns of Table 3: original image, its histogram, ciphered image, its histogram.)

We also analyze the images using the correlation coefficient of adjacent pixels. To this end, we randomly select 2000 pairs of adjacent pixels from the original image and its corresponding ciphered image. Then, we calculate the correlation coefficient in


the horizontal, vertical and diagonal directions. Table 4 shows the distributions of the selected adjacent pixels of the original and ciphered Lena images, respectively. The simulation results are given in Table 5.

Table 4. Distribution of 2000 randomly selected pairs of adjacent pixels for the Lena image.

(Rows of Table 4: plain Lena, cipher Lena; columns: horizontal, vertical, diagonal adjacent-pixel distributions.)

Regarding these results, our cipher design eliminates the correlation of adjacent pixels in the plain image and produces a cipher image with virtually no correlation. Table 6 gives a comparative study of the PSNR, entropy, and correlation coefficient with the recent works cited in [4, 7, 16].

Table 5. Correlation coefficients of each image and its corresponding ciphered image.

Image     Status   H         V         D
Lena      Plain    0.9791    0.9870    0.9501
          Cipher   −0.1242   0.0027    0.0022
Peppers   Plain    0.9861    0.9883    0.9830
          Cipher   0.00243   −0.0475   0.0025
Baboon    Plain    0.9145    0.9026    0.9507
          Cipher   0.0079    −0.0150   0.0019

Table 6. Comparative study of PSNR, entropy, and correlation coefficients (ρ) for the encrypted Lena image.

Work        PSNR    E         ρH        ρV        ρD
Ref. [4]    –       7.99943   0.0495    0.0008    −0.0050
Ref. [7]    –       7.90303   −0.0294   −0.0014   −0.0180
Ref. [16]   10.04   7.75970   0.0591    0.0508    0.0480
Our work    6.143   7.99935   −0.1242   0.0027    0.0022

5.2 Key Analysis

To evaluate the strength of our cipher scheme against differential attacks, we use the NPCR and UACI tests [17]. In this part, we also describe the key space and the key sensitivity.

NPCR = \frac{1}{S}\sum_{i,j} D(i, j) \times 100\%    (8)

UACI = \frac{1}{S}\sum_{i,j} \frac{|d(i, j)|}{G} \times 100\%    (9)

where S is the number of pixels, D(i, j) equals 1 when the two cipher images differ at position (i, j) and 0 otherwise, d(i, j) is their intensity difference, and G is the peak gray level.
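A possible NumPy implementation of Eqs. (8) and (9) is sketched below; it assumes two 8-bit cipher images obtained from plain images that differ by a single bit.

```python
import numpy as np

def npcr_uaci(c1, c2, gray_levels=255):
    """NPCR and UACI of Eqs. (8) and (9) for two 8-bit cipher images."""
    c1 = c1.astype(float)
    c2 = c2.astype(float)
    s = c1.size                                        # number of pixels S
    npcr = np.count_nonzero(c1 != c2) / s * 100.0      # Eq. (8)
    uaci = np.sum(np.abs(c1 - c2) / gray_levels) / s * 100.0   # Eq. (9)
    return npcr, uaci
```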

Key space. The key space of an encryption algorithm must be large enough to resist brute-force attacks. In our method, AES-256 CTR provides 2^256 distinct keys. Moreover, the size of Ki is 32 bytes, so the system has a 2^512 key space in total. Consequently, brute-force attacks on the key are infeasible.

Key sensitivity. For highly secure encryption, our algorithm must be sensitive to the plain image. Key sensitivity can be assessed using the NC, NPCR, and UACI tests.

Table 7. NC, NPCR, and UACI tests applied on the plain images.

Image     NC        NPCR      UACI
Lena      0.0020    99.6315   33.8300
Peppers   −0.0019   99.7531   33.6432
Baboon    0.0045    99.7066   33.8092

In practice, we perform the tests using two images I1 and I2, where the latter differs from the first by one bit chosen at random. After encryption, we also try decrypting each image with a wrong key that differs by one bit from the correct key. The key sensitivity test for the Lena image is presented in Fig. 4, and the simulation results for all images are reported in Table 7.

Fig. 4. Key sensitivity test, (a) encrypted original Lena, (b) encrypted modified Lena, (c) difference between (a) and (b), (d) decrypted image (a) by wrong key Ki1, (e) decrypted image (b) by wrong key Ki2, and (f) difference between (d) and (e).


Regarding the results, it is clear that our encryption design is very sensitive to a tiny change in the plain image. The NC value between the two encrypted images is very low, i.e., there is almost no similarity between them, and the difference between them is yet another image. Moreover, the NPCR and UACI percentages are high, i.e., the encrypted images are greatly different. As a consequence, our cipher design can resist differential attacks.

5.3 Encryption Algorithm Speed

In real-time image processing, it is very important to design a fast encryption algorithm [18, 19]. In a software implementation, the execution speed of the algorithm mainly depends on the performance of the CPU. The approximate formulas (10) and (11) give the throughput and the number of cycles per byte of an encryption algorithm running on a specific CPU:

S = \frac{DS}{T}\ \text{(MB/s)}    (10)

CpB = \frac{CpS}{S}    (11)

Where S is the speed, DS the data size, T the time, CpB the number of cycles per byte, and CpS the number of CPU cycles per second.

Table 8. Comparative study of encryption algorithm speed.

Method       Tools                   CPU               S (MB/s)   CpB
Ref. [5]     Chaos system            –                 0.04       70000
Ref. [8]     Elliptic curve + AES    Core-i7 2.8 GHz   0.002      1400000
Ref. [15]    Chaos system + DWT      Core-i7 2.8 GHz   0.019      147368
Our method   AES-CTR + SHA-2         Core-i7 3.4 GHz   0.38       8947

Our algorithm is implemented in MATLAB R2016a running on a personal computer with a Core-i7 3.4 GHz CPU. We compare our encryption algorithm with those in [5, 8, 15] using the 256 × 256 gray-scale Lena image; the comparison is presented in Table 8. The algorithms in this comparative study are based on AES, elliptic curves, and chaotic systems, and it is clear that our ciphering scheme is faster than all of them.

6 Conclusion

In this work, we have proposed a secure cryptography system for strong digital image protection. The image is encrypted with a combination of AES-256 in CTR mode and the SHA-2 function, which provides high protection and robustness. To obtain an asymmetric encryption/decryption scheme, the AES initial key is shared using the RSA


algorithm. The evaluation and analysis results demonstrate that our proposed algorithm offers high robustness and security and can resist most known cryptanalysis attacks. The comparative study with recent works indicates that our algorithm provides very good results; it is therefore well suited to image protection for highly secure transmission. In future work, we intend to further strengthen the protection of the image against the most common cryptanalysis attacks. In particular, designing a mechanism for dynamically changing the S-box values of AES is very promising, as is the use of a more advanced secure hash function such as the SHA-3 family.

References 1. Banavath, D., Srinivasulu, T.: Multimedia cryptography- a review. In: International Conference on Power, Control, Signals and Instrumentation Engineering. IEEE, India (2017) 2. Samaher, J., Ibrahim, S.: A hybrid image steganography method based on genetic algorithm. In: 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT). IEEE, Hammamet, Tunisia (2016) 3. Med Karim, A., Ali, K., Med Salim, B.: A chaotic cryptosystem for color image with dynamic look-up table. In: 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT). IEEE, Hammamet, Tunisia (2016) 4. Yong, Z., Xueqian, L.: A fast Image encryption scheme based on AES. In: 2th International Conference on Image, Vision and Computing. IEEE, Chengdu, China (2017) 5. Unal, C.: Secure image encryption algorithm design using a novel chaos based S-Box. Chaos, Solitons Fractals 95, 92–101 (2017) 6. Shelza, S.: An AES-chaos based hybrid approach to encrypt multiple images. In: Advances in Intelligent Systems and Computing 555. Springer (2017) 7. Akram, B.: Chaos-based partial image encryption scheme based on linear fractional and lifting wavelet transforms. Opt. Lasers Eng. 88, 37–50 (2017) 8. Toughi, S.: An image encryption scheme based on elliptic curve pseudo random and advanced encryption system. Sig. Process. 141, 217–227 (2017) 9. Yuhang, X., Min, L.: Chaotic-map image encryption scheme based on AES key producing schedule. In: Third International Conference on Data Science in Cyberspace. IEEE (2018) 10. Yashasvee, J., Kulveer, K.: Improving image encryption using two-dimensional logistic map and AES. In: International Conference on Communication and Signal Processing. IEEE, India (2016) 11. FIPS PUB 197: Advanced Encryption Standard (AES). Computer Security Standard, Cryptography (2001) 12. Jeba Nega, C.: An innovative encryption method for images using RSA, honey encryption and inaccuracy tolerant system using Hamming codes. In: International Conference on Computation of Power, Energy, Information and Communication. IEEE, India (2017) 13. FIPS PUB 186-4: Digital Signature Standard (DSS). Computer Security Standard, Cryptography (2013) 14. FIPS PUB 180-2: Secure Hash Signature standard (SHS). Computer Security Standard, Cryptography (2001) 15. Helger, L.: CTR-Mode Encryption. ResearchGate (2001)


16. Khalaf, A.: Fast image encryption based on random image key. Int. J. Comput. Appl. 134(3), 0975–8887 (2016) 17. Yue, W.: NPCR and UACI randomness tests for image encryption. J. Sel. Areas Telecommun. Cyber Journals (2011) 18. Kaouther, G.: Workflow for multi-core architecture: from matlab/simulink models to hardware mapping/scheduling. In: 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT). IEEE, Hammamet, Tunisia (2016) 19. Anissa, S.: Proposed unified 32-bit multiplier/inverter for asymmetric cryptography. In: 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT). IEEE, Hammamet, Tunisia (2016)

A Novel DWTTH Approach for Denoising X-Ray Images Acquired Using Flat Detector Olfa Marrakchi Charfi1,2(&), Naouel Guezmir1,3, Jérôme Mbainaibeye4, and Mokhtar Mars5 1

Department of Physic and Instrumentation, National High Institute of Applied Sciences and Technology, Carthage University, Centre Urbain Nord, Box 676, 2080 Tunis, Tunisia [email protected], [email protected] 2 GREEN-TEAM Laboratory LR17AGR01 INAT, Tunis, Tunisia 3 MMA Laboratory IPEST, Tunis, Tunisia 4 University of Doba, BP 03, Doba, Chad [email protected] 5 Laboratory of Biophysics Research and Medicals Technologies, High Institute of Tunisian Medicals, Doctor Z. Essafi Avenue, 1006 Tunis, Tunisia [email protected] Abstract. This paper proposes a new approach for denoising an X-ray flat detector image by combining Discrete Wavelets Transform (DWT) and the hard Thresholding method (DWTTH). The developed procedure can decrease noise for X-ray images to achieve a great quality of image at minimum X-ray dose. Noisy images are those acquired with low X-ray doses. For this purpose we have tested our DWTTH algorithm on one low X-ray dose image (Low_RX). The denoised image is compared to a standard X-ray dose image (S_RX). Images are acquired on a Pro-Digi phantom. We have focused our study on denoising the image with preserving contrast between regions. So, denoising procedure is applied on seven region of interest (ROI) selected on the two Pro-Digi X-ray images with different contrast. The proposed denoising DWTTH method is based on the combination of discrete wavelet transform and hard thresholding of energy coefficients of the approximation image issued from DWT applied on several decomposition levels. The denoised image is reconstructed by applying the inverse DWT. The DWTTH results are evaluated in terms of Contrast to Noise Ratio (CNR) and the Signal to Noise Ratio (SNR). These ratios are computed for each denoised ROI and are compared to those corresponding ROIs of S_RX image. The DWTTH method results show that the SNR and the CNR ratios are improved considerably compared to those obtained by the Wavelet Coefficient Magnitude Sum (WCMS), the soft thresholding and the conventional filtering methods. Keywords: X-ray image  Pro-Digi phantom  Flat detector Image denoising  Wavelet transform  Thresholding



1 Introduction

The important progress of the microelectronics industry over the last twenty years has driven the fast development of digital sensors. Today, these sensors are present in many fields of application (video surveillance and security, medical imaging, consumer products) and their resolution is constantly increasing. Medical imaging has also benefited from this development, and CCD or CMOS detectors providing high-definition images have replaced film acquisition systems in radiography. As a result, many patients no longer need to go through invasive and often dangerous procedures to diagnose a wide variety of pathologies. With the widespread use of digital acquisition in medical imaging, the quality of digital medical images has become an important issue [2]. In medical radiology, images must contain no artifacts and be of good quality to enable doctors and radiologists to make the best possible diagnosis. At the same time, radioprotection strategies impose a reduction of the radiation dose in radiographic examinations. For these reasons, denoising remains a valid challenge and effective denoising methods will be helpful in medicine. In the literature, the wavelet transform is commonly used for image compression [7], image restoration [8, 9] and image segmentation [10], and in recent years it has mainly been used for image denoising [1]. Other methods are also used to denoise radiology images, such as conventional filters, the soft thresholding method and the WCMS algorithm [1]. In this paper, we show that these methods perform poorly at reducing the kind of noise present in X-ray images acquired with flat detectors. Therefore, we propose a new algorithm, named DWTTH, to denoise X-ray images. The DWTTH method combines the DWT of the image with the thresholding of the wavelet coefficients of the DWT sub-band images. This article is structured in six sections. Section 2 presents the X-ray database acquisition. Section 3 summarizes existing and recent denoising algorithms and methods. Section 4 develops the proposed DWTTH method. Section 5 presents the experimental results and discussions. Finally, Sect. 6 gives the conclusion and the perspectives of this work.

2 Material and Data

The Multix Swing Siemens radiology machine is used to acquire the X-ray images of the Pro-Digi phantom. The low X-ray dose image (Low_RX) is noisier than the standard X-ray dose image (S_RX). The image acquisition characteristics are shown in Table 1. The images acquired by this system both have a size of 3032 × 2520 pixels (Fig. 2) and are encoded on 2 bytes (16 bits). From these images, and at the same spatial location, we have extracted sub-images containing seven Regions of Interest (ROI). The size of the ROI sub-images is 145 × 1058 pixels (Fig. 3). These sub-images are used to study contrast preservation with the DWTTH denoising algorithm (Fig. 1).


Fig. 1. The phantom used for the image acquisitions (Pro-Digi).

Fig. 2. X-ray image of the Pro-Digi phantom acquired using the Multix Swing Siemens radiology machine.

Table 1. The different exposure levels used for the X-ray images.

Image    Voltage (kV)   Electric current (mA)
Low_RX   70             40
S_RX     70             50


Fig. 3. The contrasted ROIs extracted from the X-ray image.

3 Review of Existing Image Denoising Methods

3.1 Conventional Filters for Image Denoising

In the literature, the most commonly used filters for denoising images are the median and mean filters. In addition, significant research has been carried out on denoising algorithms for radiology images. Some of these methods have been tested on the selected ROIs of the Low_RX image (Fig. 3). The mean filter smooths images and reduces the intensity variation between adjacent pixels; this kind of filter is effective at reducing Gaussian noise [5]. The median filter is also used to remove noise from images while preserving edges, and is particularly effective at removing 'salt and pepper' noise [6].
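As an illustration, both filters are available in SciPy; the random array below merely stands in for a noisy ROI of the Low_RX image.

```python
import numpy as np
from scipy.ndimage import median_filter, uniform_filter

# Hypothetical noisy ROI (in practice, a 145 x 1058 sub-image of Low_RX).
roi = np.random.randint(0, 4096, size=(145, 1058)).astype(float)

mean_denoised = uniform_filter(roi, size=3)    # 3x3 mean (smoothing) filter
median_denoised = median_filter(roi, size=3)   # 3x3 edge-preserving median filter
```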

3.2 Soft Wavelet Thresholding Algorithm

The DWT uses low-pass (L) and high-pass (H) filters to perform a one-level DWT decomposition. The decomposition yields four orthogonal sub-band images: the approximation (LL) and the horizontal (HL), vertical (LH) and diagonal (HH) detail sub-bands. The DWT can be applied N times (N > 1) to the LL sub-band to analyze the image in the low-frequency domain [7, 8]. The DWT makes it possible to analyze and identify the discontinuities of an image at different scales. Hence, it is also used for denoising images by thresholding some wavelet coefficients, thereby eliminating the fine details that correspond to noise in the image. The number of decomposition levels N is chosen according to the noise localization in the frequency domain. There are several types of thresholding: soft thresholding sets the values lower than a threshold T to zero and shrinks the values above T; it guarantees noise removal even for high coefficient values [11]. Hard thresholding keeps the coefficients above the threshold T unchanged and sets the other coefficients to zero.
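The generic DWT-plus-thresholding scheme described above can be sketched with the PyWavelets package as follows; the wavelet family, the number of levels and the threshold value are illustrative choices, not those of any specific cited method.

```python
import pywt

def dwt_threshold_denoise(image, wavelet="haar", level=2, T=1.0, mode="hard"):
    """Generic DWT denoising: decompose, threshold the detail sub-bands,
    then reconstruct with the inverse DWT.

    `image` is a 2D NumPy array; `T`, `level` and `wavelet` are illustrative
    and in practice are tuned to the noise localisation in frequency.
    """
    coeffs = pywt.wavedec2(image, wavelet, level=level)
    thresholded = [coeffs[0]]                    # keep the approximation (LL)
    for (cH, cV, cD) in coeffs[1:]:              # threshold HL, LH, HH details
        thresholded.append(tuple(pywt.threshold(c, T, mode=mode)
                                 for c in (cH, cV, cD)))
    return pywt.waverec2(thresholded, wavelet)
```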

3.3 Denoising by WCMS Algorithm

In 2004, Zhong et al. [1] proposed an algorithm called Wavelet Coefficient Magnitude Sum (WCMS). In their article, the authors applied the WCMS procedure to flat-detector cone-beam computed tomography breast imaging to reduce both the noise and the high dose exposure. The results showed that the X-ray dose can be reduced by up to 60% while preserving


the image quality. The method uses an inter-scale WCMS ratio decision rule to classify wavelet coefficients into two classes: the edge-related and regular signal coefficients, and the irregular coefficients. The WCMS method uses the MMSE estimation criteria [3, 4] to identify the location of irregular coefficients on the one hand, and on the other hand applies the WCMS algorithm to denoise only the edge-related and regular coefficients at the lowest DWT decomposition level. These coefficients are denoised using the MMSE criteria [3, 4]. Note that coefficients located at higher decomposition levels are left unmodified.

4 The Proposed DWTTH Denoising Algorithm

The proposed Discrete Wavelet Transform and Thresholding (DWTTH) algorithm uses a new procedure for denoising X-ray images while preserving the ROI boundaries. In brief, the DWTTH method operates only on the approximation sub-bands (LL), over seven DWT decomposition levels, and eliminates the aberrant energy coefficients with a hard thresholding method.

Fig. 4. DWTTH method flowchart explaining how the thresholds are adapted for each ROI: MROI indicates the mean value of the ROI (S_RX image or Low_RX image).

The DWTTH algorithm (Fig. 4) is customized for the Low_RX image (Fig. 3). In more detail, its steps are: decompose the Low_RX image using the redundant DWT over N decomposition levels (N = 7 in our case study), then apply hard thresholding to the approximation image (LL) at the decomposition level associated with the ROI, at which the noise energy coefficients can be detected. The threshold value T is adapted to each ROI and


is therefore different for each one. To adapt the threshold values, we search, for each ROI, the best decomposition level and the best threshold at that level satisfying the selection criterion, namely the convergence of the average value of the ROI denoised by this algorithm (MROIlow) towards the average value of its corresponding ROI in the reference image (MROIref). After thresholding, the denoised image is obtained using the inverse DWT procedure (Fig. 4). Table 2 shows, for each ROI, the chosen DWT decomposition level and the best threshold value satisfying the selection criterion.

Table 2. Threshold values and DWT decomposition levels for the seven ROIs used by the DWTTH algorithm to denoise the Low_RX image.

ROI   1        2        3       4       5      6      7
N     7        6        5       4       3      2      1
T     338480   147840   60250   21150   6300   1460   221
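A rough sketch of the DWTTH principle using PyWavelets is given below; the wavelet family is an assumption (the paper uses a redundant DWT whose exact filters are not specified here), and the level and threshold values are either those of Table 2 or candidates to be searched with the mean-convergence criterion.

```python
import pywt

def dwtth_denoise_roi(roi, level, T, wavelet="haar"):
    """Sketch of the DWTTH step for one ROI (Fig. 4): hard-threshold the
    approximation sub-band at the given decomposition level, then reconstruct.
    """
    coeffs = pywt.wavedec2(roi, wavelet, level=level)
    coeffs[0] = pywt.threshold(coeffs[0], T, mode="hard")   # approximation (LL)
    return pywt.waverec2(coeffs, wavelet)

def select_threshold(roi_low, roi_ref_mean, level, candidates, wavelet="haar"):
    """Pick the candidate threshold whose denoised-ROI mean is closest to the
    mean of the corresponding ROI in the reference S_RX image (MROIref)."""
    return min(candidates,
               key=lambda T: abs(dwtth_denoise_roi(roi_low, level, T,
                                                   wavelet).mean() - roi_ref_mean))
```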

5 Results and Discussions

Conventional filters and the WCMS method have been tested extensively on the Low_RX image of the Pro-Digi phantom. The results in Guezmir et al. [12] showed the weakness of these methods for denoising this image. In the same article, we also announced that the DWTTH method could be used to denoise it. In the present paper, the DWTTH algorithm is therefore extended and developed further than the earlier version presented in [12]. For the DWTTH algorithm, the thresholding technique is first applied to all sub-band images obtained by decomposing the Low_RX image with the DWT. The soft and hard thresholding approaches are tested separately on these sub-band images for different values of T. The results show that soft thresholding does not denoise the ROIs, and that hard thresholding has no effect when applied to the horizontal, vertical and diagonal detail sub-bands. Only hard thresholding applied to the approximation image (LL) denoises the ROIs, provided that each ROI threshold is optimized according to the DWT decomposition level. Hence, each region of interest (ROI) is assigned a threshold value and a DWT decomposition level (see Table 2). The results of the image denoised with the DWTTH method show that, for ROI7, a one-level decomposition and a threshold value T equal to 221 are sufficient for the mean value of the denoised ROI7 to be almost equal to the ROI7 mean value of the S_RX image. For the other ROIs, it is necessary to denoise higher-order approximation images (see Table 2). We note that the mean value associated with each ROI of the Low_RX image is affected by noise (see Table 3). To evaluate the DWTTH results on all ROIs, the Contrast-to-Noise Ratio (CNR) and the Signal-to-Noise Ratio (SNR) are also calculated for each ROI, as in [12]. These values are compared to the respective CNR and SNR values of the ROIs of the S_RX image (see Tables 4 and 5).


The obtained results are compared with those obtained using the WCMS method and the conventional filters. The developed DWTTH method gives, in this case, the best SNR and CNR results for reducing noise and preserving contrast in the Low_RX image (Tables 4 and 5). Hence, we can confirm that the noise energy coefficients are localized in the low-frequency domain. Hard thresholding can eliminate noise-related energy coefficients, but it also affects the regularity of some ROI textures (Fig. 5(a)). Hard thresholding may also be used to limit the interval of energy coefficient values to be eliminated, but a selection criterion would then have to be developed in order to identify the coefficients that must be restored to preserve the regularity of the ROI. Instead of restoring such coefficients, we have chosen, in this case, to assign to each pixel of a ROI the mean value (smoothing) of the corresponding ROI denoised with the DWTTH method (Fig. 5(b)). We can also see that the gray levels of the reconstructed denoised image look like those of the reference image (Fig. 5(c)), essentially from ROI3 to ROI7, which are denoised with the DWTTH method. Finally, the obtained results are promising, but the DWTTH method still has to be applied and tested on real images. The DWTTH algorithm must also be improved in order to automate the denoising process; for this reason, the execution time of the algorithm is not considered in our study.

Table 3. Average values of the ROIs of the Low_RX image and of those obtained after applying the different denoising methods.

ROI    S_RX      Low_RX    Mean filter   Median filter   WCMS      DWTTH
ROI1   2520.16   2702.97   2703.02       2702.72         2702.90   2520.57
ROI2   2136.21   2394.18   2394.23       2394.70         2394.01   2131.85
ROI3   1650.66   1943.21   1943.29       1944.02         1943.31   1564.83
ROI4   1100.56   1395.08   1395.03       1394.57         1394.96   1110.93
ROI5   598.22    825.09    825.15        824.98          825.04    585.74
ROI6   239.00    377.21    377.14        377.19          377.23    236.55
ROI7   25.18     69.23     69.21         68.86           69.24     25.70

Table 4. SNR values of the ROIs of the Low_RX image and of those obtained after applying the different denoising methods.

ROI    S_RX     Low_RX   Mean filter   Median filter   WCMS    DWTTH
ROI1   100.09   39.04    39.06         39.25           39.04   100.18
ROI2   84.84    34.58    34.59         34.78           34.58   84.73
ROI3   65.56    28.07    28.08         28.23           28.07   62.53
ROI4   43.71    20.15    20.16         20.25           20.15   44.43
ROI5   23.76    11.92    11.92         11.98           11.92   23.38
ROI6   9.49     5.45     5.45          5.48            5.45    9.40


Table 5. CNR values of the ROIs of the Low_RX image and of those obtained after applying the different denoising methods.

ROI pair       S_RX   Low_RX   Mean filter   Median filter   WCMS   DWTTH
(ROI1, ROI3)   0.21   0.16     0.16          0.16            0.16   0.23
(ROI2, ROI4)   0.32   0.26     0.26          0.26            0.26   0.31
(ROI3, ROI5)   0.47   0.40     0.40          0.40            0.40   0.46
(ROI4, ROI6)   0.64   0.57     0.57          0.57            0.58   0.65
(ROI5, ROI7)   0.92   0.84     0.85          0.85            0.85   0.92

Fig. 5. Images represented in 256 gray levels: (a) artifacts in ROIs denoised with DWTTH, (b) smoothed DWTTH-denoised ROIs, (c) X-ray reference image.

6 Conclusion and Perspectives

In this paper, we have presented a novel DWT-based method, called DWTTH, for denoising a low X-ray dose image of the Pro-Digi phantom acquired with a flat detector. The set of X-ray images is composed of a standard X-ray dose image (S_RX) and a low X-ray dose image (Low_RX), the latter being much noisier than the former. Tests were done on seven contrasted ROIs located on the image set. Several denoising methods were tested: the conventional filters and the WCMS algorithm are not powerful enough to denoise the image, and only the DWTTH denoising algorithm is able to properly localize the noise in the DWT approximation sub-band images over seven decomposition levels. The efficient denoising results, with edge and contrast preservation, are due to the optimal selection of the thresholds when hard thresholding is used on all levels of the DWT approximation sub-bands. To select the thresholds, firstly, the convergence criterion on the average pixel values of the denoised ROIs must be satisfied and, secondly, the SNR and


the CNR ratios of each denoised ROI of the Low_RX image must converge to the SNR and CNR values of the corresponding ROI in the S_RX image. The DWTTH denoising algorithm has the advantages of reducing the patient's X-ray dose, localizing noisy pixels in the low-frequency domain, and preserving the edges and contrast of the image. However, some artifacts may be generated by the thresholding of non-noisy pixel values that fall within the thresholded interval. Selection criteria must therefore be used to pick only the noisy coefficients from this interval in order to correct these aberrant results. An alternative solution has been adopted: smoothing the denoised ROI so that the resulting image looks like the reference image.

Acknowledgment. The authors thank the Charles Nicolle Hospital radiology staff for the database acquisitions.

References 1. Zhong, J., Ning, R., Conover, D.: Image denoising based on multiscale singularity detection for cone beam CT breast imaging. IEEE Trans. Med. Imag. 23(6), 696–702 (2004) 2. Aribi, W., Khalfallah, A., Bouhlel, M.S., Elkadri, N.: Evaluation of image fusion techniques in nuclear medicine. In: 6th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications Proceedings, SETIT 2012. IEEE Publisher, Tunisia (2012) 3. Mihcak, M.K., Kozintsev, I., Ramchandran, K., Moulin, P.: Low complexity image denoising based on statistical modeling of wavelet coefficients. IEEE Signal Process. Lett. 6 (12), 300–303 (1999) 4. Cai, Z., Cheng, T.H., Lu, C., Subramanian, K.R.: Efficient wavelet based image denoising algorithm. Electron. Lett. 37(11), 683–685 (2001) 5. Turajlić, E., Karahodzic, V.: An adaptive scheme for X-ray medical image denoising using artificial neural networks and additive white gaussian noise level estimation in SVD domain. In: Badnjević, A. (ed.) CMBEBIH 2017, IFMBE Proceedings, pp. 36–40. Springer, Sarajevo, B&H (2017) 6. Kirti, T., Jitendra, K., Ashok, S.: Poisson noise reduction from X-ray images by region classification and response median filtering. Sadhana 42(6), 855–863 (2017) 7. Chang, C.L., Girod, B.: Adaptive discrete wavelet transform for image compression. IEEE Trans. Image Process. 16(5), 1289–1302 (2007) 8. Mário, A., Figueiredo, T., Nowak, R.D.: An EM algorithm for wavelet-based image restoration. IEEE Trans. Image Process. 12(8), 906–916 (2003) 9. Belge, M., Kilmer, M.E., Miller, E.L.: Wavelet domain image restoration with adaptive edge-preserving regularity. IEEE Trans. Image Process. 9, 597–608 (2000) 10. Choi, H., Baraniuk, R.G.: Multiscale image segmentation using wavelet-domain hidden Markov models. IEEE Trans. Image Process. 10(9), 1309–1321 (2001) 11. Donoho, D.L.: Denoising by soft-thresholding. IEEE Trans. Inf. Theor. 41(3), 613–627 (1995) 12. Guezmir, N., Marrakchi Charfi, O., Mbainaibeye, J., Mars, M.: Evaluation of DWT denoise method on X- ray images acquired using flat detector. In: 4th IEEE Middle East Conference on Biomedical Engineering Proceedings, MECBME 2018, pp. 18–20. EMB Publisher, Tunisia (2018)

Recent Advances in Fire Detection and Monitoring Systems: A Review Rafik Ghali1,2(&), Marwa Jmal1,2(&), Wided Souidene Mseddi1(&), and Rabah Attia1(&) 1

SERCOM, Ecole Polytechnique de Tunisie, Université de Carthage, B.P.743, 2078, La Marsa, Tunisia {rafik.ghali,wided.souidene}@ept.rnu.tn, [email protected] 2 Telnet Innovation Labs, Telnet Holding, Ariana, Tunisia [email protected]

Abstract. Wildfires are one of the most impacting natural disasters, leading to a huge devastation of humans and the environment. Due to the rapid development of sensors and technologies as well as the success of computer vision algorithms new and complete solutions for automatic fire monitoring and detection have been exposed. However, in the past years, only few literature reviews have been proposed to cover researches until the year 2015. To fill this gap, we provide, in this paper, an up-to-date comprehensive review on this problem. First, we present a general description and a comparative analysis in terms of reliability, flexibility and efficiency, of these systems. Then, we expose vision-based methods for fire detection. Our main focus was on techniques based on deep convolutional neural networks (CNNs). Keywords: Fire detection  Fire monitoring Deep convolutional neural networks

 Vision-based systems 

1 Introduction

Forest fires have always been one of the major environmental catastrophes, with disastrous effects on forest wealth. They quickly grow out of control, and their extinction requires huge efforts, time and resources. Very recent statistics [45] show that the average number of wildfires over the last 10 years is about 33,863 fires per year, with losses adding up to $5.1 billion [46]. Furthermore, a study performed by Lee et al. in 2017 [1] revealed that wildfires kill around 339 thousand people per year worldwide. Given these alarming numbers, systems for detecting, monitoring and fighting forest fires at early stages are crucial. Early fire monitoring and detection systems (FMDS) are based on traditional methods such as human supervision, either on site or by video monitoring [3]. However, these techniques suffer from inaccuracies and false detections caused mainly by the limits of human supervision capacities. For these reasons, researchers have been working on automating fire detection systems by taking advantage of technological advances [2].


To reduce the effects of fire, several approaches have been proposed to improve the reliability of FMDS. Vision-based fire detection methods are the most common and the most interesting. Fire monitoring is the first level of an FMDS; this level is essential to detect and localize fire in the images acquired by the vision sensors. Due to its importance, automatic fire detection has attracted the research community, yielding an outstanding number of contributions proposed as solutions to this problem. To the best of our knowledge, only a few literature reviews covering this problem have been published in the past years, mainly the works of Yuan et al. [4] in 2015, Çetin et al. [7] in 2014, and Alkhatib et al. [5] and Mahdipour et al. [6] in 2013. Since 2015, thanks to the rapid evolution of sensors and technologies as well as the success of deep learning algorithms, new research and complete solutions have been proposed. We therefore provide, in this paper, an up-to-date comprehensive review of the problem of forest fire monitoring and detection, which we believe is helpful to understand this problem, its main challenges, pitfalls, and the state of the art. The rest of the paper is organized as follows: in Sect. 2, we review the technologies used in fire detection and monitoring systems. In Sect. 3, we introduce the vision-based methods for fire detection while focusing on those based on deep learning techniques. Finally, conclusions are drawn in Sect. 4.

2 Monitoring Systems for Fire Detection

The main purpose of an FMDS is to provide a mechanism able to assess environmental factors and their effects on the environment, as well as to detect fires and even predict their occurrence at early stages. FMDS can be grouped into three categories [8]: ground systems, satellite systems and unmanned aerial vehicle systems.

2.1 Ground Systems

Traditional ground systems, also called terrestrial systems, are based on human supervision. Fire detection and monitoring is performed by supervising regions locally or by analyzing data provided by local sensors such as flame, smoke and heat detectors and gas sensors. In order to increase system efficiency and detect the exact location of fires, ambient sensors have also been integrated. These sensors are used during day and night to detect fire and smoke and identify their characteristics. The main sensors employed in terrestrial systems are visible or infrared (IR) cameras, IR spectrometers and Light Detection and Ranging (LIDAR) systems [3–5].

2.2 Satellite-Based Systems

Satellite-based systems rely on space sensors, the most developed fire remote sensing devices. These sensors are known for their reliability and large-area monitoring, since they acquire images at multiple spatial and temporal resolutions [3]. Space sensors have many applications in Earth observation, such as road extraction, building detection, land cover classification, and fire detection and monitoring [17]. For this last task,


several sensors have been used to assess the environmental characteristics during a fire and the resulting degree of environmental change. Between 1990 and 2000, images acquired by the Advanced Very High Resolution Radiometer (AVHRR) [47] were used to analyze the environmental characteristics of burned areas, owing to the capability of AVHRR to reconstruct long-term burned-area datasets [18]. In 1998 and 2000, SPOT-VEGETATION (Systeme Pour l'Observation de la Terre Vegetation) [54] and MODIS (Moderate Resolution Imaging Spectroradiometer) [48] were also used for detecting fires and mapping burned areas, thanks to their high spectral and temporal resolution and data availability. High- to moderate-resolution sensors (Landsat TM/ETM+) [51] were also employed in this task, given their greater spatial, spectral and radiometric resolution compared with AVHRR. Recently, the new European satellite Sentinel-2A [55] has been applied in FMDS owing to its higher spatial resolution and the spectral and geometrical performance of its measurements [17]. A number of indices have also been developed to compute the degree of change in soil and vegetation caused by fire, such as the Normalized Difference Vegetation Index (NDVI), the Vegetation Cover Index (VCI), the Composite Burn Index (CBI) and the Surface Roughness Index (SRI). The degree of change is computed by subtracting the post-fire index from the pre-fire index [44].

2.3 Unmanned Aerial Vehicle (UAV) Systems

UAVs are aircraft without a human pilot. Due to the development of this technology, they have become used for both civilian and military purposes. UAVs communicate with the ground station by means of a data transmission system that conveys both the real-time orders coming from the ground and the information acquired by the UAV, which can be delayed and is usually intermittent. Data is transmitted either directly within optical range over short distances, up to 150 km, or indirectly by relaying on a satellite or an aerial vector (airplane or UAV) [50]. This monitoring system includes the following steps [5, 15]: (i) finding a potential fire using different kinds of sensors, including visual cameras (for daytime) and/or infrared cameras (for both daytime and nighttime); (ii) detecting the fire with specific fire monitoring algorithms and informing firefighting operators; (iii) initializing the fire diagnosis to gather information related to the fire, such as its location and the extent of its evolution; and (iv) initializing the fire prognosis to predict the evolution of the fire in real time using information provided by the on-board remote monitoring sensors. UAVs were first used to collect data on forest fires in 1961 by the United States Forest Service and the Forest Fire Laboratory [11]. Later, between 2006 and 2010, NASA and the US Forest Service flew 14 unmanned airborne system sensor missions, with autonomous geospatial data collection, processing and delivery achieved within 10 min. In addition, it was shown that a multispectral sensor can be integrated to process and visualize data in order to provide near-real-time intelligence [13]. Furthermore, UAVs combined with computer vision-based remote sensing systems increase the efficiency of real-time data collection and determine the current position of the fire in geographical coordinates [14].

2.4 Comparison of Forest Fire Detection Systems

Ground systems are situated at lookout spots and are able to detect fire in real time. However, the flexibility of these systems cannot hide their drawbacks, which are mainly caused by human estimation errors, inaccuracy in visual estimation, lower fire localization accuracy and difficulties in predicting the spread of fire and smoke [16]. Satellite systems, compared to ground and UAV systems, present several advantages, mainly large-area monitoring and a higher data acquisition frequency [3]. However, these systems are not suited to early wildfire detection due to their low temporal resolution: it takes two days to acquire images of the whole Earth. Besides, the spatial resolution [1] and image quality can be affected by weather conditions [5]. It can easily be concluded that UAVs are of major significance for fire detection and monitoring thanks to their low cost and reliable data transmission. Furthermore, compared to ground and satellite systems, UAVs are suitable for early fire detection due to their real-time monitoring, higher data acquisition frequency and better fire localization accuracy.

3 Vision-Based Fire Detection

A plethora of vision-based fire detection techniques have been proposed. In order to highlight recent advances in machine learning, in this review we choose to classify them into feature-based and deep learning-based methods. For earlier works, the reader may refer to the reviews of Yuan et al. [4], Çetin et al. [7], Alkhatib et al. [5] and Mahdipour et al. [6].

Datasets: Several datasets are used to train and test learning methods, especially Convolutional Neural Network (CNN) methods. They contain a large number of images/videos acquired from different fire experiments in forest environments as well as from different indoor and outdoor scenes. The data include positive and negative sequences/images of fire, non-fire, smoke and non-smoke. Examples of datasets are the Fire detection dataset [49], the Flickr dataset [52] and the FIRESENSE database of videos for flame and smoke detection [53], which contains 11 positive and 16 negative videos for flame detection and 13 positive and 9 negative videos for smoke detection. Nonetheless, it is not easy for researchers to acquire real data, even though several open datasets exist in the domain. In particular, there is no standard open-source dataset for the evaluation of FMDS, which makes comparison with state-of-the-art methods somewhat critical.

3.1 Feature-Based Fire Detection

Color-based methods are the simplest and most widely used techniques to solve this problem. They consist in defining a range of pixel values after converting the image to a specific color space. For instance, the combination of the RGB color space channels with the saturation component of the HSV color space was shown to be efficient for extracting fire and smoke pixels [31]. The YCbCr color space was used to construct a


generic chrominance model for flame pixel classification [32]. The YUV color model was also employed to detect fire in real time based on the temporal variation of fire intensity, owing to its more efficient separation of luminance from chrominance compared to the RGB color space [33]. Nonetheless, the performance of color-based fire detection methods is limited by the difficulty of defining smoke characteristics, which are in most cases confused with clouds. This problem was addressed by analyzing the spectral, temporal and spatial characteristics of both flame and smoke [28, 34–36]. In the same direction, a method combining both the color and the motion of fire/smoke increased the reliability of fire detection in both indoor and outdoor environments [25]. In [26], a combination of different flame characteristics (color, shape and flame movements) extracted from videos acquired by surveillance cameras was presented to reduce false fire alarms. Kim et al. [27] also proposed to fuse stereo thermal IR vision and FMCW radar. This method reduced the distance error interval of the stereo IR vision from 1–19% to 1–2% and showed good efficiency in smoke-filled fire environments characterized by low visibility and high temperature. A novel method based on static and dynamic texture features has also been proposed [40]. First, the YCbCr color space is used to segment the input image. Static features are then obtained through hybrid texture descriptors, and dynamic texture features are derived using 2D spatio-temporal wavelet decomposition and 3D volumetric wavelet decomposition. This method was tested on the VisiFire datasets as well as on a dataset formed by real-world images. A detection rate of 95.65% was achieved, while showing the ability to reduce the false alarms caused by moving objects having the same color as fire.
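As a simple illustration of such color rules (not taken from any specific cited paper), a candidate fire-pixel mask can be built from an RGB image as follows; the red-channel threshold is a hypothetical tuning value.

```python
import numpy as np

def fire_color_mask(rgb, r_threshold=180):
    """Illustrative RGB colour rule for candidate fire pixels: a pixel is
    flagged when its red channel dominates and exceeds a threshold.

    `rgb` is an HxWx3 uint8 image; `r_threshold` is a hypothetical value that
    would normally be tuned on training data.
    """
    r = rgb[..., 0].astype(int)
    g = rgb[..., 1].astype(int)
    b = rgb[..., 2].astype(int)
    return (r > r_threshold) & (r >= g) & (g > b)
```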

3.2 Deep Learning Methods

Deep learning methods have been widely reviewed and discussed in recent years. These methods can be divided into four categories [19]. Among them, techniques based on Convolutional Neural Networks (CNNs) are the most employed for fire monitoring and detection, so this section focuses on CNN-based deep learning methods. Zhang et al. [24] proposed a deep CNN model based on training both a full-image and a fine-grained patch fire classifier; they were the first to use patch-level annotations. The full image is first tested by the model and, if it contains fire, the patch classifier is employed to detect the precise location. Good detection accuracies are obtained, 97% on training and 90% on testing, using their benchmark datasets. In [39], a CNN model is proposed for identifying fire in real video sequences; this model detects only red fire. The method reduced the time cost by a factor of 6 to 60 and achieved better classification accuracy, indicating that using CNNs to detect fire in videos is very promising thanks to their ability to extract complex features and classify fire within the same architecture. Furthermore, Lee et al. [1] used a deep CNN in a wildfire detection system for aerial images, owing to its high accuracy and the absence of hand-crafted feature extractors. Even though CNNs are widely used in computer vision for object classification, they have not been heavily employed for fire detection. For this reason, many CNN


architectures, such as AlexNet [41], GoogLeNet [42], VGG13 [43], a modified GoogLeNet and a modified VGG13, have been tested using high-resolution aerial images. The evaluation of these architectures showed that GoogLeNet and the modified VGG13 reach the highest accuracies and the best performance. Zhao et al. [20] proposed their own deep CNN architecture, called 'Fire_Net', to detect, localize and recognize wildfire in aerial images. This model contains 15 layers. At the first level, a saliency detection method is used to extract the dominant object region in the image and to compute its color and texture features. At the second level, two logistic regression classifiers are used to decide whether each ROI feature vector belongs to flame or smoke and, if so, to segment these regions. Great performance in detecting the core fire area and extracting fire regions, even very tiny ignition zones, was demonstrated using real aerial wildfire images for training. In [30] and [29], two CNN models are proposed to detect and localize fire in surveillance videos. The first model [29] is inspired by the GoogLeNet architecture and the second [30] is based on the SqueezeNet [23] architecture. These models were selected for several reasons, such as their reasonable computational complexity, their better classification performance and their higher feasibility of implementation on FPGAs compared to other models. On various datasets, the tests achieved a high fire detection accuracy and showed that such a system can minimize fire disasters and be deployed in real-world surveillance networks. Recently, region-based CNN models have been heavily employed for generic object detection. These models simultaneously detect objects and predict their objectness scores at each position. Faster R-CNN [9] is a region-based CNN model that has proved very effective at generating high-quality region proposals in real time. For these reasons, this model was used to detect smoke in [38]; tests on real forest images and synthetic smoke images proved the feasibility of this solution for real early fire monitoring and detection. Faster R-CNN was also used by Kim et al. [37] to detect and localize fire in real time. Great performance, with a detection accuracy of 99.24% and a mean Average Precision of 0.7863, was obtained on various images such as forest fires, gas range fires and candle flames. In the same direction, Shen et al. [21] employed another region-based CNN model, the YOLO [22] (You Only Look Once) unified deep learning model, to detect flames in video. Good accuracy and high-precision flame detection were obtained, proving that it can be used as a real-time model for fire detection.
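For illustration only, a minimal binary fire/non-fire CNN classifier can be written in a few lines of PyTorch; it reproduces none of the cited architectures and is merely a sketch of the kind of model these works train.

```python
import torch
import torch.nn as nn

class TinyFireCNN(nn.Module):
    """Minimal binary fire/non-fire image classifier (illustrative sketch)."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, 2)   # classes: fire / non-fire

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# Example forward pass on a hypothetical batch of 224x224 RGB frames.
logits = TinyFireCNN()(torch.randn(4, 3, 224, 224))
```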

4 Conclusion

In this paper, a wide-ranging literature survey on fire monitoring and detection systems has been presented. The main objective of these systems is the detection and estimation of fire evolution in real time. A comparative analysis of ground, satellite and UAV systems in terms of reliability, flexibility and efficiency revealed that UAVs are of major significance for fire detection and monitoring thanks to their low cost, reliable data transmission and, most importantly, real-time processing. We have also presented an up-to-date review of vision-based fire detection techniques while focusing on the


ones based on deep learning algorithms. Compared to classic methods, the latter have been shown to be more robust and more efficient at solving fire detection and recognition problems.

Acknowledgements. This project is carried out under the MOBIDOC scheme, funded by the EU through the EMORI program and managed by the ANPR.

References 1. Lee, W., Kim, S., Lee, Y.T., Lee, H.W., Choi, M.: Deep neural networks for wild fire detection with unmanned aerial vehicle. In: 2017 IEEE International Conference on Consumer Electronics (ICCE), pp. 252–253. IEEE (2017) 2. Dimitropoulos, K., Gunay, O., Kose, K., Erden, F., Chaabene, F., Tsalakanidou, F. … Cetin, E.: Flame detection for video-based early fire warning for the protection of cultural heritage. In: Euro-Mediterranean Conference, pp. 378–387. Springer, Berlin, Heidelberg (2012) 3. San-Miguel-Ayanz, J., Ravail, N.: Active fire detection for fire emergency management: Potential and limitations for the operational use of remote sensing. Nat. Hazards 35(3), 361– 376 (2005) 4. Yuan, C., Zhang, Y., Zhixiang, L.: A survey on technologies for automatic forest fire monitoring, detection, and fighting using unmanned aerial vehicles and remote sensing techniques. Can. J. For. Res. 45(7), 783–792 (2015) 5. Alkhatib, A.A.A.: A review on forest fire detection techniques. Int. J. Distrib. Sens. Netw. 10 (3), 597368 (2014) 6. Mahdipour, E., Dadkhah, C.: Automatic fire detection based on soft computing techniques: review from 2000 to 2010. Artif. Intell. Rev. 42(4), 895–934 (2014) 7. Çetin, A.E., Dimitropoulos, K., Gouverneur, B., Grammalidis, N., Günay, O., Habiboǧlu, Y. H., Verstockt, S.: Video fire detection–review. Digit. Signal Proc. 23(6), 1827–1843 (2013) 8. Den Breejen, E., Breuers, M., Cremer, F., Kemp, R., Roos, M., Schutte, K., De Vries, J.S.: Autonomous forest fire detection. In: Proceedings of 3rd International Conference on Forest Fire Research, pp. 2003–2012 (1998) 9. Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91– 99 (2015) 10. Wilson, C.C., Davis, J.B.: Forest fire laboratory at riverside and fire research in California: past, present, and future. Gen. Tech. Rep. PSW-105, vol. 105, p. 22. Berkeley, Calif.: Pacific Southwest Research Station, Forest Service, US Department of Agriculture (1988) 11. Tranchitella, M., Fujikawa, S., Ng, T.L., Yoel, D., Tatum, D., Roy, P., Hinkley, E.: Using tactical unmanned aerial systems to monitor and map wildfires. In: AIAA Infotech@ Aerospace 2007 Conference and Exhibit, p. 2749 (2007) 12. Ambrosia, V.G., Wegener, S., Zajkowski, T., Sullivan, D.V., Buechel, S., Enomoto, F., Hinkley, E.: The Ikhana unmanned airborne system (UAS) western states fire imaging missions: from concept to reality (2006–2010). Geocarto Int. 26(2), 85–101 (2011) 13. Merino, L., Caballero, F., de Dios, J.R.M., Maza, I., Ollero, A.: Automatic forest fire monitoring and measurement using unmanned aerial vehicles. In: Viegas, D.X. (ed.) Proceedings of the 6th International Congress on Forest Fire Research. Coimbra, Portugal (2010) 14. Zhang, Y., Jiang, J.: Bibliographical review on reconfigurable fault-tolerant control systems. Annu. Rev. Control. 32(2), 229–252 (2008)


15. Martínez-de Dios, J.R., Merino, L., Caballero, F., Ollero, A.: Automatic forest-fire measuring using ground stations and unmanned aerial systems. Sensors 11(6), 6328–6353 (2011) 16. Navarro, G., Caballero, I., Silva, G., Parra, P.C., Vázquez, Á., Caldeira, R.: Evaluation of forest fire on Madeira Island using Sentinel-2A MSI imagery. Int. J. Appl. Earth Obs. Geoinf. 58, 97–106 (2017) 17. Ruiz, J.A.M., Riaño, D., Arbelo, M., French, N.H., Ustin, S.L., Whiting, M.L.: Burned area mapping time series in Canada (1984–1999) from NOAA-AVHRR LTDR: A comparison with other remote sensing products and fire perimeters. Remote Sens. Environ. 117, 407–414 (2012) 18. Zhao, Z.Q., Zheng, P., Xu, S.T., Wu, X.: Object detection with deep learning: A review. arXiv preprint arXiv:1807.0551 (2018) 19. Zhao, Y., Ma, J., Li, X., Zhang, J.: Saliency detection and deep learning-based wildfire identification in uav imagery. Sensors 18(3), 712 (2018) 20. Shen, D., Chen, X., Nguyen, M., Yan, W.Q.: Flame detection using deep learning. In: 2018 4th International Conference on Control, Automation and Robotics (ICCAR). IEEE (2018) 21. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779–788 (2016) 22. Iandola, F.N., Han, S., Moskewicz, M.W., Ashraf, K., Dally, W.J., Keutzer, K.: Squeezenet: Alexnet-level accuracy with 50x fewer parameters and < 0.5 mb model size. arXiv preprint arXiv:1602.07360 (2016) 23. Zhang, Q., Xu, J., Xu, L., Guo, H.: Deep convolutional neural networks for forest fire detection. In: Proceedings of the 2016 International Forum on Management, Education and Information Technology Application. Atlantis Press (2016) 24. Di Lascio, R., Greco, A., Saggese, A., Vento, M.: Improving fire detection reliability by a combination of videoanalytics. In: International Conference Image Analysis and Recognition, pp. 477–484. Springer, Cham (2014) 25. Foggia, P., Saggese, A., Vento, M.: Real-time fire detection for video-surveillance applications using a combination of experts based on color, shape, and motion. IEEE Trans. Circuits Syst. Video Technol. 25(9), 1545–1556 (2015) 26. Kim, J.-H., Starr, J.W., Lattimer, B.Y.: Firefighting robot stereo infrared vision and radar sensor fusion for imaging through smoke. Fire Technol. 51(4), 823–845 (2015) 27. Bosch, I., Serrano, A., Vergara, L.: Multisensor network system for wildfire detection using infrared image processing. Sci. World J., 1–10 (2013) 28. Muhammad, K., Ahmad, J., Mehmood, I., Rho, S., Baik, S.W.: Convolutional neural networks based fire detection in surveillance videos. IEEE Access 6, 18174–18183 (2018) 29. Muhammad, K., Ahmad, J., Lv, Z., Bellavista, P., Yang, P., Baik, S.W.: Efficient deep CNNbased fire detection and localization in video surveillance applications. IEEE Trans. Syst., Man, Cybern.: Syst. 99, 1–16 (2018) 30. Chen, T.-H., Wu, P.-H., Chiou, Y.-C.: An early fire-detection method based on image processing. In: 2004 International Conference on Image Processing. ICIP ‘04, pp. 1707– 1710. IEEE (2004) 31. Celik, T., Demirel, H.: Fire detection in video sequences using a generic color model. Fire Saf. J. 44(2), 147–158 (2009) 32. Marbach, G., Loepfe, M., Brupbacher, T.: An image processing technique for fire detection in video images. Fire Saf. J. 41(4), 285–289 (2006) 33. Ho, C.-C.: Machine vision-based real-time early flame and smoke detection. Meas. Sci. Technol. 20(4), 045502 (2009)

340

R. Ghali et al.

34. Celik, T., Demirel, H., Ozkaramanli, H.: Automatic fire detection in video sequences. In: 2006 14th European Signal Processing Conference, pp. 1–5. IEEE (2006) 35. Yu, C., Mei, Z., Zhang, X.: A real-time video fire flame and smoke detection algorithm. Procedia Eng. 62, 891–898 (2013) 36. Kim, Y.-J., Kim, E.-G.: Fire detection system using faster R-CNN. In: International Conference on Future Information & Communication Engineering, vol. 9, no. 1 (2017) 37. Zhang, Q.-X., et al.: Wildland forest fire smoke detection based on faster R-CNN using synthetic smoke images. Procedia Eng. 211, 441–446 (2018) 38. Frizzi, S., Kaabi, R., Bouchouicha, M., Ginoux, J.M., Moreau, E., Fnaiech, F.: Convolutional neural network for video fire and smoke detection. In: IECON 2016 – 42nd Annual Conference of the IEEE Industrial Electronics Society. IEEE (2016) 39. Prema, C.E., Vinsley, S.S., Suresh, S.: Efficient flame detection based on static and dynamic texture analysis in forest fire detection. Fire Technol. 54(1), 255–288 (2018) 40. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems 25 (NIPS 2012), pp. 1097–1105 (2012) 41. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–9 (2015) 42. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014) 43. Chu, T., Guo, X.: Remote sensing techniques in monitoring post-fire effects and patterns of forest recovery in boreal forest regions: a review. Remote Sens. 6(1), 470–520 (2013) 44. FireInfo. https://www.nifc.gov/fireInfo/nfn.htm. Accessed 9 July 2018 45. Statistics. https://www.iii.org/fact-statistic/facts-statistics-wildfires. Accessed 4 July 2018 46. AVHRR Homepage. http://noaasis.noaa.gov/NOAASIS/ml/avhrr.html. Accessed 4 July 2018 47. MODIS Web. https://modis.gsfc.nasa.gov/. Accessed 4 July 2018 48. Fire dataset. http://signal.ee.bilkent.edu.tr/VisiFire/. Accessed 9 July 2018 49. UAVs. http://dronestpe.e-monsite.com/pages/un-drone-comment-ca-marche.html. Accessed 9 July 2018 50. LandsatHome. https://landsat.gsfc.nasa.gov/the-thematic-mapper/. Accessed 10 Sept 2018 51. Flickr dataset. http://conteudo.icmc.usp.br/pessoas/junio/DatasetFlicker/DatasetFlickr.htm/. Accessed 9 July 2018 52. Firesense dataset. https://zenodo.org/record/836749#.W22IN870mUk. Accessed 9 July 2018 53. SPOT Homepage. http://www.spot-vegetation.com/index.html. Accessed 10 Sept 2018 54. Sentinel Homepage. https://sentinel.esa.int/web/sentinel/home. Accessed 10 Sept 2018

Superpixel Based Segmentation of Historical Document Images Using a Multiscale Texture Analysis

Emna Soyed, Ramzi Chaieb, and Karim Kalti

LATIS - Laboratory of Advanced Technology and Intelligent Systems, ENISo, Sousse University, Sousse, Tunisia
[email protected], [email protected], [email protected]

Abstract. In this paper, a superpixel based segmentation of Historical Document Images (HDIs) using multiscale texture analysis is proposed. A Simple Linear Iterative Clustering (SLIC) superpixel technique and a Kmeans classifier are applied in order to separate the input image into background and foreground superpixels. The foreground superpixels are characterized by the standard deviation and the mean of the Gabor features. These features are extracted in a multiscale fashion to adapt to the variability of the textures that may be present in HDIs. Text/graphic separation is then performed by applying a classification of the foreground superpixels for each texture analysis scale followed by a merging step of the obtained classification results. Since the classification results depend on the used classifier, a comparative study is performed for supervised (Support Vector Machine (SVM), K-Nearest Neighbors (KNN)) and unsupervised (Kmeans, Fuzzy C-Means (FCM)) techniques. Experiments show the effectiveness of our proposed method, especially when compared with similar work in the literature.

Keywords: Segmentation of Historical Document Images · Multiscale texture analysis · SLIC superpixel · Gabor features · Merging classification results

1 Introduction

In recent years, digitizing collections of cultural images has become the vision of libraries and museums to ensure a sustainable conservation of historical collections and worldwide access to large sets of cultural heritage documents. Thus, many open issues and challenges have been raised, such as developing content-based image indexing and retrieval tools and ensuring the efficiency of document interpretation systems [1,2]. In particular, the literature shows that many issues are related to the particularities of historical documents such as noise, degradation, page skew, specific fonts, irregular spacing between characters, random alignment and different text orientations [3]. Recently, the family of superpixel based algorithms has gained the attention of many researchers interested in how segmentation methods can provide new insights into image processing. The superpixel technique uses the degree of similarity between pixel characteristics and exploits the redundant information in the image to group pixels. Therefore, superpixels can reduce the complexity of image post-processing tasks and consequently lead to more effective processing.

2 Related Work

For instance, Jiang used the superpixel technique to perform image background segmentation [4]. The particular aggregation of information provided by superpixels has been proved by Li et al. to be useful for image segmentation [5]. Cohen et al. used spatial and color features extracted from superpixels to separate drawing regions from the background of ancient documents [6]. Garz et al. used a multistage algorithm based on interest points to detect the layout entities in ancient manuscripts [7]. They proposed a stroke width computing method using intensity and Simple Linear Iterative Clustering (SLIC) superpixel region growing to segment text from low quality images. The limitation of this method is that it cannot be used on text images with various strokes, and it was shown to perform poorly for text segmentation over complex backgrounds. Mehri et al. proposed a background separation technique for ancient documents using spatial and color features extracted from superpixels [8]. Moreover, in order to extract the text and graphic regions, they presented an algorithm based on the SLIC superpixel approach and Gabor descriptors to segment historical document images [9]. Mehri et al. used different sliding window sizes for the texture analysis; the descriptors extracted from these different windows were embedded in a single vector, which may affect the performance of the classification step.

In this paper, we propose a contribution to overcome these inefficiencies by developing an HDI segmentation method based on multiscale texture analysis. The proposed method is based on the SLIC superpixel technique and Gabor features. Accordingly, four different analysis scales are used and four classifiers are applied to separate textual information from graphical information. In the literature, many methods for text/graphic classification have been elaborated [8,9]. Unfortunately, none of these works provides a comparative study between classifiers. Our objective is to compare four different classification methods in order to select the most effective classifier. First, an interactive feature learning step is introduced to train the supervised classifiers. Then, a merging method is developed to deal with the variety of multiscale analysis windows and to facilitate the choice of the best classifier.

The remainder of this paper is organized as follows. Section 3 presents an overview of our proposed method: the adopted superpixel technique is described and the investigated Gabor features are detailed. In order to outline the pertinence of the experimental protocol, Sect. 4 describes the ground truth, the segmentation results and their evaluation. Finally, Sect. 5 concludes the paper.

3 The Proposed Method

In this work, we do not seek a pixel-accurate segmentation; rather, we aim to find regions with similar characteristics over a variety of scales and fonts, and to discriminate between text and graphic components. Figure 1 presents the schematic block representation of the proposed method. The proposed method starts by separating the input HDI into foreground and background superpixels using the SLIC superpixel technique and Kmeans clustering, in order to obtain an enhanced, noise-free background and to avoid the complexity of pixel based segmentation. The Gabor features extracted at each texture analysis scale are fed as input to four different classifiers in order to classify the foreground superpixels into two classes (text or graphic). Finally, a merging step is introduced in order to reduce, at the classification step, the variability introduced by the different sliding window sizes.

Fig. 1. Schematic block of the proposed method based on texture features extraction for HDI and merging results of each classifier.

In order to obtain a relevant segmentation of textual regions from graphical ones and a separation between different kinds of graphics and various text fonts, the proposed method is composed of four steps: preprocessing, texture feature extraction, superpixel classification and merging of the results.

3.1 Preprocessing: Foreground/Background Segmentation

First, an HDI is fed as input and converted to a grayscale image. The proposed method does not assume any prior knowledge regarding document content and layout. In order to improve the segmentation results, a median filter is applied to the input image. Figure 2 shows an example of the preprocessing step.

Fig. 2. Illustration of the results of the preprocessing step. (a) and (f) Original HDIs (b) and (g) Grayscale conversion results (c) and (h) Results of SLIC superpixel segmentation (d) and (i) Results of the foreground/background segmentation (foreground superpixels are labeled as blue and the background ones are labeled as green) (e) and (j) Foreground SLIC superpixels.

Once the preprocessing is terminated, the image is segmented. The aim of the segmentation task is to simplify the representation of an image into something easier and more meaningful for the extraction and analysis step. Rather than using the rigid structure of the pixel grid, the superpixel technique is used to group pixels sharing similar properties or characteristics into significant polygon-shaped regions. Superpixels are used in our method as basic units instead of the rigid pixel grid and become a consistent alternative for foreground/background segmentation. Achanta et al. performed an empirical comparison of five state-of-the-art superpixel methods [10]. They classified the existing superpixel methods into three classes: gradient-ascent-based, graph-based and SLIC superpixel. They stated that the SLIC technique is the best superpixel method regarding the segmentation results and performances, the ability to adhere to image boundaries, the memory efficiency, the simplicity of use and the ability to control superpixel compactness. By setting the number of superpixels equal to 0.08%, the SLIC superpixel technique is performed on the grayscale image. The segmented image is generated by grouping pixels according to their gray level similarity and their proximity in the image plane into significant polygon-shaped regions. Afterwards, the background and foreground clustering task is performed, based on the mean gray level value of each superpixel (computed by averaging over all gray level pixels associated with the superpixel region) and using the Kmeans algorithm. By setting the number of clusters equal to two, two well separated clusters are extracted: one for the foreground and one representing the background information. An enhanced background is obtained by assigning the value of a white pixel to the background superpixel centers and the pixels belonging to them.
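Since no implementation is published with the paper, the following Python sketch shows one plausible way to reproduce this foreground/background step with scikit-image (version 0.19 or later for the channel_axis argument) and scikit-learn; the superpixel count, the compactness value and the choice of median filtering are illustrative assumptions, not values taken from the text.

import numpy as np
from skimage import io, color, filters
from skimage.segmentation import slic
from sklearn.cluster import KMeans

def foreground_superpixels(image_path, n_segments=1500):
    # Separate an HDI into foreground/background superpixels (sketch).
    rgb = io.imread(image_path)                         # assumes an RGB input image
    gray = color.rgb2gray(rgb)                          # grayscale conversion
    gray = filters.median(gray)                         # median filtering to reduce noise
    labels = slic(gray, n_segments=n_segments,
                  compactness=0.1, channel_axis=None)   # SLIC on the gray image
    ids = np.unique(labels)
    means = np.array([gray[labels == i].mean() for i in ids])  # mean gray level per superpixel
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(means.reshape(-1, 1))
    bg_cluster = np.argmax(km.cluster_centers_)         # brighter cluster assumed to be the paper background
    background = set(ids[km.labels_ == bg_cluster])
    clean = gray.copy()
    for i in background:                                # whiten background superpixels
        clean[labels == i] = 1.0
    foreground_ids = [i for i in ids if i not in background]
    return clean, labels, foreground_ids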

3.2 Texture Feature Extraction

The texture features explored in this work are based on a multiscale Gabor filter. The choice of Gabor features is based on an empirical comparison between texture features extracted using the Gabor filter and the discrete Fourier transform, presented by Ursani et al. [11]. They proved that Gabor filters perform well compared to the Fourier transform, thanks to their optimal localization properties, which capture information from the analyzed images in both the frequency and spatial domains. Moreover, some well-known texture-based approaches have been compared by Mehri et al. [12]: the auto-correlation function, the Gray Level Co-occurrence Matrix, and Gabor features, used for the segmentation of digitized simplified ancient document images. They concluded that the Gabor features obtained on ancient document images are the best for distinguishing textual regions from graphical ones and for font segmentation. Nevertheless, few studies have addressed ancient document image segmentation using multiscale Gabor filters [13]. Therefore, the texture descriptors investigated in our work are Gabor features, produced at several frequencies and orientations for texture characterization. Four orientations and six frequencies are used in common implementations [14,15]. In this study, 24 Gabor filters are applied by using 4 different orientations {0, π/4, π/2, 3π/4} and 6 distinct frequencies {2√2, 4√2, 8√2, 16√2, 32√2, 64√2}. The energies and the amplitudes of the Gabor filter outputs are investigated. Gabor features are extracted by convolving the analyzed document image with each orientation/frequency pair. The standard deviation and the mean values of the Gabor magnitude and energy responses corresponding to all pixels belonging to the same superpixel are extracted. The convolution of the gray level ancient document image with the 24 Gabor filters is applied for each pixel belonging to the same superpixel. Finally, to extract Gabor features over the whole transformed image produced by the selected Gabor filter, a multiscale analysis technique is used to extract Gabor features from the selected foreground superpixels at four different sliding window sizes ((8 × 8), (16 × 16), (32 × 32) and (64 × 64)). Thus, four feature vectors are formed through the four different sizes of sliding windows.
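The Gabor feature extraction could be approximated as in the hedged sketch below: the 24 filters are applied to the whole grayscale image and the mean and standard deviation of the magnitude responses are aggregated per foreground superpixel. Two simplifications are our assumptions: the paper's frequencies are normalized by the image width (scikit-image expects cycles per pixel), and the per-window multiscale aggregation is omitted for brevity.

import numpy as np
from skimage.filters import gabor

ORIENTATIONS = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]
FREQUENCIES = [2 * np.sqrt(2) * (2 ** k) for k in range(6)]   # 2*sqrt(2), 4*sqrt(2), ..., 64*sqrt(2)

def gabor_superpixel_features(gray, labels, foreground_ids, image_width):
    # Mean and standard deviation of the 24 Gabor magnitude responses per superpixel (sketch).
    responses = []
    for f in FREQUENCIES:
        for theta in ORIENTATIONS:
            # Normalizing by the image width is an assumption about the frequency units.
            real, imag = gabor(gray, frequency=f / image_width, theta=theta)
            responses.append(np.hypot(real, imag))            # magnitude of the complex response
    features = {}
    for sp in foreground_ids:
        mask = labels == sp
        vec = []
        for mag in responses:
            vec.extend([mag[mask].mean(), mag[mask].std()])   # 24 filters -> 48 values per superpixel
        features[sp] = np.asarray(vec)
    return features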

3.3 Superpixels Classification

Once the Gabor feature extraction step is completed, a classification task is performed by partitioning the Gabor based feature sets into separate classes. In order to select the best classifier for discriminating textual content from graphical content, a comparative study based on several classification techniques (Kmeans, Fuzzy C-Means (FCM), K-Nearest Neighbors (KNN), Support Vector Machine (SVM)) is carried out. Both supervised and unsupervised techniques are used to partition the image. As Kmeans and FCM are unsupervised techniques, only the number of classes K is provided as input. By setting the number of classes K equal to two, two well separated classes are extracted: one for the textual regions and one for the graphical ones. SVM and KNN are supervised techniques used with a default kernel; in addition to the number of classes K, models have to be trained to generate the classification. Therefore, the training data is obtained by a manual selection of six regions (three belonging to text regions and three belonging to graphical ones) built for each image. Our goal is to construct training data for each HDI in order to ensure the effectiveness of the interactive method. The foreground superpixel problem is thus posed as a binary classification: one class for textual superpixels and another one containing all superpixels considered as other content. Each superpixel belonging to the training data is annotated as belonging to one of the two classes.
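A minimal sketch of this classifier comparison, assuming scikit-learn, is given below. FCM is omitted here (it could be added with the cmeans routine of the scikit-fuzzy package), and the interactive training selection is represented by the train_ids/train_labels inputs.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def classify_superpixels(features, train_ids, train_labels):
    # features: dict {superpixel_id: feature_vector}
    # train_ids / train_labels: interactively selected superpixels and their classes
    # (0 = text, 1 = graphic), standing in for the six manually selected regions.
    ids = sorted(features)
    X = np.vstack([features[i] for i in ids])

    results = {}
    # Unsupervised: only the number of classes (two) is provided
    results["kmeans"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    # (FCM could be added analogously with scikit-fuzzy's cmeans function.)

    # Supervised: trained on the selected regions, default parameters/kernel
    X_train = np.vstack([features[i] for i in train_ids])
    y_train = np.asarray(train_labels)
    for name, clf in (("knn", KNeighborsClassifier()), ("svm", SVC())):
        clf.fit(X_train, y_train)
        results[name] = clf.predict(X)
    return ids, results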

3.4 Merging Results

In many settings, such as multiscale sliding windows, we are confronted with multiple and conflicting sources of information. In this paper, we propose a merging method based on majority voting in order to resolve the conflicts between window resolutions. The four classification results are fed as input. Then, each superpixel is assigned to the majority voted class. In case of a tie, we take the majority vote of the neighboring superpixels already classified. Otherwise, we assign it to the class of the nearest neighbor. A detailed schematic block representing the merging of the results of each Gabor filter application window for one classifier is illustrated in Fig. 3. For each superpixel, the majority vote is defined as follows:

V(s) = \arg\max_{i \in \{1, 2\}} g(c_i, s)    (1)

where s is the superpixel to be classified, V(s) is the majority vote and g(c_i, s) is calculated as:

g(c_i, s) = \sum_{j=1}^{4} w_j    (2)

where w_j = 1 if R_j(s) = c_i, else w_j = 0. R_j(s) is the classification result of s by the j-th classification (one per analysis window). In this study, we consider only two classes, which are text (c_1) and graphic (c_2). In the first tie case, where g(c_i, s) = 2, the majority vote among the neighboring superpixels is calculated based on the same definition. In the second tie case, where g(c_1, s) = g(c_2, s) among the neighbors, the superpixel obtains its class from the nearest neighbor. For this purpose, the Euclidean distance is used to select the nearest neighbor. This distance is calculated as follows:

dis_{s_1, s_2} = \sqrt{(x_{s_2} - x_{s_1})^2 + (y_{s_2} - y_{s_1})^2}    (3)

where s_1 and s_2 are two neighboring superpixels and (x_{s_1}, y_{s_1}) and (x_{s_2}, y_{s_2}) are respectively the coordinates of their centroids. Otherwise, we consider the result of the 32 × 32 sliding window resolution as the final decision class. The results of each step of our document image segmentation system will be reported and discussed in the next section.

Fig. 3. Proposed merging method.
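Equations (1)-(3) can be turned into the hedged merging routine below; the superpixel adjacency list (neighbours) is assumed to be precomputed, and the tie-breaking order follows the description above.

import numpy as np

def merge_scales(labels_per_scale, centroids, neighbours, default_scale=2):
    # labels_per_scale: array (4, n_superpixels) with class 0 (text) / 1 (graphic) per window size
    # centroids: (n_superpixels, 2) superpixel centroid coordinates
    # neighbours: list of neighbour index lists per superpixel (assumed precomputed)
    # default_scale: index of the 32x32 window used as the fallback decision
    votes = np.asarray(labels_per_scale)
    n = votes.shape[1]
    final = -np.ones(n, dtype=int)
    for s in range(n):
        counts = np.bincount(votes[:, s], minlength=2)        # g(c_i, s), Eq. (2)
        if counts[0] != counts[1]:
            final[s] = int(np.argmax(counts))                 # clear majority, Eq. (1)
            continue
        # First tie-break: majority among neighbours already classified
        nb = [final[q] for q in neighbours[s] if final[q] >= 0]
        if nb and nb.count(0) != nb.count(1):
            final[s] = 0 if nb.count(0) > nb.count(1) else 1
            continue
        # Second tie-break: copy the class of the nearest classified superpixel, Eq. (3)
        done = np.flatnonzero(final >= 0)
        if done.size:
            d = np.linalg.norm(centroids[done] - centroids[s], axis=1)
            final[s] = final[done[np.argmin(d)]]
        else:
            final[s] = votes[default_scale, s]                # fallback: 32x32 window result
    return final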

4 Evaluation and Results

In this section, we perform experiments to illustrate and to discuss the performance of the proposed method. A brief description of the ground truth and the experimental corpus is presented.

4.1 Corpus and Ground Truth

To prove the influence of the merging step on extracting graphical regions from textual ones, we focus in our corpus on collecting HDIs containing various textual and graphical contents as shown in Fig. 4.

Fig. 4. Illustration of some examples of HDIs containing textual and graphical contents ((a), (b), (c) and (d)) or only textual contents ((e), (f), (g) and (h)).

A significant number of historical documents with both complex and simple layouts have been selected. This dataset is composed of grayscale, binary and color HDIs. Moreover, we have constructed the training data for each image by selecting three regions from the text and three others from the graphical content. The training data is built for each image to guarantee a good understanding of the behavior of the extracted Gabor features.

4.2 Experiments and Results

The results obtained with the proposed segmentation method, using the SLIC superpixel technique, multiscale Gabor filters and several classification techniques, are illustrated in this section. By visual inspection of the obtained results, we note that the proposed method ensures satisfactory results, particularly in distinguishing textual regions from graphical ones. We perform a qualitative comparison of the four different classification techniques. The SVM classifier yields the best classification results for separating textual regions from graphical ones. Figure 5 shows an example of HDI segmentation obtained with each classifier. The quantitative evaluation of the proposed segmentation method is based on four measures computed from the confusion matrix: Complete (CM), Quality (CR), Accuracy (ACC) and F-measure (F).


– The Complete (CM): the CM measures the probability of having a pixel that belongs to both the system response and the ground truth.
– The Quality (CR): the CR is the percentage of the region (text or graphic) that is correctly extracted.
– The Accuracy (ACC): the ACC combines the two other measures, CM and CR. It can be defined as the percentage of the text and graphic regions extracted by the algorithm.
– The F-measure (F): the F-measure is a score resulting from the combination of CM and CR. It assesses both the completeness and the homogeneity criteria of a clustering result (a computation sketch follows this list).
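For reproducibility, the four measures can be computed from the binary decisions as in the sketch below; reading CM as recall, CR as precision and F as their harmonic mean is our interpretation of the definitions above.

import numpy as np

def segmentation_scores(pred, truth):
    # pred, truth: 1-D arrays of 0/1 decisions (e.g. per superpixel or per pixel).
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    tn = np.sum(~pred & ~truth)
    cm = tp / (tp + fn) if tp + fn else 0.0           # Complete (read as recall)
    cr = tp / (tp + fp) if tp + fp else 0.0           # Quality (read as precision)
    acc = (tp + tn) / (tp + tn + fp + fn)             # Accuracy
    f = 2 * cm * cr / (cm + cr) if cm + cr else 0.0   # F-measure
    return cm, cr, acc, f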

Fig. 5. Example of resulting images of the proposed method assigned to each classifier: (a) Original HDI (b) The foreground/background separation results (c) The foreground superpixels (d) The ground truth generated by GEDI [16] (e) Kmeans results (f) FCM results (g) KNN results (h) SVM results.

Table 1 presents the measure values computed for each classifier. By merging the results of the multiscale application of the Gabor features, we note that the SVM technique is more relevant than the other techniques used in this work for the characterization and segmentation of textual and graphical regions. The superpixels are classified with 94% (CM), 92% (CR) and 91% (ACC). Further, the SVM technique based on learning Gabor features is more efficient than the method presented in [9], as shown in Table 1. Furthermore, it tends to correctly classify graphical superpixels into the graphical class thanks to the use of an interactive method to create the training data. As shown in Table 2, the obtained F-measure values are consistent and very promising.


Table 1. Evaluation and comparison of the proposed method of HDI segmentation by calculating several measures

       FCM    Kmeans   SVM    KNN    Method proposed in [9]
CM     0.52   0.67     0.94   0.90   0.87
CR     0.63   0.64     0.92   0.91   0.83
ACC    0.47   0.49     0.91   0.88   0.79

Table 2. Evaluation of HDI segmentation for the SVM classifier by computing the F-measure values (F)

     Only one font   One font and one graphic   Total
F    0.88            0.92                       0.90

5 Conclusion

In this paper, we proposed an interactive method based on SLIC superpixels and Gabor features to extract textual and graphical regions in HDIs. In this study, each superpixel is characterized by the mean and the standard deviation of the Gabor filter responses. The features are computed for different sliding window sizes. Then, the extracted features are classified using supervised (KNN, SVM) and unsupervised (Kmeans, FCM) techniques. In order to deal with the variety of windows, the results obtained for the different window sizes are merged using a majority vote. Finally, the evaluation shows that our approach brings satisfactory results and outperforms the one proposed in [9] thanks to the merging step. Our future work will apply the presented algorithm to other types of databases, for example administrative documents. We will also focus on evaluating and merging other texture extraction methods.

References

1. Loussaief, S., Abdelkrim, A.: Machine learning framework for image classification. In: 2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), pp. 58–61. IEEE (2016)
2. El Bazzi, M., Mammass, D., Zaki, T., Ennaji, A.: A graph based method for Arabic document indexing. In: 2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), pp. 308–312. IEEE (2016)
3. Coustaty, M., Raveaux, R., Ogier, J.-M.: Historical document analysis: a review of French projects and open issues. In: 2011 19th European Signal Processing Conference, pp. 1445–1449. IEEE (2011)
4. Jiang, H.: Linear solution to scale invariant global figure ground separation. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 678–685. IEEE (2012)


5. Li, Z., Wu, X.-M., Chang, S.-F.: Segmentation using superpixels: a bipartite graph partitioning approach. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 789–796. IEEE (2012)
6. Cohen, R., Asi, A., Kedem, K., El-Sana, J., Dinstein, I.: Robust text and drawing segmentation algorithm for historical documents. In: Proceedings of the 2nd International Workshop on Historical Document Imaging and Processing, pp. 110–117. ACM (2013)
7. Garz, A., Sablatnig, R., Diem, M.: Layout analysis for historical manuscripts using SIFT features. In: 2011 International Conference on Document Analysis and Recognition (ICDAR), pp. 508–512. IEEE (2011)
8. Mehri, M., Sliti, N., Héroux, P., Gomez-Krämer, P., Amara, N.E.B., Mullot, R.: Use of SLIC superpixels for ancient document image enhancement and segmentation. In: SPIE/IS&T Electronic Imaging, p. 940205. International Society for Optics and Photonics (2015)
9. Mehri, M., Nayef, N., Héroux, P., Gomez-Krämer, P., Mullot, R.: Learning texture features for enhancement and segmentation of historical document images. In: Proceedings of the 3rd International Workshop on Historical Document Imaging and Processing, pp. 47–54. ACM (2015)
10. Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., Süsstrunk, S.: SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans. Pattern Anal. Mach. Intell. 34(11), 2274–2282 (2012)
11. Ursani, A.A., Kpalma, K., Ronsin, J.: Texture features based on Fourier transform and Gabor filters: an empirical comparison. In: International Conference on Machine Vision, ICMV 2007, pp. 67–72. IEEE (2007)
12. Mehri, M., Gomez-Krämer, P., Héroux, P., Boucher, A., Mullot, R.: Texture feature evaluation for segmentation of historical document images. In: Proceedings of the 2nd International Workshop on Historical Document Imaging and Processing, pp. 102–109. ACM (2013)
13. Raju, S.S., Pati, P.B., Ramakrishnan, A.: Text localization and extraction from complex color images. In: International Symposium on Visual Computing, pp. 486–493. Springer (2005)
14. Charrada, M.A., Amara, N.E.B.: Texture approach for nets extraction application to old Arab newspapers images structuring. In: 2012 3rd International Conference on Image Processing Theory, Tools and Applications (IPTA), pp. 212–216. IEEE (2012)
15. Zhong, G., Cheriet, M.: Image patches analysis for text block identification. In: 2012 11th International Conference on Information Science, Signal Processing and their Applications (ISSPA), pp. 1241–1246. IEEE (2012)
16. Doermann, D., Zotkina, E., Li, H.: GEDI – a groundtruthing environment for document images. In: Ninth IAPR International Workshop on Document Analysis Systems (DAS 2010) (2010)

Palm Vein Biometric Authentication Using Convolutional Neural Networks

Samer Chantaf 1, Alaa Hilal 2, and Rola Elsaleh 1

1 Faculty of Technology, Lebanese University, Saida, Lebanon
[email protected], [email protected]
2 Faculty of Technology, Lebanese University, Aabey, Lebanon
[email protected]

Abstract. In this research, we present a new approach using a Convolutional Neural Network (CNN) for palm-vein biometric authentication. In contrast to fingerprints and faces, palm vein patterns are internal features, which makes them very hard to replicate. The objective of this research is to examine the possibility of contactless authentication of individuals by employing a series of palm vein photographs taken by a camera in the near infrared. Biometric systems based on palm veins are considered very promising for high security environments. Meanwhile, deep learning techniques have greatly advanced image classification and retrieval tasks. We therefore study palm vein recognition through deep learning based methods, applying two Convolutional Neural Network architectures (Inception V3 and SmallerVggNet).

Keywords: Palm-vein · Biometric authentication · Convolutional neural network

1 Introduction

Personal authentication, which is the association of a verification with an individual, is a highly demanded technique for security access systems. The correct authentication of individuals is of high importance, especially in managing the operations of many systems, for example at airports and in companies [1–3]. Conventional biometric personal authentication technology is based on behavioral patterns or physiological characteristics (fingerprints, faces, or irises). However, these patterns present several drawbacks in identifying individuals [4]. Recently, palm-vein recognition has been progressing and has been demonstrated to be an effective biometric authentication trait. In our study, a palm-vein biometric system acquires images of the vein formation within the palm; similar to other biometric patterns, these palm vascular patterns are distinctive for each individual [5]. Unlike other biometric authentication technologies, the vessels are beneath the skin, so they are difficult to falsify. Moreover, palm-vein recognition facilitates liveness detection because Near Infrared (NIR) imaging captures the thermal variation between the blood circulating in the vessel and the surrounding skin. In addition, palm-vein authentication is performed by a CCD camera through a near-infrared filter without touching the sensor. The hemoglobin in the palm veins absorbs the infrared LED light at 840 nm after its passage through the vein; an infrared-sensitive CCD camera can then capture the palm vein patterns through their minimal reflections [6, 7]. Palm-vein authentication has many advantages compared to other biometric authentication techniques: it is highly accurate, not vulnerable to spoofing attacks, and internal, so the patterns cannot be stolen. Moreover, palm vein patterns do not change with the passage of time, skin roughness or wounds, and they are unique even for identical twins who share the same DNA pattern [14]. The noninvasive and contactless application of our technique ensures convenience and cleanliness for the individual [8–10].

Deep learning models, in particular CNNs, have demonstrated their capability in various image recognition applications, such as face and speech recognition in biometrics. Deep learning allows the automatic discovery of the internal structure of high-dimensional training data. This research proposes a methodology founded on a CNN to carry out palm-vein biometric authentication. A CNN is a multilayer neural network comprising a series of convolutional layers interleaved with subsampling layers that end with fully connected layers. In one trainable module, a CNN merges image segmentation, feature extraction and classification. It is designed to accept a raw 2D image requiring little preprocessing and preserves the 2D structure during processing. Classification is carried out during training, and the final weights act as a feature extractor to classify the input sample. CNNs have been used in different case studies, for example face identification, handwriting and document recognition [11–13].

2 Related Work

Before discussing the suggested method, we briefly review related methods for palm vein classification and provide a cursory overview of deep learning architectures.

2.1 Palm Vein Review

One study showed that the palm print can be captured using a common web camera; the hands were separated from the frames, the ROI of the palm print was extracted, and features were extracted using Gabor kernels. ICA features were then extracted and classified using a neural network and a distance-based classifier [15]. In other studies, researchers performed vein pattern verification from palm vein images that were first enhanced; features were extracted with neural networks, feed-forward and SVM algorithms, with high effectiveness and accuracy [16]. Other work addressed acquisition for dorsal palm vein authentication using a correlation method [17]. Some studies reported palm vein identification using directional encoding and a Back Propagation Neural Network, with region of interest extraction and gamma correction applied [18]. In another paper, palm vein images were enhanced using histogram processing, and feature extraction was performed using a Laplacian filter on the convolved images [19].

Earlier, human identification from palm vein images was performed by extracting edges and curves from the image, with feature extraction based on the Canny edge detection methodology [20]. In other studies, researchers exploited the correlation between the center pixel and its neighbors using the local tetra pattern; image retrieval based on best matching was performed with LrTP [21]. Some studies compared multiple vein-based techniques and concluded that finger, iris, palatal and face veins provide low FAR and FRR [22].

2.2 Deep Learning Architectures

In general, a convolutional neural network is a standard deep neural network designed to process data that comes in the form of arrays, such as grayscale and color images. The purpose of this research is not to develop a novel CNN architecture, but to evaluate how well solutions based on CNNs can be applied to palm-vein recognition. A standard convolutional network architecture is built as a chain of layers, and each layer of a ConvNet transforms one volume of activations into another through a differentiable function. Researchers use three basic kinds of layers to construct ConvNet architectures: convolutional layers, pooling layers, and fully-connected layers (see Fig. 1). The input image goes through a stack of processing layers that are connected using spatially ordered patterns. The result of each local weighted sum is then passed through a non-linear activation function, such as the Rectified Linear Unit (ReLU), to speed up the training phase. A pooling layer comprises a grid of pooling units; each of these computes the maximum or the average of a local patch of units in one or a few feature maps. Local pooling not only reduces the size of the representation, but also creates invariance to small shifts and distortions. After several stages of convolution and pooling, fully-connected (FC) layers and a Softmax follow; they are connected to all activations in the preceding layer. The Softmax function classifies an object with probabilistic values between 0 and 1.

Fig. 1. General CNN architecture.
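To make the convolution/pooling/FC/Softmax chain of Fig. 1 concrete, here is a minimal Keras sketch; it only illustrates the three layer types and is not the architecture evaluated later in this paper.

from tensorflow import keras
from tensorflow.keras import layers

def tiny_convnet(input_shape=(96, 96, 3), n_classes=20):
    # Minimal ConvNet combining the three basic layer types of Fig. 1.
    return keras.Sequential([
        layers.Conv2D(32, 3, activation="relu", input_shape=input_shape),  # convolutional layer
        layers.MaxPooling2D(2),                                            # pooling layer
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),                              # fully-connected layer
        layers.Dense(n_classes, activation="softmax"),                     # class probabilities
    ])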

Several standard ConvNets have demonstrated their efficiency for image classification. Previous studies proposed VGGNet [23] and GoogLeNet [24], showing the great differences between their feature maps. In this paper, we evaluate two CNN designs pre-trained on the ImageNet dataset for texture-based vein feature extraction. The first one is SmallerVggNet and the second is Google's Inception V3 [25]. Note that the ImageNet dataset comprises approximately 1.5 million labeled images divided into 1000 classes, and pre-trained models are available in the Caffe library [26].

3 Method

The proposed methods are tested and evaluated using a captured dataset. The images were collected at the Lebanese University - Faculty of Technology using a Near Infrared camera. The overall diagram of the method shows the different stages of the identification of individuals using palm vein images (see Fig. 2).

Fig. 2. General block diagram of our study.

These stages consist first of acquiring the palm vein image, then applying image processing algorithms to extract the vein map used for identification, and finally letting the convolutional neural network perform an automatic identification. The aim of this research paper is to study deep learning approaches for palm-vein recognition. We assess two CNN designs, called Inception V3 and SmallerVggNet, for representing and recognizing palm vein patterns.

4 Experimental Work and Results

Palm vein images are taken under a near infrared lighting system. The experiment aims to remove noise and enhance the image. These procedures are performed on the multispectral palm vein image and are useful to extract the vein pattern for further processing. In what follows, the image acquisition system, the palm vein image database, the preprocessing stage, the CNN architectures and the results are presented.

4.1 Image Acquisition System

Our acquisition system consists of a Raspberry Pi 3 (used only for the preprocessing phase and not for running the CNN model) with a NoIR camera and a ring of NIR LEDs for lighting the target body section with infrared light. In this proposed system, the Raspberry Pi camera captures the images at a resolution of 640 × 480. The IR LED ring is composed of 6 LEDs with a wavelength of 840 nm (see Fig. 3). First, the image is captured using the camera and then it is transferred to the Raspberry Pi for further pre-processing.
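A minimal acquisition sketch using the picamera library is given below; the preview delay and the output file name are illustrative assumptions, and the NIR illumination is assumed to be provided by the LED ring hardware.

from time import sleep
from picamera import PiCamera

def capture_palm(path="palm.jpg"):
    # Capture a 640x480 NIR palm image with the Pi NoIR camera (sketch).
    camera = PiCamera()
    camera.resolution = (640, 480)   # resolution used in this work
    camera.start_preview()
    sleep(2)                         # let exposure/gain settle under the IR ring
    camera.capture(path)
    camera.stop_preview()
    camera.close()
    return path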


Fig. 3. Image acquisition system: (a) Palm Veins Imaging System; (b) Infrared LEDs + Filter; (c) Raspberry Pi 3; (d) Raspberry Pi 3 NoIR Camera V2.

4.2 Palm Vein Image Database

The palm vein image data used in this paper was collected at the Lebanese University - Faculty of Technology - Saida. In total, we acquired 4000 Near-Infrared (NIR) palm vein images: 100 samples per hand were taken from both the left and right hands of 20 people (8 females and 12 males). Acquisition was done at 25 cm from the camera in the horizontal position, and the size of a captured vein image is 640 × 480 pixels. Figure 4 below presents the palm vein patterns of two different individuals.

Fig. 4. Palm veins for two different individuals

Table 1 shows the number of images for each gender (female vs. male) and for the combined dataset. The total number of images from females in the combined dataset is 800, while the total number of images from males is 3200.

Table 1. Number of images in each gender.

Dataset            Female   Male   Total (left and right hand)
Individuals        8        12     20
Number of images   800      3200   4000

4.3 Preprocessing Stage

Preprocessing includes algorithms for region of interest (ROI) extraction, image enhancement, image normalization and image segmentation. The segmentation process primarily converts the initial image into a meaningful representation. This section describes the operations and transformations that were applied to the digital images in order to improve and process the captured image. Initially, a filtering process is applied to the captured vein pattern image to remove noise. Many filters exist, but in this paper two filters are used: an averaging filter and a median filter. After that, the contrast is enhanced by histogram equalization. The framework of the proposed method comprises the following processes (see Fig. 5):

Fig. 5. Proposed system workflow for palm-vein pattern

The initial image is of size 640 × 480 pixels. We crop the image after selecting the region of interest (ROI) so that it covers almost the entire palm. The crop reduces the time spent on local processing, which benefits the classification stage. Experimentally, the image is then resized to 299 × 299 pixels (for training Inception V3) or 96 × 96 pixels (for training SmallerVggNet), reducing the data content in order to facilitate the network training (see Fig. 6). This image size has an important influence on the size of the convolution kernel to be chosen.

Fig. 6. Preprocessing stage that includes segmentation.
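The preprocessing chain could be sketched in OpenCV as follows; the ROI coordinates and kernel sizes are placeholders, since the palm ROI selection procedure and exact filter parameters are not detailed in the text.

import cv2

def preprocess_palm(path, target=(96, 96), roi=None):
    # Filter, equalize, crop and resize a palm-vein image (sketch).
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)      # 640x480 NIR capture
    img = cv2.blur(img, (3, 3))                       # averaging filter
    img = cv2.medianBlur(img, 3)                      # median filter
    img = cv2.equalizeHist(img)                       # contrast enhancement
    if roi is None:
        roi = (100, 50, 440, 380)                     # placeholder palm ROI (x, y, w, h)
    x, y, w, h = roi
    img = img[y:y + h, x:x + w]
    img = cv2.resize(img, target)                     # 96x96 for SmallerVggNet, 299x299 for Inception V3
    return img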

4.4 CNN Architecture

The aim of this research is not to design a new CNN architecture, but to evaluate how well solutions based on CNNs can be applied to palm-vein recognition. As mentioned earlier, we adopt two pre-trained CNN designs for texture based vein feature extraction. The first one is SmallerVggNet (an implementation of VGGNet) and the second is Google's Inception V3 (an implementation of GoogLeNet).

4.4.1 Inception V3 (GoogLeNet)

The first topology chosen for implementing the proposed classifiers is GoogLeNet [26], because we wanted to present our proposal within a reproducible structure. GoogLeNet is the ILSVRC 2014 winner from Szegedy et al. at Google. Its main contribution was the development of the Inception module, which dramatically reduced the number of parameters in the network, enabling a deeper and wider topology. The input palm vein image that enters an inception module is processed by a variety of different filters. The GoogLeNet topology is created by stacking Inception modules on top of one another, leading to a 22-layer deep design (see Fig. 7).

Fig. 7. Inception V3 architecture.
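A hedged Keras sketch of adapting an ImageNet-pretrained Inception V3 to the 20 palm classes is shown below; replacing the top by a single softmax layer is our assumption about how the pre-trained model is reused here.

from tensorflow import keras
from tensorflow.keras import layers

def palm_inception(n_classes=20):
    # ImageNet-pretrained Inception V3 with a new softmax head for the 20 palm classes.
    base = keras.applications.InceptionV3(weights="imagenet",
                                          include_top=False,
                                          input_shape=(299, 299, 3),
                                          pooling="avg")      # 2048-d feature vector
    outputs = layers.Dense(n_classes, activation="softmax")(base.output)
    return keras.Model(base.input, outputs)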

The architecture of a single inception module performs the variety of convolutions that we want; specifically, we use 1 × 1, 3 × 3, and 5 × 5 convolutions along with a 3 × 3 max pooling.

4.4.2 VGGNet (SmallerVggNet)

The second topology chosen for implementing the proposed classifiers is SmallerVggNet, introduced by [23]. The SmallerVggNet is composed of five convolutional layers. The palm vein input image size is set to 96 × 96 pixels and a 3 × 3 convolution kernel size is used (see Fig. 8).

Fig. 8. SmallerVggNet architecture for palm-vein biometric recognition

The convolution layers are as follows (a Keras sketch of this stack is given after Table 2):

1. Conv1: 32 filters of size 3 × 3, resulting in an output volume of size 96 × 96 × 32. This is followed by a ReLU, batch normalization and 3 × 3 MaxPooling, which reduces the size to 32 × 32 × 32. The dropout is 25% to reduce overfitting.
2. Conv2: 64 filters with a 3 × 3 kernel, resulting in an output volume of size 32 × 32 × 64. This is followed by a ReLU and batch normalization. The dropout is also 25%.
3. Conv3: 64 filters of size 3 × 3, resulting in an output volume of size 32 × 32 × 64. This is followed by a ReLU, batch normalization and 2 × 2 MaxPooling, which reduces the size to 16 × 16 × 64. The dropout is also 25%.
4. Conv4: 128 filters of size 3 × 3, resulting in an output volume of size 16 × 16 × 128. This is followed by a ReLU and batch normalization. The dropout is also 25%.
5. Conv5: 128 filters of size 3 × 3, resulting in an output volume of size 16 × 16 × 128. This is followed by a ReLU, batch normalization and 2 × 2 MaxPooling, which reduces the size to 8 × 8 × 128. The dropout is also 25%.

Finally, we have a set of FC => ReLU layers and a softmax classifier. The fully connected layer is specified by Dense(4096) with a rectified linear unit activation and batch normalization. The dropout is 50%. Table 2 lists the input image size and output feature length for each CNN architecture.

Table 2. Image size and feature length for each CNN architecture

CNN            Image size      Feature length
Inception V3   299 × 299 × 3   2048
VggNet         96 × 96 × 3     4096
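The following Keras sketch is one plausible reading of the SmallerVggNet layer list above; same-padding and other minor details not stated in the text are assumptions.

from tensorflow import keras
from tensorflow.keras import layers

def smaller_vggnet(n_classes=20, input_shape=(96, 96, 3)):
    # SmallerVggNet as described in the layer list above (sketch).
    return keras.Sequential([
        layers.Conv2D(32, 3, padding="same", activation="relu", input_shape=input_shape),
        layers.BatchNormalization(),
        layers.MaxPooling2D(3),            # 96x96x32 -> 32x32x32
        layers.Dropout(0.25),

        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.Dropout(0.25),

        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(2),            # 32x32x64 -> 16x16x64
        layers.Dropout(0.25),

        layers.Conv2D(128, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.Dropout(0.25),

        layers.Conv2D(128, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(2),            # 16x16x128 -> 8x8x128
        layers.Dropout(0.25),

        layers.Flatten(),
        layers.Dense(4096, activation="relu"),
        layers.BatchNormalization(),
        layers.Dropout(0.5),
        layers.Dense(n_classes, activation="softmax"),
    ])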

4.5 Result and Discussion

As mentioned above, the palm vein image database was collected at the Lebanese University - Faculty of Technology - Saida. It contains 20 subjects with 200 samples per subject (left and right hands). We divide the dataset into 20 classes, and each class contains 160 images for training and 40 images for testing and validation. The training was done with 100 epochs of 50 iterations each, a batch size of 32 images and a learning rate (used to adjust the weights of our network with respect to the loss gradient) of 0.001. The system ran on an Intel Core i7 7700HQ CPU at 2.8 GHz with a GTX 1060 GPU. The algorithms are implemented in Python, and OpenCV is used for preprocessing. Figure 9 below presents the palm vein patterns of different individuals in the database.

Fig. 9. Database samples

4.5.1 Training and Validation Dataset Results

The model is initially fit on a training dataset, which is a set of examples used to fit the parameters of the model. The model is trained on the training dataset using a supervised learning method. In practice, the training dataset often consists of pairs of an input vector and the corresponding answer vector or scalar, commonly denoted as the target. Successively, the fitted model is used to predict the responses for the observations in a second dataset called the validation dataset. The validation dataset provides an unbiased evaluation of a model fitted on the training dataset while tuning the model's hyperparameters, such as the number of hidden units in a neural network. The test dataset is used to provide an unbiased evaluation of the final model fitted on the training dataset. Table 3 lists how the number of images is split between the training set and the test and validation set.

Table 3. Split of dataset images.

Number of images in database   Training   Test and validation   Percentage
4000                           2000       2000                  50/50
4000                           3000       1000                  75/25
4000                           3200       800                   80/20
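A hedged sketch of the dataset split and training configuration follows; only the split ratios, the 100 epochs, the batch size of 32 and the learning rate of 0.001 come from the text, while the optimizer (Adam) and the loss function are our assumptions.

import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow import keras

def train_model(model, images, labels, test_size=0.20):
    # images: array (N, 96, 96, 3); labels: integer class ids (20 classes).
    x_tr, x_te, y_tr, y_te = train_test_split(
        images, labels, test_size=test_size, stratify=labels, random_state=0)
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),  # optimizer is an assumption
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    history = model.fit(x_tr, y_tr, validation_data=(x_te, y_te),
                        epochs=100, batch_size=32)                       # stated hyperparameters
    return history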

Table 4 presents our palm vein classification results for the two CNN models. The same dataset is used to train both models, but we obtain different accuracies. This difference can result from the architecture of the network and the parameters used in training. The SmallerVggNet has a higher accuracy than Inception V3 because it is a smaller network and thus more suitable for our dataset (see Table 4). Bigger networks can sometimes cause overfitting, which gives unrealistic results; to prevent overfitting, reducing the size of the network can be a solution.


Table 4. Accuracies obtained from the two pre-trained CNN models, Inception V3 and SmallerVggNet, for the 3 cases of split.

Split images       SmallerVggNet model    Inception V3 model
Case 1: 50%–50%    Accuracy = 82.5%       Accuracy = 80%
Case 2: 75%–25%    Accuracy = 85%         Accuracy = 82%
Case 3: 80%–20%    Accuracy = 93.2%       Accuracy = 91.4%

5 Conclusion

In this research, we discussed a deep learning based methodology for palm-vein recognition and presented a comparative study of two deep neural models (VGGNet, namely SmallerVggNet, and Google's Inception V3) pre-trained on ImageNet. The proposed approach is relevant insofar as palm veins are unique for each person and very difficult to falsify. Throughout this paper, we devised a biometric authentication method founded on palm vein images. This method provides complete and fully automated palm matching using palm-vein images. Palm veins were captured using an infrared camera and near-IR light (840 nm). A database was gathered containing 20 individuals with 200 images each at a resolution of 640 × 480, which were used for training the two CNN models. Our palm vein matching method works effectively in realistic scenarios, leads to accurate performance, and promotes a high user acceptance level, as demonstrated in the experimental results. These high accuracies were achieved using different types of Convolutional Neural Networks. Different CNN models exist, but we used two of them: the first is SmallerVggNet and the second is Google's Inception V3. In our study, we split the images into 3 cases (50–50, 75–25 and 80–20). The SmallerVggNet provides an accuracy of 93.2% against 91.4% for Google's Inception V3, and both models reach a loss lower than 0.5. Both models are suitable for realistic use in biometrics with a low error rate.

References

1. Jain, A.K., Ross, A., Prabhakar, S.: An introduction to biometric recognition. IEEE Trans. Circuits Syst. Video Technol. 14, 4–20 (2014)
2. Eastwood, S.C., Shmerko, V.P., Yanushkevich, S.N., Drahansky, M., Gorodnichy, D.O.: Biometric-enabled authentication machines: a survey of open-set real-world applications. IEEE T. Hum. Mach. Syst. 46, 231–242 (2016)
3. Sequeira, A.F., Cardoso, J.S.: Fingerprint liveness detection in the presence of capable intruders. Sensors 15, 14615–14638 (2015)
4. Wu, J.D., Ye, S.H.: Driver identification using finger-vein patterns with Radon transform and neural network. Expert Syst. Appl. 36, 5793–5799 (2009)
5. Kumar, A., Hanmandlu, M., Gupta, H.: Online biometric authentication using hand vein patterns. In: IEEE Symposium on Computational Intelligence for Security and Defense Applications, pp. 1–7. IEEE (2009)


6. Cross, J.M., Smith, C.L.: Thermographic imaging of the subcutaneous vascular network of the back of the hand for biometric identification. In: Proceedings of 29th International Carnahan Conference on Security Technology, Institute of Electrical and Electronics Engineers, pp. 20–35 (1995)
7. Wang, L., Leedham, G.: Near- and far-infrared imaging for vein pattern biometrics. In: IEEE International Conference on Video and Signal Based Surveillance (AVSS'06) (2006)
8. Yanagawa, T., Aoki, S., Ohyama, T.: Human finger vein images are diverse and its patterns are useful for personal identification. In: 21st Century COE Program, Development of Dynamic Mathematics with High Functionality, pp. 1–8 (2007)
9. Mulyono, D., Jinn, H.S.: A study of finger vein biometric for personal identification. In: International Symposium on Biometrics and Security Technologies (2008)
10. Miura, N., Nagasaka, A., Miyatake, T.: Feature extraction of finger-vein patterns based on repeated line tracking and its application to personal identification. Mach. Vis. Appl. 15, 194–203 (2004)
11. Cheung, B.: Convolutional neural networks applied to human face classification. In: 11th International Conference on Machine Learning and Applications, pp. 580–583 (2012)
12. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998)
13. Chen, Y.N., Han, C.C., Wang, C.T., Jeng, B.S., Fan, K.C.: The application of a convolution neural network on face and license plate detection. In: 18th International Conference on Pattern Recognition (2006)
14. Kavitha, S., Sripriya, P.: A review on palm vein biometrics. Int. J. Eng. Technol. 7, 407–409 (2018)
15. Vijayalakshmi, Pushpalatha, S.D.: Palm vein recognition using independent component analysis and Gabor texture patterns. In: IJARCET (2015)
16. Sasikala, R., Sandhya, S., Ravichandran, K., Subramaniam, B.: A survey on human palm vein identification using Laplacian filter. In: IJIRCCE (2016)
17. Liu, W., Lu, M., Zhang, L.: Palm vein using directional features derived from local binary patterns. Int. J. Signal Process. 9, 87–98 (2016)
18. Villarina, Linsangan, N.B.: Palm vein recognition using directional encoding and Back Propagation Neural Network. In: Proceedings of the World Congress on Engineering and Computer Science (2016)
19. Zhang, L., Li, L., Yang, A., Shen, Y., Yang, M.: Towards contactless palmprint recognition: a novel device, a new benchmark, and a collaborative representation based identification approach. Pattern Recognit. 69, 199–212 (2017)
20. Gopal, Srivastava, S.: Accurate human recognition by score level and feature level fusion using palm phalanges print. Arab. J. Sci. Eng. 43(2), 543–554 (2018)
21. Subramaniam, B., Radhakrishnan, S.: Multiple features and classifiers for vein based biometric recognition. Biomedical Research (2018)
22. Chiu, C.C., Liu, T.K., Liu, W.T., Chen, W.P., Chou, J.H.: A micro control capture images technology for the finger vein recognition based on adaptive image segmentation. Microsyst. Technol. 24, 1–44 (2018)
23. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
24. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
25. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)

Palm Vein Biometric Authentication

363

26. Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the ACM International Conference on Multimedia, pp. 675–678. ACM (2014)
27. Loussaief, S., Abdelkrim, A.: Machine learning framework for image classification. In: Proceedings of the SETIT International Conference on Sciences of Electronics, Technologies of Information and Telecommunications, pp. 58–61. SETIT (2016)
28. Zouari, J., Hamdi, M.: Enhanced fingerprint fuzzy vault based on distortion invariant minutiae structures. In: Proceedings of the SETIT International Conference on Sciences of Electronics, Technologies of Information and Telecommunications, pp. 491–495. SETIT (2016)
29. Ameur, S., Ben Khalifa, A.: A comprehensive leap motion database for hand gesture recognition. In: Proceedings of the SETIT International Conference on Sciences of Electronics, Technologies of Information and Telecommunications, pp. 514–519. SETIT (2016)
30. Hadj Mabrouk, H.: Machine learning from experience feedback on accidents in transport. In: Proceedings of the SETIT International Conference on Sciences of Electronics, Technologies of Information and Telecommunications, pp. 246–251. SETIT (2016)

Indoor Image Recognition and Classification via Deep Convolutional Neural Network

Mouna Afif 1, Riadh Ayachi 1, Yahia Said 1,2, Edwige Pissaloux 3, and Mohamed Atri 1

1 Laboratory of Electronics and Microelectronics (ElE), Faculty of Sciences of Monastir, University of Monastir, 5000 Monastir, Tunisia
[email protected]
2 Electrical Engineering Department, College of Engineering, Northern Border University, Arar, Saudi Arabia
3 LITIS Laboratory & CNRS FR 3638, University of Rouen, Normandy, Rouen, France

Abstract. Indoor navigation (or way finding) still presents a great challenge for autonomous robotic systems and for visually impaired people (VIP). Indeed, the VIP is often enabling to see visual cues such as informational signs, landmarks or geometrical shapes. A Deep Convolution Neural Network (DCNN) has been proven to be highly effective and has achieved an outstanding success comparing to other techniques in object recognition. This paper proposes a robust approach for objects’ classification using a DCNN model. Experimental results in real indoor images with natural illumination (the MCIn-door 20000 dataset) show that the proposed DCNN model achieves the accuracy of 93.7% in objects classification. Keywords: Indoor navigation  Object recognition Deep convolution neural network (DCNN)

 DenseNet 

1 Introduction In classical methods, the recognition of objects and forms is usually done by matching of points of interest. However, these methods are of high computational costs and require a specific hardware in order to respect temporal performances of applications (such as autonomous robot navigation, [6]). In 2012 Deep learning has revolutionized the field while leading to better results. Deep learning has become the most used and the dominant machine learning approach for object detection and recognition. Deep learning exploitation has gained more attention thanks to its better results comparing to other machine learning algorithms. Thanks to deep convolution neural networks, developers and researchers can deal with big data and huge image number during the training and the test steps. Recognition and classification for indoor object is one of the actively pursued research area in computer vision and pattern recognition. Real time indoor object recognition can be used in robotic assistance navigation applications. For the indoor localization, basically there are two main approaches to reach localization either metric © Springer Nature Switzerland AG 2020 M. S. Bouhlel and S. Rovetta (Eds.): SETIT 2018, SIST 146, pp. 364–371, 2020. https://doi.org/10.1007/978-3-030-21005-2_35


SLAM (Simultaneous Localization and Mapping) or the appearance-based approach. SLAM presents a flexible way to implement localization; it is based on various types of sensors but is computationally expensive due to 3D reconstruction. Appearance-based localization has performed well when using a limited set of place labels; however, this type of approach is computationally heavy and not well suited to real-time applications. Recent works in computer vision have shown that deep CNNs are the best-performing approach for indoor object detection and scene classification, and there is a new research trend of using deep CNN models to recognize and detect objects. Improvements in the efficiency of deep learning and convolutional neural network structures have recently made the training of truly deep CNN architectures possible. As deep learning models become increasingly deep, they can be used to solve many problems which were unresolved in the past. The great strength of deep convolutional neural networks is their capability to work with large data volumes while giving powerful results. In this paper, we propose a deep convolutional neural network (DCNN) model to ensure indoor object detection without much computational overhead. The remainder of the paper is organized as follows. Related works on indoor image recognition and classification are presented in Sect. 2. Section 3 describes the proposed architecture for indoor object recognition and classification. In Sect. 4, experiments and results are detailed. Finally, Sect. 5 concludes the paper.

2 Related Work
Many classical works based on machine learning techniques have been elaborated for indoor object recognition [2, 3, 27, 28]. These techniques focus on designing statistical models to understand indoor scenes and geometry [3, 4]. The authors of [5] presented a study for indoor robot navigation based on Kinect cameras, which provide color and depth information about the indoor space. Other studies have contributed an RGB descriptor for real-time object recognition [6]. Chae et al. [7] presented another technique to recognize objects for simultaneous localization and mapping (SLAM) using depth sensors. In [8], Blum et al. proposed a descriptor based on K-means for object recognition. Recently, deep learning has gained high attention in computer vision applications such as object recognition and classification. Deep learning can serve many applications, such as object recognition [9, 10] and detection [11, 12]. Recent works in computer vision have shown that deep CNNs are an effective way to perform indoor object detection and classification tasks. Asif et al. [13] presented a convolutional neural network (CNN) which serves for feature extraction and object recognition. In [14], Ding et al. designed an indoor object recognition pipeline evaluated on both a public indoor dataset and a private video, with effective results for indoor recognition. Kim et al. [25] trained a deep learning model in order to assist robot indoor navigation. In [26], Chen et al. trained a DCNN model for indoor robot navigation; this model focuses in particular on detecting doors.


Deep learning models generally demand a large amount of memory and high energy consumption. For this reason, in this paper we choose to work with the DenseNet [20] model, because it presents a compact architecture which can be implemented in embedded systems.

3 Proposed Architecture for Indoor Object Recognition
Deep Convolutional Neural Networks (DCNN) [15] are the most widely used neural networks in computer vision applications and in image and video processing tasks. Convolutional Neural Networks (CNNs) are multi-level feed-forward networks. The inputs of each layer of a CNN are called feature maps. For the first layer (input layer), the input feature maps are the images themselves. The outputs of each layer are the characteristics extracted from all locations of the input feature maps of that layer. Basically, a CNN is simply a stack of multiple convolution, non-linearity, pooling and fully-connected (FC) layers. Figure 1 presents the architecture of a convolutional neural network.

Fig. 1. CNN architecture

In order to obtain deeper, more accurate and more efficient convolutional neural networks, we have to choose a model that contains shorter connections between the layers near the input and those close to the output. For this reason we use the Densely Connected Convolutional Network (DenseNet) [20], which has a connection architecture that connects each layer to all subsequent layers in a feed-forward manner. A traditional convolutional neural network with L layers has L connections; DenseNet has L(L+1)/2 direct connections. The DenseNet model is a specific deep CNN architecture in which the convolution, pooling and non-linearity layers are organized into dense blocks and transition layers. Figure 2 presents a DenseNet model containing 3 dense blocks and 2 transition layers. A dense block is composed of several types of layers and functions, such as convolution and non-linear layers. As optimization techniques, this DCNN model applies batch normalization and dropout. In the DenseNet architecture, the outputs of the previous layers are concatenated instead of being summed.
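To make the dense connectivity pattern concrete, the following minimal sketch (not the authors' implementation; the layer count and growth rate are placeholder values) builds one dense block and one transition layer in Keras, with each layer receiving the concatenation of all preceding feature maps:

# Minimal sketch of a DenseNet-style dense block (illustrative, not the authors' code).
from tensorflow.keras import layers

def dense_block(x, num_layers=4, growth_rate=12):
    """Each new layer sees the concatenation of all previous outputs."""
    for _ in range(num_layers):
        y = layers.BatchNormalization()(x)
        y = layers.ReLU()(y)
        y = layers.Conv2D(growth_rate, 3, padding="same")(y)
        x = layers.Concatenate()([x, y])   # concatenation instead of summation
    return x

def transition_layer(x, compression=0.5):
    """Transition layer: 1x1 convolution followed by average pooling."""
    channels = int(x.shape[-1] * compression)
    x = layers.BatchNormalization()(x)
    x = layers.Conv2D(channels, 1, padding="same")(x)
    return layers.AveragePooling2D(2)(x)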


DenseNet presents many advantages: it alleviates the vanishing-gradient problem and considerably reduces the number of parameters. The DenseNet model was evaluated on four highly competitive object classification and recognition benchmarks (ImageNet, CIFAR-100, SVHN and CIFAR-10). It showed significant improvements over the state of the art, while requiring fewer computational resources to achieve better performance.

Fig. 2. DenseNet architecture

The proposed custom-made model, shown in Fig. 3, presents 4 dense blocks and 3 transition layers with a change of optimization technique. The pink blocks represent the dense blocks (four in our case). These blocks are composed of batch normalization functions, the Adam algorithm, convolution and non-linear (ReLU) layers, exponential decay, and so on. The orange blocks represent the transition layers, which are composed of average pooling layers. Our model also contains a fully connected layer which gives the object class (a softmax layer in our case). Transfer learning is an efficient technique used to train new deep learning models for new tasks. In this technique, a model pre-trained for a specific application is used as the starting point for the new task. Fine-tuning a deep learning model with transfer learning is much more efficient and faster than training a model from scratch. By using transfer learning, we can speed up training and improve the accuracy and performance of the deep learning model. Generally, transfer learning offers an optimization that ensures faster progress and better performance when modeling a second task. This technique is widely used to solve problems with little data: what has been learned in a task A is reused and generalized to a second task B by transferring the weights learned for task A. In the proposed architecture, the weights of the different layers are initialized from the model pre-trained on ImageNet [19]. The pre-trained ImageNet weights are then updated in the fully connected layer. After that, we fine-tune the last layer on the MCIndoor 20000 dataset [18]. Finally, we unfreeze the first convolution layer of the network, and the whole network is fine-tuned on the MCIndoor 20000 training set.
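As a rough illustration of this transfer-learning setup, and only as a sketch under assumed choices (the DenseNet-121 variant, the 224 × 224 input size and the global-average-pooling head are not specified in the paper), a pre-trained Keras backbone can be loaded without its top layer and given a new three-class softmax head for the door/stairs/sign categories:

# Hedged sketch: pre-trained DenseNet backbone + new 3-class softmax head.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import DenseNet121  # stand-in; the exact variant is an assumption

base = DenseNet121(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                       # freeze the convolutional layers first

inputs = layers.Input(shape=(224, 224, 3))
x = base(inputs, training=False)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(3, activation="softmax")(x)   # door / stairs / sign
model = models.Model(inputs, outputs)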


Fig. 3. Architecture of our proposed DenseNet model

4 Experiments and Results
In our experiments, the MCIndoor 20000 dataset [18] is used for the training and test steps of the model. This dataset contains 20,000 images in which the objects were isolated from their surrounding environment, which makes it suitable for indoor object classification and recognition tasks. The dataset includes three different object categories: doors, stairs and signs, which are landmark objects for indoor navigation. The images have undergone modifications such as Gaussian blur filtering, rotation and Poisson noise, among others. It can be combined with other image datasets in order to obtain more robust recognition and more accurate classification.


To implement the proposed model, the TensorFlow framework [21] and NVIDIA tools [22] were used. A workstation with 12 GB of RAM, equipped with an NVIDIA Tesla K40c GPU, was used for the model training and testing. We modified the DenseNet model by using the Adam optimizer algorithm instead of plain gradient descent. The model is trained on the MCIndoor 20000 images using the transfer learning technique. In the training step, after obtaining the weights of the model pre-trained on ImageNet, we apply transfer learning: we freeze all the convolution layers of the model and train only the fully connected layers, instead of training the whole model on the MCIndoor dataset. These layers are trained from scratch to extract features and characteristics from the frozen convolution layers. Then, we unfreeze the first convolution layer of the network and fine-tune the deep learning model on the MCIndoor 20000 training dataset, retraining from the pre-trained weights, which allows the use of very small step sizes. In the training step of our model, we used the Adam optimizer to minimize the loss function [23]. The Adam optimizer is a gradient-descent-based optimization method which provides an adaptive learning rate for each parameter [24]. Figure 4 shows the learning-rate curve produced by the Adam algorithm and the decrease of the total loss while training the model.
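Continuing the previous sketch, the two training stages described here could be scripted as below; the learning rates, decay schedule and epoch counts are placeholders, and train_ds/val_ds are assumed data pipelines built from MCIndoor 20000, so this is only an illustration of the procedure rather than the authors' configuration:

# Stage 1: train only the new fully connected head with Adam and exponential learning-rate decay.
import tensorflow as tf

schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3, decay_steps=1000, decay_rate=0.9)
model.compile(optimizer=tf.keras.optimizers.Adam(schedule),
              loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=10)

# Stage 2: unfreeze and fine-tune the whole network with a much smaller step size.
base.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=5)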

Fig. 4. (a) Learning rate curves obtained by the Adam optimizer, (b) total loss of our modified DenseNet model

After completing training, we test the model on the dataset subset reserved for testing. Table 1 reports the accuracies achieved by the proposed model and those obtained in [18]. The proposed model achieves good results with better performance: it reaches an average accuracy of 93.7%, compared to 90.4% achieved by Bashiri et al. [18]. Our framework can be used in mobile applications or indoor navigation systems.

Table 1. Classification accuracy on MCIndoor 20000 (original images)
Method | Accuracy (%)
Bashiri et al. [18] | 90.4
Our method | 93.7


5 Conclusion
In this paper, we designed a modified DenseNet model for indoor object recognition. We performed model training and testing on the MCIndoor 20000 dataset and obtained very encouraging results compared to other methods and works. Other indoor datasets can be combined with the MCIndoor 20000 dataset to obtain higher accuracy. The proposed recognition and classification model can be used in many applications, such as robotic navigation systems and indoor navigation assistance for visually impaired people. Potential future work includes the design of a real-time detection system for indoor navigation to be implemented in a mobile application.

References 1. LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L. D.: Backpropagation applied to handwritten zip code recognition. Neural Comput. 1(4), 541–551 (1989) 2. Mei, S., Yang, H., Yin, Z.P.: Discriminative feature representation for image classification via multimodal multitask deep neural networks. J. Electron. Imaging 26(1), 013023 (2017) 3. Nan, L.L., Xie, K., Sharf, A.: A search-classify approach for cluttered indoor scene understanding. ACM Trans. Graph. 31(6), Article no. 137 (2012) 4. Wang, H.Y., Gould, S., Roller, D.: Discriminative learning with latent variables for cluttered indoor scene under-standing. Commun. ACM 56(4), 92–99 (2013) 5. Husain, F., Schulz, H., Dellen, B., Torras, C., Behnke, S.: Combining semantic and geometric features for object class segmentation of indoor scenes. IEEE Robot. Autom. Lett. 2(1), 49–55 (2017) 6. Jiang, L.X., Koch, A., Zell, A.: Object recognition and tracking for indoor robots using an RGB-D sensor. In: Proceedings of the 13th International Conference IAS-13 on Intelligent Autonomous Systems 13, pp. 859–871. Springer, Paris (2016) 7. Chae, H.W., Park, C., Yu, H., Song, J.B.: Object recognition for SLAM in floor environments using a depth sensor. In: 13th International Conference on Ubiquitous Robots and Ambient Intelligence (URAI), pp. 405–410. IEEE, Xian (2016) 8. Blum, M., Springenberg, J.T., Wülfing, J., Riedmiller, M.: A learned feature descriptor for object recognition in RGB-D data. In: Robotics and Automation (ICRA), 2012 IEEE International Conference on, pp. 1298–1303 (2012) 9. Schwarz, M., Schulz, H., Behnke, S.: RGB-D object recognition and pose estimation based on pre-trained convolutional neural network features. In: 26th IEEE International Conference on Robotics and Automation (ICRA), pp. 1329–1335. Washington (2015) 10. Eitel, J., Springenberg, T., Spinello, L., Riedmiller, M., Burgard, W.: Multimodal deep learning for robust RGB-D object recognition. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 681–687. IEEE, Madrid (2015) 11. Gupta, S., Girshick, R., Arbeláez, P., Malik, J.: Learning rich features from RGB-D images for object detection and segmentation. In: European Conference on Computer Vision, ECCV 2014, pp. 345–360. Springer International Publishing, Zurich (2014) 12. Bianco, S., Celona, L., Schettini, R.: Robust smile detection using convolutional neural networks. J. Electron. Imaging 25(6), 063002 (2016) 13. Asif, U., Bennamoun, M., Sohel, F.A.: RGB-D object recognition and grasp detection using hierarchical cascaded forests. IEEE Trans. Robot. 33(3), 547–564 (2017)


14. Ding, X., Luo, Y., Yu, Q., Li, Q., Cheng, Q., et al.: Indoor object recognition using pretrained convolutional neural network. In: 23rd International Conference on Automation and Computing (ICAC), pp. 1–6. IEEE, Ohio (2017) 15. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998) 16. Fukushima, K.: Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol. Cybern. 36, 193–202, 19 (1980) 17. Yann, L., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998) 18. Bashiri, F.S., LaRose, E., Peissig, P., Tafti, A.P.: MCIndoor20000: a fully-labeled image dataset to advance indoor objects detection. Data in brief 17, 71–75 (2018) 19. Olga, R., Deng, J., Hao, S., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vision 115(3), 211–252 (2015) 20. Huang, G., Liu, Z., Weinberger, K.Q., Van der Maaten, L.: Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, vol. 1, no. 2, p. 3. IEEE, Hawaii (2017) 21. Tensorflow a deep learning framework. https://www.tensorflow.org. Last accessed 21 Sept 2018 22. Nvidia digits system for deep learning model implementation on Nvidia GPU https:// developer.nvidia.com/digits. Last accessed 21 Sept 2018 23. Kingma, D.P., Ba, J.L.: Adam: a method for stochastic optimization. In: International Conference on Learning Representations, pp. 1–13 (2015) 24. Sutskever, I., Martens, J., Dahl, G.E., Hinton, G.: On the importance of initialization and momentum in deep learning. ICML (3), 28, pp. 1139–1147 (2013) 25. Kim, D., Chen, T.: Deep neural network for real-time autonomous indoor navigation. arXiv [cs.CV] (2015). http://arxiv.org/abs/1511.04668 26. Chen, W., Wei, C., Ting, Q., Yimin, Z., Kaijian, W., Gang, W., et al.: Door recognition and deep learning algorithm for visual based robot navigation. In: IEEE International Conference on Ro-botics and Biomimetics (ROBIO 2014). Bali (2014) 27. Loussaief, S., Abdelkrim, A.: Machine learning framework for image classification. In: 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), pp. 58–61. IEEE, Hammamet (2016) 28. Guerfala, M.W., Sifaoui, A., Abdelkrim, A.: Data classification using logarithmic spiral method based on RBF classifiers. In: 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), pp. 416–421. IEEE, Hammamet (2016)

Automatic USCT Image Processing Segmentation for Osteoporosis Detection Marwa Fradi1(&), Wajih Elhadj Youssef1, Ghaith Bouallegue1, Mohsen Machhout1, and Philippe Lasaygues2 1

Laboratory of Electronics and Micro-Electronics FSM, Monastir University, Monastir, Tunisia [email protected] 2 Laboratory of Mechanics and Acoustics, Marseille University, Marseille, France

Abstract. Ultrasound Computed Tomography (USCT) can be used for cortical bone imaging, but is limited by the strong variations in acoustic impedance between the medium and its environment. The aim of this work is to test automatic image processing to enhance the detection of the boundaries. Image processing algorithms are used for the automatic detection of edges and defects. In a first step, a pre-processing algorithm is applied. In a second step, we apply the k-means and Otsu algorithms. As a result, we improve the edge detection, which allows the length of bone structures to be computed. Hence, osteopathology detection was achieved, and the results outperform related works in signal-to-noise ratio improvement and execution time.
Keywords: Proposed k-means algorithm · Discrete Haar wavelet (DHW) · Edge detection · Osteoporosis detection · SNR

1 Introduction
Morphologic bone diseases have long been considered an interesting target for automatic identification in the medical imaging area. Nowadays, computer vision has become an important topic in medical imaging diagnosis. In this work, we are interested in Ultrasound Computed Tomography (USCT) for the bone imaging process. The proposed method is based on the application of a discrete wavelet transform as a first step. As a second step, a k-means algorithm combined with the Otsu morphological method has been implemented using Visual C++ and Python with the Keras, OpenCV and Matplotlib libraries.

2 Related Work
Signal processing improvements have enhanced USCT image quality and resolution [1, 2]. Thus, image processing algorithms are needed to improve ultrasound tomography image contrast and to enable automatic image recognition. Automatic


recognition techniques are usually divided into two classes, supervised and unsupervised [5]. Supervised segmentation requires operator interaction throughout the segmentation process, whereas unsupervised methods generally require operator intervention only after the segmentation; unsupervised techniques are preferable for obtaining good results [6], although operator intervention is still required to correct errors in case of a failed result [7]. For instance, with the Born approximation and Fourier transform, the contrast-to-noise ratio (CNR) is poor. Previous works to improve the CNR used signal and image processing [3, 4]. This work concerns this last point: an automatic edge detection procedure, using Haar wavelet 2D decomposition [8, 9] combined with the k-means [10] and Otsu [11] algorithms, was developed. Results are presented on ex vivo real bones and on a geometrical bone-mimicking phantom (Sawbones™).

3 Method
3.1 New Hardware Prototype

The prototype used is a 2D circular ring ultrasound antenna with a diameter of 300 mm, supporting 8 transducers distributed over 360° (one every 45°). The 8 transducers are piezo-composite elements with a frequency of 1 MHz and a cylindrical focus in the section plane. The object to be imaged is placed at the center of the antenna. The USCT prototype is presented in detail in [2] (Fig. 1).

Fig. 1. USCT prototype (Copyright / Rights reserved, CNRS-LMA Marseille) [1]

3.2 Proposed Algorithm

Tomographic bone images obtained with the USCT prototype were noisy and difficult to interpret, because of the inhomogeneity of medical images and the difficulty of distinguishing


between pixel intensities. For this reason, we apply a pre-processing algorithm followed by k-means clustering (see the sketch after Fig. 2). The wavelet transform is based on translations and dilatations of a mother wavelet Ψ, identified by the equation below:

Ψ_{a,b}(t) = (1/√a) Ψ((t − b)/a)

The k-means method is an unsupervised learning method that addresses the problem of image clustering. The procedure follows a simple way of grouping the image data into k clusters. The Otsu method is a powerful global thresholding method: it performs image binarization based on the shape of the image histogram, assuming that the image to be binarized contains only foreground and background pixels [12, 13] (Fig. 2).

Fig. 2. Synoptic flow of the proposed processing algorithm
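A compact sketch of this processing chain is given below. It uses PyWavelets, OpenCV and NumPy as stand-ins for the authors' Visual C++/Python implementations, and the wavelet level, the value of k and the iteration limit are placeholder choices:

# Hedged sketch of the pipeline: Haar wavelet decomposition, k-means clustering, Otsu thresholding.
import cv2
import numpy as np
import pywt

def process_usct(image_path, k=5):
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE).astype(np.float32)

    # 1) One-level 2D Haar decomposition; keep the approximation (LL) sub-band.
    ll, (lh, hl, hh) = pywt.dwt2(img, "haar")

    # 2) k-means clustering of pixel intensities (Euclidean distance, up to 1000 iterations).
    data = ll.reshape(-1, 1).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 1000, 0.1)
    _, labels, centers = cv2.kmeans(data, k, None, criteria, 5, cv2.KMEANS_RANDOM_CENTERS)
    clustered = centers[labels.flatten()].reshape(ll.shape)

    # 3) Otsu binarization to separate bone (foreground) from background.
    norm = cv2.normalize(clustered, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    _, binary = cv2.threshold(norm, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary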


4 Experiments and Results
4.1 Experiments

The protocol for signal acquisition and pre-processing is presented in detail in reference [4]. The image database used to test the proposed methods comes from experiments using the USCT device on a human bone and on a bone-mimicking phantom. The human bone is an adult thighbone with cortical and porous parts. The porous part was characteristic of an osteoporotic bone condition (Fig. 3), where the density decreases and the mean width (~8 mm) increases compared to the cortical part. The mean cavity diameter was 10 mm. A calliper was used for the measurements. The water tank temperature was 21.7 °C (Fig. 4).

Fig. 3. Osteoporosis adult bone (annotations: D3 = defected osteoporotic bone; D1 = distance that should be present before the defect; D1–D3 = healthy bone; D5 = width of cortical bone)

Fig. 4. USCT adult bone image

4.2 Image Pre-processing Results

USCT image pre-processing is a necessary step to study the bone and its morphology. Many filters and pre-processing algorithms have been implemented, such as the Sobel and Laplacian filters, but the results are not sufficient, as shown in Fig. 5. A Haar wavelet algorithm was also implemented with a Microsoft Foundation Class (MFC) interface in Visual C++; with it, the ambiguity was resolved and the SNR was enhanced, but only the external boundary, which represents the


cortical bone, was detected. Sobel and Laplacian give results similar to the Haar wavelet, except that the SNR improvement is smaller.

Fig. 5. Pre-processing algorithm results: (a) original tomographic image; (b) Haar wavelet transform results (LH1, HL1, HH1 sub-bands), SNR = 12.88; (c) Laplacian results, SNR = 7.46; (d) Sobel results, SNR = 8.6. External cortical bone edges are indicated; D represents the diameter of the cortical bone.

4.3 Image Processing Results

Image pre-processing algorithms failed to sufficiently improve USCT image quality and to detect osteopathologies. For this reason, we applied an unsupervised k-means algorithm, as shown in Figs. 6, 7 and 8 for tomographic images acquired at different heights of the ultrasonic transducers. Osteoporosis is defined as a decrease of bone density, which appears as a loss of contrast in the imaging


results [14]. K-means was applied with 1000 iterations, assigning each pixel to a cluster according to the Euclidean distance to the cluster centers. For k = 3, in Figs. 6(b), 7(b) and 8(b), the cortical bone cannot be measured; its width is only partially visible. However, for k = 5, the width of the cancellous bone decreases from D3 in Fig. 8(c) to D1 in Fig. 6(c). On the other hand, the width of the cortical bone decreases from D6 to D4 in Fig. 9 with the Otsu algorithm. We can say that the bone has become weak and its width has decreased. Thus, osteoporosis is detected, and the measured distances are roughly similar to those found in reality. As a result, we succeeded in detecting osteoporosis through the application of the k-means and Otsu algorithms, by computing distances between bone boundaries and bone width values.

Fig. 6. Ultrasonic tomographic osteoporosis adult bone image with H1: (a) k = 0, SNR = 7.13; (b) k = 3, SNR = 6.22; (c) k = 5, SNR = 6.29

Fig. 7. Ultrasonic tomographic osteoporosis adult bone image with H2: (a) k = 0, SNR = 8.77; (b) k = 3, SNR = 8.72; (c) k = 5, SNR = 8.69


Fig. 8. Ultrasonic tomographic osteoporosis adult bone image with H3: (a) k = 0, SNR = 6.59; (b) k = 3, SNR = 7.46; (c) k = 5, SNR = 6.96

Fig. 9. Otsu algorithm results

4.4 SNR Results

In general, the higher the signal-to-noise ratio, the higher the image quality [15]. Indeed, SNR is an important parameter for image quality measurement and one of the main criteria for evaluating the performance of an image [16, 17]. Our implemented Haar wavelet shows an SNR enhancement of roughly 50%: the SNR increased from 7.13 to 12.88 compared to the original image.
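The paper does not state which SNR definition was used; one common choice for images, shown below purely as an assumption, expresses the ratio of the mean signal to the noise standard deviation in decibels:

# Assumed SNR definition (mean/standard-deviation ratio in dB); not necessarily the one used here.
import numpy as np

def snr_db(image):
    image = np.asarray(image, dtype=np.float64)
    return 10.0 * np.log10(image.mean() ** 2 / image.var())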

4.5 Time Execution Results

The execution time results show the performance of the k-means and Otsu algorithms, implemented in Python with the OpenCV and Matplotlib libraries, compared to the Haar wavelet implemented in Visual C++ with an MFC graphical interface (Table 1).


Table 1. Time execution results
Image | Algorithm | Time
Figure 5(b) | Haar wavelet | 25 s
Figure 5(c) | Laplacian | 2.2 s
Figure 5(d) | Sobel | 2.2 s
Figure 6(a) | K-means | 1.3 s
Figure 8 | Otsu | 0.8 s

4.6 Auto Results Comparison

See Table 2.

Table 2. Auto results comparison
Algorithm | Image | SNR | Edge detection | Region detection | Osteopathologies
Original image | Figure 5a | 7.13 | —— | —— | ——
DHW | Figure 5b | 12.88 | +++ | – | –
Laplacian | Figure 5c | 7.49 | + | – | –
Sobel | Figure 5d | 8.6 | + | – | –
K-means | Figure 6a | 7.46 | ++ | ++++ | ++++
K-means | Figure 6b | 8.77 | ++ | ++++ | +++++
Otsu | Figure 8c | 4.05 | + | +++++ | ++++

5 Discussions and Comparative Study
5.1 SNR Improvement

The image quality improvement is evaluated by the SNR results, one of the main criteria for assessing the performance of an image [16, 17]. As reported above, our implemented Haar wavelet shows an SNR enhancement of roughly 50%, from 7.13 to 12.88 with respect to the original image.

5.2 Region Detection

In previous work, Lasaygues et al. showed that a sliding window technique can be used to improve the detection of outer and inner boundaries [2]. However, the relative measurement error remains significant, between 4% and 20%. The proposed k-means method, associated with the Otsu algorithm, improves the edge detection. However, the processing time is still too long for fast analysis, and we will consider a deep learning algorithm to go faster.

5.3 Osteopathologies Detection

We succeeded in detecting osteoporosis by region and edge determination, improving on the results obtained by Lasaygues in 2017, thanks to the SNR enhancement [4].

5.4 Processing Time

Improving the processing time was a challenge to overcome. In this context, we reduced the processing time by 24.2 s: it went from 25 s with C++ to 0.8 s with Python. Our proposed method also outperforms the time results reported by Lasaygues in 2006, where the processing time was 3 min.

6 Conclusion
Bone tomography is nowadays limited by the need for algorithms that provide high-SNR images. Thanks to the Haar wavelet transform (implemented in Visual C++) and the development of a k-means algorithm associated with the Otsu method (implemented in Python), we have achieved automatic detection of the shapes and areas of cortical and porous regions. The image quality is improved, for a shorter processing time. Our results outperform previous works in processing time, SNR enhancement, and area detection and identification. Future work will be dedicated to the development of deep learning techniques for the automatic classification of USCT images into healthy and unhealthy categories. We are also interested in implementing the processing chain on GPU hardware, to achieve faster global processing.

References 1. Lasaygues, P., Lefebvre, J.: Bone imaging by low frequency ultrasonic reflection tomography. In Hallowell, M., Wells, P.N.T. (eds.) Acoustical Imaging, vol. 25. Kluwer Academic Publishers, Boston (2002) 2. Lasaygues, P.: Assessing the cortical thickness of long bone shafts in children, using twodimensional ultrasonic diffraction tomography. Ultrasound Med. Biol. 32(8), 1215–1227 (2006) 3. Lasaygues, P.: Tomographie ultrasonore osseuse : Caractérisation de la diaphyse des os par inversion d’un champ acoustique diffracté, Intérêt pour l’imagerie pédiatrique (2006) 4. Lasaygues, P., Guillermin, R., Metwally, K., Fernandez, S., Balasse, L., Petit, P., Baron, C.: Contrast resolution enhancement of Ultrasonic Computed Tomography using a waveletbased method – preliminary results in bone imaging. In: International Workshop on Medical Ultrasound Tomography, Nov (2017), Speyer, Germany modified, Avril (2018) 5. Bezdek, J.C., Hall, L.O., Clarke, L.P.: Review of MR image segmentation techniques using pattern recognition. Med. Phys. 20(4), 1033–1048 (1993) 6. Clarke, L.P., Velthuizen, R.P., Camacho, M.A., Heine, J.J., Vaidyanathan, M., Hall, L.O., Thatcher, R.W., Silbiger, M.L.: MRI segmentation: methods and applications. Magn. Reson. Imaging 13(3), 343–368 (1995)


7. Olabarriaga, S.D., Smeulders, A.W.M.: Interaction in the segmentation of medical images: a survey. Med. Image Anal. 5, 127–142 (2001) 8. Fradi, M., Youssef, W.E., Lasaygues,P., Machhout, M.: Improved USCT of paired bones using wavelet-based image processing. Int. J. Image, Graph. Signal Process. (IJIGSP) 10(9), 1–9 (2018). https://doi.org/10.5815/ijigsp.(2018).09.01 9. Mallat, S., Hwang, W.L.: Singularity detection and processing with wavelets. IEEE Trans. Inf. Theory 38(2), 617–643 (1992) 10. Jose, A., Ravi, S., Sambath, M.: Brain tumor segmentation using K-means clustering and Fuzzy C-means algorithm and its area calculation. Int. J. Innov. Res. Comput. Commun. Eng. 2, 3496–3501 (2014) 11. Liu, S.: Image segmentation technology of the Ostu method for image materials based on binary PSO algorithm. In Jin, D., Lin, S. (eds.) Advances in Computer Science, Intelligent System and Environment, vol. 104, pp. 415–419. Springer Berlin Heidelberg, Berlin, Heidelberg (2011) 12. Naga Gayathri Divya, B., Sowjanya, K.: Otsu’s method of image segmentation using particle swarm optimization technique. Int. J. Sci. Eng. Technol. Res. 4(10), 1805–1808 (2015) 13. Liu, S.: Image segmentation technology of the ostu method for image materials based on binary PSO algorithm. In: Advances in Computer Science, Intelligent System and Environment, pp. 415–419. Springer, Berlin, CSISE 2011, AISC 104 (2011) 14. Glaser, D.L., MD, Kaplan, F.S.: Osteoporosis: definition and Clinical Presentation. Spine J. 22, 12S–16S (1997) 15. Liao, Y.-Y., Wu, J.-C., Li, C.-H., Yeh, C.-K.: Texture feature analysis for breast ultrasound image enhancement. Ultrason. Imaging 33, 264–278 (2011) 16. Wiem FOURATI et Mohamed Salim BOUHLEL: Techniques de Débruitage d’Images. In: SETIT (2009) 17. Hadda, W., Hamrouni, K., Kalti, K.: Analyse des performances des filtres en traitement d’images. In: SETIT (2003)

An Efficient Approach to Face and Smile Detection Alhussain Akoum, Rabih Makkouk(&), and Rafic Hage Chehade Faculty of Technology, Department of CCNE, Lebanese University, Beirut, Lebanon [email protected], [email protected], [email protected]

Abstract. The mental state of a person can be judged by detecting smiles. Smile detection starts with face detection. The algorithm presented in this paper detects the face in the input image, then the mouth, and finally the smile. We check whether the detected face in a photo is smiling: the algorithm locates the person's mouth and decides whether they are pleased or not. Given a set of photos of a person entered into our system, we compare the images using a face detection algorithm and an algorithm that automatically detects the corners and features of the mouth, and then we determine which picture has the best smile.
Keywords: Face detection · Smile detection · Corner detection · Haar classifier

1 Introduction
To differentiate between satisfactory and unsatisfactory photos, we rely on whether the person in the picture is smiling or not [1], using feature recognition and corner detection. The smile is the most common facial expression that we see in humans [2, 3]. It gives a positive impression to others and makes a person more approachable, reflecting joy, happiness, admiration or gratitude. The mental state of a person can be judged by detecting smiles [4–6]. The goal is to automatically detect a smiling subject in an image. Our system is applied to detect the smile of any face detected in a set of photos. The first camera with a smile shutter function was released in 2007 by Sony. With it, only three faces can be detected, and the picture is taken automatically when they smile. This is not a precise method, as it can detect only a big smile but not a faint one. The faces are grouped in a photo to select the ideal one [7–9]. In this paper, we start the process by detecting the face in the input image. Then we identify the mouth of the person and check whether the person smiles or not. To identify the best smile, we compare all the images we have.



2 Methodology
2.1 Procedure

Given a group of pictures entered into the system, the algorithm compares these pictures and determines which one has the greatest smile. The photos can be of the same individual or of different persons. The subject's face is identified in the photo using the Viola-Jones algorithm, and the same algorithm is used to locate the subject's mouth. Next, the Shi-Tomasi corner detection algorithm is run over the mouth area to locate the corners and characteristics of the mouth (smile wrinkles, teeth, and mouth shape). From the pixels obtained through corner detection, we fit a second-degree polynomial curve. We then examine the second derivative of the fitted curve to determine the concavity of the points. Through this process, we can deduce whether the face in the image is smiling or not (Fig. 1); a code sketch of this pipeline is given after the figure.

Fig. 1. Program overview.
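A minimal OpenCV sketch of this detection stage is shown below. It assumes the frontal-face Haar cascade shipped with OpenCV and, as described in Sect. 3, takes the lower third of the face box as the mouth region before running Shi-Tomasi corner detection; the detector parameters are illustrative, not the values used by the authors:

# Hedged sketch: Viola-Jones face detection, mouth region isolation, Shi-Tomasi corners.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def mouth_corners(image_bgr):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    mouth_roi = gray[y + 2 * h // 3: y + h, x: x + w]      # lower third of the face
    corners = cv2.goodFeaturesToTrack(mouth_roi, maxCorners=60,
                                      qualityLevel=0.01, minDistance=3)
    return None if corners is None else corners.reshape(-1, 2)  # (x, y) points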


2.2 Smile Detection

To determine whether a face is smiling or not, several techniques can be used. The first technique simply counts all the detected corner points, because a smiling person tends to produce more corners than someone without a smile, mainly due to the presence of teeth in a smile [10, 11]. However, this method proved inexact when the face smiled with the lips close together, or when the mouth was open but not smiling. The next technique takes the detected corner points, once a minimum threshold is reached, and calculates the best-fit curve to the resulting point cloud [12, 13]. This technique, combined with the first one, has proven to be an effective combination: the concavity of the subject's mouth region together with the density of corner points in this area lets us decide whether the mouth forms a smile, as in the sketch below.
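Combining the two techniques might look like the following sketch: the corner count is compared with a minimum threshold, and a second-degree polynomial is fitted to the corner points, whose leading coefficient gives the concavity (the "mouth arc"); the threshold is an assumed value:

# Hedged sketch: smile decision from corner density and fitted-curve concavity.
import numpy as np

def smile_decision(corners, min_corners=15):
    """Return (corner_count, arc, is_smiling); thresholds are placeholders."""
    if corners is None or len(corners) < min_corners:
        return 0 if corners is None else len(corners), 0.0, False
    xs, ys = corners[:, 0], corners[:, 1]
    a, _, _ = np.polyfit(xs, ys, 2)   # fit y = a*x^2 + b*x + c to the corner cloud
    arc = -a                          # image y grows downward, so a smile gives a < 0
    return len(corners), arc, arc > 0.0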

3 Production
Face detection can be considered as a particular case of object-target detection. In object-target detection, the mission is to find the locations and sizes of all objects in a photo that belong to a certain target class. Our face detection algorithm focuses on the detection of frontal human faces (Fig. 2).

Fig. 2. Detected face.

Once the face is detected, the region for locating the mouth is formed. The lower third of the face region is isolated to search for the mouth (Fig. 3).


Fig. 3. Detected face and mouth.

Our final detection stage is to locate the corners in the mouth box. This is done via the Shi-Tomasi process (Fig. 4). Finally, from the pixels detected by the Shi-Tomasi process, we compute the corner concentration and mouth arc parameters (Fig. 4).

Fig. 4. Detected mouth.

3.1 Decision Tree
A decision algorithm is used to determine which image has the best smile (Fig. 5).


Fig. 5. Decision algorithm.

4 Our Results
Four images are taken to choose the best smile (Fig. 6).

Fig. 6. Four candidate images for selecting the greatest smile.


Table 1. Corner concentration and mouth arc parameters for the 4 pictures
Picture | Corner concentration | Mouth arc
1 | 15 | 0.0040
2 | 16 | 0.0079
3 | 31 | 0.0111
4 | 13 | 0.0075

To define the greatest smile in this group of images, the calculated data are used in the decision tree code (Table 1). Photos two and three contain a high number of corner points, while photos three and four have a greater arc. Since picture three reaches the minimum corner threshold and has the greatest arc, it is selected as the best smile picture of the group. The same method is then applied, in a further step, to a larger set of images of the same person to find the best smile (Fig. 7).
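The selection step of the decision tree can be summarised by the small helper below (a sketch with an assumed corner threshold): among the candidates that reach the minimum number of corner points, the picture with the largest mouth arc is chosen.

# Hedged sketch of the decision step: keep candidates above the corner threshold,
# then pick the picture with the greatest mouth arc.
def best_smile(candidates, min_corners=15):
    """candidates: list of (picture_id, corner_count, mouth_arc) tuples."""
    eligible = [c for c in candidates if c[1] >= min_corners]
    if not eligible:
        return None
    return max(eligible, key=lambda c: c[2])[0]

# Example with the values of Table 1: picture 3 (31 corners, arc 0.0111) is selected.
print(best_smile([(1, 15, 0.0040), (2, 16, 0.0079), (3, 31, 0.0111), (4, 13, 0.0075)]))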

Fig. 7. (a) Detected face, (b) face with mouth, (c) mouth, (d) features and (e) curvature, for the 5 successive images.


The best smile is number 8, as it has the greatest curvature and meets the minimum number of corner points (Table 2).

Table 2. The results obtained for 20 images
Picture | Corner concentration | Mouth arc
1 | 63 | 0.0045
2 | 81 | −9.6942e-04
3 | 24 | 0.0051
4 | 12 | −0.00034
5 | 10 | 0.0031
6 | 15 | 0.0040
7 | 16 | 0.0079
8 | 31 | 0.0111
9 | 13 | 0.0075
10 | 44 | −0.0024
11 | 23 | 0.0029
12 | 108 | 0.0039
13 | 32 | 0.0086
14 | 37 | 0.0088
15 | 91 | 0.0032
16 | 57 | 9.9288e-04
17 | 24 | 0.0070
18 | 103 | 0.0019
19 | 14 | 0.0045
20 | 51 | 0.0058

5 Conclusion
By detecting features and corners, we could determine whether the person in the photo was smiling or not, starting from face detection. This automatic identification of smiles has many potential applications, including enhanced camera functionality. In the future, we should update our mouth model so that it can handle greater turning of the head and scaling of the size of the face.


References 1. Murthy, V., Vinay Sankar, T., Padmavarneya, C., Pavankumar, B., Sindhu, K.: Smile detection for user interfaces. Int. J. Resaerch Electron. Commun. Technol., 21–26 (2014) 2. Rai, P., Dixit, M.: Smile detection via Bezier curve of mouth interest points. J. Adv. Res. Comput. Sci. Softw. Eng. 3(7), pp. 1–5 (2013) 3. Devito, J., Meurer, A., Volz, D.: Smile identification via feature recognition and corner detection (2012) 4. Li, J., Chen, J., Chi, Z.: Smile detection in the wild with hierarchical visual feature. In: 2016 IEEE International Conference on Image Processing (ICIP), pp. 639–643 (2016) 5. Hu, P., Ramanan, D.: Finding tiny faces. In: Computer Vision and Pattern Recognition, pp. 1612–1624 (2017) 6. Akoum, A.: Real-time best smile detection. Int. J. Emerg. Trends Technol. Comput. Sci. (IJETTCS) 7(5), pp. 8–12. ISSN 2278-6856 (2018) 7. Davies, E.R.: Face detection and recognition. Chapter in book: Computer Vision, pp. 631– 662 (2018) 8. Bensalem, M.K., Ettabaa, S., Bouhlel, M.S.: Anomaly detection in hyperspectral images based spatial spectral classification. In: International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT 2016) Hammamet, pp. 166– 170, Tunisia (2016) 9. Soliman, H., Saleh, A., Fathi, E.: Face recognition in mobile devices. Int. J. Comput. Appl. (0975 – 8887) 73(2) (2013) 10. Smari, S.K., Bouhle, M.S.: Gesture recognition system and finger tracking with Kinect. In: International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT 2016) Hammamet, pp. 44–548, Tunisia (2016) 11. Akoum, A.: Real time hand gesture recognition. Int. J. Eng. Inven. 5(7), 21–30. ISSN 23196491 (2016) 12. Ameur, S., Ben Khalifa, A., Bouhle, M.S.: A comprehensive leap motion database for hand gesture recognition. In: International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT 2016) Hammamet, pp. 514–519, Tunisia (2016) 13. Akoum, A.: Real time face detection and segmentation. Int. J. Appl. Eng. Res. 13(19), pp. 14308–14312. ISSN 0973-4562 (2018)

Human-Machine Interaction

The Role of Virtual Reality in the Training for Carotid Artery Stenting: The Perspective of Trainees Daniela Mazzaccaro1(&), Bilel Derbel2, Rim Miri2, and Giovanni Nano1 1

IRCCS Policlinico San Donato, Operative Unit of Vascular Surgery, San Donato Milanese, Milan, Italy {danymazzak83,giovanni.nano}@libero.it 2 La Rabta University Hospital, CardioVascular Surgery Department, Tunis, Tunisia [email protected], [email protected]

Abstract. INTRODUCTION: Virtual reality (VR) simulators have been proven to be a reliable tool to achieve experience for stenting of the carotid artery (CAS). We describe our experience in the use of virtual reality for the training of young surgeons for CAS procedures. METHODS: One hundred novice surgeons without any prior experience with endovascular interventions (group A) and 100 operators with prior experience in CAS (group B) carried out a virtual carotid stenting procedure on a simulator, in an easy and in a difficult case of carotid stenosis. They then performed a tutored training on the VR simulator for three hours and afterwards repeated the same starting procedures. Data recorded from the starting and ending procedures were inserted in a database and assessed. RESULTS: All participants of group A ended the first procedure on the easy case within 1 h, while only 38% of the trainees concluded the starting procedure on the difficult case (P = .001). After training, 59% of the trainees concluded the difficult case (P = .02). After training, a significant improvement of some metrics was recorded in the group of trainees when the easy case was performed, but the improvement after training was not significant in the difficult case. CONCLUSIONS: The VR simulator has proved to be an effective tool for the training of inexperienced operators in easy cases; conversely, its role is still unclear for more complex cases.
Keywords: Virtual reality · Carotid artery stenting · Training

1 Introduction For over two decades, the endovascular techniques have developed along with the traditional vascular surgical techniques, for the less invasive treatment of arterial lesions. The unstoppable technical and technological implementation, driven by medical-scientific research [1–4], therefore imposes on doctors the need for continuous



and constantly updated training, both on a cultural level and in the acquisition of practical skills. It is precisely in view of this goal that our Centers have exploited a fruitful collaboration between University and Industry, which provided us with a simulator of endovascular interventions for the teaching of vascular surgery. In fact, VR simulators have recently been proposed as reliable tools to achieve experience in surgical procedures, and among all in carotid artery stenting (CAS), without risk of damage to either surgeons or patients [5]. Great evidence has been brought to the fact that the greater the experience in CAS, the lower the risk of peri-procedural complications and mortality during "real life" procedures [6]. Moreover, VR simulators have already proven to be of benefit for inexperienced operators who have to become familiar with basic skills [7]. Also, VR simulators can be reliable tools for the assessment of the acquisition of technical skills [8]. We describe our experience in the use of virtual reality for the training of young surgeons for CAS procedures.

2 Methods
The study was approved by the local Ethical Committee. A hundred vascular surgeons and radiologists in training, without any prior experience with endovascular procedures, were enrolled for the study (group A). After specific didactic instructions, they performed a virtual CAS procedure on a simulator (the Procedicus VIST system), on a right carotid bifurcation stenosis, in both an anatomically "easy" case (type I aortic arch) and an anatomically "difficult" case (type III arch). Then, they performed a training on the simulator for three hours, with the feedback of an experienced tutor about the most important errors committed during the virtual procedures. At the end of the training, all participants performed the two starting cases again. Data of both the starting and ending procedures were automatically recorded by the machine, such as the total time of the procedure and the number of catheter movements against the vessel's wall (see Table 1). The results recorded by the group of trainees were then compared to those of 100 experienced interventional vascular surgeons and interventional radiologists who participated as a control group. The collected data were inserted into a database and analyzed using the statistical software JMP 5.1.2 (SAS Institute). The paired 2-tailed Student t test was used to analyze normally distributed measurements recorded by the simulator before and after training. The chi-square test was used for non-parametric metrics. All values are reported as mean ± 2SD. P values < .05 were considered statistically significant.

The Procedicus VIST® System
The Procedicus VIST® has been specifically designed as a virtual reality simulator for training in endovascular interventional procedures. The device is based on a dual-processor (2 × 2.8 GHz) Pentium IV computer with the Microsoft Windows XP Professional operating system, 1 GB of RAM, 40 GB of hard disk

The Role of Virtual Reality in the Training for Carotid Artery Stenting

395

Table 1. Virtual performances of non-experienced operators before and after training in both "easy" and "difficult" cases. Significant P values in bold.
Metric | Before training | After training | P value
Time to end the procedure (min:sec) | 29:15 ± 12:05 | 19:01 ± 8:04 | .001
Contrast amount (cc) | 12.1 ± 9.2 | 10.3 ± 5 | .15
Time of scope (min:sec) | 21:22 ± 6:30 | 13:12 ± 5:08 | .03
Time to the placement of EPD (min:sec) | 17:21 ± 2:13 | 10:21 ± 3:18 | .002
Time to catheterisation of CCA (min:sec) | 12:22 ± 5:12 | 7:13 ± 4:12 | .02
Movements of the catheter against vessel wall | 10.6 ± 4.8 | 5.2 ± 3.2 | .001
Movements of the catheter without guidewire | 3.1 ± 1.8 | 3.2 ± 1.9 | .21
Catheter movements near lesion | 5.4 ± 4.2 | 5.1 ± 4 | .33
Movements of the wire near lesion | 4.8 ± 2.2 | 4.2 ± 2.3 | .17
EPD movements after deployment | 4.2 ± 3.1 | 4 ± 2.9 | .01
EPD: Embolic Protection Device; CCA: Common Carotid Artery.

memory, a 128 MB GeForce FX 5200 graphics card and two 17-inch flat-panel monitors with touch-screen technology. The devices used are the same as those commonly used in real procedures (catheters, guides, filters, stents, balloons, …) and are detected by the simulator through a complex system of trolleys with optical readers, whose data are then processed by the dedicated software and finally presented to the operator through the fluoroscopy screen. The optical readers are extremely sensitive to the rotation and translation movements of up to three coaxial instruments (e.g. guide and catheter). The air flow of a syringe shows on the display the amount of contrast medium injected, as well as the pressure exerted by the fluid that compresses and decompresses during balloon expansion in the vessel. Finally, through the pedal-activated fluoroscopy or cine-angiography modality, it is possible to record and review the performed angiography. The forces applied to the individual clinical instruments are perceived by voltage sensors, recalibrated by the software at the start of each session, which record force variations with a sensitivity of up to 0.025 N. Translation is read by a sensor offering a resolution of 0.11 mm, while the one measuring rotation offers a resolution between 7.9 and 31.4 milliradians. The diameter of the tools is measured by an infrared optical reader, which offers a resolution of 0.02 mm and an accuracy of about ±15%. The software also allows the loading of a "road map", i.e. an angiographic image guide of the vascular anatomy of the clinical case, to be used as support during the procedure. Finally, through a console, it is possible to vary the inclination of the operating table or move the fluoroscopy to the different anatomical districts of the virtual patient, according to the needs of the operator. The different types of guides, catheters, filters, balloons and stents are selected manually in the graphical interface and are virtually simulated by the computer, which


therefore allows the choice of instruments with curvatures, characteristics and dimensions appropriate to the individual clinical case. The simulator is also designed to provide tactile feedback to the operator (for example, a manual resistance can be appreciated when a guide passes through a curve or passes a stenotic lesion).
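The statistical comparisons described in the Methods (paired two-tailed Student t test for the simulator metrics, chi-square test for completion rates) were run in JMP; as a hedged illustration only, an equivalent check could be scripted in Python as follows, using the completion counts reported in this paper:

# Hedged sketch: paired two-tailed t test and chi-square test with SciPy (equivalent, not the authors' JMP analysis).
import numpy as np
from scipy import stats

def compare_metric(before, after):
    """Paired two-tailed Student t test on one simulator metric (arrays of equal length)."""
    return stats.ttest_rel(before, after)

def compare_completion(completed_before, dropped_before, completed_after, dropped_after):
    """Chi-square test on completion counts (e.g. 38/62 vs 59/41 for the difficult case)."""
    table = np.array([[completed_before, dropped_before],
                      [completed_after, dropped_after]])
    return stats.chi2_contingency(table)

# Example with the completion rates reported for the difficult case (per 100 trainees):
chi2, p_value, dof, expected = compare_completion(38, 62, 59, 41)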

3 Results
All the trainees concluded the starting case of "easy" CAS within 1 h, while only 38% of them successfully carried out the starting virtual procedure in the difficult case (P = .001). The remaining 62%, in fact, voluntarily decided to drop out. After three hours of training, 59% of the trainees carried out the difficult case (P = .02). Conversely, all participants of group B were able to conclude all the starting and ending simulations of both the easy and the difficult cases. After training, a significant improvement of some metrics was recorded in the group of trainees when the "easy" case was performed. As shown in Table 1, for example, the total time of the procedure was significantly reduced after training (29:15 ± 12:05 min:sec versus 19:01 ± 8:04 min:sec, P = .001), as was the number of catheter movements against the vessel wall (10.6 ± 4.8 versus 5.2 ± 3.2, P = .001), which in "real life" procedures is associated with an increased risk of complications. On the other hand, the improvement after training was not significant in the "difficult" case, as shown in Table 2.

Table 2. Virtual performances of non-experienced operators before and after training in the "difficult" case. Significant P values in bold.
Metric | Before training | After training | P value
Time to end the procedure (min:sec) | 42:21 ± 15:50 | 39:02 ± 12:45 | .21
Contrast amount (cc) | 13 ± 7 | 11.1 ± 6.8 | .34
Time of scope (min:sec) | 33:38 ± 13:11 | 29:41 ± 12:03 | .54
Time to the placement of EPD (min:sec) | 26:43 ± 10:17 | 25:31 ± 6:52 | .42
Time to catheterisation of CCA (min:sec) | 21:16 ± 8:14 | 21:12 ± 9:12 | .32
Movements of the catheter against vessel wall | 29.4 ± 12.5 | 30.2 ± 8.4 | .22
Movements of the catheter without guidewire | 4 ± 2.8 | 3.8 ± 2.6 | .18
Catheter movements near lesion | 6.8 ± 4.2 | 5.8 ± 3.8 | .44
Movements of the wire near lesion | 4.2 ± 2.8 | 4.6 ± 2.3 | .21
EPD movements after deployment | 6.1 ± 4.2 | 5.9 ± 3.2 | .32
EPD: Embolic Protection Device; CCA: Common Carotid Artery.


The virtual performances of the group of experienced operators were significantly better than those of the trainees, in both the "easy" (Table 3) and the "difficult" CAS.

Table 3. Performances of experienced interventionalists compared to those of non-experienced operators, in the "easy" virtual procedure.
Metric | Before training | After training | P value
Time to end the procedure (min:sec) | 29:15 ± 12:05 | 16:21 ± 2:03 | .02
Contrast amount (cc) | 12.1 ± 9.2 | 6.4 ± 2.8 | .03
Time of scope (min:sec) | 21:22 ± 6:30 | 7:12 ± 2:14 | .03
Time to the placement of EPD (min:sec) | 17:21 ± 2:13 | 6:21 ± 2:11 | .001
Time to catheterisation of CCA (min:sec) | 12:22 ± 5:12 | 5:22 ± 2:52 | .01
Movements of the catheter against vessel wall | 10.6 ± 4.8 | 6.4 ± 2.8 | .001
Movements of the catheter without guidewire | 3.1 ± 1.8 | 2.9 ± 1.4 | .06
Catheter movements near lesion | 5.4 ± 4.2 | 2.2 ± 0.6 | .02
Movements of the wire near lesion | 4.8 ± 2.2 | 1.4 ± 0.2 | .01
EPD movements after deployment | 4.2 ± 3.1 | 0.84 ± 0.6 | .01
EPD: Embolic Protection Device; CCA: Common Carotid Artery.

4 Discussion
The training of residents and young surgeons is a much discussed topic in the scientific literature [9]. In most cases, trainees must build their own "live" experience directly on the patient, which could be harmful for both the operator and the patient, especially when procedures have a certain degree of difficulty. In particular, surgeons who will perform a CAS procedure need to build an appropriate learning curve; however, the lack of experience in the first phase of this learning curve may result in intraprocedural maneuvers that can lead to intraoperative complications [10]. The use of VR simulators was approved in 2004 by the Food and Drug Administration (FDA), and VR simulators were judged to be reliable tools for the acquisition of experience in the training for endovascular procedures [11]. Since then, a considerable number of studies have been performed to bring evidence to the realism and potential benefits of these devices in such training processes [12, 13]. The reduction of time and expense for learning new skills, and the total safety of the acquisition of these skills, are well recognized benefits of VR [14]. Nevertheless, VR simulators have not been formally included in training programs yet [5]. In our study, VR simulators proved to be a valid and effective tool for the training of inexperienced operators. However, this was better demonstrated when the virtual procedure was easier, while the training was not as effective when the procedure was more difficult.


The use of the simulator from the first years of the degree course in medicine and surgery aims to gradually bring the student closer to the most innovative practices of vascular surgery. Thanks to the simulator, the surgeon can train and improve his practical skills, optimizing learning time and reducing the risks associated with multiple maneuvers. Through our daily practice associated with virtual reality simulation, a scale of risk has been established for the execution of CAS procedures, which all of us, from the least expert to the most advanced, should take into account. Some may argue that difficult performances could need more than three hours of training. For example, Van Herzeele et al. [15] described the benefit achieved on virtual reality metric performances of experienced operators after a two-day training course. However, none of the studies reported in the literature has yet shown the possible transfer of competence gained during VR to "in vivo" procedures. This could be an excellent point of discussion for future prospective randomized studies, if the ultimate goal is to use VR as a "stand alone" training tool prior to performing CAS procedures.

5 Conclusion
"Easy" virtual CAS procedures of non-experienced operators improved significantly after training. On the other hand, "difficult" virtual CAS procedures did not improve significantly after training.

References 1. Cherdal, S., Mouline, S.: A Petri Net model to simulate and analyse cerebral folate deficiency and hyperhomocysteinemia effects in Autism Spectrum Disorder. In: 2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), pp. 279–284 (2016) 2. Jebali, N., Beldi, S., Gharsallah, A.: RFID antennas implanted for pervasive healthcare applications. In: 2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), pp. 149–152 (2016) 3. Trabelsi, N., Aloui, K., Sellem, D.B.: 3D Active Shape Model for CT-scan liver segmentation. In: 2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), pp. 161–165 (2016) 4. Loussaief, S., Abdelkrim, A.: Machine learning framework for image classification. In: 2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), pp. 58– 61 (2016) 5. Coates, P.J.B., Zealley, I.A., Chakraverty, S.: Endovascular simulation is of benefit in the acquisition of basic skills by novice operators. J. Vasc. Interv. Radiol. 21, 130–134 (2010) 6. Willaert, W.I., Aggarwal, R., Van Herzeele, I., Plessers, M., Stroobant, N., Nestel, D., Cheshire, N., Vermassen, F.: Role of patient-specific virtual reality rehearsal in carotid artery stenting. Br. J. Surg. 99, 1304–1313 (2012)


7. Willaert, W.I., Cheshire, N.J., Aggarwal, R., Van Herzeele, I., Stansby, G., Macdonald, S., Vermassen, F.E.: European Virtual Reality Endovascular Research Team (EVERest). Improving results for carotid artery stenting by validation of the anatomic scoring system for carotid artery stenting with patient-specific simulated rehearsal. J. Vasc. Surg. 56, 1763–1770 (2012)
8. Neequaye, S.K., Aggarwal, R., Van Herzeele, I., et al.: Endovascular skills training and assessment. J. Vasc. Surg. 46, 1055–1064 (2007)
9. Willaert, W.I., Van Herzeele, I.: Carotid artery stenting - strategies to improve procedural performance and reduce the learning curve. Interv. Cardiol. 8(1), 50–56 (2013)
10. Clinical competence statement on carotid stenting: training and credentialing for carotid stenting - multispeciality consensus recommendations (A report of the SCAI/SVMB/SVS Writing Committee to develop a clinical competence statement on carotid interventions). J. Vasc. Surg. 41, 160–168 (2005)
11. US Food and Drug Administration Center for Devices and Radiological Health Medical Devices Advisory Committee Circulatory System Devices Panel Meeting (2004). http://www.fda.gov/ohrms/dockets/ac/04/transcripts/4033t1.htm. Accessed 25 Oct 2016
12. Kirkman, M.A., Ahmed, M., Albert, A.F., Wilson, M.H., Nandi, D., Sevdalis, N.: The use of simulation in neurosurgical education and training. A systematic review. J. Neurosurg. 121, 228–246 (2014)
13. Mazzaccaro, D., Nano, G.: The use of virtual reality for carotid artery stenting (CAS) training in type I and type III aortic arches. Ann. Ital. Chir. 83, 81–85 (2012)
14. Ahmed, K., Keeling, A.N., Fakhry, M., et al.: Role of virtual reality simulation in teaching and assessing technical skills in endovascular intervention. J. Vasc. Interv. Radiol. 21, 55–66 (2010)
15. Van Herzeele, I., Aggarwal, R., Neequaye, S., et al.: Experienced endovascular interventionalists objectively improve their skills by attending carotid artery stent training courses. Eur. J. Vasc. Endovasc. Surg. 35(5), 541–550 (2008)

Bio-Inspired EOG Generation from Video Camera: Application to Driver’s Awareness Monitoring
Yamina Yahia Lahssene, Mokhtar Keche, and Abdelaziz Ouamri
1 University of Sciences and Technology of Oran Mohamed Boudiaf (USTO-MB), Oran, Algeria
[email protected]
2 Signals and Images Laboratory, Electronique, University of Sciences and Technology of Oran Mohamed Boudiaf (USTO-MB), Oran, Algeria

Abstract. The electrooculogram (EOG) is a source of useful information. Many research works and applications, such as human-computer interaction (HCI) and physiological state monitoring, rely on it much more than on other resources such as video surveillance. In our work we are interested in awareness level classification for driver drowsiness detection, with the purpose of improving road safety. However, EOG acquisition requires placing electrodes around the subject’s eyes at all times, which is neither comfortable nor convenient for many applications. Therefore, we propose the generation of a bio-inspired EOG signal (pseudo EOG) using a video camera. Like the EOG signal, the bio-inspired one contains useful information on eye state and movements over time. Critical features were extracted or calculated from the bio-inspired EOG signal and used as inputs of a fuzzy logic classification of the awareness level over time.
Keywords: EOG · Bio-inspired EOG · Physiological state classification · Fuzzy logic · Driver awareness monitoring



1 Introduction
The physiological EOG signal represents eye movements as well as the subject’s condition, such as the state of drowsiness, which makes it highly useful for detecting changes in physiological states. It can be useful in applications such as human-computer interaction (HCI) in general, based on behavior analysis [1, 2], or more specifically in awake-sleep detection systems such as ours [3, 4]. Indeed, the EOG signal has proven its efficiency and accuracy for drowsiness detection [5–7]. Eye movements such as blinking are represented by the EOG signal, from which different parameters can be extracted and used as indicators for fatigue diagnosis [8, 9]. For the same purpose, other works exploited the electroencephalogram (EEG) signal [10], and in [11, 12] EEG was fused with EOG to enhance the results. However, for those approaches, sensing electrodes must be attached directly onto the driver’s body, which may be distracting and annoying. In addition, the presence of electromyogram (EMG) artifacts in the EOG affects the extracted information.


Therefore, in this paper, we develop an approach that provides a signal similar to the EOG signal using a practical and comfortable tool: a camera instead of electrodes. As described by Picot et al. [13], using high frame rate video, such as 200 fps, can replace the EOG signal for extracting, with the same accuracy, many eye blinking parameters. However, high frame rate video databases are still not easily obtainable. Our goal is therefore to extract, from the video signal, information of the same relevance as that which can be extracted from the EOG signal, using a slower camera (60 fps) of reasonable price, which generates a video sequence that requires simpler processing than that required by a 200 fps camera. The advantages brought by the proposed approach are threefold: providing information as relevant as that given by the physiological EOG signal; extending systems based on EOG, such as drowsiness detection systems, to video (monitoring by vision); and getting rid of the electrodes, i.e. providing more comfort to the user. To validate our approach, we used it to detect the drowsiness of the driver while evaluating his level of awareness. For the classification, we exploited the fuzzy logic method proposed by Picot [9], in order to test and compare the efficiency of two different combinations of features as inputs of the fuzzy logic classifier [9]. To validate the proposed approach, we created a video database of 10 persons, recorded using a 60 fps camera with a 640 × 480 resolution. Each video includes the transition between the two phases: awake and drowsy. The rest of this paper is organized as follows: Sect. 2 presents the EOG signal and its acquisition, then our method for generating a pseudo-EOG signal along with its application to drowsiness detection. In Sect. 3 we summarize and discuss the experimental results along with the advantages and limitations of our method. Finally, the conclusion and future work are described in Sect. 4.

2 Background and Proposed Method

2.1 Electrooculogram Definition

The human eye can be considered as an electrical dipole: its positive pole is the cornea, while the retina represents its negative pole. The electrical activity of the eye muscles, known as the EOG signal, can be measured as the eye movement changes, using two electrodes attached around the eye. The EOG signal is usually exploited to extract and describe eye movements, especially vertical ones such as blinks [9]. It is acquired at a high sampling rate (from 250 to 500 Hz), which makes the characterization of eye blinking very precise. In the EOG signal, blinks can be described through different parameters [13], which can be classified into three main categories: amplitude parameters, duration parameters and velocity (speed) parameters (Fig. 1).


Fig. 1. Sample of EOG signal from the CEPA database (eye opened, blinking, eye closed).

2.2 Bio-Inspired EOG Generation

The proposed method consists of generating a pseudo EOG from a surveillance video that includes the driver’s face. This video was recorded using a 60 fps camera (Fig. 2).

Fig. 2. Sequence of an eye blinking captured using a 60 fps camera.

To detect eye blinking we need to locate the eye position, which first requires locating the zone that contains the face in the image. For this we chose to apply a simple method, which consists of a binarization of the original gray-scale image, followed by a detection of the face’s contour. The eyes are then located by using horizontal projection (horizontal intensity average) [14]. As illustrated in Fig. 3, the two most significant valleys of the horizontal intensity average indicate the eyebrow and the upper eyelid, and the distance between them represents the amplitude of eye opening (or closing), calculated for each frame of the video; a minimal computational sketch of this step is given after Fig. 3.

Fig. 3. Horizontal intensity average: (a) Opened eye, (b) Closed eye.
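The following Python sketch illustrates the horizontal-projection step described above. It is an illustrative reconstruction rather than the authors’ implementation; the valley-detection call (scipy.signal.find_peaks) and the prominence value are assumptions.

Illustrative Python sketch, eye localization by horizontal intensity average:

import numpy as np
from scipy.signal import find_peaks

def eye_opening_amplitude(eye_region_gray):
    # Horizontal projection: mean intensity of each image row.
    profile = eye_region_gray.astype(float).mean(axis=1)
    # Eyebrow and upper eyelid appear as dark rows, i.e. valleys of the
    # profile; detect valleys as peaks of the inverted profile.
    valleys, props = find_peaks(-profile, prominence=5.0)  # assumed tuning value
    if len(valleys) < 2:
        return None  # detection failed (e.g. eye region too dark)
    # Keep the two most significant valleys: the upper one is taken as the
    # eyebrow, the lower one as the upper eyelid.
    order = np.argsort(props["prominences"])[::-1][:2]
    eyebrow, eyelid = np.sort(valleys[order])
    return int(eyelid - eyebrow)  # eye opening amplitude, in pixel rows

Applying this function to every frame of the 60 fps video yields one distance value per frame, i.e. the samples of the pseudo EOG signal discussed below.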


The pseudo EOG signal representing the vertical movements of the eye, i.e. eye blinking, is the sequence of eyebrow-to-upper-eyelid distances calculated from each of the successive video frames. Figure 4 shows a sample of such a signal, whose shape is similar to that of the EOG signal sample in Fig. 1, taken from a database of the CEPA (Centre d’Etudes de Physiologie Appliquée, Strasbourg).

Fig. 4. Sample of our generated pseudo EOG (eye opened, blinking, eye closed).

Many parameters were extracted from the pseudo EOG signal and from its derivative, which represents the speed of the signal variation over time. These parameters may be instantaneous or averaged over time. Obviously, the latter are more pertinent for the estimation of the driver’s state (awake or drowsy). Indeed, the best results were obtained by using parameters calculated every second by averaging over a sliding window of 20 s. The parameters selected for the classification are defined below (a minimal computational sketch follows the list):
• The time measured between the half-rise amplitude and the half-fall amplitude (D50).
• The percentage of time during which the eye is at least 80% closed (80% PERCLOS).
• The frequency of blinking (F).
• The ratio between the blinking amplitude and its maximum speed during the closing period (A/PCV).
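To illustrate how such averaged parameters can be obtained from the pseudo EOG, the sketch below computes two of them, the 80% PERCLOS and the blinking frequency F, over a 20 s sliding window. The open/closed criterion (20% of the median opening) is an assumption for illustration, not the authors’ exact procedure.

Illustrative Python sketch, averaged blink parameters over a 20 s sliding window:

import numpy as np

FPS = 60        # camera frame rate used in this work
WINDOW_S = 20   # sliding window length, in seconds

def perclos80_and_frequency(pseudo_eog, t_index):
    # Window of pseudo-EOG samples ending at frame t_index.
    start = max(0, t_index - WINDOW_S * FPS)
    window = np.asarray(pseudo_eog[start:t_index], dtype=float)
    if window.size == 0:
        return 0.0, 0.0
    # A frame counts as "at least 80% closed" when the eye opening falls
    # below 20% of the typical (median) opening -- assumed criterion.
    closed = window < 0.2 * np.median(window)
    perclos80 = 100.0 * closed.mean()
    # Count blinks as open-to-closed transitions within the window.
    n_blinks = int((np.diff(closed.astype(int)) == 1).sum())
    frequency = n_blinks / (window.size / FPS)   # blinks per second
    return perclos80, frequency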

It has been observed from the experiments carried out that when the subject’s state changes from awake to sleepy, the time parameters increase (blinks become slower), while the speed parameters decrease. It has also been found that, for most subjects, the amplitude parameters are not discriminating.

2.3 Awareness Level Classification

In order to classify the driver’s state, we implemented a fuzzy logic classifier as a qualitative classification model [14–17]. The first step of this classifier is to transform each parameter into a positive fuzzy variable not greater than 1, known as the membership degree to the “drowsy” state: the closer the membership degree is to 1, the higher the probability of being drowsy. The membership function f_{a,b}(x) is defined by Eq. (1), and the coefficients a and b are given by Eqs. (2) and (3) respectively.


f_{a,b}(x) = \begin{cases} 0 & \text{if } x \le a \\ \dfrac{x - a}{b - a} & \text{if } a \le x \le b \\ 1 & \text{if } x \ge b \end{cases}    (1)

a = s_F - 0.25\, s_F    (2)

b = s_F + 0.25\, s_F    (3)

According to [14, 15], s_F is the threshold selected for each feature to standardize the membership function. The offsets −0.25·s_F and +0.25·s_F in Eqs. (2) and (3) centre the membership function on the threshold s_F. The corresponding membership degrees of the features D50, 80% PERCLOS, F and A/PCV are µ_D50, µ_PERCLOS, µ_F and µ_(A/PCV) respectively. The second step consists of merging the fuzzy variables at each time i using Eq. (4): the fusion variable is computed at each instant as the average of the different membership degrees.

\mu_{fusion}(i) = \frac{1}{n} \sum_{j} \mu_j(i), \quad j \in \{D50, PERCLOS, F, A/PCV\}    (4)

The value of µ_fusion lies between 0 and 1: in our case, a value of 1 corresponds to the “drowsy” physiological state and a value of 0 to the “awake” state. The driver’s state is classified as “drowsy” if µ_fusion is greater than 0.5, and as “awake” otherwise. For a comparative study, we applied this classifier with the same rules and the same four input features as in [14, 15]: D50, 80% PERCLOS, A/PCV and the blinking frequency F, all extracted from the pseudo EOG signal. After considering each of the four features alone, we found that D50 and A/PCV are more pertinent than the others, so we also used only these two as inputs of the fuzzy logic classifier [9], in order to compare their results with those obtained using the four features mentioned earlier. A minimal sketch of this membership and fusion computation is given below.
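The sketch below implements Eqs. (1)–(4) and the 0.5 decision rule. The threshold values are placeholders only; the real, feature-specific thresholds s_F would have to be set as in [14, 15].

Illustrative Python sketch, fuzzy membership, fusion and decision:

def membership(x, s_f):
    # Eq. (1), with a and b centred on the threshold s_f (Eqs. (2)-(3)).
    a, b = 0.75 * s_f, 1.25 * s_f
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

# Illustrative thresholds only; the actual values are feature-specific.
THRESHOLDS = {"D50": 0.12, "PERCLOS": 3.0, "F": 0.25, "A_PCV": 0.10}

def classify(features):
    # features: averaged parameter values at instant i -> (mu_fusion, label).
    degrees = [membership(features[k], THRESHOLDS[k]) for k in features]
    mu_fusion = sum(degrees) / len(degrees)           # Eq. (4)
    return mu_fusion, ("Drowsy" if mu_fusion > 0.5 else "Awake")

print(classify({"D50": 0.15, "PERCLOS": 4.2, "F": 0.2, "A_PCV": 0.09}))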

3 Results and Discussion
We have constructed our own database of videos of 10 persons, recorded using a Point Grey camera operated at 60 fps with a 640 × 480 resolution. Each video is composed of two phases: during the first phase the person is awake, whereas during the second phase he or she is drowsy. Along with each video, the ground truth, which consists of the beginning of the drowsy phase, is stored in the database. This video quality is sufficient to generate a pseudo EOG signal. In what follows, we present the results of our bio-inspired approach together with its application to the classification of awareness level and drowsiness detection for the subjects of our database (Fig. 5).


Fig. 5. Subject 2: pseudo EOG signal (bio-inspired EOG) and blinking localization (upper eyelid-eyebrow distance versus frame number).

3.1 Classification Using Fuzzy Logic

It is clear from Fig. 6a that a decision based on each blink independently is not reliable, since unexpectedly slow blinks may occur while the subject is awake and fast blinks can be present during the drowsiness phase, due to external effects.

Fig. 6. Classification of subject 9’s physiological state using fuzzy logic: (a) based on single blinking features (instantaneous decision), (b) based on averaged blinking features.


On the other hand, using the averages of the blinking features (Fig. 6b) reduces the decision errors. Therefore, in the following, only the results obtained using averaged parameters are presented. The decisions on the physiological state of subject 2 using fuzzy logic based on the combinations (D50, A/PCV, PERCLOS, F) and (D50, A/PCV) are given in Figs. 7 and 8 respectively.

Fig. 7. Fuzzy logic classification of subject 2’s state, using 4 features (D50, A/PCV, PERCLOS, and F).

Fig. 8. Fuzzy logic classification of subject 2’s state, using 2 features (D50 and A/PCV).

Table 1 and Fig. 9 present the results of the classification of the physiological state of each of the ten subjects, using the fuzzy logic model [13, 14] with an input combination of four parameters (D50, A/PCV, PERCLOS and F) and then of two parameters (D50 and A/PCV).


Fig. 9. Comparison between the true instant of transition between the two physiological states (awake and drowsy) and those estimated using the fuzzy logic model with 4 features (D50, A/PCV, PERCLOS, and F) and with 2 features (D50 and A/PCV), for the ten subjects.

Table 1. True Transition Instant (T.T.I.) between the awake and drowsy states, and those estimated using fuzzy logic based on four features (FL.T.I.4P: D50, A/PCV, PERCLOS, F) and on two features (FL.T.I.2P: D50, A/PCV), for the ten subjects.

Subject          1   2   3    4   5   6   7   8   9   10
T.T.I. (sec)     47  62  99   81  64  75  53  56  44  40
FL.T.I.4P (sec)  54  76  105  51  70  78  39  47  47  27
FL.T.I.2P (sec)  54  63  105  51  70  78  39  47  47  27

3.2 Interpretation of the Presented Results

We evaluated the classification results by considering the instant of transition from the awake to the drowsy state as a criterion. Some results are very close to the real instant of transition, for example for subjects 6 and 9. We note that incorrect classifications occurring during the period when the subject is awake (false alarms) are considered not serious. Regarding the use of 2 versus 4 parameters, the fuzzy logic classification [13, 14] results obtained using two parameters (D50, A/PCV) and four parameters (D50, A/PCV, PERCLOS, F) are the same for all subjects, except subject 2, for which the drowsiness state is detected with a delay of 14 s when using 4 parameters and of just 1 s when using 2 parameters.

4 Conclusion and Future Work
In this work we first proposed a method for generating a pseudo EOG from a video signal of the driver’s face, recorded by a medium-speed, inexpensive camera. Using a camera instead of electrodes is certainly more comfortable for the driver and offers the possibility of extrapolating the techniques applied to the EOG signal to the pseudo EOG signal. The idea of generating a signal similar to the EOG signal from a video has already been discussed by Picot et al. [13, 15]. However,


this method requires a high-speed 200 fps camera, and therefore complex processing. Our method has the advantage of requiring a relatively slower and thus cheaper camera, and simpler processing. The second part of the paper consists of an application of the bio-inspired EOG in place of the EOG itself. Many features describing eye blinks were extracted from our generated pseudo EOG signal, belonging to three main categories: duration, amplitude and speed. Using features averaged over a sliding window instead of instantaneous features allows misplaced short blinks to be pruned and thus improves the classification results. We also found that using 4 parameters (D50, A/PCV, PERCLOS, F) gives nearly the same results as using 2 parameters (D50 and A/PCV) as inputs of the classifier. In order to exploit more of the extracted parameters, we intend, in future work, to implement other classifiers that may give better results, such as the SVM classifier [17, 18], which, unlike the fuzzy logic method, does not require prior determination of decision thresholds for the input features, and therefore gives the freedom to choose any parameter from the pseudo EOG signal and its derivative as input to the classification system.
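As a preliminary illustration of this future direction, the sketch below trains an SVM on windows of averaged blink features with scikit-learn; the feature matrix, labels and hyper-parameters are placeholders rather than results of this study.

Illustrative Python sketch, SVM classification of averaged blink features:

import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

# X: one row per 1 s instant, columns = averaged (D50, A/PCV, PERCLOS, F);
# y: ground-truth state (0 = awake, 1 = drowsy).  Random data stands in
# for the real pseudo-EOG features here.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (rng.random(300) > 0.5).astype(int)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y, cv=5)
print("cross-validated accuracy: %.2f +/- %.2f" % (scores.mean(), scores.std()))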

References 1. Jaouedi, N., Boujnah, N., Htiwich, O., Bouhlel, M.S.: Human action recognition to human behavior analysis. In: 7th International Conference On Sciences Of Electronics, Technologies Of Information And Telecommunications (SETIT), pp. 263–266 (2016) 2. Smari, K., Bouhlel, M.S.: Gesture recognition system and finger tracking with kinect: steps. In: 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), pp. 544–548 (2016) 3. Kumar, D., Poole, E.: Classification of EOG for human computer interface. In: Proceedings of second joint EMBS/BMES Conference Houston, pp. 23–26. TX, USA, Oct (2002) 4. Robert, J., Jacob, K.: Eye Movement-Based Human-Computer interaction techniques: Toward Non-Command Interfaces. Advances in human computer interaction 4, 151–190 (1993) 5. James, B., Sharabaty, H., Esteve, D.: Automatic EOG analysis: a first step toward automatic drowsiness scoring during wake-sleep transitions. Somnologie 12, 227–232 (2008) 6. Jiao, Y., Yong, P., Bao-Liang, L., Xiaoping, C., Shanguang, C., Chunhui, W.: Recognizing slow eye movement for driver fatigue detection with machine learning approach. In: IEEE, International Joint Conference on Neural Networks (IJCNN), pp. 4035–4401 (2014) 7. Xuemin, Z., Wei-Long, Z., Baa-Liang, L., Xiaoping, C., Shanguang, C., Chunhui, W.: EOGbased drowsiness detection using convolutional neural networks. In: IEEE, International Joint Conference on Neural Networks (IJCNN), pp. 128–134 (2014) 8. Galley, N., Scleicher, R., Galley, L.: Blink parameter as indicators of driver’s sleepinesspossibilities and limitations. 16.09.2003 9. Picot, A.: Driver drowsiness detection using both physiological and video information. PhD thesis PhD Thesis in Automatic and Production, the Doctoral School of Electronics, Electrotechnics, Automation and Signal Processing, laboratory GIPSA-Lab / DA (2009) 10. Poorna, S.S., Arsha, V.V., Aparna, P.T.A., Gopal, P., Nair, G.J.: Drowsiness detection for safe driving using PCA EEG signals. In: P. Pattnaik, S. Rautaray, H. Das, J. Nayak (eds) Progress in Computing, Analytics and Networking. Advances in Intelligent Systems and Computing, vol 710. Springer (2018)


11. Golz, M., Sommer, D., Chen, M., Mandic, D., Trutschel, U.: Feature fusion for the detection of microsleep events. Springer Science + Business Media, LLC. Manufactured in The United States. J. VLSI Signal Processing 49, 329–342 (2007) 12. Lan-lan Chen, Y., Zhao, J. Zhang, J.Z. Zou: Automatic detection of alertness/drowsiness from physiological signals using wavelet-based nonlinear features and machine learning. Elsevier Ltd, Expert Systems with Applications 42, 7344–7355 (2015) 13. Picot, A., Caplier, A., Charbonier, S.: Comparison between EOG and high frame rate camera for drowsiness detection. In: Proc. of the IEEE Workshop on Applications of Computer Vision, Snowbird, USA (2009) to be published 14. Parmar, N.: Drowsy driver detection system. In: Technical report, Department of Electrical and Computer Engineering, Ryerson University (2002) 15. Picot, A., Charbonnier, S., Caplier, A.: On-Line detection of drowsiness using brain and visual information. In: IEEE transactions on systems, man, and cybernetics-Part A: systems and humans, vol. 42, No. 3, May (2012) 16. Torjemen, N., Zhioua, G.E.M., Tabbane, N.: QoE model based on fuzzy logic system for offload decision in HetNets environment. In: 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), pp. 1–6 (2016) 17. Fillali, F., Chettaoui, N., Bouhlel, M.S.: Towards the automatic evaluation of the quality of commercially oriented web interfaces. In: Proceedings of the IEEE 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), pp. 237–245 (2016) 18. Awad, M., Khanna, R.: Support Vector Machines for Classification. Efficient Learning Machines. Apress, Berkeley, CA (2015)

A Memory Training for Alzheimer’s Patients
Fatma Ghorbel, Elisabeth Métais, Fayçal Hamdi, and Nebrasse Ellouze
1 CEDRIC Laboratory, Conservatoire National des Arts et Métiers (CNAM), Paris, France
[email protected], {metais,faycal.hamdi}@cnam.fr
2 MIRACL Laboratory, University of Sfax, Sfax, Tunisia
[email protected]

Abstract. Numerous studies have confirmed that Alzheimer’s patients may benefit from memory rehabilitation processes. In this context, we propose a non-pharmacological training, named Autobiographical Training. It is an adaptive and accessible “Question/Answer” training whose purpose is to stimulate the patient’s memory. It is proposed in the context of a memory prosthesis called Captain Memo. (1) Autobiographical Training does not use general facts or false examples; it automatically proposes to each user questions related to his or her own life, e.g., events that he or she lived. (2) It automatically adjusts the level of difficulty of the generated questions depending on the progression of Alzheimer’s disease. (3) It supports multilingualism and multiculturalism. (4) It offers accessible user interfaces. We evaluated the accessibility and the usability of Autobiographical Training with 18 participants. The results confirmed that it is accessible and that frequent use of this training helps patients recall some of the information on which they received the training.
Keywords: Alzheimer’s disease · Memory training · Autobiographical question · Adaptive user interface · Accessibility · Multilingualism · Multiculturalism



1 Introduction
The number of Alzheimer’s patients is increasing [1]. Indeed, the Alzheimer’s disease association reported that in 2016 the number of patients was estimated at nearly 44 million. By 2040, with the aging of populations, this number will rise progressively to about 80 million [2]. As a consequence of the rising number of patients and the corresponding social costs of care, attention to their problems is growing [1]. Their capabilities ought to be protected to preserve their quality of life [3, 4]. Current pharmacological interventions have limited efficacy in slowing the progression of Alzheimer’s disease symptoms and have many side effects [5, 6]. Thus, a great number of Alzheimer’s patients and/or their families refuse to use pharmacotherapies to face Alzheimer’s disease. Numerous studies show that non-pharmacological enhancement strategies are more efficacious than pharmacological ones [7], while many other studies support a multi-factorial therapeutic approach to


Alzheimer’s disease, which includes both pharmacological and non-pharmacological interventions [5, 9]. Non-pharmacological interventions have a significant role in delaying Alzheimer’s disease progression [1, 3, 5, 6, 8], reducing functional impairments [3, 5, 6] and helping to retain some information longer [3, 9]. They thus help improve the quality of life of Alzheimer’s patients as well as of their caregivers [3, 5, 6, 8, 9]. These trainings may be completely computerized [4, 6]. In this paper, we propose a non-pharmacological memory training, called Autobiographical Training, for Alzheimer’s patients. It is a “Question and Answer” training that attempts to preserve the patient’s recall abilities. The highlight of our work is that it takes the patient’s profile into account: (1) it generates for each user his or her own questions based on his or her private life; (2) it automatically adjusts the level of difficulty of these questions according to the progression of the disease; (3) it supports multilingualism and multiculturalism; (4) it offers accessible user interfaces and interactions. The remainder of the paper is structured as follows. In Sect. 2, we present Autobiographical Training. Section 3 presents its evaluation. In Sect. 4, we detail related work. In Sect. 5, we conclude and give perspectives.

2 Overview of Autobiographical Training
Autobiographical Training is an adaptive and accessible “Question and Answer” training. It aims to refresh the patient’s memory. It is proposed as a service of the Captain Memo memory prosthesis [10].

2.1 General Algorithm

In contrast to related work, which includes general facts or false examples, Autobiographical Training uses the patient’s private life as its knowledge source. The questions are based on information that the patient or his or her surroundings introduced beforehand, so each patient has his or her own collection of questions. The algorithm takes into account the different stages of Alzheimer’s disease. It defines three levels of question difficulty, LJ = {L1, L2, L3}. L1 is the easiest and the questions gradually become harder through L3, which is the hardest. L1 reminds the patient about himself or herself and his or her first-degree relatives, without going into details, e.g., “What is your job?” and “What is the name of your son?”. L2 is based on the patient’s second-degree relatives, e.g., “Is Alice black-haired?”. L3 reminds the patient about details related to his or her family, surroundings and events that he or she lived, e.g., “When did you travel to Paris?”. The first questions of L1 are very easy, e.g., “What is your name?”. The aim is to let the patient experience some level of success, to encourage frequent use of our training. Each question is identified by a unique number QI/J, i.e., question I of difficulty level J. The algorithm starts each exercise session with Q1/1. If the patient successfully answers 10 questions of level LJ, the rest of its questions are skipped and the algorithm moves on to the questions of LJ+1.


After each question, the Alzheimer’s patient is asked to give the answer. If the answer is incorrect, the right one is displayed and the question is repeated again later. For each level of difficulty LJ, we define a set of questions CJ consisting of 80 different questions, CJ = {Q1/J … Q80/J}. Q1/J is the easiest, and the questions gradually become harder up to the last question Q80/J, which is the hardest. For all levels, the questions and their order are validated by a neurologist.
Autobiographical Training aims to remind the patient about his or her family. However, interpersonal relations depend on the language and the culture, and translating a family link from one culture or language to another is not solved by a simple translation of terms. The algorithm takes into consideration the differences existing between languages (Arabic, French and English) and cultures. Take the example of Q2/2, which is based on the cousinhood family link: in French there exist 2 terms representing this relation, based on the gender of the cousin, and 8 terms in Arabic, capturing in addition to the cousin’s gender the mother/father lineage as well as its gender, whereas only one term exists for this relationship in English. For each question QI/J, we define an algorithm ALQI/J. It takes as input the unique identifier I/J of QI/J and the chosen language/culture, and it returns the corresponding alternative of QI/J. Algorithm 1 presents a part of ALQ2/2.

Algorithm 1. ALQ2/2.
Switch (Language)
  Case English:
    Q2/2 := “What is the name of your cousin?”
  Case French:
    if (Gender of cousin == “Male”) then
      Q2/2 := « Donner le prénom de votre cousin ? »
    else
      Q2/2 := « Donner le prénom de votre cousine ? »
  Case Arabic:
    If (Gender of the patient == “Female”) Then
      Q2/2 := « ما هو اسم زوجك الحالي؟ »;
    Else If (The patient is polygamous) Then
      Q2/2 := « ما هي أسماء زوجاتك الحالية؟ »;
    Else
      Q2/2 := « ما هو اسم زوجتك الحالية؟ »;

For each question QI/J, an algorithm ALVI/J is associated to verify whether the response to QI/J is available (from the information introduced beforehand by the patient or the caregiver). If the response is not available, QI/J is skipped; otherwise it is displayed. After each exercise session, our training displays the total number of questions asked, the current level of difficulty and the percentage of correct answers (the “Successful Score”). The intervention supports ongoing evaluation of the person’s ability to respond to the system tasks, by comparing the “Successful Score” to previous sessions. The general algorithm of Autobiographical Training is the following (a minimal executable sketch is given after Algorithm 2):

Algorithm 2. The general algorithm of Autobiographical Training.
Inputs: Information introduced by the Alzheimer’s patient or the caregiver.
Outputs: A set of questions, successful score.
Language := Arabic, French or English;
I := 1; J := 1;  /* Initialize LJ with L1 and initialize QI/J with Q1/1 */
while (the patient uses the training AND the system has more questions) do
  CJ := CJ − QI/J;  /* Delete QI/J from the queue CJ */
  QI/J := ALQI/J(I/J, Language);
  if (ALVI/J(QI/J) returns “Not available”) then
    QI/J is skipped;
  else
    QI/J is asked / displayed;
    if (the response is correct) then
      A successful feedback is displayed;
    else
      A wrong feedback is displayed; the right response is displayed;
      CJ := CJ + QI/J;  /* Add QI/J back into the queue CJ, to re-ask it */
    end if
  end if
  if (I == 80 /* CJ has no more questions */ OR the user has given 10 correct responses) then
    J := J + 1;  /* The next level of difficulty (LJ+1) */
    I := 0;
  end if
  I := I + 1;  /* The next question (QI+1/J) */
end while
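The following Python sketch mirrors the session logic of Algorithm 2. The question store, the availability test and the levels’ contents are hypothetical stand-ins for the Captain Memo knowledge base, and the cap on the number of questions per session is an added safeguard.

Illustrative Python sketch, session loop of Autobiographical Training:

def run_session(levels, answer_available, ask, max_questions=200):
    # levels: list of question lists (L1..L3); answer_available(q) -> bool;
    # ask(q) -> True if the patient answers correctly.  Returns the
    # "Successful Score" (percentage of correct answers) of the session.
    asked = correct = 0
    for level in levels:
        queue = list(level)                # C_J: remaining questions of level L_J
        correct_in_level = 0
        while queue and correct_in_level < 10 and asked < max_questions:
            q = queue.pop(0)
            if not answer_available(q):    # ALV: skip questions without stored answers
                continue
            asked += 1
            if ask(q):
                correct += 1
                correct_in_level += 1      # 10 successes -> move to the next level
            else:
                queue.append(q)            # re-ask the failed question later
    return 100.0 * correct / asked if asked else 0.0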

2.2 Designing the User Interfaces

Autobiographical Training aims to offer accessible user interfaces for Alzheimer’s patients. The design is based on our 146 accessibility guidelines for designing user interfaces for Alzheimer’s patients, presented in [11].
Visibility: To maximize the contrast, we chose an orange background and a black foreground color. We use a 26 pt type size for text, an Arial font, a medium face type and lower case. Larger images are used.
Multimodalities: The output modalities are voice/audio and vision. The input modalities are mouse + keyboard, speech-to-text, and touch. We provide vocal and textual feedback carrying the same messages. We offer two modalities to type the response, dictation and keyboard, and the possibility of alternating between the two modes.
Auditory assistance: We use an auditory background to assist the patients in their interactions. We use a higher volume and a male voice, and we let users adjust the volume and the text size themselves. We use both natural speech (a recorded male voice) and synthetic speech.


Feedback and support: We provide a feedback message after each question is answered. Messages are short and easy to understand. Two different tones are used for erroneous and successful entries.
Understandability: We provide easy-to-understand textual and pictorial content. Graphical metaphors are used to ease text understanding, e.g., a “sad emoticon” metaphor for the wrong-answer feedback.
Fun: We resort to a funny Disney parrot, known for its intelligence and for repeating human sentences. We provide funny emoticons and ask some droll questions, e.g., “Do you wear clothes?”. Figure 1 shows an example of a quiz question in the Arabic language (Q2/2).

Fig. 1. Screenshot which shows the quiz question Q2/2 presented in Arabic language.

3 Evaluation of Autobiographical Training
We evaluated the accessibility and the usefulness of Autobiographical Training. A total of 18 Alzheimer’s patients {ADP1, ADP2, …, ADP18} were recruited to participate in this study. Most participants were living in a nursing home in Sfax, Tunisia. We asked each patient’s legal sponsor for a consent letter. Their Mini Mental State Examination (MMSE)1 [12] scores ranged from 8 to 27 at baseline. {ADP1, ADP2, …, ADP9} were early-stage Alzheimer’s patients (MMSE score above 20). {ADP10, ADP11, …, ADP15} were moderate-stage Alzheimer’s patients (MMSE score between 10 and 20).

1 The MMSE is a questionnaire used to measure cognitive impairment. Scores range from 0 to 30; a higher score indicates better performance. According to [12], an MMSE below 10 means “late stage”, an MMSE between 10 and 20 means “mild stage” and an MMSE above 20 means “early stage”.


The others were late-stage Alzheimer’s patients (MMSE score below 10). They were aged between 55 and 81 years old (median 64). Only very few participants had a background in using computers and related technologies. We excluded participants with overt behavioral disturbances, severe aphasia, or severe auditory and/or visual loss.

3.1 Accessibility Evaluation

The evaluation consisted on one test session to each patient. Its duration depends on the cognitive performance of the Alzheimer’s patient. At mean, it was about one hour and 18 min. At the beginning of each session, we gave a brief introduction and a live demo of Autobiographical Training. Table 1. Results of the accessibility evaluation. 1 2 Overall (mean is 3,85) Autobiographical Training is easy to use 3 Autobiographical Training is facile to learn 5 User interfaces are funny You are satisfied about the proposed user interfaces 3 Visibility (mean is 4,7) You can read the main body 4 You can read headlines Ability to adjust text size is helful Image dimensions are acceptable Speech-to-text (mean is 3,77) Speech-to-text mode is supportive 6 Terminology (mean is 4,67) Button names are significant Icons are facile to comprehend Use of text labels improves the interpretation of the icon Error feedbacks are supportive 4 Informative feedbacks are straightforward Hearing (mean is 3,69) The speed of the voice is seemly 6 Oral feedbacks are supportive 6 2 Spoken interactions are supportive 6 2 Ability to adjust volume is supportive

3 4

5

Mean

3 5 7 3,88 3 10 3,27 5 3 10 4,27 3 3 9 4 4 10 4,11 3 15 4,83 18 5 2 16 4,88 4

2 2

8 3,77

2 16 4,88 3 13 4,61 18 5 2 10 4 2 16 4,88 10

2 3,11 10 3,33 10 3,33 18 5


This study was performed via a questionnaire. It includes qualitative questions according to ISO 9241 part 10 (how the information space is navigated and how understanding of the content is assessed) and quantitative questions according to ISO 9241 part 12 (how the information presentation is assessed). It covers 5 main dimensions: “overall”, “visibility”, “speech-to-text”, “terminology” and “hearing”. Scores range on a scale from 1 (predominantly disagree) to 5 (predominantly agree). We helped the patients use Autobiographical Training and fill in the questionnaire. Table 1 summarizes the results. As shown in Table 1, the mean score of every mentioned dimension lies between 3.69 and 4.7. The participants largely agreed that Autobiographical Training is accessible. 6 patients chose the keyboard modality rather than the dictation modality, as their voice volume was not high enough to be captured; we will use an acoustic model for older seniors for the speech-to-text mode. Illiterate patients were satisfied with the speech-to-text mode. 6 patients ignored the auditory background as they did not understand the generated messages.

3.2 Usefulness Evaluation

The exercise program consisted of one exercise session of one hour, five times a week, for six weeks. Patients were evaluated at the end of each session (successful score). For all participants, the successful score at baseline and the evolution of the mean successful score over the five sessions of each week are presented in Table 2. For all patients, the overall mean of the successful score increases from an average of 40.55% at baseline to 46.65% at the end of the study. Therefore, we conclude that Autobiographical Training can help Alzheimer’s patients retain some of the information on which they received training. For early-stage Alzheimer’s patients (MMSE score above 20), the overall mean of the successful score increases from an average of 65% at baseline to 70.69% at the end of the study. For moderate-stage Alzheimer’s patients (MMSE score between 10 and 20), it increases from an average of 42.03% at baseline to 46.81%. For late-stage Alzheimer’s patients (MMSE score below 10), it increases from an average of 14.67% at baseline to 16.45%. Alzheimer’s patients in the early or mild stage benefit more from our intervention than ADP17 and ADP18, who are in the final sub-stages of the disease process.

Mean of the evaluations of Fourth week 68,2 63,8 66,6 68,8 62,3 77,5 60,5 90,6 57,9 68,5 46,3 53,9 40,4 54,9 43,9 32,2 45,3 Hospitalized 16,8 14,8 15,8

Mean of the evaluations of third week 67,9 64,2 65,7 68,5 61,6 76,6 58,7 90,7 57,2 67,9 45,9 53,5 40,1 53,8 43,6 31,2 44,7 22,1 16,3 14,3 15,3

Table 2. Usefulness evaluation’s results.

Baseline Mean of the Mean of the evaluations of first evaluations of second week week Early stage Alzheimer’s patients (MMSE better than 20) 66,9 67,2 ADP1 65,9 ADP2 61,4 62,6 62,9 ADP3 62,1 64,2 64,8 ADP4 64,2 65,7 65,7 ADP5 58,4 60,1 60,7 ADP6 73,1 75,3 75,8 ADP7 56,9 57,9 58,6 ADP8 87,5 88,6 89,4 ADP9 55,2 56,2 56,7 Mean 65 66,4 66,9 Moderate stage Alzheimer’s patients (MMSE in [10 .. 20]) ADP10 42,1 43,5 45,7 ADP11 48,2 49,2 50,6 ADP12 39,5 39,2 39,5 ADP13 51,7 52,2 53,4 ADP14 41,2 42 42,6 ADP15 29,3 30,2 30,2 Mean 42 42,7 43,7 Late stage Alzheimer’s patients (MMSE below than 10) ADP16 28,5 27,1 28,5 ADP17 16,1 16,3 16,1 ADP18 13,2 13,4 13,7 Mean 14,6 14,9 14,9 16,9 15,6 16,3

47,2 54,2 41,0 55,6 44,6 32,5 45,9

69,2 64,1 67,5 69,1 63,3 78,8 59,7 91,2 58,6 69,1

Mean of the evaluations of fifth week

16,9 15,9 16,4

47,9 55,2 42,9 55,9 45,2 33,5 46,8

70,3 65,9 67,9 70,9 64,8 80,2 62,9 92,7 60,2 70,6

Mean of the evaluations of sixth week


4 Related Work
In this section, we review some “Question and Answer” memory trainings, namely ADcope [9], StimCards [4], Savion [6], Zpaly [8] and the unnamed software of [1]. Table 3 gives a general comparison between them.

Table 3. General comparison of the reviewed “Question and Answer” memory trainings.

Tool | Target user | Theme | Level of training | Technology | Multilingualism/Multiculturalism
ADcope | Alzheimer’s patient | General knowledge facts | Not adjustable | Smart phone | No
No named [1] | People with dementia | General knowledge facts | Not adjustable | Bike + PC, C#, game | No
StimCards | All publics (Alzheimer’s patient, …) | General knowledge facts + possibility to create personalized questions | Not adjustable | Game, Java | No
Savion | Older people, people with dementia | Language skills, calculation, nonverbal memory and visual-spatial skills | Adjustable by the user | 3D, touch, keyboard + mouse | No
Zpaly | Patient with early Alzheimer | General knowledge facts | Not adjustable | Game | No
Autobiographical Training | Alzheimer’s patient | The patient’s private life and background | Three levels of difficulty, automatically adjustable | RDF, touch, keyboard + mouse, game | Yes

Compared to related work, which generates only static questions based on general facts, our adaptive intervention generates for each user his or her own questions related to his or her private life. It sets the level of difficulty of these questions based on the progression of the disease, and it supports multilingualism and multiculturalism.

5 Conclusion
This paper presents Autobiographical Training, an adaptive and accessible “Question and Answer” exercise training that stimulates the Alzheimer’s patient’s recall abilities. The training is repeatedly performed until the patient demonstrates the ability to recall the information in everyday life. It is developed in the context of Captain Memo, which is proposed to help Alzheimer’s patients palliate mnesic problems. Compared to related work, Autobiographical Training presents 4 main advantages. (1) It is not based on general content; it intelligently proposes to each user specific questions based on his or her life, e.g., events that he or she lived. (2) It automatically adjusts the level of difficulty of these questions based on the progression of the disease. (3) It supports multilingualism and multiculturalism. (4) It offers accessible user interfaces. The evaluation phase confirmed that Autobiographical Training is


accessible and that it helps Alzheimer’s patients retain some of the information on which they received training. However, according to the neurologist, these gains do not transfer to other areas in which the patient does not receive any memory training. Future work will be devoted to allowing users to answer the questions in natural language.
Acknowledgments. The present work was funded by the VIVA project2. The authors wish to thank Dr. Salma SAKKA CHARFI (Department of Neurology, Habib BOURGUIBA Hospital, Tunisia) for her valuable help.

References 1. Chilukoti, N.: Assistive technology for promoting physical and mental exercise to delay progression of cognitive degeneration in patients with dementia. In: Proceedings of the biomedical circuits and systems conference, pp. 235–238 (2007) 2. Massoud, F., Gauthier, S.: Update on the pharmacological treatment of Alzheimer’s disease. Curr. Neuropharmacol. 8(1), 69–80 (2010) 3. Yu, F., et al.: Cognitive training for early-stage Alzheimer’s disease and dementia. J. Gerontological Nurs. 35(3), 23–29 (2009) 4. Jost, C., Le Pévédic, B., Duhaut, D.: Robot is best to play with human. In: Proceedings of the international symposium on robot and human interactive communication (2012) 5. Olazarán, J., et al.: Nonpharmacological therapies in Alzheimer’s disease: a systematic review of efficacy. Dement. Geriatr. Cogn. Disord. 30(2), 161–178 (2010) 6. Sarne-Fleischmann, V.: Computer-Supported Personal Interventions for Elderly People with Cognitive Impairment and Dementia. Ph.D. Thesis, Ben-Gurion University of the Negev (2013) 7. Dresler, M., et al.: Non-pharmacological cognitive enhancement. In: Neuropharmacology, vol. 64, pp. 529–543 (2013) 8. Makedon, F., et al.: An interactive user interface system for Alzheimerʼs intervention. In: Proceedings of the 3rd international conference on PErvasive technologies related to assistive environments (2010) 9. Zmily, A., Abu-Saymeh, D.: Alzheimer’s disease rehabilitation using smartphones to improve patients’ quality of life. In: Proceedings of pervasive computing technologies for healthcare (Pervasive Health), pp. 393–396 (2013) 10. Métais, E., et al.: Memory prosthesis. Non-pharmacol. Ther. Dement. 3, 177–180 (2015) 11. Ghorbel, F., Métais, E., Ellouze, N., Hamdi, F., Gargouri, F.: Towards accessibility guidelines of interaction and user interface design for Alzheimer’s disease patients. In: Tenth International Conference on Advances in Computer-Human Interactions (2017) 12. Folstein, M.F., Folstein, S.E., McHugh, P.R.: Mini mental state: a practical method for grading the cognitive state of patients for clinician. In: J. Psychiatry Res. 189–198 (1975)

2 http://viva.cnam.fr/.

Visual Exploration and Analysis of Bank Performance Using Self Organizing Map
Mouna Kessentini and Esther Jeffers
Research Center on Industry, The Institutions and The Economical Systems of Amiens (CRIISEA), University of Picardie Jules Verne, 10 Placette Lafleur, 80027 Amiens Cédex, France
[email protected], [email protected]

Abstract. The visualization of high-dimensional data has an important role to play as an artifact supporting exploratory data analysis. There is growing evidence of the effectiveness of information visualization, as it helps in understanding data, increases the level of cognitive functioning and supports pattern recognition. This paper deals with the usefulness of the Self-Organizing Map (SOM) neural network in the banking sector. We want to show how the SOM can be used to convert huge amounts of financial data into valuable information that speeds up the decision-making process and facilitates data analysis for deeper understanding.
Keywords: Information visualization · Self-Organizing Map · Performance analysis · Banking

1 Introduction
Images speak louder than words, as much more information can be conveyed through images. Historically, pictures have played an important role in developing the thinking of many civilizations, and visual representation of information was extensively used before the 1700s [1]. Since then, various historical disciplines have contributed at different levels to the development of information visualization (InfoVis) as a discipline. According to Card et al. [2], InfoVis is “the use of computer-supported, interactive, visual representation of abstract data to amplify cognition”. This definition means that the purpose of InfoVis is to transform a huge amount of data into a graphical representation in order to simplify the users’ data analysis and understanding. Card et al. [2] enumerate an initial list of six ways in which information visualization amplifies cognition: (i) increasing the memory and processing resources available; (ii) reducing the time to search for information; (iii) enhancing the detection of patterns through visual representations; (iv) enabling perceptual inference operations; (v) using perceptual attention mechanisms for monitoring; (vi) encoding information in an easily managed medium. Moreover, InfoVis enhances human perception and understanding of information easily and quickly, without requiring a major cognitive effort [3]. Ware and Bobrow [4] also emphasized that InfoVis makes possible mental operations with rapid access to large amounts of data outside the mind. Nevertheless, as data continue to grow,


both in size and complexity, the need to reduce their dimensionality while preserving information typology can be drawn as a great way to reduce screen crowdedness and the level of noise in the display [5]. Up to now, many studies have made significant contributions to developing dimensionality reduction techniques. The SOM, which is a machine learning tool, was originally developed for data exploration through unsupervised learning. The main advantage of using the SOM lies in its strong ability to preserve the topology of data using neighborhood function. This preservation of the topological properties of data allows best visualization and identification of data clusters. Our paper aims to contribute to the previous literature comparing the performance among banks by providing a complete and simple model. Until now, most of the comparative studies have used the Data Envelopment Analysis (DEA) model [6]. Even though it has many advantages such as ranking a set of entities called Decision Making Units (DMUs), identifying and estimating sources of technical inefficiency, etc. [7], some limitations should be noted. The classic DEA model assumes that both inputs and outputs variables must be non-negative and preferably strictly positive. This assumption does not always hold, particularly in the context the actual business world. DEA is also sensitive to the number of inputs and outputs. Indeed, if the number increases, the ability to discriminate between the DMUs decreases [8]. Our objective is twofold. Firstly, is to show the effectiveness of SOM for overcoming the DEA limitations and emphasize its great potential to achieve proper classification and visual analysis of large and complex financial data in the banking sector. Secondly, to compare the operational performance of Islamic and conventional banks during the financial crisis. To the best of our knowledge, it is the first study using SOM to differentiate between conventional and Islamic banks. Our paper is divided into five sections. Section 2 provides an overview of SOM and discusses its potential benefits regarding of visualization and exploratory analysis for the banking sector. Section 3 tends to focus on our methodology and presents more technical details about SOM. Empirical results are given and discussed in Sect. 4. Conclusion and suggestion for future research are drawn in the last section.

2 Self-Organizing Map for Bank Classification and Analysis: Review of the Literature
Several research studies have shown the effectiveness of the SOM in achieving good classification, generating visual clustering and facilitating visual analysis of large and complex data. Kohonen [9] was inspired by the biological visual systems of Hubel and Wiesel [10] to model his artificial neural networks based on unsupervised learning algorithms. The SOM performs two tasks simultaneously, vector quantization [11] and vector projection [12]. No target output is provided and the network evolves until convergence; based on Gladyshev’s theorem, the convergence of the SOM algorithm is proved [13]. There is no limit to the maximum amount of input data, and the input matrix may contain variables with positive, negative and zero values. The SOM has five main favorable characteristics for the banking sector: handling of outliers, suitability for unbalanced panel data, resilience to multicollinearity problems, identification of nonlinear dependencies among variables, and the lack of a required assumption of


normal distribution of financial data [14]. The relevance of SOM in the field of finance does not only show good ability to reduce dimensionality by embedding data but its effectiveness in supporting visual cluster analysis as well. Martin-del-Brio and Serrano [15] are the first researchers who used SOM to analyze the Spanish banking crisis over the period 1977–1985 and to examine the Spanish economic situation in 1990 and 1991. In the first case study, they looked at a sample of 66 Spanish banks using nine financial ratios in order to cluster banks into two main groups: one of bankrupt and the other of solvent banks. In the second study, they took a large sample of 84 Spanish companies and clustered them according to profitability and liquidity. Kiviluoto and Bergius [16] evaluated 4.989 financial statements of 1.137 Finnish financial companies with three methods: Linear Discriminant Analysis (LDA), Learning Vector Quantization (LVQ), and SOM. They found that SOM outperforms the first two methods (LDA and LVQ) both in classification accuracy and ease of interpretation. Sarlin [17] applied Self-Organizing Financial Stability Mapping (SOFSM) to represent and understand a multidimensional financial stability space. The author provides evidence that SOFSM makes it possible to monitor economies in the financial stability cycle, which is represented by four stages: pre-crisis, crisis, post-crisis and tranquil states. Sarlin and Zhiyuan [18] have illustrated the usefulness of Self-Organizing Time Map (SOTM) pairing with classical cluster analysis through two experiments. The first realworld experiment explored the evolutionary dynamics of European banks before and during the global financial crisis while the second experiment identified changing, emerging, and disappearing clusters over time. Eklund et al. [19] investigated the financial performance of seventy-seven pulp and paper companies over a five-year period. The majority of these companies were based in Japan, Canada, the US, Continental and Northern Europe. Seven financial ratios were used as input data to train SOM algorithm. The results of their study provide further evidence that the method used is a very practical tool for comparing the financial performance of different companies. The remarkable benefit of SOM is that the similarities of the input data are preserved as faithfully as possible within the representation space [20]. With all the presented studies, it is very clear that the SOM algorithm is a promising tool for unsupervised classification especially for the banking sector, which conducts us to perform a visual analysis of sixty-five Islamic and conventional banks based in different countries.

3 Research Design and Methodology
In this study, to illustrate the usefulness of the SOM for comparing operational performance among banks and over time, we adopt the following methodology. Firstly, we prepare the training datasets: our sample of banks contains only conventional and Islamic commercial banks, and the financial data from 2007 to 2010 were collected from the Fitch Connect database. Secondly, we process the data for every two consecutive years with the Matlab SOM Toolbox. After the learning process, we use the K-means clustering method in order to divide the prototypes produced by the SOM algorithm into similar groups (a minimal sketch of this clustering step is given below). We illustrate the different steps of our methodology in more detail below.
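The following sketch illustrates the prototype-clustering step in Python; it assumes the trained SOM’s prototype (codebook) vectors have already been exported, e.g. from the Matlab SOM Toolbox, as a NumPy array, and the map size and number of clusters are illustrative assumptions.

Illustrative Python sketch, K-means clustering of SOM prototypes:

import numpy as np
from sklearn.cluster import KMeans

# prototypes: (n_units, n_features) codebook of the trained SOM,
# here 7 financial ratios per map unit; random data stands in for it.
prototypes = np.random.default_rng(0).normal(size=(12 * 8, 7))

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(prototypes)
unit_cluster = kmeans.labels_        # cluster index of every SOM unit
print(np.bincount(unit_cluster))     # number of prototypes in each cluster

# Each bank is then assigned the cluster of its best-matching unit (BMU).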

3.1 Data Description

Our panel is composed of annual data for sixty-five active conventional and Islamic banks over the period 2007–2010. This panel is unbalanced, since not all banks have the same information for every year. We followed two criteria when choosing these banks: only commercial banks were targeted, and these banks are mainly based in the biggest markets for Islamic finance (Saudi Arabia, United Arab Emirates, Kuwait, Qatar, Bahrain, Jordan, Malaysia, Turkey, and Pakistan). The final sample is described in Table 1, where we list the names of the Islamic and commercial banks that are part of our study, as well as their corresponding labels (four-letter abbreviations) used to facilitate the visual analytic interpretation. Moreover, for each bank, the input vector includes seven variables related to the key dimensions of banking performance. The choice of financial ratios was motivated by two essential criteria: first, the selected ratios have been conventionally used in previous studies; second, and most important, these ratios reflect various stakeholders’ interests as well as the bank’s short- and long-term goals. The list of the financial ratios is reported in Table 2.

Table 1. Names and labels of conventional and Islamic banks used in the sample

Countries Bahrain

Name of conventional banks BMI Bank BSC BBK BSC Ahli United Bank

Jordan

Capital Bank of Jordan

Kuwait

Bank of Jordan Commercial Bank of Kuwait Ahli United Bank National Bank of Kuwait

Labels BHBM BHBB BHAU

Name of Islamic banks Al Baraka Islamic Bank Al Salam Bank Bahrain Islamic Bank B.S.C Khaleeji Commercial Bank Kuwait Finance House JOCA Islamic International Arab Bank JOBJ Jordan Islamic Bank KWCB Boubyan Bank KWAU Kuwait International Bank KWNB Kuwait Finance House

Labels BHAL* BHAS* BHBI* BHKC* BHKF* JOII* JOJI* KWBB* KWKI* KWKF* (continued)

424

M. Kessentini and E. Jeffers Table 1. (continued)

Countries Malaysia

Name of conventional banks Bank of Nova Scotia Berhad Deutsche Bank (Malaysia) OCBC Bank (Malaysia) Berhad Royal Bank of Scotland Berhad Standard Chartered Bank Malaysia Berhad United Overseas Bank HSBC Bank Malaysia

Pakistan

Qatar

Saudi Arabia

Turkey

Bangkok Bank Berhad CitiBank National Bank of Pakistan Habib Metropolitan Bank Doha Bank Commercial Bank of Qatar Ahli Bank QSC Arab National Bank National Commercial Bank Saudi Britch Bank Saudi Invest Bank Anadolu Bank Burgan Bank Denizbank

United Arab Emirates

Finasbank Sekerbank Union National Bank Commercial Bank of Dubai National Bank of Fujairah

Labels Name of Islamic banks MYBN Affin Islamic Bank Berhad MYDB Alliance Islamic Bank Berhad MYOC Al Rajhi Banking and corporation MYRB Asian Finance Bank MYSB Hong Leong Islamic Bank MYUO HSBC Amanah Berhad MYHS Kuwait Finance House Berhad MYBB Bank Muamalat MYCB RHB Islamic Bank PKNB Meezan Bank PKHM Bank Islami Pakistan Limited QADB Qatar International Islamic QACB Qatar Islamic Bank QAAB Masraf Al Rayen SAAN Al Rajhi Bank SANC Bank Alinma SASB SASI TRAB Kuwait Turk Participation TRBB Turkish Finance Participation TRDB Al Baraka Turk Participation TRFB TRSB AEUN Abu Dhabi Islamic Bank AECB Emirates Islamic Bank AENB Sharjah Islamic Bank

Labels MYAI* MYAL* MYAR* MYAF* MYHL* MYHA* MYKF* MYBM* MYRH* PKMB* PKBI* QAQI* QAQB* QAMR* SAAR* SAAI*

TRKT* TRTF* TRAL*

AEAD* AEEI* AESI*


Table 2. List of the input variables

Stakeholders' name | Stakeholders' interests | Variables | Description
Shareholders | Profitability | ROAA | Return on Average Assets (%), defined as net profits over average total assets [17, 21, 22]
Shareholders | Profitability | ROAE | Return on Average Equity (%), defined as net profits over average total equity [17, 21, 22]
Shareholders | Earnings | NIM | Ratio of Net Interest Margin [19, 23]
Regulators | Capital adequacy | EQTA | Ratio of Equity over Total Assets [24]
Regulators | Credit risk | NLTA | Ratio of Net Loans to Total Assets [25]
Regulators | Credit quality | LLRGL | Ratio of Loan Loss Reserve over Gross Loans [26]
Regulators | Insolvency risk | Z-score | (ROAA + EQTA)/σ(ROAA) [27, 28]
Managers, employees and customers | Profitability, earnings, insolvency risk | ROAA, ROAE, NIM, Z-score | See above

3.2 The Neural Network Method: SOM

The SOM neural network was originally inspired by neuroscience research on the human cerebral cortex. A detailed explanation of the SOM algorithm can be found in [11]. One of SOM's objectives is to convert complex non-linear high-dimensional input data into a low-dimensional representation using geometric relationships of the input space. In the literature, there are two types of training algorithms (sequential or batch). Both algorithms are iterative; they differ basically in the method of updating the synaptic weight vectors. According to [29] the batch-learning version of the SOM is preferable for practical applications, because it does not involve any learning-rate parameters, and its convergence is an order of magnitude faster and safer. We therefore use the batch training algorithm. The training of SOM involves the following steps:

Step 1. A predefined structure (hexagonal or rectangular lattice) and the learning parameters of the SOM should be assigned. The synaptic weight vectors are randomly initialized.

Step 2. During the competition phase, neurons compete with each other. The neuron with the closest weight vector to the input vector is declared the winner neuron or the Best Matching Unit (BMU). Mathematically, the distance $d_i^2$ between the input vector and each weight vector $w_i$ is computed as the Euclidean distance between them:

$$d_i^2 = \lVert x - w_i \rVert^2, \quad i = 1, \ldots, C \qquad (1)$$


Step 3. In the cooperation phase, the direct (immediate) neighborhood neurons of the BMU are identified. Accordingly, the number of direct neighbors differs according to the map structure. Hence, in the case of a rectangular structure, the BMU can have four immediate neighbors, i.e. the neighbors which are directly attached to the BMU. However, in the case of a hexagonal structure, the winner neuron can have six immediate neighbors.

Step 4. During the adaptation phase, these neurons are selectively tuned to form a specific pattern on the lattice. The rule for updating the weight vectors can be written as follows:

$$w_i(t+1) = \frac{\sum_{j=1}^{n} h_{si}\, x_j}{\sum_{j=1}^{n} h_{si}} \qquad (2)$$

where $h_{si}(t)$ is the neighborhood kernel centered on the BMU. It is expressed as:

$$h_{si}(t) = \exp\!\left(-\frac{\lVert r_s - r_i \rVert^2}{2\sigma^2(t)}\right) \cdot 1\!\left(\sigma(t) \ge \lVert r_s - r_i \rVert\right) \qquad (3)$$

where $r_s$ and $r_i$ are the positions of neurons s and i on the SOM grid. σ measures the degree to which excited neurons in the vicinity of the winning neuron cooperate in the learning process; σ(t) decreases monotonically through time.

Step 5. The stopping process occurs once all the sample data inputs have been presented to the output layer and the maximum number of iterations is reached.
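To make the batch rule concrete, the following self-contained NumPy sketch implements Eqs. (1)–(3) on a small rectangular grid with a cut-Gaussian neighborhood. It is an illustrative toy rather than the SOM Toolbox routine used in the study, and the input array is a placeholder.

```python
import numpy as np

def batch_som(X, rows=10, cols=10, n_iter=20, sigma0=3.0, seed=0):
    """Toy batch SOM following Eqs. (1)-(3): BMU search, cut-Gaussian
    neighbourhood and batch weight update (illustration only)."""
    rng = np.random.default_rng(seed)
    n, dim = X.shape
    W = rng.normal(size=(rows * cols, dim))                # Step 1: random codebook
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    grid_d2 = ((grid[:, None, :] - grid[None, :, :]) ** 2).sum(-1)

    for t in range(n_iter):
        sigma = max(sigma0 * (1.0 - t / n_iter), 1e-3)     # sigma(t) decreases monotonically
        # Step 2, Eq. (1): squared distance to every prototype, BMU per sample
        d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(-1)
        bmu = d2.argmin(axis=1)
        # Step 3, Eq. (3): cut-Gaussian kernel h_si between each sample's BMU and every unit
        h = np.exp(-grid_d2[bmu] / (2.0 * sigma ** 2))
        h *= np.sqrt(grid_d2[bmu]) <= sigma                # cut outside the radius
        # Step 4, Eq. (2): batch update as a neighbourhood-weighted mean of the samples
        num = h.T @ X
        den = h.sum(axis=0)[:, None]
        updated = den[:, 0] > 0
        W[updated] = num[updated] / den[updated]
    return W, grid

# Hypothetical usage on standardised bank ratios (placeholder data).
codebook, positions = batch_som(np.random.default_rng(1).normal(size=(130, 7)))
```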

3.3 The Clustering Algorithm

After the learning process, we apply a clustering algorithm in order to arrange the prototypes into homogeneous groups. A multitude of clustering methods exists in the literature. As pointed out by [30], a good clustering algorithm is one which has a low intra-cluster distance (high intra-cluster similarity) and a high inter-cluster distance (low inter-cluster similarity). In this work, a K-means clustering algorithm is used to gather banks into similar groups. Due to its simplicity, efficiency and moderate but stable performance across different problems [31], K-means remains one of the most popular algorithms in data mining [32]. The optimal number of homogeneous groups that best fits the natural partition is obtained using the Davies-Bouldin (DB) index. Since the objective is to obtain clusters with minimum intra-cluster distances, small values of DB are preferable. The pseudo-code of the K-means clustering algorithm goes as follows:

BEGIN
1. Put C points into the space represented by the nodes that are being clustered. These points represent the initial group centroids.
2. Assign each node to the group that has the closest centroid.
3. When all nodes have been assigned, recalculate the positions of the C centroids.


4. Repeat steps 2 and 3 until the centroids no longer move. This produces a separation of the nodes into groups from which the metric J to be minimized can be calculated:

$$J = \sum_{j=1}^{C} \sum_{i=1}^{n} \left\lVert w_i^{(j)} - c_j \right\rVert^2$$

where $\lVert w_i^{(j)} - c_j \rVert^2$ is a distance measure between the weight vector $w_i^{(j)}$ of a node and the group centre $c_j$, and n is the number of nodes.
END

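A direct transcription of this pseudo-code in NumPy, including the objective J, might look as follows; it is a sketch for illustration only, and the `prototypes` array is assumed to come from the SOM training step above.

```python
import numpy as np

def kmeans(prototypes, C, n_iter=100, seed=0):
    """Lloyd's algorithm over the SOM prototypes, returning labels, centroids
    and the objective J defined above (illustration only)."""
    rng = np.random.default_rng(seed)
    centroids = prototypes[rng.choice(len(prototypes), size=C, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign each node to the group with the closest centroid.
        d2 = ((prototypes[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        # Step 3: recompute the C centroid positions.
        new_centroids = np.array([prototypes[labels == j].mean(axis=0)
                                  if np.any(labels == j) else centroids[j]
                                  for j in range(C)])
        # Step 4: stop when the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    J = sum(((prototypes[labels == j] - centroids[j]) ** 2).sum() for j in range(C))
    return labels, centroids, J
```

In practice the procedure is run for several values of C, and the value with the smallest Davies-Bouldin index is retained, as described above.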
3.4 Quality Measures of SOM

To evaluate the quality of the SOM algorithm, two common measurements are proposed in the literature: the quantization error (QE) and the topographic error (TE). QE measures map resolution, i.e. how faithfully the map represents the training data; faithfulness is generally assumed to increase as the quantization error decreases. TE measures the proportion of all data vectors for which the first and second BMUs are not adjacent map units. So, the lower the TE, the better the SOM preserves the topology.
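Both measures can be computed directly from a trained codebook; the sketch below assumes the `codebook` and grid `positions` produced by the batch-SOM sketch in Sect. 3.2 and is illustrative only (adjacency is tested for a rectangular grid; a hexagonal map would need its own adjacency rule).

```python
import numpy as np

def quantization_error(X, codebook):
    """QE: mean Euclidean distance between each sample and its best-matching unit."""
    d = np.sqrt(((X[:, None, :] - codebook[None, :, :]) ** 2).sum(-1))
    return d.min(axis=1).mean()

def topographic_error(X, codebook, positions):
    """TE: share of samples whose first and second BMUs are not adjacent on the grid.
    Adjacency here is the 8-neighbourhood of a rectangular grid."""
    d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    order = d2.argsort(axis=1)
    first, second = order[:, 0], order[:, 1]
    grid_dist = np.abs(positions[first] - positions[second]).max(axis=1)
    return float((grid_dist > 1).mean())
```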

4 Results and Discussion

The results of the SOM algorithm are typically presented graphically. The same settings were used for all the neural network training experiments. The suitable number of map units for the SOM grid was found to be 900 (30 × 30). Note that the number of neurons along the horizontal and vertical directions of the map is a free parameter chosen by the experimenter. We chose a hexagonal grid of units to visualize Islamic and conventional banks according to the similarity of their ratios. This choice is suggested by one of the early publications of [20], where the author reports that it is advisable to select hexagonal arrays as they provide a more illustrative and accurate visualization. We use the "cutgauss" neighborhood function to update the coefficients of the neurons close to the winner and a neighborhood radius decaying from 2 to 1. In all experiments, the SOM Toolbox is used to train the algorithm. Moreover, the following evaluation metrics are used to check the clustering performance of SOM: QE and TE. Concerning the optimal number of clusters that maximizes the within-group homogeneity and the between-group heterogeneity, the DB index is used. Table 5 shows the time consumed (CPU time) by SOM in each experiment, as well as the quantization and topographic errors.


When examining our results, one has to take into account that the SOM orders similar banks into the same winning neuron or into neurons that are topologically nearby. Figures 1 and 2 illustrate the convergence vs. divergence in terms of bank risk and performance across all banks (IBs and CBs) during the crisis period. As we can see clearly in Fig. 1, our sample of banks is divided into 14 homogeneous color-coded clusters. The distinct colors of the clusters were chosen to illustrate the surface of each cluster and have no intrinsic meaning. It is important to note that there is no considerable difference between some Islamic and conventional banks mapped in most of the clusters. Except for clusters 1, 2, 5, 6 and 7, which appear homogeneous and are dominated respectively by conventional and Islamic banks, the remaining clusters represent very heterogeneous groups since they are made up of both types of banks. Remarkably, while we might think that the distinct differences in their contract principles would influence their performance significantly, our results seem to provide evidence for relative convergence in performance across 65 Islamic and conventional banks operating in nine countries. Nevertheless, it is clear that there is an inter-temporal performance variation. The performance of the banks included in our sample varies significantly from 2007 to 2008, since the clusters' members have changed over time. Focusing more on the geographic location of banks, it is evident from Fig. 1 that the clusters of banks revealed by SOM are not geographically concentrated. Hence, the banks grouped in clusters 3 or 13 operate in different countries such as Qatar, Kuwait, Malaysia, Saudi Arabia, Bahrain, Pakistan, Turkey, Jordan, and the United Arab Emirates. The exploration of the visualization space seems to be very interesting, but it might be useful to examine the features that separate one cluster from another. Hence, an analytical analysis can identify intra-group homogeneity and inter-group heterogeneity. Table 3 shows the quantitative differences between the financial ratios (attributes) of the 65 banks.

[Figure: 30 × 30 SOM map with the 2007 and 2008 bank observations labelled and grouped into clusters C1–C14.]

Fig. 1. Clustering of 31 IBs and 34 CBs using SOM from 2007 and 2008


Figure 1 shows the clustering of 31 IBs and 34 CBs using a 30-by-30 unit SOM from 2007 and 2008. Each bank is labeled with a four-letter abbreviation. The starred labels refer to IBs, while the non-starred ones refer to CBs. The additional index (07 or 08) contained in all labels refers to the year of the financial data. C1, C2, C3 up to C14 represent the optimum number of clusters formed by training a SOM. Recall that the variables described previously, namely the Return on Average Assets (ROAA), Return on Average Equity (ROAE), Net Interest Margin (NIM), Loan Loss Reserve to Gross Loans ratio (LLRGL), Net Loans to Total Assets (NLTA), Altman Z-score, and Equity to Total Assets ratio (EQTA), are used in the training of the model to classify banks with the SOM neural network. Our results indicate that banks in clusters 6 and 7 are more profitable and better capitalized (they have a higher return on assets ratio and at the same time a higher ratio of equity capital to assets). Clearly, there is some variation in profitability between banks in clusters 8 and 14 despite the same proportion of equity capital to assets. Similarly, banks in cluster 13 achieved considerably better profitability than banks in cluster 11 despite the fact that they have the same value of the capital adequacy ratio in 2008. Overall, these findings suggest that the availability of equity can promote the financial stability of both types of banks but cannot easily raise profitability. In examining the relationship between the lower probability of insolvency risk and the percentage of loans to total assets, it follows that lower risk in the credit portfolio is not generally associated with higher stability. Hence, although banks in cluster 7 display a higher net loans to total assets ratio than banks in cluster 6, they seem better positioned to absorb shocks: the group Z-score is about twice as high as that of cluster 6 (27.1 against 12.9). In addition, even though the proportion of loans represents 57% of total assets in 2008 for banks, especially in clusters 8, 13, and 14, there is a noticeable difference in these banks' stability. It can be seen that some Malaysian banks in cluster 8 that recently established themselves within the Islamic banking industry, like MYHL*, MYHA* and MYAL*, have a Z-score six times higher than the one achieved by some banks with a longstanding retail presence grouped in clusters 13 and 14, such as QAQB* and QACB based in Qatar, AENB and AECB in the United Arab Emirates, BHAL* and BHAS*, etc. This may indicate the importance of a precautionary strategy to protect themselves against any default risk.

Table 3. Mean values for each cluster of banks during 2007–2008

Cluster | ROAA | ROAE | NIM | LLRGL | NLTA | Z-score | EQTA
1 | 0.93 | 12.41 | 2.21 | 2.81 | 15.02 | 12.68 | 6.91
2 | 2.74 | 13.19 | 3.68 | 3.10 | 35.81 | 23.52 | 21.50
3 | 2.61 | 18.86 | 3.54 | 2.94 | 49.09 | 17.97 | 13.47
4 | 2.69 | 21.73 | 5.17 | 2.00 | 67.86 | 19.24 | 16.64
5 | 0.05 | 3.99 | 4.44 | 3.25 | 35.12 | 40.72 | 14.37
6 | 6.73 | 23.43 | 4.42 | 1.85 | 38.65 | 12.91 | 24.16
7 | 6.90 | 13.61 | 2.80 | 2.68 | 68.01 | 27.07 | 17.88
8 | 1.30 | 12.86 | 3.58 | 3.10 | 56.26 | 66.43 | 13.81
9 | 1.52 | 19.58 | 2.87 | 2.54 | 55.08 | 40.78 | 8.61
10 | 1.35 | 14.96 | 3.42 | 3.35 | 59.71 | 69.68 | 14.84
11 | 1.41 | 15.25 | 2.83 | 2.47 | 62.09 | 38.77 | 11.34
12 | 2.71 | 17.94 | 5.35 | 2.40 | 71.93 | 20.82 | 14.65
13 | 2.73 | 23.22 | 4.35 | 2.95 | 57.90 | 17.40 | 12.12
14 | 0.61 | 3.05 | 3.40 | 3.28 | 57.31 | 11.41 | 14.26

Regarding the effect of the GFC, the empirical findings are as follows. First, the financial crisis negatively affected the profitability of the two bank types. Second, no significant differences in soundness between the two types of banks were found. Third, the absence of significant differences regarding profitability, capitalization, risk aversion, insolvency, and credit risk makes the challenge ever greater between Islamic and conventional banks in Turkey (for details see the average values of clusters 4 and 12). Two years after the great recession that began in 2007, the clustering of IBs and CBs in 2009 and in 2010 is shown in Fig. 2. Our graphical results demonstrate that there are no considerable differences in the level of performance between Islamic and conventional banks, as all clusters include both types of banks. Evidence from our graphical results corroborates the finding highlighted by the previous study of [19]: there is no significant difference between the two types of banks during the crisis period. Beyond that, our results show a slight change in performance with a lower degree of variability from December 2009 to December 2010. Hence, over 80% of the banks in our sample remained in the same cluster through 2009–10. It is also important to note that the banks located within each of the fourteen clusters operate in different geographical locations. So, the geographic heterogeneity among the members of bank clusters remains roughly constant over time.

[Figure: 30 × 30 SOM map with the 2009 and 2010 bank observations labelled and grouped into clusters C1–C14.]

Fig. 2. Clustering of 31 IBs and 34 CBs using SOM from 2009 and 2010


Figure 2 shows the clustering of 32 IBs and 34 CBs using a 30-by-30 unit SOM from 2009 and 2010. Each bank is labeled with a four-letter abbreviation. The starred labels refer to IBs, while the non-starred ones refer to CBs. The additional index (09 or 10) contained in all labels refers to the year of the financial data. C1, C2, C3 up to C14 represent the optimum number of clusters formed by training a SOM. When studying the bank performance variables over the period 2009–2010 (Table 4), it is possible to draw more specific conclusions. Banks grouped in clusters 1, 2, 6, 12 and 14 are less profitable than the other banks in the developing countries, based on the return on average assets measure. Most of these banks are Islamic and come from Bahrain, the United Arab Emirates, Malaysia, and Kuwait. Indeed, we find that banks in clusters 3, 4, 5, 7, 8, 9, 11 and 13 perform significantly better than banks in the other clusters, with a return on equity on average 7 basis points higher in the aftermath of the 2007–2008 recession. We can also see from column 3 that the banks grouped in cluster 1 are less able to generate interest income compared to domestic and foreign banks in the nine countries. Regarding the risk profiles of Islamic and conventional banks in 2009–2010, the NLTA ratio, which partially reflects the banks' loan portfolios, and the LLRGL ratio, which captures bank loan quality, differ across the different groups of banks. In examining the relationship between bank profitability and risk-taking, it becomes apparent that the banks grouped in clusters 3, 4, 8 and 11 display a higher net loans to total assets ratio, a higher return on average assets and a higher return on average equity ratio. These results crucially suggest that risk-taking behavior tends to increase their profitability. It is also noticeable that although the 65 commercial banks differ in their levels of risk exposure, the higher-performing groups of banks exhibit a lower probability of default compared to their competitors.

Table 4. Mean values for each cluster of banks during 2009–2010

Cluster | ROAA | ROAE | NIM | LLRGL | NLTA | Z-score | EQTA
1 | 0.85 | 5.15 | 1.37 | 6.82 | 10.78 | 13.53 | 14.98
2 | 0.41 | 1.85 | 3.86 | 3.60 | 44.36 | 8.31 | 21.53
3 | 2.45 | 17.06 | 3.81 | 2.60 | 58.98 | 14.78 | 14.42
4 | 2.13 | 15.60 | 5.31 | 2.69 | 72.01 | 21.32 | 13.08
5 | 1.19 | 12.42 | 4.40 | 4.69 | 37.52 | 16.42 | 9.95
6 | −0.69 | −4.52 | 5.41 | 3.08 | 33.07 | 22.01 | 14.35
7 | 1.25 | 13.44 | 4.27 | 5.85 | 47.94 | 27.61 | 9.28
8 | 1.76 | 11.69 | 4.02 | 2.94 | 59.65 | 30.05 | 16.65
9 | 2.33 | 2.47 | 3.86 | 0.00 | 6.42 | 83.93 | 90.17
10 | 0.07 | 0.10 | 3.06 | 0.02 | 58.47 | 53.21 | 58.57
11 | 1.26 | 13.40 | 2.80 | 3.05 | 60.57 | 45.61 | 10.49
12 | 0.61 | 4.72 | 3.29 | 5.62 | 63.91 | 12.26 | 13.41
13 | 1.23 | 13.79 | 3.60 | 3.13 | 51.88 | 66.08 | 9.52
14 | −3.64 | −24.25 | 2.53 | 9.37 | 60.71 | 6.17 | 13.92

Table 5. Clustering evaluation of experimental results using the SOM algorithm

Datasets | QE | TE | CPU time
Datasets 2007 and 2008 | 1.517 | 0.000 | 0 m 45 s
Datasets 2009 and 2010 | 1.089 | 0.008 | 0 m 43 s

5 Conclusion

This paper has been devoted to laying out the usefulness of SOM to explore, in a topological space, the relative differences between Islamic and conventional banks in different countries over the period 2007–2010. The use of the SOM has a dual objective: to show whether there is a difference in the financial characteristics of Islamic and conventional banks and to establish whether there is a significant correlation between profitability and risk profile. The neural network used is also suitable for a longitudinal analysis, observing the variation of banks' performance in the Middle East and Asian countries. We think that it can be a very interesting tool for managers, financial analysts, and supervisors in the banking sector. The interest of this method is threefold. For shareholders and investors, performance and risk measures are interrelated and necessary to choose investment opportunities. The SOM tool makes it possible to aggregate a set of criteria (for example, performance and risk) into a single graphical representation and thus makes possible an overall view not only of a single bank but also of the entire banking sector in a given region. For regulators, because the banking sector is vital to the economies of developed countries, it is often heavily regulated by strict banking laws and national or international prudential standards. A classification of banks using the SOM tool makes it possible, for example, to know the level of risk of a given sector and to give precise information for a possible regulation. It also makes possible a more accurate targeting of banks that fail in regulatory matters. For bank executives, the SOM tool makes it possible to position a bank according to several criteria when compared to its competitors and thus constitutes a sort of dashboard for making strategic decisions. Besides the methodological contribution, our research has provided multi-dimensional criteria integrating the interests and objectives of the relevant stakeholders. The selected ratios reflect stakeholders' perception of key performance indicators. More importantly, the rapid growth of Islamic banking in some countries like Bahrain and Malaysia since the global financial crisis has arguably promoted a more active debate around the comparison between Islamic and conventional banks. So, it is crucial to understand how Islamic banks in both the Middle East and Asian countries performed. However, despite the popular use and the numerous advantages of SOM, some methodological pitfalls have been revealed. In fact, the convergence of the SOM algorithm is strongly related to the choice of several parameters: the size of the map, the adaptation step, the extent of the neighborhood, and policy changes in the learning process. Inappropriate choices of the network structure can degrade SOM performance. Numerous researchers have


developed new versions of Kohonen networks in order to overcome such elementary methodological pitfalls. Further research should consider a larger sample of banks with other business models (cooperative banks, savings banks) in order to obtain a more exhaustive comparison. The use of another version of the SOM could also be considered in order to obtain more robust results.

References 1. Tufte, E.R.: Visual Explanations: Images and Quantities, Evidence and Narrative, 1st edn. Graphics Press, Cheshire (1997) 2. Card, S.K., Mackinlay, J.D., Shneiderman, B.: Readings in Information Visualization: Using Vision to Think. Morgan Kaufmann, New York (1999) 3. Perin, C.: Direct manipulation for information visualization. Thèse de doctorat, Université Paris Sud - Paris XI (2014) 4. Ware, C., Bobrow, R.: Motion to support rapid interactive queries on node-link diagrams. ACM Trans. Appl. Percept. 1(1), 3–18 (2004) 5. Peng, W., Ward, M.O., Rundensteiner, E.A.: Clutter reduction in multi-dimensional data visualization using dimension reordering. In: Proceedings of the IEEE Symposium on Information Visualization (InfoVis'04), pp. 89–96. IEEE, Austin, TX, USA (2004) 6. Charnes, A., Cooper, W.W., Rhodes, E.: Measuring the efficiency of decision-making units. Eur. J. Oper. Res. 2(6), 429–444 (1978) 7. Golany, B., Storbeck, J.: A data envelopment analysis of the operational efficiency of bank branches. Interfaces 29(3), 14–26 (1999) 8. Mostafa, M.: Clustering the ecological footprint of nations using Kohonen's self-organizing maps. Expert Syst. Appl. 37(4), 2747–2755 (2010) 9. Kohonen, T.: Self-organized formation of topologically correct feature maps. Biol. Cybern. 43(1), 59–69 (1982) 10. Hubel, D.H., Wiesel, T.N.: Receptive fields and functional architecture of monkey striate cortex. J. Physiol. 195(1), 215–243 (1968) 11. Gray, R.M.: Vector quantization. IEEE ASSP Mag. 1(2), 4–29 (1984) 12. Kaski, S., Lagus, K.: Comparing self-organizing maps. In: von der Malsburg, C., Sendhoff, B. (eds.) Lecture Notes in Computer Science, ser. 1112, pp. 809–814. Springer-Verlag, Berlin, Germany (1996) 13. Najand, S., Lo, Z., Bavarian, B.: Application of self-organizing neural networks for mobile robot environment learning. Neural Netw. Robot. 202(1), 85–96 (1993) 14. Jagric, T., Bojnec, S., Jagric, V.: Optimized spiral spherical self-organizing map approach to sector analysis—the case of banking. Expert Syst. Appl. 42(13), 5531–5540 (2015) 15. Martín-del-Brío, B., Serrano-Cinca, C.: Self-organizing neural networks for the analysis and representation of data: some financial cases. Neural Comput. Appl. 1(3), 193–206 (1993) 16. Kiviluoto, K., Bergius, P.: Analyzing financial statements with the self-organizing map. In: Proceedings WSOM'97, Workshop on Self-Organizing Maps, pp. 362–367. Helsinki University of Technology, Espoo, Finland (1997) 17. Sarlin, P.: Decomposing the global financial crisis: a self-organizing time map. Pattern Recogn. Lett. 34, 1701–1709 (2013) 18. Sarlin, P., Zhiyuan, Y.: Clustering of the self-organizing time map. Neurocomputing 121, 317–327 (2013)


19. Eklund, B., Back, H., Vanharanta, H., Visa, A.: Assessing the feasibility of self-organizing maps for data mining financial information. In: Proceedings of the Xth European Conference on Information Systems (ECIS 2002), pp. 528–537, Gdansk, Poland (2002) 20. Dittenbach, M., Merkl, D., Rauber, A.: The growing hierarchical self organizing map. In: Proceedings of the International Joint Conference on Neural Networks (IJCNN 2000) 21. Länsiluoto, A., Eklund, T., Back, B., Vanharanta, H., Visa, A.: Industry-specific cycles and companies’ financial performance comparison using self-organizing maps. Benchmarking: Int. J. 11, 267–286 (2004) 22. Beck, T., Kunt, A., Merrouche, O.: Islamic vs. conventional banking: business model, efficiency and stability. J. Bank. Financ. 7, 433–447 (2013) 23. Fethi, M., Pasiouras, F.: Assessing bank efficiency and performance with operational research and artificial intelligence techniques: a survey. Eur. J. Oper. Res. 204, 189–198 (2010) 24. Olson, D., Zoubi, T.: Using accounting ratios to distinguish between Islamic and conventional banks in the GCC region. Int. J. Account. 43, 45–65 (2008) 25. Olson, D., Zoubi, T.: Convergence in bank performance for commercial and Islamic banks during and after the global financial crisis. Q. Rev. Econ. Financ. 65, 71–87 (2016) 26. Abedifar, P., Molyneux, P., Tarazi, A.: Risk in Islamic banking. Rev. Financ. 17(6), 2035– 2096 (2013) 27. Boyd, J.H., Graham, S.L.: Risk, regulation, and bank holding company expansion into nonbanking. Q. Rev., Federal Reserve Bank of Minneapolis 10(2), 2–17 (1986) 28. Tan, Y.: The impacts of risk and competition on bank profitability in China. J. Int. Financ. Mark., Inst. Money 40, 85–110 (2016) 29. Kohonen, T.: Essentials of the self-organizing map. Neural Netw. 37, 52–65 (2013) 30. Pal, N., Bezdek, J., Tsao, E.K.: Generalized clustering networks and Kohonen selforganizing scheme. IEEE Trans. Neural Netw. 4, 549–557 (1993) 31. Zhao, W.L., Deng, C.H., Ngo, C.W.: k-means: a revisit. Neurocomputing 291, 195–206 (2018) 32. Wu, X., Kumar, V., Quinlan, J.R., Ghosh, J., Yang, Q., Motoda, H., McLachlan, G.J., Ng, A., Liu, B., Yu, P.S., Zhou, Z.H., Steinbach, M., Hand, D.J., Steinbach, D.: Top 10 algorithms in data mining. Knowl. Inf. Syst. 14(1), 1–37 (2008)

A Pattern Methodology to Specify Usable Design and Security in Websites

Taheni Filali and Med Salim Bouhlel

Research Lab SETIT, University of Sfax, Sfax, Tunisia
[email protected], [email protected]

Abstract. Over the past decade, with the progress in technology, users have developed various needs to interact more richly with Web applications. Commercial web services have spread the use of rich interfaces to provide users with a meaningful interaction with these applications. Nevertheless, the dynamic nature of the context of interaction requires practitioners to extract various requirements, such as user needs, and to choose the appropriate actions to fulfil them. In real practice, however, practitioners struggle to combine exploring the problem space to extract users' goals with creating a high-quality user interface. To overcome this challenge, we propose a set of measurement criteria integrated into specific comprehensive indicators. Our goal is to evaluate the main aspects of the quality requirements (design, security). Thus, practitioners will have flexible support based on a high-quality model to improve the extracted services and the usability of commercial Web-based applications in a dynamic context.

Keywords: Context engine · Assessment platform · Web interfaces · Quality model

1 Introduction

Various quality issues have recently affected e-commercial services. One of the main issues is the difficulty of using a Web application and interacting with it. This may negatively influence the organization's position. It is therefore critical for any organization to be able to easily assess the quality of its e-business services, so that it can enhance its offerings over time and benchmark itself against competitors [1]. The automatic assessment of the quality of commercially oriented Web interfaces has been an emerging field of research over the last few years. Several approaches have been proposed under various names. Many of them are not consumer-based measures of quality; such measures include, for example, the time consumed per visit [2], sales transactions [3], and Web traffic and server logs [4]. Thus, they are not designed to assess site quality but site efficiency. Other approaches list multiple quality features of Web applications without investigating the relationships between them [5]; they do not capture the structure of the quality dimensions.



Nevertheless, several approaches are limited to evaluating performance metrics for general Web applications without treating the specific case of e-commercial services [6]. Based on these findings, our work introduces another technique to evaluate the usability, security, and e-commerce requirements of websites. Past research proposes lists of essential requirements for human-computer interaction, security, and web-based business independently, but does not integrate these three perspectives into a single assessment strategy [7]. We present a review of these fundamental requirements, which are then incorporated into a set of measurements. The proposed measurement model relies on the Goal Question Metric approach and is enriched with a set of mathematical formulas [8]. We show the suitability of the measurements by using an illustrative case as a proof of concept together with a preliminary usability study. The remainder of the paper is organized as follows. Section 2 gives a short survey of previous work. Section 3 discusses and analyzes the proposed methodology. Section 4 defines the dimensions of the proposed methodology and its indicators. Section 5 concludes the paper and suggests some future work.

2 Previous Work

Several researchers have highlighted in their work the importance of balancing quality between security and usability. As shown by Cranor et al. [9], Atoyan et al. [10], and Braz et al. [11], it is critical to consider the design principles, usability and security of a commercial website at the same time. With these ideas in mind, we raise the need for a method that indicates whether a site needs improvement with respect to HCI, security, and e-business requirements without performing resource-demanding usability inspections. We propose a measurement method that shows which of these three aspects specifically requires improvement. We regard the automation of usability assessment (the use of software and computational techniques to check usability and to detect interface problems without human intervention) as a valuable instrument for interface improvement and usability analysis. In addition, it is very useful for complementing the subjective and qualitative data obtained from usability studies [15]. Existing Web Usability Validation Tools. Table 1 compares existing web usability and accessibility validators with our tool "WQM". They are conventionally implemented as web applications themselves; only a few are desktop GUI applications. After a URL has been entered in an input field, the validator downloads the page, analyzes the HTML and outputs a list of likely usability problems. WebXACT is a commercial validator that focuses on covering as much of the WCAG and Section 508 guidelines as possible. Under the name "quality", its results include some page content checks that can be seen as usability rather than accessibility checks, e.g. finding broken links and evaluating the page click significance.


WebTango [16] takes a different approach: usability guidelines are not implemented as an algorithm for each rule. Instead, statistical techniques are used to compute the similarity between the site being assessed and a set of "known-good" sites whose accessibility and usability have been rated by experts using manual inspection. Kwaresmi [17] is an academic prototype with an emphasis on quickly specifying additional tests for the validator using a special rule definition language. MAGENTA is an example of a tool which not only detects problems but also corrects some errors if the user wishes; this can be advantageous for web designers who do not have the necessary knowledge to identify the right fix for an issue. A comparable XML-based rule description language is used by EvalIris [18]. The ATRC Web Accessibility Checker is the upgraded web-based version of the A-Prompt GUI application. Besides coverage of WCAG 1.0/2.0 and related rules, it features output in a machine-readable format (W3C EARL, Evaluation And Report Language). It is exemplary in its modularized and systematic approach to WCAG validation. Additionally, it can ask the user about a specific suspected issue ("Does the anchor contain content that identif[ies] the link target?") rather than emitting a warning for every occurrence of the issue; in this way, it reduces the number of false positives. Our model WQM distinguishes itself from the other tools in this area because it bases its web design and security reports on web engineering models of the site pages. Our model "WQM" is the only tool which takes advantage of the information in Web engineering models when performing usability, design and security validation. Table 1 presents this comparison.

Table 1. Comparison of tools for automated accessibility and usability validation of Web interfaces.

Tool | Analysis type | Web quality standard | Run-time extensibility | Input | Output | Interaction
WebXACT | Heuristics | acc., privacy, content | No | Site | Reports | No
WebTango | Statistics | Reference sites | N/a | Site | Reports | No
Kwaresmi | Heuristics | Accessibility | Yes | Page | Report | No
MAGENTA | Heuristics | Accessibility | Yes | Page | Report | No
ATRC | Heuristics | Accessibility | No | Page | Report (EARL) | No
ArgoUWE | Heuristics | no acc./usab. tool | No | Models | IDE warnings | No
WQM | Heuristics | Design, security | Yes | Page | Report + graphical chart | Yes


3 Assessment Model for Secure and Usable Commercially-Oriented Websites

The work done by the different researchers is an interesting starting point. They propose guidelines, criteria and measurements within a methodology specific to each of them. In light of this state-of-the-art analysis, it is now of interest to bring together the criteria proposed in the different studies. The goal of this work is to build a theoretical, broad, and quantifiable framework for evaluating the quality of commercial web interfaces, providing straightforward criteria to drive improvements of a website's design and its use [20]. Moreover, we aim to build a framework that allows consistent application across a wide range of commercial web interfaces. Our method combines industry and academic research to identify quality components with the specific aim of meeting the goals of this study. A further goal is the extension of the methodology to other quality dimensions, in particular suitability, and the development of a rationale for a web-application-specific quality tree, identifying the specific measurable aspects and proposing a test model to assess them. Thus, after gathering, regrouping and extending the criteria raised by the researchers, we propose a two-dimensional set of criteria which is clear and combines all previous measurements and factors, with the ultimate objective that it can be used as a general criterion to assess commercial web interfaces. The dimensions of the proposed model are the quality of design and the quality of security. To analyze how our proposed criteria were used in previous studies, we rearranged each element of each measurement of previous work to fall under one of the two new dimensions. The proposed approach attempts to combine information and experience from different sources, a range of reference disciplines and empirical practices. The goal is to identify the quantifiable features and indicators that characterize a convincing website. The proposed model can be used to compare the quality of commercial web interfaces, to identify a path for improvement of a web page, and to give a standard to designers and developers when creating new sites. After assessing every evaluation criterion, we added its indicators to the appropriate place in the proposed two-dimensional criteria, besides including a few indicators that we consider essential from our own experience. Our criteria consolidate every principal indicator of the previous studies assessing the quality of commercial web interfaces. Figure 1 assembles the hierarchy of the proposed model. Echoing Lord Kelvin, as quoted by Bellovin [21]: "If you cannot measure it, you cannot improve it. When you have the ability to assess what you are discussing, and present it in numbers, you know something about it; nevertheless when you can't check it, when you can't present it in numbers, your knowledge is of unsatisfactory kind; it may be the beginning of data, yet you have scarcely in your thoughts advanced to the state of science, whatever the issue may be". It is critical to have a tool that provides quantitative dimensions for design, security, and e-commerce requirements.


[Figure 1 depicts the hierarchy of the model: Commercial Website Quality is decomposed into Design Quality (Balance, Symmetry, Order & Complexity, Sequence, Equilibrium) and Security Quality (Confidentiality-Privacy, Integrity, Authentication, No-repudiation, Authorization, Intrusion detection, Transparent security, Auditing).]

Fig. 1. The progressive hierarchy of the proposed model.

3.1 Design Quality

This dimension is related to the visual features of a website's design. Poor design can make the user bored and confused; thus, most previous studies consider design quality an important dimension. All organizations and companies should try to design their sites in an attractive and imaginative way to draw in their customers and encourage them to stay longer exploring the site and to return to it. This dimension comprises the measures of equilibrium, balance, order and complexity, sequence and symmetry.

Balance Dimension. Balance is achieved by centering the layout, i.e. by maintaining a comparable weighting of the components on each side of the layout. The equation is:

$$BD = (W_L - W_R,\; W_T - W_B) \qquad (1)$$

where W denotes the weighting of each side of the layout (top, bottom, left and right). The layout is in a state of balance if $BD = (0, 0)$. The weight of a layout is the arithmetical sum of the weights of its parts:

$$W = \sum_i a_i d_i \qquad (2)$$

Considering the layout shown in Fig. 2, we obtain from Eq. (2)

$$W_L = a_1 x_1 + a_2 x_2, \qquad W_R = a_3 x_3,$$


$$W_T = a_1 y_1, \qquad W_B = a_2 y_2 + a_3 y_3.$$

Visual balance may be achieved through symmetry or asymmetry. A balanced display creates a feeling of trust, while an unbalanced display makes a bad impression on the viewer.

Fig. 2. Layout example

Equilibrium Dimension. Equilibrium can be described as an equal tension between opposing forces. Any visual object constitutes a focal point of forces, and the interplay between several graphical objects as centres of forces is the basis of composition. A design is in equilibrium when its centre coincides with the centre of the frame. The equation is:

$$ED = (x_c, y_c) - (x_0, y_0) \qquad (3)$$

where ED is the dimension of equilibrium, (x_0, y_0) is the centre of the layout and (x_c, y_c) is the centre of the frame. Equilibrium and balance coincide when x_c = x_0 and y_c = y_0, or equivalently W_L = W_R and W_T = W_B.
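To make these two measures concrete, the sketch below computes BD and ED for a layout given as a list of rectangular components. The component representation (centre coordinates, width, height, in screen coordinates with y growing downwards) is an assumption made for illustration, not the paper's implementation.

```python
# Minimal sketch (not the paper's implementation): a layout is described as a
# list of rectangular components, each given as (cx, cy, width, height) in
# screen coordinates (y grows downwards). Side weights are area * distance
# from the frame centre, following Eqs. (1)-(3).

def side_weights(components, frame_w, frame_h):
    fx, fy = frame_w / 2.0, frame_h / 2.0          # frame centre (x_c, y_c)
    w_l = w_r = w_t = w_b = 0.0
    for cx, cy, w, h in components:
        area = w * h
        if cx < fx: w_l += area * (fx - cx)
        if cx > fx: w_r += area * (cx - fx)
        if cy < fy: w_t += area * (fy - cy)        # component nearer the top
        if cy > fy: w_b += area * (cy - fy)
    return w_l, w_r, w_t, w_b

def balance_and_equilibrium(components, frame_w, frame_h):
    w_l, w_r, w_t, w_b = side_weights(components, frame_w, frame_h)
    bd = (w_l - w_r, w_t - w_b)                    # Eq. (1): (0, 0) means balanced
    total_area = sum(w * h for _, _, w, h in components)
    x0 = sum(w * h * cx for cx, _, w, h in components) / total_area   # layout centre
    y0 = sum(w * h * cy for _, cy, w, h in components) / total_area
    ed = (frame_w / 2.0 - x0, frame_h / 2.0 - y0)  # Eq. (3): (0, 0) means equilibrium
    return bd, ed

# Hypothetical three-component layout in a 1000 x 600 frame.
print(balance_and_equilibrium([(200, 150, 100, 80), (250, 420, 120, 60),
                               (760, 300, 90, 90)], 1000, 600))
```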


Symmetry Measure. Symmetry is the balanced distribution of equivalent elements around a common line. Vertical symmetry refers to the balanced arrangement of components around a vertical axis, whereas horizontal symmetry refers to the balanced arrangement of components around a horizontal axis. The equation is:

$$SD = \{[w_{UL} - w_{LL}], [w_{UR} - w_{LR}], [w_{UL} - w_{UR}], [w_{LL} - w_{LR}], [w_{UL} - w_{LR}], [w_{UR} - w_{LL}]\} \qquad (4)$$

where SD is the dimension of symmetry and w is the weight on the corresponding (upper-left, upper-right, lower-left, lower-right) quadrant.

For vertical symmetry: $w_{UL} - w_{UR} = (0,0,0,0)$ and $w_{LL} - w_{LR} = (0,0,0,0)$.
For horizontal symmetry: $w_{UL} - w_{LL} = (0,0,0,0)$ and $w_{UR} - w_{LR} = (0,0,0,0)$.
For radial symmetry: $w_{UL} - w_{LR} = (0,0,0,0)$ and $w_{UR} - w_{LL} = (0,0,0,0)$.

Sequence Dimension. This measure captures the arrangement of elements in a layout in a way that guides eye movement. Streveler and Wasserman [23] found that visual objects placed in the upper-left quadrant of a frame were found fastest, while those placed in the lower-right quadrant took the longest time to be found. The equation is:

$$weight = a \cdot q \qquad (5)$$

where a is the area of the object and q is the value of the specific quadrant. The greater the weight, the more attractive the object.

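A corresponding sketch for the quadrant-based measures (symmetry and sequence) is given below. It reuses the hypothetical (cx, cy, width, height) component representation from the previous snippet, simplifies each quadrant weight to the summed component area, and assumes quadrant values q of 4, 3, 2, 1 for UL, UR, LL, LR, which mirror the scanning order reported by [23] but are not given explicitly in the paper.

```python
def quadrant_weights(components, frame_w, frame_h):
    """Sum of component areas per quadrant, with each component assigned by its centre."""
    fx, fy = frame_w / 2.0, frame_h / 2.0
    w = {'UL': 0.0, 'UR': 0.0, 'LL': 0.0, 'LR': 0.0}
    for cx, cy, width, height in components:
        key = ('U' if cy < fy else 'L') + ('L' if cx < fx else 'R')
        w[key] += width * height
    return w

def symmetry_and_sequence(components, frame_w, frame_h):
    w = quadrant_weights(components, frame_w, frame_h)
    # Simplified SD of Eq. (4): all six differences are zero for a fully symmetric layout.
    sd = (w['UL'] - w['LL'], w['UR'] - w['LR'], w['UL'] - w['UR'],
          w['LL'] - w['LR'], w['UL'] - w['LR'], w['UR'] - w['LL'])
    q = {'UL': 4, 'UR': 3, 'LL': 2, 'LR': 1}       # assumed quadrant values
    sequence = {k: w[k] * q[k] for k in w}         # Eq. (5): weight = area * quadrant value
    return sd, sequence
```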

Order and Complexity Dimension. The order of a layout is expressed as:

$$O = O_{BD} + O_{ED} + O_{SD} + O_{SQD} \qquad (6)$$

where O_BD is the dimension of balance, O_ED the dimension of equilibrium, O_SD the dimension of symmetry and O_SQD the dimension of sequence. The complexity C of a design is defined as the number of its components. With order, the items are perceived as one whole.

3.2 Security Quality

Padmanabhuni et al. [24] emphasize that current security technologies on the web can address the needs of web-based business sites. They point out the following essential security requirements as desirable qualities of electronic systems.
• Confidentiality: data should only be seen by the intended recipient.
• No-repudiation: the creator or the sender of the information cannot deny at a later time his or her participation in the creation or transmission of the data.
• Privacy: usually broader than confidentiality, it deals with the disclosure of information to authorized parties only.
• Authentication: sender and receiver should be able to confirm their identities and the origin of the data.
• Authorization: an individual must have the required permissions to perform a given action.
• Availability: the computing resources should be available to authorized users when they need them.
• Integrity: data should not be altered in storage or in transit between a sender and the intended receiver without the change being detected.
• Auditing: the ability to know who did what, when and where.
• Transparent security: some users tend to consider security aspects as purely secondary matters; therefore, a user-friendly security mechanism is necessary for general users.
The formula of security is:

$$S = \sum_{i \in \{\text{security metrics}\}} a_i w_i$$

where a_i is the value of the i-th security requirement and w_i is the weight of the i-th security requirement (the webmaster must specify its importance).
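Under this formula the security score is simply a weighted sum over the questionnaire items; the minimal sketch below illustrates it with invented item names, answers and weights.

```python
# Minimal illustration of S = sum_i a_i * w_i over the security requirements.
# Item names, answers and weights below are invented for the example; a_i is 1
# for a 'Yes' answer to the questionnaire item and 0 for 'No'.

def security_score(answers, weights):
    return sum(answers[item] * weights[item] for item in answers)

answers = {'confidentiality': 1, 'integrity': 1, 'authentication': 1,
           'authorization': 0, 'auditing': 0}
weights = {'confidentiality': 0.3, 'integrity': 0.2, 'authentication': 0.2,
           'authorization': 0.2, 'auditing': 0.1}
print(security_score(answers, weights))          # -> 0.7
```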


4 Case Study

Consider a three-dimensional (3D) plot in which the x-axis represents the design requirements, the y-axis the security requirements and the z-axis the context requirements. It is then possible to represent graphically the satisfiability of the requirements proposed by our technique (Fig. 3).

Fig. 3. Three-dimensional (3D) evaluation plot

The proposed web evaluation framework is organized into a three-level structure comprising context characteristics, aesthetic characteristics and security characteristics. In the first level, the framework proposes three context characteristics: user, platform and environment. The second level is broken down into five aesthetic characteristics (balance, symmetry, sequence, equilibrium, and order and complexity); each characteristic is inherited from the parent design characteristic and directly divided into aesthetic measures. The third level represents the security characteristics (confidentiality, privacy, integrity, authentication, no-repudiation, authorization, intrusion detection, transparent security and auditing). At this level, we use a Yes/No security questionnaire [26]; in total, our questionnaire contains 10 questions. Lastly, the site quality measurement computes the quality criteria through several assessment formulae, yielding the corresponding quality scores. Figure 4 represents the architecture of our proposed assessment process. The purpose of the study is to validate whether the proposed set of aesthetic and security measures can assess the quality of commercial web interfaces. We therefore chose to test the quality of one of the best-known commercial Tunisian websites, "Tunisair". Our test was carried out by a group of students from the National School of Engineering of Sfax, Tunisia, selected because of their interest in the field of HMI. According to the volunteers' demographic data, 52.2% of respondents were male and 47.8% female. In terms of age, 87% were between 18 and 25 years old and 13% between 26 and 35 years; 83.3% of respondents indicated a high


Fig. 4. The architecture of our proposed assessment process

frequency of internet usage (every day), 13.8% of respondents could be defined as medium-frequency users (3 to 5 times a week), while the remaining 2.9% were low-frequency users (1–2 times a week). The web assessment instrument has a primary user interface. It offers graphical icons, text labels and buttons that fully present the site-quality information to the user. This information includes the score of every quality characteristic, the final score for the quality of the site, and a graphical chart relating to each quality attribute; these functions are driven by the measures and the questionnaire responses. Figure 5 shows the main interface of the web evaluation tool.


Fig. 5. Interface of web evaluation tool

The results obtained confirm that our quality model can specify which quality dimensions need improvement and which are satisfactory. Our quality model is therefore practical and gives quite promising results: it builds on the latest assessment strategies that have been used to evaluate the quality of various web interfaces and proposes a complete model for evaluating the quality of any commercial web interface. These measurements and their indicators, once given appropriate weights, could be operationalized and converted into a questionnaire applicable to all commercial web interfaces. Results from the analysis of such a questionnaire will help to assess these measurements and their indicators, make the required updates to them, and build up a solid base of guidelines.

5 Conclusion and Perspectives

The use of recent information and communication technologies has delivered a new generation of business, trade, and economics. Web applications have created a business environment far different from anything that preceded them. This has strengthened the need for measurement criteria to assess the different aspects related to the quality of Web applications. Quality issues have affected every business area in recent years, since a company with a website that is hard to interact with projects a poor image on the Internet and weakens its position. It is therefore essential for a company to evaluate the quality of its e-commercial services, so that it can improve them and benchmark itself against competitors. In this paper, we focused on the evaluation measurements. We proposed a model based on the most relevant dimensions. We found that incorporating these dimensions into


specific comprehensive indicators provides practitioners with flexible support. Thus, our model will improve the extracted services and the usability of commercial Web-based applications in a dynamic context. As a future goal, we aim to test our quality model with different users in non-identical contexts of use in order to extract the effect of each element of context on our quality model.

Acknowledgements. This work was supported and financed by the Ministry of Higher Education and Scientific Research of Tunisia.

References 1. Barnes, S., Vidgen, R.: Assessing the quality of auction web sites (2011) 2. Garzotto, F., Mainetti, L., Paolini, P.: Hypermedia design analysis and evaluation issues. Commun. ACM 38(8) (2010) 3. Alexander, J., Tate, M.A.: Checklist for an informational web page (1996). http://www2. widener.edu/Wolfgram-Memorial 4. Liu, C., Arnett, K.: Exploring the factors associated with web site success in the context of electronic commerce. Inf. Manag. 38, 23–33 (2000) 5. Morville, P.: Information, architecture and usability (2012). http://www.webreview.com/ 1999/03_12/strategists/03_12_99_3.shtml 6. Signore, O.: A comprehensive model for web sites quality (2005) 7. Chettaoui, N., Bouhlel, M.S.: I2Evaluator: an aesthetic metric-tool for evaluating the usability of adaptive user interfaces. Egypt, 31 August 2017 8. Filali, T., Chettaoui, N., Bouhlel, M.S.: Towards the automatic evaluation of the quality of commercially-oriented web interfaces. In: The 7th International Conferences: Sciences of Electronics, Technologies of Information and Telecommunications, SETIT 2016, Hammamat-Tunisia, ser. IEEE Conferences (2016). https://doi.org/10.1109/SETIT.2016. 7939873 9. Cranor, L.F., Garfinkel, S.: Security and Usability: Designing Secure Systems that People Can Use. O’Reilly, Sebastopol (2015) 10. Atoyan, H., Duquet, J., Robert, J.: Trust in New Decision Aid Systems. ACM Press, New York (2006) 11. Braz, C., Seffah, A., M’Raihi, D.: Designing a trade-off between usability and security: a metrics based-model. In: Baranauskas, C., Palanque, P., Abascal, J., Barbosa, S.D.J. (eds.) Human-Computer Interaction – INTERACT 2007, vol. 4663. Springer, Heidelberg (2007) 12. Cranor, L.F.: Designing a Privacy Preference Specification Interface: A Case Study. Press, New York (2013) 13. Yurcik, W., Barlow, J., Lakkaraju, K., Haberman, M.: Two Visual Computer Network Security Monitoring Tools Incorporating Operator Interface Requirements. Press, New York (2013) 14. Johnston, J., Eloff, J., Labuschagne, L.: Security and human computer interfaces. Comput. Secur. 22(8), 675–684 (2013) 15. Dhamija, R., Dusseault, L.: The seven flaws of identity management. IEEE Secur. Priv. 1540 (7993/08), 24 (2008) 16. Author, F.: Article title. Journal 2(5), 99–110 (2016) 17. Author, F., Author, S.: Title of a proceedings paper. In: Editor, F., Editor, S. (eds.) CONFERENCE 2016. LNCS, vol. 9999, pp. 1–13. Springer, Heidelberg (2016) 18. Author, F., Author, S., Author, T.: Book title. 2nd edn. Publisher, Location (1999)


19. Author, F.: Contribution title. In: 9th International Proceedings on Proceedings, pp. 1–2. Publisher, Location (2010) 20. LNCS Homepage. http://www.springer.com/lncs. Accessed 21 Nov 2016 21. Triki, N., Kallel, M., Bouhlel, M.S.: Imaging and HMI: fondations and complementarities, March 2012 22. Ivory, M.Y., Sinha, R.R., Hearst, M.A.: Empirically validated web page design metrics. Seattle, WA, USA, March/April 2001 23. Beirekdar, A., Vanderdonckt, J., Noirhomme-Fraiture, M.: A framework and a language for usability automatic evaluation of web sites by static analysis of html source code. Valenciennes, France, May 2012 24. Abascal, J., Arrue, M., Garay, N., Tomás, J.: EvalIris – a web service for web accessibility evaluation. Budapest, Hungary, pp. 20–24, May 2003 25. Knapp, A., Koch, N., Zhang, G., Hassler, H.-M.: Modeling business processes in web applications with argouwe. In: Baar, T., Strohmeier, A., Moreira, A., Mellor, S.J. (eds.) «UML » 2004 — The Unified Modeling Language. Modeling Languages and Applications, UML 2004. Springer, Heidelberg (2004) 26. Abderrahim, Z., Hamrouni, N.: Towards optimizing the dynamic placement of tasks by multi agent systems, March 2012 27. Bellovin, S.M.: On the brittleness of software and the infeasibility of security metrics. IEEE Secur. Priv. 1540, 7993–7996 (2006) 28. Chettaoui, N., Bouhlel, M.S., Lapayre, J.C.: Development of collaborative software platform of image processing for an optical probe (2014) 29. Streveler, D.J., Wasserman, A.I.: Quantitative measures of the spatial properties of screen designs, pp. 1–125 (1984) 30. Padmanabhuni, S., Adarkar, H.: Security in service oriented architecture: issues, standards and implementation. In: Service-Oriented Software System Engineering: Challenges and Practices, Chap. 1. Idea Group Publishing, Hershey (2015) 31. Assila, A., Bouhlel, M.S.: A Web questionnaire generating tool to aid for interactive systems quality subjective assessment. Publication (2013) 32. Al-Janabi, S.: Pragmatic Miner to Risk Analysis for Intrusion Detection (PMRA-ID). In: Mohamed, A., Berry, M., Yap, B. (eds.) Soft Computing in Data Science, SCDS 2017. Communications in Computer and Information Science, vol. 788. Springer, Singapore (2017). https://doi.org/10.1007/978-981-10-7242-0_23 33. Ahmed Patel, B., Al-Janabi, S., Al-Shourbaji, I., Pedersen, J.: A novel methodology towards a trusted environment in mashup web applications. Comput. Secur. 49, 107–122 (2015). https://doi.org/10.1016/j.cose.2014.10.009. http://www.sciencedirect.com/science/article/pii/ S0167404814001552. ISSN 0167-4048 34. Ahamad, S., Al-Shourbaji, I., Al-Janabi, S.: A secure NFC mobile payment protocol based on bio-metrics with formal verification. Int. J. Internet Technol. Secured Trans. 6(2), 103– 132 (2016). https://doi.org/10.1504/IJITST.2016.078579. https://www.inderscienceonline. com/doi/pdf/10.1504/IJITST.2016.078579 35. Al-Janabi, D.S., Patel, A., Fatlawi, H., Kalajdzic, K., Al Shourbaji, I.: Empirical rapid and accurate prediction model for data mining tasks in cloud computing environments. In: 2014 International Congress on Technology, Communication and Knowledge (ICTCK), Mashhad, pp. 1–8. IEEE (2014). https://doi.org/10.1109/ICTCK.2014.7033495. http://ieeexplore.ieee. org/stamp/stamp.jsp?tp=&arnumber=7033495&isnumber=7033487

448

T. Filali and M. S. Bouhlel

36. Ali, E.S.H.: Novel approach for generating the key of stream cipher system using random forest data mining algorithm. In: 2013 Sixth International Conference on Developments in eSystems Engineering, Abu Dhabi, pp. 259–269. IEEE (2013). https://doi.org/10.1109/ DeSE.2013.54. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7041127& isnumber=7041072 37. Al-Janabi, S., Al-Shourbaji, I.: A study of cyber security awareness in educational environment in the middle east. J. Inf. Knowl. Manage. 15(01), 1650007 (2016). https://doi. org/10.1142/S0219649216500076. http://www.worldscientific.com/doi/abs/10.1142/S021 9649216500076 38. Abdaoui, N., Khalifa, I.H., Faiz, S.: Sending a personalized advertisement to loyal customers in the ubiquitous environment. In: The 7th International Conferences: Sciences of Electronics, Technologies of Information and Telecommunications SETIT 2016, Hammamat-Tunisia, ser. IEEE Conferences (2016). https://doi.org/10.1109/SETIT.2016. 7939838 39. Khalfallah, N., Ouali, S., Kraiem, N.: A proposal for a variability management frame-work. In: The 7th International Conferences: Sciences of Electronics, Technologies of Information and Telecommunications SETIT 2016, Hammamat-Tunisia, ser. IEEE Conferences (2016). https://doi.org/10.1109/SETIT.2016.7939852 40. Takrouni, M. Hasnaoui, A., Gdhaifi, M., Ezzedine, T., Hasnaoui, S.: Design and implementation of a Data Distribution Service blockset for an SAE Benchmark electric vehicle. In: The 7th International Conferences: Sciences of Electronics, Technologies of Information and Telecommunications SETIT 2016, Hammamat-Tunisia, ser. IEEE Conferences (2016). https://doi.org/10.1109/SETIT.2016.7939871 41. Toujani, R., Akaichi, J.: Fuzzy sentiment classification in social network Facebook’ statuses mining. In: The 7th International Conferences: Sciences of Electronics, Technologies of Information and Telecommunications SETIT 2016, Hammamat-Tunisia, ser. IEEE Conferences (2016). https://doi.org/10.1109/SETIT.2016.7939902 42. Fredj, I.B., Ouni, K.: Fuzzy k-nearest neighbors applied to phoneme recognition. In: The 7th International Conferences: Sciences of Electronics, Technologies of Information and Telecommunications SETIT 2016, Hammamat-Tunisia, ser. IEEE Conferences (2016)). https://doi.org/10.1109/SETIT.2016.7939907

Multi-agents Planner for Assistance in Conducting Energy Sharing Processes

Bilal Bou Saleh (1,2), Ghazi Bou Saleh (3), Mohammad Hajjar (3), Abdellah El Moudni (1), and Oussama Barakat (1)

1 University of Bourgogne Franche Comté, Dijon, France ([email protected], [email protected], [email protected])
2 Lebanese University, Beirut, Lebanon
3 Faculty of Technology-SAIDA, Lebanese University, Beirut, Lebanon ([email protected], [email protected])

Abstract. The purpose of this paper is to present an agent-based methodology that allows for the creation and optimization of a schedule while taking into account a wide range of constraints or preferences. When several smart households share a common energy source and the available power is limited, the problem to be solved for improving energy efficiency is how to schedule the power-on times of the devices according to the power limits while taking into account the preferences of the users. The proposed operating system was developed as a multi-agent system (MAS) on the JADE platform. The implementation is discussed by describing in detail each agent and the control algorithm. In addition, complementary metrics are proposed to evaluate the performance of the planning method. Finally, to illustrate the proposed method, some simulation results are presented.

Keywords: Multi-agents scheduler · Planner · Energetic efficiency · Smart grid

1 Introduction

Creating a schedule is a common problem in many areas of application. This problem arises when one wants to distribute a set of activities over the time intervals of a so-called horizon period. This distribution must respect the restrictions imposed by the resources or by the activities themselves. Creating a weekly course timetable in a school, managing a waiting list in a doctor's office, scheduling surgical procedures in the operating rooms of a hospital, planning the activation of machines in a factory, or planning the power-on of devices to manage energy consumption over time are some examples of applications. Timetabling is a classic NP-hard problem. Most of the methods found in the literature to solve it belong to the operational research community, for example graph coloring [1], the simplex method [2] and genetic algorithms [3, 4]. These methods are undeniably effective and have been very successful, which explains their popularity in the research community and their broad scope in many areas of application.


Despite their great success, these methods suffer from a major drawback: it is not possible to modify an existing schedule without completely reconstructing it. They provide good predictive planning from static, predetermined data [5–8], but during the implementation phase, if an unexpected activity occurs, they do not allow it to be inserted into the schedule [9]. In addition, even in advance planning, the larger the set of data, links and input restrictions, the more complex the problem becomes, and the more computing power and time the scheduling algorithms require, because at each "modification" these algorithms start from scratch and deal with the whole problem again. This leads us to conclude that such methods are not effective for modifying an existing schedule, and that another approach is required to configure a scheduling system that includes two complementary functions: an "initial" schedule generation function and a maintenance function for that schedule [10]. In the field of artificial intelligence, on the other hand, software agents can perform both the construction and the maintenance of a schedule. Moreover, they are characterized by their autonomy, their ability to react and their ability to negotiate in order to reach a satisfactory solution [11, 12].

2 Context

Since 1987 (the Brundtland Report), the concept of sustainable development has grown out of the progressive awareness of the ecological finiteness of the Earth. From a technological point of view, sustainable development is the search for the best available technology (BAT) for an identified need that reconciles the three pillars of sustainable development (ecological, social and economic). The energy saving component is an important element of environmental performance. Energy efficiency is based on the optimization of consumption, thus on a "rational use of energy" and on more efficient processes and tools. As all parts of the network, from producer to consumer, must be automated, the NIST Framework and Roadmap [29] defines standards for smart grid interoperability. European organizations and research centers have adopted it as part of the European Intelligent Networks Reference Architecture to represent the communication and information layers [30]. In this context, technological developments have, in recent years, reshaped the way energy is managed. Our study fits precisely into this theme of energy management. Let us now summarize some of the technological developments that concern our work:

• Renewable energy participation in power generation at the end of 2017 was estimated at 26.5% [28]. Solar energy, wind energy, and energy storage and recovery can be owned and exploited locally. This energy must be taken into account by the management system at the level of the auxiliary sources. However, the production of energy from so-called renewable sources depends on "external conditions" and may therefore be randomly unavailable. This complicates the problem of energy management and implies the need for dynamic management (in near real time).
• Recent developments in communication technologies in smart grids are playing an increasingly important role in the control of distributed energy systems. Thus, the


SCADA (Supervisory Control And Data Acquisition) systems, used at the transmission level for remote control, will be used for wider and more sophisticated control. This is thanks to the new possibilities offered by power electronic interfaces capable of managing energy flows and interacting with renewable (or alternative) energy sources, storage units and controllable loads.
• Progress in computer science (software and hardware), and in practical and theoretical knowledge in the relevant fields of information technology, has led to new developments: smart devices for the control of energy consumers [13], software agents that solve problems through autonomous negotiations [14], real-time metering devices that measure energy consumption with the possibility of remote reading [15], and prognostic trends [16].
• Development of auxiliary power sources (e.g. generators) and high-capacity batteries for storing electrical energy in order to delay its use.

The Smart Grid concept consists of an optimized management of electrical energy, but also of an intelligent management that takes into account the behavioral expectations of the customers. To make a contribution in this area, we focus below on planning the use of smart devices so as to meet each household's preferences while keeping the energy consumption of a group of households below the maximum available power. The use of software agents [17] is an appropriate way to tackle this problem [26, 27]: each agent tries to satisfy the preferences of the household that it represents and collaborates with the other agents to find the right compromise for each request, thereby establishing with them the common energy program.

3 Cooperative Control System

3.1 Multi-agent System

In the Artificial Intelligence paradigm and in the multi-agent system (MAS) concept, several software agents are used to model and to solve complex problems by simulation [25]. In this article we focus on designing an Intelligent Distributed Control System (IDCS) whose agent system can help solve the problem of energy sharing in a smart zone. Starting from a needs analysis, we conceptualized our IDCS, Energy-Planner, as follows. Energy-Planner contains two distinct main functions: the creation of an initial schedule and the optimization of an existing schedule. The generation of the new schedule and its optimization are performed as a MAS-based server application. Our software of choice is the JADE platform, and the data entry part is a graphical interface designed for this purpose. In our prototype, the input application was simply a graphical interface associated with a single Admin-Agent. All device and user data, as well as any other data needed for testing, were manually inserted into the database through this graphical interface.

3.2 Agents Platform

For MAS development in the field of energy, JADE (Java Agent DEvelopment Framework [22]) is the most commonly used middleware. It has complete documentation and full compatibility with FIPA (Foundation for Intelligent Physical Agents [23, 24]). JADE is the platform we chose to develop Energy-Planner. In this study there are six types of agents: Admin-Agent (request input), Outlet-Agents (power sockets), Tslot-Agents (partial planners), Manager-Agent (decision maker), Expert-Agent, and Database-Agent. The behavior of the agents can be summarized as follows. N households merge their requests, issued by the household managers, into a series of data that is entered via the GUI of the Admin-Agent. When the Manager-Agent receives a request from the Outlet-Agents, in order to make a decision it accesses the knowledge and rule base through information obtained from the Data-Agent and the Expert-Agent. It launches a call for tenders to the Tslot-Agents and receives the virtual price offers provided by these partial planners. It then concludes, and its decision is based on a virtual cost minimization criterion (Fig. 1).

Fig. 1. Diagram of the planning process

3.3 Software Architecture

The analysis of the characteristics of the Energy-Planner system led to an architecture composed of the following agents:

• Admin-Agent: used for entering the parameters of the queries expressing the needs of the users, and responsible for triggering a new schedule or modifying an existing one.
• Data-Agent: the Admin-Agent can add or update information about buildings, influencing parameters, devices, and the preferences for using these devices. This planning-relevant data is sent to the Data-Agent, which is responsible for sorting, listing, and placing it in a database.


• Outlet-Agents: represent subsets of loads (devices) connected to an intelligent socket. Each Outlet-Agent downloads (from the database), filters and stores data about the socket it represents (e.g. connected devices, power capacity, maximum power). In addition, it communicates with all the Device-Agents representing the individual devices in order to control their activation or deactivation.
• Tslot-Agents: the planning period is one day, i.e. 24 h. We chose to divide this period into 12 slots of two hours each. Through planning, each Tslot-Agent is loaded with the devices that are allowed to operate within its time interval.
• Manager-Agent: the decision maker of the negotiation algorithm. It decides to assign a load to a time slot when building a new schedule, and it manages the negotiation between the Tslot-Agents for load exchange during the optimization of the schedule. It "knows" the Outlet-Agents involved in a negotiation in progress and has access to the database of the Tslot-Agents. The Manager-Agent can therefore issue verdicts deciding which time slot to assign to a load (because it knows the load content of each slot).

3.4 Preference Weight

Users have preferences regarding the operating time intervals of each device. Obviously, if the users impose the operating times of all their devices, the schedule is then fixed and the planning tool Energy-Planner is useless. Energy-Planner is designed for an energy distribution context in which users cooperate fully to achieve the common goal of improving energy efficiency at the global level of the community. This means that all users accept the optimized result of the planning tool, which may, for some device power-on time intervals, not match their maximum preference. Thus, each controllable device is associated, in addition to its technical characteristics, with a preference vector. Each component of this vector is a weight, a real number between zero and one, which represents the request weight for powering on the device during the corresponding time slot. We therefore set up in the database a matrix Pref(outlet, device, timeslot), which is filled in and updated by the administrator and entered into the system manually via the Admin-Agent. Energy-Planner uses this matrix to calculate the priority rating function that is needed to decide which activity to plan or move. This feature requires residents to be attentive when setting their preferences. It gives flexibility to the requested schedule and thus gives the system the degree of freedom needed to find an optimal solution. In addition, the appliances connected to a given outlet can have different importance in terms of service; each user quantifies this subjective parameter according to his needs. To take this distinction of importance between devices into account, we configure for each outlet a vector containing the user's ranking of each device. Similarly, we may encounter a situation in which one outlet is more important than the others. To take this distinction into account, we define a vector containing the ranking given by the administrator for all outlets.
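As a minimal illustration of these data structures (a sketch only, not the authors' JADE implementation; the names Pref, device_rank and outlet_rank and the example values are ours), the preference matrix and the two rank vectors could be represented as follows:

```python
import numpy as np

N_OUTLETS, N_DEVICES, N_SLOTS = 20, 11, 12  # community size used later in the experiments

# Pref[o, d, s] in [0, 1]: request weight for powering on device d of outlet o during slot s.
# Filled in by the administrator via the Admin-Agent; 0 means "do not run here",
# 1 means "maximum preference for this slot".
Pref = np.zeros((N_OUTLETS, N_DEVICES, N_SLOTS))

# Example: device 3 of outlet 0 is preferably run between 18:00 and 20:00 (slot 9),
# and tolerated in the two neighbouring slots.
Pref[0, 3, 9] = 1.0
Pref[0, 3, 8] = 0.6
Pref[0, 3, 10] = 0.6

# Per-outlet ranking of its devices (importance of the service rendered), set by each user.
device_rank = np.ones((N_OUTLETS, N_DEVICES))
device_rank[0, 3] = 3.0   # e.g. the air conditioner matters more than the other loads of outlet 0

# Ranking of the outlets themselves, set by the administrator.
outlet_rank = np.ones(N_OUTLETS)
outlet_rank[5] = 2.0      # e.g. outlet 5 hosts health-care equipment
```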

3.5 Virtual Cost

To introduce the concept of the virtual cost of an activity, consider the following situation: the majority of the users request to benefit from heating between 18:00 and 20:00. Obviously, a so-called "optimized" solution that takes only the real cost into account and respects the constraint of a maximum energy limit is unacceptable here: how could one explain to a user that he cannot use his heating while his neighbor uses it at the same time? The dissatisfaction of the user must therefore be taken into account in the calculation of the cost. In the previous example, moving the heating activity outside the requested time slots must be more expensive than keeping it in this range, even with the additional cost of exceeding the contractual limit. Since the preference matrix Pref(outlet, device, timeslot) contains the user preference weights as real numbers between zero and one, we use this matrix to calculate what we call the virtual cost. For clarity, consider the following example. Let the activity A1 correspond to a device d of outlet o placed in time slot s1 with preference p1, and let cost(A1) be the real cost of the energy resulting from the insertion of A1 into slot s1. We define the virtual cost of A1 as:

virtualcost(A1) = cost(A1) + (1 - p1) * cost(A1)

In other words, we add a fictitious overcost that corresponds to the non-preference of the users. We emphasize that the Energy-Planner software is designed with a high regard for user convenience, which led us to use the virtual cost minimization criterion in the optimization process. Therefore, moving an activity to a slot of lesser preference implies a deliberately exaggerated cost.
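A one-line numerical sketch of this definition (illustrative only; the function name and the example figures are ours):

```python
def virtual_cost(real_cost: float, preference: float) -> float:
    """Virtual cost = real cost plus a fictitious overcost proportional to the
    users' non-preference (preference is the Pref weight in [0, 1])."""
    return real_cost + (1.0 - preference) * real_cost  # equals real_cost * (2 - preference)

# Example: an activity costing 1.20 (currency units) placed in a slot with preference 0.4
# is charged a virtual cost of 1.92, so the optimizer avoids such placements when it can.
print(virtual_cost(1.20, 0.4))  # 1.92
```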

3.6 Evaluation Logic

The scheduling mechanism is executed by an algorithm based on an evaluation logic that is central to the decision to assign a time slot to a device. When creating a new schedule, during the iteration over outlets, each outlet evaluates the priorities of the remaining device demands (the data is available locally but also at the Manager-Agent). Thus, the Outlet-Agent knows both the optimized program (in terms of priority between devices) and the desired one (in terms of preference) that best meets the needs of its devices. At the level of the Manager-Agent, the evaluation of a request takes into account the priority of the request, the location preferences in terms of time slots, the category of the requesting device and the state of the schedule under construction. Note that each device belongs to a load category (temperature, lighting, ventilation, safety, health care, cleaning, etc.) and has a rank of importance within this category. In addition, some appliances are "more important" than others: for example, in summer and in the temperature category, an air conditioner is considered more important than a water heater, so the rank of the air conditioner is higher than that of the water heater. Likewise, a main category (for example, kitchen appliances) has a higher rank


than the cleaning category (which contains the washing machine). In addition to these considerations, when building a schedule, an extra priority is added to a device that can be placed in a time slot close to the one in which it is already programmed; this minimizes the downtime of the device.
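The paper does not give the exact weighting of these factors, so the following is only a plausible sketch of such an evaluation score, combining the elements listed above (category rank, device rank, outlet rank, slot preference, proximity bonus); the multiplicative form and the bonus value are assumptions:

```python
def evaluation_score(category_rank: float, device_rank: float, outlet_rank: float,
                     preference: float, slot: int, already_scheduled_slots: list,
                     proximity_bonus: float = 0.5) -> float:
    """Illustrative priority score for one (device, slot) request.

    Combines the factors described in Sect. 3.6: the rank of the device's category,
    its rank inside the category, the importance of its outlet, the user preference
    for the candidate slot, and a bonus when the slot is adjacent to a slot where the
    device is already programmed (to minimize downtime)."""
    score = category_rank * device_rank * outlet_rank * preference
    if any(abs(slot - s) == 1 for s in already_scheduled_slots):
        score += proximity_bonus
    return score

# Example: a high-rank temperature device asking for a slot next to one it already holds.
print(evaluation_score(3.0, 2.0, 1.0, 0.8, slot=9, already_scheduled_slots=[8]))  # 5.3
```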

3.7 Schedule Creation Algorithm

We now explain the planning mechanism for a new schedule. The methodology is distinguished by the existence of a single "decision maker" (the Manager-Agent), who has control over the current schedule. The Manager-Agent negotiates the calendar with the Outlet-Agents, using information from the Tslot-Agents about the current status of the calendar. The negotiations are spread over successive rounds. Note that for a given day, different devices may require multiple time slots; for clarity in the description of the planning mechanism, we call "activity" the power-on status of one device during one time slot. During a round, each Outlet-Agent with activities not yet planned sends the Manager-Agent a message requesting an activity that it has selected. The decision process then starts: the Manager-Agent examines all the requests received from the Outlet-Agents during this round, orders them according to their priority, accepts those that respect the set of constraints specified at this level of decision, places the corresponding "reservations" in the current calendar, refuses the others and informs the corresponding Outlet-Agents. Note that at this level the set of constraints can be more or less flexible according to the chosen policy (for example, we may allow the contractual power to be exceeded while refusing to exceed the maximum power limit). The manager's decision depends on three situations: (1) if the requested activity causes the power limit to be exceeded during the time interval, the request is denied; (2) if the priority of a requested activity is lower than that of another and there is not enough room for both in terms of power, the activity with the lower priority is refused; (3) several Outlet-Agents can of course request the same location, in which case the Manager-Agent selects the subset of requests containing the activities with the best values computed by the evaluation algorithm, provided that the power limit is not exceeded. The iterations end when the list of received requests is empty or when the remaining unplanned activities cannot be fulfilled. At the end of a round, the Manager-Agent informs the Outlet-Agents of its decision. It should be noted that in one round the number of evaluated requests equals the number of unplanned devices considered (a small number, which explains the speed of the planning algorithm). This approach also ensures that all outlets have the same opportunities to operate, since each agent can, in one turn, reserve a place for one of its activities. After each regular round of activity placement, the Manager-Agent consults the Outlet-Agents about the activities that were rejected during this round. Each Outlet-Agent returns a proposal for the activity in which the requested time interval has possibly been changed. The Manager-Agent receives the messages from the Outlet-Agents containing the requested activities. To insert these activities into the schedule before moving on to the next round, the main negotiation process is paused and the placement algorithm for rejected activities starts. Once the previously rejected proposals are added, the Manager-Agent returns to the main rounds.
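A compact, runnable sketch of one regular placement round under these rules (Python for brevity; the actual system is a JADE application, and the helper names are illustrative):

```python
from dataclasses import dataclass

@dataclass
class Activity:
    outlet: int
    device: int
    slot: int          # requested time slot (0..11)
    power: float       # kW drawn during the slot
    priority: float    # value returned by the evaluation algorithm (Sect. 3.6)

def placement_round(requests, slot_load, max_power):
    """One regular round: sort the requests by priority and accept each one as long
    as the power limit of its slot is respected; the rest are rejected and will be
    re-proposed (possibly in another slot) by their Outlet-Agents."""
    accepted, rejected = [], []
    for act in sorted(requests, key=lambda a: a.priority, reverse=True):
        if slot_load[act.slot] + act.power <= max_power:
            slot_load[act.slot] += act.power
            accepted.append(act)
        else:
            rejected.append(act)
    return accepted, rejected

# Example round: two activities compete for slot 9 under a 3 kW limit.
load = [0.0] * 12
acc, rej = placement_round(
    [Activity(0, 3, 9, 2.0, priority=3.0), Activity(1, 0, 9, 1.5, priority=1.0)],
    load, max_power=3.0)
print(len(acc), len(rej))  # 1 1 -> the lower-priority request is rejected
```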

3.8 Insert Algorithm for Rejected Activities

This algorithm processes an existing schedule to find acceptable time slots for inserting one or more activities. When the Manager-Agent rejects an activity submitted in a regular cycle, this algorithm is used to find an acceptable location for it in the current calendar. All re-proposals are stored and ranked according to their priority in the list of rejected activities. They are then treated one by one, Energy-Planner looking for a location for each of them. The Manager-Agent calculates the virtual extraction cost of each activity already scheduled in the time slot, and also calculates the equivalent virtual cost of the activity to insert (equivalent here means that the cost is calculated for an equal power). The virtual cost balance serves as the decision basis for the replacement transaction under consideration. Whatever the decision, the activity that ends up without a location is returned to its owner, who will have to change the desired time slot and submit it in a subsequent negotiation cycle.
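The paper describes the decision only as a "virtual cost balance", so the sketch below is one plausible reading of it, not the authors' exact criterion: a swap is proposed when extracting an already-scheduled activity costs more than inserting the candidate at equal power.

```python
def try_replace(candidate_virtual_cost, candidate_power, scheduled):
    """Illustrative replacement test for one rejected activity in one time slot.

    `scheduled` holds tuples (activity_id, power_kw, extraction_virtual_cost) for the
    activities already placed in the slot. The candidate's virtual cost is rescaled to
    the power of each scheduled activity (the "equivalent" cost of Sect. 3.8), and the
    swap with the most favourable balance is returned, or None if no swap pays off."""
    candidate_cost_per_kw = candidate_virtual_cost / candidate_power
    best_id, best_balance = None, 0.0
    for act_id, power_kw, extraction_cost in scheduled:
        equivalent_candidate_cost = candidate_cost_per_kw * power_kw
        balance = extraction_cost - equivalent_candidate_cost  # > 0: swapping lowers virtual cost
        if balance > best_balance:
            best_id, best_balance = act_id, balance
    return best_id

# Example: a low-preference activity ("A7") in the slot is worth extracting.
print(try_replace(1.0, 1.0, [("A7", 1.0, 1.8), ("A2", 1.0, 0.9)]))  # A7
```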

3.9 Optimization Algorithm

This algorithm may be used when the filling step of the schedule is finished. The goal is to look for possible modifications that balance the distribution of the power load over the time slots without unduly violating the preferences of the users. The Manager-Agent, knowing the load of each time slot, asks the most heavily loaded Tslot-Agent to offer one of its "activities" for sale. This Tslot-Agent calculates the virtual unit cost (cost/kWh) of extracting each of its activities, chooses the best activity to extract and proposes it for sale. The Manager-Agent then launches a call for tenders to all the Tslot-Agents to sell them this activity. Each Tslot-Agent calculates the insertion cost of this activity taking into account its already planned load. The Manager-Agent receives the offers proposed by the time slots, decides to whom to assign this load, updates the schedule and informs the outlets involved in this change. The Manager-Agent repeats the same optimization mechanism as long as the stop criterion is not reached. The sequence diagram of the optimization process is shown in Fig. 2.
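One "sell an activity" step of this balancing loop could look as follows (illustrative sketch; the bid formula is an assumption, since the paper only states that the Tslot-Agents answer with virtual insertion costs):

```python
def optimization_step(slot_activities, max_power):
    """One step of the load-balancing auction of Sect. 3.9.

    Each activity is a dict with 'power' (kW) and 'virtual_cost'. The most heavily
    loaded slot offers its cheapest-to-extract activity (lowest virtual cost per kW);
    every other slot answers the call for tenders with a bid (here, simply its
    resulting load); the manager awards the activity to the best bidder. Returns True
    if a move was made, False when the stop criterion is reached."""
    loads = [sum(a["power"] for a in acts) for acts in slot_activities]
    seller = max(range(len(loads)), key=lambda s: loads[s])
    if not slot_activities[seller]:
        return False
    offer = min(slot_activities[seller], key=lambda a: a["virtual_cost"] / a["power"])
    bids = {s: loads[s] + offer["power"]
            for s in range(len(loads))
            if s != seller and loads[s] + offer["power"] <= max_power}
    if not bids:
        return False
    buyer = min(bids, key=bids.get)
    if bids[buyer] >= loads[seller]:       # no improvement in balance: stop
        return False
    slot_activities[seller].remove(offer)
    slot_activities[buyer].append(offer)
    return True
```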

3.10 Performance Metrics

Note that the Energy-Planner software is designed to optimize the cost of energy distribution (to respect the maximum amount of energy contracted) while emphasizing user convenience in each decision. To measure the performance of the proposed method, we create a customer satisfaction function, built as follows. Consider the satisfaction matrix S(outlet, device) defined by:

S(i, j) = 1 if device j of outlet i is placed in its maximum-preference time slot, and S(i, j) = 0 otherwise.


Fig. 2. Sequence diagram of the optimization process

The customer satisfaction indicator is calculated using the following formula:

CSI = ( Σ_{i=1..N_outlet} Σ_{j=1..N_device(i)} S(i, j) · Rank(i, j) · RankO(i) ) / ( Σ_{i=1..N_outlet} Σ_{j=1..N_device(i)} Rank(i, j) · RankO(i) )

In the Customer Satisfaction Indicator expression, the numerator represents the sum of the weights of all activities to which the planner software has assigned time-slot locations in accordance with the maximum preference indicated in the preference matrix, while the denominator represents the sum of the weights of all requested activities, in other words the weight of all activities in the best potential program. The result represents, in a way, the compliance rate of the obtained calendar with respect to the ideal calendar. In addition, we considered and analyzed other performance indicators:

(1) The ratio τ_real/virtual = real cost / virtual cost informs us about the rate of energy provided in accordance with user preferences.
(2) The performance rate τ_contract/real = cost respecting the contract / real cost informs us of the performance in terms of compliance with the contract.
(3) The ratio τ_notplan/expected = energy not planned / energy expected informs us of the failure rate of the planning algorithm due to the technical limitation of the available power.
(4) The ratio τ_expected/available = energy expected / energy available indicates the rate of energy requested in relation to the maximum energy available.
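A minimal sketch of the CSI computation above, reusing the (illustrative) Pref-style arrays introduced earlier; the names are ours, not the authors':

```python
import numpy as np

def customer_satisfaction_index(S, device_rank, outlet_rank):
    """CSI of Sect. 3.10: weighted share of devices placed in their maximum-preference
    slot. S[i, j] is 1 if device j of outlet i got its preferred slot, 0 otherwise;
    the weight of each device is Rank(i, j) * RankO(i)."""
    weights = device_rank * outlet_rank[:, None]
    return float((S * weights).sum() / weights.sum())

# Toy example: 2 outlets x 2 devices, equal ranks, 3 of 4 devices satisfied -> CSI = 0.75.
S = np.array([[1, 1], [1, 0]])
print(customer_satisfaction_index(S, np.ones((2, 2)), np.ones(2)))  # 0.75
```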


4 Experimental Test

4.1 Test Data

The test data used in our experiments are based on the community structure and parameters described below; they are also summarized in the sketch that follows this list. The problem involves a period of one day. Each day includes 12 time slots (two hours each); there are 20 households in the community and each household contains up to 11 smart devices connected to one smart socket, giving a total of up to 220 devices. To test the developed solution, we performed several experiments with the following general parameters:

• The maximum available power limit is set to an arbitrary value of 27 kW.
• The contractual power limit is set to an arbitrary value of 21 kW.
• The increase in the price of a kWh beyond the contractual power varies linearly: the overcost is 0% when the power equals the contractual power and 50% when the power reaches the maximum available power.
• To the available power of the network is added the power of five solar panels, a supplement of up to 3 kW following a profile that represents a typical sunny day.
• The power consumption of each device was randomly defined as a multiple of 250 watts in the range [500, 2000] watts.
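The test data can be regenerated along the following lines (illustrative only; the parameter names and the exact shape of the solar profile are assumptions, since the paper does not give them):

```python
import random

N_HOUSEHOLDS, MAX_DEVICES, N_SLOTS = 20, 11, 12
MAX_POWER_KW, CONTRACT_POWER_KW = 27.0, 21.0

def overcost_rate(power_kw):
    """Linear overcost on the kWh price: 0% at the contractual power,
    50% at the maximum available power."""
    if power_kw <= CONTRACT_POWER_KW:
        return 0.0
    return 0.5 * (power_kw - CONTRACT_POWER_KW) / (MAX_POWER_KW - CONTRACT_POWER_KW)

# Device powers: random multiples of 250 W in [500, 2000] W.
random.seed(42)
device_power_w = [[random.randrange(500, 2001, 250) for _ in range(MAX_DEVICES)]
                  for _ in range(N_HOUSEHOLDS)]

# Free solar supplement (kW) per two-hour slot, up to 3 kW around midday on a sunny day.
solar_profile_kw = [0, 0, 0, 1.0, 2.0, 3.0, 3.0, 2.0, 1.0, 0, 0, 0]

print(overcost_rate(24.0))   # 0.25 -> halfway between the two limits
```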

4.2 Simulation Results

To illustrate how the Energy-Planner software tool works, Fig. 3 shows a typical example of the schedules calculated during a multitude of tests. In this figure, the horizontal red line marks the upper limit of the available power. This is a technical limitation: the electrical installation is unable to provide more than this power. The orange line marks the upper power limit stipulated in an assumed power purchase agreement. This limit may be exceeded, but the cost per kWh will then be higher; it is a contractual limit and not a technical one. During the hours of the day between six o'clock and eighteen o'clock, we have increased this limit by adding the energy supplied locally by the solar panels (green period). This power is supposed to be available and is considered free in the calculation of the cost of the energy consumed. In other words, when optimizing the schedule, exceeding the power limit during the daylight hours has a much lower impact than elsewhere because part of this energy is free: it is the part of the energy produced locally, whose cost is considered to be zero. The blue rectangles represent the sum of the power demanded by the users with maximum preference; it is the ideal schedule that the users want. Finally, the yellow line represents the sum of the projected consumption power after optimization of the schedule by the proposed method. The optimized schedule is the best compromise computed between the technical constraints and the cost limitation objective on the one hand, and the preferences of the users on the other hand. For reasons of space, we have concentrated in Fig. 3 the maximum amount of information concerning the chosen example. In this example, we have 20 buildings and 11 devices per building. The total energy required is 550 kWh for a maximum


technically possible energy delivered by the electricity grid equal to 648 kWh. In principle, there is enough room to satisfy all demands if the violation of some maximum preferences of the users is tolerated. The distribution of power indicated by the blue rectangles corresponds to a schedule in which we chose the policy of respecting the maximum preference indicated for the operating hours of each device (when this preference is indicated) and of not refusing any request, even if the technical constraint of not exceeding the maximum power is violated. This policy was deliberately chosen to illustrate a situation of major violation of the technical constraint in some cases of strict respect of the users' preferences (between 06:00 and 12:00). However, room exists, and it would have been possible to respect this technical constraint by violating the preferences of some users and moving only the devices responsible for exceeding the upper limit. This is a solution to the problem, but it is not yet the optimal solution. It is thus a clear illustration of the problem addressed in our study, of the necessity of the method proposed in this article to solve it, and of the need for a tool implementing this method to propose the schedule that is the best acceptable compromise. The optimal solution calculated by Energy-Planner is given by the power distribution corresponding to the projected schedule illustrated by the yellow line in Fig. 3. The energy that corresponds to exceeding the contractual power limit is less than 4%, with a zero cost impact because the extra cost is compensated by the free solar energy. At the same time, the indicator equal to the ratio (total actual cost / total virtual cost), which informs us of the rate of energy provided according to the users' preferences, reaches 88%. This means that 12% of the energy delivered was, more or less, "moved" with respect to customer preferences (Fig. 3).

Fig. 3. Calculated schedule for 20 buildings, 11 devices per building, and 550 kWh of energy required

To test Energy-Planner, a total of 100 new schedules were generated. We computed the average satisfaction and performance rates from the results of


the 100 trials for the 20 outlets. The satisfaction rate is on average 88.9%, and the performance rate is on average 90.9%. These two high and almost equal percentages reflect the smooth functioning of the planning tool, which tries to find the best compromise between the satisfaction of the energy users on the one hand and the optimization of costs and technical limitations on the other. Table 1 contains in its first line the average "rate of unplanned energy with respect to the energy demanded" and in its second line the average "rate of energy demanded with respect to the available energy". It illustrates the very rapid growth of unplanned appliances as the demand for energy increases: the higher the energy demanded, the less room there is and the more important the unplanned energy becomes. Energy-Planner offers the best possible solution, and beyond the technical limit it can only solve the problem by refusing a set of requests.

Table 1. Unplanned energy rate versus the rate of demanded energy.

τ_notplan/expected     0%       1.1%     3.5%     14.3%    24.0%    36.0%
τ_expected/available   62.5%    72.9%    83.3%    93.7%    104.1%   114.5%

5 Conclusion

In this article, a methodology based on a multi-agent system is proposed to plan energy consumption in a smart area. The structure of the scheduling software tool for this application is presented, and its mechanism and decision logic are detailed. We have presented the simulation results obtained on a typical example. Energy-Planner satisfactorily solves the problems of creating and optimizing an energy consumption schedule. Obviously, further simulations are necessary to validate the developed system. It is interesting to note that although the scope of this paper is the planning of electrical power consumption, the same agent system has been successfully applied to the problem of predictive and dynamic planning of surgeries in hospital operating theaters. Finally, simulation with real data, near real-time planning and the comparison of the results with other time-management tools are the next research works that we will report in subsequent publications.

References 1. Redl, T.A.: On using graph coloring to create university schedules with essential and preferential conditions. http://cms.uhd.edu/faculty/redlt/iccis09proc.pdf 2. Nachtigall, K., Opitz, J.: A modulo network simplex method for solving periodic schedule optimisation problems. In: Operations Research Proceedings (2007) 3. Genetic Algorithms Overview: geneticalgorithms.ai-depot.com/Tutorial/Overview.html 4. Kragelund, L.V.: Solving a timetabling problem using hybrid genetic algorithms. Softw. Pract. Exper. 27(10), 1121–1134 (1996)


5. Cardoen, B., Demeulemeester, E., Beliën, J.: Optimizing a multiple objective surgical case sequencing problem. Int. J. Prod. Econ. 119(2), 354–366 (2009) 6. Cardoen, B., Demeulemeester, E., Beliën, J.: Sequencing surgical cases in a day-care environment: an exact branch-and-price approach. Comput. Oper. Res. 36(9), 2660–2669 (2009) 7. Cardoen, B., Demeulemeester, E., Beliën, J.: Operating room planning and scheduling: A literature review. Eur. J. Oper. Res. 201(3), 921–932 (2010) 8. Dekhici, L., Belkadi, K.: Operating theatre scheduling under constraints. J. Appl. Sci. 10 (14), 1380–1388 (2010) 9. Saleh, B.B., El Moudni, A., Hajjar, M., Barakat, O.: A multi-agent architecture for dynamic scheduling of emergencies in operating theater. In: Advances in Intelligent Systems and Computing, vol. 869. Springer, Cham (2019) 10. Saleh, B.B., El Moudni, A., Hajjar, M., Barakat, O.: Towards an integral operating room management system (2018). ieeexplore.ieee.org/document/8394877 11. Tkaczyk, R., Ganzha, M., Paprzycki, M.: Agent-planner agent, based timetabling system. Informatica 40(1), (2016) 12. Saleh, B.B., El Moudni, A., Hajjar, M., Barakat, O.: A cooperative control model for operating theater scheduling (2018). ieeexplore.ieee.org/document/8394888/ 13. Koutsopoulos, I., Hatzi, V.: Optimal energy storage control policies for the smart power grid. In: 2011 IEEE International Conference on Smart Grid Communications, pp. 475–480 14. Praça, I., Ramos, C., Vale, Z., Cordeiro, M.: Intelligent agents for negotiation and gamebased decision support in electricity markets. researchgate.net/publication/267806879 15. Petersen, J., Shunturov, V., Janda, K.: Dormitory residents reduce electricity consumption when exposed to real time visual feedback and incentives. Int. J. Sustain. High. Educ. 8(1), 16–33 (2007) 16. Gonzalez-Romera, E., Jaramillo-Moran, M.A., Carmona, D.: Forecasting of the electric energy demand trend and monthly fluctuation with neural networks. Comput. Ind. Eng. 52 (3), 336–343 (2007) 17. Carabelea, C., Boissier, O., Ramparany, F.: Benefits and requirements of using multi-agent systems on smart devices. Lecture Notes in Computer Science (2003) 18. Marik, V., Stepankova, O., Lazansky, J.: Artificial intelligence. In: J.ICIE 2015 3rd International Conference on Innovation and Entrepreneurship 19. Budinská, I., Dang, T.T.: A case based reasoning in a multi agents support system. In: Proceedings of the 6th International Scientific-Technical Conference, Process Control (2004) 20. Dang, T.T.: Improving plan quality through agent coalitions. In: IEEE International Conference on Computational Cybernetics – ICCC (2004) 21. Druiven, S.: Knowledge development in games of imperfect information. University Maastricht Master Thesis, Institute for Knowledge and Agent Technology, University Maastricht (2002) 22. JADE. http://jade.tilab.com/ 23. FIPA. http://www.fipa.org/ 24. FIPA (2002) FIPA ACL Message Structure Specification. SC00061G 25. Dounis, A.I.: Artificial intelligence for energy conservation in buildings. Adv. Build. Energy Res. 4(1), 267–299 (2010) 26. Pynadath, D.V., Tambe, M.: Multiagent teamwork: analyzing the optimality and complexity of key theories and models, pp. 873–880. ACM (2002) 27. Scerri, P., Pynadath, D.V., Tambe, M.: Towards adjustable autonomy for the real world. J. Artif. Intell. Res. 17, 171–228 (2002)


28. Statistisches Bundesamt, “Wirtschaftsbereich energie - erzeugung,” Statistisches Bundesamt, Technical Report, 2017 29. NIST: Roadmap for smart grid interoperability standards, vol. 1108. NIST Special Publication (2010) 30. Bruinenberg, J., et al.: Smart grid coordination group technical report reference architecture for the smart grid version 1.0 (draft) 2012-03-02. Technical Report (2012)

Morocco's Readiness to Industry 4.0

Sarah El Hamdi (1,2), Mustapha Oudani (1), and Abdellah Abouabdellah (2)

1 Information Technology and Communications Laboratory, FIL, International University of Rabat, Rabat, Morocco ({sarah.elhamdi,mustapha.oudani}@uir.ac.ma)
2 Laboratory Engineering Science, MOSIL TEAM, ENSA, Ibn Tofail University, Kenitra, Morocco ([email protected])

Abstract. Nowadays, consumers in any community in the world desire a progressive improvement of their quality of life while thinking about sustainable approaches, and industry has been advancing and evolving to keep up with these requirements. The consumption pattern faces continuous and nearly unpredictable change; companies all over the world must adapt their business practices and cope with customer needs, both explicit and implicit. Rapid technology upgrades, evolution and change are currently the main challenges of industrial enterprises, and data and data analytics have become a core asset related to the performance of organizations. Industry 4.0 envisions a digital transformation of the enterprise, entwining the cyber-physical world and the real world of manufacturing to deliver networked production with enhanced process transparency. The question is how developing countries on the African continent that occupy a strategic geographical position, such as Morocco, will deal with this fourth revolution. The purpose of the current paper is to question the readiness of Morocco and highlight the challenges it faces in integrating Industry 4.0.

Keywords: Industry 4.0 · Readiness · Challenge

1 Introduction

The latest developments in technologies, information systems and software innovation have become a major asset for every organization, and this digital progress represents the newest challenge to be met in this era [1]. Ultimately, the fourth industrial revolution will directly impact business models, management practices and jobs; however, it will also have a positive influence on the industrial field and consequently on the economy, by improving industrial processes and their outputs, profits and results. We may say that digital evolution, or transformation, is without doubt the most strategic priority for any organization [2]. It also constitutes a major headache for the development of small and medium-sized enterprises, especially in developing countries such as Morocco. Morocco's industrial policy has gone through several phases since independence, especially since the 1960s. At the beginning, the Kingdom opted for a strategy of substituting imports to allow the growth of local production, with the objective of reducing


dependence on goods brought from abroad. However, this model started to lag around the 1970s, since it focused on certain fields such as textiles, and a debt problem arose. From the 1980s, to reduce the shortcomings of the old model, Morocco switched its policy and focused on promoting exports; the industrial sector nevertheless remained confined to exportation and had low added value. Starting in 2005, Morocco experienced an industrial turning point by adopting a policy focused on global businesses considered the most dynamic worldwide. Since then, several politico-economic plans have emerged, such as the "Emergence Plan", the "National Pact for Industrial Emergence" and, currently, the "2014-2020 Industrial Acceleration Plan" [3]. The biggest worry of Moroccan elites is the fact that, despite the Kingdom's strategic geographical position, its economy will not be competitive enough at the international level if the Moroccan industrial fabric does not adopt the shift towards Industry 4.0. The Government acknowledges the importance of levelling up to match the worldwide trend: according to the Minister of IITDE, the direct head of trade, industry and the digital economy, Morocco must acquire its data sovereignty in order to enter the 21st century [4]. To push the migration to 4.0 forward, small and medium-sized companies must make important and risky investments, some of which may reach 9%-10% of their turnover, to assimilate new information systems and digital technologies. Beyond this investment, SMEs have to adopt a fourth-revolution strategy and plan; the major hiccup they may face is data security, an important challenge to overcome if they want to succeed in a flexible, evolving and unforgiving environment. Apart from data security and the large need for investment, another factor must be taken into consideration: the human factor. The best example is the way companies in the birthplace of the fourth industrial revolution try to cope with the skill-set gap of their current and future employees [5]. They must establish a continuous training program to ensure that their employees are qualified for the smart plant. It is important to note that the human actor represents common ground among all the economies that leaped forward to I4.0, such as the U.S., France and the U.K.; Morocco has to face the same challenges, as the skill sets required from industrial employees for Industry 4.0 are not yet present, especially if we analyse the skills of current graduates. The third section of the paper gives a global overview of industrial policies around the world, with a focus on the economies that shifted to 4.0, and highlights the Moroccan context. The fourth section explains why the economy is faced with a multitude of challenges; then, in its second sub-section, the paper highlights the main axes to take into consideration before the move to 4.0 and tries to draw up a list of requirements to meet before the shift.

2 Research Methodology and Approach

The research was done in two steps. First step: launching the process by using keywords such as "Industry 4.0" (with a focus on the English term only), "Industry 4.0 Policies" and "Industry 4.0 Challenges" on various platforms like Google Scholar,


Science Direct, Springer and ResearchGate, followed by identifying central keywords, for example "Morocco Industry 4.0" and "Moroccan Digitalization". We deemed it necessary to cover only pertinent publications linked to our topic (Table 1).

Table 1. Scientific papers with the selected terminology, 2011-2019.

                          Google Scholar   Science Direct   Springer
Industry 4.0              362 000          93 281           27 612
Industry 4.0 Policies     54 400           20 296           28 133
Industry 4.0 Challenges   37 000           15 665           952
Morocco & Industry 4.0    3460             309*(C.I)        –

From these results, we focused on the accessible publications with a clear mention of the terminology "Industry 4.0" or "Smart Industry" in their title and/or abstract; furthermore, we screened with particular attention the papers analyzing the Moroccan context. Following these steps, we were able to review 31 papers and other sources, which were read and analyzed. Second step: in parallel, another search was launched looking for interviews of Moroccan ministers on topics linked to Industry 4.0 in the Moroccan context, Morocco and Big Data, Morocco and data security, and E-Morocco; those interviews were either videos on websites like challenge.ma or written articles on websites such as lesinfos.ma, Le Site info, etc. In the following section, we opted for a benchmarking approach, because benchmarking is considered a central instrument for any performance improvement. Basically, the goal is to find countries (or a country) that are best at what we want for our own, study how they achieved it, make plans for improvement, implement them and monitor the results. With this perspective, it was decided to benchmark against Western European countries such as France and the United Kingdom, including the Industry 4.0 leader Germany, while considering the challenger, the USA, in order to identify and implement best practices, in our case to shed light on the requirements needed to implement best practices and quality models and reach a world-class standard. The main reasons that pushed us to consider those countries as targets for benchmarking are: the difficulty of establishing a comparison with neighboring emerging countries; the geographical position and historical links to Europe, especially France, and to the U.S.; and the fact that those countries and Morocco share common ground, as they have all established policies and programs to support SME manufacturers and innovative entrepreneurs [6].

3 Morocco and Industry 4.0

Industrial revolutions are huge events. The most important fact to remember is that there have been only three so far: the first one took place in the 18th century, driven by the steam engine and the mechanical loom, followed by the second one around the 1900s, linked to the


harnessing of electricity. The third one was set in motion after the Second World War by the computer age. In 2011, Henning Kagermann [7], the head of the German National Academy of Science and Engineering, proclaimed the arrival of Industry 4.0 when he used the term to propose a government-sponsored industrial initiative. Industry 4.0 signals the rapid change transforming companies: it refers to the combination of different innovative digital technologies, such as cloud computing, advanced robotics, data analytics and the Internet of Things, embedded in the manufacturing sector's value chain. The fourth revolution concept is based on perpetual communication through the Internet, which allows interaction and exchange of information between humans, between human and machine, and between machine and machine [8].

Global Overview of Industrial Policies

The inclusion of digital technologies and their integration into the core of the Industrial process has brought change worldwide to the industrial sector. The fourth industrial revolution which gives birth to a new generation of industrial production unit, “The Factory of Things”, “Smart Factory”, “Factory of the future” industrial strategies, adopted by the different countries, strongly encourage businesses to take lead towards Industry 4.0 while assimilating the sustainable development approach [9]. Table 2. Industry 4.0 globally Country

Objectives of the strategy

France Italy

Modernization and digitization of the manufactories Development of the technological offer & Diffusion to the industrial fabric Development of the technological offer & Diffusion to the industrial fabric Creation of a network of research centers and adaptation of employees’ skills [10] Creation of a network of research centers Development of the capital goods sector and Digitization of the production apparatus [11]

Germany U.K U.S S. Korea

Cost estimation Euros 2.3 billion 40 millions 200 millions 270 millions 900 millions 1.5 billion

As shown in Table 2, the superpowers are investing massively into the move towards the fourth revolution [12], it is also interesting to note that there is a strong commitment of less wealthy countries in analyzing and editing ambitious strategies in order to ensure a smooth integration with the 4th industrial revolution, including South Corsica, Malaysia, Turkey, Portugal, Rwanda, Brazil and Morocco. Morocco is considered the fifth largest economy on the African continent; The Kingdom is relatively small compared to South Africa [13]; however, it is the second emerging country after it, and the most promising for foreign investments [14].

Morocco’s Readiness to Industry 4.0

3.2

467

Moroccan Context

Since 2000, Morocco targeted few strategic sectors of its economy (agriculture, tourism, automotive offshoring) in parallel the Kingdom has improved the development of its infrastructures which allows him today a certain growth. According to the halfyearly economic report Growth projections are now at half-mast, from 4% in 2017 to 2.9% in 2018 [15].

Fig. 1. Brief PEST analysis of Morocco [16]

Following the two launched plans- “Morocco Numeric 2013” [17], “E-Morocco” in order to adapt their industry strategy to reach the issued objectives by 2030 which covers innovation [18], education, digitalization of the state, electronic commerce and industry- also with a purpose to find a second wind to meet the critical challenges that lie ahead Morocco launched a Digital Plan 2020, to position itself as a leading African Hub, and activate its digital transformation. The aim is ameliorate the GDP growth and bring foreign investment from organizations abroad, with first target the offshoring/outsourcing services which will help create job opportunities thus help to reduce youth unemployment especially since that youth –called Millennials- are digital natives and nearly two third of them use their mobile on a daily basis. Integration of the technologies and digital transformation are undoubtedly the strategic priorities of all organizations, they also represent a major challenge for the development of small and medium-sized enterprises, particularly with the growth of new technologies. According to the latest statistics of the French Chamber of Commerce and Industry of Morocco [19], the Kingdom has nearly 3.5 million potential self-entrepreneurs, more than 2 million Very Small Enterprises (TPE), 35,000 Small and Medium Enterprises (SMEs) - These are the structures that represent the real ground for digital growth in Morocco - and 800 Large Enterprises (GE) (Table 3).

468

S. El Hamdi et al. Table 3. Economical Moroccan fabric. Actor Very small businesses SME Big businesses

Percentage 98% 1.96% 0.04%

In the current economic conjuncture, digital transformation and the use of all available digital technologies enable all organizations to improve their performance and contribute to improving their productivity (Fig. 1). In Morocco, this transformation concerns a very large entrepreneurial fabric that consists of 99% the existing companies in various sectors. Digitalization will also increase the productivity of very small and SME companies, improve their ability to gain a foothold in the regional/African and global market, and a prerequisite for the growth of their exports. To ensure the vision of digitalization and the success of “Plan 2030”, the Moroccan government has activated a series of levers, starting with the creation of the DDA: Digital Development Agency. The agency aims to coordinate the efforts of public and private sector main actors, be the interface for foreign investors, and animate the Moroccan Digital ecosystem. The Ministry of Industry, which oversees the agency, builds on the success of the MASEN agency in renewable energy, and capitalizes on the experience of Morocco Numeric Fund. It is also an opportunity to put a spotlight on the youngest Moroccan Minister, and thus projecting the image of modern Morocco, where youth is driving change [20]. Nevertheless, the total funds and resources allocated to the achievement of this ambitious agenda still need to be clarified.

4 Challenges and Requirements for the Migration to I4.0 The transformation into the digital industry is still in progress, but artificial intelligence, big data, and connectivity demonstrates with certainty a new round of digital revolution. Industry 4.0 is on the way and will be the factor that influences the most the transformation of industry because it represents evolution [21], but what are the challenges? 4.1

Challenges

First Challenge: Funding Most concerned Moroccan companies are very small companies they represent 96% of the 98% Small sized companies shown in the Fig. 1 their counterpart in US and Germany represents 4–6% of the 84%–98.3% labelled as small companies [22].

Morocco’s Readiness to Industry 4.0

469

The Distribution of the Companies 120,00% 100,00%

84,80% 98,00% 98,30%

80,00%

60,00%

40,00%

20,00%

14,80%

1,96% 1,40% 0,40% 0,04% 0,30%

0,00%

Fig. 2. Comparison between leading countries and Morocco

This means that the digitalization process may be harder for the Moroccan very small companies as it represent a significant investment. Second challenge: Skills sets Nevertheless companies have realized the need to understand change to better deploy it. It is no longer a simple computer update, but a profound change in the culture and organization of the company [23]. It is a scheme where digital is no longer a tool but a global mindset, a cross-cutting culture applied to all departments and businesses of the company, which implies implicitly the need to understand the human factor [24], in order to get rid of the legacy of the old system and to build specific skill sets required, such as robotic programming and Big Data Analytics. Third challenge: Digital transformation process Using the words of Professor Bocquet, holder of a doctorate in AI: “Industry 4.0, is fundamentally characterized by smart automation, adoption and integration of new technologies to the core of the company” [25], meaning radical changes are to occur to the company’s information systems, business processes, employees mindset, management procedures and means once the migration is initiated, that is the reason why the shift upsets the company manufacturing industry. The process of the digital transformation impacts the business at three levels: In the first place, it is the Business Model that is impacted; the digital transformation revolutionizes traditional models. In a second step, it is the business processes that evolve. The offer is seen more and more digitized. Finally, the crucial point in the digital transformation, the customer journey changes. With the penetration of technology, the customer is now more connected, more mobile, more demanding. The government needs to manage this migration to the fourth industrial revolution well [26], offering assistance in the form of strategic support, facilitation of the process or subsidies, especially since the companies involved are very small, small and medium-sized enterprises (Fig. 2).

4.2 Requirements' Framework

The government needs to change its customs and tax structures to account for an environment in which physical goods of all kinds will rapidly decrease in value compared with intangible ones, be they goods or services. The government has to legally consider a digital fabrication plant as a full-scale industrial plant. It ought to be able to answer whether this type of manufacturing will create jobs, in order to soothe social actors' questioning and fear of the change to come [27]. It must also relieve the entrepreneurs, directors and owners of companies that move towards integrating these technologies from the cyber-security challenges that will probably arise, such as IP theft [28]. As we stand on the edge of an entirely new way of life studded with technological innovation, the challenges and relevant issues may come in unfamiliar forms. The government will have to design workshops to build a better understanding of the key elements of the fourth revolution and of the core technologies that represent the pillars of the shift. These workshops should be intended not only for industrial staff but also for common users and consumers, to allow the philosophy of Industry 4.0 to spread regionally and nationally, create awareness among youth of the opportunities of this move, and provide training for innovation management [29].

5 Conclusion and Perspectives

Technological development is advancing at high speed in developed countries, and developing countries feel the need to rush into the fourth revolution to reduce the gap between the two worlds. It is important to note that technological progress has no meaning whatsoever if it does not improve the working conditions of humans; it is therefore essential to ensure that the inventions of engineers and researchers do no harm to people, which requires the ability to predict and control the interactions between humans and machines [30]. This is not only the engineers' responsibility: users and employees must also acquire the skills to better master technology as a tool. This is a modest paper addressing the readiness of our country for the fourth industrial revolution; it may be considered a new element to elevate the current discussion, identifying the actual Moroccan economic context, challenges and requirements in order to implement a policy for the transition to a smart industry. To enrich this debate and deepen the proposed requirements framework, a written and/or face-to-face survey is being conducted among the leaders of small and medium-sized enterprises to establish their needs in terms of material, technological, human and financial resources and the state support required to successfully transition to I4.0. The industrial revolution is already ongoing in developed countries and is happening around the globe [31]; it requires each company, no matter its size, and each individual, no matter their status, to rethink what to expect or desire from smart Internet-connected devices.


References

1. El Hamdi, S., Abouabdellah, A.: Literature review of implementation of an enterprise resource planning: dimensional approach. In: 4th International Conference on Logistics Operations Management (2018)
2. Schröder, C.: The Challenges of Industry 4.0 for Small and Medium Sized Enterprises. Friedrich-Ebert-Stiftung, Bonn (2016)
3. Mokri, K.E.: Morocco's 2014–2020 industrial strategy and its potential implications for the structural transformation process. Policy Brief, November 2016
4. Amoussou, R.: «Othman El Ferdaous donne un avant-goût de la feuille de route» (“Othman El Ferdaous gives a peek of the road map”). Challenge.ma, January 2018
5. Rüßmann, M., et al.: Industry 4.0: the future of productivity and growth in manufacturing industries, pp. 1–14. Boston Consulting Group (BCG) (2015)
6. Ezzel, S.J., Atkinson, R.D.: International benchmarking of countries' policies and programs supporting SME manufacturers. The Information Technology and Innovation Foundation, September 2011
7. Mosconi, F.: The new European industrial policy: global competitiveness and the manufacturing resistance. In: Routledge Studies in the European Economy (2015)
8. Cooper, J., James, A.: Challenges for database management in the internet of things. IETE Tech. Rev. 26(5), 320–329 (2009)
9. Bakkari, M., Khatory, A.: Industry 4.0: strategy for more sustainable industrial development in SMEs. ResearchGate, April 2017
10. McKinsey & Company: Manufacturing the future: the next era of global growth and innovation. McKinsey Global Institute and McKinsey Operations Practice, November 2012
11. Sung, T.K.: Industry 4.0: a Korea perspective. Technol. Forecast. Soc. Chang. 132, 40–45 (2018)
12. Bider-Mayer, T.: «Tour d'Horizon des politiques de l'Industrie du Futur» (“Overview of the policies of the industry of the future”). Annales des Mines - Réalités Industrielles, 4 November 2016
13. National Planning Commission: National Development Plan 2030: Our Future, Make It Work
14. Bloomberg Homepage. http://www.bloomberg.com/news/articles/2015-02-11/gulf-na-tionsdefy-oil-rout-to-top-list-of-best-emerging-markets
15. Telquel Homepage: «Bilan 2018». https://telquel.ma/2017/10/27/budget-letat-en-2018-lessecteurs-bien-dotes-ceux-auraient-merite_1565882
16. World Bank: https://www.banquemondiale.org/fr/country/morocco/publication/economic-outlook-april2018
17. Morocco Numeric 2013: National strategy for the information society and digital economy (2008)
18. National Intelligence Council: Global Trends 2030: Alternative Worlds. December 2012
19. French Chamber of Commerce and Industry of Morocco, Homepage. http://www.cfcim.org
20. Tali, K.: Aujourd'hui homepage, December 2017. http://aujourdhui.ma/economie/lagencedu-developpe-ment-numerique-en-marche
21. Roblek, V., Mesko, M., Krapez, A.: A complex view of industry 4.0, June 2016. https://doi.org/10.1177/2158244016653987
22. Altuzarra, C.: Country risk panorama. COFACE Economic Publications: Focus USA, October 2012
23. Wang, S., Wan, J.: Implementing smart factory of industry 4.0: an outlook. Int. J. Distrib. Sens. Netw. 12(1), 3159805, 10 pages
24. El Hamdi, S., Abouabdellah, A., Oudani, M.: Disposition of Moroccan SME manufacturers to industry 4.0 with the implementation of ERP as a first step. In: 2018 Sixth International Conference on Enterprise Systems
25. Foidl, H., Felderer, M.: Research challenges of industry 4.0 for quality management. In: 4th International Conference, ERP Future 2015 - Research, Munich, Germany, November 16–17, 2015, Revised Papers
26. 13th World Economic Forum: Malaysia Prime Minister Speech
27. Haddadin, S.: Institute of Automatic Control, Leibniz University, Germany, panel discussion, UNIDO 50th anniversary: Industry 4.0 (2016)
28. Elkhannoubi, H., Belaissaoui, M.: Assess developing countries' cybersecurity capabilities through a social influence strategy. In: 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (2016)
29. Liu, C.L.: Permanent mission of China to the International Atomic Energy Agency (IAEA), panel discussion, UNIDO 50th anniversary: Industry 4.0 (2016)
30. Lasi, H., Fettke, P., Kemper, H.G., Feld, T., Hoffmann, M.: Industry 4.0. Bus. Inf. Syst. Eng. 6(4), 239–242 (2014)
31. Seo-Zindy, R.: Industry 4.0 to digital industrialization: when digital technologies meet industrial transformation. The Centre for Development Informatics, University of Manchester, April 2018

Anti-screenshot Keyboard for Web-Based Application Using Cloaking Hanaa Mohsin(&) and Hala Bahjat Department of Computer Science, University of Technology Baghdad, Baghdad, Iraq {110113,110005}@uotechnology.edu.iq

Abstract. Online banking administrators usually try to use sufficiently strong security solutions to encourage individuals to use online banks. As a main security solution, they provide virtual keyboards for users to log in with, in order to strengthen their users' trust in the system. However, virtual keyboards can be prone to screenshot capture by various means, such as Trojans, malware, and shoulder surfing attacks. Different proposals for virtual keyboards with different means of input have been presented by many authors. In this paper, we provide a cloaking virtual keyboard (CVK) as a model for a virtual keyboard with cloaked input. The proposed model goes beyond using a virtual keyboard to prevent screenshot attacks on the online banking application system: it also provides a means to inform the administrators of a new attack and takes the attacker to a false account.

Keywords: Online banking · Capturing screenshots · Trojans · Malware · Shoulder surfing · Cloaking virtual keyboard

1 Introduction

With the development of electronic commerce, many businesses, such as online booking and online shopping services, have set up their own trading platforms on the Internet, and online banking services are used for payment on these platforms. The major security problem of electronic commerce, which has caused economic losses for both businesses and customers, stems from the growing number of attacks accompanying the evolution of the Internet; this has led online banking to use what is known as a virtual keyboard [1]. A virtual keyboard is a software component that displays a standard or numeric keyboard on the screen and is operated by a physical or non-physical input device. It was found to help people with certain disabilities or those who use multiple languages, and to support user authentication [2] compared with the old keyboard-based authentication [1]. The virtual keyboard suffers from several problems, the most important being that it does not prevent users' passwords from being gathered via shoulder surfing and malware attacks [1]. Many authors have presented ideas for overcoming these attacks [3, 4]. However, software Trojans can defeat these countermeasures by taking snapshots of the screen in three modes: one-time screenshot, multiple-time screenshot, and over-time screenshot [2].


These attacks have been overcome by [2, 5]. Recently, [6] presented a hardware Trojan that disables the software Trojan countermeasures. The term cloaking is used in many information security techniques, such as [7–9], to describe a covered resource [7]. In this paper, we present a proposal for a cloaking virtual keyboard (CVK) as a new software tool to protect user passwords and accounts in online banking from screenshot-capture attacks. We do this by combining cloaking with eye-gaze tracking.

2 Motivation and Contribution

Key loggers can use either hardware or software connected to the computer to crack user login passwords in three attack modes (one-time, multiple-time, and over-time screenshots) and with many different means of input. Nonetheless, only a few researchers have attempted to find a solution that goes beyond using a virtual keyboard. Since eye gaze can serve as a good input device instead of a keyboard or mouse, it represents a cloaking input model: it uses the front camera of a computer or mobile device to track the gaze position over the screen (the virtual keyboard) in order to input the user name and password. In this way, we cloak the input device itself, which differs from the approaches used by previous researchers. We adopt the virtual keyboard with the cloaking input model to protect the user password and account in online banking systems from screen-capture attacks, even if the front-camera video is captured. The proposed CVK differs from the research in [7, 8] because we try to cloak the input device. The proposed CVK is protected against gathering the user password via shoulder surfing and malware attacks, which makes it more robust than [1]. Trojans take snapshots of the screen in three modes, which are overcome by [2–5] and by our proposed CVK. In addition, [6] presented a hardware Trojan to disable the software Trojan countermeasure; we also provide a solution for this problem.

3 Proposed Model Background and Approach

3.1 Proposed Model Background

Cloaking. Cloaking is known as a search-engine optimization technique in which the content presented to the user differs from that presented to the search engine's crawler for better indexing. It hides the true nature of an object by delivering clearly different content to users. These objects can be text [8], images [7], or websites [9].
Eye-gaze Detection System (EGDS). Recently, many authors have employed EGDSs in hands-free applications [10–13]. The main parts of an EGDS, shown in Fig. 1, include image acquisition, in which the system input is an image from either a web camera or a pre-captured video. Next is the standardization block, in which the acquired image initializes the eye gaze for the on-screen pointer into sub-blocks, using detection of


the pupil position followed by an estimate of the transformation function (the conversion of the pupil center into a position on the screen). The eye-tracking block performs pupil detection and tracking over the whole image sequence. Finally, in the action block, the virtual-keyboard character is determined as input (the point on the screen the user is gazing at) using the pupil position and the transformation matrix.

Fig. 1. Eye-gaze detection system block.
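The transformation function mentioned above (pupil position to screen position) is not detailed in the paper; the following Python sketch illustrates one common, simple choice: fitting an affine mapping from a few calibration gazes by least squares. The calibration values, the affine model and all names below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def fit_affine(pupil_pts, screen_pts):
    """Fit screen = [px, py, 1] @ A by least squares from calibration pairs."""
    P = np.hstack([np.asarray(pupil_pts, float), np.ones((len(pupil_pts), 1))])
    S = np.asarray(screen_pts, float)
    A, *_ = np.linalg.lstsq(P, S, rcond=None)   # A has shape (3, 2)
    return A

def pupil_to_screen(A, pupil_xy):
    """Map a detected pupil position to an on-screen gaze point."""
    px, py = pupil_xy
    return np.array([px, py, 1.0]) @ A

# Example: four calibration targets (pupil position -> known screen position).
pupil = [(210, 130), (400, 128), (215, 300), (405, 305)]
screen = [(0, 0), (1920, 0), (0, 1080), (1920, 1080)]
A = fit_affine(pupil, screen)
print(pupil_to_screen(A, (305, 220)))  # approximate gaze point in screen pixels
```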

Virtual Keyboard. A virtual keyboard is basically software that serves as a means for the user to enter characters. It is generally a visual representation of a real keyboard with standard output, and it can operate alongside other input devices such as a physical keyboard, computer mouse, eye mouse, or head mouse [1].

3.2 Proposed Model Approach

Securing online banking systems for legitimate users is sensitive for every user. From a security viewpoint, securing online banking is not sufficient if it depends only on the legal user's original password, a one-time password, or a virtual keyboard. The concept of CVK is used to provide more security to the online banking system. In this paper, the camera mouse (2008) provides the cloaking means to select an input character from the virtual keyboard and is the supporting tool used to mimic our proposed idea. Generally, using these tools supports the existing protocol, making the implementation possible with current technology. The proposed online banking system provides each authorized user with a unique virtual keyboard of type one or type two. In the proposed design there are generally N^N possible virtual keyboards, where N is the number of keys; if N = 256, then 256^256 different virtual keyboards can be generated. About 85% of them can be assigned to legal authorized online-bank users based on their input detail information; these are type one. The remaining 15% are type two and can be used to detect new attackers in two modes. First, if an unauthorized user tries one of them, the proposed system sends a message to the online banking administrators about being hacked. Second, when an unauthorized user tries to log in to the online banking system using a stolen authorized user name, the system opens an empty account using a type-one virtual keyboard and sends a message to the authorized user, to stop the unauthorized user from trying to find the correct password. The virtual keyboard works as follows: 1. According to the user input information, the virtual keyboard is selected, bound to the user name, and presented on the screen.


2. When the user inputs the password by pressing the on-screen keys, he or she selects a key class according to the previously entered information. 3. For any selected key, the output of the virtual keyboard is presented as a star at the corresponding password position on the screen.
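As a rough illustration of this per-user keyboard scheme, a layout can be generated as a permutation of the keys and the typed characters echoed only as stars. The paper does not specify how a layout is derived from the user's details or how one of the N^N keyboards is picked, so the hash-seeded shuffle and the small key set below are assumptions made only for the example.

```python
import hashlib
import random

KEYS = [chr(c) for c in range(33, 127)]  # small illustrative key set (the paper uses N = 256)

def user_keyboard(user_details: str):
    """Derive a user-specific key arrangement from the user's input details.

    Seeding the shuffle with a hash of the details is an illustrative choice,
    not the paper's exact selection among the possible keyboards.
    """
    seed = int.from_bytes(hashlib.sha256(user_details.encode()).digest()[:8], "big")
    layout = KEYS.copy()
    random.Random(seed).shuffle(layout)
    return layout

def type_password(layout, selected_positions):
    """Map on-screen key positions to characters; echo only stars to the screen."""
    password = "".join(layout[i] for i in selected_positions)
    echo = "*" * len(selected_positions)
    return password, echo

layout = user_keyboard("alice|2020-01-01|branch-42")
pwd, echo = type_password(layout, [5, 17, 42, 8])
print(echo)  # '****' is all an observer or a screenshot would see
```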

4 Proposed System Modules

Registration: The user is registered in the system by entering his or her name and password using the cloaking input means. The system generates a virtual keyboard, and the hashed values are stored in the database file.
Login: The user logs in to the system using his or her password. If the hash of the entered password matches the stored hashed password, the user is authentic. Otherwise, a message about the breach is sent to the online banking system administrators, and a forged empty account is opened.
Hacker: When a hacker tries to log in to the system with any password, a message about the breach is sent to the online banking system administrators, and a false message is shown to the hacker. The same occurs on a second attempt. On the third attempt, a message about the breach is sent to the administrators, and a forged empty account is opened.
File upload and view: Authentic users can upload files onto the system. The files are encrypted using the user's encryption key. To download or view files, users must use their decryption key.
Admin login: The administrator can log in to the framework. Once logged in, the administrator can deal with every issue according to regulatory capacity.
Decoy file upload: An unauthentic user who logs in on the third trial is given a forged (decoy) empty account.
Log creation: For each login into the system, a log entry is created and stored in the database.
Valid user behavior tracking: The system tracks user operations after login, including MAC and IP addresses and the data size of resources downloaded by each user per position.
User behavior analysis: The tracked parameters are analyzed to identify each user. If an invalid user is detected, that user is served decoy data for all downloads.
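The registration and login modules above rely on storing only hashed values. The paper does not name the hash function, so the salted PBKDF2 scheme in this Python sketch is an assumption chosen purely for illustration.

```python
import hashlib
import hmac
import os

def register(db: dict, username: str, password: str) -> None:
    """Store only a salt and a slow, salted hash of the password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    db[username] = (salt, digest)

def login(db: dict, username: str, password: str) -> bool:
    """Recompute the hash and compare it in constant time."""
    if username not in db:
        return False
    salt, stored = db[username]
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, stored)

db = {}
register(db, "alice", "s3cret!")
print(login(db, "alice", "s3cret!"))   # True
print(login(db, "alice", "wrong"))     # False -> alert administrators, open decoy account
```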

5 Cloaking Virtual Keyboard Security Analysis

Overall, we divided the threat types (key loggers) into three categories to facilitate a simulated implementation of our proposed solution with ten training users (men and women aged between 20 and 60 years) and 15 users acting as observers.

5.1 Screen Capture Attack

An attacker uses a program capable of capturing the screen after the user clicks with the mouse [1], in three modes [2]:
One-time screenshot. The key logger waits until the user visits a specific site and then performs a screen capture to reveal the entered PIN when the user submits.
Multiple screenshots. With each user interaction, the key logger takes a screenshot and sends it to a remote server to reveal the entered PIN.
Over-time screenshot. The key logger takes screenshots at regular intervals, which in some cases amounts to video capture.
Using our proposed CVK, in all these modes, even when the key logger captures a video of the screen, it cannot extract the user name and obtains only the starred representation of the input password; the key logger would have to reveal the stars, deduce the hashed password, and finally crack the hashes to recover the password.

5.2 Over Shoulder Surfing

A person standing behind someone entering a password via a virtual keyboard can remember or note the sequence of clicks and thereby discover the password [1]. Using our proposed CVK, a person standing behind the user is incapable of seeing the selected keys.

5.3 Keyboard

The arrangement of the alphabet on a virtual keyboard is the same as in a normal QWERTY keyboard [1]. Using our proposed CVK, each authentic user has his or her own arrangement of the alphabet on a virtual keyboard that is different from the QWERTY keyboard.

6 Conclusions and Future Work

In this study, we have analyzed the security of the CVK model and addressed a number of attacks handled by our proposed CVK, which uses a camera mouse and a virtual keyboard as the means to cloak input characters and keystrokes. The CVK adds tools that do not affect the current system technology, and it helps users carry out more transactions. One important limitation is the need for users to receive system training before using the system. In the future, we will try to adapt our proposed CVK for users with various disabilities.

Acknowledgements. We are thankful to the anonymous reviewers for their efforts to improve the quality of this work with their valuable suggestions and comments.


References

1. Agarwal, M., Mehara, M., Pawar, R., Shah, D.: Secure authentication using dynamic virtual keyboard layout. In: Proceedings of the International Conference and Workshop on Emerging Trends in Technology, ISSN 2349–516, vol. 2, February 2011
2. Echallier, N., Grimaud, G., et al.: Virtual keyboard logging counter-measures using common fate's law. In: International Conference on Security and Management (SAM 2017), Las Vegas, USA, 17–20 July 2017
3. Gong, S., Lin, J., Sun, Y.: Design and implementation of anti-screenshot virtual keyboard applied in online banking. In: 2010 International Conference on E-Business and E-Government, pp. 1320–1322. IEEE (2010)
4. Nayak, C., Parhi, M., Ghosal, S.: Robust virtual keyboard for online banking. Int. J. Comput. Appl. 107(21) (2014)
5. Bacara, C., et al.: Virtual keyboard logging counter-measures using human vision properties. In: 2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security (2015)
6. Peris-Lopez, P., Martín, H.: Hardware Trojans against virtual keyboards on e-banking platforms: a proof of concept. AEU-Int. J. Electron. Commun. 76, 146–151 (2017)
7. Choudhury, B., Reddy, P.V., Jha, R.M.: Permittivity and Permeability Tensors for Cloaking Applications, pp. 1–43. Springer, Singapore (2016)
8. D'Angelo, G., Vitali, F., Zacchiroli, S.: Content cloaking: preserving privacy with Google Docs and other web applications. In: Proceedings of the 2010 ACM Symposium on Applied Computing, pp. 826–830. ACM, March 2010
9. Duan, R., Wang, W., Lee, W.: Cloaker catcher: a client-based cloaking detection system. arXiv preprint arXiv:1710.01387 (2017)
10. Cecotti, H.: A multimodal gaze-controlled virtual keyboard. IEEE Trans. Hum.-Mach. Syst. 46(4), 601–606 (2016)
11. Yang, S.W., Lin, C.S., Lin, S.K., Lee, C.H.: Design of virtual keyboard using blink control method for the severely disabled. Comput. Methods Programs Biomed. 111(2), 410–418 (2013)
12. Zhang, X., Liu, X., Yuan, S.M., Lin, S.F.: Eye tracking based control system for natural human-computer interaction. Computational Intelligence and Neuroscience (2017)
13. Mohsin, H., Hameedi, S.A.: Pupil detection algorithm based on feature extraction for eye gaze. In: 6th International Conference on Information and Communication Technology and Accessibility (ICTA), pp. 1–4. IEEE (2017)

Fall Prevention Exergame Using Occupational Therapy Based on Kinect Amina Ben Haj Khaled(&), Ali Khalfallah(&), and Med Salim Bouhlel(&) Sciences and Technologies of Image and Telecommunications, Higher Institute of Biotechnology, University of Sfax, Sfax, Tunisia [email protected], [email protected], [email protected]

Abstract. Falls are a major health problem, especially among elderly people living alone at home. This age group is characterized by a loss of physical and motor skills, balance and posture disorders, and a reduction in daily activities, and these factors are the main causes of falling. It is therefore important to design technologies that prevent falls and help older people practice exercises to improve balance, posture and strength. Because conventional physical therapy is underused, fall prevention interventions delivered through games have demonstrated their effectiveness. Unfortunately, most existing exergames used for fall prevention were not designed specifically for the elderly. For this reason, we focus on fall prevention: we use the advantages of serious games and the Kinect to deliver occupational therapy to the older community at home and make them more active. The proposed exercises were carefully studied with an occupational therapist and an orthopedist. They are specially designed for the elderly and aim specifically at improving balance and muscle strength, and elderly people can practice them easily and safely at home. This occupational therapy using serious games can improve activities of daily living: it not only decreases the risk of falls but also has many positive psychological, health and physiological effects.

Keywords: Older adults · Serious game · Fall prevention · Fall detection · User interface · Occupational therapy · Kinect · Depth map · Skeleton detection · Skeleton tracking

1 Introduction

Life expectancy has increased compared to previous years, which can be explained by decreasing fertility and mortality rates. In 2015, it reached 73 years on average in Tunisia [1]. As a result, the country will undergo impressive demographic changes in the coming years, especially regarding the growth of the ageing population: the share of the elderly population in Tunisia was estimated to reach almost 8% in 2015 [2]. Considering the magnitude of this phenomenon, it is important to ensure a secure lifestyle for seniors who are at risk of losing their physical abilities. Thus, it is essential


to supervise the elderly in order to prevent age-related disabilities such as decreased performance, walking disorders, posture abnormalities, and the loss of confidence that can lead to falls [3]. Research has found that fall risk factors can be extrinsic or intrinsic. Extrinsic factors are related to environmental features, whereas intrinsic factors are individual and depend on the person. Intrinsic factors are more severe than extrinsic ones; they can be associated with impairments in posture control during walking, known as flexed posture [4]. These impairments affect mobility, increase the risk of falling, and cause 10% to 25% of all falls [5]. Falls can have several consequences, which can be divided into two categories: physical (fractures, injuries) and psychological. The psychological effects are very significant, such as inactivity (14.9%) and functional dependence (13.7%) [6]. The loss of autonomy in turn increases the risk of falling again, which is why it is important to monitor the elderly. Traditionally, wealthy older people move into a retirement home, but the number of such homes in Tunisia is very small (11 retirement homes) [7]. Other elderly people prefer to live at home accompanied by a nurse or to use monitoring systems. None of these methods respects patient privacy; they can effectively detect a fall, but they cannot prevent it or its dangerous effects, especially the loss of autonomy. Using serious games and smart sensors to prevent falls by improving physical capacity offers several advantages. On the psychological front, the elderly person remains safely in his or her natural environment, autonomous, more confident, and able to carry out daily activities securely thanks to improved physical and motor abilities. On the economic front, fall prevention reduces the costs of therapy and assistance compared with specialized hospital staff. In this paper, we first present some existing monitoring solutions. Then, we propose our technique to improve the activity of the elderly and prevent the risk of falls, namely occupational therapy. Finally, we focus on the results. The general objective of this study is to maintain and improve the activity of the elderly at home while giving them a feeling of safety and comfort.

2 The Existing Solutions

Falls are the main cause of death among the elderly [8]. Indeed, 35% of seniors aged more than 65 years fall at least once each year [9]. To address this problem, researchers have developed fall detection systems. The principle of these systems is to monitor the elderly at home, detect a fall, and activate an alarm to alert medical services to rescue and help them. Diverse approaches have been explored to detect falls, and they can be classified into three categories. The first one is the use of monitoring systems worn by older people, whose aim is to measure acceleration and rotation to detect falls indoors and outdoors. Accelerometers, gyroscopes and micro-movement sensors are wearable detectors and have historically been used to detect the loss of verticality. Noury et al. proposed a thorough study of this methodology [10]. However, this type of measurement is not always effective because it can be intrusive in the subject's daily life.


The second category consists of external (not wearable) sensors placed inside the house, including environmental and ground sensors. Environmental sensors are a simple and accurate way to analyze the actimetry of the elderly. Detection is based on a situation that differs from the normal, such as a door that is not opened, a room that is not entered, or a lack of movement that may be due to unconsciousness after a fall [11]. Because this monitoring technique relies on detecting abnormal activities, it may mistake the unusual actions of a domestic animal for a fall. Ground sensors monitor the floor using special flooring called a “smart floor”; the floor grid is used to analyze the bearing points of the person's feet and to detect their movement. In research, the use of smart floors began with Rimminen et al. [12]. Such floor sensors cannot distinguish between a liquid poured on the floor and a person falling. This second category requires a complex setup and is still in its infancy. The third way to detect falls requires only a simple camera: the images are processed by computer vision algorithms that study the movements of the subject according to several criteria in order to detect a fall. 2D images are used to calculate the velocity of the subject's movement; the apparent velocity is inversely proportional to the distance of the person from the camera, so the nearer the person is to the camera, the higher the velocity. In this case, it is difficult to differentiate between a fall and a person sitting down abruptly. Rougier et al. solved this problem by using a calibrated 2D camera to deal with real-world coordinates and by computing the 3D velocity vector [13]. Despite this solution, 2D cameras still have a major disadvantage: they cannot capture images if an obstacle is placed in front of the sensor's field of view. All these classical monitoring methods study the physical side of falls and do not address their most important cause. The absence of standardization is the main reason why falls are difficult to interpret and evaluate. After a long study of various fall-related research, we can deduce that laziness and the lack of activity are more important risk factors than chronic and extrinsic causes. The feeling of isolation leads to a reduction in physical activity and then in muscle strength; to avoid falling, the elderly reduce their outings and movements. This attitude causes a lack of contact with the outside world, a weakening of social life, and a strong dependence on family members or home health aides. Using existing rehabilitation techniques to recover lost motor abilities is painful and boring, and it requires moving to a rehabilitation center, which is very difficult for the elderly [14]. The emergence of innovative motion-capture technology has revolutionized home monitoring. The Kinect is a motion sensor that allows person detection, tracking and recognition, and it has transformed the use of video games in a revolutionary way: interaction with the game does not need a joystick, only gesture recognition, which motivates players to perform physical activities while playing. This has inspired researchers to design therapeutic serious games, and a considerable number of projects in the literature demonstrate an encouraging impact among the elderly, increasing their enthusiasm for and devotion to rehabilitation [15].


3 Our Contribution to Prevent Falls

The fall is a very complicated phenomenon, and most existing systems focus on detection. A simple search on Google Scholar using the search words “fall detection algorithm” gives 1,690,000 results, compared with 363,000 for “fall prevention algorithm”. Knowing that 78.53% of this research was conducted on fall detection, and given the importance of prevention, we focus on the construction of a fall prevention system, since prevention is better than cure. Our system aims to reduce the impact of falls and improve the mobility of the elderly. We are inspired by physical therapy techniques and specifically by occupational therapy, which is a form of education, rehabilitation and adaptation that allows patients to preserve and develop their independence and autonomy in their familiar and social environment, while also delivering advice to the person and his or her entourage. Our proposal therefore consists of a system based on a depth camera that is able to prevent falls through occupational therapy using serious games. First, the patient starts the game and tries to reproduce the gestures of the avatar as faithfully as possible while positioned in front of the Kinect sensor. Then, the interface proposes corrections by means of an angle-comparison algorithm, allowing patients to track their evolution over time and improve their performance so that they can recover their functional capacity. A distinctive feature of this application lies in the use of serious games for fall prevention.

3.1 Kinect

To determine the most appropriate device, we studied various motion sensors. Low-cost range sensors are very interesting and have established themselves in industry and several other domains. For this reason, we chose the Kinect sensor; this choice was made for its various advantages. Initially, the Kinect was designed to control video games while allowing human-machine interaction without markers or a joystick: a simple and natural interaction in which the body alone is used to interact with the machine. The Kinect allows the acquisition of RGB video, a depth map and sound through the libraries supplied with its software kit. It includes an infrared sensor, an RGB camera, a depth camera (also called a 3D camera) that allows 3D motion capture, and a microphone. Figure 1 shows the different components of the Kinect.

Fig. 1. Kinect sensor composition: (a) the Kinect sensor; (b) Kinect sensor elements (RGB camera, IR camera and IR projector).


This simple and powerful input device enables gesture recognition, and motion capture is based on processing the depth map. The depth map makes it possible to measure the distance between an object and the camera; its production is based on the principle of structured light. Projecting a light pattern provides information about its deformation when it encounters an obstacle in the scene. The infrared projector is offset from the infrared camera, which makes it possible to detect the deformation of the pattern by comparing it with a reference pattern projected at a known distance from the Kinect sensor. The Kinect uses structured light, with the difference that it projects not light bands but points of light: a speckled pattern of dots is projected onto the scene by an infrared projector and detected by an infrared camera. Each IR point has a unique peripheral zone, which allows each point to be identified when projected onto the scene. The IR speckles projected by the Kinect come in three sizes optimized for use in different depth ranges. This phenomenon is shown in Fig. 2.

Fig. 2. Light grid points created by the projector.

Thus, the Kinect can operate between 1 m and 8 m. In addition to the pixel shift, the Kinect compares the observed size of a particular point with its original size in the reference model; any change in size or shape is taken into account in the depth calculations. The larger the acquired infrared point appears, the farther away it is. These calculations are performed on the device in real time by a system on chip (SoC), and the results are depth images of 640 × 480 pixels at a rate of about 30 fps. To collect data from the Kinect Xbox sensor, different projects provide free libraries, namely OpenKinect [16], OpenNI [17], CL NUI SDK [18] and the Robot Operating System (ROS) Kinect. We opted to use the Kinect SDK because it automatically calibrates the body, adjusts the motor angle, and detects a skeleton with 20 joint points.

3.2 Methodology

The usual methodology for physical rehabilitation systems using virtual reality is based on cloning the gestures of an avatar without any assessment or correction. The contribution of our serious game is that it not only detects the patient's movements but also compares them with the movements of the avatar coach to check the validity of the gesture. Thus, the assessment of performance is flexible and automatic, helping the patient know whether he or she is really progressing.
The proposed exercise. One of the main causes of falls is anomalies of posture and gait due to the decline of the musculoskeletal system. Many studies examine the impact of exercise on increasing mobility, improving balance, and reducing the risk of falls for the elderly [19]. For this reason, we proposed an exercise to improve the patient's posture. Good posture is an easy and important way to keep a healthy body (see Fig. 3); if a patient has a balance problem, a correct posture can help prevent an expected fall. The principle is to perform exercises that strengthen the upper back and shoulder muscles in order to maintain good posture. There is no need for an athletic physique: the most important thing is to create a “muscle memory” that naturally and unconsciously keeps the posture correct without fatigue.

Fig. 3. The proposed exercise is good for posture.

The exercise consists of raising both arms out to the side and up as far as is comfortable, with palms facing forwards, and then returning to the starting position.
Fall prevention algorithm. The skeleton detection principle begins with the acquisition of the depth map and body classification using a random forest. The accuracy of the exercise is determined by comparing the two skeletons (avatar and user). If all these functions return a positive result, a correct exercise is indicated (Fig. 4).


Fig. 4. Conceptual diagram of the fall prevention algorithm: data acquisition → skeleton detection → data analysis → data normalisation → angle calculation on the patient and avatar skeletons → comparison within ±10°; if the comparison succeeds, the next level is proposed.

Depth map. The depth map returned by the Kinect associates with each pixel its distance from the sensor in mm. From this distance, which is equivalent to the z coordinate of the real point represented by the pixel, one can calculate the x and y coordinates (see Fig. 5) [20].

Fig. 5. Calculation of the coordinate “y” of the point [21].


Denoting by z the value (gray level) of the depth map associated with the pixel, we can locate each point of the depth map by means of the following equations:

$x = z \cdot 2\tan\left(\frac{fov_H}{2}\right) \cdot \frac{x_p}{X_{res}}$  (1)

$y = z \cdot 2\tan\left(\frac{fov_V}{2}\right) \cdot \frac{y_p}{Y_{res}}$  (2)

where $x_p$ and $y_p$ are the coordinates of the pixel, $X_{res}$ and $Y_{res}$ are the horizontal and vertical resolutions of the depth map, and $fov_H$ and $fov_V$ are the horizontal and vertical fields of view of the Kinect (in radians). The real points (x, y, z) known for each pixel of the depth map give the 3D point cloud of the scene. Knowing the depth values, it is also possible to determine the width and height of the user using trigonometry, as illustrated in Fig. 6.
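A minimal Python sketch of the pixel-to-3D conversion in Eqs. (1) and (2); the resolution and field-of-view values below are the usual Kinect v1 figures (640 × 480, roughly 57° × 43°) and are assumptions for illustration, not values taken from the paper.

```python
import math

XRES, YRES = 640, 480
FOV_H = math.radians(57.0)   # assumed horizontal field of view
FOV_V = math.radians(43.0)   # assumed vertical field of view

def depth_pixel_to_3d(xp: int, yp: int, z_mm: float):
    """Apply Eqs. (1) and (2): recover (x, y, z) in mm from a depth pixel."""
    x = z_mm * 2.0 * math.tan(FOV_H / 2.0) * (xp / XRES)
    y = z_mm * 2.0 * math.tan(FOV_V / 2.0) * (yp / YRES)
    return x, y, z_mm

# Example: a pixel halfway across the image, observed at 2 m depth.
print(depth_pixel_to_3d(320, 240, 2000.0))
```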

Fig. 6. User real-world width [20]: with d the depth and a horizontal field of view of 57° (half-angle 28.5°), the half-width of the visible area at distance d is b = d·tan(28.5°); for a user spanning Wp pixels of the 320-pixel-wide image, Wp/320 = Wr/(2b), so the real width is Wr = 2b·Wp/320.
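For illustration, with assumed values not taken from the paper: at a depth of d = 2000 mm, b = 2000 · tan(28.5°) ≈ 1086 mm, so a user spanning Wp = 80 pixels of the 320-pixel-wide image has an estimated real width of Wr = 2 · 1086 · 80 / 320 ≈ 543 mm.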

Random forest for skeleton detection. The skeleton of the human body is inspired by the Vitruvian Man: each body part is represented by a joint point, such as the head, arms, shoulders, hands and feet. Shotton and Fitzgibbon proposed a simple and innovative per-pixel method for body-part recognition based on the depth image. The skeleton detection algorithm starts by identifying the person through a segmentation of the depth image. Then, using mean-shift mode detection, local centroids of the body-part probability mass can be found. A single per-pixel cue is a weak signal and cannot precisely predict which body part a pixel belongs to; the idea is therefore to combine randomized decision forests, which are very effective at accurately disambiguating all trained parts. Random decision forests are a simple ensemble learning method for classification, often applied to 3D recognition (see Fig. 7). At training time, many decision trees are constructed, each consisting of split and leaf nodes; a distributed implementation is employed to keep training times down.

Fig. 7. Skeleton detection: depth image → inferred body parts → hypothesized joints → tracked skeleton.
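The following scikit-learn sketch is only a simplified stand-in for the per-pixel body-part classification idea described above; it is not the exact method of Shotton et al. (which uses depth-difference features and a distributed training pipeline), and the random arrays below merely stand in for a real labelled training set.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Assume X holds per-pixel depth features and y the body-part label of each
# pixel (e.g. 0 = head, 1 = shoulder, ...). Random data is used as a placeholder.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(5000, 20))
y_train = rng.integers(0, 20, size=5000)   # 20 body parts / joints

forest = RandomForestClassifier(n_estimators=50, max_depth=12, n_jobs=-1)
forest.fit(X_train, y_train)

# At run time, every pixel of a new depth frame is classified; body-part
# centroids would then be extracted (e.g. with mean-shift) to propose joints.
X_frame = rng.normal(size=(1000, 20))
part_per_pixel = forest.predict(X_frame)
print(part_per_pixel[:10])
```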

Data analysis. The image provided by the Kinect sometimes contains noise, so a noise-elimination step is necessary. Temporal and spatial parameters were examined to quantify performance; parameters from the upper body were measured with the Microsoft Kinect.
Data normalization. From a theoretical point of view, the Kinect image has three axes, x, y and z, but in practice a patient can stand at any position. This requires a standardization of coordinates so that the skeleton can be represented as uniformly as possible.
Angle calculation and comparison. The movement of the arm involves the positions of three joint points on the right and left sides, namely the shoulder, elbow and wrist. The positions of these three points are used to calculate the angle of rotation performed by the arm, and the same angle is calculated to recognize the rotation of the avatar's arm. If the angle of the older person corresponds approximately to the angle of the avatar, within an acceptable error of ±10 degrees, then the exercise is correct and it is possible to switch to the next level; otherwise, the older person should repeat the exercise.
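A minimal sketch of the angle calculation and ±10° comparison described above. The paper's interface is implemented in C#; this Python version illustrates the same vector-based computation, and the particular choice of vectors (elbow→shoulder and elbow→wrist) as well as the sample coordinates are illustrative assumptions.

```python
import math

def angle_at_elbow(shoulder, elbow, wrist):
    """Angle (degrees) between the elbow->shoulder and elbow->wrist vectors."""
    u = [s - e for s, e in zip(shoulder, elbow)]
    v = [w - e for w, e in zip(wrist, elbow)]
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    cos_t = max(-1.0, min(1.0, dot / (nu * nv)))
    return math.degrees(math.acos(cos_t))

def gesture_is_valid(patient_angle, avatar_angle, tolerance=10.0):
    """Accept the gesture if the two angles differ by at most +/- tolerance degrees."""
    return abs(patient_angle - avatar_angle) <= tolerance

patient = angle_at_elbow((0.0, 1.4, 2.0), (0.3, 1.1, 2.0), (0.6, 1.4, 2.0))
avatar = 95.0  # reference angle computed the same way on the avatar skeleton
print(patient, gesture_is_valid(patient, avatar))
```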

4 Occupational Therapy Using Serious Games

4.1 Serious Game Model

Several conceptual frameworks for serious game design can be followed to help video game designers and to provide a scientific and effective methodology for creating a game that meets all the needs of the user; for example, Yusoff [22] and some related works [23] defined the steps required to design a serious game. In our case, the most flexible model is needed to design a game adapted to occupational therapy for elderly patients to prevent falls, so we chose the six-facet model, which allows a simple design and gives us the opportunity to improve and evaluate the design simultaneously. The first facet defines the objective of the serious game. In general, it describes the knowledge model of the domain, taking misconceptions into account. The goal of the game requires taking the opinion of all the experts in the medical field; to determine our goal,


we had to take the advice of a specialist such as an orthopedic physician or a physiotherapist, and we also took the opinion of the patients into consideration to ensure the playability of our game. The second facet is the domain simulation, whose role is to ensure a consistent and adequate response to the actions of the player, even if they are erroneous; in our case, if the player's actions are not correct, a message appears to indicate the correction to make. The third facet covers the interactions with the simulation: it defines precisely how to engage the player in interacting with the simulator and answers the question of how to get the player to interact with the game. For this, we must imagine an environment that is relaxing but at the same time attractive. The fourth facet deals with the difficulty of the game. It specifies how the player can solve the problems and in which order; the only way is to motivate the player to move from one problem to another and to reward him or her. This facet defines how a player must face the obstacles of the game and advance in it; the solution is to design the progression by considering both the acquisition of the required knowledge and the player's progress from one level to the next. The game progress can be seen as a series of challenges/problems (obstacles) to be overcome, and an important point is how to obtain feedback on the progress made by the player and transfer it both to the player and to the trainer. Decorum is the fifth facet: it identifies all the fun elements in the game and serves to promote the motivation of the player independently of the simulation. In our application, it is represented by an avatar, so the main goal here is to increase the element of fun and consolidate commitment. The final facet is the condition of use: it explains how, where, when, and with whom the game is played. The game can be played by one or more players, in the classroom or online, with or without an instructor; our game software is to be used at home, after seeking the advice of a specialist to indicate the types of exercise to do, even in the absence of a doctor. Several software tools can be used to design the avatar and the environment of our serious game; we need intuitive software to create the most realistic scene, and Blender seems to be the most widely used for designing 3D games because of its very advanced modeling features.

4.2 The Proposed Occupational Therapy

Our occupational therapy for physical rehabilitation is very simple. Initially, the patient starts the game and tries to reproduce the gestures of the avatar as accurately as possible. The gestures are proposed by a specialist or orthopedic physician and depend on individual capacities; they may be performed in a standing or sitting position (see Fig. 8). The Kinect scans the depth image data to extract the patient's skeleton. Finally, the interface offers corrections using an angle-comparison algorithm, allowing patients to follow their evolution and improve their performance so that they can recover their functional capacity. First, we detect the depth map given by the Kinect; different tests were made with different software, namely Matlab 2013a with SDK 1.6, OpenNI, and SDK version 1.8.


Fig. 8. An elderly user playing the proposed exergame.

The color of the depth image, as illustrated in Fig. 9, represents the distance between the user and the depth sensor. Note that the Kinect can be programmed to display a chosen color as a function of distance.

Fig. 9. RGB and depth map result.

Second, the test we made on depth-image detection aims to detect the person's position, so we programmed the sensor to track the user's skeleton. The sensor stays in standby mode if the person does not move in the scene, but when there is an action, the Kinect focuses on skeleton recognition and tracking. The Kinect sensor is able to detect 20 joint points in the human body (see Fig. 10). These 20 articulations are the head, shoulder center, spine, hip center, hip right, hip left, hand right, hand left,


wrist right, wrist left, elbow right, elbow left, knee right, knee left, ankle right, ankle left, foot right, foot left.

Fig. 10. User’s Skeleton tracking.

Then, we calculate the coordinates (x, y and z) of the joint points used in the proposed exercise, which are known to the Kinect and named elbow, wrist and shoulder (see Fig. 11).

Fig. 11. The coordinates (x, y and z) of the corresponding joint points.

Indeed, the implemented interface is very simple. It uses the C# language to compute the angle between three points in space, based on the calculation of the angle between two vectors. Arm movement involves the three joint points on the right and left sides, namely the shoulder, elbow and wrist; the positions of these three points are used to calculate the angle of rotation performed by the arm. Similarly, this angle is calculated to recognize the rotation of the avatar's arm. The interface is composed


mainly of two windows (see Fig. 12). The first window is dedicated to the patient, while the second is reserved for the avatar. Each window displays two scenes: the user window shows the RGB camera rendering and the skeleton of the patient, while the second window shows the RGB rendering and the avatar movement. Each window also displays the angle made by the arm movement, and the interface uses the two arm-angle values to validate or reject the gesture made by the patient.

Fig. 12. The proposed interface of exergame using occupational therapy.

The effectiveness of our serious game can be seen through several facets. To overcome the difficulty of the games, the elderly are obliged to make several attempts to succeed, which results in very positive psychological effects: increases in motivation, self-efficacy, and enjoyment are highlighted. Depending on competence and scoring, the game improves sensorimotor skills, posture, balance and gait; these are positive physical facets. Physiological facets include improving the performance of the cardiovascular, cardio-respiratory or immune system and re-establishing neural plasticity. As a result, occupational therapy using serious games does not simply mean serious purpose and motivation; rather, it offers options for a new kind of prevention and rehabilitation, with special emphasis on physical and psychological effects.

5 Discussion and Conclusion

Falls are a very dangerous phenomenon: they are devastating and increase the risk of mortality, especially for the elderly. In this paper, we propose a serious game specifically designed to deliver occupational therapy at home, adapted to the limited abilities of the elderly. This


occupational therapy is able to increase activity and to improve posture, balance and gait, so it addresses the most serious causes of falls. Our game is easy to use and does not require a complex arrangement. This study has some limitations: the Kinect sensor has already been used in research to detect and evaluate falls of the elderly, but this device has limited precision compared with other, more expensive 3D motion sensors [24–27]. This work could be improved by a multidimensional exercise program specifically adapted to the anomalies of each user. In a future study, an experimental evaluation should be planned: a number of elderly people with balance, posture and walking disorders should be selected, and their health status should be assessed with the necessary measures. We recommend this exercise for elderly people who do not have dizziness, neurological disorders or serious pathological diseases.

References

1. Ksouri Labidi, N.: Tunisia has the oldest population in Africa. Al Huffington Post Maghreb (2014). http://www.huffpostmaghreb.com/2014/09/12/age-median-tunisie-monde_n_5805238.html
2. National Office of Family and Population: Projection and perspectives of the population: what future for Tunisia. Documentary record, Circles of Population and Reproductive Health, 8th Session, 4th Round Table (2009). http://www.onfp.nat.tn/cercles/TR4/dossier/dos_fr.pdf
3. Aniansson, A., Grimby, F., Gedberg, A.: Muscle function in old age. Scand. J. Rehabil. Med. 6(suppl), 43–49 (1978)
4. Groot, M.H., van der Jagt-Willems, H.C., van Campen, J.P., Lems, W.F., Beijnen, J.H., Lamoth, C.J.: A flexed posture in elderly patients is associated with impairments in postural control during walking. Gait Posture 39(2), 767–772 (2014). https://doi.org/10.1016/j.gaitpost.2013.10.015
5. Nelson, R.C., Amin, M.A.: Falls in the elderly. Emerg. Med. Clin. North Am. 8, 309–324 (1990)
6. Terroso, M., Rosa, N., Marques, A.T., Simoes, R.: Physical consequences of falls in the elderly: a literature review from 1995 to 2010. Eur. Rev. Aging Phys. Activity 11(1), 51–59 (2014)
7. Ben Nasr, T.: Dar Ellama: a dignified retirement. Nawaat. https://nawaat.org/portail/2015/03/04/dar-ellama-une-retraite-dans-la-dignite/
8. Hoskin, A.F.: Fatal falls: trends and characteristics. Stat. Bull. Metrop. Insur. Co. 79(2), 10–15 (1998)
9. Shumway-Cook, A., Gruber, W., Baldwin, M., Liao, S.: The effect of multidimensional exercises on balance, mobility, and fall risk in community-dwelling older adults. Phys. Ther. 77(1), 46–57 (1997)
10. Noury, N., Fleury, A., Rumeau, P., Bourke, A., Laighin, G., Rialle, V., Lundy, J.: Fall detection - principles and methods. In: 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS), pp. 1663–1666 (2007)
11. Demongeot, J.: Technology and gerontological care: ethical implications and considerations for the future. Gérontologie et Société 2005/2 (113), p. 146 (2005). https://doi.org/10.3917/gs.113.0121
12. Rimminen, H.: Positioning accuracy and multi-target separation with a human tracking system using near field imaging. Int. J. Smart Sens. Intell. Syst. 2(1), 156–175 (2009)
13. Rougier, C., Meunier, J., St-Arnaud, A., Rousseau, J.: Robust video surveillance for fall detection based on human shape deformation. IEEE Trans. Circuits Syst. Video Technol. 21, 611–622 (2011)
14. Burke, J., McNeill, M., Charles, D., Morrow, P., Crosbie, J., McDonough, S.: Optimising engagement for stroke rehabilitation using serious games. Visual Comput. 25, 1085–1099 (2009)
15. Sugarman, H., Weisel-Eichler, A., Burstin, A., Brown, R.: Use of the Wii Fit system for the treatment of balance problems in the elderly: a feasibility study. In: Virtual Rehabilitation International Conference 2009, pp. 111–116 (2009)
16. OpenKinect: OpenKinect Main Page. http://openkinect.org/. Last accessed April 2011
17. OpenNI: OpenNI. http://openni.org/. Last accessed April 2011
18. Code Laboratories: About: CL NUI Platform
19. Province, M.A., Hadley, E.C., Hornbrook, M.C., et al.: The effects of exercise on falls in elderly patients. J. Am. Med. Assoc. (JAMA) 272, 1341–1347 (1995)
20. Cruz, L., Lucio, D., Velho, L.: Kinect and RGBD images: challenges and applications. In: Graphics, Patterns and Images Tutorials (SIBGRAPI-T), Ouro Preto, Brazil, August 2012
21. Pishchulin, L., Jain, A., Wojek, C., Andriluka, M., Thermalon, T., Schiele, B.: Learning people detection models from few training samples. MPI Informatics, Saarbrücken, Germany. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1473–1480 (2011)
22. Yusoff, A., Crowder, R., Gilbert, L., Wills, G.: A conceptual framework for serious games. In: Proceedings of the 2009 Ninth IEEE International Conference on Advanced Learning Technologies, pp. 21–23. IEEE Computer Society, Riga, Latvia (2009)
23. Capdevila Ibáñez, B., Boudier, V., Labat, J.-M.: Knowledge management approach to support a serious game development. In: Proceedings of the 2009 Ninth IEEE International Conference on Advanced Learning Technologies, pp. 420–422. IEEE Computer Society, Riga, Latvia (2009)
24. Stinghen Filho, I.A., Gatto, B.B., Pio, J.L.D.S., Chen, E.N., Junior, J.M., Barboza, R.: Gesture recognition using leap motion: a machine learning-based controller interface. In: 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT 2016). IEEE (2016)
25. Ameur, S., Khalifa, A.B., Bouhlel, M.S.: A comprehensive leap motion database for hand gesture recognition. In: 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT 2016), pp. 514–519. IEEE, December 2016
26. Sellami, T., Jelassi, S., Darcherif, A.M., Berriri, H., Mimouni, M.F.: 3D finite volume model for free and forced vibrations computation in on-shore wind turbines. In: 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT 2016), pp. 104–108. IEEE, December 2016
27. Habiba, N., Ali, D.: Spectral three-dimensional mesh matching using histograms descriptors and salient points. In: 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT 2016), pp. 370–373. IEEE, December 2016

An Assistance Tool to Design Interoperable Components for Co-simulation

Yassine Motie, Alexandre Nketsa, and Philippe Truillet
LAAS-IRIT, University of Toulouse III, Toulouse, France
{yassine.motie,philippe.truillet}@irit.fr, [email protected]

Abstract. The high number of electronic devices in use and their interactions lead us to move from a vision of multi-function systems, used independently, to systems that are actually distributed and scattered in the environment. The heterogeneity of the components constituting some of these systems ultimately leads us to call them "complex". When a complex system [1] requires the use of different components specified by different designers working in different domains, the number of virtual prototypes increases greatly. Unfortunately, these components tend to remain too independent of one another, preventing both the designers from collaborating and their systems from being interconnected in order to fulfil one or more tasks that could not be performed by any of these elements alone. Communication and co-operation are therefore necessary and encourage the designer(s) to make these components interoperate through a co-simulation [2], fostering dialogue between disciplines and reducing errors, costs and development time. In this article, we describe an assistance tool that generates black-box components, making this design task easier for novices.

Keywords: Complex systems · Models · FMI · Co-simulation · Component generation

1 Introduction

Designing an interactive system is a difficult task. Designing and evaluating a so-called "complex" system is even more so. From the software point of view, such a system can be seen as an integrated set of interconnected elements that satisfy, in a given environment, one or more pre-defined objectives. In general, the components of this system include the facilities, the hardware and software equipment, the data, the services, the qualified personnel and the techniques necessary to achieve, provide and maintain its efficiency. A complex system also has many characteristics, such as the heterogeneity of its components, their evolution at different time scales and their geographical distribution, integrating digital systems, physical operators and/or humans. These complex systems are usually broken down into subsystems, following either a top-down approach (also known as stepwise refinement and functional decomposition) or a bottom-up approach (from existing components that need to be reused). This often means that these subsystems fail to speak the same language and to share information effectively and correctly - that is, to be interoperable - which is one of the major problems complex systems frequently face.

In order to make this collaboration succeed at a global level, it is important to opt for an open environment that allows a continuous dialogue between the different parties. One way to do this is to apply co-simulation, defined as the combination of various simulation models, possibly coming from different tools, for the different components of a complex system. Co-simulation provides the abstraction designers need to work within their own business expertise. The interest in co-simulation is not only triggered by the coupling of environments but also by the potential efficiency gain of decoupling a large system model. This is exemplified in [19], where a model of an engine is split into subsystems, decreasing the simulation time by an order of magnitude. The FMI (Functional Mock-up Interface) standard [20] and data mediation were used in a previous work [5] for the structural and semantic interoperability levels, respectively, in order to build an interoperable co-simulation framework applied in neOCampus, a research project supported by the University of XXX [4]. The idea is that tools generate and exchange models that conform to the FMI specification. Such models are called FMUs (Functional Mock-up Units). The problem is that the transition from the subsystem under study to an FMU component (a black box) requires knowledge of the FMI standard, which constitutes an obstacle for our designers and thereby inhibits their collaboration. Rather than forcing users to change their behavior to accommodate our framework, we tried, in this work, to optimize the framework's structural interoperability level around how users can, want, or need to use it. Inspired by the user-centered design approach [7], we studied component generation methods and proposed a prototype, based on the generation tasks to be performed, for partial automation. The idea is to allow designers to preserve their tools, their favorite languages and their expertise, in order to guide them through the first co-simulation step with other heterogeneous simulators.
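To make the FMU black-box contract mentioned above more concrete, the sketch below shows, in plain Java, the kind of interface such a component exposes to a co-simulation master: setting up a run, stepping the model, and reading and writing variables. This is only an illustrative abstraction loosely inspired by the FMI 2.0 co-simulation functions (fmi2DoStep, fmi2GetReal, fmi2SetReal, ...); it is neither the actual FMI C API nor the code produced by our tool, and all names are hypothetical.

// Simplified, illustrative Java view of the black-box role an FMU plays in a
// co-simulation. NOT the real FMI API; only a sketch of the contract the
// generated component has to fulfil towards a co-simulation master.
public interface CoSimulationComponent {

    /** Prepare the component for a simulation run on [startTime, stopTime]. */
    void setup(double startTime, double stopTime);

    /** Advance the internal model by one communication step of size stepSize. */
    void doStep(double currentTime, double stepSize);

    /** Read an output variable declared in modelDescription.xml. */
    double getReal(String variableName);

    /** Write an input variable declared in modelDescription.xml. */
    void setReal(String variableName, double value);

    /** Release resources at the end of the co-simulation. */
    void terminate();
}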

2 Related Work

The participatory design cycle begins with the analysis of user needs and activities. The ISO 16982 standard offers, for example, methods based on user observation, questionnaires, interviews, or the study of available documents. To carry out this phase with designers like those of the neOCampus operation, one should understand their needs for co-simulation (need for data, need to validate their simulation, etc.), their working environments, the tools they use, and their level of expertise with computers (software, FMI, GUIs, ...). A following phase often puts into practice creativity methods, such as brainstorming [12], to produce ideas for solutions. There are variants and more specialized methods such as the "Group Elicitation Method" [13], which proposes "brainwriting", a written variant of brainstorming. In our case, where it is difficult to bring together the different researchers working in neOCampus, we considered that this phase was not essential at that point: we collected the main needs of our users and confirmed the importance of implementing a black-box component generation tool, in particular to save the effort of learning the FMI standard. We then went directly to the design phase, replacing the brainstorming with an additional debriefing step, as shown in Fig. 1.

Fig. 1. Our user-centered design approach

For the creation of solutions (the C and D phases of the process), there are many possibilities. The most common is to use low-fidelity prototypes. In our case, they were produced by us, the designers, from the ideas generated collectively. They are used to present solutions to users, to evaluate, validate or refute concepts or interactions, and to choose or propose new ideas. For the realization of these prototypes, the designer can choose between several methods, which are often based on the use of visual content. Rettig [14] and Snyder [15] show, for example, the use of paper prototyping, in which the interfaces to be manipulated and discussed are prepared in the form of drawings or collages. The "Wizard of Oz" experiment [16] proposes to simulate the interactive behavior of the final prototype; this methodology is often based on such a visual paper mock-up. Serrano [17] proposes, with "OpenWizard", a software Wizard-of-Oz solution for multimodal systems; it allows input modalities to be simulated but not output modalities. An alternative to the Wizard of Oz is to code low-fidelity prototypes. According to Sefelin [18], the results achieved with these prototypes are equivalent to those obtained with paper models. In addition, the interviews conducted at the end of tests comparing paper models and software prototypes reveal that 22 of the 24 subjects say they prefer working with software prototypes. Technologies like Adobe Flash or MS Silverlight make it easy to create low-fidelity prototypes.


We coded a low-fidelity prototype which addressed two key points:
1. Adequacy between the proposed functionalities and user needs
2. Adequacy between the interface and the users

Designers working in the neOCampus project mostly use COTS (commercial off-the-shelf) simulation software [3] to build and test their simulation models. Integrating these models to form a single meta-model is a major issue, especially when distributed simulation technologies are not built into this software [6]. We have seen that, because of the lack of communication and collaboration between these different designers, their models were built in a completely disconnected way and did not benefit from the exchange of information that could simplify and accelerate their work. Beyond these practices, another identified problem concerns the difficulty of generating the co-simulation components (essentially components implementing the FMI) from the different simulators used. Indeed, the majority of designers are experts in their field but 1) are not necessarily computer experts and 2) even when they are, they are rarely experts in co-simulation or in FMI. The time needed to master the FMI technology (learning and practice) led us to propose a mechanism to support the generation of FMU components based on user practices. It is an interface intended to be easy to use for novice users, guiding them through the generation of components ready to be connected to our platform (cf. Fig. 2).

Fig. 2. Overview of the FMI component generation process

Many efforts have been made to implement and test the FMI standard since its release. [8] discusses the implementation of a generic interface and the technical problems and challenges of importing FMUs into a simulator. [9] describes the implementation of FMI in SimulationX. In [10], an integration strategy for rapid prototyping of Modelica models into the FMI standard is presented. The modelling and simulation step is done separately by each designer: the model first has to be set up and tested against its specifications. Standalone executables may be launched, traced, and debugged using additional tools such as an IDE (Integrated Development Environment). The next step in the FMU generation process is to determine the interface of the simulation model, which is later exposed through the FMI. This consists of the definition of input and output quantities, or states, as well as internal timing (accuracy and precision required by the simulation model) and external timing (simulation step size for data exchange). This information is gathered, together with information about the FMU's architecture, in a modelDescription.xml file, which is connected to the software code using functions usually provided by the FMU SDK (software development kit) [11]. The parts of an FMU are put together after being compiled: a zip archive is created, the .dll file is placed in the binaries folder for the corresponding platform, and the modelDescription.xml file is placed in the root folder. It is then a deliberate choice to put the model source files in the sources folder.

We first analyzed the information about an FMU stored in the modelDescription.xml file. For example, it contains elements like ModelVariables, defining all the variables of the FMU that are visible/accessible via the FMU functions. This inventory made it possible to extract the minimal necessary and sufficient set of variables for a simulation, in order to generate as easily as possible the FMU component allowing co-simulation between systems. We then built a task model to understand the steps to be performed in order to build an FMU component, and proposed several mock-ups, one of which is shown in Fig. 3.
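As a concrete illustration of the packaging step just described, the following minimal Java sketch assembles an FMU archive with modelDescription.xml at the root, the compiled library under binaries/<platform>/, and the model sources under sources/. File names, paths, and the platform folder name are illustrative assumptions, not the actual implementation of our generator.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Minimal sketch of FMU packaging: a plain zip archive with modelDescription.xml
// at its root, the compiled library in binaries/<platform>/ and, optionally,
// the model sources in sources/. All paths below are hypothetical.
public class FmuPackager {

    public static void pack(Path workDir, Path fmuFile, String platform) throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(fmuFile))) {
            // modelDescription.xml goes in the root of the archive
            addEntry(zip, workDir.resolve("modelDescription.xml"), "modelDescription.xml");
            // the compiled shared library goes in binaries/<platform>/
            addEntry(zip, workDir.resolve("model.dll"), "binaries/" + platform + "/model.dll");
            // keeping the sources is optional, but helps traceability
            addEntry(zip, workDir.resolve("model.c"), "sources/model.c");
        }
    }

    private static void addEntry(ZipOutputStream zip, Path file, String entryName) throws IOException {
        zip.putNextEntry(new ZipEntry(entryName));
        Files.copy(file, zip);
        zip.closeEntry();
    }

    public static void main(String[] args) throws IOException {
        // hypothetical usage: package the files produced in ./build into component.fmu
        pack(Paths.get("build"), Paths.get("component.fmu"), "win64");
    }
}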

Fig. 3. Low-fidelity models implementing the scenario


Identifying these steps was essential to distinguish those that can be automated from those where the user's actions are required. Finally, based on these identified tasks, we proposed a "medium-fidelity" interface, functional on an identified scenario, for which we conducted a pre-experiment.

3 Preliminary Study

As mentioned earlier, we found in surveys that a large majority of users were unfamiliar with the FMI technology and had trouble getting started with it. That is why we wanted to check that an interface helping with the elaboration of FMU components made sense for the designers (to understand the process and enable collaboration) and that it was useful (for example, in terms of time saved in the construction of a component).

3.1 Participants and Procedure

Participants: We conducted this pre-experiment with 7 participants, including both FMI novices and FMI experts. 6 participants were adult men (mean age = 20 years, standard deviation = 5) and 1 was a woman (age = 27 years). Participants were recruited from the xxxx, xxxx and xxxx laboratories and were familiar with computers.

Equipment: We developed the prototype in Java/QT on a laptop running Windows 10 (Core i7, 32 GB RAM, 17-inch screen).

Procedure: After signing the consent form, participants were presented with the two tasks to be performed for the pre-experimentation. Task (1): a program open in the Eclipse framework was provided to the participants; the program was the same for all of them. They were asked to generate an FMU component using the latest version of the JavaFMI library, following the script available in a tutorial we provided (library download, class creation, fmu2 component generation). For the second task (2), we asked them to launch and use the graphical interface we developed to help with FMU component generation. This interface guides the user through the generation process: specifying the number of input and output variables, entering variable names and types (with type-cast checks and a spellchecker), and finally downloading the generated component, together with a summary of the performed tasks (cf. Figs. 4 and 5). Five participants (P1 to P5) performed task (1) then task (2), and the last two (P6 and P7) performed task (2) before task (1), to counterbalance learning effects.
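To make the information collected by the interface in task (2) more tangible, here is a minimal Java sketch of the kind of component specification it could assemble before generating the FMU: the declared input and output variables with their names and types, rejected early when the type is not a valid FMI scalar type. Class, record, and method names are hypothetical (the sketch assumes Java 16+); the real tool additionally runs a spellchecker on variable names.

import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative sketch of what the generation interface collects: declared
// inputs and outputs with their FMI types. Names are hypothetical.
public class ComponentSpecification {

    // FMI 2.0 scalar variable types the form lets the user pick from
    private static final Set<String> ALLOWED_TYPES = Set.of("Real", "Integer", "Boolean", "String");

    public record Variable(String name, String type, boolean isInput) {}

    private final List<Variable> variables = new ArrayList<>();

    /** Add a declared variable, rejecting unknown types early (the "cast" check). */
    public void declare(String name, String type, boolean isInput) {
        if (!ALLOWED_TYPES.contains(type)) {
            throw new IllegalArgumentException("Unsupported FMI type: " + type);
        }
        variables.add(new Variable(name.trim(), type, isInput));
    }

    public List<Variable> inputs()  { return variables.stream().filter(Variable::isInput).toList(); }

    public List<Variable> outputs() { return variables.stream().filter(v -> !v.isInput()).toList(); }
}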


Fig. 4. Screenshot of the proposed interface

Fig. 5. Task completion time for tasks (1) and (2)

Analysis: We recorded and analyzed the completion times of the different participants, either following the prescribed scenario or using our interface. The number of actions performed was also recorded and compared against whether the participants were FMI experts and how frequently they worked with integrated development environments (more specifically Eclipse). Independent variables: with interface, without interface. Dependent variables: time (continuous), number of actions (discrete), FMI expertise: expert or not (categorical), experience related to frequency of IDE use (categorical). The aim is both to verify our initial hypotheses (the main objective of the tool is to allow non-experts to generate FMU components for co-simulation) and to iterate on our solution. We completed the pre-experimentation phase by debriefing with the participants.


4 Results and Discussion

Users gave very mixed opinions on the experience, but most of them told us that following the script, trying to complete the program, compiling it and running it was a daunting task. In fact, one of the participants decided to leave the experiment after 32 min, finding the exercise too difficult. 6 users reached the end of task (1), but only 3 were actually able to generate an FMU component (two of whom were FMI experts and one who frequently uses an IDE and Java as the main language in everyday work). The other 3 made code errors that blocked the generation. On the other hand, the generation of the FMU component using our interface (task (2)) was completed successfully by all the participants. We also observed that the order of the tasks did not matter much with regard to task completion times. Table 1 reports the times achieved by the 7 participants (participants P6 and P7 performed task (2) before task (1)).

Table 1. Results - task completion time

Participant   Time (1)    Time (2)   IDE use   FMI expert
P1            25'         3'         4         yes
P2            32' (ab.)   5'         4         no
P3            42'         7'         3         no
P4            65'         14'        1         no
P5            54'         8'         2         no
P6            34'         10'        3         no
P7            30'         13'        4         yes

Participants are distinguished according to their level of FMI expertise and their use of IDEs (scale: 1: never, 2: rarely, 3: regularly, 4: all the time). Completion times are systematically longer (by at least a factor of 3) for task (1) than for task (2), regardless of expertise. The ANOVA analysis also reveals a significant effect of IDE experience on task completion time (F(1,8) = 50.02, p < 0.001). With only two expert users, we cannot say much about the impact of FMI expertise on completion time. Nevertheless, we can again note a saving in completion time when using our interface, independently of the subject's degree of expertise. The number of actions performed (still being analyzed) is also sharply lower (by a factor of 8).
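As an illustration of the statistical test mentioned above, the sketch below shows how a one-way ANOVA on completion times could be computed with Apache Commons Math 3. The grouping used here (task (1) times split by low versus high IDE familiarity) and the data arrays are assumptions made for the example only; they are not the authors' exact grouping or data set.

import java.util.List;
import org.apache.commons.math3.stat.inference.OneWayAnova;

// Purely illustrative one-way ANOVA on completion times (minutes), grouped by
// a hypothetical low/high IDE-familiarity split. Not the authors' exact analysis.
public class CompletionTimeAnova {
    public static void main(String[] args) {
        double[] lowIdeFamiliarity  = {65, 54, 42};      // hypothetical group 1
        double[] highIdeFamiliarity = {25, 30, 34, 32};  // hypothetical group 2

        OneWayAnova anova = new OneWayAnova();
        List<double[]> groups = List.of(lowIdeFamiliarity, highIdeFamiliarity);

        System.out.printf("F = %.2f, p = %.4f%n",
                anova.anovaFValue(groups), anova.anovaPValue(groups));
    }
}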


5 Conclusion and Future Works

This preliminary work first made it possible to draw up a complete list of the practices of the different actors of the neOCampus project. We understood that it was essential for experts to be able, first of all, to keep their own practices, enabling them to build or enhance their own "business" simulators so that they can communicate and collaborate. We therefore opted for interoperability of these heterogeneous simulators based on the FMI co-simulation standard, overcoming model semantic gaps and offering them a validation platform for co-simulation. In this process, we targeted one of the most difficult tasks, namely the generation, from heterogeneous simulations, of components that can be plugged into our platform. We therefore designed an interface facilitating this generation, focusing on automation together with an understanding of the generation process, and we pre-evaluated this interface in order to improve it. Our interface will eventually integrate full control of the co-simulation, potentially with visualization tools adapted to the needs of each participant in this co-simulation. The experience was enriching and appreciated. This constantly improving interface will not only be a mechanism to facilitate collaboration and simplify the co-simulation of our different systems; it can also benefit the FMI community, which in many areas and for many uses needs to generate components, including for languages and simulation environments for which there is currently no documentation or library to do so.

References
1. Bar-Yam, Y.: Dynamics of Complex Systems, vol. 213. Addison-Wesley, Reading, MA (1997)
2. Rowson, J.A.: Hardware/software co-simulation. In: DAC, vol. 94, pp. 6–10, June 1994
3. Boer, C.A., Verbraeck, A.: Distributed simulation and manufacturing: distributed simulation with COTS simulation packages. In: Proceedings of the 35th Conference on Winter Simulation: Driving Innovation, pp. 829–837. Winter Simulation Conference (2003)
4. Gleizes, M.-P., et al.: Neocampus: a demonstrator of connected, innovative, intelligent and sustainable campus. In: International Conference on Intelligent Interactive Multimedia Systems and Services, pp. 482–491. Springer (2017)
5. Motie, Y., et al.: A co-simulation framework interoperability for Neo-campus project. In: European Simulation and Modelling Conference (ESM), Lisbon, 25–27 October 2017. EUROSIS (2017)
6. Taylor, S.J., et al.: Integrating heterogeneous distributed COTS discrete-event simulation packages: an emerging standards-based approach. IEEE Trans. Syst. Man Cybern. Part A: Syst. Hum. 36(1), 109–122 (2006)
7. Abras, C., Maloney-Krichmar, D., Preece, J.: User-centered design. In: Bainbridge, W. (ed.) Encyclopedia of Human-Computer Interaction, vol. 37(4), pp. 445–456. Sage Publications, Thousand Oaks (2004)
8. Chen, W., Huhn, M., Fritzson, P.: A generic FMU interface for Modelica. In: 4th International Workshop on Equation-Based Object-Oriented Modeling Languages and Tools, pp. 19–24 (2011)


9. Noll, C., Blochwitz, T.: Implementation of Modelisar functional mock-up interfaces in SimulationX. In: 8th International Modelica Conference (2011)
10. Elsheikh, A., Awais, M.U., Widl, E., Palensky, P.: Modelica-enabled rapid prototyping of cyber-physical energy systems via the functional mockup interface. In: 2013 Workshop on Modeling and Simulation of Cyber-Physical Energy Systems (MSCPES 2013), pp. 1–6 (2013)
11. Qtronic: FMU SDK: free development kit (2014)
12. ISO/TR 16982: Ergonomics of Human-System Interaction - Usability Methods Supporting Human-Centred Design (2002)
13. Boy, G.A.: The group elicitation method for participatory design and usability testing. Interactions 4(2), 27–33 (1997)
14. Rettig, M.: Prototyping for tiny fingers. Commun. ACM 37(4), 21–27 (1994)
15. Snyder, C.: Paper Prototyping: The Fast and Easy Way to Design and Refine User Interfaces. Morgan Kaufmann, Amsterdam (2003)
16. Kelley, J.F.: An iterative design methodology for user-friendly natural language office information applications. ACM Trans. Inf. Syst. (TOIS) 2(1), 26–41 (1984)
17. Serrano, M., Nigay, L.: OpenWizard: une approche pour la création et l'évaluation rapide de prototypes multimodaux. In: Proceedings of the 21st International Conference on Association Francophone d'Interaction Homme-Machine, pp. 101–109. ACM, October 2009
18. Sefelin, R., Tscheligi, M., Giller, V.: Paper prototyping - what is it good for? A comparison of paper- and computer-based low-fidelity prototyping. In: CHI '03 Extended Abstracts on Human Factors in Computing Systems, pp. 778–779. ACM, April 2003
19. Hippmann, G., Arnold, M., Schittenhelm, M.: Efficient simulation of bush and roller chain drives. In: Proceedings in Multibody Dynamics, ECCOMAS Conference (2005)
20. Blochwitz, T., et al.: Functional Mockup Interface 2.0: the standard for tool independent exchange of simulation models. In: Proceedings of the 9th International MODELICA Conference, pp. 173–184, 3–5 September 2012. https://doi.org/10.3384/ecp12076173

Author Index

A Abderrahmane, Atmani, 261 Abouabdellah, Abdellah, 59, 463 Abram, Alain, 131 Afif, Mouna, 234, 364 Aggoune, Aicha, 39 Ajili, Sondes, 311 Akoum, Alhussain, 382 Alkhatib, Ghazi, 131 Al-Sarayrah, Khalid, 131 Amar Bensaber, Djamel, 3 Atri, Mohamed, 234, 364 Attia, Rabah, 332 Ayachi, Riadh, 234, 364 B Baabou, Salwa, 303 Babahenini, Mohamed Chaouki, 202 Bahjat, Hala, 473 Barakat, Oussama, 449 Belhadef, Hacene, 144 Belhaj Salah, Latifa, 271 Ben Fekih, Rim, 192 Ben Haj Khaled, Amina, 479 Ben Salem, Aïcha, 182 Bhar Layeb, Safa, 14 Bou Saleh, Bilal, 449 Bou Saleh, Ghazi, 449 Bouakkaz, Mustapha, 144 Bouallegue, Ghaith, 372 Boufares, Faouzi, 182 Bouhlel, Med Salim, 435, 479 Boujelbene, Younes, 83 Bournene, El Bey, 244 Bourouba, Hocine, 244

Bouslama, Sarah, 14 Bremond, François, 303 C Chaieb, Ramzi, 341 Chantaf, Samer, 352 Chaouachi, Jouhaina, 14 Chaouch, Chakib, 155 Chemam, Chaouki, 202 D Derbel, Ahmed, 83 Derbel, Bilel, 393 Doghmane, Hakim, 244 Douali, Latifa, 123 Draa, Amer, 222 E El Hamdi, Sarah, 463 El Mariouli, Oussama, 59 El Moudni, Abdellah, 449 Elhadj Youssef, Wajih, 372 Ellouze, Nebrasse, 94, 410 Elsaleh, Rola, 352 F Farah, Mohamed Amine, 303 Filali, Taheni, 435 Fourati, Fathi, 271 Fradi, Marwa, 372 G Gafsi, Mohamed, 311 Ghali, Rafik, 332 Ghorbel, Fatma, 94, 410


Giacobbe, Maurizio, 155 Graba, Dalila Djoher, 3 Guezmir, Naouel, 323 H Hachaïchi, Yassine, 283 Hage Chehade, Rafic, 382 Hain, Mustapha, 72 Hajjaji, Mohamed Ali, 311 Hajjami Ben Ghézala, Henda, 182 Hajjar, Mohammad, 449 Hamdi, Fayçal, 94, 410 Hilal, Alaa, 352 I Idri, Ali, 72 Ikram, Jaouadi, 172 J Jeffers, Esther, 420 Jemili, Farah, 192 Jmal, Marwa, 332 K Kachouri, Abdennaceur, 303 Kalti, Karim, 341 Keche, Mokhtar, 400 Keskes, Nabil, 3 Kessentini, Mouna, 420 Khalfallah, Ali, 479 Khan, Muzammil, 50 L Lahbib, Younes, 283 Larbi, Abdelmadjid, 115 Lasaygues, Philippe, 372 Lattar, Hafsa, 182 M Maâloul, Mohamed Hédi, 24 Machhout, Mohsen, 372 Mahatody, Thomas, 163 Majda, Labiadh, 24 Makkouk, Rabih, 382 Malek, Jihene, 311 Malki, Mimoun, 115 Mallek, Hadjer, 212 Mami, Sonia, 283 Manantsoa, Victor, 163 Marir, Toufik, 212 Marrakchi Charfi, Olfa, 323 Mars, Mokhtar, 323 Mazzaccaro, Daniela, 393

Mbainaibeye, Jérôme, 323 Messaoudi, Kamel, 244 Métais, Elisabeth, 94, 410 Miri, Rim, 393 Mirmahboub, Behzad, 303 Mohamed, Elmarraki, 261 Mohamed, Essalih, 261 Mohsen, Machhout, 172 Mohsin, Hanaa, 473 Motie, Yassine, 494 Mtibaa, Abdellatif, 311 Mukhamedshin, Damir, 105 N Nano, Giovanni, 393 Naseem, Rashid, 50 Nevzorova, Olga, 105 Nketsa, Alexandre, 494 O Ouamri, Abdelaziz, 400 Oubadi, Sihem, 212 Oudani, Mustapha, 463 P Pissaloux, Edwige, 364 Puliafito, Antonio, 155 R Rafalimanana, Hantanirina Felixie, 163 Ratovondrahona, Alain Josué, 163 Razafindramintsa, Jean Luc, 163 Rjab, Sabrine, 283 Rouane, Oussama, 144 S Said, Yahia, 234, 364 Sakka, Mustapha, 294 Scarpa, Marco, 155 Silem, Abd El Heq, 212 Smaoui, Souhaïl, 294 Soltani, Mohamed, 202 Souidene Mseddi, Wided, 332 Soyed, Emna, 341 Suleymanov, Dzhavdet, 105 T Talbi, Hichem, 222 Truillet, Philippe, 494 U Ullah, Muhammad, 50 Ur Rahman, Arif, 50

Y Yahia Lahssene, Yamina, 400 Younès, Bahou, 24

Z Zakrani, Abdelali, 72 Zarzour, Hafed, 202