The Fifth International Conference on Safety and Security with IoT: SaSeIoT 2021 (EAI/Springer Innovations in Communication and Computing) 3030942848, 9783030942847

This book presents the Fifth International Conference on Safety and Security with IoT (SaSeIoT 2021), which took place o


English Pages 217 [211] Year 2022


Table of contents :
Preface
Conference Organization
Steering Committee
Organizing Committee
Technical Program Committee
Contents
About the Editors
Opportunistic Multi-Modal User Authentication for Health-Tracking IoT Wearables
1 Introduction
1.1 Motivation
1.2 Contributions
2 Related Work
3 Approach
3.1 Wellue Dataset
3.2 Data Pre-processing
3.3 Usefulness of Blood Oxygen-Level Data
3.4 Feature Computation
3.5 Feature Selection
3.6 Methods
4 User Authentication
4.1 Performance Measures
4.2 Training-Testing Set
4.3 Hyper-Parameter Optimization
4.4 Optimal Feature Count Determination
4.5 Authentication Model Evaluation
5 Conclusion and Future Work
References
On-Phone CNN Model-Based Implicit Authentication to Secure IoT Wearables
1 Introduction
1.1 Motivation
1.2 Contributions
2 Related Work
3 Approach
3.1 System Overview
3.2 Datasets
3.3 Data Pre-processing
3.3.1 Data Segmentation and Cleaning
3.3.2 Audio Data Augmentation
3.4 Feature Computation
4 User Authentication
4.1 Training-Testing Set
4.2 Performance Comparison Measures
4.3 Hyper-Parameter Optimization
4.4 Authentication Model Evaluation
5 Conclusions and Future Work
References
A Cybersecurity Guide for Using Fitness Devices
1 Introduction
2 Personal Data Collection by Fitness Devices
3 Risks Arising from Fitness Devices
4 Cybersecurity Awareness of Users
5 Cybersecurity Guidelines for Users and Manufacturers
5.1 Cybersecurity Guidelines for Users
5.2 Cybersecurity Guidelines for Manufacturers
6 Discussion
7 Conclusion
References
An Efficient Algorithm for Human Abnormal Behaviour Detection Using Object Detection and Pose Estimation
1 Introduction
2 Organization of Paper
3 Literature Review
4 Proposed Work
4.1 Working and Operation
4.2 Flowchart of Proposed Work
5 Experimentation and Results
5.1 Dataset
5.2 Tabular Representation
5.3 Graphical Representation
5.4 Displaying the Normal Classes as Screenshots
5.5 Displaying the Abnormal Classes as Screenshots
5.6 Analysis
6 Conclusion and Future Scope
References
A Secure and Scalable IoT Consensus Protocol
1 Introduction
2 Background
2.1 A Blockchain Consensus Algorithm
2.2 The Consensus Problem in Context
3 The Balance Authentication Mechanism
4 The Consensus Process
4.1 Choosing a Lead Cell
4.2 An IoT Request
4.3 A Cell
4.4 Double Spend Protection
4.5 The Isomorphic Algorithm
4.6 The Security Provisions
4.6.1 Stage One
4.6.2 Stage Two
4.6.3 Stage Three
4.6.4 Stage Four
4.6.5 Stage Five
4.7 Error Detection
4.8 Merkle Tree
5 Achieving Scalability
6 Conclusion
References
Session Key Agreement Protocol for Secure D2D Communication
1 Introduction
2 Related Work
3 System Model
4 Results and Discussion
4.1 Security Analysis
4.2 Performance Evaluation
5 Conclusion and Future Work
References
What Do Your Smart Home Devices Reveal About You?
1 Introduction
2 Related Work
3 Threat Model and Evaluation Methodology
4 Evaluation: Google Home Mini
4.1 Baseline Scenarios
4.2 Standard Operating Scenarios
4.3 Non-standard Operating Scenarios
5 Evaluation: Netgear Arlo Pro
5.1 Baseline Scenarios
5.2 Audio and Motion Scenarios
5.3 Field of View Scenarios
5.4 Notifications Scenarios
6 Privacy Leakage
7 Conclusions
References
A Quantum-Resistant and Fast Secure Boot for IoT Devices Using Hash-Based Signatures and SRAM PUFs
1 Introduction
2 Related Work and Main Contributions
3 Hash-Based Signatures
3.1 WOTS+ Scheme
3.2 XMSS Scheme
4 Proposed Solution
4.1 Registration Phase
4.2 Secure Boot Operation
4.3 Secure Firmware Update
5 Implementation Results
6 Conclusions
References
On the Analysis of MUD-Files' Interactions, Conflicts, and Configuration Requirements Before Deployment
1 Introduction
2 Manufacturer Usage Description (MUD)
2.1 Components and Workflow
2.2 MUD-File ACL Abstractions
3 Motivation
4 Related Work
5 Methods
5.1 ACE Merging
5.2 ACE Tree
5.2.1 Pruning ACE Tree
6 Implementation
7 Results
8 Conclusions and Future Work
Availability
References
Natural Scenes' Text Detection and Recognition Using CNN and Pytesseract
1 Introduction
2 Related Work
3 Methodology
3.1 The Detection Stage
3.2 Recognition Stage
4 Experiments
4.1 Datasets
4.2 Implementation Details
4.3 Results and Evaluations
5 Conclusion and Future Work
References
Assessing the Resistance of Internet of Things Applications Against Memory Corruption Attacks: A Case Study for Contiki and Tizen
1 Introduction
2 Related Works
3 Security Assessment of Contiki OS
3.1 Stack-Based BOF
3.2 Heap-Based BOF
3.3 Buffer Overread
3.4 Format String
3.5 Use After Free
4 Security Assessment of Tizen OS
4.1 Stack-Based BOF
4.2 Heap-Based BOF
4.3 Buffer Overread
4.4 Format String
4.5 Use After Free
5 Discussion
6 Conclusion
7 Future Works
References
IoT Geography Chain: Blockchain-Based Solution for Logistics Ecosystem
1 Introduction
2 Literature Review
3 Proposed Framework
4 Conclusion
References
Author Index
Subject Index


EAI/Springer Innovations in Communication and Computing

Anand Nayyar Anand Paul Sudeep Tanwar   Editors

The Fifth International Conference on Safety and Security with IoT SaSeIoT 2021

EAI/Springer Innovations in Communication and Computing Series Editor Imrich Chlamtac, European Alliance for Innovation, Ghent, Belgium

The impact of information technologies is creating a new world yet not fully understood. The extent and speed of economic, life style and social changes already perceived in everyday life is hard to estimate without understanding the technological driving forces behind it. This series presents contributed volumes featuring the latest research and development in the various information engineering technologies that play a key role in this process. The range of topics, focusing primarily on communications and computing engineering, includes, but is not limited to, wireless networks; mobile communication; design and learning; gaming; interaction; e-health and pervasive healthcare; energy management; smart grids; internet of things; cognitive radio networks; computation; cloud computing; ubiquitous connectivity; and, more generally, smart living, smart cities, Internet of Things and more. The series publishes a combination of expanded papers selected from hosted and sponsored European Alliance for Innovation (EAI) conferences that present cutting edge, global research as well as provide new perspectives on traditional related engineering fields. This content, complemented with open calls for contribution of book titles and individual chapters, together maintain Springer’s and EAI’s high standards of academic excellence. The audience for the books consists of researchers, industry professionals, advanced level students as well as practitioners in related fields of activity, including information and communication specialists, security experts, economists, urban planners, doctors, and in general representatives in all those walks of life affected by and contributing to the information revolution. Indexing: This series is indexed in Scopus, Ei Compendex, and zbMATH.
About EAI - EAI is a grassroots member organization initiated through cooperation between businesses, public, private and government organizations to address the global challenges of Europe’s future competitiveness and link the European Research community with its counterparts around the globe. EAI reaches out to hundreds of thousands of individual subscribers on all continents and collaborates with an institutional member base including Fortune 500 companies, government organizations, and educational institutions to provide a free research and innovation platform. Through its open free membership model EAI promotes a new research and innovation culture based on collaboration, connectivity and recognition of excellence by community.


Editors Anand Nayyar School of Computer Science Duy Tan University Da Nang, Vietnam

Anand Paul The School of Computer Science and Engineering Kyungpook National University Daegu, South Korea

Sudeep Tanwar Computer Science and Engineering Nirma University Gujarat, India

ISSN 2522-8595 ISSN 2522-8609 (electronic) EAI/Springer Innovations in Communication and Computing ISBN 978-3-030-94284-7 ISBN 978-3-030-94285-4 (eBook) https://doi.org/10.1007/978-3-030-94285-4 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

We are delighted to introduce the proceedings of the 2021 EAI 5th International Conference on Safety and Security in Internet of Things (SaSeIoT 2021). The conference has brought together researchers, developers, and practitioners from across the world who are designing and developing security solutions for the Internet of Things (IoT). The theme of the conference is inspired by issues concerning safety, privacy, and others connecting the IoT. The technical program of SaSeIoT 2021 consisted of 12 papers in oral presentation sessions in the conference. The two keynote speeches were by Dr. Fadi Al-Turjman from Near East University, Turkey, on Intelligence in the Internet of Things and by Dr. Noor Zaman Jhanjhi from Taylor’s University on the Internet of Things (IoT) for Smart Society 5.0.

Coordination with the steering chairs, Imrich Chlamtac and Hakima Chaouchi, was essential for the success of the conference. We sincerely appreciate their constant support and guidance. It was also a great pleasure to work with such an excellent organizing committee team, and we thank them for their hard work in organizing and supporting the conference, in particular the Technical Program Committee, led by our TPC Co-Chair, Dr. Anand Paul, which completed the peer-review process of technical papers and made a high-quality technical program. We are also grateful to the conference managers, Eliska Vlckova and Elena Davydova, for their support and to all the authors who submitted their papers to the SaSeIoT 2021 conference.

We strongly believe that the SaSeIoT 2021 conference provides a good forum for all researchers, developers, and practitioners to discuss all diverse aspects that are relevant to the Internet of Things. We also expect that the future SaSeIoT conferences will be as successful and stimulating, as indicated by the contributions presented in this volume.

Anand Nayyar, Da Nang, Vietnam
Anand Paul, Daegu, South Korea
Sudeep Tanwar, Gujarat, India

Conference Organization

Steering Committee
Imrich Chlamtac, Bruno Kessler Professor, University of Trento, Italy
Hakima Chaouchi, Institut Mines Telecom, Telecom SudParis

Organizing Committee

General Chair
Anand Nayyar, Duy Tan University, Da Nang, Vietnam
Sudeep Tanwar, Nirma University, India

General Co-Chairs
Anand Paul, Kyungpook National University, Daegu, South Korea

TPC Chair and Co-Chair
Anand Paul, Kyungpook University, Korea
Hakima Chaouchi, Institut Mines Telecom, Telecom SudParis, France

Sponsorship and Exhibit Chair
Noor Zaman, Taylor’s University, Malaysia

Local Chair
Anand Nayyar, Duy Tan University, Vietnam
V. Ajantha Devi, AP3 Solutions, India

Publicity and Social Media Chair
Rajni Mohana, Jaypee University of Information Technology, India

Workshop Chair
V. Ajantha Devi, AP3 Solutions, India

Publications Chair
Anand Nayyar, Duy Tan University, Da Nang, Vietnam
Sudeep Tanwar, Nirma University, India
Anand Paul, Kyungpook National University, Daegu, South Korea

Web Chair
Mohammed Naved, Jagannath Institute of Management Sciences, India
Arun Solanki, Gautam Buddha University, India

Panels Chair
Sudeep Tanwar, Nirma University, India

Tutorials Chairs
V. Ajantha Devi, AP3 Solutions, India

Technical Program Committee
Keshav Sood, Deakin University, Australia
Youyang Qu, Deakin University, Australia
Adarsh Kumar, University of Petroleum and Energy Studies, India
Noor Zaman, Taylor’s University, Malaysia
Jafar Al Zubi, Al-Balqa Applied University, Jordan
Rajalakshmi Krishnamurthi, Jaypee Institute of Information Technology, India
Arun Solanki, Gautam Buddha University, India
Howard Chuan Ming Liu, National University of Science and Technology, Taiwan
Chetan R. Dudhagara, Anand Agricultural University, India
Sandeep Kumar Poonia, Amity University, India
Zorica Bogdanović, University of Belgrade, Serbia
Manuel Munier, University of Pau and Adour Countries, France
Nathalie Mitton, Inria Lille-Nord Europe, France
Mamoun Alazab, Charles Darwin University, Australia
Muhammad Bilal, Hankuk University of Foreign Studies, South Korea
Mohamed Abouhawwash, Michigan State University, USA
Anastasius Moumtzoglou, Hellenic Society for Quality & Safety in Healthcare and P. & A. Kyriakou Children’s Hospital, Greece
Nicolas Sklavos, University of Patras, Greece
Ernesto Exposito, Université de Pau et des Pays de l’Adour

Contents

Opportunistic Multi-Modal User Authentication for Health-Tracking IoT Wearables
Alexa Muratyan, William Cheung, Sayanton V. Dibbo, and Sudip Vhaduri

On-Phone CNN Model-Based Implicit Authentication to Secure IoT Wearables
Sayanton V. Dibbo, William Cheung, and Sudip Vhaduri

A Cybersecurity Guide for Using Fitness Devices
Maria Bada and Basie von Solms

An Efficient Algorithm for Human Abnormal Behaviour Detection Using Object Detection and Pose Estimation
Vaishnavi Narang and Arun Solanki

A Secure and Scalable IoT Consensus Protocol
Beverley A. MacKenzie, Ian Ferguson, and Abdul Razaq

Session Key Agreement Protocol for Secure D2D Communication
Vincent Omollo Nyangaresi and Zeyad Mohammad

What Do Your Smart Home Devices Reveal About You?
Hima Boddupalli, Shivakant Mishra, and Mohammed Almutawa

A Quantum-Resistant and Fast Secure Boot for IoT Devices Using Hash-Based Signatures and SRAM PUFs
Roberto Román and Iluminada Baturone

On the Analysis of MUD-Files’ Interactions, Conflicts, and Configuration Requirements Before Deployment
Vafa Andalibi, Eliot Lear, DongInn Kim, and L. Jean Camp

Natural Scenes’ Text Detection and Recognition Using CNN and Pytesseract
Ashee Mahajan, Anand Nayyar, Rachna Jain, and Preeti Nagrath

Assessing the Resistance of Internet of Things Applications Against Memory Corruption Attacks: A Case Study for Contiki and Tizen
Mohammad Basiri and Maryam Mouzarani

IoT Geography Chain: Blockchain-Based Solution for Logistics Ecosystem
Malik Junaid Jami Gul and Anand Paul

Author Index
Subject Index

About the Editors

Anand Nayyar received Ph.D. (Computer Science) from Desh Bhagat University in 2017 in the area of wireless sensor networks and swarm intelligence. He is currently working in the School of Computer Science, Duy Tan University, Da Nang, Vietnam, as Assistant Professor, Scientist, Vice Chairman (Research), and Director of the IoT and Intelligent Systems Lab. He is a certified professional with 90+ professional certificates from CISCO, Microsoft, Oracle, Google, Beingcert, EXIN, GAQM, Cyberoam, and many more. He has published more than 125 research papers in various high-quality ISI-SCI/SCIE/SSCI impact factor journals cum Scopus journals; 50+ papers in international conferences indexed with Springer, IEEE Xplore, and ACM Digital Library; and 40+ book chapters in various Scopus journals and Web of Science indexed books with Springer, CRC Press, Elsevier, and many more with citations: 6000+ Citations, H-Index: 40 and I-Index: 145. He is a member of more than 50 associations as senior and life member including IEEE and ACM. He has authored/co-authored cum edited 30+ books of computer science. He is associated with more than 500 international conferences as program committee/chair/advisory board/review board member. He has 18 Australian Patents, 25 Indian Patents, 3 Indian Copyrights, 3 German Patents and 2 Canadian Copyrights to his credit in the area of wireless communications, artificial intelligence, cloud computing, the IoT, and image processing. He has been awarded 35+ awards for teaching and research—Young Scientist, Best Scientist, Young Researcher, Outstanding Researcher, Excellence in Teaching, and many more. 
He is acting as Associate Editor for Wireless Networks (Springer), Computer Communications (Elsevier), International Journal of Sensor Networks (IJSNET) (Inderscience), Frontiers in Computer Science, PeerJ Computer Science, Human Centric Computing and Information Sciences (HCIS), IET-Quantum Communications, IET Wireless Sensor Systems, IET Networks, IJDST, IJISP, IJCINI, and IJGC. He is acting as Editor in Chief of IGI-Global, USA, for a journal titled “International Journal of Smart Vehicles and Smart Transportation (IJSVST).” He has reviewed more than 1400 articles for various Web of Science indexed journals. He is currently researching in the area of wireless sensor networks, the IoT, swarm intelligence, cloud computing, artificial intelligence, drones, blockchain, cyber security, network simulation, and wireless communications.

Anand Paul (senior member, IEEE) received the Ph.D. degree in Electrical Engineering from the National Cheng Kung University, Tainan, Taiwan, in 2010. He is currently a Full-Time Professor with the School of Computer Science and Engineering, Kyungpook National University, Daegu, South Korea. He is a delegate representative of Korea for the M2M focus group and MPEG. His research interests include artificial intelligence, data science, and blockchain technology. Dr. Paul is an Associate Editor for the IET Wireless Systems, IEEE ACCESS, Cyber-Physical Systems (Taylor & Francis), International Journal of Interactive Multimedia and Artificial Intelligence, and Human Behavior and Emerging Technologies (Wiley).

Sudeep Tanwar (senior member, IEEE) is currently working as a Full Professor with the Computer Science and Engineering Department, Institute of Technology, Nirma University, India. He is also a Visiting Professor with Jan Wyzykowski University, Polkowice, Poland, and the University of Pitești in Pitești, Romania. He received B.Tech. in 2002 from Kurukshetra University, India; M.Tech. (Honors) in 2009 from Guru Gobind Singh Indraprastha University, Delhi, India; and Ph.D. in 2016 with specialization in wireless sensor network. He has authored 2 books and edited 13 books and more than 250 technical articles, including in top journals and top conferences, such as IEEE Transactions on Network Science and Engineering, IEEE Transactions on Vehicular Technology, IEEE Transactions on Industrial Informatics, IEEE Wireless Communications, IEEE Networks, ICC, GLOBECOM, and INFOCOM. He initiated the research field of blockchain technology adoption in various verticals in 2017. His H-index is 50. He actively serves his research communities in various roles. His research interests include blockchain technology, wireless sensor networks, fog computing, smart grid, and the IoT. He was a Final Voting Member of the IEEE ComSoc Tactile Internet Committee in 2020. He is a senior member of IEEE; a member of CSI, IAENG, ISTE, and CSTA; and a member of the Technical Committee on Tactile Internet of IEEE Communication Society. He has been awarded the Best Research Paper Awards from IEEE IWCMC 2021, IEEE GLOBECOM 2018, IEEE ICC 2019, and Springer ICRIC 2019. He has served many international conferences as a member of the organizing committee, such as the Publication Chair for FTNCT 2020, ICCIC 2020, and WiMob 2019; a member of the Advisory Board for ICACCT 2021 and ICACI 2020; a Workshop Co-Chair for CIS 2021; and a General Chair for IC4S 2019 and 2020 and ICCSDF 2020. He is also serving the editorial boards of Computer Communications, International Journal of Communication System, and Security and Privacy. He is also leading the ST Research Laboratory, where group members are working on the latest cutting-edge technologies.

Opportunistic Multi-Modal User Authentication for Health-Tracking IoT Wearables

Alexa Muratyan, William Cheung, Sayanton V. Dibbo, and Sudip Vhaduri

1 Introduction

1.1 Motivation

With the explosion of the internet of things (IoT) and the increased popularity of mobile networks [9], individuals’ personal information is further exposed through the internet and IoT-connected wearables and other gadgets. According to a survey of over one thousand internet users conducted in September 2020, one in five users has experienced an online account being compromised, and 70% of users mentioned the burden of remembering more than ten passwords [7]. Knowledge-based authentication methods, such as passwords, PINs, and pattern locks, are therefore not satisfactory for non-stop, seamless IoT authentication, and biometric-based user authentication techniques are preferable for handling these issues. However, most traditional biometrics, such as fingerprints [28], facial images [49], voice [22], breathing patterns [10], keystroke dynamics [24], and gait [16], are difficult, and nearly impossible, to adopt for tiny wearables with limited sensing and computing capabilities. While IoT wearables are helping us with a wide range of services [6], they also have the potential to authenticate a user implicitly and thereby secure the user’s access to other IoT objects and accounts seamlessly and continuously [52].

A. Muratyan · W. Cheung · S. Vhaduri
Purdue University, West Lafayette, IN, USA

S. V. Dibbo
Dartmouth College, Hanover, NH, USA

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Nayyar et al. (eds.), The Fifth International Conference on Safety and Security with IoT, EAI/Springer Innovations in Communication and Computing, https://doi.org/10.1007/978-3-030-94285-4_1


While researchers have relied on various biometrics collected by wearables, such as gait and breathing patterns, these have their limitations [19, 27]. For example, models developed for gait-based authentication do not work when a user is sedentary [40]. Therefore, there is a need for an authentication approach that can work continuously without user input. While various types of behavioral and physiological biometrics are already available in many market wearables, new types of data are becoming available as well [4]. One example is oxygen saturation (SpO2), which is collected continuously by SpO2 sensors and represents the percentage of oxygen-saturated hemoglobin relative to the total amount of hemoglobin in the blood. This personalized data could be valuable for identifying an individual and thereby useful for implicit user authentication. In addition to securing various IoT objects [15, 38–40, 44, 45], implicit wearable user authentication can be applicable to many other IoT-supported services and sectors, e.g., health monitoring [21, 32, 47], well-being tracking [34–36], stress monitoring [30], sleep quality improvement [11, 41], disease monitoring [29, 31, 48], and place discovery [33, 37, 42, 43, 46]. It is therefore essential to develop an implicit wearable user authentication system that can effortlessly validate a user’s identity and secure the user’s cyber-physical space in a non-stop fashion based on the data collected from the wearables.

1.2 Contributions

The main contribution of this work is to include oxygen saturation data in the development of an implicit wearable user authentication mechanism, using oxygen saturation either on its own or in combination with other standard biometrics, such as heart rate, used by other researchers [44]. We first test the usefulness of oxygen saturation data by attempting to distinguish an individual from others using Two-Sample T-tests (Sect. 3.3). We find that 92% of the subject pairs are significantly distinguishable based on their average oxygen saturation, which shows the promise of using oxygen saturation data to develop user authentication models. Next, we develop an authentication model using only oxygen saturation data, followed by a model using oxygen saturation together with heart rate data (Sect. 3.6). When oxygen saturation and heart rate are used together, the average authentication accuracy improves to 0.80, compared with their solo performance (oxygen saturation: 0.69 and heart rate: 0.70) (Sect. 4.5). This shows the potential to develop and deploy a new implicit wearable user authentication scheme using oxygen saturation and heart rate data, which could be further enhanced with additional biometrics.


2 Related Work

Compared to behavioral biometrics, such as gait, physiological biometrics, such as heart rate, are considered among the most readily available biometrics irrespective of a person's physical state, e.g., sedentary or non-sedentary. Fortunately, most market wearables are already equipped with photoplethysmogram (PPG) sensors to collect heart rate biometrics, and researchers have begun to develop various PPG-based authentication models. While models driven solely by PPG-based heart rate data can achieve an authentication success rate of around 0.90 [53], segregating data by activity raises accuracy to around 0.95 [50]. However, activity-segregated models are difficult to build for all possible activities because of the wide range of activities and their skewed distribution throughout the day. In addition to PPG sensors, researchers have also been utilizing electroencephalography [51], electrooculography [25], electrocardiogram [26], and electromyography [20] sensors to collect various types of biometrics from dedicated or prototypical wearables; these sensors are not available in commonly found market wearables. As a result, the ultimate benefits of wearables cannot be fully utilized to develop implicit and continuous IoT authentication. However, the recent inclusion of FDA-approved electrocardiogram (ECG) sensors in Apple, Fitbit, and Samsung smartwatches has brought new opportunities for wearable devices and their implicit user authentication systems [3, 5]. While the ECG sensors are innovative in providing more fine-grained user-specific information to further boost identification models, they require a user to interact with the device, i.e., a user needs to complete an electric circuit to collect electrocardiogram data.
Though experiments demonstrate that newer and more powerful ECG sensors enable heart rate models to achieve around 0.99 accuracy [17], more work is required to adapt them for continuous and seamless authentication systems. While it is relatively easy to develop and deploy single biometric-based user identification models, they usually suffer from low performance [23]. Another flaw of single-biometric models is that once the biometric is compromised, there is nothing the user can do to unlock the device, and the user will not be able to access any information from that device [12]. As a result, multi-biometric models are emerging as the most robust authentication strategy, since combining multiple complementary biometrics can often achieve optimal performance. Researchers have built multi-modal wearable authentication on a hierarchy of heart rate, gait, and breathing biometrics, reaching an F1 score of 0.93 [13, 14]; considering SpO2 as an easily obtainable user-specific signal could further improve the performance of user authentication models. Currently, SpO2 is used in a multi-factor fingerprint authentication system: after the system performs a valid fingerprint check, it checks whether the heart rate and SpO2 levels are at human levels, which prevents some spoof attacks [18]. However, that work does not utilize the capability of SpO2 to uniquely identify an individual, which could improve the performance of multi-biometric user authentication systems.


3 Approach

In this paper, we intend to demonstrate the importance and effectiveness of heart rate (HR) and oxygen saturation (SpO2) data to identify wearable device users with the help of different machine learning models. Before we present the detailed analysis, we first introduce the datasets, pre-processing steps, usefulness of SpO2 data, feature engineering, and methods used in this work.

3.1 Wellue Dataset

We use the Wellue SleepU wrist-worn oxygen monitor to collect oxygen saturation (SpO2) values and heart rate (HR) data. The data is gathered at a rate of one sample every four seconds. We collect data from 25 healthy subjects with average age 37 ± 20.3 years. Each subject wears the device continuously for 8 hours during his/her normal daily activity. The SpO2 and HR data are collected through the device’s finger pulse oximeter sensor [2], stored locally in the wearable, and later transferred to a laptop using a USB connector.

3.2 Data Pre-processing

Due to the device's extended wear time, there are missing entries where the sensor failed to record information. We define those missing entries as invalid data. To ensure our computations are accurate, we first clean the raw data to remove any invalid blocks. Once all invalid data is removed, we segment the oxygen saturation data and its corresponding heart rate data into five different zones based on each subject's demographics and maximum heart rate. Each zone corresponds to a specific level of physical activity, ranging from very light to very intense (maximum) activity, as defined in Table 1. In Sect. 3.3, we perform statistical tests to investigate whether the oxygen saturation data at different heart rate zones can be used to distinguish individuals.

Table 1 Heart rate zones [1]

| HR zone | Range |
|---|---|
| 1 (very light) | 50–60% of max HR of an individual |
| 2 (light) | 60–70% of max HR of an individual |
| 3 (moderate) | 70–80% of max HR of an individual |
| 4 (intense) | 80–90% of max HR of an individual |
| 5 (very intense) | 90–100% of max HR of an individual |

Finally, we segment the continuous stream of heart rate and oxygen saturation data using fixed 40-second non-overlapping windows (i.e., 10-sample windows with one sample recorded every 4 seconds) to compute different types of statistical features for developing user authentication models (Sect. 3.4). For each window of continuous samples, we calculate a representative heart rate zone as the majority vote over the zones of all samples in the window. We use this reference point, i.e., the representative heart rate zone of a window, as an additional feature alongside our sets of statistical features.
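The windowing and majority-vote step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, samples below 50% of the maximum heart rate are labelled zone 0, boundary values are assigned to the higher zone, and the maximum heart rate is taken as a given input.

```python
from collections import Counter


def hr_zone(hr, max_hr):
    """Map one heart rate sample to a zone 1-5 following Table 1.

    Assumption: samples below 50% of max HR get the label 0, and a value
    exactly on a boundary (e.g., 60%) falls into the higher zone.
    """
    pct = 100.0 * hr / max_hr
    if pct < 50:
        return 0
    return min(5, int((pct - 50) // 10) + 1)


def windows(samples, size=10):
    """Split a sequence into consecutive non-overlapping windows.

    A trailing partial window (fewer than `size` samples) is dropped.
    """
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def representative_zone(hr_window, max_hr):
    """Return the majority-voted zone of a window.

    Ties are resolved in favour of the zone that appears first in the window.
    """
    zones = [hr_zone(hr, max_hr) for hr in hr_window]
    return Counter(zones).most_common(1)[0][0]
```

With one sample every 4 seconds, `size=10` corresponds to the 40-second windows used in this section.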

3.3 Usefulness of Blood Oxygen-Level Data

To determine the usefulness of the SpO2 data, we use two-sample t-tests in two ways: comparing one subject with the other 24 subjects, and comparing pairs of subjects. In both cases, we compare the average blood oxygen saturation values of two groups at a specific heart rate zone under the null hypothesis: "average oxygen saturation values of two groups are the same at a specific heart rate zone." In the case of a rejection, we conclude that the average oxygen saturation values of the two groups are different at that specific heart rate zone, i.e., the reference point. Next, we aggregate the results obtained from all possible comparisons and normalize the rejection counts by the total number of comparisons to gauge how well (the higher, the better) oxygen saturation serves as a person identification metric.

In Fig. 1, we present our aggregated findings while comparing one subject to the other 24 subjects using the two-sample t-tests across the five heart rate zones. In the figure, we observe that in the moderate to very intense heart rate zones, we reject the null hypothesis in around 90% of the cases. However, the lightly active heart rate zones have lower rejection percentages. Overall, we obtain an average rejection rate of 82% across all five heart rate zones while comparing one subject with the other 24 subjects at a specific zone. When comparing pairs of subjects and their average oxygen saturation values at a specific heart rate zone, we obtain an average rejection rate of 92%. These high rejection rates show the promise of using oxygen saturation data to distinguish individuals.

Fig. 1 T-test summary while distinguishing one subject from the rest of the 24 subjects based on SpO2 data obtained at a specific heart rate zone
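The pairwise comparison described in this section can be sketched as below for a single heart rate zone. This is an illustrative sketch only: the function name, the dictionary layout, and the significance level of 0.05 are assumptions (the text does not state the level used), and `scipy.stats.ttest_ind` is called with its default equal-variance setting.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def pairwise_rejection_rate(spo2_by_subject, alpha=0.05):
    """Fraction of subject pairs whose mean SpO2 differs significantly.

    `spo2_by_subject` maps a subject id to the SpO2 samples recorded in one
    heart rate zone. `alpha` is an assumed significance level.
    """
    pairs = list(combinations(spo2_by_subject, 2))
    rejected = 0
    for a, b in pairs:
        # Two-sample t-test; null hypothesis: both subjects have the same mean.
        _, p_value = stats.ttest_ind(spo2_by_subject[a], spo2_by_subject[b])
        if p_value < alpha:
            rejected += 1
    # Normalize the rejection count by the total number of comparisons.
    return rejected / len(pairs)
```

Running the function once per heart rate zone and averaging the results would give an aggregate rejection rate of the kind reported above.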

3.4 Feature Computation

We compute the following sets of candidate features.

– Heart rate features: From each 10-sample (i.e., 40-second) window, we compute 21 statistical features: mean, median, standard deviation, variance, coefficient of variation, range, coefficient of range, first quartile or 25th percentile, third quartile or 75th percentile, maximum, interquartile range, coefficient of interquartile, mean absolute deviation, median absolute deviation, energy, power, root mean square, root sum of squares, signal to noise ratio, skewness, and kurtosis.
– Oxygen saturation features: From each 10-sample (i.e., 40-second) window, we compute the same 21 statistical features as listed for the heart rate features.

Thereby, we obtain 21 statistical heart rate features, 21 statistical oxygen saturation features, and the window-level representative heart rate zone, i.e., the reference point, as an additional feature. Therefore, when we develop models from either heart rate or oxygen saturation data, we have 22 features in total, whereas when using heart rate and oxygen saturation together, we have a total of 43 features.
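A subset of the window-level statistics listed above can be computed as in the following sketch. The exact definitions used by the authors are not given in the text, so the formulas for energy (sum of squares), power (mean of squares), and the population standard deviation are assumptions, and the function name is hypothetical.

```python
import numpy as np


def window_features(x):
    """Compute a subset of the statistical features for one window."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    mean = x.mean()
    return {
        "mean": mean,
        "median": float(np.median(x)),
        "std": x.std(),                      # population standard deviation
        "variance": x.var(),
        "range": x.max() - x.min(),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "mean_abs_dev": np.mean(np.abs(x - mean)),
        "energy": np.sum(x ** 2),            # assumed definition
        "power": np.mean(x ** 2),            # assumed definition
        "rms": np.sqrt(np.mean(x ** 2)),
    }
```

Applying the function to the HR samples and to the SpO2 samples of the same window, and appending the representative heart rate zone, yields one feature vector per window.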

3.5 Feature Selection

To identify the most influential features while training the binary classifiers, we use a two-level approach. In the first level, we remove highly correlated features: when a pair of features has a correlation value higher than 0.9, we drop one of them. We use the remaining features, none of which is highly correlated with another, in the second level. In the second level, we try two different approaches using the scikit-learn package: Principal Component Analysis (PCA) and Select K Best (SelectKBest). PCA is a linear transformation algorithm that also serves as a dimensionality-reduction method: it transforms a large set of variables into a smaller one, condensing the feature set into vectors that best represent the data. SelectKBest is a form of univariate feature selection that scores features with univariate statistical tests and removes all but the K highest scoring features. After analyzing both techniques, we find that PCA-based feature selection outperforms SelectKBest-based selection for a fixed feature count. Having chosen PCA, we test for the optimal feature count. We find that 31 is the optimal feature count when using heart rate and oxygen saturation data together, compared to 21 for the models using either heart rate or oxygen saturation data alone (details can be found in Sect. 4.4).

In the case of unary classifiers, the feature selection process is based on a different two-level approach. As with the binary classifiers, the first level applies a correlation check with the same 0.9 filtering threshold. The second level then selects the features with the lowest variance in the training set as the influential features. As with the binary models, we find 31 to be the optimal feature count when using heart rate and oxygen saturation data together, and 21 when using the two types of data separately (details can be found in Sect. 4.4).
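The two-level selection for the binary classifiers can be sketched as follows. This is an illustration under assumptions rather than the authors' code: the greedy order in which correlated features are dropped is not specified in the text, and the projection is written with a plain NumPy SVD instead of the scikit-learn PCA class (the two give the same projection up to the sign of each component). Function names are hypothetical.

```python
import numpy as np


def drop_correlated(X, threshold=0.9):
    """Level 1: keep a feature only if its absolute correlation with every
    previously kept feature does not exceed `threshold`.

    Returns the indices of the kept columns (greedy, left to right).
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return keep


def pca_project(X, n_components):
    """Level 2: project centered data onto its first principal components."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T
```

A typical use would be `Z = pca_project(X[:, drop_correlated(X)], k)`, after which the representative heart rate zone can be appended to `Z` as an extra column.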

3.6 Methods

Based on the combination of the two types of data, i.e., oxygen saturation (SpO2) and heart rate (HR), that we use to develop our authentication approaches, we define the following models:

– Heart rate data-driven model (HR model)
– Oxygen saturation data-driven model (SpO2 model)
– Heart rate and oxygen saturation data-driven model (HRSpO2 model)

While developing the above models, we consider various classifiers, including random forest (RF), k-nearest neighbor (k-NN), naive Bayes (NB), and support vector machine (SVM), with binary and unary schemes. Unlike the binary scheme, the unary scheme is available only for the SVM classifiers, with radial basis function (RBF) and polynomial (Poly.) kernels.

4 User Authentication

Before presenting the detailed evaluation of our models, we first present a list of performance measures, followed by the training-testing set split, hyper-parameter optimization, and selection of an optimal number of features.

4.1 Performance Measures

To evaluate the performance of different modeling approaches, we consider the following measures:

Accuracy (ACC), which is the fraction of predictions that are correct, i.e.,

\[ \mathrm{ACC} = \frac{TP + TN}{TP + FN + FP + TN} \tag{1} \]

Root Mean Square Error (RMSE), which is the square root of the mean of the squared deviations of the predictions from the actual values. For binary labels, it is equivalent to the square root of the misclassification rate, i.e.,

\[ \mathrm{RMSE} = \sqrt{\frac{FP + FN}{TP + FN + FP + TN}} \tag{2} \]

Genuine Rejection Rate (GRR), which is the fraction of invalid users rejected by an authentication system, or one minus the False Acceptance Rate (FAR), i.e.,

\[ \mathrm{GRR} = \frac{TN}{FP + TN} = 1 - \mathrm{FAR} \tag{3} \]

Genuine Acceptance Rate (GAR), which is the fraction of valid users accepted by an authentication system, or one minus the False Rejection Rate (FRR), i.e.,

\[ \mathrm{GAR} = \frac{TP}{TP + FN} = 1 - \mathrm{FRR} \tag{4} \]

F1 Score, which is the measure of performance of an authentication system based on both its precision (positive predictive value) and recall (true positive rate) measures, i.e., their harmonic mean:

\[ F_1\ \mathrm{Score} = 2 \left( \frac{TP + FN}{TP} + \frac{TP + FP}{TP} \right)^{-1} \tag{5} \]

Area Under the Receiver Operating Characteristic Curve (AUC-ROC), which summarizes the graphical relationship between FAR and FRR as the decision threshold changes.

The terms used in Eqs. 1, 2, 3, 4, and 5 (TP, TN, FP, and FN) have their usual meaning in machine learning when classifying a subject using a feature set. A desirable authentication system should have lower negative measures (i.e., RMSE, FAR, and FRR) but higher positive measures (i.e., ACC, F1 score, GRR, GAR, and AUC-ROC) of performance.

Area, which is the collective measure of performance based on the accuracy, genuine rejection rate, genuine acceptance rate, F1 score, and AUC-ROC measures. With each value ranging from zero to one, the area is defined as that of the pentagon formed by connecting the performance values in the binary case, and of the square (excluding AUC-ROC) in the unary case. The area of any classifier is normalized by dividing it by the equivalent area obtained with perfect scores in all measures (see Figs. 4 and 5). This makes classifiers comparable across models, including between binary and unary models.
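The measures of this section can be computed from confusion-matrix counts as in the sketch below. The function names are hypothetical. For the area measure, the sketch assumes that the values are placed on equally spaced axes of a spider plot; the polygon area is then (1/2) sin(2π/n) Σ r_i r_{i+1}, and dividing by the area of the all-ones polygon leaves Σ r_i r_{i+1} / n. The order of the axes affects the result and is an assumption here.

```python
import math


def auth_metrics(tp, fn, fp, tn):
    """Compute ACC, RMSE, GRR, GAR, and F1 score from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "ACC": (tp + tn) / total,
        "RMSE": math.sqrt((fp + fn) / total),
        "GRR": tn / (fp + tn),               # 1 - FAR
        "GAR": tp / (tp + fn),               # 1 - FRR
        "F1": 2 * tp / (2 * tp + fp + fn),   # harmonic mean of precision and recall
    }


def normalized_area(values):
    """Area of the spider-plot polygon divided by the area for perfect scores.

    `values` lists the measures in axis order (assumed order); the polygon is
    closed by pairing the last value with the first.
    """
    n = len(values)
    return sum(values[i] * values[(i + 1) % n] for i in range(n)) / n
```

For a binary model, `values` would hold five measures (ACC, GRR, GAR, F1 score, AUC-ROC); for a unary model, the four measures without AUC-ROC.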

4.2 Training-Testing Set

In our binary model, we try to distinguish a valid user from the imposters. During the training-testing procedure, we follow the one-valid-user strategy: we train and test N unique models one by one, where in each iteration one of the N subjects is treated as the valid user. During each round of training-testing, we use a 90–10% train-test split, with the imposter set composed of equal shares of the other N − 1 subjects' data. The split is sequential, which means that the test data is the last 10% of the data collected. This best simulates a real use case, in which the model receives new data after training. Each subject is used once as the valid user, giving 25 iterations of training and testing in total.
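The one-valid-user strategy with a sequential split can be sketched as follows. This is a simplified illustration: the function names are hypothetical, and truncating every imposter to the length of the shortest imposter recording is only one possible reading of "equal composition".

```python
def sequential_split(samples, train_fraction=0.9):
    """Split in time order: the first part trains, the last part tests."""
    cut = round(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


def one_valid_user_sets(windows_by_subject, valid_user, train_fraction=0.9):
    """Build labelled train/test sets for one valid user (label 1)
    against all other subjects (label 0)."""
    valid_train, valid_test = sequential_split(
        windows_by_subject[valid_user], train_fraction)

    imposters = [s for s in windows_by_subject if s != valid_user]
    # Assumption: every imposter contributes the same number of windows.
    per_imposter = min(len(windows_by_subject[s]) for s in imposters)

    imp_train, imp_test = [], []
    for s in imposters:
        tr, te = sequential_split(
            windows_by_subject[s][:per_imposter], train_fraction)
        imp_train += tr
        imp_test += te

    train = [(w, 1) for w in valid_train] + [(w, 0) for w in imp_train]
    test = [(w, 1) for w in valid_test] + [(w, 0) for w in imp_test]
    return train, test
```

Calling `one_valid_user_sets` once per subject reproduces the loop over N valid users described above.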

4.3 Hyper-Parameter Optimization

We use the grid search utility of the scikit-learn library to find the most favorable hyper-parameter sets. The hyper-parameter optimization is performed separately for each iteration and tested over various ranges of values. The different iterations resulted in similar hyper-parameter values, which are presented in Tables 2, 3, and 4.
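A grid search of this kind can be sketched with scikit-learn's `GridSearchCV`. The candidate values, the number of folds, the scoring metric, and the synthetic data below are all assumptions made for illustration; the text does not list the ranges that were actually searched.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for one valid-user-versus-imposters training set.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Hypothetical candidate values for the random forest.
param_grid = {"n_estimators": [10, 50], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,                  # assumed number of cross-validation folds
    scoring="accuracy",    # assumed selection metric
)
search.fit(X, y)

best_params = search.best_params_   # best combination found on this data
best_score = search.best_score_     # its mean cross-validated accuracy
```

Repeating the search for each valid user and comparing the resulting `best_params` is one way to check whether the iterations agree on similar values.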

4.4 Optimal Feature Count Determination

Before the feature selection process for the unary and binary models is executed, we first test the models with various feature counts. It is necessary to identify the most suitable feature count to avoid overfitting and underfitting. Overfitting occurs when the model learns too much detail from the training data, so that its predictions on new data become unreliable. Underfitting is the opposite: the model learns too little from the training data. To find the optimal count, we run the top performing binary model, RF, and the top performing unary model, SVM RBF, with feature counts of 11, 21, 31, and 41. These specific counts result from adding the HR zone feature to sets of 10, 20, 30, and 40 selected features. Figure 2a shows the preliminary analysis of the performance of the binary HRSpO2 model across these feature counts. There is a considerable jump of eight percentage points in accuracy from 21 to 31 features, indicating increased learning. However, from 31 to 41 features there is not much gain in performance, which suggests that the model might start to overfit the data. Figure 2b shows the same analysis for the unary models, which leads to similar conclusions. For both the unary and binary models, 31 features is the optimal count. The single biometric models do not have 31 features, so 21 are used.

Table 2 HR models with average and standard deviation of performance measures

| Scheme | Classifier (parameters) | FC | ACC | RMSE | GRR | GAR | F1 score | AUC-ROC | Area |
|---|---|---|---|---|---|---|---|---|---|
| Binary | RF (n estimators = 50) | 21 | 0.70 (0.14) | 0.04 (0.01) | 0.79 (0.21) | 0.68 (0.16) | 0.68 (0.14) | 0.82 (0.10) | 0.52 |
| Binary | k-NN (k = 5, Minkowski distance) | 21 | 0.69 (0.14) | 0.04 (0.01) | 0.71 (0.21) | 0.67 (0.12) | 0.68 (0.13) | 0.69 (0.14) | 0.48 |
| Binary | NB | 21 | 0.64 (0.12) | 0.04 (0.01) | 0.79 (0.18) | 0.40 (0.23) | 0.48 (0.20) | 0.64 (0.12) | 0.38 |
| Binary | SVM (RBF kernel, γ = 0.05, C = 5) | 21 | 0.70 (0.14) | 0.04 (0.01) | 0.79 (0.17) | 0.61 (0.16) | 0.66 (0.16) | 0.69 (0.15) | 0.48 |
| Binary | SVM (Poly. kernel, d = 3, C = 12) | 21 | 0.67 (0.10) | 0.04 (0.01) | 0.80 (0.15) | 0.51 (0.19) | 0.59 (0.15) | 0.67 (0.10) | 0.43 |
| Unary | SVM (RBF kernel, γ = 0.05, nu = 0.5) | 21 | 0.58 (0.10) | 0.05 (0.01) | 0.72 (0.24) | 0.42 (0.24) | 0.31 (0.18) | N/A | 0.28 |
| Unary | SVM (Poly. kernel, d = 1, nu = 0.5) | 21 | 0.51 (0.07) | 0.05 (0.01) | 0.67 (0.28) | 0.32 (0.25) | 0.29 (0.17) | N/A | 0.22 |

Fig. 2 Performance over feature count. (a) Binary RF. (b) Unary SVM RBF

4.5 Authentication Model Evaluation

In Tables 2, 3, and 4, we present the performance of the models using different biometric combinations and various classifiers. Heart rate data is commonly used in authentication models, so we use it as the base metric for authenticating the user. Starting with the heart rate data-driven model (HR model), presented in Table 2, we observe that the best classifier for the binary HR model is RF, which provides an average ACC of 0.70 and an AUC-ROC of 0.82. The unary classifiers of the HR model perform worse than the binary ones, with SVM RBF achieving an average ACC of 0.58. This is expected, as the unary model is not exposed to as much imposter data during training as the binary model is. To improve the performance of the HR model and thus obtain the most accurate user authentication, we add the oxygen saturation (SpO2) data to the model. Combining the two types of data creates a heart rate and oxygen saturation data-driven model (HRSpO2 model).

In Table 3, we observe that the top performing binary classifier of the SpO2 model, RF, performs well, but on some measures it is worse than the binary RF classifier of the HR model (Table 2). All percentages in this section are relative differences between the average values reported in the tables. Compared to the HR model, the SpO2 model is 1% less accurate, its GRR is 11% lower, and its AUC-ROC is 23% lower. However, its GAR is 3% higher and its F1 score is 1.5% higher than those of the HR model. As for the unary classifiers, both models perform best with SVM RBF rather than SVM Poly. Again, the HR model outperforms the SpO2 model, with 23% higher accuracy, 44% higher GRR, and 13% higher GAR. Although the SpO2 data may seem less useful, as it is less accurate than HR on its own, it complements the HR data, so that the combined heart rate and oxygen saturation data-driven model performs the best.

Table 3 SpO2 models with average and standard deviation of performance measures

| Scheme | Classifier (parameters) | FC | ACC | RMSE | GRR | GAR | F1 score | AUC-ROC | Area |
|---|---|---|---|---|---|---|---|---|---|
| Binary | RF (n estimators = 50) | 21 | 0.69 (0.11) | 0.04 (0.01) | 0.70 (0.25) | 0.70 (0.19) | 0.69 (0.12) | 0.63 (0.11) | 0.44 |
| Binary | k-NN (k = 5, Minkowski distance) | 21 | 0.64 (0.10) | 0.04 (0.01) | 0.61 (0.18) | 0.67 (0.16) | 0.64 (0.12) | 0.61 (0.10) | 0.37 |
| Binary | NB | 21 | 0.61 (0.08) | 0.04 (0.01) | 0.73 (0.19) | 0.50 (0.16) | 0.55 (0.12) | 0.61 (0.08) | 0.36 |
| Binary | SVM (RBF kernel, γ = 0.05, C = 5) | 21 | 0.66 (0.10) | 0.04 (0.01) | 0.67 (0.18) | 0.64 (0.10) | 0.65 (0.07) | 0.63 (0.08) | 0.40 |
| Binary | SVM (Poly. kernel, d = 3, C = 14) | 21 | 0.61 (0.08) | 0.04 (0.01) | 0.74 (0.17) | 0.49 (0.21) | 0.55 (0.15) | 0.61 (0.08) | 0.36 |
| Unary | SVM (RBF kernel, γ = 0.05, nu = 0.5) | 21 | 0.47 (0.08) | 0.02 (0.00) | 0.50 (0.27) | 0.37 (0.25) | 0.42 (0.16) | N/A | 0.22 |
| Unary | SVM (Poly. kernel, d = 2, nu = 0.05) | 21 | 0.47 (0.09) | 0.02 (0.00) | 0.49 (0.32) | 0.37 (0.27) | 0.36 (0.15) | N/A | 0.18 |

In Table 4, we observe that adding the oxygen saturation data to the existing heart rate data improves the performance measures considerably. In the case of the best binary classifier (RF), the HRSpO2 model is 15% more accurate and has a 13% higher F1 score than the best SpO2 model. Similarly, the best HRSpO2 model is 14% more accurate and has a 15% higher F1 score, 13% higher GAR, and 17% higher area than the best HR model. These are substantial improvements over the best HR model. Similar to the binary case, the unary HRSpO2 model shows promise over the unary HR model: the SVM RBF classifier has a 9% increase in ACC, a 21% increase in GAR, and a 71% rise in F1 score. These, too, are substantial gains over the unary HR classifiers.

Table 4 HRSpO2 models with average and standard deviation of performance measures

| Scheme | Classifier (parameters) | FC | ACC | RMSE | GRR | GAR | F1 score | AUC-ROC | Area |
|---|---|---|---|---|---|---|---|---|---|
| Binary | RF (n estimators = 50) | 31 | 0.80 (0.09) | 0.04 (0.01) | 0.79 (0.14) | 0.77 (0.12) | 0.78 (0.09) | 0.78 (0.10) | 0.61 |
| Binary | k-NN (k = 2, Minkowski distance) | 31 | 0.75 (0.01) | 0.04 (0.01) | 0.73 (0.17) | 0.68 (0.11) | 0.71 (0.09) | 0.69 (0.11) | 0.49 |
| Binary | NB | 31 | 0.66 (0.07) | 0.05 (0.01) | 0.84 (0.16) | 0.42 (0.16) | 0.49 (0.12) | 0.61 (0.08) | 0.38 |
| Binary | SVM (RBF kernel, γ = 0.08, C = 3) | 31 | 0.72 (0.13) | 0.04 (0.01) | 0.74 (0.17) | 0.71 (0.14) | 0.72 (0.13) | 0.71 (0.13) | 0.52 |
| Binary | SVM (Poly. kernel, d = 4, C = 16) | 31 | 0.69 (0.10) | 0.04 (0.01) | 0.77 (0.16) | 0.62 (0.13) | 0.67 (0.11) | 0.69 (0.11) | 0.50 |
| Unary | SVM (RBF kernel, γ = 0.05, nu = 0.5) | 31 | 0.63 (0.08) | 0.05 (0.01) | 0.69 (0.27) | 0.51 (0.25) | 0.53 (0.16) | N/A | 0.37 |
| Unary | SVM (Poly. kernel, d = 1, nu = 0.75) | 31 | 0.45 (0.10) | 0.06 (0.01) | 0.58 (0.32) | 0.28 (0.22) | 0.29 (0.17) | N/A | 0.19 |

In Fig. 3a, we present box plots (five-number summaries) of the different performance measures of the binary RF model. The median of each box plot is very close to its average, which indicates that there are not many outliers in the data. Additionally, we obtain tight interquartile ranges of about 0.15 for each of the performance measures, which reflects the consistency of the outcome. In the case of the unary SVM RBF model, shown in Fig. 3b, the interquartile ranges differ across performance measures: ACC has a narrow interquartile range of 0.09, while GRR, GAR, and F1 score have ranges of about 0.2.

Fig. 3 Boxplots of different performance measures of the best binary and unary models. (a) HRSpO2 model with binary RF classifier. (b) HRSpO2 model with unary SVM RBF classifier

The performance measures are also presented in Figs. 4 and 5 as spider plots of the binary and unary models. These plots make it possible to visually assess the trade-offs among the measures when choosing one algorithm over another in each model. In Tables 5, 6, and 7, we summarize the GRR, GAR, and Area values used in the plots. GAR and GRR carry a direct meaning in security, as they represent the strength of the authentication system, and are therefore viewed as key measures. More specifically, GAR measures how accurately valid users are identified as valid, and GRR measures how accurately invalid users are identified as invalid. Although we choose area as the overall metric to select the best model, in certain cases GAR may hold higher significance.

Fig. 4 Spider plot of the five binary model measures of performance

Fig. 5 Spider plot of the three unary model measures of performance

In Table 5, we can see that RF has the best overall performance, with a minimal relative loss of 1.25% in GRR with respect to the best value (0.80, obtained by SVM Poly.). However, RF makes up for it in terms of usability, with a GAR that is 0.17 points higher than that of SVM Poly. and 0.28 points higher than that of NB. The oxygen saturation classifiers show a similar story in Table 6: RF is the top classifier, with a relative loss of 5.41% in GRR and no relative loss for the other measures. Once again, in the HRSpO2 model, RF outperforms the other classifiers, as seen in Table 7. This confirms the strength of the RF classifier and demonstrates the consistency between the models.

Table 5 Relative performance-loss of HR models for a particular performance measure with respect to the best value of that measure. Negative signs are used to indicate a loss

| Scheme | Classifier (parameters) | GRR | GAR | Area |
|---|---|---|---|---|
| Binary | RF (n estimators = 150) | 0.79 (−1.25%) | 0.68 (0.00%) | 0.52 (0.00%) |
| Binary | k-NN (k = 5, Minkowski distance) | 0.71 (−11.25%) | 0.67 (−1.47%) | 0.48 (−7.69%) |
| Binary | NB | 0.79 (−1.25%) | 0.40 (−41.18%) | 0.38 (−26.92%) |
| Binary | SVM (RBF kernel, γ = 0.05, C = 5) | 0.79 (−1.25%) | 0.61 (−10.29%) | 0.48 (−7.69%) |
| Binary | SVM (Poly. kernel, d = 3, C = 14) | 0.80 (0.00%) | 0.51 (−25.0%) | 0.43 (−17.31%) |
| Unary | SVM (RBF kernel, γ = 0.05, nu = 0.5) | 0.72 (0.00%) | 0.42 (0.00%) | 0.28 (0.00%) |
| Unary | SVM (Poly. kernel, d = 1, nu = 0.5) | 0.67 (−6.94%) | 0.32 (−23.81%) | 0.22 (−21.43%) |

Table 6 Relative performance-loss of SpO2 models for a particular performance measure with respect to the best value of that measure. Negative signs are used to indicate a loss

| Scheme | Classifier (parameters) | GRR | GAR | Area |
|---|---|---|---|---|
| Binary | RF (n estimators = 50) | 0.70 (−5.41%) | 0.70 (0.00%) | 0.44 (0.00%) |
| Binary | k-NN (k = 9, Minkowski distance) | 0.61 (−17.57%) | 0.67 (−4.29%) | 0.37 (−15.91%) |
| Binary | NB | 0.73 (−1.35%) | 0.50 (−28.57%) | 0.36 (−18.18%) |
| Binary | SVM (RBF kernel, γ = 0.05, C = 5) | 0.67 (−9.46%) | 0.64 (−8.57%) | 0.40 (−9.09%) |
| Binary | SVM (Poly. kernel, d = 3, C = 14) | 0.74 (0.00%) | 0.49 (−30.0%) | 0.36 (−18.18%) |
| Unary | SVM (RBF kernel, γ = 0.05, nu = 0.5) | 0.50 (0.00%) | 0.37 (0.00%) | 0.22 (0.00%) |
| Unary | SVM (Poly. kernel, d = 1, nu = 0.5) | 0.49 (−2.00%) | 0.37 (0.00%) | 0.18 (−18.18%) |

Table 7 Relative performance-loss of HRSpO2 models for a particular performance measure with respect to the best value of that measure. Negative signs are used to indicate a loss

| Scheme | Classifier (parameters) | GRR | GAR | Area |
|---|---|---|---|---|
| Binary | RF (n estimators = 50) | 0.79 (−5.95%) | 0.77 (0.00%) | 0.61 (0.00%) |
| Binary | k-NN (k = 2, Minkowski distance) | 0.73 (−13.10%) | 0.68 (−11.69%) | 0.49 (−19.67%) |
| Binary | NB | 0.84 (0.00%) | 0.42 (−45.45%) | 0.38 (−37.70%) |
| Binary | SVM (RBF kernel, γ = 0.05, C = 4) | 0.74 (−11.90%) | 0.71 (−7.79%) | 0.52 (−14.75%) |
| Binary | SVM (Poly. kernel, d = 3, C = 16) | 0.77 (−8.33%) | 0.62 (−19.48%) | 0.50 (−18.03%) |
| Unary | SVM (RBF kernel, γ = 0.05, nu = 0.5) | 0.69 (0.00%) | 0.51 (0.00%) | 0.37 (0.00%) |
| Unary | SVM (Poly. kernel, d = 1, nu = 0.75) | 0.58 (−15.94%) | 0.28 (−45.10%) | 0.19 (−48.65%) |

Figures 4 and 5 give a graphical representation of all these trade-offs. Even at a quick glance, it is evident that the classifiers of the binary HRSpO2 model (Fig. 4) cover most of the area of the plot, which indicates that it is the optimal model. Within this model, RF is the natural choice, since it is the top performing classifier and has no relative loss in GAR or area. Turning to the unary plots (Fig. 5), we observe that the SVM RBF classifier of the HRSpO2 model covers the greatest area. The tables and spider plots also show that the binary classifiers perform almost twice as well as the unary classifiers.
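The relative performance-loss values in Tables 5, 6, and 7 follow from a simple calculation, sketched here with the binary GAR values of the HR models as input. The function name and the dictionary keys are hypothetical labels chosen for this illustration.

```python
def relative_loss(values):
    """Percentage loss of each classifier with respect to the best value
    of the same measure (0.0 for the best classifier, negative otherwise)."""
    best = max(values.values())
    return {name: round(100.0 * (v - best) / best, 2)
            for name, v in values.items()}


# Binary GAR values of the HR models, as reported in Table 5.
gar_hr = {"RF": 0.68, "k-NN": 0.67, "NB": 0.40, "SVM RBF": 0.61, "SVM Poly": 0.51}
gar_loss = relative_loss(gar_hr)
```

Here `gar_loss["NB"]` evaluates to about −41.18, matching the entry in Table 5. The same function applies to the GRR and Area columns and to the unary rows, which are compared only among themselves.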

5 Conclusion and Future Work

To the best of our knowledge, this is the first attempt to use blood oxygen saturation values to identify an individual while developing an implicit and continuous wearable device user authentication system. From our detailed analysis, we observe that oxygen saturation alone can provide an average authentication accuracy of around 0.69 using a random forest (RF) classification model. However, when it is combined with heart rate, we obtain an authentication accuracy of around 0.80, i.e., a 15% improvement. We also find that heart rate alone provides an average authentication accuracy of 0.70, which is lower than that of the combined metrics. This shows the promise of a multi-modal approach that authenticates a wearable device user implicitly and continuously using the traditional heart rate biometric along with opportunistic oxygen saturation data.

As IoT technology continues to grow, more sensors are becoming available in common market wearables, such as accelerometers and photoplethysmogram (PPG) sensors [8]. SpO2 can therefore be used with other standard biometrics to increase the robustness and accuracy of multi-modal biometric authentication systems. As mentioned before, some biometrics, such as gait and breathing sounds, are not always available. In comparison, oxygen saturation is constantly available and provides a foundation for a multi-biometric scheme. However, this requires further careful long-term study with diverse sets of subjects.

We will also develop a multi-device authentication scheme, which offers an opportunity to build a more robust authentication scheme given the increased popularity of IoT-based interconnected environments [44]. For example, an authentication mechanism can be developed by fusing Wellue data with similar or different types of data obtained from other wearables, such as Fitbit. In the future, we will investigate this multi-device implicit wearable user authentication scheme.

References

1. Heart Rate Zones | The Basics. https://www.polar.com/blog/running-heart-rate-zones-basics/ (2019). Accessed September 2019
2. Wellue SleepU. https://getwellue.com/pages/sleepu-oxygen-monitor/ (2019). Accessed September 2019
3. 8 best ecg smartwatches & devices of 2020. https://rb.gy/suexlz (2020). Accessed November 2020
4. FDA regulations block usage of a feature in apple watches that would help millions of users monitor their blood-oxygen levels. https://rb.gy/rwsitw (2020). Accessed November 2020
5. Fitbit’s ECG app gets FDA nod to track heart rhythm irregularities. https://rb.gy/zndx18 (2020). Accessed November 2020
6. Global IoT-enabled industrial wearables market, 2020–2024. https://rb.gy/pkmt1c (2020). Accessed November 2020
7. Uncovering password habits: Are users’ password security habits improving? https://rb.gy/viu8p5 (2020). Accessed November 2020
8. Quite simply the most advanced PPG sensors for wearables & hearables available today. https://valencell.com/ppgsensors/ (2021). Accessed January 2021
9. Al Amin, M.T., Barua, S., Vhaduri, S., Rahman, A.: Load aware broadcast in mobile ad hoc networks. In: 2009 IEEE International Conference on Communications, pp. 1–5. IEEE, Piscataway (2009)
10. Chauhan, J., Seneviratne, S., Hu, Y., Misra, A., Seneviratne, A., Lee, Y.: Breathing-based authentication on resource-constrained IoT devices using recurrent neural networks. Computer 51(5), 60–67 (2018)
11. Chen, C.Y., Vhaduri, S., Poellabauer, C.: Estimating sleep duration from temporal factors, daily activities, and smartphone use. In: 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC), pp. 545–554. IEEE, Piscataway (2020)
12. Chen, Y., Yang, Z., Abbou, R., Lopes, P., Zhao, B.Y., Zheng, H.: User authentication via electrical muscle stimulation. In: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (2021)
13. Cheung, W., Vhaduri, S.: Context-dependent implicit authentication for wearable device users. In: IEEE Personal, Indoor and Mobile Radio Communications (PIMRC) (2020)
14. Cheung, W., Vhaduri, S.: Continuous authentication of wearable device users from heart rate, gait, and breathing data. In: IEEE RAS & EMBS International Conference on Biomedical Robotics and Biomechatronics (BioRob) (2020)
15. Dibbo, S.V., Vhaduri, S., Cheung, W.: On-phone CNN model-based implicit authentication to secure IoT wearables. In: the 5th EAI International Conference on Safety and Security with IoT (SaSeIoT 2021) (2023)
16. Gafurov, D., Snekkenes, E., Bours, P.: Gait authentication and identification using wearable accelerometer sensor. In: 2007 IEEE Workshop on Automatic Identification Advanced Technologies, pp. 220–225. IEEE, Piscataway (2007)
17. Hammad, M., Pławiak, P., Wang, K., Acharya, U.R.: ResNet-attention model for human authentication using ECG signals. Expert Syst. 38, e12547 (2020)
18. Henderson, L.: Multi-Factor Authentication Fingerprinting Device Using Biometrics. Ph.D. Thesis, Villanova University (2019)
19. Khan, H., Atwater, A., Hengartner, U.: Itus: an implicit authentication framework for android. In: Proceedings of the 20th Annual International Conference on Mobile Computing and Networking, pp. 507–518. ACM, New York (2014)
20. Kim, J.S., Pan, S.B., et al.: A study on EMG-based biometrics. J. Int. Serv. Inf. Secur. 7(2), 19–31 (2017)
21. Kim, Y., Vhaduri, S., Poellabauer, C.: Understanding college students’ phone call behaviors towards a sustainable mobile health and wellbeing solution. In: International Conference on Systems Engineering (CIIS) (2020)

22. Kounoudes, A., Kekatos, V., Mavromoustakos, S.: Voice biometric authentication for enhancing internet service security. In: 2006 2nd International Conference on Information & Communication Technologies, vol. 1, pp. 1020–1025. IEEE, Piscataway (2006)
23. Lai, L., Ho, S.W., Poor, H.V.: Privacy–security trade-offs in biometric security systems, part I: Single use case. IEEE Trans. Inf. Foren. Sec. 6(1), 122–139 (2010)
24. Monrose, F., Rubin, A.: Authentication via keystroke dynamics. In: Proceedings of the 4th ACM Conference on Computer and Communications Security, pp. 48–56 (1997)
25. Pal, A., Gautam, A.K., Singh, Y.N.: Evaluation of bioelectric signals for human recognition. Procedia Comp. Sci. 48, 746–752 (2015)
26. Singh, Y.N., Singh, S.K.: Evaluation of electrocardiogram for biometric authentication (2011)
27. Sun, F., Mao, C., Fan, X., Li, Y.: Accelerometer-based speed-adaptive gait authentication method for wearable IoT devices. IEEE Int. Things J. 6(1), 820–830 (2018)
28. Tan, T.N., Lee, H.: High-secure fingerprint authentication system using ring-lwe cryptography. IEEE Access 7, 23379–23387 (2019)
29. Vhaduri, S.: Nocturnal cough and snore detection using smartphones in presence of multiple background-noises. In: Proceedings of the 3rd ACM SIGCAS Conference on Computing and Sustainable Societies, pp. 174–186 (2020)
30. Vhaduri, S., Ali, A., Sharmin, M., Hovsepian, K., Kumar, S.: Estimating drivers’ stress from GPS traces. In: Proceedings of the 6th International Conference on Automotive User Interfaces and Interactive Vehicular Applications, pp. 1–8 (2014)
31. Vhaduri, S., Brunschwiler, T.: Towards automatic cough and snore detection. In: 2019 IEEE International Conference on Healthcare Informatics (ICHI), pp. 1–1. IEEE, Piscataway (2019)
32. Vhaduri, S., Munch, A., Poellabauer, C.: Assessing health trends of college students using smartphones. In: 2016 IEEE Healthcare Innovation Point-Of-Care Technologies Conference (HI-POCT), pp. 70–73. IEEE, Piscataway (2016)
33. Vhaduri, S., Poellabauer, C.: Cooperative discovery of personal places from location traces. In: 25th International Conference on Computer Communication and Networks (ICCCN). IEEE, Piscataway (2016)
34. Vhaduri, S., Poellabauer, C.: Design and implementation of a remotely configurable and manageable well-being study. In: Smart City 360, pp. 179–191. Springer, Berlin (2016)
35. Vhaduri, S., Poellabauer, C.: Human factors in the design of longitudinal smartphone-based wellness surveys. In: 2016 IEEE International Conference on Healthcare Informatics (ICHI), pp. 156–167. IEEE, Piscataway (2016)
36. Vhaduri, S., Poellabauer, C.: Design factors of longitudinal smartphone-based health surveys. J. Healthcare Inf. Res. 1(1), 52–91 (2017)
37. Vhaduri, S., Poellabauer, C.: Hierarchical cooperative discovery of personal places from location traces. IEEE Trans. Mobile Comput. 17(8), 1865–1878 (2017)
38. Vhaduri, S., Poellabauer, C.: Towards reliable wearable-user identification. In: 2017 IEEE International Conference on Healthcare Informatics (ICHI), pp. 329–329. IEEE Computer Society, Washington (2017)
39. Vhaduri, S., Poellabauer, C.: Wearable device user authentication using physiological and behavioral metrics. In: IEEE 28th Annual International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC) (2017)
40. Vhaduri, S., Poellabauer, C.: Biometric-based wearable user authentication during sedentary and non-sedentary periods. In: International Workshop on Security and Privacy for the Internet-of-Things (2018)
41. Vhaduri, S., Poellabauer, C.: Impact of different pre-sleep phone use patterns on sleep quality. In: IEEE 15th International Conference on Wearable and Implantable Body Sensor Networks (BSN) (2018)
42. Vhaduri, S., Poellabauer, C.: Opportunistic discovery of personal places using multi-source sensor data. IEEE Transactions on Big Data 7(2), 383–396 (2018)
43. Vhaduri, S., Poellabauer, C.: Opportunistic discovery of personal places using smartphone and fitness tracker data. In: 2018 IEEE International Conference on Healthcare Informatics (ICHI), pp. 103–114. IEEE, Piscataway (2018)

18

A. Muratyan et al.

44. Vhaduri, S., Poellabauer, C.: Multi-modal biometric-based implicit authentication of wearable device users. IEEE Trans. Inf. Forens. Sec. 14(12), 3116–3125 (2019) 45. Vhaduri, S., Poellabauer, C.: Summary: Multi-modal biometric-based implicit authentication of wearable device users (2019). Preprint arXiv:1907.06563 46. Vhaduri, S., Poellabauer, C., Striegel, A., Lizardo, O., Hachen, D.: Discovering places of interest using sensor data from smartphones and wearables. In: 2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing (UIC) (2017) 47. Vhaduri, S., Prioleau, T.: Adherence to personal health devices: A case study in diabetes management. In: EAI International Conference on Pervasive Computing Technologies for Healthcare (EAI PervasiveHealth) (2020) 48. Vhaduri, S., Van Kessel, T., Ko, B., Wood, D., Wang, S., Brunschwiler, T.: Nocturnal cough and snore detection in noisy environments using smartphone-microphones. In: 2019 IEEE International Conference on Healthcare Informatics (ICHI), pp. 1–7. IEEE, Piscataway (2019) 49. Wang, Y., Plataniotis, K.: Face based biometric authentication with changeable and privacy preservable templates. In: 2007 Biometrics Symposium, pp. 1–6. IEEE, Piscataway (2007) 50. Yadav, U., Abbas, S.N., Hatzinakos, D.: Evaluation of PPG biometrics for authentication in different states. In: 2018 International Conference on Biometrics (ICB), pp. 277–282. IEEE, Piscataway (2018) 51. Yang, S., Deravi, F.: On the usability of electroencephalographic signals for biometric recognition: a survey. IEEE Trans. Human-Machine Syst. 47(6), 958–969 (2017) 52. Yang, Y., Sun, J.S., Zhang, C., Li, P.: Retraining and dynamic privilege for implicit authentication systems. In: 2015 IEEE 12th International Conference on Mobile Ad Hoc and Sensor Systems, pp. 163–171. IEEE, Piscataway (2015) 53. Zhao, T., Wang, Y., Liu, J., Chen, Y.: Your heart won’t lie: PPG-based continuous authentication on wrist-worn wearable devices. 
In: Proceedings of the 24th Annual International Conference on Mobile Computing and Networking, pp. 783–785 (2018)

On-Phone CNN Model-Based Implicit Authentication to Secure IoT Wearables

Sayanton V. Dibbo, William Cheung, and Sudip Vhaduri

S. V. Dibbo: Dartmouth College, Hanover, NH, USA
W. Cheung · S. Vhaduri: Purdue University, West Lafayette, IN, USA

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Nayyar et al. (eds.), The Fifth International Conference on Safety and Security with IoT, EAI/Springer Innovations in Communication and Computing, https://doi.org/10.1007/978-3-030-94285-4_2

1 Introduction

1.1 Motivation

As the Internet of Things (IoT) and mobile networks [14] grow in popularity, more physical objects can be accessed remotely. From 2000 to 2016, the number of internet users grew more than eightfold (from 413 million to 3.4 billion [8]). A similar rise in smart technology has followed suit. Products ranging from smart glasses and smartwatches to smart cars are being developed. Analysts predict that from 2016 to 2022, there will be a 73% increase in smart wearable production and a 78% increase in sales [1]. However, IoT also increases the availability of a user's digital footprint. While knowledge-based authentication approaches, such as passwords, can be easy to implement, they also bring additional challenges, such as the hassle of memorizing. This hassle often leads users to choose weak passwords, about 80% of the time according to a McAfee survey [9]. In contrast, the digital footprints that IoT-connected devices collect continuously can be used to authenticate and secure individuals in an implicit and user-friendly manner.

To develop an implicit and continuous authentication mechanism for IoT-connected devices, it is crucial to make trade-offs among various parameters/constraints, such as the types of data, the granularity of data, and the computation power of the device, among several others. Researchers have therefore been using various popular biometrics, such as gait, gesture, keystroke, voice, faces, palm prints, fingerprints, finger veins, electrocardiogram, or heart rate, to authenticate a user [31]. However, not all IoT devices have the same capability. Therefore, authentication mechanisms developed for different devices, biometrics, and application scenarios may not apply to all types of devices. For example, while a gait-based authentication system can achieve an accuracy of 0.92 [32], this approach fails when a user is sedentary. Similarly, while key-phrase-based voice authentication approaches are highly accurate, they burden users with remembering the key-phrase and require the user to speak [23].

It is therefore effective to develop an on-device implicit user authentication scheme that utilizes the sensing and computation capabilities of mobile devices, such as smartphones, together with IoT connectivity. Such an implicit on-phone user authentication will not only secure various IoT objects [20, 30, 42–44, 48, 49] but can also be applied to many other IoT-supported services/sectors, e.g., health monitoring [24, 36, 51], well-being tracking [38–40], stress monitoring [34], sleep quality improvement [19, 45], disease monitoring [33, 35, 52], and place discovery [37, 41, 46, 47, 50]. Thus, it is extremely important to develop an implicit on-phone user authentication system that can seamlessly validate a wearable device user's identity and continuously secure the user's wearable and, hence, the user's cyber-physical space.

1.2 Contributions

In this paper, we present an implicit approach to identify a wearable device user through an on-phone convolutional neural network (CNN) model, which utilizes the user's breathing patterns captured by the smartphone microphone. Compared to PIN/password or other one-time biometric authentications, such as face recognition, our implicit authentication verifies a user from the user's breathing patterns whenever the phone captures them, so it does not require active user interaction.

Due to the limited sensing and computation capabilities of wearables, e.g., smartwatches, we implement our authentication system leveraging the smartphone's audio sensing, signal processing, and computation capabilities (i.e., the ability to deploy on-phone deep neural network models using the TensorFlow Lite framework, discussed in Sect. 3.1). The core piece of this implicit wearable user authentication system is a smartphone application, TFL Auth, which tests a user's breathing patterns using a CNN model and then sends the acceptance/rejection decision to the target wearable through IoT-connected phone, server, and wearable communication. The CNN model is trained on the server but deployed on the phone to perform real-time user identification (authentication), ensuring that no raw or processed audio data or features leave the phone. This approach thereby overcomes the issues and challenges associated with server-based user authentication schemes, i.e., approaches where models are deployed on servers to authenticate a user.

For our model development, we collect data from 10 subjects (Sect. 3.2), apply three types of augmentation to create real-world effects with various background noises and phone-mouth distance variation (Sect. 3.3), and compute an extended set of audio features (Sect. 3.4). Next, we perform an extensive search to find the optimal sets of parameters for different CNN models (Sect. 4.3). Using a leave-one-event-out validation approach, we achieve an average accuracy of 0.92 ± 0.01, an average genuine rejection rate of 0.93 ± 0.01, an average genuine acceptance rate of 0.90 ± 0.01, and an average F1 score of 0.92 ± 0.01 using the Mel-frequency cepstral coefficients (MFCC) feature set (Sect. 4.4). This smartphone-based implicit wearable authentication system can be extended to other smart wearables, such as smart glasses and headphones.

2 Related Work

Biometric authentication techniques are continuously evolving. Earlier biometric schemes primarily focused on physical biometrics, such as face recognition, fingerprints [16], and vein scans [29]. They have become powerful alternatives that reduce reliance on memorized passwords. However, their primary drawback is that clean images can be difficult to obtain under some conditions. Additionally, users need to actively interact with these authentication systems, which are often built using dedicated devices with powerful sensors and computation capability. These limitations often prevent such systems from being deployed in mobile environments.

Researchers have therefore started using behavioral biometrics to develop robust authentication systems. These include, but are not limited to, gait [22], wrist movement [13], and keystroke patterns [25]. Sensors that collect data through movement face fewer restrictions than camera-based implementations. They also do not require the user's active participation and can thus be used to validate users continuously, which strengthens security beyond the initial point of access. Their glaring weakness is that they fail to authenticate a user who is in a sedentary state. Passive behavioral biometrics, such as heart rate, ECG [15], and breathing [17], overcome this user-state-related limitation that other behavioral biometrics face.

Compared to other biometrics, non-speech human sounds, such as breathing sounds, have higher availability. Researchers have therefore been trying to use them as a digital signature, developing various feature extraction techniques and modeling approaches to find personalized information in audio clips in order to identify an individual. One research team used heart rate, gait, and breathing data to continuously authenticate users with standard machine learning models, which cannot be deployed on smartphones for real-time authentication [21]. Additionally, that work did not consider real-life environments with background noise, and it did not address the security and privacy of audio data. Another group of researchers achieved an accuracy of 0.9 using a long short-term memory (LSTM) model developed from sniff and breathing recordings with a gammatone-frequency cepstral coefficients (GFCC) feature extraction scheme [18]. However, a major limitation of LSTMs and other recurrent neural networks (RNNs) is their self-looping nature, which makes training around 1.68 times longer for RNNs than for convolutional neural networks (CNNs) [4]. Another group demonstrated the usefulness of breathing data for user authentication using a Gaussian mixture model (GMM) built from sniff, normal, and deep breathing gestures [17]. However, that work performs intermittent authentication based on deliberately collected breathing data and is vulnerable to sophisticated attacks. It is also device-dependent: the authors do not train and test the model on independent devices. Yet another group used breathing recordings and waveform morphology analysis with the fuzzy wavelet packet transform (FWPT) to achieve an accuracy above 0.9 in a laptop-based implementation [27]. However, none of these models has been tested or deployed on smartphones as part of a real-time implicit user authentication system that keeps all raw/processed audio data and features on the phone.

In line with this research direction, we seek to develop a continuous, secure authentication scheme in which breathing data from smartphone microphones are processed and fed into on-phone artificial neural network models to identify a valid user, and the resulting acceptance/rejection decision is then transferred to the user's target wearable utilizing IoT connectivity.

3 Approach

In this section, we discuss our datasets, pre-processing, and feature computation approaches. Before turning to those details, we first present an overview of our authentication system.

3.1 System Overview

In Fig. 1, we present our system diagram. Our implementation consists of multiple IoT-connected devices, including a smartphone (a Samsung S9+ Android phone in our case), a smartwatch (a Fitbit Ionic in our case), and a server. The heart of this system is our TFL Auth application running on the smartphone; we discuss the details of the app at the end of this section. The TFL Auth app continuously listens to the smartphone microphone, and when it detects a target user's breathing patterns, it sends an acceptance notification to the server. The server then forwards the acceptance notification to the Fitbit companion app running on the phone, and the companion app passes it on to the Fitbit watch. Finally, the watch takes an action (e.g., displaying the "Accept" text on the Fitbit screen, as shown in Fig. 1a). "Rejection" decisions follow the same path (Fig. 1b).


Fig. 1 System diagram. (a) Case of acceptance. (b) Case of rejection

In our implementation, all audio signal processing, including buffering and feature computation, as well as the user validation (authentication), are performed in real time by the TFL Auth application running on the smartphone. Then a decision of acceptance or rejection is sent to the smartwatch. Communication between the Fitbit watch and the phone is set up over a Wi-Fi connection and via the Fitbit companion application running on the phone [6]. Since we cannot exchange information directly between the two phone applications, i.e., the Fitbit companion app and the TFL Auth app, we set up two communication channels through a server, i.e., TFL Auth app to server and server to Fitbit companion app. We use the popular web socket (ws) library socket.io on both the server and the Android application. However, socket.io is not compatible with Fitbit; thus, we use the native ws library to communicate from the server to the companion app [7].

The core component of the TFL Auth app running on the phone is a convolutional neural network (CNN) model. We import the CNN model into our on-phone TFL Auth app using the Google TensorFlow Lite framework [11]. The on-phone TFL Auth app temporarily stores audio samples from the microphone in 1.63-second buffers (i.e., the maximum duration of an inhalation event in our dataset) to extract different audio features using Python libraries enabled by the Chaquopy SDK [3]. After feature computation, the CNN model decides whether a buffer of audio samples comes from a valid user, and based on that, the app sends an acceptance or rejection notification to the server.
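The buffering and notification steps described above can be sketched as follows. This is only a minimal illustration and not the TFL Auth implementation: the names `buffer_stream` and `decision_message`, the JSON message format, and the 0.5 threshold are assumptions made for the example, and the model score is passed in as a number rather than produced by a real TensorFlow Lite interpreter.

```python
import json
from typing import Iterable, Iterator, List

SAMPLE_RATE = 44_100   # Hz; the recorder rate from Sect. 3.2, assumed here for the phone as well
BUFFER_SECONDS = 1.63  # longest inhalation event in the dataset
BUFFER_LEN = round(SAMPLE_RATE * BUFFER_SECONDS)  # samples per buffer


def buffer_stream(chunks: Iterable[List[float]],
                  buffer_len: int = BUFFER_LEN) -> Iterator[List[float]]:
    """Group incoming microphone chunks into fixed-length buffers."""
    pending: List[float] = []
    for chunk in chunks:
        pending.extend(chunk)
        # Emit as many full buffers as the pending samples allow.
        while len(pending) >= buffer_len:
            yield pending[:buffer_len]
            pending = pending[buffer_len:]


def decision_message(score: float, threshold: float = 0.5) -> str:
    """Turn a model score into an accept/reject notification (hypothetical format)."""
    decision = "accept" if score >= threshold else "reject"
    return json.dumps({"decision": decision})
```

In such a design, each buffer yielded by `buffer_stream` would be converted to features and scored by the on-phone model, and only the short decision message would leave the phone.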

3.2 Datasets

We obtain breathing audio clips from 10 subjects using the Evistr digital voice recorder with a sampling rate of 44.1 kHz. We collect our breathing data during a subject's resting states, placing the recording device at arm's length from the subject's mouth. We collect all our data in quiet environments with little to no background noise. Each subject contributed six breathing events. We use the inhalation part of each breathing event for our modeling (discussed in Sect. 4). To resemble natural environments with different background noises, we add various background sound clips obtained from the Environmental Sound Classification (ESC-50) [2] and Urban Sound 8K (US-8K) [12] datasets to our inhalation events. From the ESC-50 dataset, we use the vacuum and washing machine noise clips; from the US-8K dataset, we use the siren, dog barking, and air conditioner noise clips. In addition to superimposing noise clips on breathing events, we perform time stretching and pitch shifting of the original breathing events to resemble the effects of varying source-to-receiver distance and other factors that may cause variations in a user's breathing patterns/sounds.
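One common way to superimpose a background clip on a breathing event is to rescale the noise so that the mixture reaches a chosen signal-to-noise ratio. The sketch below illustrates that idea under stated assumptions: the SNR is treated as a linear power ratio, the noise clip is at least as long as the breathing clip, and the function names are hypothetical. It is not taken from the authors' code.

```python
import math
from typing import List, Sequence


def mean_power(x: Sequence[float]) -> float:
    """Average squared amplitude of a clip."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(signal: Sequence[float], noise: Sequence[float], snr: float) -> List[float]:
    """Add noise to signal so that (signal power / noise power) equals snr.

    Assumes snr is a linear power ratio and that the noise clip is at
    least as long as the signal; extra noise samples are ignored.
    """
    p_signal = mean_power(signal)
    p_noise = mean_power(noise[: len(signal)])
    if p_noise == 0 or snr <= 0:
        raise ValueError("noise must be non-silent and snr must be positive")
    gain = math.sqrt(p_signal / (snr * p_noise))  # amplitude scale applied to the noise
    return [s + gain * n for s, n in zip(signal, noise)]
```

A small SNR value makes the background dominate the breathing sound, while a large value leaves the breathing event almost unchanged.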

3.3 Data Pre-processing

Since we are using real-world datasets, we first clean the dataset before using it. Then, we segment the continuous stream of desired audio events (i.e., inhalation part of a breathing event). Next, we perform the noise augmentation to increase the data volume. Finally, we compute and select influential features before constructing authentication models.

3.3.1 Data Segmentation and Cleaning

We consider the inhalation part of a breathing event as our desired data, i.e., instance. While collecting the desired data, other types of sounds are mixed with it. Additionally, some clips come with multiple inhalation breathing events separated by silence or noisy parts. Therefore, we segment the audio clips to fetch single inhalation breathing events. We obtain a variable number of events from different subjects, so in order to balance the dataset, we consider six inhalation breathing events per subject. In Fig. 2, we present the probability density function (PDF) and cumulative distribution function (CDF) of inhalation duration (in seconds).


Fig. 2 PDF and CDF of inhalation event duration obtained from our dataset

Since audio clips of inhalation vary in length, zero-padding is required so that all instances share the same time dimension when computing features. In Fig. 2, we observe that the maximum length of a breathing event is 1.63 seconds; therefore, we use 1.63 seconds as our window size while padding zeros. Then, each event is modified in 222 ways, as described in the next section (Sect. 3.3.2). Thereby, with six events per subject, we obtain a total of 6 × 222 = 1332 instances from each subject before computing features for our models.
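The padding step can be sketched as below. This is a minimal illustration that assumes the zeros are appended after the recorded samples (the text does not say where the zeros are placed) and that the clip is sampled at 44.1 kHz; the function name `pad_to_window` is hypothetical.

```python
from typing import List, Sequence


def pad_to_window(samples: Sequence[float],
                  sample_rate: int = 44_100,
                  window_s: float = 1.63) -> List[float]:
    """Zero-pad a clip so that it spans exactly window_s seconds.

    Assumption: zeros are appended at the end of the clip.
    """
    target = round(sample_rate * window_s)
    if len(samples) > target:
        raise ValueError("clip is longer than the padding window")
    return list(samples) + [0.0] * (target - len(samples))
```

With these defaults, every padded clip has the same number of samples, so features computed from different events have matching time dimensions.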

3.3.2 Audio Data Augmentation

Breathing audio can be altered by changes in the environment, physical state, or mood. To capture these variations, we augment the original breathing events using various pitch shifts and speed changes. These adjustments help simulate tiredness, excitement, exercise, and other states that can affect a person's breathing patterns. They also introduce data variation, which helps train a model that is more resistant to overfitting. We also inject noise into the original breathing clips to simulate the different background noises present in a given situation. To give a fair representation of performance while developing different models, we treat the 1332 instances as batches of 222 related clips, as further explained in Sect. 4.1.

– Pitch shift: We consider 15 different pitch shifts ranging from −7/2 to 7/2 in increments of 1/2.
– Speed change: We consider seven speed changes ranging from 1/4x to 2x the speed of an original clip in increments of 1/4x, skipping 1x since that would represent the original clip, which we have already included as the pitch shift with value 0.
– Noise superposition: We consider five randomly picked vacuum and washing machine sound clips, obtained from the Environmental Sound Classification (ESC-50) dataset [2]. We also use five random clips of sirens, dog barks, and air conditioners from the US-8K dataset [12]. These clips are used as background noises to modify original breathing events. While adding noise clips, we consider eight different signal-to-noise ratio levels ranging from 10^-4 to 10^4, incremented by factors of 10 while skipping 1.

Thereby, we generate 222 variations of each original breathing clip.
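The count of 222 variations can be checked by enumerating the augmentation settings. The sketch below rests on two interpretations that are assumptions, chosen because they reproduce the stated totals: the pitch shifts run from −7/2 to 7/2 in steps of 1/2 (15 values, including 0 for the original clip), and five clips are drawn for each of the five noise types. The noise-type labels are hypothetical names.

```python
from fractions import Fraction
from itertools import product

# Pitch shifts: -7/2 .. 7/2 in steps of 1/2 (assumed reading; 0 is the original clip).
pitch_shifts = [Fraction(k, 2) for k in range(-7, 8)]

# Speed factors: 1/4x .. 2x in steps of 1/4x, skipping 1x.
speed_factors = [Fraction(k, 4) for k in range(1, 9) if k != 4]

# Noise: five clips per noise type (assumed), each mixed at eight SNR levels
# from 10^-4 to 10^4, skipping 10^0 = 1.
noise_types = ["vacuum", "washing_machine", "siren", "dog_bark", "air_conditioner"]
clips_per_type = 5
snr_levels = [10.0 ** e for e in range(-4, 5) if e != 0]
noise_variants = list(product(noise_types, range(clips_per_type), snr_levels))

variants_per_event = len(pitch_shifts) + len(speed_factors) + len(noise_variants)
instances_per_subject = 6 * variants_per_event  # six inhalation events per subject
```

Under these assumptions the three augmentation families contribute 15, 7, and 200 clips, which matches the 222 variations per event and the 1332 instances per subject reported in the text.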

3.4 Feature Computation

We compute the following sets of candidate features in order to develop different CNN models to detect the inhalation breathing events of a valid user.

– Normalized gammachirp cepstral coefficients (NGCC) with 40 bins
– Bark-frequency cepstral coefficients (BFCC) [26] with 13 cepstral coefficients
– Linear predictive coefficients (LPC) with 13 cepstral coefficients
– Rasta perceptual linear prediction coefficients (RPLP) with 13 cepstral coefficients
– Chroma short-time Fourier transform (STFT) with 12 octave values
– Mel-Spectrogram (Mel-Spect.) with 128 windows
– Constant-Q chromagram (chroma) with 40 chroma values
– Root-mean-square (RMS) with 20 windows
– Spectrogram (Spect.) with 10 features from spectral centroid, spectral bandwidth, spectral contrast, and spectral rolloff
– Tonnetz with 6 tonnetz dimensions
– Mel-frequency cepstral coefficients (MFCC) with 40 bins

We use the Librosa [28] and Spafe [10] Python libraries to generate the aforementioned features. We convert the two-dimensional features into one-dimensional features by condensing them along the time axis using the average values.
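The condensing step can be illustrated with plain Python. The sketch assumes each feature is a matrix with one row per coefficient and one column per time frame, which is the layout Librosa uses for features such as MFCC; the function name is hypothetical.

```python
from typing import List, Sequence


def condense_over_time(feature: Sequence[Sequence[float]]) -> List[float]:
    """Average each row of an (n_coefficients x n_frames) matrix over its frames."""
    return [sum(row) / len(row) for row in feature]
```

For a NumPy array with the same layout, the equivalent operation is `feature.mean(axis=1)`, so a 40-coefficient MFCC matrix becomes a vector of 40 values regardless of how many frames the clip produced.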

4 User Authentication

Compared to standard machine learning algorithms, deep neural networks are more resource-intensive to train. Therefore, we train our convolutional neural network (CNN) models on the Google cloud platform [5]. We perform a detailed analysis of various types of audio features to maximize performance.


While comparing the performance of different CNN models trained from different feature sets, we name the models according to feature types. For example, the CNN model trained from the MFCC features is referred to as the MFCC model. Additionally, we consider different network configurations while developing models for each feature set. In Sect. 4.3, we present the different types of parameters and the ranges of values that we try while optimizing the CNN models. After training, we deploy an optimal CNN model on the phone using the Google TensorFlow Lite framework [11]. This phone-based implementation obviates the need to transfer data to the cloud, since the TFL Auth app performs all signal processing, feature computation, and user identification tasks on the phone before sending the acceptance/rejection decision to the watch (discussed in Sect. 3.1). In the following sections, we present our training approach and the performance criteria used to compare models. First, we devise a method to fairly split the training and testing datasets (Sect. 4.1) and define the performance measures (Sect. 4.2). Then, we walk through hyper-parameter tuning to optimize the models (Sect. 4.3).

4.1 Training-Testing Set

As discussed in Sect. 3.3.2, augmented inhalation breathing sounds typically retain some resemblance to their original (parent) audio clips. As a result, for a valid user, we have six sets of inhalation sounds, each derived from one of the six original inhalation events. In a leave-one-event-out fashion, we keep five inhalation events and all their augmented events for training and validation. The remaining inhalation event and all its augmented events are kept for testing. In order to create a balanced dataset, the invalid class is composed evenly of the other subjects' data. In our leave-one-event-out testing approach, each subject has six iterations of training-testing; thereby, for a total of 10 subjects, a total of 60 iterations are run. We train and test models for every feature set separately (described in Sect. 3.4). In Sect. 4.4, we discuss the performance of our models.
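The leave-one-event-out schedule can be sketched as a simple enumeration. This is an illustrative outline only: the function name is hypothetical, events and subjects are represented by indices, and the sampling of the invalid class from the other subjects is left out.

```python
from typing import Iterator, List, Tuple

N_SUBJECTS = 10  # subjects in the dataset
N_EVENTS = 6     # original inhalation events per subject


def leave_one_event_out(n_subjects: int = N_SUBJECTS,
                        n_events: int = N_EVENTS) -> Iterator[Tuple[int, int, List[int]]]:
    """Yield (valid_subject, test_event, train_events) for every iteration.

    Each event index stands for the original clip together with all of its
    augmented copies, so related clips never straddle the train/test split.
    """
    for subject in range(n_subjects):
        for test_event in range(n_events):
            train_events = [e for e in range(n_events) if e != test_event]
            yield subject, test_event, train_events


iterations = list(leave_one_event_out())
```

Splitting by parent event rather than by individual clip is what keeps augmented copies of a test event out of the training data.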

4.2 Performance Comparison Measures

To evaluate the performance of different modeling approaches, we consider the following measures, where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.

Accuracy (ACC) is the fraction of predictions that are correct, i.e.,

$$\mathrm{ACC} = \frac{TP + TN}{TP + FN + FP + TN} \tag{1}$$

Root Mean Square Error (RMSE) is the square root of the mean of the squared deviations between the predicted and actual values, which is equivalent to the square root of the rate of misclassification, i.e.,

$$\mathrm{RMSE} = \sqrt{\frac{FP + FN}{TP + FN + FP + TN}} \tag{2}$$

Genuine Rejection Rate (GRR) is the fraction of imposters rejected by an authentication system, which is the complement of the False Acceptance Rate (FAR), i.e.,

$$\mathrm{GRR} = \frac{TN}{FP + TN} = 1 - \mathrm{FAR} \tag{3}$$

Genuine Acceptance Rate (GAR) is the fraction of target users accepted by an authentication system, which is the complement of the False Rejection Rate (FRR), i.e.,

$$\mathrm{GAR} = \frac{TP}{TP + FN} = 1 - \mathrm{FRR} \tag{4}$$

F1 Score is the measure of performance of an authentication system based on both its Precision (positive predictive value) and Recall (true positive rate) measures, i.e.,

$$F_1\ \mathrm{Score} = 2 \left[ \left( \frac{TP}{TP + FN} \right)^{-1} + \left( \frac{TP}{TP + FP} \right)^{-1} \right]^{-1} \tag{5}$$
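The five measures defined in this section can be computed directly from the four confusion-matrix counts. The sketch below is a minimal illustration: the function name is hypothetical, and it assumes non-degenerate counts (at least one true positive, and at least one actual positive and one actual negative), since no zero-division handling is included.

```python
import math
from typing import Dict


def authentication_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute ACC, RMSE, GRR, GAR, and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # identical to GAR
    return {
        "ACC": (tp + tn) / total,
        "RMSE": math.sqrt((fp + fn) / total),  # square root of the misclassification rate
        "GRR": tn / (fp + tn),                 # 1 - FAR
        "GAR": recall,                         # 1 - FRR
        "F1": 2 / (1 / recall + 1 / precision),
    }
```

For example, calling `authentication_metrics(tp=90, tn=93, fp=7, fn=10)` with these purely illustrative counts (not the paper's data) gives a GAR of 0.90 and a GRR of 0.93.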

4.3 Hyper-Parameter Optimization

We first use Scikit Learn grid search wrapper to find the optimal batch size, number of epochs, learning rate, and optimizer. Then, we try different numbers of hidden layers and different numbers of neurons in those hidden layers. After finding the optimal models, we perform our last step of the grid search by deciding the optimal activation layer, pooling method, and regularizer. Below is the list of hyper-parameters with their ranges of values that we tune for each model.

– Optimizer: sgd, adam, rmsprop
– Batch size: 50, 75, 100
– Epoch: 50, 100, 150, 200
– Number of hidden layers: 1, 2, 3
– Regularizer
  • l1
  • l2
  • dropout: 0.1, 0.25, 0.5, 0.7
– Neurons
  • 1 hidden layer: (16), (32)
  • 2 hidden layers: (128, 64), (280, 64)
  • 3 hidden layers: (560, 128, 64), (280, 128, 64)
– Learning rate: 0.1, 0.001
– Window size: 0.4, 0.6, 0.8
– Activation: Relu, Leaky Relu, Sigmoid
– Pooling method: Max Pooling, Average Pooling with pooling size 2

Table 1 Optimal values of different CNN models with Activation = Relu, Epochs = 100, Initialization = Normal, and Learning Rate = 0.001

Feature/model   Batch size   Regularizer      Feature count   Network shape
NGCC            50           L2               40              128–64
BFCC            50           Dropout          13              128–64
LPC             50           L2               13              32
RPLP            50           L2               13              32
STFT            50           L2 w. pooling    12              32
Mel-Spect.      75           L2 w. pooling    128             16
Chroma          50           L2               40              128–64
RMS             100          L2               20              280-128-64
Spect.          50           L2               10              128–64
Tonnetz         50           L2               6               32
MFCC            50           L2               40              32

Table 1 shows the best hyper-parameter configuration for each feature set. Most feature sets result in similar network configurations: a batch size of 50, L2 regularization, and one or two hidden layers are the most frequent best-performing settings. This suggests that lightweight models can be used for breathing-based authentication. The best-performing feature class, MFCC, follows this typical configuration and uses one of the simplest hidden-layer structures (a single hidden layer of 32 neurons).
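The benefit of searching in stages rather than over the full grid can be seen by counting configurations. The sketch below is illustrative only: the grouping into three stages follows the description in this section, but the dictionary keys and the string labels for the regularizers are hypothetical, each stage is assumed to keep the earlier stages fixed at their best values, and the window size is left out because the text does not say in which stage it is tuned.

```python
from itertools import product
from typing import Dict, List

stage1 = {  # training settings
    "optimizer": ["sgd", "adam", "rmsprop"],
    "batch_size": [50, 75, 100],
    "epochs": [50, 100, 150, 200],
    "learning_rate": [0.1, 0.001],
}
stage2 = {  # hidden-layer layouts (neurons per layer)
    "hidden_layers": [(16,), (32,), (128, 64), (280, 64), (560, 128, 64), (280, 128, 64)],
}
stage3 = {  # activation, pooling, and regularizer
    "activation": ["relu", "leaky_relu", "sigmoid"],
    "pooling": ["max", "average"],
    "regularizer": ["l1", "l2", "dropout_0.1", "dropout_0.25", "dropout_0.5", "dropout_0.7"],
}


def grid(space: Dict[str, list]) -> List[dict]:
    """Expand a search space into the list of all its configurations."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in product(*(space[k] for k in keys))]


staged_total = len(grid(stage1)) + len(grid(stage2)) + len(grid(stage3))
exhaustive_total = len(grid(stage1)) * len(grid(stage2)) * len(grid(stage3))
```

Under these assumptions the staged search evaluates 72 + 6 + 36 = 114 configurations per model, whereas an exhaustive search over the same values would need 15552.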

4.4 Authentication Model Evaluation

In Fig. 3, we observe that MFCC is the best performer with an average test accuracy of 0.92. The second most influential feature set is the spectrogram, which has a test accuracy of 0.86 and a standard error of less than 0.01, i.e., consistent performance across all subjects. Of the two feature sets, MFCC can also operate in a smaller network with only one hidden layer, while the spectrogram uses a configuration with two hidden layers. With its set of only 10 features, the spectrogram uses coarse-grained information, which provides a greater advantage in situations where audio quality is very low. Depending on the situation, either MFCC or spectrogram provides the best performance. We further observe that the non-cepstral coefficient features perform better than the cepstral coefficient features, in general. On average, the non-cepstral coefficient features outperform their counterparts (except the MFCC) by more than 8.7% accuracy. More astounding is that MFCC performs 48% better than the average of the other cepstral coefficient features (Table 2).

Fig. 3 Bar graph with error bars of training-testing accuracy of different CNN models developed from various types of feature classes

Looking deeper into the other metrics, both GRR and GAR describe important characteristics of model performance. Genuine rejection rate (GRR) reflects the robustness of a security model. The MFCC model has a GRR of 0.93, meaning that it correctly denies access to invalid users 93% of the time. Genuine acceptance rate (GAR) represents user-friendliness, and the same model has a GAR of 0.90, meaning that it accepts the valid user 90% of the time. Depending on the application and the sensitivity of the information behind the security scheme, there might be an elevated focus on GRR. In Fig. 4, we use a spider chart to combine multiple performance measures, i.e., ACC, F1, GAR, and GRR, of the top three models (i.e., MFCC, Spectrogram, and STFT). In the spider chart, we observe that the CNN model with the MFCC feature set achieves the highest coverage. Thereby, we conclude that, based on our dataset, the CNN model with the MFCC feature set is the best authentication model to deploy in our TFL Auth app for the smartphone-based implicit user authentication system.
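Table 2 reports each measure as an average over all iterations together with its standard error. A minimal sketch of that aggregation is given below; it assumes the standard error is the sample standard deviation divided by the square root of the number of iterations, which the text does not state explicitly, and the function name is hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_and_standard_error(values: Sequence[float]) -> Tuple[float, float]:
    """Return the mean and standard error of per-iteration scores.

    Assumption: standard error = sample standard deviation / sqrt(n).
    """
    mean = statistics.fmean(values)
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return mean, standard_error
```

In the leave-one-event-out setup, `values` would hold one score per iteration (60 in total for 10 subjects with six events each).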

Table 2 Test performance summary of different CNN models with their optimal parameter settings. Performance measures are presented as averages over all iterations with standard errors in parentheses

Feature/model   Batch size   Epoch   Feature count   Network shape   ACC             RMSE            GRR             GAR             F1 score
NGCC            50           100     40              128–64          0.619 (0.006)   0.033 (0.000)   0.609 (0.010)   0.629 (0.009)   0.619 (0.006)
BFCC            50           100     13              128–64          0.625 (0.011)   0.041 (0.000)   0.372 (0.025)   0.885 (0.012)   0.701 (0.007)
LPC             50           100     13              32              0.538 (0.005)   0.035 (0.002)   0.434 (0.049)   0.645 (0.049)   0.512 (0.026)
RPLP            50           100     13              32              0.539 (0.004)   0.031 (0.002)   0.514 (0.053)   0.564 (0.052)   0.463 (0.029)
STFT            50           100     12              32              0.812 (0.017)   0.035 (0.001)   0.761 (0.039)   0.868 (0.019)   0.809 (0.017)
Mel-Spect.      50           100     128             16              0.527 (0.004)   0.028 (0.002)   0.553 (0.053)   0.499 (0.053)   0.413 (0.031)
Chroma          50           100     40              128–64          0.718 (0.014)   0.034 (0.001)   0.679 (0.025)   0.758 (0.012)   0.730 (0.012)
RMS             100          100     20              280-128-64      0.649 (0.015)   0.031 (0.001)   0.678 (0.038)   0.619 (0.037)   0.606 (0.024)
Spect.          50           100     10              128–64          0.857 (0.009)   0.032 (0.000)   0.874 (0.011)   0.840 (0.010)   0.852 (0.009)
Tonnetz         50           100     6               32              0.605 (0.011)   0.035 (0.000)   0.575 (0.018)   0.635 (0.008)   0.613 (0.008)
MFCC            50           100     40              32              0.918 (0.006)   0.033 (0.000)   0.933 (0.008)   0.903 (0.007)   0.916 (0.006)

Fig. 4 Spider chart of top three CNN models with their associated feature sets/classes

5 Conclusions and Future Work

To the best of our knowledge, this is the first attempt to verify a wearable device user utilizing an on-phone artificial neural network (ANN) model that performs all breathing audio data processing and user identification on the phone implicitly. Such an implicit, breathing pattern-based, on-phone authentication thereby overcomes the limitations of systems that rely on server-based implementations and require sensitive audio to be off-loaded from the phone. Our experiment shows that Mel-frequency cepstral coefficients (MFCC) and spectrogram have the best ability to interpret personalized audio data. While the best model trained with MFCC features has an average test accuracy of 0.92, the best model trained with spectrogram features has an average accuracy of 0.86. However, the spectrogram feature-driven model has substantially greater consistency across subjects. This shows how effective the breathing biometric can be in serving as the basis of an implicit continuous authentication system. It provides an opportunity to implement a strong security system that can verify a user throughout the sessions without requiring active user participation.

We seek to expand this implementation to a multi-biometric-based IoT wearable authentication scheme. This includes combining the heart rate and gait movements of a wearable device user with the breathing clips. The Fitbit Ionic is a common commercial wearable with sensors to track heart rate and gait, making it easier to include the watch in an expanded, robust authentication scheme. The next step is to extend the design in Fig. 1 with communication from the watch back to the phone, so that the heart rate and gait data can be included. The combinations of biometrics can then be used in the models housed on the phone. From the watch, the data will be sent to the Fitbit companion app and then from the companion app to a local server before being sent back to the on-phone TFL Auth app. Finally, the model in the TFL Auth app will make a decision from the multi-modal biometric data, and the decision will be sent back to the watch via the server and companion app.

References 1. Forecasted value of the global wearable devices market. https://goo.gl/C682Rv (2018). Accessed February 2018 2. Esc-50: Dataset for environmental sound classification. https://bit.ly/2uT9Ddc (2019). Accessed November 2019 3. Chaquopy python SDK for android. https://chaquo.com/chaquopy/ (2020). Accessed June 2020 4. Dropout on CNN vs. RNN. https://rb.gy/ki1iwx (2020). Accessed October 2020 5. Google cloud compute engine. https://cloud.google.com/compute (2020). Accessed June 2020 6. How to talk to phone apps. https://rb.gy/7ih2ow (2020). Accessed July 2020 7. Implementing websockets to communicate between fitbit versa and local server! https://rb.gy/ usg8oo (2020). Accessed July 2020 8. Internet. https://ourworldindata.org/internet. Accessed October 2020 9. McAfee research finds troubling use of insecure cloud passwords. https://rb.gy/7fnde8 (2020). Accessed: June 2020 10. spafe: Simplified python audio-features extraction. https://rb.gy/jmdvms (2020). Accessed June 2020 11. Tensorflow lite example apps. https://rb.gy/ggnlz3 (2020). Accessed June 2020 12. Urbansound8k dataset. Available: https://bit.ly/2uHhhYh (2020). Accessed March 2020

On-Phone CNN Model-Based Implicit Authentication to Secure IoT Wearables

33

13. Acar, A., Aksu, H., Uluagac, A.S., et al.: Waca: Wearable-assisted continuous authentication (2018). Preprint arXiv:1802.10417 14. Al Amin, M.T., Barua, S., Vhaduri, S., Rahman, A.: Load aware broadcast in mobile ad hoc networks. In: 2009 IEEE International Conference on Communications, pp. 1–5. IEEE, Piscataway (2009) 15. Bugdol, M.D., Mitas, A.W.: Multimodal biometric system combining ecg and sound signals. Patt. Recog. Lett. 38, 107–112 (2014) 16. Camlikaya, E., Kholmatov, A., Yanikoglu, B.: Multi-biometric templates using fingerprint and voice. In: Biometric Technology for Human Identification V, vol. 6944, p. 69440I. International Society for Optics and Photonics, Bellingham (2008) 17. Chauhan, J., Hu, Y., Seneviratne, S., et al.: Breathprint: Breathing acoustics-based user authentication. In: ACM Mobile Systems, Applications, and Services (2017) 18. Chauhan, J., Seneviratne, S., Hu, Y., et al.: Breathing-based authentication on resourceconstrained IoT devices using recurrent neural networks. Computer 51(5), 60–67 (2018) 19. Chen, C.Y., Vhaduri, S., Poellabauer, C.: Estimating sleep duration from temporal factors, daily activities, and smartphone use. In: 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC), pp. 545–554. IEEE, Piscataway (2020) 20. Cheung, W., Vhaduri, S.: Context-dependent implicit authentication for wearable device users. In: IEEE Personal, Indoor and Mobile Radio Communications (PIMRC) (2020) 21. Cheung, W., Vhaduri, S.: Continuous authentication of wearable device users from heart rate, gait, and breathing data. In: IEEE RAS & EMBS International Conference on Biomedical Robotics and Biomechatronics (BioRob) (2020) 22. Cola, G., Avvenuti, M., Musso, F., et al.: Gait-based authentication using a wrist-worn device. In: ACM Mobile and Ubiquitous Systems: Computing, Networking and Services (2016) 23. Dai, H., Wang, W., Liu, A.X., et al.: Speech based human authentication on smartphones. 
In: IEEE International Conference on Sensing, Communication, and Networking (SECON) (2019) 24. Kim, Y., Vhaduri, S., Poellabauer, C.: Understanding college students’ phone call behaviors towards a sustainable mobile health and wellbeing solution. In: International Conference on Systems Engineering (CIIS) (2020) 25. Kumar, R., Phoha, V.V., Raina, R.: Authenticating users through their arm movement patterns (2016). Preprint arXiv:1603.02211 26. Lalitha, S., Tripathi, S., Gupta, D.: Enhanced speech emotion detection using deep neural networks. Int. J. Speech Technol. 22(3), 497–510 (2019) 27. Liu, J., Dong, Y., Chen, Y., et al.: Leveraging breathing for continuous user authentication. In: International Conference on Mobile Computing and Networking (2018) 28. McFee, B., Raffel, C., Liang, et al.: librosa: Audio and music signal analysis in python. In: Python in Science Conference (2015) 29. Mohsin, A., Zaidan, A., Zaidan, B., Albahri, O., Ariffin, S.A.B., Alemran, A., Enaizan, O., Shareef, A.H., Jasim, A.N., Jalood, N., et al.: Finger vein biometrics: taxonomy analysis, open challenges, future directions, and recommended solution for decentralised network architectures. IEEE Access 8, 9821–9845 (2020) 30. Muratyan, A., Cheung, W., Dibbo, S.V., Vhaduri, S.: Opportunistic multi-modal user authentication for health-tracking iot wearables. In: the 5th EAI International Conference on Safety and Security with IoT (SaSeIoT 2021) (2023) 31. Sarkar, A., Abbott, A.L., Doerzaph, Z.: Biometric authentication using photoplethysmography signals. In: Biometrics Theory, Applications and Systems (BTAS). IEEE, Piscataway (2016) 32. Sun, F., Mao, C., Fan, X., et al.: Accelerometer-based speed-adaptive gait authentication method for wearable iot devices. IEEE Int. Things J. 6(1), 820–830 (2018) 33. Vhaduri, S.: Nocturnal cough and snore detection using smartphones in presence of multiple background-noises. 
In: Proceedings of the 3rd ACM SIGCAS Conference on Computing and Sustainable Societies, pp. 174–186 (2020) 34. Vhaduri, S., Ali, A., Sharmin, M., Hovsepian, K., Kumar, S.: Estimating drivers’ stress from GPS traces. In: Proceedings of the 6th International Conference on Automotive User Interfaces and Interactive Vehicular Applications, pp. 1–8 (2014)

34

S. V. Dibbo et al.

35. Vhaduri, S., Brunschwiler, T.: Towards automatic cough and snore detection. In: 2019 IEEE International Conference on Healthcare Informatics (ICHI), pp. 1–1. IEEE, Piscataway (2019) 36. Vhaduri, S., Munch, A., Poellabauer, C.: Assessing health trends of college students using smartphones. In: 2016 IEEE Healthcare Innovation Point-Of-Care Technologies Conference (HI-POCT), pp. 70–73. IEEE, Piscataway (2016) 37. Vhaduri, S., Poellabauer, C.: Cooperative discovery of personal places from location traces. In: 25th International Conference on Computer Communication and Networks (ICCCN). IEEE, Piscataway (2016) 38. Vhaduri, S., Poellabauer, C.: Design and implementation of a remotely configurable and manageable well-being study. In: Smart City 360, pp. 179–191. Springer, Berlin (2016) 39. Vhaduri, S., Poellabauer, C.: Human factors in the design of longitudinal smartphone-based wellness surveys. In: 2016 IEEE International Conference on Healthcare Informatics (ICHI), pp. 156–167. IEEE, Piscataway (2016) 40. Vhaduri, S., Poellabauer, C.: Design factors of longitudinal smartphone-based health surveys. J. Healthcare Inf. Res. 1(1), 52–91 (2017) 41. Vhaduri, S., Poellabauer, C.: Hierarchical cooperative discovery of personal places from location traces. IEEE Trans. Mobile Comput. 17(8), 1865–1878 (2017) 42. Vhaduri, S., Poellabauer, C.: Towards reliable wearable-user identification. In: 2017 IEEE International Conference on Healthcare Informatics (ICHI). pp. 329–329. IEEE Computer Society, Washington (2017) 43. Vhaduri, S., Poellabauer, C.: Wearable device user authentication using physiological and behavioral metrics. In: IEEE 28th Annual International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC) (2017) 44. Vhaduri, S., Poellabauer, C.: Biometric-based wearable user authentication during sedentary and non-sedentary periods. In: International Workshop on Security and Privacy for the Internetof-Things (2018) 45. 
Vhaduri, S., Poellabauer, C.: Impact of different pre-sleep phone use patterns on sleep quality. In: IEEE 15th International Conference on Wearable and Implantable Body Sensor Networks (BSN) (2018) 46. Vhaduri, S., Poellabauer, C.: Opportunistic discovery of personal places using multi-source sensor data. IEEE Transactions on Big Data 7(2), 383–396 (2018) 47. Vhaduri, S., Poellabauer, C.: Opportunistic discovery of personal places using smartphone and fitness tracker data. In: 2018 IEEE International Conference on Healthcare Informatics (ICHI), pp. 103–114. IEEE, Piscataway (2018) 48. Vhaduri, S., Poellabauer, C.: Multi-modal biometric-based implicit authentication of wearable device users. IEEE Trans. Inf. Foren. Secur. 14(12), 3116–3125 (2019) 49. Vhaduri, S., Poellabauer, C.: Summary: Multi-modal biometric-based implicit authentication of wearable device users (2019). Preprint arXiv:1907.06563 50. Vhaduri, S., Poellabauer, C., Striegel, A., Lizardo, O., Hachen, D.: Discovering places of interest using sensor data from smartphones and wearables. In: 2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing (UIC) (2017) 51. Vhaduri, S., Prioleau, T.: Adherence to personal health devices: A case study in diabetes management. In: EAI International Conference on Pervasive Computing Technologies for Healthcare (EAI PervasiveHealth) (2020) 52. Vhaduri, S., Van Kessel, T., Ko, B., Wood, D., Wang, S., Brunschwiler, T.: Nocturnal cough and snore detection in noisy environments using smartphone-microphones. In: 2019 IEEE International Conference on Healthcare Informatics (ICHI), pp. 1–7. IEEE, Piscataway (2019)

A Cybersecurity Guide for Using Fitness Devices

Maria Bada and Basie von Solms

M. Bada: University of Johannesburg, Johannesburg, South Africa; Queen Mary University of London, London, UK
B. von Solms: University of Johannesburg, Johannesburg, South Africa

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Nayyar et al. (eds.), The Fifth International Conference on Safety and Security with IoT, EAI/Springer Innovations in Communication and Computing, https://doi.org/10.1007/978-3-030-94285-4_3

1 Introduction

Currently there is a lot of discussion around the Internet of Things (IoT) and the new types of threats emerging from new technologies [1, 2]. The IoT creates great opportunities for everyone, but it also creates new tools for both citizens and criminals. Wearable devices, such as health monitors, fitness bands and smartwatches, are becoming very popular, especially in healthcare and fitness. Worldwide sales of these devices have been growing, reaching 110 million units in 2018 [3]. Wearable devices are popular because they merge the physical and digital worlds. A recent study [4] found that smartphone users also tend to own wearables. Moreover, a large percentage of users would use wearables in the next 5 years not only to track health-related information, but also for everyday actions such as unlocking a door, making financial transactions or authenticating their identity.

Some devices, such as Fitbits, now offer new services such as contactless payment, for example to go shopping or buy train tickets. To use the device for such services, consumers need to add their credit or debit card to their Fitbit watch or tracker [5]. Fitbit Pay works with a number of Fitbit smartwatches [6]. This function is also supported by the fraud protection services of the consumer’s bank. In the UK, the device can be used for the Transport for London (TfL) transit system, including buses, London Underground and trains, and for other services [5].

However, more and more reports are warning of the cybersecurity risks of such devices and of the possibility that they can be hacked. Guidelines do exist for securing such devices, but most of this guidance is directed towards device manufacturers, IoT providers and similar parties. One good example is the Code of Practice for consumer IoT security by the UK Department of Culture Media and Sport [7]. Very little, if any, such guidance exists for the real end user, the ordinary citizen who makes use of fitness devices and other smart home devices.

The purpose of this paper is to put the focus on end users and to emphasise the security risks of fitness devices. The research focuses on fitness devices because of their wide current usage and the direct or indirect risks they pose to users. Risks are related not only to security but also to user privacy. In order to create a set of guidelines for users around security and privacy issues, we reviewed current research following a grounded theory approach. In Sect. 2, we discuss the personal data collected by fitness devices, while in Sect. 3 we describe the cybersecurity risks they are exposed to. In Sect. 4 we discuss the cybersecurity awareness needs of users of wearable devices, and in Sect. 5 we present a number of guidelines for users and manufacturers in order to lower their exposure to risks. Finally, Sect. 6 discusses the findings and Sect. 7 concludes the paper.

2 Personal Data Collection by Fitness Devices

Different companies offer consumers different models of wearable devices. Fitbits, Apple Watches and Samsung Galaxy Watches are some of the products available today. These devices have a particular function; however, they also collect personal data from users [8]. We review Fitbit as an example, but other products collect similar information. Fitbit smartwatches collect information that can be considered private and potentially dangerous in the wrong hands. According to Fitbit’s Legal Policy, Fitbit receives or collects three categories of information from user devices [9]:

• Physical Data: data such as body temperature, pulse rate, food habits and body weight, steps and distance travelled, calories burned, sleep stage and active minutes. These data are synchronised to devices that transfer information to Fitbit servers.
• Location Data: Fitbits receive information regarding location through GPS signals or Wi-Fi access points.
• Usage Data: Fitbits collect information about a user’s interaction with services, such as when a user views or searches content or installs software.

Users typically transmit this data using a wireless connection, such as a Bluetooth, Wi-Fi or cellular connection. For example, devices transfer the data they collect through a smartphone using Wi-Fi or Bluetooth connections. The data received are then stored or transmitted to a cloud server [10]. Fitbit introduced a series of technologies for workout tracking, such as PurePulse, SmartTrack and Sleep Tracking, which automatically recognise users’ exercises and record the data through a smartphone [11].
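To make the three categories of collected data more concrete, they can be modelled as a small data structure. This is a minimal sketch for illustration only: the class names, field names and the helper `categories_present` are hypothetical and do not reflect Fitbit's actual data format or API.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class PhysicalData:
    # A subset of the physical measurements listed above; all fields are optional.
    pulse_rate_bpm: Optional[int] = None
    steps: Optional[int] = None
    calories_burned: Optional[float] = None
    sleep_stage: Optional[str] = None


@dataclass
class LocationData:
    latitude: float
    longitude: float
    source: str  # assumed values: "gps" or "wifi"


@dataclass
class UsageData:
    event: str  # e.g. "view_content", "search", "install_software"


@dataclass
class SyncRecord:
    """One hypothetical upload from the wearable to the phone or cloud."""
    physical: PhysicalData
    location: Optional[LocationData] = None
    usage: List[UsageData] = field(default_factory=list)


def categories_present(record: SyncRecord) -> List[str]:
    """List which of the three data categories a record would expose if leaked."""
    present = []
    if any(value is not None for value in asdict(record.physical).values()):
        present.append("physical")
    if record.location is not None:
        present.append("location")
    if record.usage:
        present.append("usage")
    return present


if __name__ == "__main__":
    record = SyncRecord(
        physical=PhysicalData(pulse_rate_bpm=72, steps=4200),
        location=LocationData(51.5, -0.12, "gps"),
    )
    print(categories_present(record))
```

Listing the categories carried by each record is one simple way for a user or auditor to see what a single synchronisation would reveal.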

3 Risks Arising from Fitness Devices

Although fitness devices have been developed and used for a number of years, it seems that developers still do not focus on the security of these devices. As mentioned above, these devices collect information related to the health and wellness of consumers, and thus collect a large amount of data. As a result, device manufacturers often put consumer data at risk of exposure through device-related vulnerabilities. The main types of security incidents related to Fitbits, based on [9], are:

1. Lack of Authentication and Physical Security Control: Currently, Fitbit smartwatches do not have a built-in security mechanism. Without authentication, Fitbits pose a threat to an individual’s personal information. Because of this lack of authentication, these devices are considered highly vulnerable, since a hacker can use them to gain access to a company’s network for potential exploitation.
2. Disadvantages of Bluetooth Connections: Since the main connectivity method for a Fitbit is Bluetooth, it inherits the vulnerabilities that most Bluetooth devices have while communicating. Fitbits are susceptible to various threats such as message modification, denial of service (DoS) and eavesdropping attacks.
3. Location/Tracking and Biometric Leakage: A Fitbit can acquire information regarding a user’s location, either through the built-in GPS or by pairing with a smartphone’s GPS. Location tracking can raise serious security concerns for individuals and organisations.
4. Third-Party Related Attacks: In 2015, a large number of online accounts of Fitbit users were attacked by hackers. The hackers used details such as email addresses and passwords from third-party websites to log in to Fitbit accounts. They then used the details to file false claims for replacement orders. They also managed to gain access to customers’ personal data, such as GPS history [12].
As mentioned above, wearable devices first collect personal data through device sensors, and the data are then shared through a smartphone [13]. Because of their wireless transmissions, these devices can be vulnerable to a malicious attack that exposes the data. Because a wearable, like a Fitbit exercise watch, is mostly connected and synchronised with the user’s smartphone or laptop, infecting the Fitbit with malware will automatically infect the user’s smartphone when fitness data is uploaded. Once the phone is infected, all possible compromises can happen, and the infection can then be propagated to other trackers [14]. For example, hackers could wirelessly upload malware onto a Fitbit by using Bluetooth. Although a hacker would need to be near the targeted Fitbit in order to infect it, the Bluetooth connection might take place in any public area, such as a park or coffee shop. The process would take only 10 seconds, and the user would not notice anything wrong with their Fitbit at that point. Once the user connects the infected wearable device to a PC or laptop, the malware can spread to the personal computer, to business computers and even to the entire network [15].

In order for information to be transmitted between two devices over Bluetooth, one device must take the central role in the connection and the second device must take the peripheral role [11]. For example, when a Fitbit with SmartTrack is paired with an iPhone over Bluetooth, the iPhone plays the role of the central device and the Fitbit is the peripheral device that advertises an available connection; the signals contain the IP address of a mobile device and a payload containing data about the connection [11].

These devices also include sensors such as accelerometers and gyroscopes, which provide data with which someone can identify fine motor task movement [16] and orientation [17]. Taking the potential risks a step further, it is clear that such devices can be used for more sophisticated cyber-attacks by capturing the gentle movements and position changes of the wrist while writing. Such techniques can be used to recognise the password typed by a user, or the security number of an employee, as they type it in real time. Using machine learning, researchers were able to detect wrist movement while digits were being written and to construct a robust machine learning model with reportedly perfect real-time performance [18].
Other security-related risks arise from the payment facilities, through the credit card information that can be stored on the user’s device. As mentioned above, some devices such as Fitbit now offer the possibility of connecting the device with a credit card and using it as an electronic wallet. The main reference to the security of this functionality concerns fraud and the responsibility of the consumer’s bank to provide support [5].

4 Cybersecurity Awareness of Users

Although users might expect a USB stick to be a means of transmitting malware or a target for hackers, most users do not expect their fitness trackers to be a target [19]. However, these little devices are the perfect delivery system for malware. People perceive wearable devices as a new means of interacting with their social groups and not as a potential threat to their medical information. In addition, younger people are more likely to share their wearable device’s data online [20]. Users are often willing to sacrifice security and privacy for the convenience a smart device provides [21].


Users often lack awareness of the information security and privacy issues of using wearable devices [3]. In addition, users might not know what types of data are being collected, stored or transmitted by their wearable devices. Moreover, there is a general lack of awareness around encryption of data during transmission. It is therefore no surprise that users are also unaware of the security policy for their wearable device or of the security measures used to protect their data. In addition, if users suspect an information security incident, they do not know where to report it. This lack of awareness around privacy, together with users’ blind trust that their data will be protected, might complicate the accountability of service providers in the case of a data breach [3, 22, 23]. Additionally, there is the issue of ownership of the data [24]. Previous research showed that only a small number of users backed up sensitive or critical data on a regular basis or tested recovery [3]. Security and privacy considerations for such delicate data collected by IoT devices are therefore essential.

As technology developers push new wearable devices to the market and make these devices sync with existing smartphones, malware creators are looking for new avenues of attack [25]. Criminals can use the data to organise cyber-attacks based on identity theft, on impersonation by creating a fake user profile, or on more targeted phishing. Voice-recognition technology might also facilitate such cyber-attacks through the tools already being used in vishing. It is important to note that wearable devices are the first category of information technology (IT) devices where there is not only danger due to the exposure of consumer data, but also the real potential to cause physical harm to wearers.

Therefore, it is essential that all users of fitness devices are informed about the potential risks stemming from using them.

5 Cybersecurity Guidelines for Users and Manufacturers

Policy makers, as well as businesses, NGOs and users, face enormous challenges, since current policies are often inadequate [26]. Emerging technologies require new thinking around privacy, freedom of expression, intellectual property protection and national security. However, policies are not the only measure that needs to be taken. It is important to establish a cybersecurity ecosystem in all sectors and to focus more on the attitudes, beliefs and practices of end users. Dutton [27] describes a cybersecurity mindset as “a pattern of attitudes, beliefs and values that motivate individuals to continually act in ways to secure themselves and their network of users”. Consumers need to take some basic steps to ensure that their personal information is not exposed to malicious intent. One of the simplest steps is to research the security features of the intended device and read relevant reviews of the product [28].


Table 1 Warnings and instructions for using wearable devices

Functionalities:
• Electronic wallet
• Data collected
• Testing
• Synchronising with phone or laptop
• Connecting the device to email

Warnings:
• Risks of fraud.
• The device is collecting data such as body temperature, pulse rate, food habits and body weight, steps and distance travelled, calories burned and sleep stage. These data are synchronised to devices that transfer information to Fitbit servers.
• After the user connects a wearable which has been infected with malware to a PC or laptop, the malware can spread via the PC or laptop and even infect the entire network.
• Using email addresses and passwords from third-party websites to log in to Fitbit accounts is risky.
• The device can be vulnerable to risks via the Bluetooth or Wi-Fi connection used.

Instructions:
• Encrypt critical data elements such as ID, passwords and PIN.
• Use an antivirus program to scan any connected device for viruses and malware, and determine the settings to limit the personal data captured by the device.
• Disable permissions for capturing data such as sleep patterns, and capture only the data you really need.
• Keep Bluetooth in “off” mode when it is not intentionally being used.
• Use a VPN service on all your devices to ensure your privacy.
• Think about the purpose for which, and the environment in which, you want to wear your fitness tracker.
• Read the company’s privacy policy and ensure that reasonable steps are taken to protect your data.
• Conduct research on data breaches of a specific device or company and on the prevention measures taken against future attacks.
• Keep your devices and software current with the latest security updates, and choose the best settings relevant to security, such as data sharing, location and Bluetooth.
• Use strong passwords and avoid repeating email-password combinations on multiple sites.

Beyond this, there are a number of steps users and manufacturers can take in order to lower their exposure to risks. These are summarised in Table 1.

5.1 Cybersecurity Guidelines for Users

A number of guidelines which users can follow to ensure the safe use of wearable devices are presented below.

Adjusting the Settings. To prevent malware infection, it is important that all personal and business devices are always protected from outside threats. Many devices, such as smartphones or laptops, allow users to secure their Bluetooth connection with a password to prevent unauthorised access. This way, the Fitbit will only connect to the customer’s phone. Users always need to exercise caution when plugging any device into a computer. Using an antivirus program to scan any connected device for viruses and malware, and determining the settings that limit the personal data captured by the device, are important steps towards security. Disabling permissions for capturing data such as sleep patterns, and capturing only the data a user really needs, is also a good measure.

Education on Risks. A significantly stronger security posture can be achieved simply by educating Fitbit users about keeping their devices and software current with the latest security updates and about the best settings relevant to security, such as data sharing, location and Bluetooth. In particular, users should keep Bluetooth in “off” mode when it is not intentionally being used, to avoid the known hacks discussed earlier [9].

Education on Good Practices. One of the reasons for the security breaches mentioned in Sect. 3 is that the service does not require the use of strong passwords, which leads to a higher risk of users repeating the same email-password combination on numerous websites. Weak passwords such as “123456” are still being used by many Fitbit users [12]. It is therefore essential for users to be aware of good practices for building strong passwords for all their accounts.

Consumer-Friendly Privacy Practices. Many of the potential risks described above could be partially resolved by giving consumers much greater control over their data and how they choose to use them. For example, users could have the choice to opt out of targeted advertising and have the right to be forgotten regarding all health-related and non-health data. In addition, consumers could be asked regularly to update their privacy-friendly defaults [29].

It is therefore also essential for users of wearable devices to be educated about the potential privacy and information security risks to which they are exposed when using these devices.
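The password guidance above (use strong passwords and avoid repeating the same email-password combination across sites) can be illustrated with a short check. This is a minimal sketch under stated assumptions: the function names `is_strong` and `reused_credentials`, the 12-character minimum, the three-character-class rule and the tiny list of common passwords are illustrative choices, not rules taken from the sources cited here.

```python
import string
from collections import defaultdict
from typing import Dict, List, Tuple

# Illustrative only; a real check would use a much larger list of known weak passwords.
COMMON_PASSWORDS = {"123456", "password", "qwerty"}


def is_strong(password: str, min_length: int = 12) -> bool:
    """Heuristic strength check: not a known weak password, long enough,
    and drawing on at least three of four character classes."""
    if password in COMMON_PASSWORDS or len(password) < min_length:
        return False
    classes = [
        any(c.islower() for c in password),
        any(c.isupper() for c in password),
        any(c.isdigit() for c in password),
        any(c in string.punctuation for c in password),
    ]
    return sum(classes) >= 3


def reused_credentials(accounts: Dict[str, Tuple[str, str]]) -> List[List[str]]:
    """Group site names that share the same (email, password) pair."""
    groups = defaultdict(list)
    for site, credentials in accounts.items():
        groups[credentials].append(site)
    return sorted(sorted(sites) for sites in groups.values() if len(sites) > 1)


if __name__ == "__main__":
    accounts = {
        "fitness-app": ("user@example.org", "123456"),
        "web-shop": ("user@example.org", "123456"),
        "bank": ("user@example.org", "Tr4il-Runner-2021!"),
    }
    print(reused_credentials(accounts))
    print({site: is_strong(pw) for site, (_, pw) in accounts.items()})
```

A reused pair matters because, as in the third-party attacks described in Sect. 3, credentials leaked from one site can be tried against another.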

5.2 Cybersecurity Guidelines for Manufacturers

A number of guidelines which manufacturers can follow to ensure that wearable devices are secure are presented below.

Create Policies and Standards. Rigorous quality testing should be standard practice for all IoT device manufacturers. Implementing security techniques such as strong password protection on security-agnostic IoT devices could incur memory and cost overheads, but these practices are essential in securing the network [30]. Organisations can deter some cyber threats by creating appropriate use policies and user agreements for Fitbits and other wearables.


In addition, manufacturers can build into their devices a function that informs users of potential risks to their personal information when they log into the device application for the first time. Only after accepting the warning can the user log into the device. This way, manufacturers can promote general awareness among consumers and support a cybersecurity mindset for users. By understanding how hackers think, users can take measures to mitigate the potential risks of these devices.
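The first-login warning described above can be sketched as a simple gate in the device application. This is a hypothetical illustration, not an existing vendor feature: the class `DeviceAppSession`, its methods and the wording of `RISK_NOTICE` are assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical notice text shown on first login.
RISK_NOTICE = (
    "This device collects physical, location and usage data. "
    "Keep Bluetooth off when not in use and use a strong, unique password."
)


@dataclass
class DeviceAppSession:
    user_id: str
    notice_accepted: bool = False
    logged_in: bool = False

    def accept_notice(self) -> None:
        """Record that the user has read and accepted the risk notice."""
        self.notice_accepted = True

    def log_in(self) -> bool:
        """Allow login only after the risk notice has been accepted."""
        if not self.notice_accepted:
            return False
        self.logged_in = True
        return True


if __name__ == "__main__":
    session = DeviceAppSession(user_id="user-1")
    if not session.log_in():
        print(RISK_NOTICE)
        session.accept_notice()
    print(session.log_in())
```

Keeping the acceptance flag separate from the login state makes it possible to show the notice again later, for example after a privacy policy change.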

6 Discussion

As described above, new IoT devices such as fitness wearables pose new threats to users. A lack of security, privacy and data protection considerations around these devices poses serious risks, initially to the user of the device, with potential cascading effects of a security incident on a large number of devices belonging to that user and to others connected to that user. While the data security of fitness wearable devices is questionable [31], a research gap is recognised around security and privacy by design considerations for the IoT [32]. Despite these threats, manufacturers of IoT devices still do not provide consumers with enough information about the security features of the devices before they purchase the product. In addition, little information on user behaviour and good practice while using these devices is provided [33].

The General Data Protection Regulation (GDPR) [34] gives users rights over their personal data held by firms, specifically the right to be informed of how the data will be used or processed, the right to erasure of data and the right to object to the data being collected. Consumers should be given easy-to-understand security information in order to make correct choices when they shop for smart devices. Additionally, users should be given enough information regarding their rights around protecting their personal data after purchasing these devices. However, the outcome would still depend on the consumer and on the purchase decision they make based on their level of awareness and knowledge of cybersecurity.

Currently, device owners lack awareness of these vulnerabilities [35], which makes it difficult to address the security challenges of the IoT [36]. Approaches that try to educate or protect users while imposing security have proved ineffective [35]. Creating a culture of fear around potential threats is also problematic. Cybersecurity awareness-raising initiatives are necessary in order to increase baseline cybersecurity and develop the skills needed to ensure a safer cyberspace [37]. Users might not realise the risks associated with the use of smart devices [38]; this is why it is imperative for all users to gain a basic understanding of the potential harms associated with their use.

The security of the IoT is complex, and many stakeholders need to be involved, including the user. There is almost no limit to the variety of devices that can be IoT enabled via wireless connectivity. This is why it is imperative to adopt a proactive IoT-centric security posture, focusing on education and awareness for users.

7 Conclusion

The current research has provided a review of the security- and privacy-related risks associated with the use of wearable devices. Users might lack knowledge and awareness of these risks; therefore, more effort is needed from manufacturers of IoT devices to provide consumers with information on user behaviour and good practice while using these devices. Although the focus of this study has been fitness devices, future work will compare the risks among different wearable devices.

References 1. Radanliev, P., De Roure, D.C., Maple, C., Nurse, J.R., Nicolescu, R., Ani, U.: Cyber Risk in IoT Systems. Preprints. (2019) 2. Europol: Internet Organised Crime Threat Assessment (IOCTA). https:// www.europol.europa.eu/activities-services/main-reports/internet-organised-crime-threatassessment-iocta-2019 (2019). Accessed 15 Jan 2020 3. Cilliers, L.: Wearable devices in healthcare: privacy and information security issues. Health Inf. Manage. J. 49(2–3), 150–156 (2020) 4. Poongodi, T., Krishnamurthi, R., Indrakumari, R., Suresh, P., Balusamy, B.: Wearable devices and IoT. In: Balas, V.E., Solanki, V.K., Kumar, R., Ahad, M., Rahman, A. (eds.) A Handbook of Internet of Things in Biomedical and Cyber Physical System, pp. 245–273. Springer International Publishing, Cham (2020) 5. Fitbit: Fitbit Pay. https://www.fitbit.com/global/be/technology/fitbit-pay. Accessed 15 Jan 2020 6. Pocket-lint: What is Fitbit Pay, how does it work, and which banks support it? https:/ /www.pocket-lint.com/fitness-trackers/news/fitbit/142115-what-is-fitbit-pay-how-does-itwork-and-which-banks-support-it. Accessed 15 Jan 2020 7. Department of Culture Media and Sport: Code of Practice for consumer IoT security. https://www.gov.uk/government/publications/code-of-practice-for-consumer-iot-security/ code-of-practice-for-consumer-iot-security. Accessed 15 Jan 2020 8. Farnell, G., Barkley, J.: The effect of a wearable physical activity monitor (Fitbit One) on physical activity behaviour in women: a pilot study. J. Hum. Sport Exerc. 12(4), 1230–1237 (2017) 9. Blow, F., Yen-Hung (Frank), H., Hoppa, M.A.: A study on vulnerabilities and threats to wearable devices. J. Colloquium Inf. Syst. Secur. Educ. 7(1) (2020) 10. Kolamunna, H., Jagmohan, C., Hu, Y., Thilakarathna, K., Perino, D., Makaroff, D., Seneviratne, A.: Are wearables ready for secure and direct Internet communication? GetMobile Mobile Comput. Commun. 21, 5–10 (2017)


M. Bada and B. von Solms

11. Zhang, C., Shahriar, H., Riad, A.B.M.K.: Security and privacy analysis of wearable health device. In: IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC), Madrid, Spain, pp. 1767–1772 (2020) 12. Hackernoon: Million Fitbit accounts were exposed by cybercriminals. https://hackernoon.com/ 2-million-fitbit-accounts-was-exposed-by-cybercriminals-aa7u36pj. Accessed 15 Jan 2020 13. Stuhr, S.A.: Wearable devices and their impact on the security of personal information. Available from ProQuest Dissertations & Theses A&I. (2447022760). https:// ezp.lib.cam.ac.uk/login?url=https://www.proquest.com/dissertations-theses/wearable-devicestheir-impact-on-security/docview/2447022760/se-2?accountid=9851. Accessed 15 Jan 2020 14. Helpnetsecurity: Fitbit trackers can easily be infected with malware, and spread it on. https://www.helpnetsecurity.com/2015/10/22/fitbit-trackers-can-easily-be-infected-withmalware-and-spread-it-on/. Accessed 15 Jan 2020 15. Bay Computing: New Malware can infect your FitBit and spread to your computer. https://baymcp.com/new-malware-can-infect-your-fitbit-and-spread-to-your-computer/ #:~:text=Infecting%20a%20Fitbit%20via%20Bluetooth,or%20any%20other%20public %20area. Accessed 15 Jan 2020 16. Ching, K., Mahinderjit Singh, M.: Wearable technology devices security and privacy vulnerability analysis. Int. J. Netw. Secur. Appl. 8, 19–30 (2016) 17. Britt Cyr, W.H.: Retrieved from Security Analysis of Wearable Fitness Devices (Fitbit). https://www.semanticscholar.org/paper/Security-Analysis-of-Wearable-Fitness-Devices(-)-Cyr-Horn/f4abebef4e39791f358618294cd8d040d7024399. Accessed 15 Jan 2020 18. Lambert, L., Wiere, S.: Digit recognition from wrist movements and security concerns with smart wrist wearable IOT devices. In: Proceedings of the 53rd Hawaii International Conference on System Sciences, Hawaii International Conference on System Sciences (2020) 19. Gizmodo: Hackers can wirelessly upload malware to a Fitbit in 10 seconds. 
https:/ /gizmodo.com/hackers-can-wirelessly-upload-malware-to-a-fitbit-in-10-1737880606. Accessed 15 Jan 2020 20. Zanella, G., Guda, T.: Managing the gap between disruptive innovation and people’s perceptions: the case of wearable devices. Int. J. Technol. Intell. Plan. 12, 4 (2020) 21. Zeng, E., Roesner, F.: Understanding and improving security and privacy in multi-user smart homes: a design exploration and in-home user study. In: 28th {USENIX} Security Symposium ({USENIX} Security 19), pp. 159–176 (2019) 22. Anaya, L.S., Alsadoon, A., Costadopoulos, N., et al.: Ethical implications of user perceptions of wearable devices. Sci. Eng. Ethics. 24(1), 1–28 (2018) 23. Ogundele, O., Isabirye, N., Cilliers, L.: A model to provide health services to hypertensive patients through the use of mobile health technology. In: Conference Proceedings of African Conference of Information and Communication Technology, Cape Town, South Africa, 10–11 July (2018) 24. Piwek, L., Ellis, D.A., Andrews, S., Joinson, A.: The rise of consumer health wearables: promises and barriers. PLoS Med. 13(2) (2016) 25. Security Intelligence: Wearable IoT ransomware: locking down your life? https:/ /securityintelligence.com/news/wearable-iot-ransomware-locking-down-your-life/. Accessed 15 Jan 2020 26. World Economic Forum: 3 ways AI will change the nature of cyber-attacks. https:// www.weforum.org/agenda/2019/06/ai-is-powering-a-new-generation-of-cyberattack-its-alsoour-best-defence/. Accessed 15 Jan 2020 27. Dutton, W.H.: Fostering a cyber security mindset. Internet Policy Rev. 6(1) (2017) 28. Bada, M.: IoTs and the need for digital norms—a global or regional issue? GigaNet Annual Symposium, 2019 November 25, Berlin. https://www.giga-net.org/2019symposiumPapers/ 27_Bada_IoTs-and-the-need-for-digital-norms.pdf (2019). Accessed 15 Jan 2020 29. Centre for Economic Policy Research: Google/Fitbit will monetise health data and harm consumers. https://euagenda.eu/upload/publications/policyinsight107.pdf.pdf. 
Accessed 15 Jan 2020


30. Alladi, T., Chamola, V., Sikdar, B., Choo, K.R.: Consumer IoT: security vulnerability case studies and solutions. IEEE Cons. Electron. Mag. 9(2), 17–25 (2020) 31. Hilts, A., Parsons, C., Knockel, J.: Every step you fake: a comparative analysis of fitness tracker privacy and security. Technical Report, for public dissemination. Munk School of Global Affairs, University of Toronto: Open Effect/Citizen Lab, (2016). Accessed 15 Jan 2020 32. Bourgeois, J., Kortuem, G.: Towards responsible design with Internet of Things data. In: Proceedings of the Design Society: International Conference on Engineering Design, vol. 1(1), pp. 3421–3330 (2019) 33. Blythe, J.M., Sombatruang, N., Johnson, S.D.: What security features and crime prevention advice is communicated in consumer IoT device manuals and support pages? J. Cybersecur. 5(1) (2019) 34. European Union: Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the Protection of Natural Persons with Regard to the Processing of Personal Data and on the Free Movement of Such Data, and Repealing Directive 95/46/EC (General Data Protection Regulation). I. 119. https://tinyurl.com/h9qbbur eur-lex.europa.eu. Accessed 15 Jan 2020 35. Bada, M., Sasse, A.M., Nurse, J.R.C.: Cyber security awareness campaigns: why do they fail to change behaviour? In: International Conference on Cyber Security for Sustainable Society, CSSS, 2015, pp. 118–131 (2015) 36. Mannilthodi, N., Kannimoola, J.M.: Secure IoT: an improbable reality. In: IoTBDS, pp. 338– 343 (2017) 37. De Zan T.: Mind the gap: the cyber security skills shortage and public policy interventions. https://gcsec.org/wp-content/uploads/2019/02/cyber-ebook-definitivo.pdf. Accessed 15 Jan 2020 38. Houses of Parliament, Cyber Security of Consumer Devices. Number 593 February (2019)

An Efficient Algorithm for Human Abnormal Behaviour Detection Using Object Detection and Pose Estimation Vaishnavi Narang and Arun Solanki

1 Introduction

Human abnormal behaviour detection is an interdisciplinary task that combines various computer vision techniques, such as video processing, detecting humans within a specific frame, and recognizing the actions they perform. Human behaviour recognition is challenging because it depends on many factors, including the video’s resolution, the appearance, size and shape of the human within the frame, and the lighting in the video. Videos with more shadows make human behaviour harder to recognize. With recent progress in deep learning, machine learning algorithms [1–3] have evolved so much that they can compete with and even beat humans in some tasks, such as image classification on various datasets [4–6]. Image classification involves understanding the semantics of the image [7]. Researchers have already worked on numerous steps of human behaviour recognition. Past work has proposed techniques based on local representations [8–10], which determine local area characteristics, and on global representations [11–13], which depict the overall frame characteristics within a video. Some methods even combine local and global representations [14, 15] to improve efficiency and accuracy, drawing on the benefits of both. Human abnormal behaviour detection can strengthen security systems and can be used in any domain where security is an issue. The proposed model can be used in railway stations, metro stations and metros, since it can identify abnormal actions like pushing someone in a metro or on a platform, hitting

V. Narang · A. Solanki () Gautam Buddha University, Greater Noida, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Nayyar et al. (eds.), The Fifth International Conference on Safety and Security with IoT, EAI/Springer Innovations in Communication and Computing, https://doi.org/10.1007/978-3-030-94285-4_4


somebody or something, punching, etc. The model can distinguish between running, walking and standing. Running within bank premises can be treated as an abnormal action, as it may indicate a theft from the bank; deployed there, the model can alert the security guards whenever someone tries to run away. The technique can also be used for fall detection, especially in old age homes where many older people live together and falls may be frequent. This paper is about detecting abnormal behaviour by people in enclosed environments such as offices and colleges, using video clips. Past researchers have worked in this domain and used various techniques to solve this problem. Multiple techniques for recognizing human action have been discussed and reviewed in earlier work. One author identified the action using the Hidden Markov Model. Some authors have used 2D and 3D view invariants to understand the image’s semantics. In [16], the authors extracted low/mid-level features from the image to estimate pose on real-time datasets. A variant of CNN (three-stream CNN) has also been used to recognize the action [17]. In [18], the researchers proposed a novel SVM classifier to extract structural information on the KTH and UCF-Sports datasets. These techniques have several drawbacks. One is that the datasets need to be more realistic: KTH and UCF-Sports are restricted to basic actions like sitting, standing and running, whereas human beings perform more specific activities, like catching a ball while running, hitting someone or something while standing, or drinking while sitting. Another problem is that the existing models are slow, which makes them difficult to use in real time.
Human behaviour detection is only useful if it provides real-time results; a system that recognizes actions slowly cannot be deployed. The accuracy of recognizing abnormal behaviour can also be increased, for example by improving the detection of human beings within the frame, or the detection of their posture and its classification as normal or abnormal. The Fast R-CNN model is one of the best-suited models for object detection, for several reasons. Fast R-CNN allows objects to be detected in real time. Object detection involves the selective search algorithm, which first extracts the regions of interest. With a normal CNN model, a single image containing 2000 regions of interest must pass through the model 2000 times, so a dataset of 500 images requires 500 × 2000 = 1,000,000 forward passes through the same model. With Fast R-CNN, the image is passed to the model only once for any number of regions of interest. Also, training in CNN or R-CNN incurs losses from three stages: SVM classifier loss, log loss and L2 loss. Fast R-CNN involves a softmax classifier and L1 loss. SSD-Mobilenet is used for action recognition. The model is a Single Shot Detector, which means that multiple objects can be recognized in a single frame; this helps determine the action performed in the frame in a single shot. It contains multi-scale feature maps to detect the action and finds the result by calculating the weighted sum of localization loss (localizing the frame in which the


action is performed from the image) and confidence loss (loss from the softmax layer). These two combine to recognize the action being performed in the image. In the proposed work, object detection techniques combined with pose estimation are used to recognize abnormal behaviour in human beings. Pose estimation also works by learning the semantics of the image and extracting the frame’s required features. The novelty of the proposed work is to combine the Fast R-CNN model [19] with the SSD-Mobilenet architecture to automate abnormal behaviour recognition. The major contributions of this paper are:
• The proposed method detects objects using Fast R-CNN and recognizes the pose made by humans using SSD in the video, based on TensorFlow Object Detection. This increases the speed of the system so that it can be used in real time. The Fast R-CNN model provides better results in less time.
• The paper presents a technique that improves the accuracy of the model on more realistic datasets. The proposed work is evaluated on the HMDB dataset, which is more natural. Training the model on such datasets also improves the accuracy of the system in real time.
• The paper also compares the results as the number of epochs (training period) increases. This is done to reduce the chances of over-training and thus obtain the most accurate results.
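The weighted sum of localization loss and confidence loss mentioned in the introduction can be illustrated with a small sketch. This is a simplified reading, not the authors’ implementation: it assumes a smooth L1 box loss and a softmax cross-entropy confidence loss, averages over matched boxes only, and omits the negative-example handling that a full SSD training loss includes. The function names, the `alpha` weight and the dictionary keys are hypothetical.

```python
import math

def smooth_l1(pred_box, true_box):
    """Smooth L1 distance between two boxes given as coordinate lists."""
    total = 0.0
    for p, t in zip(pred_box, true_box):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total

def softmax_cross_entropy(logits, true_class):
    """Negative log-probability of the true class under a softmax."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[true_class]

def ssd_loss(matched_boxes, alpha=1.0):
    """Weighted sum of confidence and localization loss, averaged over matches.

    alpha (assumed default 1.0) weights the localization term.
    """
    n = len(matched_boxes)
    if n == 0:
        return 0.0
    loc = sum(smooth_l1(b["pred_box"], b["true_box"]) for b in matched_boxes)
    conf = sum(softmax_cross_entropy(b["logits"], b["label"]) for b in matched_boxes)
    return (conf + alpha * loc) / n

# One matched box with a perfect box prediction and uninformative class scores.
example = {"pred_box": [0.1, 0.1, 0.5, 0.5], "true_box": [0.1, 0.1, 0.5, 0.5],
           "logits": [0.0, 0.0], "label": 0}
loss_value = ssd_loss([example])
```

In this toy case the localization term is zero, so the whole loss comes from the confidence term.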

2 Organization of Paper

The remaining sections of this paper are organized as follows. Section 3 discusses the work done by past researchers in a similar domain. Section 4 provides the details and implementation of the proposed work. The experimental results and outcomes are discussed in Sect. 5, and Sect. 6 concludes the paper with future scope.

3 Literature Review

Different approaches have been explored to find a human behaviour recognition algorithm that is accurate and simple enough to be used in real time. Based on their physical feature representations, behaviour recognition systems can be categorized into interest-point-based representations [5, 6, 10], shape and appearance representations [12, 15], and optical-flow-based representations [19, 20]. Yan Hu [21] proposed an algorithm that uses aggregated feature models to process images and extract the objects of interest. The extracted objects are then examined for salient features. These features are fed into a least-squares SVM classifier to detect the behaviour and classify it as normal or abnormal. The system is made real time with the help of the Hi353I embedded chip.


Popoola et al. [22] reviewed previous related surveys, covering how to understand the semantics of a scene and draw the correct inference from the observed scenario, which may be dynamic. The paper focused on human abnormal behaviour recognition in video surveillance in a restricted context. The aim of the survey was to identify existing methods and tools and determine their drawbacks. Video quality, occlusion, shadow, illumination, camera movement and varying backgrounds all contribute to the challenges. The algorithm was designed to overcome these challenges and determine abnormal behaviour even in a new scenario.

Wang et al. [23] proposed a method for detecting and recognizing abnormal events. The method used an SVM classifier that trains on a minimal number of frames/images, with a couple of nonlinear one-class classifiers to detect human behaviour. Ahmed et al. [24] proposed a method that analyses dynamic images for abnormal behaviour detection. In this research, a CNN (Convolutional Neural Network) was used to detect abnormal behaviour, making it efficient. The system detected humans’ abnormal behaviour in basic/normal scenarios with an accuracy of up to 98%. It could also detect movement in the dynamic images and recognize the differences, so it could be used as a security system.

Hueihan J. et al. [16] provided a strategic method that offers insights into improving accuracy and producing better results on real-time datasets. The model is trained using annotated data of human actions. The dataset used is the Joints for HMDB dataset (also called J-HMDB). The work also suggests that high-level pose estimation provides better results than low/mid-level pose estimation. Nasim K. et al. [17] proposed a 3D CNN, i.e., a three-dimensional convolutional network, to increase the accuracy of determining human movement in an environment. The method uses a type of convolutional network called three-stream CNN. It is based on 2-dimensional and 3-dimensional kernels and is applied to models pre-trained on datasets like ImageNet and Kinetics. The dataset used is HMDB, and the work reached an accuracy of up to 80.92%.

Nicolas B. et al. [18] addressed human behaviour recognition in unconstrained videos using a new method that is robust to global space-time transformations. The structural information is determined using an iterative weighted Support Vector Machine (WSVM), together with a new optimization technique for solving the non-smooth WSVM function. The datasets used to train the model are KTH, UCF50 and HMDB. The accuracy on HMDB was 51.8%, which beats various classic algorithms [18]. Xin C. et al. [25] proposed a new method to detect human action, MTCNN (multi-task CNN model), which can detect changes in the frames both spatially and temporally. The datasets used are UCF101-24 and J-HMDB-21. Tasweer A. et al. [26] proposed deep learning approaches to recognize human action. The researchers used path signature theory, which can efficiently analyse temporal sequences. All the path signature features are passed to a CNN to produce a recognition outcome. The results are calculated on three datasets: HMDB-51, UCF-101 and J-HMDB.


Jianan L. et al. [27] addressed the problem of detecting people walking on roads in real-time scenarios. The proposed method is a CNN model built as a scale-aware framework. The datasets used in this work are INRIA, ETH and KITTI. Various methods have been discussed in this section. Some of the methods used by past researchers are not very accurate. The system’s accuracy can be increased either by using a new technique or by combining the benefits of existing techniques. Some researchers have used a restricted dataset like UCF-Sports or KTH. Models trained on such datasets may lose considerable accuracy in real time, where the actions are more specific. Some CNN-based techniques are time-consuming, which prevents them from being used in real time. The system must provide accurate results in little time.

4 Proposed Work

The proposed work builds on present techniques and developments in video surveillance systems. It focuses on the utility and challenges of automating visual surveillance and detecting abnormal behaviour, hostile intent and similar malicious activities. The proposed method, Human Abnormal Behaviour Detection using SSD-Mobilenet and Fast R-CNN, is discussed in this section.

4.1 Working and Operation

The proposed model combines two models. The first is Fast R-CNN, which is used for object detection, i.e., detecting a human being or the objects in the human’s environment. The architecture of this model is shown in Fig. 1. This model provides an efficient way to detect multiple objects in a single frame. In this process, the pre-trained model Fast R-CNN_inception_v2_coco provides areas of interest within the whole frame. It extracts features from the image and produces a feature map: the image is passed through several CNN layers, which are fine-tuned to find Regions of Interest (RoI). This feature map is then fed through another convolution network that contains 3 × 3 filters with padding 1 and 512 output channels. The output is connected to a 1 × 1 convolution layer that classifies whether the feature map contains a region of interest or not. The final output is then flattened and passed through the softmax function, which assigns the RoI to a category [28]. As a result, this model classifies the images into 11 categories. The Fast R-CNN model is faster than other classical CNN models because it is trained only on these outputs instead of on the whole image. The other model which is used


Fig. 1 Basic architecture of Fast R-CNN model

Fig. 2 Complete architecture of the proposed model

is SSD-Mobilenet, which is used as a pose estimator. This model estimates the action taken in the image provided by the first model, so the output of the first model serves as the input for this model. The SSD-Mobilenet model is one of the best models for pose prediction because one shot is enough to detect multiple objects. The basic architecture of the complete model is shown in Fig. 2.
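The hand-off between the two models, where the detector’s output regions become the input of the action model, might be sketched as follows. This is only an illustration of the data flow under stated assumptions: `detect_people` and `classify_action` are hypothetical callables standing in for the Fast R-CNN and SSD-Mobilenet stages, a frame is represented as a nested list of pixel values, and the `min_score` threshold is an assumed parameter that the paper does not specify.

```python
def crop(frame, box):
    """Cut the region (x0, y0, x1, y1) out of a frame stored as rows of pixels."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in frame[y0:y1]]

def detect_then_classify(frame, detect_people, classify_action, min_score=0.5):
    """Run the detector, then classify the action inside each confident box."""
    results = []
    for box, score in detect_people(frame):
        if score < min_score:
            continue  # skip low-confidence detections
        action, confidence = classify_action(crop(frame, box))
        results.append({"box": box, "action": action, "confidence": confidence})
    return results

# Stand-in models so the sketch runs without any trained network.
def fake_detector(frame):
    return [((0, 0, 2, 2), 0.9), ((1, 1, 3, 3), 0.3)]

def fake_classifier(patch):
    return ("clap", 0.85)

frame = [[0] * 4 for _ in range(4)]
detections = detect_then_classify(frame, fake_detector, fake_classifier)
```

Passing the two stages in as callables keeps the sketch independent of any particular framework; real models would replace the two stand-in functions.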


4.2 Flowchart of Proposed Work

The overall functionality of the research work includes the preprocessing, training and testing of the dataset. The workflow involves object detection, human pose detection, action detection and, finally, classifying the actions as normal or abnormal. The flowchart of the proposed work is depicted in Fig. 3. The main steps in the proposed work are:

Step 1: Extracting images from videos: The dataset contains multiple videos for each category. The first step is to extract images from the videos. This is done using a Python script that captures and saves frames at a regular interval. The frames are then

Fig. 3 Flowchart of proposed work


selected manually, keeping those that best suit the training purpose. These images act as input data for further processing.

Step 2: Labelling the images and converting them to .xml files: The next step involves labelling the images. A single image may contain multiple actions, and each of those actions must be labelled. A separate script is used to label an image by manually drawing boxes around the areas of interest and then labelling each box according to the action performed. Once the image is labelled, an .xml file is created for each image, to be further converted into a .csv file.

Step 3: Dividing the images for training and testing: The dataset is then divided in a ratio of 90:10 and placed into separate folders. These images are used separately for training and testing.

Step 4: Training the model for normal and abnormal actions: The data is now ready for training using the model described in Sect. 4.1. The training setup must also specify which actions are considered normal and which are abnormal within the given domain. In this work, 11 actions are taken into consideration, of which [“hug”, “drink”, “clap”, “catch”, “wave”] are considered normal behaviour and [“handstand”, “hit”, “fall_floor”, “punch”, “shoot_gun”, “sword”] are considered abnormal.

Step 5: Testing the model: Once training is complete, the proposed work is tested using a webcam while different actions are performed. It shows differently coloured boxes for normal and abnormal behaviour.

Step 6: Final detection result: The result is then evaluated on parameters such as precision and accuracy.
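Steps 3 and 4 above can be sketched in a few lines. This is a minimal illustration rather than the authors’ script: the 90:10 ratio and the two label lists come from the text, while the function names, the fixed random seed and the example file names are assumptions made for the sketch.

```python
import random

# Label groups as listed in Step 4.
NORMAL_ACTIONS = {"hug", "drink", "clap", "catch", "wave"}
ABNORMAL_ACTIONS = {"handstand", "hit", "fall_floor", "punch", "shoot_gun", "sword"}

def behaviour_of(action):
    """Map one of the 11 action labels to 'normal' or 'abnormal'."""
    if action in NORMAL_ACTIONS:
        return "normal"
    if action in ABNORMAL_ACTIONS:
        return "abnormal"
    raise ValueError(f"unknown action label: {action!r}")

def split_train_test(items, train_fraction=0.9, seed=0):
    """Shuffle a copy of the items and split it 90:10 by default."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical file names standing in for the extracted frames.
image_names = [f"frame_{i:03d}.jpg" for i in range(100)]
train_images, test_images = split_train_test(image_names)
```

Seeding the shuffle keeps the train and test folders stable between runs, which makes later comparisons across training lengths easier to interpret.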

5 Experimentation and Results

5.1 Dataset

In this work, the HMDB dataset is selected. This dataset is collected from multiple sources like movies, YouTube and Google videos. The dataset contains 6849 video clips depicting the actions (see Figs. 4 and 5). These actions are divided into 51 categories, and each category includes a minimum of 101 video clips with an average length of 3 s each. These 51 action categories can be broadly classified as follows:
• General facial actions like smiling, laughing, chewing and talking.
• Facial actions using some objects like smoking a cigar, eating and drinking.


Fig. 4 Illustration of HMDB Dataset

Fig. 5 Illustration of HMDB Dataset


• General body movements like a cartwheel, clapping, climbing stairs, diving, falling on the floor, doing a handstand, jumping, pull-ups, push-ups, running, sitting, sit-ups, somersault, stand up, turning, walking and waving.
• Body movements with object manipulation like brushing hair with a comb, catching a ball, drawing a sword, dribbling, playing golf, hitting something, kicking a ball, picking something, pushing something, riding a bike, riding a horse, shooting a ball, shooting a bow, shooting a gun, swinging a baseball bat, sword exercise and throwing.
• Body movements among human beings like fencing, hugging, kicking someone, kissing, punching, shaking hands and sword fighting.

The model was trained for 50,000 steps; the following results were obtained from the training process.

5.2 Tabular Representation

Table 1 depicts the change in training loss as the number of epochs increased.

Table 1 Training loss at different epochs

Epochs    Learning rate    Time (s/step)    Train loss    Validation loss decreased
1000      1e−5             2.982            0.7025        Inf → 0.7025
2000      1e−5             2.645            1.5396        0.7025 → 1.5396
3000      1e−5             2.637            0.5708        1.5396 → 0.5708
4000      1e−5             2.634            0.2349        0.5708 → 0.2349
5000      1e−5             2.632            0.1843        0.2349 → 0.1843
10,000    1e−5             2.627            0.3843        0.1843 → 0.3843
15,000    1e−5             2.531            0.2542        0.3843 → 0.2542
20,000    1e−5             2.597            0.1836        0.2542 → 0.1836
25,000    1e−5             2.602            2.4513        0.1836 → 2.4513
30,000    1e−5             2.683            0.1765        2.4513 → 0.1765
35,000    1e−5             2.632            0.1452        0.1765 → 0.1452
40,000    1e−5             3.421            0.1132        0.1452 → 0.1132
45,000    1e−5             2.932            0.0923        0.1132 → 0.0923
46,000    1e−5             3.578            0.1265        0.0923 → 0.1265
47,000    1e−5             3.083            0.1306        0.1265 → 0.1306
48,000    1e−5             3.221            0.1979        0.1306 → 0.1979
49,000    1e−5             2.523            0.0464        0.1979 → 0.0464
50,000    1e−5             2.726            0.1139        0.0464 → 0.1139
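The logged values in Table 1 can be inspected programmatically, for example to see at which checkpoints the loss rose instead of falling and which checkpoint recorded the lowest loss. The sketch below copies the training-loss column from Table 1; the helper names are hypothetical, and choosing a checkpoint by lowest logged training loss is only one possible heuristic, not something the paper states it did.

```python
# (epoch/step, train loss) pairs copied from Table 1.
TRAIN_LOSS = [
    (1000, 0.7025), (2000, 1.5396), (3000, 0.5708), (4000, 0.2349),
    (5000, 0.1843), (10000, 0.3843), (15000, 0.2542), (20000, 0.1836),
    (25000, 2.4513), (30000, 0.1765), (35000, 0.1452), (40000, 0.1132),
    (45000, 0.0923), (46000, 0.1265), (47000, 0.1306), (48000, 0.1979),
    (49000, 0.0464), (50000, 0.1139),
]

def best_checkpoint(log):
    """Return the (step, loss) pair with the lowest logged loss."""
    return min(log, key=lambda row: row[1])

def steps_where_loss_rose(log):
    """List the steps whose loss is higher than at the previous logged step."""
    return [cur[0] for prev, cur in zip(log, log[1:]) if cur[1] > prev[1]]

best = best_checkpoint(TRAIN_LOSS)
rises = steps_where_loss_rose(TRAIN_LOSS)
```

Listing the rises makes the fluctuation in the last column of the table explicit, since several of its rows record an increase rather than a decrease.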


5.3 Graphical Representation

The localization loss is calculated for the model (see Fig. 6). The localization loss measures how well the model localizes the object within the frame, i.e. detects humans within the given image. It can be regarded as the region proposal network loss. In Fig. 6, the average loss is less than 0.05, which means that if a human is present in a frame, the model will most likely detect them. The other graph, shown in Fig. 7, is the classification loss. The classification loss relates to classifying different objects within the frame and inferring whether a human is present within the image/frame or not; in this sense it acts as a binary classifier. The graph in Fig. 7 shows a rapid decline in the loss as the number of epochs increases. After 25,000 steps, the loss was mostly less than 0.05. Figure 8 represents the total loss, which summarizes the overall result of the network. In the graph shown in Fig. 8, the average loss is around 0.2. The proposed work aims at producing results better than the existing algorithms presented so far.

Fig. 6 Localization loss

Fig. 7 Classification loss


Fig. 8 Total loss

Fig. 9 Normal action (CLAP)

5.4 Displaying the Normal Classes as Screenshots

The classes taken as normal are “Hug”, “Clap”, “Wave”, “Catch” and “Drink” (see Figs. 9, 10, 11 and 12). Figure 9 depicts the normal action “CLAP”, which the system correctly recognizes as “CLAP”. Figure 10 depicts the normal action “WAVE”, which the system recognizes as “WAVE”. Figure 11 shows that the system recognizes the action as “DRINK”, which is correct. Similarly, in Fig. 12, the system correctly recognizes the action as “CATCH”.


Fig. 10 Normal class (WAVE)

Fig. 11 Normal class (DRINK)

5.5 Displaying the Abnormal Classes as Screenshots

The classes taken as abnormal are “Hit”, “Fall_floor”, “Sword”, “Handstand”, “Shoot_gun” and “Punch” (see Figs. 13, 14, 15 and 16). Figure 13 depicts the abnormal action “Handstand”, which is correctly recognized by the system.


Fig. 12 Normal class (CATCH)

Fig. 13 Abnormal action (HANDSTAND)

Similarly, Figs. 14 and 15 depict the abnormal actions “HIT” and “FALL FLOOR”, which the proposed system also identifies correctly. Figure 16 depicts the abnormal action “PUNCH”, but the system recognizes it as “SHOOT GUN”.


Fig. 14 Abnormal action (HIT)

Fig. 15 Abnormal action (FALL FLOOR)

5.6 Analysis

This work proposed a model that is fast enough to provide results in almost real time. The Fast R-CNN model is used for human detection within a frame because, according to the study, it gives results with an accuracy of 70.4% and is nearly 213 times faster than regular CNN and 9 times faster than R-CNN. Thus, this approach


Fig. 16 Abnormal action (PUNCH)

provides accurate results in minimal time. The localization loss was less than 0.05 on average, although it varied significantly across epochs: it peaked at 0.11 at epoch 49,560 and fell to 0.04 at epoch 50,000. The classification loss decreased monotonically and reached an average of 0.05. After training, the total loss was around 0.2. When tested on real-time videos, the system correctly identified almost all the actions; the action it could not recognize was “punch”. The confidence level of the action recognition was above 82% in all categories.
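The precision and accuracy measures named in Step 6 can be computed at two levels: per action label and per normal/abnormal decision. The sketch below uses a toy list with one invented sample per class and the single confusion reported in the text (“punch” recognized as “shoot_gun”); these samples are illustrative only and are not the paper’s test data or results. All function names are hypothetical.

```python
NORMAL = {"hug", "drink", "clap", "catch", "wave"}

def behaviour(action):
    """Collapse an action label into the normal/abnormal decision."""
    return "normal" if action in NORMAL else "abnormal"

def accuracy(truth, predicted):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)

def precision(truth, predicted, positive):
    """Of the samples predicted as `positive`, the fraction that truly are."""
    hits = [t for t, p in zip(truth, predicted) if p == positive]
    if not hits:
        return 0.0
    return sum(t == positive for t in hits) / len(hits)

# Toy data: one made-up sample per class, with punch mistaken for shoot_gun.
true_actions = ["hug", "drink", "clap", "catch", "wave", "handstand",
                "hit", "fall_floor", "punch", "shoot_gun", "sword"]
pred_actions = list(true_actions)
pred_actions[true_actions.index("punch")] = "shoot_gun"

action_accuracy = accuracy(true_actions, pred_actions)
true_behaviour = [behaviour(a) for a in true_actions]
pred_behaviour = [behaviour(a) for a in pred_actions]
behaviour_accuracy = accuracy(true_behaviour, pred_behaviour)
abnormal_precision = precision(true_behaviour, pred_behaviour, "abnormal")
```

In this toy setting the action-level accuracy drops because of the confusion, while the normal/abnormal decision is unaffected, since both “punch” and “shoot_gun” belong to the abnormal group.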

6 Conclusion and Future Scope

This paper proposes a novel method to detect abnormal human behaviour based on object detection and pose estimation. Object detection is used for locating human beings within the video frame and is done using a model architecture known as Fast R-CNN. The next step involves recognizing the actions of the human beings located within the frame. This is done using the SSD-Mobilenet architecture. The combined model is trained on the HMDB dataset, a large human motion database that contains 51 categories of actions. Eleven categories are taken from this dataset to train the model: “Catch”, “Clap”, “Hit”, “HandStand”, “Drink”, “Sword”, “Shoot gun”, “Fall floor”, “Wave”, “Punch” and “Hug”. The model is trained for 50,000 epochs. The total training loss of the combined model is 0.2. This work also discusses the accuracy, the training error or loss, and the impact of increasing or decreasing the number of epochs on the proposed model with its

result. The proposed work is limited to a specific domain but can readily be retrained for other environments or real-time scenarios. The model proved to be better than the state-of-the-art approach. Future work can extend the approach to better understand abnormal behaviour by detecting abnormal objects in the videos or images, such as knives, guns, wands and swords. It can also be extended to analyse sentiments or expressions in the video, which should improve abnormal behaviour detection: a person with an angry expression holding a wand is abnormal, as that person might hit other people nearby.


A Secure and Scalable IoT Consensus Protocol Beverley A. MacKenzie, Ian Ferguson, and Abdul Razaq

1 Introduction

In a few short years the Internet of Things (IoT) has become an intrinsic part of life, creating a world in which computers are ambient technologies that are always on and always available: technologies that respond to demands instinctively while remaining inconspicuously in the background, and that are already having a positive effect on the human-computer experience [1]. However, infrastructure design and the application of different protocols have been inconsistent [2]. The IoT ecosystem is therefore filled with incompatible technologies, devices and protocols that are plagued by scalability and security issues [3]. Despite this, IoT-connected devices are being included in homes, cars, medical equipment, children’s toys and doorbells. In addition, the IoT ecosystem is now having an adverse impact on the Internet, with miscreants using IoT devices to orchestrate denial-of-service (DoS) and distributed denial-of-service (DDoS) attacks [4, 5]. A considerable body of research recognises that IoT’s heterogeneous mesh of network devices and protocols has created a unique set of risks and problems that will affect most households [6, 7], from breaches in confidentiality, which could allow users to be snooped on, through to failures in integrity, which could lead to consumer data being compromised [5, 7–9]. IoT devices present many security challenges against which consumers are ill equipped to protect themselves [10]. Vulnerabilities arise from a range of factors including unsafe networks, infected mesh devices, poor password protection and data being transmitted in clear text [4,

B. A. MacKenzie · I. Ferguson · A. Razaq
Division of Cyber Security, Abertay University, Dundee, UK

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
A. Nayyar et al. (eds.), The Fifth International Conference on Safety and Security with IoT, EAI/Springer Innovations in Communication and Computing, https://doi.org/10.1007/978-3-030-94285-4_5

11]. The potential impact of these insecurities was demonstrated by the Mirai and Persirai botnet attacks, which progressed via IoT network devices [12]. Owing to poor security, miscreants were able to use IoT devices to instigate DDoS attacks on Dyn, a DNS service provider, and on GitHub, Twitter, Reddit, Netflix and Airbnb [12]. Even though the infrastructure has been shown to suffer from security and scalability issues that are now compromising the internet [4, 13], new IoT devices are continually being rolled out [12, 14, 15], and IoT technology is being embedded in many everyday items. The IoT environment has therefore been described as a poorly protected, hostile network where data may be snooped upon and exploited [7, 16, 17].

It has been suggested that blockchain technology may resolve these issues [18–21]. Because of blockchain’s security characteristics and its ability to transfer data securely across a distributed network, blockchain could be capable of meeting IoT’s security and safety requirements [20]. To achieve this, however, blockchain will need to resolve its own security, scalability, safety and liveness issues [8, 10], a challenge that is at the heart of blockchain breaches [19]. Moreover, if blockchain technology is to be used successfully in the IoT environment, an applicable blockchain consensus protocol will need to be identified [22]: one that can deal with billions of IoT requests [10].

In this paper, a new consensus algorithm is presented. The algorithm first uses randomisation to identify its lead node, a process that reduces scalability problems. Next, it uses error detection to achieve safety and to prevent double spend or fraudulent spend. Finally, the random selection of the lead cell, together with partial synchronisation between validating and authenticating nodes, enables the protocol to achieve liveness. Security is achieved by applying cryptographic primitives to provide non-repudiation, integrity, immutability and confidentiality [10]. Under this model, cells that deviate arbitrarily from normal behaviour are identified by an error checking algorithm. The division of data into two channels provides separation of duties [23]. This yields a robust security mechanism that can only be compromised when each node is individually attacked. Moreover, the inclusion of synchronised time provides a mechanism that protects a cell from being compromised by a man-in-the-middle attack [24–29]. Finally, the model includes cryptographic services that provide data origin authentication, along with encryption of data in transit and data at rest.

This paper is structured as follows: Sect. 2 presents the background; Sect. 3 describes the Balance Authentication Mechanism; Sect. 4 explains the consensus process; Sect. 5 addresses scalability; and Sect. 6 concludes the paper and outlines future work.

2 Background

2.1 A Blockchain Consensus Algorithm

Blockchain is the technology behind bitcoin, which was originally described as an electronic currency [30] and has gained acceptance over the past decade. The technology can store digital information securely and safely: data held in a blockchain is protected from alteration and snooping while it is transmitted across networks. At the heart of the technology is a consensus algorithm, which nodes use to reach agreement on the validity of a transaction. A variety of studies suggest that a peer-to-peer blockchain consensus protocol developed for the IoT environment could resolve its security and scalability issues [31]. Because of blockchain’s security and scalability properties, it has been postulated that blockchain smart contracts (BC smart contracts) could provide secure transportation of IoT traffic [20]. A blockchain consensus algorithm capable of being used in the IoT ecosystem has therefore generated a lot of interest, particularly one that could operate in a decentralised, data-intensive network. If such a consensus protocol could facilitate the transmission of billions of bits of information, it would also be capable of operating within a data-intensive IoT environment [22]. This is because of a blockchain consensus protocol’s ability to provide [10]:

• Pseudo-anonymity
• Confidentiality
• Authenticity
• Immutability
• Interoperability
• Scalability
• Privacy
• Non-repudiation
• Data integrity

Before these technologies can be successfully integrated, however, the correct blockchain consensus protocol will need to be identified.

2.2 The Consensus Problem in Context

In 1982, Lamport et al. [28] introduced the network consensus problem in the paper ‘The Byzantine Generals Problem’, which discusses the issues that affect consensus. However, it was Fischer et al. [26] who detailed the difficulty of achieving consensus in a distributed network if just one node fails. That paper showed that when an unbounded time network is faced with a single node

failure, safety and liveness cannot be guaranteed (safety: all nodes agree on the authenticity of an IoT request; liveness: all nodes responsible for consensus take part in the process [26]). This result is known as the FLP impossibility problem. In 1994, Chandra et al. [24] identified how the restrictions it lays down could be circumvented. Chandra et al. [25] proposed four methods for doing so:

• Randomisation
• Weak failure detection
• Weak problem and solution
• Model of partial synchronisation

Moreover, two of the mechanisms identified by Chandra et al. [25] have already been used in well-established consensus protocols. The Practical Byzantine Fault Tolerance (PBFT) consensus protocol of Castro et al. [46] uses partial synchronisation. (However, because of its quadratic message authentication requirement, this protocol has scalability issues when used in a distributed environment.) Nakamoto’s 2008 blockchain Proof of Work (PoW) consensus uses randomisation in its lead node identification [30]. (However, PoW’s high use of resources renders it impractical for an IoT environment.) Over the last decade, several other protocols have been presented as potential blockchain consensus protocols. However, none has provided a complete solution to the security, scalability, safety and liveness issues that affect consensus when it takes place in an asynchronous environment that is not subject to bounded time restrictions [27, 30, 32–39]. The consensus protocol presented in this paper circumvents the FLP restrictions by implementing randomisation, error detection and partial synchronisation.
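The scalability contrast between quadratic and single-message exchange can be made concrete with a rough message count. The sketch below is a simplification and not a model of PBFT or BAM as specified: it assumes one all-to-all round among n nodes for the quadratic case and one broadcast from a single sender for the linear case. Both function names are illustrative.

```python
def all_to_all_messages(n: int) -> int:
    """Messages in one round where every node sends to every other node."""
    return n * (n - 1)


def single_broadcast_messages(n: int) -> int:
    """Messages in one round where one node sends to every other node."""
    return n - 1


# Compare the two growth rates for a few network sizes.
for n in (4, 100, 1000):
    print(n, all_to_all_messages(n), single_broadcast_messages(n))
```

Under these assumptions, growing the network from 100 to 1000 nodes multiplies the all-to-all count by roughly 100, while the broadcast count grows roughly tenfold.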

3 The Balance Authentication Mechanism

This section presents a new consensus algorithm and explains how it achieves security and scalability. The model combines a consensus protocol with a blockchain environment. The Balance Authentication Mechanism (BAM) uses probabilistic randomness to choose the lead node, and timing synchronisation for the communication between nodes in a cell [24–26]. An isomorphic balance equation and an error algorithm are used to ensure liveness and safety. Security is achieved by applying cryptographic primitives to provide non-repudiation, integrity, immutability and confidentiality [10]. Under this model, cells that deviate arbitrarily from normal behaviour are identified by an error checking algorithm. The division

of data into two channels provides separation of duties [23], giving a robust security mechanism that can only be compromised when each node is individually attacked [23]. Moreover, the synchronised time bound between Acell and Fcell means that both nodes must be compromised if the cell is to be compromised (a requirement in line with a double-entry validation process [40, 41]). Both must be compromised with isomorphic data within the bounded time window, a requirement that protects against dominance attacks: to gain control of the consensus process, a miscreant must gain control of all Acells and Fcells, a task with an attack surface area of $c^2$, where $c$ is the number of cells. Alternatively, a miscreant could attempt to gain control of the lead node, which would first require identifying it. Because lead node identification is based on randomness (Sect. 4.1), the probability of identification is 0.05 in an environment with only 2 cells and 0.001 in an environment with 100 cells. The model also includes cryptographic primitives that provide data origin authentication, along with encryption of data in transit and data at rest.

4 The Consensus Process

BAM’s distributed network is broken down into cells (see Fig. 1). Each cell contains four nodes, and communication between the nodes in a cell is subject to synchronised time restrictions. The processor node is responsible for authenticating data and data origin. Data is then separated into two channels and broadcast to the other members of the cell, which are responsible for data authentication, validation, verification and consensus. The use of randomness ensures that the Byzantine failure of a cell or a node has no impact on the operation of the algorithm [30].

4.1 Choosing a Lead Cell

The Balance Authentication lead cell (BA lead cell) is responsible for authenticating and validating each IoT request. The lead cell is chosen when a cell announces that it has authenticated and validated an IoT request. However, a second cell must also announce authentication and validation before the data achieves consensus. If a subsequent cell rejects the authentication and validation of an IoT request, the request is suspended until the error checking mechanism identifies which cell is in an error state. As in proof of work, BA lead cell identification depends on the cell’s location and its transactional speed, which act as a probabilistic random factor in an asynchronous network where transactional speed is constant [30].
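The selection rule described in this section, where the first cell to announce becomes the lead and a second announcement is needed before consensus, can be sketched as follows. This is an illustration under stated assumptions and not the authors' implementation: the random delay stands in for the effect of network location and transaction speed, and `choose_lead_and_confirmer` and the cell names are hypothetical.

```python
import random


def choose_lead_and_confirmer(cell_ids, rng):
    """Return (lead, confirmer): the first two cells to announce.

    Each cell's announcement delay is drawn at random, standing in for
    network location and transaction speed (an assumption of this sketch).
    """
    if len(cell_ids) < 2:
        raise ValueError("at least two cells are needed for consensus")
    delays = {cell: rng.random() for cell in cell_ids}
    ordered = sorted(cell_ids, key=lambda cell: delays[cell])
    return ordered[0], ordered[1]


rng = random.Random(7)  # fixed seed so the example is repeatable
cells = ["cell-1", "cell-2", "cell-3", "cell-4"]
lead, confirmer = choose_lead_and_confirmer(cells, rng)
print(lead, confirmer)
```

Because no cell knows the delays in advance, an observer cannot predict which cell will lead a given request, which is the property the randomisation is meant to provide.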

[Figure: diagram with the labels IoT Request, Processor, Transaction Data, Financial Data, Authentication and Validation for each of Cell 1 and Cell 2, leading to Consensus]

Fig. 1 The Consensus Process

4.2 An IoT Request

IoT requests are placed in blocks, and each block contains a single transaction. The block consists of a request header and a request body. The header contains the nonce, date, digital signature, transaction ID, balancing figure and code; the body contains the IoT instructions. Consensus is based on the Boolean evaluation of each part of the header data, such that consensus is:

$$\left(V_{cell_1}\left(F_{cell_1} \land A_{cell_1}\right)\right) \land \left(V_{cell_2}\left(F_{cell_2} \land A_{cell_2}\right)\right)$$

The data contained in each IoT request is ordered and separated into two parts: financial data and transactional data. These are transmitted along two separate and independent channels, a financial channel and a transaction channel, for authentication. Financial authentication is therefore:

$$a \rightarrow b$$

Transaction authentication is:

$$d \rightarrow c$$

This process complies with the principles of separation of duties and balance authentication [23, 41]. Each channel is responsible for authenticating IoT header data against ledger data, financial data and smart contract data, using a Boolean checking algorithm.
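The request layout and the Boolean consensus expression of this section can be sketched as a small data model. The field names follow the header contents listed above, but the class names, the dictionary layout of the two blocks and the reading of the expression as "validation holds only if both channels authenticate" are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Header:
    nonce: str
    date: str
    signature: str
    transaction_id: str
    balancing_figure: int
    code: str


@dataclass(frozen=True)
class IoTRequest:
    header: Header
    financial: dict      # data for the financial channel
    transactional: dict  # data for the transaction channel
    body: str            # the IoT instructions


def split_channels(request):
    """Return one block per channel; both carry the same header."""
    financial_block = {"header": request.header, "data": request.financial}
    transaction_block = {"header": request.header, "data": request.transactional}
    return financial_block, transaction_block


def cell_outcome(f_ok: bool, a_ok: bool, v_ok: bool) -> bool:
    """One cell agrees only if both channels authenticate and validation holds."""
    return v_ok and (f_ok and a_ok)


def consensus(cell1: bool, cell2: bool) -> bool:
    """Consensus needs the agreement of two cells."""
    return cell1 and cell2


req = IoTRequest(
    header=Header("n-001", "2021-06-01", "sig", "tx-42", 10, "c-1"),
    financial={"amount": 10},
    transactional={"action": "unlock"},
    body="unlock door",
)
fin_block, tx_block = split_channels(req)
print(consensus(cell_outcome(True, True, True), cell_outcome(True, True, True)))
```

A single failed check in either cell makes the whole expression false, so a request cannot reach consensus on the strength of one channel or one cell alone.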

4.3 A Cell

All nodes in the distributed network are assigned to a cell. Each cell contains four nodes: a finance node (f), a transaction node (t), a process node (p) and a validation node (v). Each cell is a uniquely identifiable, independent entity. All communication between cells operates within a time-bounded environment through either asynchronous or synchronous key exchange (a partially synchronised environment).

4.4 Double Spend Protection

To protect against double spend or fraudulent misappropriation of funds, the consensus protocol complies with the following rules:

• Only a correct node may propose an IoT request
• Only a proposed IoT request may be authenticated
• Only an authenticated IoT request can be validated
• Only a validated IoT request can achieve consensus

These invariants are in line with the requirements of safety and liveness [26, 28, 29]. A correct node is a node that is part of a cell. A proposed IoT request is one that has been proposed by a processing node. An authenticated IoT request is one that has been authenticated by both the transaction and financial nodes. A validated IoT request is one that has been validated by a validating node. Consensus is achieved when two cells have authenticated and validated an IoT request.
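The four rules amount to a strict ordering of stages, which can be enforced with a small state machine. The sketch below is a minimal illustration, assuming a single request tracked within one cell; the names `Stage` and `RequestState` are hypothetical, and the requirement that two cells agree is left out for brevity.

```python
from enum import IntEnum


class Stage(IntEnum):
    RECEIVED = 0
    PROPOSED = 1
    AUTHENTICATED = 2
    VALIDATED = 3
    CONSENSUS = 4


class RequestState:
    """Tracks one IoT request and refuses out-of-order transitions."""

    def __init__(self):
        self.stage = Stage.RECEIVED

    def advance(self, target: Stage, node_is_correct: bool = True) -> None:
        # Rule 1: only a correct node may propose a request.
        if target is Stage.PROPOSED and not node_is_correct:
            raise PermissionError("only a correct node may propose a request")
        # Rules 2 to 4: each stage can only follow the one directly before it.
        if target != self.stage + 1:
            raise ValueError(f"cannot move from {self.stage.name} to {target.name}")
        self.stage = target


state = RequestState()
state.advance(Stage.PROPOSED)
state.advance(Stage.AUTHENTICATED)
try:
    state.advance(Stage.CONSENSUS)  # skips validation, so it is rejected
except ValueError as err:
    print(err)
```

Rejecting any skipped stage is what prevents a request from reaching consensus without having been authenticated and validated first.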

4.5 The Isomorphic Algorithm

Boolean isomorphic algorithms are used to authenticate each channel’s data, to validate the output of each channel and to error check the consensus process, such that:

Consensus Algorithm

Input: client request

Device requests are subjected to cryptographic checks to ensure the data is correct and complete. Only client requests that pass these checks achieve consensus. A device’s transaction (T) request is defined as containing a nonce (r), timestamp (t), digital signature (sig_k), code (c), balancing figure (b), action (a), finance (f) and message (m). A client’s transaction is therefore defined as T ∈ {r, t, sig_k, c, b, a, f, m}.

Step 1: Pcell is responsible for authenticating the client’s digital signature.
    if sig_k != smart contract digital signature then
        exit 1, ‘error, request failed’
    else
        split the data into two parts, financial data and transactional data; place each part in a block (b); broadcast each block via an independent channel to a financial authentication node or a transaction authentication node

Step 2: Acell is responsible for authenticating the data contained in b1 ∈ {a, r, t, sig_k, cf, m} against the client’s smart contract allowed-action list.
    if request != smart contract data then
        exit 1, ‘error, request failed’
    else
        print ‘authentication data to Vcell’

Step 3: Fcell is responsible for authenticating the data contained in b2 ∈ {f, r, t, sig_k, ct, m}.
    if request != smart contract data then
        exit 1, ‘error, request failed’
    else
        print ‘authentication data to Vcell’

Step 4: Vcell authentication.
    if Acell != Fcell then
        exit 1, ‘error, request failed’
    else
        print ‘authentication has been validated’

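$$\left.\begin{array}{l} a \rightarrow b \\ d \rightarrow c \end{array}\right\} \text{validation} \tag{1}$$

For consensus to be achieved, two independent cells have to authenticate the IoT request such that:

$$\left.\begin{array}{l} \left.\begin{array}{l} a_1 \rightarrow b_1 \\ d_1 \rightarrow c_1 \end{array}\right\} v_1 \\[6pt] \left.\begin{array}{l} a_2 \rightarrow b_2 \\ d_2 \rightarrow c_2 \end{array}\right\} v_2 \end{array}\right\} C \tag{2}$$

Consensus is therefore

$$C_n \leftarrow (v_1 \land v_2) \tag{3}$$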
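The four steps of the consensus algorithm and the final consensus relation can be sketched in Python. This is a minimal illustration under several assumptions and not the authors' implementation: an HMAC-SHA256 tag from the standard library stands in for the digital signature, each check is read as "reject on mismatch", the smart contract is a plain dictionary, and the authentication codes and balancing figure are omitted for brevity. All function names are hypothetical.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> str:
    """HMAC-SHA256 tag, used here in place of a real digital signature."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def p_node(request: dict, key: bytes):
    """Step 1: check the signature, then split the request into two blocks."""
    expected = sign(key, request["message"].encode())
    if not hmac.compare_digest(expected, request["sig"]):
        raise ValueError("error, request failed")
    common = {k: request[k] for k in ("nonce", "timestamp", "sig", "message")}
    b1 = {**common, "action": request["action"]}
    b2 = {**common, "finance": request["finance"]}
    return b1, b2


def a_node(b1: dict, contract: dict) -> bool:
    """Step 2: the action must be on the contract's allowed-action list."""
    return b1["action"] in contract["allowed_actions"]


def f_node(b2: dict, contract: dict) -> bool:
    """Step 3: the financial figure must match the contract."""
    return b2["finance"] == contract["finance"]


def v_node(a_ok: bool, f_ok: bool) -> bool:
    """Step 4: validation holds only if both authentications succeeded."""
    return a_ok and f_ok


def cell_validates(request: dict, key: bytes, contract: dict) -> bool:
    """Run steps 1 to 4 for one cell and return its validation result."""
    b1, b2 = p_node(request, key)
    return v_node(a_node(b1, contract), f_node(b2, contract))


def consensus(v1: bool, v2: bool) -> bool:
    """Consensus needs the validation of two independent cells."""
    return v1 and v2


key = b"demo-key"
contract = {"allowed_actions": {"unlock"}, "finance": 10}
request = {"nonce": "n-001", "timestamp": 1622548800, "action": "unlock",
           "finance": 10, "message": "unlock door"}
request["sig"] = sign(key, request["message"].encode())
print(consensus(cell_validates(request, key, contract),
                cell_validates(request, key, contract)))
```

In this sketch the two cells run the same checks on the same request; in a deployment each cell would hold its own copy of the ledger and contract data, which is where the independence of the two validations comes from.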

4.6 The Security Provisions

This model protects data at rest and data in transit by applying accepted cryptographic primitives. The security mechanism is in line with an IoT security framework [10]. The IoT request (TCP/IP packet header) contains security mechanisms that protect against hacking attacks and errors. The consensus protocol and blockchain presented in this paper are built for use on the internet: the protocol is designed to secure IoT request packets that travel across the internet using TCP/IP. The security provisions are broken down into five stages.

4.6.1 Stage One

Device requests are subjected to cryptographic checks to ensure the data is correct and complete. Only IoT requests which pass these checks are processed.

An IoT request should contain a nonce (r), timestamp (t), digital signature (sig_k), two authentication codes (cf and ct), balancing figure (b) and message (m). The use of r, t and sig_k ensures:

• Data origin authentication: the data was issued by an authenticated IoT device.
• Data integrity: the data has not been tampered with.
• Non-repudiation: a user cannot deny their action.

The use of the ct, cf and b primitives protects against:

• Double spend [42]
• Replay attack [5]
• Eclipse attack [43]

The security provided to an IoT request is therefore:

Request ∈ (r, t, sig_k, ct, cf, b, m)

4.6.2 Stage Two

The request data is split into two parts, financial data and transactional data. Each part is placed in a block (b), and each block is broadcast via an independent channel to a financial authentication node or a transaction authentication node. Cryptographic primitives are applied to each block, b11 ∈ (r, t, sig_k, cf, f, m) and b12 ∈ (r, t, sig_k, ct, f, m). Blocks are therefore given the following protections:

• Authentication: This can be split into entity authentication, which ensures that the person or system you are communicating with is the one you intend to be communicating with, and data origin authentication, which ensures that the data you received came from the correct place [44].
• Data integrity: Preventing an unauthorised entity from carrying out unauthorised changes to, or destruction of, data. The integrity of each block should be verifiable and accountable [44].
• Non-repudiation: Preventing an entity from denying that it took a specific action [44].
• Access control: The authorisation methods used to ensure that only authorised persons have access to data [44].
• Immutability: Providing data with a fixed and unchangeable audit trail [30, 45].

4.6.3 Stage Three

The data contained in each block is authenticated by two nodes, the transaction node and the finance node, which independently authenticate the header and body data. This security mechanism is based on the separation of duties principle [23]. A security mechanism that

is used in the balance authentication of the IoT request. It protects the IoT request from carrying out unauthorised actions.

4.6.4 Stage Four

The validation of data is carried out by a fourth node, the validation node (v). The validation node ensures that transactional authentication (at) and financial authentication (af) are equivalent and true (where true means authenticated). This provides consensus validation and protects the IoT request from unauthorised changes. Validation (v) is achieved when:

F: {at == af} → v

4.6.5 Stage Five

The final stage requires two cells to confirm the authentication and validation of an IoT request. Because the data from both cells must agree, the stage complies with the requirements of liveness and safety. If consensus is rejected, however, the request is suspended while both cells are subjected to an error checking process [24, 25]. Only the first two nodes need to broadcast their confirmation of consensus. Consensus is therefore described as:

$$C_n \leftarrow (v_1 \land v_2)$$

Once consensus is achieved, the nodes in a cell are independently responsible for committing the block to their Merkle tree.

4.7 Error Detection

The system uses a binary decision checking algorithm to check for errors in consensus. Because consensus is based on conditional Boolean logic, an error checking mechanism can be built into the system. This mechanism can identify errors that create a Byzantine failure. It also identifies stop and start errors, which prevent Fcell and Acell from responding within their synchronised time window.
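One way to picture such a binary decision check is to classify a cell from the Boolean results of its two authenticating nodes and the time each took to respond. The sketch below is hypothetical: the paper does not specify the decision procedure, so the function `classify_cell`, its four outcome labels and the single shared time window are assumptions made for illustration.

```python
from typing import Optional


def classify_cell(a_result: Optional[bool], f_result: Optional[bool],
                  a_delay: float, f_delay: float, window: float) -> str:
    """Return 'ok', 'timeout', 'disagreement' or 'rejected' for one cell.

    A result of None means the node never answered.
    """
    if a_result is None or f_result is None:
        return "timeout"       # a node stopped and gave no answer
    if a_delay > window or f_delay > window:
        return "timeout"       # an answer arrived outside the time window
    if a_result != f_result:
        return "disagreement"  # the two channels contradict each other
    return "ok" if a_result else "rejected"


print(classify_cell(True, True, 0.1, 0.2, window=1.0))
print(classify_cell(True, False, 0.1, 0.2, window=1.0))
print(classify_cell(True, None, 0.1, 0.0, window=1.0))
```

Separating timeouts from disagreements mirrors the distinction drawn above between stop and start errors and errors that create a Byzantine failure.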

4.8 Merkle Tree

The Merkle tree is a record of all IoT requests that have been authenticated or validated by a node. It uses a hash function to create a hash of this data, and each block’s hash is chained to the subsequent block’s hash. It provides data integrity and data auditability.

The validator’s Merkle tree is the aggregate of all IoT requests, whereas the financial node’s Merkle tree is an aggregate of all financial data and the transaction node’s Merkle tree is an aggregate of all transaction data. These Merkle trees are made up of recursive hash pairs of data. Because of this design, the financial node’s and the transaction node’s Merkle trees can also be used to confirm the validity of the validation node’s Merkle tree.
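The recursive hashing of pairs can be sketched with the standard library. This is a generic Merkle root computation and not the authors' code: SHA-256 and the rule of duplicating the last digest on odd-sized levels are assumptions, since the paper does not name a hash function or a padding rule.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    """Return the raw SHA-256 digest of data."""
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list) -> bytes:
    """Hash the leaves, then hash adjacent pairs until one digest remains."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last digest on odd levels
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


requests = [b"request-1", b"request-2", b"request-3"]
root = merkle_root(requests)
tampered = merkle_root([b"request-1", b"request-X", b"request-3"])
print(root.hex())
print(root != tampered)
```

Changing any single request changes the root, which is what allows one node's tree to be checked against another's.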

5 Achieving Scalability

The use of time complexity to measure the scalability of a consensus protocol was demonstrated by Luu et al. [33]. BAM uses probabilistic randomness and an error detection mechanism, which enable the process to scale. Unlike the quadratic message exchange that takes place in BFT-style consensus processes, BAM uses a single message process, which gives it a time complexity of O(n). The scalability of the protocol was assessed while 1000, 2000, 3000, 4000 and 5000 blocks were processed. Testing was carried out in a virtual environment, which ensured consistent time stamps and synchronisation. The speed and scalability of the consensus process are presented in Table 1. As seen in Fig. 2, the processing time per request stays roughly constant as the number of requests grows, so total processing time grows linearly with the task.
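The per-request figures in Table 1 can be reproduced by dividing each duration by the number of requests. The short check below copies the values from Table 1 and assumes the durations are in seconds, which agrees with the start and finish times of the first three rows.

```python
# (requests, duration in seconds, reported seconds per request), from Table 1
table_1 = [
    (1000, 101, 0.101),
    (2000, 157, 0.0785),
    (3000, 217, 0.072),
    (4000, 322, 0.0805),
    (5000, 299, 0.0598),
]

for requests, duration, reported in table_1:
    per_request = duration / requests
    print(f"{requests:>5} requests: {per_request:.4f} s each (reported {reported})")

# How far apart are the slowest and fastest per-request times?
per_request_times = [duration / requests for requests, duration, _ in table_1]
spread = max(per_request_times) - min(per_request_times)
print(f"spread between slowest and fastest: {spread:.4f} s")
```

The per-request time does not grow with the batch size in these runs, which is the behaviour expected when total processing time is linear in the number of requests.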

6 Conclusion

The main aim of this research paper is to present an IoT blockchain consensus protocol that meets the requirements of safety and liveness while ensuring security and scalability. The paper first explains the failure of previous research to take these requirements into consideration, an omission that leaves the vast majority of consensus protocols susceptible to dominance attacks and fraudulent activity, while other consensus protocols suffer from a time complexity issue that leads to exponential growth.

Table 1 Scalability

Blockchain requests   Start time   Finish time   Duration (s)   Processing time (s per request)
1000                  16:44:30     16:46:11      101            0.101
2000                  16:50:19     16:52:56      157            0.0785
3000                  16:54:02     16:57:39      217            0.072
4000                  16:58:59     17:04:33      322            0.0805
5000                  17:06:14     17:11:53      299            0.0598

Fig. 2 Time complexity (processing time per transaction plotted against the number of transactions)

The consensus protocol proposed in this paper offers a solution to the FLP impossibility problem by providing integrity of the consensus decision [26]. The solution has no adverse impact on time complexity: its growth rate is linear, i.e. it is very scalable [33]. It also uses established cryptographic primitives to protect the integrity of data in transit and at rest. Because of its isomorphic HMAC validation process, the algorithm is tamper resistant, i.e. it provides both integrity protection and integrity detection. When this equation is combined with a Merkle tree blockchain algorithm, immutability protection is also provided. By providing integrity of data at rest, integrity of data in transit and integrity of the consensus decision, the protocol offers a method for circumventing the FLP restrictions [26], a widely held belief that has shaped modern-day discussion of consensus in an asynchronous environment with unbounded time, and that directly affects the security, scalability, safety and liveness of the present-day blockchain environment.

The results of this paper show that the proposed system is effective at preventing fraudulent spending (double spend) [17] and erroneous consensus. Testing also demonstrated that the information security mechanisms of non-repudiation, confidentiality, integrity, authentication and authorisation protect data from miscreant activity. The protocol guarantees immutability through the use of a Merkle tree, block hashing and an error checking algorithm, a process that ensures data at rest is tamper resistant.


B. A. MacKenzie et al.

This mathematical approach to the problem removes the need for an exhaustive threat, vulnerability and likelihood analysis. Testing of this consensus protocol was in line with the assertions of the cryptographic primitives that were employed, which confirmed that the protocol is robust enough to withstand attacks on data at rest and in transit. Testing also confirmed the linear time complexity of the BAM consensus protocol: regardless of failures or an increase in requests, processing time increases at a consistent linear rate. BAM is a consensus protocol that is resistant to Byzantine failure, non-Byzantine failure and miscreant activity. It therefore protects data at rest, data in transit and the consensus decision process. Future work will focus on assessing the consensus protocol in a live environment to evaluate the robustness of its logic.

References
1. Hung, M.: Insight on How to Lead in a Connected World (2017)
2. Neshenko, N., Bou-Harb, E., Crichigno, J., Kaddoum, G., Ghani, N.: Demystifying IoT security: an exhaustive survey on IoT vulnerabilities and a first empirical look on internet-scale IoT exploitations. IEEE Commun. Surv. Tutorials. 21, 2702–2733 (2019)
3. Dawson, M.: Cyber Security Architectural Needs in the Era of Internet of Things and Hyperconnected Systems (2016)
4. Hossain, M.M., Fotouhi, M., Hasan, R.: Towards an analysis of security issues, challenges, and open problems in the Internet of Things. In: 2015 IEEE World Congress on Services (2015)
5. Hwang, Y.H.: IoT security & privacy: threats and challenges. In: Proceedings of the 1st ACM Workshop on IoT Privacy, Trust, and Security (2015)
6. Sicari, S., Rizzardi, A., Grieco, L.A., Coen-Porisini, A.: Security, privacy and trust in Internet of Things: the road ahead. Comput. Netw. 76, 146–164 (2015)
7. Gupta, S.D., Ghanavati, S.: Towards a heterogeneous IoT privacy architecture. In: Proceedings of the 35th Annual ACM Symposium on Applied Computing (2020)
8. Khan, M.A., Salah, K.: IoT security: review, blockchain solutions, and open challenges. Futur. Gener. Comput. Syst. 82, 395–411 (2018)
9. Zhang, Z.-K., Cho, M.C.Y., Wang, C.-W., Hsu, C.-W., Chen, C.-K., Shieh, S.: IoT security: ongoing challenges and research opportunities. In: 2014 IEEE 7th International Conference on Service-Oriented Computing and Applications (SOCA) (2014)
10. MacKenzie, B., Ferguson, R.I., Bellekens, X.: An assessment of blockchain consensus protocols for the Internet of Things. In: 2018 International Conference on Internet of Things, Embedded Systems and Communications (IINTEC) (2018)
11. Feng, H., Fu, W.: Study of recent development about privacy and security of the Internet of Things. In: 2010 International Conference on Web Information Systems and Mining (2010)
12. Kolias, C., Kambourakis, G., Stavrou, A., Voas, J.: DDoS in the IoT: Mirai and other botnets. Computer. 50, 80–84 (2017)
13. Zhou, W., Jia, Y., Peng, A., Zhang, Y., Liu, P.: The effect of IoT new features on security and privacy: new threats, existing solutions, and challenges yet to be solved. IEEE Internet Things J. 6(2), 1606–1616 (2018)
14. Ali, S.T., McCorry, P., Lee, P.H.-J., Hao, F.: Zombiecoin: powering next-generation botnets with bitcoin. In: International Conference on Financial Cryptography and Data Security (2015)
15. Dittrich, D.: So you want to take over a botnet.... In: Presented as part of the 5th USENIX Workshop on Large-Scale Exploits and Emergent Threats (2012)


16. Shang, W., Yu, Y., Droms, R., Zhang, L.: Challenges in IoT networking via TCP/IP architecture, Technical Report NDN-0038. NDN Project (2016)
17. Biason, A., Pielli, C., Zanella, A., Zorzi, M.: Access control for IoT nodes with energy and fidelity constraints. IEEE Trans. Wirel. Commun. 17, 3242–3257 (2018)
18. Huang, J., Kong, L., Chen, G., Wu, M.-Y., Liu, X., Zeng, P.: Towards secure industrial IoT: blockchain system with credit-based consensus mechanism. IEEE Trans. Ind. Informatics. 15, 3680–3689 (2019)
19. Christidis, K., Devetsikiotis, M.: Blockchains and smart contracts for the Internet of Things. IEEE Access. 4, 2292–2303 (2016)
20. Dorri, A., Kanhere, S.S., Jurdak, R., Gauravaram, P.: Blockchain for IoT security and privacy: the case study of a smart home. In: 2017 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops) (2017)
21. Reyna, A., Martín, C., Chen, J., Soler, E., Díaz, M.: On blockchain and its integration with IoT. Challenges and opportunities. Futur. Gener. Comput. Syst. 88, 173–190 (2018)
22. Li, X., Jiang, P., Chen, T., Luo, X., Wen, Q.: A survey on the security of blockchain systems. Futur. Gener. Comput. Syst. 107, 841–853 (2020)
23. Botha, R.A., Eloff, J.H.P.: Separation of duties for access control enforcement in workflow environments. IBM Syst. J. 40, 666–682 (2001)
24. Chandra, T.D., Hadzilacos, V., Toueg, S.: The weakest failure detector for solving consensus. J. ACM. 43, 685–722 (1996)
25. Chandra, T.D., Toueg, S.: Unreliable failure detectors for reliable distributed systems. J. ACM. 43, 225–267 (1996)
26. Fischer, M.J., Lynch, N.A., Paterson, M.S.: Impossibility of distributed consensus with one faulty process. J. Assoc. Comput. Mach. 32(2), 374–382 (1985)
27. Castro, M., Liskov, B.: Practical Byzantine fault tolerance and proactive recovery. ACM Trans. Comput. Syst. 20, 398–461 (2002)
28. Lamport, L., Shostak, R., Pease, M.: The Byzantine generals problem. ACM Trans. Program. Lang. Syst. 4, 382–401 (1982)
29. Lamport, L., et al.: Paxos made simple. ACM Sigact News. 32, 18–25 (2001)
30. Nakamoto, S.: Bitcoin: a peer-to-peer electronic cash system (2008)
31. Dorri, A., Kanhere, S.S., Jurdak, R.: Blockchain in Internet of Things: challenges and solutions. arXiv preprint arXiv:1608.05187 (2016)
32. Deep, G., Mohana, R., Nayyar, A., Sanjeevikumar, P., Hossain, E.: Authentication protocol for cloud databases using blockchain mechanism. Sensors. 19, 4444 (2019)
33. Luu, L., Narayanan, V., Baweja, K., Zheng, C., Gilbert, S., Saxena, P.: SCP: a computationally-scalable Byzantine consensus protocol for blockchains. https://www.weusecoins.com/assets/pdf/library/SCP, vol. 20, p. 2016 (2015)
34. Mazieres, D.: The Stellar Consensus Protocol: A Federated Model for Internet-Level Consensus, p. 32. Stellar Development Foundation, San Francisco, CA (2015)
35. Kosba, A., Miller, A., Shi, E., Wen, Z., Papamanthou, C.: Hawk: the blockchain model of cryptography and privacy-preserving smart contracts. In: 2016 IEEE Symposium on Security and Privacy (SP) (2016)
36. Schwartz, D., Youngs, N., Britto, A., et al.: The Ripple Protocol Consensus Algorithm. https://ripple.com/files/rippleconsensuswhitepaper.pdf (2014)
37. Valenta, M., Sandner, P.: Comparison of Ethereum, Hyperledger Fabric and Corda (2017)
38. Milutinovic, M., He, W., Wu, H., Kanwal, M.: Proof of luck: an efficient blockchain consensus protocol. In: Proceedings of the 1st Workshop on System Software for Trusted Execution (2016)
39. Poelstra, A., et al.: Distributed consensus from proof of stake is impossible, Self-published Paper (2014)
40. Sangster, A., Scataglini-Belghitar, G.: Luca Pacioli: the father of accounting education. Account. Educ. 19, 423–438 (2010)
41. Pacioli, L., Brown, R.G., Johnston, K.S.: Paciolo on accounting, Facsimiles-Garl (1963)


42. Karame, G.O., Androulaki, E., Capkun, S.: Double-spending fast payments in bitcoin. In: Proceedings of the 2012 ACM Conference on Computer and Communications Security (2012)
43. Heilman, E., Kendler, A., Zohar, A., Goldberg, S.: Eclipse attacks on Bitcoin’s peer-to-peer network. In: USENIX Security Symposium (2015)
44. Dent, A.W., Mitchell, C.J.: User’s Guide to Cryptography and Standards (Artech House Computer Security). Artech House, Inc, Norwood, MA (2004)
45. Jakobsson, M., Juels, A.: Proofs of Work and Bread Pudding Protocols, pp. 258–272. Springer, New York (1999)
46. Castro, M., Liskov, B.: Practical Byzantine fault tolerance. In: Proceedings of the Third USENIX Symposium on Operating Systems Design and Implementation (OSDI), pp. 173–186, New Orleans, Louisiana, USA, February 22–25 (1999). https://doi.org/10.1145/296806.296824

Session Key Agreement Protocol for Secure D2D Communication
Vincent Omollo Nyangaresi and Zeyad Mohammad

1 Introduction
D2D communication is the direct exchange of packets among mobile devices [1], and it provides services based on device proximity. It exhibits low latencies, high throughput, and instant communication among devices [2]. It offloads traffic from cellular networks to reduce the high bandwidth demands imposed on these networks, thereby boosting spectral efficiency and minimizing base station (BS) energy consumption [1]. Here, devices communicate without an intermediary BS or access point [3], which facilitates cell coverage expansion and enhanced radio frequency reuse in 5G networks [4]. The cellular network expansion is realized by bridging data transmission to nodes situated outside the cell coverage area, while energy savings are achieved through direct data transmission between devices. In addition, since the distance between devices is smaller than that between devices and the BS, radio frequency interference is decreased in D2D and hence multiple data packets can be transmitted via the same radio frequency. Despite these D2D performance gains, these networks have inherent security challenges [5] occasioned by a lack of authentication during the device discovery, link setup, and data transmission phases [6]. In addition, D2D does not offer encryption and message authentication, and hence attacks such as eavesdropping, impersonation, free-riding, privacy sniffing, and location spoofing are possible in a typical 4G network [4]. The deployment of IoT technology in 5G’s massive machine type

V. O. Nyangaresi
Tom Mboya University College, Homabay, Kenya

Z. Mohammad
Al-Zaytoonah University of Jordan, Amman, Jordan

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Nayyar et al. (eds.), The Fifth International Conference on Safety and Security with IoT, EAI/Springer Innovations in Communication and Computing, https://doi.org/10.1007/978-3-030-94285-4_6


communication (mMTC) and ultra-reliable low latency communication (URLLC) renders these security issues more critical and more cumbersome to address because IoT devices are resource constrained. Authors in [7] point out that although group-oriented D2D reduces latencies and enhances spectral efficiency, secure and efficient negotiation of a group session key is challenging. In D2D based m-health applications, security and privacy requirements are higher due to the sensitive patient data transmitted, and hence proper authentication must be executed. Unfortunately, the vulnerability of the conventional authentication and key agreement approaches, coupled with their intensive resource utilization, renders them inapplicable in this scenario [8]. In critical IoT services [9] and applications such as smart grids, VANETs, and machine to machine (M2M) communications, security and privacy preservation is key [3, 10]. This can be achieved through proper mutual authentication, secure key exchanges, and anonymity. However, the conventional public key infrastructure (PKI) based mutual authentication has poor scalability and is resource intensive, rendering it unsuitable for resource constrained D2D devices. Authors in [11] explain that anonymity can be weak or strong. Weak anonymity is attained by concealing the real device identity through encryption, whereas under strong anonymity an attacker is unable to trace the D2D entity through mutual interactions with it. The problem with weak anonymity is that the same ciphered output can be utilized to trace a D2D device or its activities. For 4G based D2D communications, authors in [12] explain that the Third Generation Partnership Project (3GPP) has standardized authentication and key agreement (AKA) for achieving mutual authentication among devices and the core network. However, the conventional AKA is not ideal for D2D authentication owing to its high processing and bandwidth requirements.
Authors in [13] point out that D2D security is key, especially for e-health data shared among these devices. Specifically, anonymity of the patients and other parties is paramount. Although D2D communication is crucial in 5G and IoT, its security techniques and protocols are still in the initial development stages. As such, there is a need for more research on authentication and key agreement protocols geared towards enhancing the D2D security posture [13]. Although the success of D2D communication hinges on privacy and security, these networks have not addressed these issues adequately [2]. Consequently, in [2, 14], privacy, authentication, anonymity, integrity, non-repudiation, and resistance to attacks have been identified as being crucial in D2D communication. In [15], the authors point out that D2D’s open wireless communication nature brings forth security and privacy risks which are still unresolved. As such, the development of an ideal D2D authentication protocol is paramount. D2D group communication encompassing huge numbers of participating entities over open wireless media exposes these devices to security and privacy attacks [16]. Consequently, effective schemes for addressing these challenges are critical [17]. To ensure efficient and secure communication while upholding security and privacy in critical IoT application scenarios [9, 18], privacy preserving and secure protocols with lower computational requirements need to be developed. Since D2D


devices are memory and energy constrained, authentication schemes need to address these issues [19]. Authors in [4] point out that a lightweight and secure D2D communication technique is required to address security issues in 5G IoT. To address the key escrow problems of the key generation center, authors in [15, 20] advocate for independent partial key generation among D2D entities. The contributions of this paper are as follows:
1. We deploy the physically unclonable function as a device fingerprint during session key derivation and authentication to curb key escrow issues in PKI.
2. Dynamic session keys are implemented to eliminate the need to store long-term secrets in D2D entities.
3. We employ digital signatures and timestamps to validate all authentication requests and thwart message replays and other attacks.
4. We show that 1–3 above not only secure D2D communications but also render the proposed protocol lightweight and hence applicable in resource constrained D2D devices.
The rest of this paper is organized as follows: Sect. 2 discusses related work, while Sect. 3 elaborates on the deployed system model. Sect. 4 presents simulation results together with evaluations of the developed protocol. Lastly, Sect. 5 concludes this paper and gives future directions in this research domain.

2 Related Work
Over the recent past, numerous schemes have been developed for either key management or authentication in the D2D communication environment. For instance, in [21], authenticated key exchange protocols are developed to offer D2D group information anonymity, but they fail to protect user privacy. In [22], privacy-preserving key agreement protocols are proposed, but they require each device to execute multiple complex operations. This renders them computationally intensive, and they are also susceptible to man-in-the-middle (MitM) attacks. Authors in [23] develop a D2D group communications key agreement protocol in which a single user takes charge of security parameter initialization and participants’ identity verification. However, any compromise of this user leads to privacy leaks for all participants. A threshold anonymous authentication scheme is proposed in [24], but it involves numerous bilinear pairing operations for message signing and verification. Batch verification is proposed in [25] to reduce these computation overheads. However, this scheme is susceptible to fake message injection attacks. A hybrid D2D message authentication technique is developed in [26], but it requires a pre-computed lookup table to reduce the computation of modular exponentiation operations. Authors in [27] propose a certificateless generalized signcryption


scheme for m-health applications. However, this approach is susceptible to insider attacks [28]. On the other hand, authors in [29] develop a group key exchange technique for D2D medical IoT communication, but fail to elaborate the computations and messages exchanged during device authentication. In addition, since the scheme deploys a single node for session key generation and distribution, this node can be a single point of failure. The symmetric cryptography based scheme developed in [30] has confidentiality issues when the patient’s device is stolen, while the protocol developed in [31] does not offer confidentiality and allows the patient to be tracked if the patient’s device is lost. A secure lightweight D2D communication technique is proposed in [17], while a lightweight public key authentication scheme is developed in [32]. Authors in [33] develop an ECC public key based lightweight security scheme, while an incentive-aware public key lightweight secure D2D data sharing scheme is proposed in [34]. In addition, authors in [35] develop a behavior analysis based D2D authentication technique, while a lightweight D2D key exchange scheme is proposed in [36]. On the other hand, authors in [19] have developed a privacy preserving device discovery and D2D authentication technique. Although the protocols developed in [17, 19], and [32–36] offer authentication, data confidentiality and integrity, most of them cannot provide anonymity. In addition, most of them do not incorporate the D2D data transmission phase, and they are based on lightweight public key algorithms instead of lightweight symmetric encryption algorithms. Authors in [37] propose a key agreement and energy efficient mutual authentication scheme for user anonymity protection, while a symmetric key cryptographic anonymous user authentication scheme has been developed in [38]. In addition, authors in [39] have developed an anonymous IoT authentication scheme.
All the schemes in [37–39] require the maintenance of a user information table which, if stolen, may compromise the security of all D2D entities. In addition, these schemes are vulnerable to smart card loss attacks, and they employ low entropy passwords which must be refreshed frequently for enhanced security. Moreover, the schemes in [37, 38] do not offer perfect forward secrecy. Since users’ identity remains the same in all authentication sessions, previous session keys can be retrieved upon long-term key exposure, and the gateway node (GWN) is required to perform exhaustive search operations to establish the random user identity. The technique in [39] may result in high computational costs when the GWN receives a wrong pseudonym identity. On the other hand, the anonymous access authentication scheme developed in [40] is vulnerable to smart card loss attacks. Authors in [41] have developed a two-factor authentication scheme, but it is vulnerable to denial of service (DoS) attacks, impersonation, and privacy leaks. In addition, it offers neither non-repudiation nor unlinkability, and it cannot protect against traceability attacks. Similarly, the lightweight two-factor authentication scheme developed in [42] is susceptible to MitM and key compromise attacks due to its dependence on the system key.


3 System Model
The review of the current D2D key agreement and authentication schemes has revealed that the provision of efficient, secure key agreement and authentication in D2D still presents numerous challenges. The dependence on the key generation center results in key escrow problems [15, 20], and hence in this paper, D2D entities generate their own intermediary key parameters that are never shared with other network entities. Since the utilization of conventional public key infrastructure (PKI) in resource constrained D2D devices is not practical, we utilize lightweight cryptography based on ECC, which is a lightweight asymmetric-key algorithm capable of providing 128-bit cryptographic security using a 256-bit key. This is considered sufficiently efficient when compared with the 3072-bit key that Rivest–Shamir–Adleman (RSA), the most widely used public-key encryption algorithm, requires for the same security level. Moreover, we deployed cryptographic primitives such as the one-way hash function and exclusive-OR (XOR) operations, which are both lightweight, so that the proposed protocol can exhibit reduced computational costs. Conventional authentication protocols require the storage of crucial cryptographic keys, in which a key is tied to each user. To address this key escrow problem, the physically unclonable function (PUF) was deployed as a secure alternative to the storage of secret keys and IDs. This was informed by the fact that PUF based protocols are lightweight and hence render the authentication process less computationally intensive. This was considered efficient for resource constrained D2D entities compared with PKI based approaches.
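The idea of using a PUF in place of stored secrets can be illustrated with a small software stand-in. This is a sketch under stated assumptions only: a real PUF derives its response from physical manufacturing variation and is slightly noisy, whereas the hypothetical `SimulatedPUF` class below models the device-unique behaviour with a per-device random value fed into HMAC-SHA256. The class name, the `device_entropy` parameter and the hash choice do not come from the paper.

```python
import hashlib
import hmac
import os
from typing import Optional


class SimulatedPUF:
    """Software stand-in for a hardware PUF (illustration only)."""

    def __init__(self, device_entropy: Optional[bytes] = None):
        # Stands in for the physical variation that makes each chip unique.
        if device_entropy is None:
            device_entropy = os.urandom(32)
        self._entropy = device_entropy

    def response(self, challenge: bytes) -> bytes:
        """Map a challenge to a device-specific response."""
        return hmac.new(self._entropy, challenge, hashlib.sha256).digest()
```

In this model the same device answers a given challenge identically every time, while a different device answers the same challenge differently, which is the behaviour the protocol relies on when it treats the PUF output as a device fingerprint.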
For the deployed PUF and digital signatures, the following definitions hold:

Definition 1 A digital signature protocol consists of three probabilistic polynomial-time algorithms $(\mathcal{G}, \mathcal{S}, \mathcal{V})$ such that: (a) executing the key generation algorithm $\mathcal{G}$ yields a pair $(\omega_k, s_k)$, where $\omega_k$ is the public or verification key while $s_k$ is the secret or signing key; (b) the signing algorithm $\mathcal{S}$ takes as input a message $m$ from some message space $\mathcal{M}$, together with $s_k$, and produces a signature $\eth$ on that particular message; (c) the verification algorithm $\mathcal{V}$ takes in message $m$, signature $\eth$ and $\omega_k$ and outputs either an accept or a reject decision.

Definition 2 All signature schemes must satisfy the correctness requirement, where for any $(\omega_k, s_k)$ generated by $\mathcal{G}$ and any $m \in \mathcal{M}$, if $\eth \leftarrow \mathcal{S}(m, s_k)$, then $\mathcal{V}(m, \eth, \omega_k) = \text{accept}$.

Definition 3 Existential unforgeability under adaptive chosen message attack (EU-ACMA) is a standard security notion for digital signature schemes, defined through a game between an attacker $\mathcal{A}$ and a challenger $\mathcal{B}$: (a) $\mathcal{B}$ executes $\mathcal{G}$ and transmits the resulting $\omega_k$ to $\mathcal{A}$; (b) $\mathcal{A}$ transmits up to $n$ messages $m_1, m_2, \ldots, m_n$ to $\mathcal{B}$, one at a time, such that for each message $m_i$ that $\mathcal{B}$ receives, it transmits back $\eth_i = \mathcal{S}(m_i, s_k)$ to $\mathcal{A}$; (c) finally, $\mathcal{A}$ outputs a pair $(m^*, \eth^*)$ to $\mathcal{B}$. This becomes a valid forgery if $m^* \neq m_i \;\forall i \in \{1, \ldots, n\}$ and $\mathcal{V}(m^*, \eth^*, \omega_k) = \text{accept}$. Provided that for polynomially bounded $n$ it is computationally infeasible for $\mathcal{A}$ to construct a valid forgery, the scheme is said to be existentially unforgeable under adaptive chosen message attack.

Definition 4 For a secure hash function: (a) given an input message $m$ of arbitrary length, a message digest $h(m)$ of fixed length can be generated; (b) given $y$, it is computationally infeasible to compute $m = h^{-1}(y)$; (c) given $m$, it is computationally infeasible to find $m' \neq m$ such that $h(m') = h(m)$.

Definition 5 A physically unclonable function (PUF) is a one-way digital fingerprint based on challenge-response features. Given that $\mathcal{C}$ is the PUF challenge space and $\mathcal{R}$ is the response space, the PUF is an injective mapping from the challenge space to the response space, $\mathrm{PUF}: \mathcal{C} \to \mathcal{R}$, such that the integrated circuit accepts a string of bits as challenge $C$ and generates a unique string of bits as response $R$. Here, the PUF: (a) yields the same response $R$ to the same challenge $C$ with high probability; (b) yields a different response $R$ to the same challenge $C$ with high probability when that challenge is employed as input for a different PUF.

Definition 6 In accordance with the property that a PUF instance is sufficiently unpredictable, an adversary should be able to predict the response $R$ to a challenge $C$ only with negligible probability.

Definition 7 Take $\mathbb{G}$ as an elliptic curve group defined by prime number $q$ and generator $\Upsilon$, $E$ as an elliptic curve $y^2 = x^3 + ax + b \bmod q$, and $a, b \in_R Z_q^*$. Given two random points $\Upsilon$ and $Q$ of group $\mathbb{G}$ on $E$, the objective of the Elliptic Curve Discrete Logarithm (ECDL) problem is to find an integer $a \in_R Z_q^*$ that satisfies $Q = a\Upsilon$, where the unknown number $a$ is difficult to calculate. Consequently, the ECDL problem is assumed to be computationally infeasible for any probabilistic polynomial time algorithm to solve.

As illustrated in Fig. 1, there are diverse ways that a legitimate user can employ to access sensor data. This access may occur indirectly through the trusted service provider GWN, or through direct communication with the sensor node.
During the data access process, the communication between the GWN, users, and sensor nodes occurs via the internet. The sensor nodes can exchange data directly with one another or do so via the cluster head. The large number of sensor nodes operating in an unsecured and unattended environment exposes these devices and the data held therein to a number of threats and attacks. Since an adversary may take advantage of the open wireless channels among the D2D entities to launch attacks that compromise both security and privacy, the proposed protocol addresses these issues by upholding data confidentiality, secure authentication, anonymity, and integrity. Table 1 gives the notations used in this paper and their brief descriptions. The proposed session key agreement protocol comprises four major phases: parameter initialization, WSND authentication, CH authentication, and secure data exchange.
Parameter initialization phase
The first step in the proposed protocol is the initialization of the required parameters, which consists of the selection of ΔḈ, Δŧ,

Fig. 1 Communication among D2D entities (users, service provider gateway node, cluster head, and sensor nodes)

Table 1 Notations
Notation    Description
WSND        Wireless sensor network device
AuthReq     Authentication request
ȹ           WSND pseudo-identity
CH          Cluster head
ɌC          CH random challenge
Ӄ           PUF output
ѱ           CH secret key
Ԏ           One-time pseudo-identity
Ԋ           CH master key
ᶓ           System key
ῄ           WSND nonce
Ŋ           CH nonce
ℓ1          WSND joining message
ⱴ           CH signature
⊕           XOR operation
β           Broadcasting nonce
ᶋ           Cluster key pseudo-identity
ᶋ*N         Updated cluster key pseudo-identity
ᵹ           Cluster key validation parameter
Ḉ           Cluster session key
ΔḈ          Threshold time for refresh
Ħ           Beacon broadcasting pseudo-identity
ŧ           Timestamp
ğ           Message hash signature
ţ           Data to be exchanged
Δŧ          Threshold timestamp
ξ           Lightweight hash signature
Tses        Session duration
||          Concatenation operation
h(.)        One-way hash function

master key Ԋ and a number of system keys ᶓ, ᶋ, and ᵹ (step 2). During the setup phase, the WSND sends an authentication request AuthReq accompanied by its pseudo-identity ȹ to the CH (step 3) as shown in Fig. 2. Upon receipt of these parameters, the CH computes a random challenge ɌC to be deployed for subsequent authentications and transmits it to the WSND. The WSND then extracts PUF output Ӄ (step 6) before sending it to the CH. Afterwards, the CH computes secret key ѱ for the initial authentication of the WSND, followed by the generation of one-time mutual authentication pseudo-identity Ԏ


WSND → CH: {AuthReq, ȹ}
CH: generate ɌC
CH → WSND: {ɌC}
WSND: compute Ӄ = PUF(ɌC)
WSND → CH: {Ӄ}
CH: calculate Ԏ = h(Ӄ||Ԋ); buffer {Ԋ, ѱ, ɌC, Ӄ, ᶓ, ȹ}
CH → WSND: {Ԏ, ѱ, ᶓ}
WSND: buffer {Ԏ, ѱ, ᶓ}

Fig. 2 Initialization phase

before buffering {Ԋ, ѱ, ɌC, Ӄ, ᶓ, ȹ} in step 8. Thereafter, the CH transmits {Ԏ, ѱ, ᶓ} to the WSND (step 9) as shown in Fig. 3.
WSND authentication
The next procedure is WSND authentication, which starts with the generation of nonce ῄ followed by the computation of ῄ* as shown in step 10, before the joining message ℓ1 is generated (step 11) and sent to the CH. Upon receipt of ℓ1, the CH uses Ԏ to retrieve {ɌC, Ӄ, ѱ} from its buffer and recalculates ῄ (step 13). In step 14, nonce Ŋ is generated, followed by the computation of both Ŋ* and ɌC*, while in step 15, signature ⱴ and message ℓ2 are generated before being transmitted to the WSND (step 16). Upon receipt of these parameters, signature ⱴ is re-computed (step 17) and verified such that if it is invalid, the join request is rejected (step 19). However, if it is valid, the WSND authenticates the CH and recovers ɌC from {ɌC* ⊕ ѱ} and Ŋ from {Ŋ* ⊕ ѱ}, before re-computing PUF output Ӄ* (step 20). Thereafter, broadcasting nonces β1 and β2 are generated (step 21), followed by the computation of security parameters β1*, ɌCN, ӃN, ӃN*, ⱴ1, ԎN, and ѱN (step 22) and the generation of message ℓ3 (step 23). In step 24, ℓ3 is sent to the CH to mark the end of WSND authentication.
CH authentication
In step 25, CH authentication is initialized by the re-computation of β1 and ⱴ1*, while in step 26, signature ⱴ1* is verified such that if it is invalid, the authentication is terminated (step 27).


INPUT: Δŧ, ΔḈ, Ԋ, ᶓ, ᶋ, ᵹ, Tses
OUTPUT: Ӄ, ѱ, Ԏ, ῄ*, ℓ1, ῄ, Ŋ*, ɌC*, ⱴ, ℓ2, β1*, ɌCN, ӃN, ԎN, ѱN, ℓ3, ᶋ*, ᵹ*, ℓ4, Ḉ, Ħ, ğ, ℓ5, ξ, ℓ6
BEGIN
1. Initialize required parameters ΔḈ, Δŧ ∈ Zq* /* Parameter initialization phase */
2. Choose master key Ԋ, system keys ᶓ, ᶋ, ᵹ, h: {0,1}*
3. WSND → CH: {AuthReq, ȹ}
4. Generate random challenge ɌC
5. CH → WSND: {ɌC}
6. Extract PUF output Ӄ = PUF(ɌC)
7. WSND → CH: {Ӄ}
8. Generate secret key ѱ ∈ Zq*, Ԏ = h(Ӄ||Ԋ) & buffer {Ԋ, ѱ, ɌC, Ӄ, ᶓ, ȹ}
9. CH → WSND: {Ԏ, ѱ, ᶓ} /* End of parameter initialization phase */
10. Generate nonce ῄ ∈ Zq* & compute ῄ* = ῄ ⊕ ѱ /* Start of WSND authentication phase */
11. Generate ℓ1 = {Ԏ, ῄ*}
12. WSND → CH: {Ԏ, ῄ*}
13. From Ԏ, retrieve {ɌC, Ӄ, ѱ} & compute ῄ = ῄ* ⊕ ѱ
14. Generate nonce Ŋ, compute Ŋ* = Ŋ ⊕ ѱ & ɌC* = ɌC ⊕ ѱ
15. Compute ⱴ = h(ῄ||Ŋ*||ѱ||ɌC*) & ℓ2 = {ɌC*, Ŋ*, ⱴ}
16. CH → WSND: {ɌC*, Ŋ*, ⱴ}
17. Re-compute ⱴ
18. IF h(ῄ||Ŋ*||ѱ||ɌC*) != ⱴ THEN:
19.   Terminate join request
20. ELSE: Re-compute ɌC = ɌC* ⊕ ѱ, Ӄ* = PUF(ɌC) & Ŋ = Ŋ* ⊕ ѱ
21.   Randomly generate broadcasting nonces β1 & β2
22.   Compute β1* = h(ѱ||Ŋ) ⊕ β1, ɌCN = h(ɌC||β2), ӃN = PUF(ɌCN), ӃN* = β2 ⊕ ӃN, ⱴ1 = h(β2||β1||Ŋ||ӃN*), ԎN = h(Ԏ||β2), ѱN = h(ѱ||β2)
23.   Compose ℓ3 = {ӃN*, β1*, ⱴ1}
24.   WSND → CH: {ӃN*, β1*, ⱴ1} /* End of WSND authentication phase */
25.   Re-compute β1 = h(ѱ||Ŋ) ⊕ β1*, ⱴ1* = h(β2||β1||Ŋ||ӃN*) /* Start of CH authentication phase */
26.   IF h(β2||β1||Ŋ||ӃN*) != ⱴ1 THEN:
27.     Terminate authentication request
28.   ELSE:
29.     Compute {Ԏ*N, ѱ*N, Ɍ*CN, ӃN}: ӃN = β2 ⊕ ӃN*, Ɍ*CN = h(ɌCN||β2), Ԏ*N = h(Ԏ||β2), ѱ*N = h(ѱ||β2)
30.     Append {β2, β1} to CH validation data {Ԏ*N, ѱ*N, Ɍ*CN, ӃN}
31.     Choose ᶋ, ᵹ & compute ᶋ* = ᶋ ⊕ ѱ*N, ᵹ* = ᵹ ⊕ Ŋ, ⱴ2 = h(ᶋ||ᶋ*||ᵹ||ᵹ*||ѱ*N||Ŋ)
32.     Compose ℓ4 = {ᶋ*, ᵹ*, ⱴ2}
33.     CH → WSND: {ᶋ*, ᵹ*, ⱴ2}
34.     Re-compute ᵹ = ᵹ* ⊕ Ŋ, ᶋ = ᶋ* ⊕ ѱ*N
35.     IF h(ᶋ||ᶋ*||ᵹ||ᵹ*||ѱ*N||Ŋ) != ⱴ2 THEN:
36.       Reject authentication request
37.     ELSE: Trust CH /* End of CH authentication phase */
38.       Compute cluster session key Ḉ = ᶓ ⊕ ᶋ /* Start of message exchange */
39.       Generate Ħ = h(Ԏ*N||ѱ*N||ȹ||β2) ⊕ h(Ԏ*N||ŧ) & ğ = h(Ħ||Ḉ||ţ||ŧ) and compose ℓ5 = {ŧ, ğ, ţ, Ħ}
40.       WSNDs → All: {ŧ, ğ, ţ, Ħ}
41.       IF ŧ > Δŧ THEN:
42.         Flag as replay
43.       ELSE:
44.         Compute ğ* = h(Ħ||Ḉ||ţ||ŧ)
45.         IF h(Ħ||Ḉ||ţ||ŧ) != ğ THEN:
46.           Flag ℓ5 malicious
47.         ELSE:
48.           Process ℓ5
49. ENDIF; ENDIF; ENDIF; ENDIF; ENDIF
50. IF sender window is empty and Tses > ΔḈ THEN: Compute Ḉ* = ᶋ*N ⊕ ᵹ, ξ = h(ᶋ*N||Ḉ*||ᵹ||ᶓ||ŧ)
51.   Compose ℓ6 = {Ḉ*, ξ, ŧ}
52.   CH → All: {Ḉ*, ξ, ŧ}
53.   IF ŧ > Δŧ THEN:

Fig. 3 Proposed session key agreement protocol
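The first half of the WSND authentication exchange (steps 10 to 20 of Fig. 3) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: SHA-256 stands in for h(.), every value is assumed to be 32 bytes so that XOR is well defined, and the function names are hypothetical. The variables `eta`, `n`, `psi` and `challenge` stand for ῄ, Ŋ, ѱ and ɌC respectively.

```python
import hashlib
import hmac
import os


def h(*parts: bytes) -> bytes:
    """One-way hash over the concatenation of parts (SHA-256 assumed)."""
    return hashlib.sha256(b"".join(parts)).digest()


def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("operands must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def wsnd_join(psi: bytes):
    """Steps 10-11: the WSND picks nonce eta and masks it with psi."""
    eta = os.urandom(32)
    return eta, xor(eta, psi)


def ch_respond(eta_star: bytes, psi: bytes, challenge: bytes):
    """Steps 13-15: the CH unmasks eta, masks its nonce and the challenge, signs."""
    eta = xor(eta_star, psi)
    n = os.urandom(32)
    n_star = xor(n, psi)
    c_star = xor(challenge, psi)
    sig = h(eta, n_star, psi, c_star)
    return n, (c_star, n_star, sig)


def wsnd_verify(eta: bytes, psi: bytes, msg2):
    """Steps 17-20: return (challenge, n) if the signature checks, else None."""
    c_star, n_star, sig = msg2
    if not hmac.compare_digest(h(eta, n_star, psi, c_star), sig):
        return None  # step 19: terminate the join request
    return xor(c_star, psi), xor(n_star, psi)
```

Because every field has a fixed length here, plain concatenation inside `h` is unambiguous; an implementation with variable-length fields would need length prefixes or another unambiguous encoding.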

However, if this signature is valid, security parameters ӃN, Ɍ*CN, Ԏ*N, and ѱ*N are re-computed (step 29) before {β2, β1} are appended to the CH validation data {Ԏ*N, ѱ*N, Ɍ*CN, ӃN} (step 30). Next, random nonces ᶋ and ᵹ are generated and employed to compute security parameters ᶋ*, ᵹ* and signature ⱴ2 (step 31). In step 32, verification message ℓ4 is composed before being sent to the WSND as shown in


V. O. Nyangaresi and Z. Mohammad

Fig. 4 WSND-CH authentication (message flow between WSND and CH: ℓ1 = {Ԏ, ῄ*} from WSND to CH; ℓ2 = {ɌC*, Ŋ*, ⱴ} from CH to WSND; ℓ3 = {ӃN*, ß1*, ⱴ1} from WSND to CH; ℓ4 = {ᶋ*, ᵹ*, ⱴ2} from CH to WSND)

Fig. 4. Upon receipt of ℓ4, WSND re-calculates ᵹ and ᶋ, which are utilized to validate signature ⱴ2 (phase 35) such that if it is invalid, the authentication request is rejected (step 36), or else CH is now trusted. This marks the end of CH authentication and the onset of data exchanges in the D2D network, which commence by having the sender generate cluster session key Ḉ (phase 38), broadcasting pseudo-identity Ħ, and message hash signature ğ before composing message ℓ5 (step 39) and sending it to all D2D entities (phase 40). To verify the received message ℓ5, timestamp ŧ is employed such that if it is more than a set threshold Δŧ, ℓ5 is flagged as a replay (step 42). However, if it is valid, signature ğ* is re-computed (phase 44) and validated against ğ such that if they are not equivalent, then ℓ5 is flagged as malicious (step 46), or else it is processed (phase 48). On condition that the sender window is empty and the session duration Tses is greater than the threshold time for refresh ΔḈ, a new cluster session key Ḉ* is computed (step 50). During this process, a lightweight hash signature ξ is also computed and incorporated in message ℓ6 (phase 51), which is broadcast to all D2D entities (phase 52). In step 53, the freshness of ℓ6 is checked by all D2D receivers such that if it is beyond the set threshold, then ℓ6 is flagged as a replay (phase 54), or else ᶋ*N is decoded, followed by the re-computation of the lightweight hash signature, ξ* (step

Session Key Agreement Protocol for Secure D2D Communication


56). Provided that the two signatures are equivalent, the old cluster session key is replaced with the updated version Ḉ*, or else the session update request is ignored.
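The XOR-and-hash exchange of the WSND authentication phase (messages ℓ1 and ℓ2) can be illustrated with a minimal Python sketch. It is not the authors' implementation: SHA-256 stands in for h(.), all values are assumed to be 32 bytes so that the XOR operands line up, the PUF is not modeled (the challenge is just a random string), and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets

KEY_LEN = 32  # assumption: every nonce, key, and challenge is 32 bytes


def h(*parts: bytes) -> bytes:
    """Stand-in for h(a||b||...): SHA-256 over the concatenated parts."""
    return hashlib.sha256(b"".join(parts)).digest()


def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("XOR operands must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def wsnd_request(secret_key: bytes):
    """WSND: generate a nonce and mask it with the shared secret key."""
    nonce = secrets.token_bytes(KEY_LEN)
    return nonce, xor(nonce, secret_key)  # (kept locally, sent in l1)


def ch_respond(secret_key: bytes, challenge: bytes, masked_nonce: bytes):
    """CH: unmask the WSND nonce, mask its own values, and sign them (l2)."""
    wsnd_nonce = xor(masked_nonce, secret_key)
    ch_nonce = secrets.token_bytes(KEY_LEN)
    masked_ch_nonce = xor(ch_nonce, secret_key)
    masked_challenge = xor(challenge, secret_key)
    signature = h(wsnd_nonce, masked_ch_nonce, secret_key, masked_challenge)
    return masked_challenge, masked_ch_nonce, signature


def wsnd_verify(secret_key, wsnd_nonce, masked_challenge, masked_ch_nonce, signature) -> bool:
    """WSND: re-compute the signature and compare in constant time."""
    expected = h(wsnd_nonce, masked_ch_nonce, secret_key, masked_challenge)
    return hmac.compare_digest(expected, signature)


if __name__ == "__main__":
    key = secrets.token_bytes(KEY_LEN)
    challenge = secrets.token_bytes(KEY_LEN)
    nonce, masked = wsnd_request(key)
    l2 = ch_respond(key, challenge, masked)
    print("join request accepted:", wsnd_verify(key, nonce, *l2))
```

A failed comparison in `wsnd_verify` corresponds to terminating the join request in Fig. 3; the later phases follow the same mask, hash, and compare pattern.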

4 Results and Discussion

The proposed protocol was simulated with D2D entities randomly deployed in a 100 × 100 m² area within a cell of 500 m radius. The number of nodes per cluster was 100, as shown in Table 2. The number of cluster heads was varied between 10 and 100, giving a total of between 1000 and 10,000 nodes. The maximum transmission range of each device was 50 m, while the maximum transmission power was 20 dBm. Each simulation ran for 7 min while the required parameters were measured.

4.1 Security Analysis

To evaluate the developed protocol, typical D2D attack and threat models were utilized. These included cloning attacks, message forgery, source authentication, identity theft, masquerade, and message replays, as discussed below.

Resilience against cloning attacks: Each D2D entity is embedded with a PUF that is utilized to derive Ӄ, which is in turn employed to generate the one-time mutual authentication pseudo-identity Ԏ and the security parameter ӃN*, one of the components of signature ⱴ1. This signature is subsequently used to authenticate D2D entities. Any physical tampering with the PUF will change its subsequent operations, rendering it ineffective. This implies that the proposed protocol is robust against cloning attacks, unlike the schemes in [41, 42].

Resilience against message forgery: In the proposed protocol, all WSND entities are required to mutually authenticate themselves to the CH before they obtain

Table 2 Simulation parameters

| Parameter | Value |
| --- | --- |
| Simulation area | 100 × 100 m² |
| Sensor nodes per cluster | 100 |
| Total number of sensor nodes | 1000–10,000 |
| Number of cluster heads | 10–100 |
| Maximum D2D transmission range | 50 m |
| Simulation duration | 7 min |
| Cell radius | 500 m |
| Maximum transmit power | 20 dBm |


cluster key pseudo-identity that they then utilize to compute the cluster beacon broadcasting pseudo-identity, Ħ. Consequently, the integrity of exchanged message ℓ5 is upheld by both and hashing operations in message hash signature, ğ. This makes it infeasible for an attacker to construct a valid message.

Source authentication: In the proposed protocol, all D2D senders are verified by checking the validity of signature ğ and timestamp ŧ, and if they pass the validity check, all their transmitted packets can be trusted. The sender constructs its message ℓ5 using identity-specific data generated from unique parameters {Ԏ*N, ѱ*N, ȹ, ß2} that are not shared with any other network entities. Moreover, these unique parameters are XORed before being enciphered in the h(.) operation. This makes it difficult for adversaries to extract these parameters and present them to the receiver as their source authentication features.

Resilience against identity theft: The proposed protocol employs pseudo-identities for both D2D entities and cluster identity through and respectively. In addition, during the actual packet transmissions exemplified by ℓ5, the highly dynamic beacon broadcasting pseudo-identity Ħ is utilized. As such, an attacker has no access to the real D2D or cluster identity even in the face of an active attack. However, the scheme developed in [41] generates pseudo-identities using a fixed index and a system secret key shared among all communicating entities. Therefore, all entities that receive the exchanged beacons can extract the real identities of other parties.

Resilience against masquerade attack: To construct message ℓ5, four parameters are required: {ŧ, ğ, ţ, Ħ}. Here, the beacon broadcasting pseudo-identity Ħ is not only enciphered in signature ğ but is also refreshed after every message transmission. In addition, the four constituents {Ԏ*N, ѱ*N, ȹ, ß2} are never transmitted to other D2D parties. Consequently, an adversary cannot obtain * N or any other secrets required for D2D impersonation. This is unlike in [41], where real identities can be extracted by an attacker and later employed to masquerade as a legitimate network entity.

Resilience against message replays: In the proposed protocol, the derivation of ℓ5, ğ, ğ*, ξ, ℓ6, and ξ* takes timestamp ŧ as one of the constituents to render them truly random. Since this timestamp is enciphered in signatures ğ, ğ*, ξ, and ξ*, it is infeasible for an attacker to alter them to launch replay attacks. In addition, the signatures and timestamp are checked at each receiver to thwart replays and other malicious attacks.
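The receiver-side checks described above (freshness first, then signature re-computation) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: SHA-256 stands in for h(.), ţ is assumed to be the message payload, the threshold test is interpreted as "the age of the timestamp exceeds Δŧ", and the function and parameter names are hypothetical.

```python
import hashlib
import hmac


def sign_message(pseudo_id: bytes, session_key: bytes, payload: bytes, timestamp: int) -> bytes:
    """Message hash signature over the pseudo-identity, session key, payload, and timestamp."""
    ts = timestamp.to_bytes(8, "big")
    return hashlib.sha256(pseudo_id + session_key + payload + ts).digest()


def check_message(pseudo_id, session_key, payload, timestamp, signature, now, max_age):
    """Return 'replay', 'malicious', or 'accept' for a received message."""
    # Freshness check: reject messages whose timestamp is older than the threshold.
    if now - timestamp > max_age:
        return "replay"
    # Integrity check: re-compute the signature and compare in constant time.
    expected = sign_message(pseudo_id, session_key, payload, timestamp)
    if not hmac.compare_digest(expected, signature):
        return "malicious"
    return "accept"


if __name__ == "__main__":
    key = b"k" * 32
    sig = sign_message(b"pid", key, b"hello", 1000)
    print(check_message(b"pid", key, b"hello", 1000, sig, now=1002, max_age=5))
```

Because the timestamp is an input to the signature, an attacker who rewrites the timestamp to pass the freshness check invalidates the signature instead.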

4.2 Performance Evaluation

The first part of performance evaluation involved the determination of message signing and verification latencies. Next, communication and computation costs were


Fig. 5 Message signing and verification latencies

employed as key metrics for the comparison of the proposed session key agreement protocol against other similar schemes, as discussed below.

Message signing and verification: The execution times presented in [41] were adopted for this evaluation. Here, the SHA-256 hash function operation takes 0.006 ms, while the hash-based message authentication code, HMAC, takes 0.0167 ms. In the proposed protocol, message signing required three hash operations:

Ħ = h(Ԏ*N || ѱ*N || ȹ || ß2) ⊕ h(Ԏ*N || ŧ)   (1)

ğ = h(Ħ || Ḉ || ţ || ŧ)   (2)

As such, the signing operation needed a total cost of 3 × 0.006 = 0.018 ms. On the other hand, message verification required only one hash operation:

ğ* = h(Ħ || Ḉ || ţ || ŧ)   (3)

Consequently, the total message verification cost was 0.006 ms, bringing the combined message signing and verification cost to 0.024 ms. To analyze how message signing and verification delays were influenced by an increase in the number of beacons within the network, the number of messages was increased from an initial value of 10 beacons to a maximum of 100. As shown in Fig. 5, an increase in the number of beacons in the network led to a corresponding increase in both signing and verification costs. However, message signing costs remained well above verification costs for all message numbers. The graphs are not perfectly linear, as might be expected, since


Fig. 6 Communication overheads

other network conditions such as congestion and subsequent retransmissions crop up, affecting the observed signing and verification costs. An increased number of beacons leads to increased processing at the nodes and hence the observed increase in network latencies. To analyze the communication overheads of the proposed protocol, the number of clusters was incremented to a maximum of 10,000 while the number of nodes in each cluster was incremented from an initial value of 50 to a maximum of 150, as shown in Fig. 6. It is evident from Fig. 6 that, for a constant number of nodes per cluster, communication overheads increased exponentially as the number of clusters was increased. This can be attributed to the increased handshaking, network congestion, and re-transmission of lost packets that arise when the volume of data in the network becomes large. Considering all three node instances, it is clear that as the number of nodes within each cluster was increased from 50 to 150, there was a corresponding increase in communication delays, which can also be attributed to the resulting congestion and the subsequent activation of error correction techniques.

The performance of the proposed protocol was also compared to that of similar D2D authentication schemes in [37–40]. For valid comparisons, bit lengths for all authentication parameters were as specified in [40]. Here, passwords, real and pseudo-identities, and sequential and serial numbers were 64 bits; timestamps and hash values were 160 bits; random numbers were 256 bits; and the block size of the encryption function, the plaintext of the encryption functions, and the cipher text of the decryption functions were 128 bits each. In the proposed protocol, during the CH and WSND authentication phases, the messages in Table 3 are exchanged. As shown in Table 3, the total number of bits for the exchanged messages during mutual authentication was 2016. On the other hand, the schemes in [37–


Table 3 Mutual authentication message exchanges

| Message exchanges | Computations | Total bits |
| --- | --- | --- |
| WSND → CH: {Ԏ, ῄ*} | Ӄ = 64; ῄ* = 256 | 320 |
| CH → WSND: {ɌC*, Ŋ*, ⱴ} | ɌC* = 256; Ŋ* = 256; ⱴ = 160 | 672 |
| WSND → CH: {ӃN*, ß1*, ⱴ1} | ӃN* = 64; ß1* = 256; ⱴ1 = 160 | 480 |
| CH → WSND: {ᶋ*, ᵹ*, ⱴ2} | ᶋ* = 256; ᵹ* = 128; ⱴ2 = 160 | 544 |
| Total | | 2016 |

Fig. 7 Communication costs comparisons

40] exchanged a total of 3904, 3136, 2688, and 2592 bits, respectively, as shown in Fig. 7. It is evident from Fig. 7 that the proposed protocol had the lowest communication costs among all the related schemes. The scheme in [37] had the highest communication costs, followed by the schemes in [38–40], in that order. Consequently, our protocol is lightweight in terms of communication and hence well suited to resource-constrained D2D entities.

For the comparison of computation overheads, the conventions given in [39] were utilized. Here, we let Th denote the hashing running time while TE/D denotes the encryption or decryption running time, taking approximately 0.00032 s and 0.0056 s, respectively. In our protocol, 14Th and 8TE/D operations are executed during the authentication process. Table 4 compares these computation overheads with those of the schemes proposed in [37–40]. It is clear from Table 4 that the scheme in [37] required 19Th and 8TE/D operations and hence its total computation time was 50.88 ms, while the scheme in [38] executed 14Th and 4TE/D operations, yielding a total computation time of 26.88 ms. Similarly, the technique in [39] executed 24Th and 4TE/D operations while the scheme in [40] executed 18Th and 4TE/D operations, taking a total of 30.4 ms and 27.44 ms, respectively. As shown in Fig. 8, the scheme in [37] had the largest

Table 4 Computation overheads comparisons

| Scheme | Required functions | Total overheads (ms) |
| --- | --- | --- |
| Scheme in [37] | 19Th + 8TE/D | 50.88 |
| Scheme in [38] | 14Th + 4TE/D | 26.88 |
| Scheme in [39] | 24Th + 4TE/D | 30.4 |
| Scheme in [40] | 18Th + 4TE/D | 27.44 |
| Proposed protocol | 14Th + 8TE/D | 49.28 |

Fig. 8 Computation costs comparisons

computational costs, followed by our protocol. The protocol in [39] was next, followed by the schemes in [40] and [38], in that order. The lower computation overheads of the schemes in [38–40] are attributed to the deployment of low-entropy passwords. However, since these passwords require frequent updating, the messages exchanged during this process can lead to higher bandwidth consumption. Although the schemes in [38–40] exhibited lower computation costs, they require the maintenance of a user information table, which may compromise the security of all D2D entities if captured by an adversary. In addition, they are susceptible to smart card loss attacks. Moreover, the schemes in [37, 38] fail to provide perfect secrecy due to the deployment of static user identities for all authentication sessions. Finally, the protocol developed in [39] can easily incur high computational costs if the GWN receives a wrong pseudonym identity.
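The cost figures in this section follow from simple arithmetic, which the short Python sketch below recomputes from the stated unit costs (0.006 ms per SHA-256 operation for signing, Th = 0.32 ms and TE/D = 5.6 ms for computation overheads) and the per-field bit lengths of Table 3. The variable and function names are illustrative only. Under these unit costs the totals for the schemes in [37], [38] and for the proposed protocol reproduce Table 4, whereas the operation counts listed for [39] and [40] give 30.08 ms and 28.16 ms, slightly different from the tabulated 30.4 ms and 27.44 ms; the reason for this difference cannot be determined from the text.

```python
T_HASH_MS = 0.32      # Th: one hash operation, in ms (0.00032 s)
T_ENC_DEC_MS = 5.6    # TE/D: one encryption or decryption, in ms (0.0056 s)
T_SHA256_MS = 0.006   # SHA-256 cost used for beacon signing, from [41]

# (number of hash operations, number of encryption/decryption operations)
SCHEMES = {
    "[37]": (19, 8),
    "[38]": (14, 4),
    "[39]": (24, 4),
    "[40]": (18, 4),
    "proposed": (14, 8),
}

# Bit lengths of the fields in each mutual authentication message (Table 3)
MESSAGE_BITS = [
    [64, 256],
    [256, 256, 160],
    [64, 256, 160],
    [256, 128, 160],
]


def computation_ms(n_hash: int, n_enc_dec: int) -> float:
    """Total computation overhead in ms for the given operation counts."""
    return round(n_hash * T_HASH_MS + n_enc_dec * T_ENC_DEC_MS, 2)


def total_bits(messages) -> int:
    """Total number of bits exchanged across all messages."""
    return sum(sum(fields) for fields in messages)


signing_ms = round(3 * T_SHA256_MS, 3)       # three hash operations
verification_ms = round(1 * T_SHA256_MS, 3)  # one hash operation

if __name__ == "__main__":
    for name, (n_hash, n_enc_dec) in SCHEMES.items():
        print(name, computation_ms(n_hash, n_enc_dec), "ms")
    print("bits exchanged:", total_bits(MESSAGE_BITS))
    print("signing + verification:", round(signing_ms + verification_ms, 3), "ms")
```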


5 Conclusion and Future Work

The goal of this paper was to develop a session key agreement protocol that employs lightweight XOR and hashing operations, coupled with a PUF that acts as the fingerprint of each D2D entity. In addition, dynamic session keys were implemented to eliminate the need to store long-term secrets in tamper-proof devices, as is the case with the majority of current schemes. During the authentication process, digital signatures and timestamps were deployed to validate all requests and messages. This was shown to be robust when compared with the smart cards and low-entropy passwords utilized in most conventional security schemes. The simulation and evaluation results have demonstrated better performance of the proposed protocol in terms of security, privacy, and communication overheads, although its computation overheads remain higher than those of the schemes in [38–40]. Future work in this area lies in the formal verification of the security features offered by the proposed protocol, as well as the utilization of other performance metrics that were not within the scope of this paper. There is also a need to devise techniques for minimizing the computation overheads of this protocol while maintaining its security characteristics.

References

1. Huang, C., Yan, K., Wei, S., Lee, D.H.: A privacy preserving data sharing solution for mobile healthcare. In: 2017 International Conference on Progress in Informatics and Computing (PIC), Nanjing, China, pp. 260–265 (2017)
2. Wang, M., Yan, Z.: A survey on security in D2D communications. Mobile Netw. Appl. 22(2), 195–208 (2017)
3. Gope, P.: LAAP: lightweight anonymous authentication protocol for D2D-aided fog computing paradigm. Comput. Secur. 86, 223–237 (2019)
4. Seok, B., Sicato, J.C.S., Erzhena, T., Xuan, C., Pan, Y., Park, J.H.: Secure D2D communication for 5G IoT network based on lightweight cryptography. Appl. Sci. 10(1), 217 (2020)
5. Zhang, S., Wang, Y., Zhou, W.: Towards secure 5G networks: a survey. Comput. Netw. 162, 106871 (2019)
6. Lin, Z., Du, L., Gao, Z., Huang, L., Du, X.: Efficient device-to-device discovery and access procedure for 5G cellular network. Wirel. Commun. Mob. Comput. 16, 1282–1289 (2016)
7. Shang, Z., Ma, M., Li, X.: A secure group-oriented device-to-device authentication protocol for 5G wireless networks. IEEE Trans. Wirel. Commun. 19(11), 7021–7032 (2020)
8. Lopes, A.P.G., Gondim, P.R.: Low-cost authentication protocol for D2D communication in m-Health with trust evaluation. Wirel. Commun. Mob. Comput. 2020, 1–16 (2020)
9. Poh, G.-S., Gope, P., Ning, P.: PrivHome: privacy-preserving authenticated communication in smart home environment. IEEE Trans. Depend. Secure Comput. 2019, 1–14 (2019)
10. Nyangaresi, V.O., Rodrigues, A.J., Abeka, S.O.: Efficient group authentication protocol for secure 5G enabled vehicular communications. In: 2020 16th International Computer Engineering Conference (ICENCO), pp. 25–30. IEEE, Cairo, Egypt (2020)
11. Gope, P., Sikdar, B.: An efficient privacy-preserving authentication scheme for energy internet-based vehicle-to-grid communication. IEEE Trans. Smart Grid. 10(6), 6607–6618 (2019)
12. Cao, J., Ma, M., Li, H.: GBAAM: group-based access authentication for MTC in LTE networks. Secur. Commun. Netw. 8(17), 3282–3299 (2015)


13. Lopes, G.A.P., Gondim, P.R.: Mutual authentication protocol for D2D communications in a cloud-based e-health system. Sensors. 20(7), 2072 (2020)
14. Haus, M., Waqas, M., Ding, A.Y., Li, Y., Tarkoma, S., Ott, J.: Security and privacy in device-to-device (D2D) communication: a review. IEEE Commun. Surv. Tutorials. 19(2), 1054–1079 (2017)
15. Tan, H., Choi, D., Kim, P., Pan, S., Chung, I.: Comments on ‘dual authentication and key management techniques for secure data transmission in vehicular ad hoc networks’. IEEE Trans. Intell. Transp. Syst. 19, 2149–2151 (2017)
16. Zhang, Z., Guo, X., Lin, Y.: Trust management method of D2D communication based on RF fingerprint identification. IEEE Access. 6, 66082–66087 (2018)
17. Cao, M., Wang, L., Xu, H., Chen, D., Lou, C., Zhang, N., Zhu, Y., Qin, Z.: Sec-D2D: a secure and lightweight D2D communication system with multiple sensors. IEEE Access. 7, 33759–33770 (2019)
18. Nyangaresi, V.O., Rodrigues, A.J., Abeka, S.O.: Neuro-fuzzy based handover authentication protocol for ultra dense 5G networks. In: 2020 2nd Global Power, Energy and Communication Conference (GPECOM), pp. 339–344. IEEE, Izmir, Turkey (2020)
19. Sun, Y., Cao, J., Ma, M., Li, H., Niu, B., Li, F.: Privacy-preserving device discovery and authentication scheme for D2D communication in 3GPP 5G HetNet. In: 2019 International Conference on Computing, Networking and Communications (ICNC), Honolulu, HI, USA, pp. 425–431 (2019)
20. Xu, X., Zhang, Y., Sun, Z., Hong, Y., Tao, X.: Analytical modeling of mode selection for moving D2D-enabled cellular networks. IEEE Commun. Lett. 20, 1203–1206 (2016)
21. Hsu, R., Lee, J., Quek, T.Q.S., Chen, J.: Graad: group anonymous and accountable D2D communication in mobile networks. IEEE Trans. Inf. Forensics Secur. 13(2), 449–464 (2018)
22. Wang, M., Yan, Z.: Privacy-preserving authentication and key agreement protocols for D2D group communications. IEEE Trans. Ind. Informatics. 14(8), 3637–3647 (2018)
23. Wang, L., Tian, Y., Zhang, D., Lu, Y.: Constant round authenticated and dynamic group key agreement protocol for D2D group communications. Inf. Sci. 503, 61–71 (2019)
24. Shao, J., Lin, X., Lu, R., Zuo, C.: A threshold anonymous authentication protocol for VANETs. IEEE Trans. Veh. Technol. 65(3), 1711–1720 (2015)
25. Jiang, S., Zhu, X., Wang, L.: An efficient anonymous batch authentication scheme based on HMAC for VANETs. IEEE Trans. Intell. Transp. Syst. 17(8), 2193–2204 (2016)
26. Wang, P., Chen, C.M., Kumari, S., Shojafar, M., Tafazolli, R., Liu, Y.N.: HDMA: hybrid D2D message authentication scheme for 5G-enabled vanets. IEEE Trans. Intell. Transp. Syst. 2020, 1–10 (2020)
27. Zhang, A., Wang, L., Ye, X., Lin, X.: Light-weight and robust security-aware D2D-assist data transmission protocol for mobile-health systems. IEEE Trans. Inf. Forensics Secur. 12(3), 662–675 (2017)
28. Zhou, C.: Comments on ‘Light-weight and robust security-aware D2D-assist data transmission protocol for mobile-health systems’. IEEE Trans. Inf. Forensics Secur. 13(7), 1869–1870 (2018)
29. Mustafa, U., Philip, N.: Group-based key exchange for medical IoT device-to-device communication (D2D) combining secret sharing and physical layer key exchange. In: 2019 12th International Conference on Global Security, Safety and Sustainability (ICGS3). IEEE, London, UK (2019)
30. Jiang, Q., Lian, X., Yang, C., Ma, J., Tian, Y., Yang, Y.: A bilinear pairing based anonymous authentication scheme in wireless body area networks for mHealth. J. Med. Syst. 40, 231 (2016)
31. Shen, J., Gui, Z., Ji, S., Shen, J., Tan, H., Tang, Y.: Cloud-aided lightweight certificateless authentication protocol with anonymity for wireless body area networks. J. Netw. Comput. Appl. 106, 117–123 (2018)
32. Abro, A., Deng, Z., Memon, K.A.: A lightweight elliptic-Elgamal-based authentication scheme for secure device-to-device communication. Futur. Internet. 11, 108 (2019)


33. Javed, Y., Khan, A.S., Qahar, A., Abdullah, J.: EEoP: a lightweight security scheme over PKI in D2D cellular networks. J. Telecommun. Electron. Comput. Eng. 9, 99–105 (2017)
34. Mohseni-Ejiyeh, A., Ashouri-Talouki, M., Mahdavi, M.: An incentive-aware lightweight secure data sharing scheme for D2D communication in 5G cellular networks. ISeCure. 10, 15–27 (2018)
35. Tan, H., Song, Y., Xuan, S., Pan, S., Chung, I.: Secure D2D group authentication employing smartphone sensor behavior analysis. Symmetry. 11, 969 (2019)
36. Baskaran, S.B.M., Raja, G.: A lightweight incognito key exchange mechanism for LTE-A assisted D2D communication. In: Proceedings of the 2017 Ninth International Conference on Advanced Computing (ICoAC), Chennai, India, pp. 301–307 (2017)
37. Lu, Y., Li, L., Peng, H., Yang, Y.: An energy efficient mutual authentication and key agreement scheme preserving anonymity for wireless sensor networks. Sensors. 16(6), 837 (2016)
38. Jung, J., Kim, J., Choi, Y., Won, D.: An anonymous user authentication and key agreement scheme based on a symmetric cryptosystem in wireless sensor networks. Sensors. 16(6), 1299 (2016)
39. Xiong, L., Peng, D., Peng, T., Liang, H., Liu, Z.: A lightweight anonymous authentication protocol with perfect forward secrecy for wireless sensor networks. Sensors. 17(11), 2681 (2017)
40. Nashwan, S.: AAA-WSN: anonymous access authentication scheme for wireless sensor networks in big data environment. Egypt. Informatics J. 2020, 1–12 (2020)
41. Hakeem, S.A.A., El-Gawad, M.A.A., Kim, H.: A decentralized lightweight authentication and privacy protocol for vehicular networks. IEEE Access. 7, 119689–119705 (2019)
42. Wang, F., Xu, Y., Zhang, H., Zhang, Y., Zhu, L.: 2FLIP: a two-factor lightweight privacy-preserving authentication scheme for VANET. IEEE Trans. Veh. Technol. 65(2), 896–911 (2016)

What Do Your Smart Home Devices Reveal About You?

Hima Boddupalli, Shivakant Mishra, and Mohammed Almutawa

1 Introduction

Home automation, which began as early as 2000, has become increasingly popular as the Internet of Things (IoT) brings this technology into our homes by rapidly adding connectivity to everyday appliances and home features. As IoT devices become a part of our daily lives, we need to take a look at the security risks and privacy concerns this smart technology introduces. IoT manufacturers collect an incredible amount of data about users and their homes, with a promise that these data points are used to make the smart home experience better and more personalized. For example, consider an Amazon or Google smart speaker voice assistant. It knows where you are located, what you buy, as well as your taste in music and movies. It knows when you are home, what your voice sounds like compared to, say, your roommate’s, and, if you have paired it with other smart devices, what those devices are sensing. In short, it knows a lot about you. While there is some evidence that such rich personal information helps service providers to provide better, personalized services, in the wrong hands it is a treasure trove of personal information that can be misused for nefarious purposes such as knowing when your house is unoccupied and safe to rob, using your credit card credentials to make unauthorized purchases, or getting access to the camera feeds from your home.

H. Boddupalli · S. Mishra
University of Colorado Boulder, Boulder, CO, USA

M. Almutawa
Kuwait University, Sabah Al Salem University City, Shadadiyah, Kuwait

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Nayyar et al. (eds.), The Fifth International Conference on Safety and Security with IoT, EAI/Springer Innovations in Communication and Computing, https://doi.org/10.1007/978-3-030-94285-4_7


H. Boddupalli et al.

Indeed, users are increasingly becoming aware of these security and privacy risks. A recent survey of people from the United States, Canada, Japan, Australia, France, and the United Kingdom by Consumers International and the Internet Society found that about 63% of people find connected devices to be “creepy” and 75% do not trust the way their data is shared by those devices [17]. IoT device manufacturers and service providers are also responding to these concerns by putting in appropriate safety measures, such as encrypting all messages exchanged between devices and external servers or incorporating strong authentication mechanisms for device access. To some extent, users are also following various safety tips, such as using strong passwords, disconnecting devices from the Internet when not needed, and installing the latest security patches. However, despite these safeguards from users, device manufacturers, and service providers, important personal information about users and their homes does fall into the wrong hands and has been exploited for nefarious purposes [5, 13].

One common technique used by adversaries to steal such personal information despite all the safeguards is the side channel attack, which is based on information gained from the implementation of a computer system rather than on weaknesses in the implemented algorithm itself. Timing information, power consumption, electromagnetic leaks, or even sound can provide an extra source of information that can be exploited. For example, the power signature generated by pressing a particular key on a keyboard differs from that of another key, which makes it a distinguishing factor even though it does not directly reveal what was pressed. By observing how frequently each power signature occurs, it can be matched to frequently occurring letters. Power output is just one side channel; there are many such channels for any device. If attackers get hold of them, these channels give an inside picture of the device, which makes them an attractive target [1, 11].

In this paper, we explore the potential for a side channel attack on smart home devices, wherein an adversary only has the ability to tap into and access all the Internet traffic coming in or out of a home. This is possible if the adversary has access to the home router or to the router at the Internet service provider connected to the home router. We do not assume any other capability on behalf of the adversary. In particular, the adversary cannot decrypt any messages exchanged; cannot drop, alter, or inject any messages; and cannot alter the traffic flow. The question we explore in this paper is whether an adversary with such limited capability can still learn something about the users and their homes.

We have experimented with two very popular home devices: a pair of cameras for indoor and outdoor monitoring (Netgear Arlo Pro 2) and a Google Home Mini (Gen 2). The indoor and outdoor cameras are connected to a cloud server, and their operation can be controlled from the user’s smartphone or tablet. The Home Mini is powered by the Google Assistant; users can ask it questions or tell it to do things using their voice. Both of these devices are connected to their respective servers in the cloud via the Internet, and the messages exchanged between the devices and servers are encrypted. We have experimented with a variety of indoor scenarios covering different numbers of people at home, different types of audio and movement at home, and questions asked in different styles.


Our key finding is that while these devices employ some very good security mechanisms, a lot of personal information about the users and their homes can still be inferred by simply monitoring and analyzing the traffic coming in and out of their homes. In particular, an adversary can infer the exact smart devices the users are using, whether the home is currently empty, or whether the current residents are behaving erratically. Further, the knowledge of any additional contextual information, such as the sleeping schedules of the residents, whether children reside at home, the drinking habits of the residents, or which room(s)/floor(s) the devices are located in, can allow the adversary to infer much finer-grained information about the home residents.

The paper describes the details of the experiments we have conducted under a range of different scenarios and the data we have collected, provides an analysis of this data, and identifies privacy leakage based on this analysis. The rest of the paper is structured as follows. Section 2 provides a brief overview of the related work. Section 3 describes the experimental setup. Sections 4 and 5 describe the results of all the experiments involving the Home Mini and the Arlo Pro, respectively. Section 6 describes the privacy leakage that can be inferred from these experimental results, and finally Sect. 7 concludes the paper.

2 Related Work

In this paper, we focus on Internet-connected smart home devices, such as Internet-connected appliances, lighting, sensors, door locks, security cameras, and interactive smart speakers and voice services. These devices rely on cloud-based integration to provide appropriate services via an API using HTTP. They can typically be controlled via commands from smartphones transiting via the cloud. Over the last decade, several security and privacy vulnerabilities have been reported and discussed in the literature for such devices. These include privacy risks in pairing and discovery protocols [19], insecure communication [7], remote spying possibilities [9, 16], over-privilege [12, 14], and end-user security and privacy concerns [20].

In this paper, we focus on side channels using message traffic analysis to infer private information about a home. Side channel privacy attacks have been discussed quite extensively in the literature; e.g., [4] and [15] demonstrate side channel attacks on anonymity networks, and [10] and [18] use traffic fingerprinting to learn about users’ Internet browsing characteristics. Message traffic analysis has been used to infer information about potentially sensitive activities in [6]. The traffic analysis performed there tries to correlate device events/actions with network activity at the time of the event/action. The devices used are the second generation Nest Thermostat and the second generation Nest Protect Wired. It is shown that transitions of the thermostat from the Home to the Auto Away mode and vice versa can be identified with 88% and 67% accuracy, respectively, based only on network traffic originating from the device. Message traffic analysis is also used


in [3] to demonstrate that, despite the broad adoption of transport layer encryption, smart home traffic metadata is sufficient for a passive network adversary to infer sensitive in-home activities. The specific inferences include identifying the smart home devices and whether those devices are on or off. The devices used in that research are the Amazon Echo, Belkin WeMo Switch, Orvibo Smart Socket, TP-Link Smart Plug, Nest Security Camera, Amcrest Security Camera, and Sense Sleep Monitor. To prevent such inferences, a traffic shaper is proposed that provides a uniform traffic rate at the cost of between 10 and 40 KB of extra bandwidth.

3 Threat Model and Evaluation Methodology

As mentioned earlier, our goal is to explore any potential for a side channel attack on smart home devices, where an adversary only has the ability to tap into and access all the Internet traffic coming into or out of a home. In particular, the adversary is passive in nature (similar to an ISP) in that it cannot inject any new packets into the traffic, cannot alter the contents of any packets, and cannot alter the rate of packet flow. The adversary also has no access to any of the local area networks (LANs) being used in the targeted home. We assume that the adversary has access to a large volume of past message traffic that he/she can analyze using machine learning algorithms.

To gain insight into the potential privacy leakage from smart home devices, we chose to experiment with two very popular devices: a pair of cameras for indoor and outdoor monitoring (Netgear Arlo Pro 2) and a Google Home Mini (Gen 2). Both devices need to be connected to the Internet in order to function properly. Our evaluation methodology consists of operating each device under very specific controlled scenarios and tapping the traffic from the device to the external servers and from the external servers back to the device. We then analyze this tapped traffic to identify any unique patterns that can be exploited by the adversary to infer the scenario within the home.

Figure 1 illustrates our experimental setup. We used a laptop with an Intel Core i5 processor and 4 GB of RAM, running Ubuntu 18.04.4 LTS, and the Wireshark [8] network protocol analyzer version 2.6.10 to collect the data. The collected data was then analyzed using the pandas [2] data analysis library. For the Google Home Mini, a Wi-Fi hotspot was set up on the laptop to act as the access point that the Google Home Mini connects to. The laptop connects to the Internet through an Ethernet port. By capturing all the data coming into or going out of the wireless interface, we were able to capture all the traffic related to the Google Home Mini. We needed a different setup to capture the traffic of the Netgear Arlo Pro cameras. The Netgear Arlo cameras come with a dedicated base station unit; the cameras connect to the base station using Wi-Fi, and the base station connects to the Internet through an Ethernet port. We connected the Ethernet port of our laptop (the network protocol analyzer) to a switch port configured to mirror all the traffic coming from the Arlo cameras’ base station, and set up Wireshark to capture all the traffic on the Ethernet interface.

What Do Your Smart Home Devices Reveal About You?


Fig. 1 Experiment setup to tap messages to or from a smart home device

The next step is to take the files generated by Wireshark and process them using pandas. We removed all irrelevant data, such as broadcasts and multicasts, and kept only the TCP and TLS traffic. The filtered data was then used to create the tables from which we generated the plots. All our experiments were repeated at least three times.
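The filtering and tabulation step described above can be sketched as follows. This is a minimal illustration, not the authors' script: they used pandas, whereas this sketch uses only the Python standard library so that it stays self-contained. The column names follow Wireshark's usual CSV export, and the packet rows, addresses, and the helper names `keep` and `bytes_per_second` are made up for the example.

```python
import ipaddress
from collections import defaultdict

# Hypothetical rows shaped like a Wireshark CSV export; all values are made up.
PACKETS = [
    {"Time": 0.10, "Source": "192.168.1.20", "Destination": "203.0.113.5", "Protocol": "TCP", "Length": 66},
    {"Time": 0.40, "Source": "192.168.1.20", "Destination": "203.0.113.5", "Protocol": "TLSv1.2", "Length": 1400},
    {"Time": 0.90, "Source": "192.168.1.20", "Destination": "224.0.0.251", "Protocol": "MDNS", "Length": 120},
    {"Time": 1.20, "Source": "192.168.1.20", "Destination": "255.255.255.255", "Protocol": "DHCP", "Length": 342},
    {"Time": 1.70, "Source": "203.0.113.5", "Destination": "192.168.1.20", "Protocol": "TLSv1.2", "Length": 900},
]


def is_broadcast_or_multicast(address):
    """True for IPv4/IPv6 multicast addresses and the IPv4 limited broadcast address."""
    return ipaddress.ip_address(address).is_multicast or address == "255.255.255.255"


def keep(packet):
    """Keep only TCP/TLS packets that are not sent to a broadcast or multicast address."""
    protocol = packet["Protocol"]
    if not (protocol == "TCP" or protocol.startswith("TLS")):
        return False
    return not is_broadcast_or_multicast(packet["Destination"])


def bytes_per_second(packets):
    """Sum the lengths of the kept packets into one-second bins."""
    totals = defaultdict(int)
    for packet in packets:
        if keep(packet):
            totals[int(packet["Time"])] += packet["Length"]
    return dict(sorted(totals.items()))


TABLE = bytes_per_second(PACKETS)
print(TABLE)
```

A table of this kind (seconds versus bytes) is the sort of input from which bytes-over-time plots like those in the following sections could be drawn.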

4 Evaluation: Google Home Mini

For all experiments with the Google Home Mini, the device was triggered about 30 seconds after each controlled scenario was set up, and the traffic was recorded for a duration of two minutes. We noted that the Google Home Mini remains alert for about 8–10 seconds after it is triggered with the “OK Google” voice command to listen for a question from the user. Our controlled scenarios for the Google Home Mini can be divided into three categories: baseline scenarios, standard operating scenarios, and non-standard operating scenarios.

4.1 Baseline Scenarios

The baseline scenario category is comprised of scenarios where the device is not triggered at all and simply stays in the background. The goal here is to see if the


Fig. 2 Google Home: complete silence at home (no trigger)

Fig. 3 Google Home Mini: soft music playing in the background (no trigger)

adversary has any possibility of inferring anything about a home when the device is not being used at all. We have experimented with three different scenarios in this category: (1) when there is complete silence at home; (2) when some soft music is playing in the background; and (3) when the device is in a natural setting where there may be some background noise or conversation. Figures 2, 3 and 4 show the number of bytes transferred as a function of time for these three scenarios. As we can see from these figures, there is some message exchange between the Google Home Mini and the server even when the device is not triggered. We first suspected that this indicates that the Google Home Mini was eavesdropping, collecting data even when there is no trigger, especially for scenarios two and three.


Fig. 4 Google Home Mini: some background noise or conversation (no trigger)

However, after further inspection and analysis, we realized that the Google Home Mini periodically communicates with more than ten different servers even when it is in complete silence. Figure 4 shows the number of bytes being sent to the individual servers, while Figs. 2 and 3 show the aggregate bytes. From the three figures, we notice that, other than the spikes that occur at two- to five-minute intervals and have values between 1 and 4 KB, the Google Home Mini sends less than 500 bytes/s of data. Upon further analysis, we found that the spikes are a result of regular encryption key exchanges, while the remaining low-rate traffic mainly consists of keep-alive packets. As such, an adversary cannot distinguish between the three scenarios within the baseline category. However, as we will see in the next subsections, the traffic rate due to keep-alives is much lower than what the Google Home Mini sends when it is triggered. Therefore, if the adversary sees this type of data/plot, he/she will know for sure that no one is interacting with the Google Home Mini, and if this situation continues for a long time, he/she may conclude that no one is at home, particularly if he/she has access to some historical data to correlate with.
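The baseline observation above suggests a very simple test an adversary could apply to a bytes-per-second series: find the longest stretch in which the rate never exceeds the largest baseline spike. The sketch below is only an illustration under assumptions: the 4 KB ceiling is taken from the spike range reported above and read as 4,000 bytes, and the function name and the sample trace are hypothetical.

```python
def longest_quiet_run(bytes_per_second, ceiling=4_000):
    """Length (in seconds) of the longest run whose values all stay at or below `ceiling`."""
    longest = current = 0
    for value in bytes_per_second:
        if value <= ceiling:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # traffic above the baseline ceiling ends the quiet run
    return longest


# Made-up trace: keep-alive traffic, one key-exchange-sized spike, one trigger.
trace = [300, 250, 3_500, 400, 65_000, 20_000, 450, 300, 280, 2_000, 310]
print(longest_quiet_run(trace))
```

How long a quiet run must be before it suggests an empty home is not stated in the text and would have to be calibrated against historical data.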

4.2 Standard Operating Scenarios

The standard operating scenario category is comprised of scenarios where the device is used properly, i.e., it is triggered correctly with the “OK Google” command and an appropriate question is asked within eight seconds after the trigger. We have experimented with five different scenarios in this category: (1) ask a question and get an answer; (2) ask a question and get no answer; (3) say “play some trivia”; (4)


Fig. 5 Google Home Mini: triggered and a question asked; with answer received (top) and with no answer received (bottom)

Fig. 6 Google Home Mini: triggered; play some trivia (top), tell a joke (middle) and I am bored (bottom)

say “tell a joke”; and (5) say “I am bored.” Figures 5 and 6 show the number of bytes transferred as a function of time for these five scenarios. From both Figs. 5 and 6 we notice that a tall spike (normally above 60 KB) is produced whenever we trigger Google Home Mini. The shape of the plot then differs depending on what the user does after the trigger. In scenarios one and two when a normal question is asked, Fig. 5 shows that when we receive an answer (top), the plot becomes wider than when there is no answer (bottom). The same is true for scenarios three, four, and five shown in Fig. 6, where we do receive an answer for


each scenario. In scenario five (“I am bored”, bottom), where we engage with the Google Home Mini, we see the widest plot. Overall, we can make three observations from these results. First, as noted earlier, this category is clearly distinguishable from the baseline category for all scenarios. Second, the adversary can fairly well distinguish between the scenarios when no answer is received (Fig. 5 bottom) and when some answer is received (all other plots of Figs. 5 and 6). Finally, while it is difficult to clearly identify the exact scenario when an answer is received, the width of the spikes can be used to infer the length of the interaction time, i.e., the length of an answer.
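The two cues used in this subsection, a tall trigger spike followed by a plot of varying width, can be extracted from a bytes-per-second series roughly as sketched below. The thresholds are assumptions derived from the text (a trigger spike "normally above 60 KB", baseline traffic below about 4 KB, with KB read as 1,000 bytes), and the function name and traces are made up for illustration.

```python
def interactions(bytes_per_second, trigger=60_000, idle=4_000):
    """Return (start_second, width_seconds) for each interaction that opens with a trigger spike."""
    found = []
    i = 0
    n = len(bytes_per_second)
    while i < n:
        if bytes_per_second[i] >= trigger:
            end = i
            # Extend the interaction while traffic stays above the baseline level.
            while end + 1 < n and bytes_per_second[end + 1] > idle:
                end += 1
            found.append((i, end - i + 1))
            i = end + 1
        else:
            i += 1
    return found


# Made-up traces: a short exchange with no answer and a longer one with an answer.
no_answer = [300, 70_000, 30_000, 8_000, 400, 300]
with_answer = [300, 72_000, 35_000, 20_000, 15_000, 12_000, 9_000, 500]
print(interactions(no_answer), interactions(with_answer))
```

In this sketch the second element of each pair plays the role of the plot width discussed above, so the wider interaction corresponds to the case where an answer is received.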

4.3 Non-standard Operating Scenarios

The non-standard operating scenarios category is comprised of scenarios where the device is triggered correctly with “OK Google” command, but is followed by unusual or improper usage. We have experimented with five different scenarios in this category: (1) not asking a question at all; (2) asking a very long question that is not answered; (3) talking gibberish for long times (10, 20, or 30 seconds); (4) triggering the device several times without asking any question at all; and (5) asking a proper question but only after some significant pause since the trigger. Figures 7, 8, 9, and 10 show the number of bytes transferred as a function of time for these five scenarios. These non-standard scenarios are the most revealing and most interesting ones for Google Home Mini. Since we are triggering the device, we notice the same initial spike that we saw in the standard scenarios. There is nothing special in scenarios one

Fig. 7 Google Home Mini: non-standard usage; triggered with “OK Google” and followed by silence (top), asking a very long question (bottom)


Fig. 8 Google Home Mini: non-standard usage; triggered with “OK Google” and followed by talking gibberish for long times (10 (top), 20 (middle), or 30 (bottom) seconds)

Fig. 9 Google Home Mini: non-standard usage; triggered with “OK Google” and followed by repeating “OK Google” very frequently for 30 seconds

and two; the plot (Fig. 7) gets a little wider when we ask a question. We do notice that in many cases the Google Home Mini captures only a part of the question if it is longer than 10–15 seconds. This is confirmed in scenario three (Fig. 8), where even though we talked for 30 seconds in one of our experiments, the transfer of data lasted for less than 15 seconds. Looking at this figure, we notice that if we measure the time from the peak to the end of the “hump” for the three plots, it is almost the same, at just over ten seconds. Scenario four (Fig. 9) is one of the interesting ones, as it is very distinct. The width of the “hump” is 30 seconds, very long compared to the previous scenarios


Fig. 10 Google Home Mini: non-standard usage; triggered with “OK Google” and followed by a question asked after a long pause; question is answered (top), question is not answered (bottom)

which last on average 15 seconds. Also, the values in the plot are relatively high, hovering around 40 KB/s. Even for scenarios where there is a long back-and-forth interaction with the Google Home Mini, the values drop when the user stops talking and an answer is fetched from the server. So if the adversary notices this type of plot, he/she can infer that the user is either being silly, e.g., like a child, or might be intoxicated. The fifth scenario (Fig. 10) is also revealing. At first glance, it looks similar to the standard scenarios, but upon further inspection, we notice something different. After the initial spike, the values drop to about 40 KB/s and stay there for about six seconds, which is the amount of time we waited before asking the question; once we start talking/asking, the rate goes up by 10 KB/s. Overall, we make three important observations from our experiments in this category. First, this category is clearly distinguishable from the baseline category for all scenarios. Second, this category can be fairly well distinguished from the standard operating scenarios category, as the plots are generally wider. However, there are some exceptions here, e.g., the scenarios reported in Fig. 7. Finally, silly behaviors like repeatedly saying “OK Google” or talking gibberish after triggering the device are clearly distinguishable.
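As a rough illustration of how the observations in this category could be turned into rules, the sketch below labels one interaction (the per-second byte counts from the trigger spike to the return to baseline) by its width and its sustained rate. It is not a classifier from the paper: the cut-offs of 25 seconds and 30 KB/s are assumptions chosen to sit between the roughly 15-second typical interaction and the 30-second, roughly 40 KB/s pattern reported for repeated triggering, and all names and traces are hypothetical.

```python
from statistics import median


def label_interaction(hump, long_width=25, high_rate=30_000):
    """Label one interaction given its per-second byte counts."""
    width = len(hump)
    sustained = median(hump) if hump else 0
    if width >= long_width and sustained >= high_rate:
        return "possible repeated triggering"  # long and persistently high rate
    if width >= long_width:
        return "long interaction"  # long, but the rate drops while answers are fetched
    return "typical interaction"


# Made-up interactions, in bytes per second.
repeated = [65_000] + [40_000] * 29
dialogue = [65_000] + [10_000] * 29
typical = [65_000, 30_000, 20_000, 10_000, 8_000, 6_000, 5_000]
print(label_interaction(repeated), label_interaction(dialogue), label_interaction(typical))
```

Hand-set rules like these only mirror what can be read off the plots by eye; they would need validation on real traces before being relied upon.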

5 Evaluation: Netgear Arlo Pro

This camera has two modes: armed (motion or sound detection on) and disarmed (detection off). The two modes can be set using the Arlo Pro app. A user can also live stream the captured video to his/her phone as and when needed. When the camera is armed, the user can choose between motion detection using the camera


IR sensor or sound detection using the microphone. The user can choose to get a notification every time some motion or sound is detected. As in the case of the Google Home Mini, for all experiments with the Arlo Pro camera, the device was triggered about 30 seconds after each controlled scenario was set up, and the traffic was recorded for a duration of two minutes. Our controlled scenarios for the Arlo Pro camera can be divided into four categories: baseline scenarios, audio and motion scenarios, field of view scenarios, and notifications scenarios.

5.1 Baseline Scenarios

The baseline scenario category is comprised of a scenario where there is absolutely no activity at home: no motion or sound of any kind, and the app notification is off. The goal here is to see what kind of traffic is produced by the camera and whether the adversary has any possibility of inferring anything about a home when there is absolutely no activity at home, a possible indication that there is no one at home. Figure 11 shows the message traffic for this scenario. As we can see in the figure, there is a periodic pattern of TCP keep-alive packets (66 bytes in size), which are essentially used to detect when and if the camera goes offline.
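The periodic keep-alive pattern described above can be characterized by the typical gap between consecutive 66-byte packets. The sketch below is a hypothetical illustration: the text does not state the period, so the 10-second spacing in the sample data is made up, and treating every 66-byte packet as a keep-alive is a simplifying assumption (other TCP packets can have the same size).

```python
from statistics import median


def keepalive_period(packets, size=66):
    """Median gap in seconds between packets of the given size, or None if fewer than two exist."""
    times = sorted(t for t, length in packets if length == size)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return median(gaps) if gaps else None


# Made-up (timestamp_seconds, length_bytes) pairs.
sample = [(0.0, 66), (10.0, 66), (12.3, 1400), (20.0, 66), (30.0, 66)]
print(keepalive_period(sample))
```

A trace that contains nothing but packets arriving at this regular spacing would match the baseline pattern described for Fig. 11.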

Fig. 11 Message traffic when the camera is not triggered


5.2 Audio and Motion Scenarios

The audio and motion scenario category explores the impact of audio and motion on the network traffic between the camera and the servers. We have experimented with five different scenarios here: (1) there is no movement in the field of view and there is no sound; (2) there is no movement in the field of view but there is some sound (music playing in the background); (3) there is movement in the field of view and there is no sound; (4) there is movement in the field of view and there is sound as well; and (5) there is intermittent movement in the field of view and there is no sound. As we mentioned earlier, the camera can be triggered either by motion or by sound. In all five scenarios that we experimented with in this category, the camera was triggered using sound by tapping on the camera microphone. We chose the sound trigger to minimize the differences between the runs and between the relevant scenarios. We also used a video for the scenarios with motion to reduce the variations. Figures 12, 13, and 14 show the message traffic for these scenarios. In Fig. 12, we notice something very interesting: scenarios one and two have almost identical plots. In addition, after the initial large spike, they have a very distinct saw-tooth shape. We conjecture that this shape is produced by the way the video/audio stream is compressed and sent. In both scenarios, the camera is capturing the same still image, which does not change for the whole duration of the experiment. Also, in both scenarios the camera is recording sound: in scenario one there is no sound, and in scenario two music is playing in the background. Since the amount of data needed to capture sound is much smaller than that needed for pictures, the difference in the amount of data that needs to be sent between the two scenarios is so small

Fig. 12 Message traffic when the camera is triggered but there is no motion, the blue line represents no sound scenario and the orange line represents the scenario with sound


Fig. 13 Message traffic when the camera is triggered and there is motion, the blue line represents no sound scenario and the orange line represents the scenario with sound

Fig. 14 Message traffic when the camera is triggered and there is intermittent motion

that the plots overlap most of the time, and because of video compression, the plots keep oscillating between 110 and 40 KB/s. Figure 13 shows scenarios three and four, when there is motion. We notice that sound again has no real impact on the shape of the plots, but since the camera is capturing motion, the amount of data that needs to be compressed and sent in each frame is higher than in scenarios one and two, and the rate is mostly close to the 100 KB/s level. Figure 14 shows the plot when we start with the camera capturing video with no motion and then something moves in the field of view of the camera for a short


Fig. 15 Message traffic when two people are moving in front of the camera: Top plot is when they move very close to the camera and bottom plot is when they are moving far away from the camera

period and then moves out of the field of view. In our experiment, we introduced movement that lasted for about 10 seconds. We can clearly identify this movement in the figure by the disruption in the saw-tooth pattern near the middle of the plot. Overall, we make three important observations. First, this category is clearly distinguishable from the baseline category, i.e., the presence of motion or sound or both is clearly distinguishable from the case where there is no sound or motion at all. Second, the presence or absence of sound cannot be identified (except in the baseline category, when the device is not triggered). Finally, the presence and duration of motion can be clearly identified.
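One way to locate a disruption of the saw-tooth pattern is to check whether the rate has stopped dipping to the low end of the oscillation. The sketch below flags seconds whose trailing window never falls below a floor. It is an illustration under assumptions: the 110/40 KB/s oscillation and the roughly 100 KB/s motion level come from the text, while the three-second saw-tooth period, the four-second window, the 70 KB/s floor, and the trace itself are made up.

```python
def motion_seconds(rate, window=4, floor=70_000):
    """Seconds whose trailing `window` samples never dip below `floor`."""
    flagged = []
    for i in range(window - 1, len(rate)):
        if min(rate[i - window + 1 : i + 1]) >= floor:
            flagged.append(i)
    return flagged


# Made-up trace in bytes per second: still scene, six seconds of motion, still scene.
sawtooth = [110_000, 75_000, 40_000]
rate = sawtooth * 3 + [100_000] * 6 + sawtooth * 2
print(motion_seconds(rate))
```

Because the window trails the current second, the flagged seconds lag the true motion interval (seconds 9 to 14 in this made-up trace) by a few seconds; the window should be at least as long as one saw-tooth period so that a still scene always shows a dip.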

5.3 Field of View Scenarios

The field of view scenario category consists of natural home settings where there are some people at home, in which case it is expected that they will come into the field of view of the camera. We have experimented with two scenarios here: (1) two people walking close to the camera across its field of view; and (2) two people walking far away from the camera across its field of view. Figure 15 shows the message traffic for these scenarios. Looking at the two plots in Fig. 15, we notice that when the movement is close to the camera (top plot), the saw-tooth pattern is disrupted near the beginning, as opposed to the bottom plot, where it is not disrupted. We reason that when the movement is close to the camera, the whole video frame is changing and thus more data needs to be sent, whereas when the movement is far away, only a very small portion of the video frame is


changing requiring less data to be sent. This indicates that any motion that is far from the camera is indistinguishable from no motion at all.

5.4 Notifications Scenarios

One feature that the Arlo Pro camera, and indeed most home security cameras, provide is that users can choose to receive email alerts and push notifications whenever any motion or sound is detected. The notifications scenario category explores whether an adversary can infer if the user has turned the notifications on, which would potentially indicate whether the user is at home or not. We experimented with several pairs of otherwise identical scenarios where the only difference was that notifications were on in one and off in the other. In all our experiments, we were unable to find any differences in the data/plots that would allow us to differentiate between the two scenarios of notifications on or off. We also tried to see if there is a set of servers that show up only when notifications are turned on, but that was inconclusive as well. We have chosen not to include those plots in the paper for brevity.

6 Privacy Leakage

In this section, we discuss the findings from Sects. 4 and 5 and what private information is leaked by these or similar devices. With increasing awareness of security and privacy vulnerabilities, encryption of message content between smart home devices and servers is now commonplace. The security protocols typically used are TLS/SSL. However, it is important to note that while the content of the messages exchanged between the devices and the servers is encrypted, their headers are not. Thus, an adversary can infer the type, make, and sometimes even the model of the smart home devices that a user is using at home. This could be done either by using reverse DNS lookups of server addresses or by simply knowing the list of IP addresses that the smart home devices connect to. For example, the Arlo Pro cameras connect to several Amazon EC2 servers, with some of the servers having “Arlo” in their name. Once an adversary knows the identity of these smart home devices, they can look for usage patterns to infer the possible activity going on at home. The first privacy leakage that we discovered in our experiments is that an adversary can infer whether there is someone at home or not with fairly high probability. Note that when there is no one at home, the devices will not be triggered, no one will say “OK Google” or tap the camera, and there will be no (or minimal) sound or motion. This situation corresponds to the baseline categories that we experimented with for both the Google Home Mini and the Arlo Pro camera, and we found that the baseline category of each device is clearly distinguishable from all other categories. Thus, by noticing a traffic pattern in line with the baseline categories, the adversary would infer that there is no


one at home. We do note that it is possible that there is someone at home who does not trigger the Google Home Mini or come into the camera view. This would be a baseline category scenario, but the adversary would incorrectly infer that there is no one at home. However, the chances of this situation are relatively low unless the people at home are sleeping, e.g., at night. In this respect, any additional clues, such as day vs. night or knowledge of the residents’ sleeping schedules, in conjunction with observations of message traffic patterns from multiple devices, would further help the adversary reinforce his/her inference about whether someone is at home or not with fairly high accuracy. Second, if the adversary infers that there is someone at home, he/she may be able to infer much finer details of the residents’ current activities with fairly high probability. This is because it is possible to infer whether the devices are being used in a normal, standard way or a non-standard way by observing the traffic patterns. In both devices, we observed that non-standard usage results in a traffic pattern that is distinguishable from standard usage or the baseline categories. Non-standard usage typically indicates erratic behavior that could be attributed to either children being at home (possibly alone) or people being intoxicated. Again, any additional clues, such as knowledge of whether children reside at home or of the residents’ drinking habits, in conjunction with observations of message traffic patterns from multiple devices, would further help the adversary reinforce his/her inferences about the erratic behavior of the residents with fairly high accuracy. The ability to detect non-standard usage is especially problematic, as it allows the adversary to gain finer-grained information about the residents’ activities. We experimented with only a few non-standard usages in this paper.
What other activities about the residents can be inferred from other kinds of non-standard usage is a future area of research for us. Finally, knowledge of additional contextual clues can provide more ammunition to the adversary to infer finer grained details about the home residents. One such contextual clue is if the adversary knows the location (which room or floor) these devices are installed in. For our two smart home devices, if the adversary knows that the devices are placed in two separate rooms/floors, simultaneous triggering of both devices would indicate that there are at least two people at home. Also, if the adversary knows that a single person lives at home, he/she can infer which room/floor the resident is in and in fact may be able to track his/her movement by observing the temporal sequence of device activations. We note that users may switch on/off some of the devices when they are at home or when they are away depending on the nature of the device. For example, in case of indoor security cameras, users may arm/activate the camera only when they are away. Similarly, users may switch off or disconnect voice service devices such as Google Home Mini when they are away. This would certainly change the traffic patterns and possibly limit the extent of privacy leakage.
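The two location-based inferences described above (near-simultaneous activations in different rooms imply at least two people, and the order of activations traces a single resident's movement) can be sketched as follows. This is an illustrative sketch only: the event format, the two-second tolerance, the room names, and both function names are assumptions, not part of the paper.

```python
def simultaneous_rooms(events, tolerance=2.0):
    """Pairs of activations in different rooms that occur within `tolerance` seconds of each other."""
    ordered = sorted(events)
    pairs = []
    for i, (t1, room1) in enumerate(ordered):
        for t2, room2 in ordered[i + 1:]:
            if t2 - t1 > tolerance:
                break  # later events are even further away in time
            if room1 != room2:
                pairs.append(((t1, room1), (t2, room2)))
    return pairs


def movement_path(events):
    """Sequence of rooms in activation order, with consecutive repeats collapsed."""
    path = []
    for _, room in sorted(events):
        if not path or path[-1] != room:
            path.append(room)
    return path


# Made-up activations as (time_seconds, room) pairs.
events = [(10.0, "living room"), (11.0, "porch"), (300.0, "living room"), (600.0, "porch")]
print(simultaneous_rooms(events))
print(movement_path(events))
```

The first function corresponds to the "at least two people" inference, and the second to tracking movement, which is only meaningful under the stated assumption that a single person lives in the home.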


7 Conclusions

This paper explores the extent of privacy leakage from smart home devices in situations where an adversary has minimal capabilities, simply the ability to tap into and access/analyze all the Internet traffic coming into or out of a home, e.g., by having access to the home router or the router at the Internet service provider. The adversary cannot decrypt, drop, alter, or inject any messages, nor can they alter the traffic flow. The key finding is that even such a weak adversary can infer quite a lot of private information about the home and its residents. With high probability, the adversary can infer whether no one is at home at present and whether the residents currently at home are behaving erratically. Knowledge of any additional contextual information, such as the sleeping schedules of the residents, whether children reside at home, the drinking habits of the residents, or which room(s)/floor(s) the devices are located in, can allow the adversary to infer much finer-grained information about the home residents. There are several future research directions that we plan to pursue. First, our current analysis has focused on observing the traffic pattern over time plotted as a graph. With the large amount of traffic pattern data that we have been collecting, the next step would be to explore machine learning algorithms to develop classifiers that can predict information about home or resident activities with fairly high accuracy. Second, we plan to explore additional non-standard usages of smart home devices to see if they can reveal any finer-grained private information. Third, we plan to incorporate more smart home devices in our study and explore the extent of privacy leakage with the plethora of devices that are now becoming commonplace in our homes. Finally, while this paper is focused on detecting privacy leakage from smart home devices, the next step would be to explore possible solutions to prevent such leakage.
For example, traffic shaping by sending extraneous data has been explored in literature to obfuscate network traffic patterns, but that comes at a cost of additional network bandwidth consumption. We plan to investigate other possible solutions that would involve less overhead.
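The bandwidth cost of traffic shaping mentioned above can be illustrated with the simplest possible shaper: pad every second up to one constant rate so that all seconds look alike. This is a sketch under assumptions, not the shaper proposed in the literature cited in Sect. 2; it does no buffering or delaying, the function name is hypothetical, and the trace is made up.

```python
def padding_overhead(bytes_per_second, target=None):
    """Return (target_rate, total_padding_bytes) for constant-rate padding without buffering."""
    if target is None:
        target = max(bytes_per_second, default=0)
    if any(value > target for value in bytes_per_second):
        raise ValueError("target rate is below the peak; traffic would have to be delayed")
    # Every second is topped up with dummy bytes until it reaches the target rate.
    return target, sum(target - value for value in bytes_per_second)


# Made-up trace: mostly keep-alive traffic with one short interaction.
trace = [500, 500, 60_000, 20_000, 500]
print(padding_overhead(trace))
print(sum(trace))
```

In this made-up trace the padding is more than twice the real traffic, which illustrates why lower-overhead alternatives are worth investigating.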

References

1. Abrishamchi, M.A.N., Abdullah, A.H., David Cheok, A., Bielawski, K.S.: Side channel attacks on smart home systems: A short overview. In: IECON 2017 - 43rd Annual Conference of the IEEE Industrial Electronics Society, pp. 8144–8149 (2017)
2. Pandas: Python data analysis library. https://www.pandaspydata.org (2020). Accessed 21 July 2020
3. Apthorpe, N., Reisman, D., Sundaresan, S., Narayanan, A., Feamster, N.: Spying on the smart home: Privacy attacks and defenses on encrypted IoT traffic. CoRR abs/1708.05044 (2017), http://arxiv.org/abs/1708.05044
4. Back, A., Möller, U., Stiglic, A.: Traffic analysis attacks and trade-offs in anonymity providing systems. In: International Workshop on Information Hiding, pp. 245–257. Springer, Berlin (2001)
5. Confirmed: 2 billion records exposed in massive smart home device breach. https://www.forbes.com/sites/daveywinder/2019/07/02/confirmed-2-billion-records-exposed-in-massive-smart-home-device-breach (2020). Accessed 28 July 2020
6. Copos, B., Levitt, K., Bishop, M., Rowe, J.: Is anybody home? Inferring activity from smart home network traffic. In: 2016 IEEE Security and Privacy Workshops (SPW), pp. 245–251 (2016)
7. Cui, A., Stolfo, S.J.: A quantitative analysis of the insecurity of embedded network devices: results of a wide-area scan. In: Proceedings of the 26th Annual Computer Security Applications Conference, pp. 97–106 (2010)
8. Wireshark: Go deep. https://www.wireshark.org (2020). Accessed 21 July 2020
9. Denning, T., Kohno, T., Levy, H.M.: Computer security and the modern home. Commun. ACM 56(1), 94–103 (2013)
10. Felten, E.W., Schneider, M.A.: Timing attacks on web privacy. In: Proceedings of the 7th ACM Conference on Computer and Communications Security, pp. 25–32 (2000)
11. Frustaci, M., Pace, P., Aloi, G., Fortino, G.: Evaluating critical security issues of the IoT world: Present and future challenges. IEEE Internet Things J. 5(4), 2483–2495 (2018)
12. Ho, G., Leung, D., Mishra, P., Hosseini, A., Song, D., Wagner, D.: Smart locks: Lessons for securing commodity internet of things devices. In: Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, pp. 461–472 (2016)
13. The botnet that broke the Internet isn’t going away. https://www.wired.com/2016/12/botnet-brokeinternet-isnt-going-away/ (2020). Accessed 28 July 2020
14. Morgner, P., Mattejat, S., Benenson, Z.: All your bulbs are belong to us: Investigating the current state of security in connected lighting systems (2016). Preprint arXiv:1608.03732
15. Murdoch, S.J., Danezis, G.: Low-cost traffic analysis of Tor. In: 2005 IEEE Symposium on Security and Privacy (S&P’05), pp. 183–195. IEEE, Piscataway (2005)
16. Oluwafemi, T., Kohno, T., Gupta, S., Patel, S.: Experimental security analyses of non-networked compact fluorescent lamps: A case study of home automation security. In: LASER 2013, pp. 13–24 (2013)
17. People say they care about privacy but they continue to buy devices that can spy on them. https://www.vox.com/recode/2019/5/13/18547235/trust-smart-devices-privacy-securityg (2020). Accessed 28 July 2020
18. Wang, T., Cai, X., Nithyanand, R., Johnson, R., Goldberg, I.: Effective attacks and provable defenses for website fingerprinting. In: 23rd USENIX Security Symposium (USENIX Security 14), pp. 143–157 (2014)
19. Wu, D.J., Taly, A., Shankar, A., Boneh, D.: Privacy, discovery, and authentication for the internet of things. In: European Symposium on Research in Computer Security, pp. 301–319. Springer, Berlin (2016)
20. Zeng, E., Mare, S., Roesner, F.: End user security and privacy concerns with smart homes. In: Thirteenth Symposium on Usable Privacy and Security (SOUPS 2017), pp. 65–80 (2017)

A Quantum-Resistant and Fast Secure Boot for IoT Devices Using Hash-Based Signatures and SRAM PUFs

Roberto Román and Iluminada Baturone

1 Introduction

The Internet of Things (IoT) is a promising technology that has attracted the attention of researchers from academia and industry around the world. It can be defined as an interaction between the physical and digital worlds using a plethora of sensors and actuators, where the data collected by the sensors have to be processed to derive useful inferences from them [1]. As cyber-physical systems are interconnected, a threat to one device can propagate to other devices, with catastrophic consequences that directly affect human life [2]. For this reason, ensuring the trustworthiness of IoT devices is vital for the widespread adoption of IoT. In this work, three main parties are considered in IoT applications: devices, manufacturers, and users. The users are persons or organizations that acquire the devices from the manufacturers and want to check that the devices can be trusted. The devices are often microcontroller-based embedded systems programmed to perform the desired application in the IoT context. The manufacturers produce the devices and can also develop their firmware. A widespread security feature of microcontrollers is secure boot, whose goal is to check that the code to be executed is the expected one. The boot process usually consists of several stages executed in a chain. The first code to be executed is stored internally in a One-Time Programmable (OTP) memory such as the internal ROM of the microcontroller. The code of the next stages is commonly stored in an external memory such as a flash memory or an SD card. The number of boot stages can vary depending on the device used. The application firmware is located at the end of the boot process. This work is focused on

R. Román · I. Baturone, Instituto de Microelectrónica de Sevilla, Universidad de Sevilla, CSIC, Sevilla, Spain. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Nayyar et al. (eds.), The Fifth International Conference on Safety and Security with IoT, EAI/Springer Innovations in Communication and Computing, https://doi.org/10.1007/978-3-030-94285-4_8


a two-stage boot process for simplicity and because it is the one widely used by IoT devices [3]. The threat model considers that the manufacturer is honest, that is, in a registration phase, prior to using the device in its application context, the manufacturer programs all the code without introducing backdoors or malware. Once devices are deployed, attackers can have physical access to them, especially in IoT applications. Besides, attackers can manipulate the content of external non-volatile memories but not the OTP memory, which is read-only. Hence, this internal OTP memory is inherently trusted and acts as a Root of Trust. Two kinds of cryptographic primitives can be used to ensure the integrity and authenticity of the code in a secure boot process: symmetric primitives such as HMACs, where a secret key is involved in the verification procedure, and digital signatures, where a non-sensitive public key instead of a sensitive private key is used in the verification. The latter solution is easier to implement securely since the integrity of a public key is easier to manage than the confidentiality of a secret key. Besides, digital signatures provide non-repudiation, that is, the party that signed the code cannot later claim not to have done so, which can be crucial in several scenarios, like those related to forensics. In fact, digital signatures are the preferred choice for securing the integrity and authenticity of the application firmware or kernel. The disadvantage of digital signatures compared with HMACs is their longer execution time. There are hybrid solutions such as the first secure boot version of the ESP32 microcontroller, widely used in IoT devices [3]. On the one hand, a digest of the first-stage software bootloader is generated using AES in conjunction with a SHA512 hash function. Then, it is verified with the AES secret key stored in an eFuse. On the other hand, the application firmware is signed and checked with the manufacturer's public key.
However, a vulnerability was discovered in which the secret key needed in the software bootloader verification could be easily read from the eFuses through a fault injection attack based on voltage glitches. Then, a new version using digital signatures for both the software bootloader and the application firmware was released. A secure boot solution needs to be considerably fast in certain applications like transportation systems or medical devices. For example, in the ventilators used extensively in the current COVID-19 pandemic, if an intermittent error such as a system crash makes one of them reboot and the reboot is not fast, the patient can be in danger. Another requirement is that modern systems need to be prepared for the post-quantum era, since the digital signatures used nowadays, such as RSA and ECDSA, will be broken by a quantum computer able to carry out Shor's algorithm. Among the quantum-resistant signatures, hash-based signatures have been studied in depth and their security is well understood [4]. Although there is a wide variety of signature schemes that use cryptography based on hashes, this manuscript focuses on the eXtended Merkle Signature Scheme (XMSS). XMSS is constructed upon a one-time signature (OTS) scheme called WOTS+, which uses a different key pair for signing each message [5]. This paper proposes the use of a quantum-resistant hash-based signature for verifying the application firmware, with the modifications proposed in [6] to increase the speed of the verification process. It also proposes the use of an


HMAC for verifying the software bootloader, replacing the storage of the secret key in an eFuse by its obfuscation with an SRAM Physical Unclonable Function (PUF). The latter exploits the manufacturing variability of SRAM memories to create a unique device identity that can only be reconstructed by the device itself [7]. The use of physically obfuscated keys based on SRAM start-up values (PUF responses) is easier and cheaper to implement in resource-constrained embedded devices than cryptographically secure non-volatile memories. PUFs need a public piece of data called Helper Data (HD) for key reconstruction. Using the SRAM cell classification proposed in [8], a simple repetition error correcting code can be used to recover the secret key. The paper is organized as follows. Section 2 summarizes the related work in the literature and the main contributions of this work. Section 3 reviews the basis and latest modifications of stateful hash-based signatures. The proposal is presented in Sect. 4, and implementation results obtained with IoT boards based on ESP32 microcontrollers are shown in Sect. 5. Finally, conclusions are given in Sect. 6.

2 Related Work and Main Contributions

The works in the literature most closely related to the proposal of this paper are summarized in the following. In [9], it is proposed to use the start-up values of the on-chip SRAM cells to generate and reconstruct a primary secret seed from which a symmetric key and a unique device key pair are generated. The proposal uses an ARM platform with TrustZone technology. The Root of Trust is a BootROM that verifies the signed software in charge of the operations with the SRAM start-up values. The derived symmetric key is used to decrypt the secure operating system image and secure services of the device, so that the device is the only one able to decrypt them by using its on-chip SRAM. Then, if the signature of the manufacturer is verified, the recovered device key pair is used by the secure services to seal/unseal sensitive data. The authors do not explicitly employ digital signature schemes resistant to quantum attacks. They do not classify the SRAM cells in a registration phase, and they found that the minimum entropy obtained in the start-up values (in an SRAM chip of the type IS61LV6416-10TL) was about 5.5%. They needed to run a [1020,43,439]-BCH code 6 times to obtain 256 error-free bits from 6120 start-up values. The work in [10] proposes to use the start-up values of an on-chip SRAM to generate and reconstruct the seeds needed for signing with the XMSS scheme. Three ESP32 microcontrollers and their internal SRAM memories were used to carry out the experiments. Since SRAM cells were classified in a registration phase, the minimum entropy obtained in the start-up values was about 69.5% and an 8-bit repetition error correcting code was used to obtain 256 error-free bits from 2048 start-up values. Execution times for sensitive data reconstruction and signature generation are shown for an implementation of the RFC8391 specification [11] in the ESP32. The parameters used are n = 256 bits, w = 16, and h = 16 (see Sect.


3 for the meaning of these parameters). Execution time for signature verification is not provided. The mbedtls library was used to implement the SHA256 hash function without hardware acceleration. A description of a software-hardware co-design for the XMSS scheme specified in RFC8391 is presented in [12]. The design uses a RISC-V embedded processor. Firstly, the authors propose two software optimizations of the SHA256 hash function, exploiting that most of the inputs have a fixed length and that the internal state of the hash can be precomputed. Then, they develop several hardware accelerators to speed up the most time-consuming operations in XMSS, which are the SHA256 hash function, the WOTS-chain computation, and the XMSS-leaf generation. The proposals were implemented and evaluated on an Intel Cyclone V SoC FPGA. A modification of the XMSS scheme specified in RFC8391 is performed in [6] to reduce the execution times of the key generation, signature generation, and verification steps. The authors use the “simple” instantiations of tweakable hash functions presented in [13], simplifying the WOTS-chain computation and the XMSS-leaf generation considerably. As a result, the pure software implementation of their modification evaluated on an ARM Cortex-M4 microcontroller shows a significant speedup. The parameters used are n = 256, w = {16, 256}, and h = {5, 10}. In [14], the authors propose the first quantum-resistant secure boot solution fully implemented in hardware. They use a Chain of Trust in which each signed stage is verified with the public key contained in the payload of the previous one. The public key to verify the first stage is stored in a one-time-programmable (OTP) memory, so that it cannot be modified. The XMSS scheme specified in RFC8391 is used to verify the signatures. As in [12], they accelerate in hardware the WOTS-chain computation and the XMSS-leaf generation. In addition, they accelerate in hardware the Merkle tree root computation.
Their implementation with the specific parameters n = 256 bits, w = 16, and h = 10 is evaluated on a Xilinx Kintex-7 FPGA. The main contributions of the work presented in this paper are the following:
• To use a simple first-stage verification in the boot process that employs an SRAM PUF to recover the secret key used to compute an HMAC. The simple repetition error correcting code used to correct bit flips in the start-up values of SRAM cells makes this verification stage fast. More complex error correcting codes are avoided because the selected SRAM cells are classified in a registration phase.
• To take advantage of the fact that an XMSS signature verification can be divided into a WOTS+ signature verification and an authentication path verification, and that the IoT device only needs the first part. It is proposed that the IoT device verifies only the WOTS+ signature and that the device user, externally, can check the authentication path when the device is purchased. This reduces the overhead of the secure boot process.
• To use the “simple” instantiations of tweakable hash functions proposed in SPHINCS+ in the context of secure boot to speed up the second-stage verification, which uses a WOTS+ signature.


• To evaluate the proposal with several security (n) and Winternitz parameters (w) in an ESP32 microcontroller, widely employed in IoT applications. Its SHA accelerator is used to speed up the proposal.

3 Hash-Based Signatures

A digital signature is a cryptographic primitive that gives very strong reasons to believe that a message was created by a known signer and that it was never altered. These security properties are called authenticity and integrity, respectively. Digital signatures also provide non-repudiation, which means that the signer cannot successfully claim he or she did not sign a message. Digital signatures are basic primitives of asymmetric cryptography and, hence, they use both private and public keys. The message signer uses a signature generation algorithm along with its private key to sign the message, and the message verifier uses a signature verification algorithm along with the signer's public key to verify the signature. The signer's private key must be kept secret, but the public key can be known by anyone without compromising the security of the scheme. In summary, digital signatures provide a layer of validation and security to messages sent through non-secure channels. Unlike the RSA and ECDSA signature schemes, hash-based signatures are resistant to quantum attacks. The WOTS+ and XMSS schemes are explained in the following subsections. XMSS is built on top of WOTS+ to allow more signatures per key pair, since only one message can be signed per key pair with a One-Time Signature (OTS) scheme. The XMSS scheme is specified in RFC8391, but new works have appeared in the literature that deviate from it to reduce the complexity and the number of hash computations required [6]. In this work, the WOTS+ and XMSS schemes are used with some modifications explained in SPHINCS+ [13].

3.1 WOTS+ Scheme

The WOTS+ scheme is a variant of a well-studied OTS scheme called the Winternitz One-Time Signature (WOTS) scheme, which in turn is a variant of the Lamport signature scheme published in 1979 [15]. The scheme employs two parameters: the security parameter n and the Winternitz parameter w. In addition, a hash function H has to be selected, with SHA256, SHAKE256, and Haraka being the ones proposed in [13]. The parameter n is related to the security of the selected hash function and its output size (in this work expressed in bits), while the Winternitz parameter w establishes a tradeoff between size and speed. Lower values of w lead to higher speed but bigger keys and signatures. The RFC8391 specification and NIST recommendations [16] only contemplate a value of 16 for w, but 4 and 256 are other values considered in the literature [13]. Since in this


work a low-latency secure boot is sought, the value of 4 for w is also considered. The other sub-parameters defined from n and w are the following (where $\lceil x \rceil$ and $\lfloor x \rfloor$ are the ceil and floor functions of x):

$$t = \log_2(w) \tag{1}$$

$$len_1 = \left\lceil \frac{n}{t} \right\rceil \tag{2}$$

$$len_2 = \left\lceil \frac{\lfloor \log_2 len_1 \rfloor + 1 + t}{t} \right\rceil \tag{3}$$

$$len = len_1 + len_2 \tag{4}$$
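As a quick check of Eqs. (1)-(4), the following minimal sketch evaluates the sub-parameters for the security and Winternitz parameters considered in this work. The function name `wots_params` is an illustrative choice, and integer arithmetic is used in place of floating-point logarithms on the assumption that w is a power of two.

```python
def wots_params(n_bits: int, w: int) -> tuple[int, int, int, int]:
    """Return (t, len1, len2, len) following Eqs. (1)-(4)."""
    t = w.bit_length() - 1                      # t = log2(w), w assumed a power of two
    len1 = -(-n_bits // t)                      # ceil(n / t)
    floor_log2_len1 = len1.bit_length() - 1     # floor(log2(len1))
    len2 = -(-(floor_log2_len1 + 1 + t) // t)   # ceil((floor(log2 len1) + 1 + t) / t)
    return t, len1, len2, len1 + len2


for n_bits in (128, 192, 256):
    for w in (4, 16):
        t, len1, len2, length = wots_params(n_bits, w)
        # n * len bits is the size of the secret key, uncompressed public key, and signature
        print(f"n={n_bits} w={w}: t={t} len1={len1} len2={len2} len={length} "
              f"signature={n_bits * length // 8} bytes")
```

For example, n = 256 and w = 16 give len = 67, so a WOTS+ signature takes 67 · 32 = 2144 bytes, whereas n = 128 and w = 4 give len = 69 and a 1104-byte signature.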

A relevant sub-parameter is len, since n·len is the size of the secret key, the (uncompressed) public key, and the signature. Let us focus on the WOTS+ scheme using the “simple” instantiations of tweakable hash functions used in SPHINCS+. In [13] it is mentioned that the “simple” instantiations are faster than the “robust” ones, but the cost to pay is a security argument that entirely relies on the random oracle model, leading to weaker security assumptions. WOTS-like signatures work by applying a function F iteratively over some blocks of data. Given a hash function $H: \{0,1\}^* \to \{0,1\}^n$, the function F is defined as a tweakable hash function $F: \{0,1\}^n \times \{0,1\}^{256} \times \{0,1\}^n \to \{0,1\}^n$ that maps a public seed $PK_{SEED}$, an address ADRS, and an input M to an n-sized value as follows (where || represents the concatenation operator):

$$F(PK_{SEED}, ADRS, M) = H(PK_{SEED} \,\|\, ADRS \,\|\, M) \tag{5}$$

$PK_{SEED}$ is a fixed value, while ADRS is a value that is different each time a tweakable hash function is called in the scheme. A chaining function F of length x applied to an input in is the function F applied x times, with the value of in as the first input and the output of one iteration being the input M of the next one, thus making a chain. The OTS secret key $SK_{OTS}$ is composed of len portions $sk_i$ of size n generated uniformly at random. They are generated from a secret seed $SK_{OTS,SEED}$ of size n and a Pseudo-Random Function (PRF). The uncompressed OTS public key $PK_{OTS}^{uncomp}$ is obtained by applying a chaining function F of length w − 1 to each portion $sk_i$. The resulting (uncompressed) public key is compressed from n·len to n bits using a tweakable hash function $Th_{len}: \{0,1\}^n \times \{0,1\}^{256} \times \{0,1\}^{n \cdot len} \to \{0,1\}^n$ as:


$$PK_{OTS} = Th_{len}(PK_{SEED}, ADRS, PK_{OTS}^{uncomp}) = H(PK_{SEED} \,\|\, ADRS \,\|\, PK_{OTS}^{uncomp}) \tag{6}$$

Earlier versions of WOTS+, such as the one used in RFC8391, use L-Trees to compress the OTS public key $PK_{OTS}^{uncomp}$. An L-Tree is an unbalanced binary tree very similar to a Merkle tree (see next subsection) whose root node is the compressed OTS public key. The use of the tweakable hash function instead of an L-Tree makes the compression simpler and faster, as mentioned in [6]. To sign a message m, digital signature schemes usually apply a hash function to it before applying the signature algorithm. The hash result is called the message digest $m_d$. The XMSS and SPHINCS+ schemes compute the digest not only with the message but with additional pseudorandom data (see next subsection). The message digest $m_d$ is first divided into $len_1$ message blocks of length t bits. The value $m_i$ of each block ranges from 0 to w − 1. Later, a checksum is computed as:

$$C = \sum_{i=0}^{len_1 - 1} (w - 1 - m_i) \tag{7}$$

The checksum C is divided into $len_2$ blocks of t bits. Thus, the concatenation of the $len_1$ message blocks and the $len_2$ checksum blocks forms len blocks with values $b_i$. Finally, a chaining function F of length $b_i$ is applied to the corresponding secret key portion $sk_i$. The concatenation of the outputs forms the OTS signature, $Sign_{OTS}$. Note that the signature is formed by len portions of n bits. The signature verification is straightforward. The $b_i$ values are computed from the message m as explained for the signature generation. Then, a chaining function F of length $w - 1 - b_i$ is applied to each n-bit portion of the received signature. Finally, the result is compressed with the tweakable hash function $Th_{len}$ shown in (6). The result should be the OTS public key $PK_{OTS}$ if the message has preserved its authenticity and integrity.
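The key generation, signing, and verification steps described above can be sketched as follows for n = 256 bits and w = 16. This is an illustrative sketch under several assumptions and not the implementation evaluated in this work: the 32-byte address layout in `adrs`, the address used for compression, the use of SHA-256 as the PRF, and the most-significant-first ordering of the blocks are all hypothetical simplifications that do not follow the encodings of RFC8391 or SPHINCS+.

```python
import hashlib

N_BYTES, W, T = 32, 16, 4        # n = 256 bits, w = 16, t = log2(w)
LEN1, LEN2 = 64, 3               # from Eqs. (2) and (3)
LEN = LEN1 + LEN2


def adrs(chain_idx: int, step: int) -> bytes:
    # Hypothetical 32-byte address: chain index, step index, zero padding.
    return chain_idx.to_bytes(4, "big") + step.to_bytes(4, "big") + bytes(24)


def F(pk_seed: bytes, a: bytes, m: bytes) -> bytes:
    # "Simple" tweakable hash of Eq. (5): H(PK_SEED || ADRS || M).
    return hashlib.sha256(pk_seed + a + m).digest()[:N_BYTES]


def chain(pk_seed, i, x, start, steps):
    # Apply F `steps` times to x, beginning at chain position `start`.
    for j in range(start, start + steps):
        x = F(pk_seed, adrs(i, j), x)
    return x


def th_len(pk_seed, parts):
    # Public-key compression of Eq. (6); the address value is a placeholder.
    data = pk_seed + adrs(0xFFFFFFFF, 0) + b"".join(parts)
    return hashlib.sha256(data).digest()[:N_BYTES]


def blocks(md: bytes):
    # len1 message blocks of t bits followed by len2 checksum blocks.
    x = int.from_bytes(md, "big")
    msg = [(x >> (T * k)) & (W - 1) for k in reversed(range(LEN1))]
    c = sum(W - 1 - m for m in msg)                                   # Eq. (7)
    chk = [(c >> (T * k)) & (W - 1) for k in reversed(range(LEN2))]
    return msg + chk


def keygen(pk_seed, sk_seed):
    # SHA-256 stands in for the PRF here (assumption).
    sk = [hashlib.sha256(sk_seed + i.to_bytes(4, "big")).digest()[:N_BYTES]
          for i in range(LEN)]
    ends = [chain(pk_seed, i, s, 0, W - 1) for i, s in enumerate(sk)]
    return sk, th_len(pk_seed, ends)


def sign(pk_seed, sk, md):
    return [chain(pk_seed, i, sk[i], 0, b) for i, b in enumerate(blocks(md))]


def verify(pk_seed, pk, md, sig):
    # Complete each chain with the remaining w - 1 - b_i steps, then compress.
    ends = [chain(pk_seed, i, sig[i], b, W - 1 - b)
            for i, b in enumerate(blocks(md))]
    return th_len(pk_seed, ends) == pk


pk_seed, sk_seed = bytes(32), bytes([1]) * 32    # fixed demo seeds, not secure
sk, pk = keygen(pk_seed, sk_seed)
md = hashlib.sha256(b"application firmware").digest()
sig = sign(pk_seed, sk, md)
print(verify(pk_seed, pk, md, sig))
```

The checksum blocks are what prevent a forger from simply advancing the chains of a valid signature: increasing any message block decreases the checksum, which would require inverting F on a checksum chain.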

3.2 XMSS Scheme

A Many-Time Signature (MTS) scheme can be created by concatenating N OTS key pairs and preventing key reuse by managing an index s. The pitfall of this simple solution is that the resulting MTS public key would be too large for its storage and distribution. This can be mitigated by compressing the N OTS public keys into one MTS public key $PK_{MTS}$ using a data structure called a Merkle tree. The Merkle tree considered herein is a complete binary tree that uses the compressed OTS public keys as leaves, and the parent nodes of each level are computed by applying a function H to the concatenation of their two child nodes $M_{c1}$ and $M_{c2}$. In [13], a tweakable hash function $H: \{0,1\}^n \times \{0,1\}^{256} \times \{0,1\}^n \times \{0,1\}^n \to \{0,1\}^n$ is used as:


Fig. 1 Merkle tree of h = 3. The authentication path for a signature with s = 3 is highlighted

$$H(PK_{SEED}, ADRS, M_{c1}, M_{c2}) = H(PK_{SEED} \,\|\, ADRS \,\|\, M_{c1} \,\|\, M_{c2}) \tag{8}$$

The root of the tree is the Many-Time Signature (MTS) public key $PK_{MTS}$. Each WOTS+ secret seed can be generated from a main $SK_{SEED}$ and a PRF along with the index s. The height h of the Merkle tree determines the total number of signatures N that can be generated as $N = 2^h$. A toy example is shown in Fig. 1, where a tree of height 3 is used to compress 8 OTS public keys. The message digest $m_d$ is computed by hashing the message along with a pseudorandomly generated value R, the public seed $PK_{SEED}$, and the MTS public key $PK_{MTS}$. R is computed using a PRF, the message itself, and a secret value $SK_{PRF}$. The signer must securely update the index s every time an OTS signature $Sign_{OTS}(s)$ is generated. The verifier should know the index s that indicates which of the N secret keys was used for the OTS signature. Besides, the verifier needs the array of nodes required to reconstruct the MTS public key $PK_{MTS}$ from the reconstructed OTS public key $PK_{OTS}(s)$. This array of nodes is called the authentication path, Auth(s), and must be generated by the signer every time a message is signed. The authentication path for the previous toy example and an index s = 3 is shown in gray in Fig. 1. Its computation is not a trivial task, and a tradeoff between speed and memory consumption has to be found. The most balanced algorithm found in the literature is the one in [17]. Note that while the computation of the authentication path is not an easy task, its use for verifying the signature is very simple. This makes the verification step less complex than the signature generation. The work in [13] proposes a slight modification in F, $Th_{len}$, and H when SHA256 is used as the hash function by adding padding after $PK_{SEED}$. This allows the precomputation of the state after the first fixed 64-byte block is processed and, hence, fewer calls to the SHA256 compression function are needed at runtime.
Another optimization proposed in [13] is to compress the ADRS value from 32 bytes to 22 bytes so that fewer calls to the SHA256 compression function are required. If a


parameter n of size smaller than 256 bits is used, the first n bits of the output are taken from F, $Th_{len}$, or H and the remaining bits are discarded.
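To illustrate why using the authentication path is simple even though generating it is not, the following sketch builds a toy Merkle tree of height 3 with the tweakable hash of Eq. (8), extracts Auth(s) for s = 3, and recomputes the root from a single leaf. The address layout in `node_adrs`, the placeholder leaves, and the naive path extraction from a fully stored tree are assumptions made for illustration; they are neither the encoding of [13] nor the traversal algorithm of [17].

```python
import hashlib


def node_adrs(level: int, index: int) -> bytes:
    # Hypothetical 32-byte address: tree level, node index, zero padding.
    return level.to_bytes(4, "big") + index.to_bytes(4, "big") + bytes(24)


def H(pk_seed: bytes, a: bytes, left: bytes, right: bytes) -> bytes:
    # Tweakable hash of Eq. (8): H(PK_SEED || ADRS || M_c1 || M_c2).
    return hashlib.sha256(pk_seed + a + left + right).digest()


def build_tree(pk_seed, leaves):
    # levels[0] holds the leaves, levels[-1][0] is the root PK_MTS.
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        prev, lvl = levels[-1], len(levels)
        levels.append([H(pk_seed, node_adrs(lvl, i), prev[2 * i], prev[2 * i + 1])
                       for i in range(len(prev) // 2)])
    return levels


def auth_path(levels, s):
    # Sibling of the node on the path from leaf s to the root, at each level.
    return [levels[k][(s >> k) ^ 1] for k in range(len(levels) - 1)]


def root_from_path(pk_seed, leaf, s, path):
    node = leaf
    for k, sibling in enumerate(path):
        idx = s >> k
        left, right = (sibling, node) if idx & 1 else (node, sibling)
        node = H(pk_seed, node_adrs(k + 1, idx >> 1), left, right)
    return node


pk_seed = bytes(32)                                            # fixed demo seed
leaves = [hashlib.sha256(bytes([i])).digest() for i in range(8)]  # stand-ins for PK_OTS(i)
levels = build_tree(pk_seed, leaves)
path = auth_path(levels, 3)
print(root_from_path(pk_seed, leaves[3], 3, path) == levels[-1][0])
```

The verifier only performs h hash calls (three in this toy case) on top of the WOTS+ verification, which is the part the proposal moves outside the device.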

4 Proposed Solution

The proposal presented in this paper is that the device, once registered, boots securely through a first verification based on symmetric cryptography and a second verification based on asymmetric cryptography. The same two-stage boot used in the ESP32 microcontroller [3] is followed. The code of the first-stage bootloader is called herein BootROM and resides in an OTP memory laid down during IC fabrication. It is implicitly trusted and acts as the Root of Trust of the device. The code of the second-stage bootloader is called herein Software Bootloader (SBL), as in [3]. It resides in an external non-volatile memory and is not implicitly trusted. The BootROM carries out initializations and the first verification, which checks the integrity of the Software Bootloader using a symmetric cryptographic primitive. Implicitly, the authenticity of the device hardware is checked in this stage using PUFs. The Software Bootloader decides which application to run based on the information located in a partition table. If the first verification is successful, the Software Bootloader is loaded and carries out the second verification on the Application Firmware. Unlike the Software Bootloader, the Application Firmware is verified using a digital signature. Asymmetric algorithms are more suitable than symmetric ones in this case, since the Application Firmware is intended to be verified by any user and not only by the microcontroller. If the second verification is successful, the Software Bootloader loads the Application Firmware and the IoT device can be operative in its application field. Since the users can check that the devices are trusted, they gain confidence in the IoT application. It is common to update the Application Firmware at some point in time. The i-th version of the Application Firmware is denoted herein as $APP_i$.
The authenticity and integrity of the Software Bootloader are verified using a keyed-hash message authentication code (HMAC) based on a cryptographic hash function and a secret key K. The data needed to verify the second stage are verified in the first stage. The registration phase allows establishing a trusted communication channel between the device and the manufacturer to agree on the use of the secret key K. The secret key K is neither stored anywhere nor known by the manufacturer. It can be reconstructed only by the authentic microcontroller using its PUF response and the public data (Helper Data, $HD_K$, and Identification Mask, $ID_{MASK}$) generated during the registration phase and stored in the flash memory. The verification fails if a cloned or counterfeit device uses the public data in the non-volatile memory. The authenticity and integrity of the Application Firmware are verified with hash-based signatures. The manufacturer signs the Application Firmware with XMSS, thus providing the signature index s, the WOTS+ signature $Sign_{OTS_M}(APP_i, s)$, and the authentication path $Auth_M(s)$. It is assumed that the manufacturer public


key $PK_{MTS_M}$ is distributed securely to the users. Hence, any user can verify this signature. The OTS public key $PK_{OTS_M}(s)$ needed to verify this signature was verified in the first stage along with the Software Bootloader. In order to reduce the cost from the point of view of the device, it is proposed that the device only verifies the WOTS+ signature $Sign_{OTS_M}(APP_i, s)$ instead of the entire XMSS signature. Thus, the microcontroller uses the One-Time public key $PK_{OTS_M}(s)$, which is the s-th leaf of the Merkle tree with the manufacturer public key $PK_{MTS_M}$ as the root node. In this way, less non-volatile memory is needed to store the signature, since the authentication path $Auth_M(s)$ does not need to be stored. This is relevant in a secure boot context, since the data for firmware signature verification have a permanent cost in non-volatile memory storage during the whole device lifetime. The manufacturer includes the OTS public key $PK_{OTS_M}(s)$ and the index s along with the Software Bootloader in the registration phase, so that the first-stage verification certifies the integrity of the OTS public key. The different phases of the device are summarized in the following.

4.1 Registration Phase

Four steps are carried out in this phase. First, the manufacturer signs the Application Firmware with the corresponding OTS secret key (identified by the index s) generated from its secret seed $SK_{SEED,MTS_M}$. In order for the device to verify the signature, the data stored in its external non-volatile memory are the WOTS+ signature $Sign_{OTS_M}(APP_i, s)$ and the compressed WOTS+ public key $PK_{OTS_M}(s)$ along with s, $PK_{SEED}$, and R (generated from $SK_{PRF}$, see Sect. 3). In order for the user to verify the XMSS signature, the authentication path $Auth_M(s)$ is stored outside the device and supplied to the device user. Second, SRAM cell classification is performed to find which cells are stable and which ones are random. The start-up values of the stable cells do not change when the internal SRAM of the device is powered down and up. On the other hand, the best random cells provide flipping start-up values, “0” and “1” half of the time each. To carry out this classification, the methodology found in [8] is used on a selected region of the SRAM memory. It is based on counting the number of 1's provided by each cell start-up value after powering the memory down and up M times. The SRAM cells whose counter value is equal to zero or M are identified with a value of “1” in a binary mask called Stable Mask, $STB_{MASK}$. In order to avoid bias in the PUF response (i.e., an unequal number of 0's and 1's), a de-biasing algorithm is applied. The resulting cells are named identification cells or ID cells and are identified with a value of “1” in a binary mask called $ID_{MASK}$. This mask is stored in the external non-volatile memory of the device. The SRAM cells whose counter value is equal to M/2 are considered random


Fig. 2 Memory layout of external non-volatile memory of the device after registration phase

cells and their start-up values are used as a seed to generate the secret key K. Generating the secret key internally avoids any leakage of sensitive information. Using the secret key K, the third step is to compute the HMAC of the Software Bootloader concatenated with the data needed by the device to verify the OTS signature of the second-stage verification. The resulting digest is denoted as $D = HMAC_K(SBL \,\|\, s \,\|\, PK_{SEED} \,\|\, R \,\|\, PK_{OTS_M}(s))$ and is stored in the external non-volatile memory. In the fourth step, the secret key K is encoded with a repetition error correcting code and then XOR-ed with the start-up values of a set of ID cells (the SRAM PUF response), as explained in [8]. The resulting non-sensitive data $HD_K$, named Helper Data, are stored in the external non-volatile memory. Figure 2 shows the device non-volatile memory after the registration phase.
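The cell classification and the Helper Data construction of this phase can be sketched as follows. This is a simplified illustration, not the procedure of [8]: the PUF response is simulated with random bits, the de-biasing step is omitted, and all function names are hypothetical. The repetition length of 8 and the 512-bit key size are taken from the values reported in Sect. 5.

```python
import secrets

REP = 8  # repetition code length


def classify(counts, m):
    # counts[i] = number of 1's observed in cell i over m power cycles.
    stable = [int(c == 0 or c == m) for c in counts]      # candidate STB_MASK
    random_cells = [int(2 * c == m) for c in counts]      # cells flipping half of the time
    return stable, random_cells


def rep_encode(key_bits):
    # Repeat every key bit REP times.
    return [b for b in key_bits for _ in range(REP)]


def rep_decode(code_bits):
    # Majority vote over each group of REP bits.
    return [int(sum(code_bits[i:i + REP]) > REP // 2)
            for i in range(0, len(code_bits), REP)]


def make_helper_data(key_bits, puf_bits):
    # HD_K = Encode(K) XOR PUF response (registration phase).
    return [c ^ p for c, p in zip(rep_encode(key_bits), puf_bits)]


def reconstruct_key(helper, puf_bits):
    # K = Decode(HD_K XOR noisy PUF response) (every boot).
    return rep_decode([h ^ p for h, p in zip(helper, puf_bits)])


key = [secrets.randbits(1) for _ in range(512)]           # 64-byte key K
puf = [secrets.randbits(1) for _ in range(512 * REP)]     # simulated ID-cell start-up values
hd = make_helper_data(key, puf)

noisy = list(puf)
for g in range(512):
    noisy[g * REP] ^= 1                                   # one flipped cell in every group

print(reconstruct_key(hd, noisy) == key)
print(len(hd) // 8, "bytes of Helper Data")
```

With these sizes the Helper Data take 512 bytes, and majority voting tolerates up to three flipped cells per group of eight, which is why classifying stable cells beforehand allows such a simple code.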

4.2 Secure Boot Operation

At boot stage, the microcontroller first executes the instructions contained in the BootROM. The secret key K is reconstructed using the public data $ID_{MASK}$ and


$HD_K$ stored in the external non-volatile memory. The start-up values of the SRAM cells identified by the $ID_{MASK}$ are XOR-ed with the Helper Data $HD_K$ and the result is decoded with the repetition error correcting code. Then, the digest D is recomputed as explained in Sect. 4.1 and compared with the one stored during the registration phase. If they match, the Software Bootloader is loaded. The second-stage boot performed by the Software Bootloader verifies the WOTS+ signature $Sign_{OTS_M}(APP_i, s)$ of the stored Application Firmware using the previously verified WOTS+ public key $PK_{OTS_M}(s)$. If it is verified, the Application Firmware starts its execution. The Application Firmware contains the verified public key $PK_{MTS_M}$ that allows future secure firmware updates.
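The first-stage check performed by the BootROM can be sketched as below, assuming the key K has already been reconstructed from the PUF response. HMAC-SHA512 is used as in Sect. 5; the 4-byte big-endian encoding of the index s, the function names, and the demo values are assumptions made for illustration only.

```python
import hashlib
import hmac


def first_stage_digest(key, sbl, s, pk_seed, r, pk_ots):
    # D = HMAC_K(SBL || s || PK_SEED || R || PK_OTS_M(s))
    msg = sbl + s.to_bytes(4, "big") + pk_seed + r + pk_ots   # 4-byte index is an assumption
    return hmac.new(key, msg, hashlib.sha512).digest()


def first_stage_verify(key, stored_digest, sbl, s, pk_seed, r, pk_ots):
    # Constant-time comparison of the recomputed and stored digests.
    recomputed = first_stage_digest(key, sbl, s, pk_seed, r, pk_ots)
    return hmac.compare_digest(recomputed, stored_digest)


# Demo values standing in for the contents of the external flash memory.
key = bytes(range(64))                      # 64-byte key K (reconstructed at boot)
sbl = b"software bootloader image"
s, pk_seed, r, pk_ots = 3, bytes(32), bytes([2]) * 32, bytes([3]) * 32

stored = first_stage_digest(key, sbl, s, pk_seed, r, pk_ots)   # registration phase
print(first_stage_verify(key, stored, sbl, s, pk_seed, r, pk_ots))
print(first_stage_verify(key, stored, sbl + b"!", s, pk_seed, r, pk_ots))
```

Because the OTS public key and the index are covered by D, an attacker who replaces them in flash to match a forged firmware signature is detected before the Software Bootloader runs.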

4.3 Secure Firmware Update

Once the IoT device is operative in its application field and is executing its Application Firmware, it can receive a request for an Over-The-Air (OTA) firmware update signed with XMSS. This request contains the WOTS+ signature $Sign_{OTS_M}(APP_j, t)$ of the new Application Firmware (j-th version), the signature index t, the new OTS public key $PK_{OTS_M}(t)$, and the authentication path $Auth_M(t)$. The device verifies whether this request comes from an authorized updater using the verified public key $PK_{MTS_M}$. If it is verified, the device updates its Application Firmware. Then, the new digest D is recomputed with the new signature index, pseudorandom value R, and OTS public key $PK_{OTS_M}(t)$, and the new information is stored in place of the older one so that a subsequent boot can verify it as explained in Sect. 4.2.

5 Implementation Results

Espressif's ESP32 microcontroller was selected for benchmarking the proposed solution. Specifically, a Pycom WiPy 3.0 board was used [18] and the default CPU frequency of 160 MHz was selected. It uses a flash memory of 8 MB as non-volatile memory. Since the ESP32 microcontroller supports WiFi and Bluetooth, it is widely used in IoT devices. Besides, it has a SHA2 accelerator for the SHA224, SHA256, SHA384, and SHA512 hash functions, which was used in the HMAC and “simple” WOTS+ implementations. The state precomputation mentioned at the end of Sect. 3 needs access to the internal state of the SHA2, but the SHA2 accelerator of the ESP32 does not allow it. In any case, it was checked that using the SHA2 accelerator without the precomputation optimization was faster than the SHA2 implementation using the mbedtls software library with the precomputation optimization. Although the size of the Software Bootloader in the ESP32 microcontroller can vary depending on the selected features, it was found that it occupied approximately 14–17 kB in simple cases and has a default limit of approximately 28 kB [3]. The


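Table 1 Execution times for signature verification with different schemes

| Implementation | n | w | Exec. time (cycles) | Exec. time (ms) |
|---|---|---|---|---|
| This work | 128 | 4 | 819,746 | 5.12 |
| This work | 128 | 16 | 1,889,200 | 11.81 |
| This work | 192 | 4 | 1,270,169 | 7.94 |
| This work | 192 | 16 | 3,056,158 | 19.10 |
| This work | 256 | 4 | 1,877,672 | 11.73 |
| This work | 256 | 16 | 4,132,626 | 25.83 |
| XMSS reference, h = 20 | 256 | 16 | 15,365,568 | 96.03 |
| ECDSA secp256r1 | 128 | – | 32,597,900 | 203.74 |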

time taken to compute an HMAC-SHA512 of a Software Bootloader of around 15 kB was 2.84 ms. The key size for an HMAC-SHA512 is recommended to be not smaller than 64 bytes [19], so this was the chosen size. The reconstruction of a key of this size on an ESP32, using a repetition error correcting code of length 8 and a bitwise XOR function as the Helper Data algorithm, is reported in the literature to take 3.20 ms [10]. Therefore, the first-stage verification takes approximately 6.04 ms. Table 1 shows execution times for signature verification of different schemes, excluding the message hashing in the comparison. “This work” refers to the “simple” WOTS+ verification with the public key compression using a tweakable hash instead of an L-Tree. The results are compared with the XMSS reference implementation found in the RFC8391 specification, which is used in other secure boot solutions [14]. The proposal is also compared with the ECDSA signature scheme using the curve secp256r1, which is widely used in IoT devices [20, 21]. In fact, it was chosen for the first secure boot version of the ESP32 microcontroller [3]. It was implemented using the micro-ecc library, since it is efficient in embedded devices [22]. Results are shown in Table 1 for three different security levels and two different Winternitz parameter values. A security level of 128 bits is secure and widely used at the time of writing this work, while security levels of 192 and 256 bits are for long-term and very long-term security. The last two levels are the ones recommended in [16], and the XMSS specification RFC8391 only contemplates the security level of 256 bits. It is shown that for n = 256 and w = 16 the proposal of this work has a speedup of 3.72 in cycles in comparison with the RFC8391 reference implementation. The use of the tweakable hash function in (5) has the major impact on the speedup.
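The figures in Table 1 can be cross-checked with a few lines of arithmetic. The sketch below converts the measured cycle counts to milliseconds at the 160 MHz CPU frequency stated above and derives the speedup ratios from the cycle counts; the dictionary layout and names are illustrative choices.

```python
F_CPU_HZ = 160_000_000  # default ESP32 CPU frequency used in the benchmarks

# (implementation, n, w) -> verification time in cycles, copied from Table 1
CYCLES = {
    ("this work", 128, 4): 819_746,
    ("this work", 128, 16): 1_889_200,
    ("this work", 192, 4): 1_270_169,
    ("this work", 192, 16): 3_056_158,
    ("this work", 256, 4): 1_877_672,
    ("this work", 256, 16): 4_132_626,
    ("xmss reference", 256, 16): 15_365_568,
    ("ecdsa secp256r1", 128, None): 32_597_900,
}


def to_ms(cycles: int) -> float:
    # Execution time in milliseconds for a given cycle count.
    return cycles / F_CPU_HZ * 1e3


for (impl, n, w), c in CYCLES.items():
    print(f"{impl:16s} n={n} w={w}: {to_ms(c):.2f} ms")

xmss_speedup = CYCLES[("xmss reference", 256, 16)] / CYCLES[("this work", 256, 16)]
ecdsa_vs_w4 = CYCLES[("ecdsa secp256r1", 128, None)] / CYCLES[("this work", 128, 4)]
ecdsa_vs_w16 = CYCLES[("ecdsa secp256r1", 128, None)] / CYCLES[("this work", 128, 16)]
first_stage_ms = 2.84 + 3.20   # HMAC-SHA512 of the SBL plus key reconstruction

print(f"speedup vs XMSS reference: {xmss_speedup:.2f}")
print(f"speedup vs ECDSA: {ecdsa_vs_w4:.2f} (w=4), {ecdsa_vs_w16:.2f} (w=16)")
print(f"first-stage verification: {first_stage_ms:.2f} ms")
```

Computed from the raw cycle counts, the ECDSA ratio for w = 4 comes out at roughly 39.77, while dividing the rounded millisecond values (203.74 / 5.12) gives 39.79; the difference appears to be a rounding effect only.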
For a security level of 128 bits, the proposal of this work clearly outperforms the ECDSA scheme, being 39.79 times faster in cycles for a Winternitz parameter of 4 and 17.25 times faster for a Winternitz parameter of 16. Note that the hashing of the Application Firmware is not taken into account, since its cost can vary greatly depending on the application, unlike the Software Bootloader, which tends to have a constant size. Table 2 shows the sizes occupied in flash memory by the data needed for signature verification, depending on the security level and the Winternitz parameter. Two cases are distinguished: a typical XMSS signature and the proposal of using only the WOTS+ part. It is seen that the saving


R. Román and I. Baturone

Table 2 Comparison of sizes of data in flash memory needed for signature verification

n     w    PK + Signature XMSS with h = 20 (bytes)   PK + Signature WOTS+ (bytes)   Saving (%)
128   4    1475                                      1171                           20.61
128   16   931                                       627                            32.65
192   4    2979                                      2523                           15.31
192   16   1779                                      1323                           25.63
256   4    4995                                      4387                           12.17
256   16   2883                                      2275                           21.09

Table 3 Comparison of total flash occupation between ECDSA and WOTS+ with n = 128

                                            PK + Signature   Flash code   Total
Flash occupation ECDSA (bytes)              97               4197         4294
Flash occupation WOTS+, w = 4/16 (bytes)    1171/627         694          1865/1302

achieved in stored data can vary from 12.17% (with n = 256 and w = 4) up to 32.65% (with n = 128 and w = 16). The Helper Data and the IDMASK occupy 512 and 606 bytes, respectively, according to [10]. As shown in Table 3, the proposed solution occupies even less flash memory than the ECDSA scheme mentioned earlier. While an ECDSA signature occupies only 64 bytes and the public key can be compressed from 64 to 33 bytes, the code size of WOTS+ signature verification is much smaller. The comparison excludes the code size of the call to the SHA256 function, because both schemes need it. The code size for calling the SHA256 accelerator is 915 bytes.
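The saving percentages reported in Table 2 can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming the saving is defined as the relative reduction of the WOTS+ data size with respect to the XMSS data size; the names `TABLE_2` and `saving_percent` are illustrative and do not come from the original work.

```python
# (n, w) -> (PK + signature for XMSS with h = 20, PK + signature for WOTS+), in bytes.
# Values are copied from Table 2.
TABLE_2 = {
    (128, 4): (1475, 1171),
    (128, 16): (931, 627),
    (192, 4): (2979, 2523),
    (192, 16): (1779, 1323),
    (256, 4): (4995, 4387),
    (256, 16): (2883, 2275),
}


def saving_percent(xmss_bytes, wots_bytes):
    """Relative reduction of stored data, in percent, rounded to two decimals."""
    return round(100.0 * (xmss_bytes - wots_bytes) / xmss_bytes, 2)


# Saving for every (n, w) pair in the table.
savings = {params: saving_percent(*sizes) for params, sizes in TABLE_2.items()}

for (n, w), saving in savings.items():
    print(f"n = {n}, w = {w}: saving = {saving} %")
```

Under this assumption, the smallest and largest savings correspond to the two extremes quoted in the text (n = 256 with w = 4, and n = 128 with w = 16).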

6 Conclusions

A two-stage secure boot based on key obfuscation with on-chip SRAM PUFs and hash-based signatures has been proposed. The second stage, called Software Bootloader in this work, is verified with an HMAC by the first stage, the BootROM, which is stored in an implicitly trusted internal OTP memory. Costly secure nonvolatile memories are not required because the secret key is reconstructed by the PUF. The Application Firmware is signed by the manufacturer using the XMSS scheme, but the device verifies only the WOTS+ part of the signature. In addition, a secure Over-The-Air firmware update is included for completeness. The cryptographic operations related to the proposal were implemented in an ESP32 microcontroller making use of its embedded SHA accelerator. The first-stage verification takes only 6.04 ms to be completed, taking into account a realistic

A Quantum-Resistant and Fast Secure Boot for IoT Devices Using Hash-Based. . .


size for a Software Bootloader on the ESP32 microcontroller. The digital signature verification in the second stage is very fast (from 5.12 to 25.83 ms, depending on the signature parameters) and reduces flash memory occupation in comparison with common solutions found in IoT devices such as ECDSA. The WOTS+ signature employed covers security levels from 128 bits, suited to present-day scenarios, to 192 and 256 bits for future ones. We believe the proposal is of interest for critical applications requiring short boot times and post-quantum, long-term security.

Acknowledgments This work was supported in part by FEDER/Ministerio de Ciencia e Innovación - Agencia Estatal de Investigación/_TEC2017-83557-R and _RTC-2017-6595-7, and in part by Consejería de Transformación Económica, Industria, Conocimiento y Universidades de la Junta de Andalucía under Projects AT17_5926_USE and US-1265146.

References

1. Sethi, P., Sarangi, S.R.: Internet of Things: architectures, protocols, and applications. J. Electr. Comput. Eng. 2017, 9324035:1–9324035:25 (2017)
2. Maple, C.: Security and privacy in the Internet of Things. J. Cyber Policy 2(2), 155–184 (2017)
3. ESP32 - Secure Boot V1. https://docs.espressif.com/projects/esp-idf/en/latest/esp32/security/secure-boot-v1.html. Accessed 8 Apr 2021
4. Buchmann, J., Dahmen, E., Hülsing, A.: XMSS - a practical forward secure signature scheme based on minimal security assumptions. In: Yang, B.Y. (ed.) PQCrypto 2011. LNCS, vol. 7071, pp. 117–129. Springer, Heidelberg (2011)
5. Hülsing, A.: W-OTS+ - shorter signatures for hash-based signature schemes. In: Youssef, A., Nitaj, A., Hassanien, A.E. (eds.) AFRICACRYPT 2013. LNCS, vol. 7918, pp. 173–188. Springer, Heidelberg (2013)
6. Campos, F., Kohlstadt, T., Reith, S., Stöttinger, M.: LMS vs XMSS: comparison of stateful hash-based signature schemes on ARM Cortex-M4. In: Nitaj, A., Youssef, A. (eds.) AFRICACRYPT 2020. LNCS, vol. 12174, pp. 258–277. Springer, Cham (2020)
7. Baturone, I., Prada-Delgado, M.A., Eiroa, S.: Improved generation of identifiers, secret keys, and random numbers from SRAMs. IEEE Trans. Inf. Forensics Secur. 10(12), 2653–2668 (2015)
8. Arjona, R., Prada-Delgado, M.A., Arcenegui, J., Baturone, I.: Trusted cameras on mobile devices based on SRAM physically unclonable functions. Sensors 18(10), 3352:1–3352:21 (2018)
9. Zhao, S., Zhang, Q., Hu, G., Qin, Y., Feng, D.: Providing root of trust for ARM TrustZone using on-chip SRAM. In: Proceedings of the 4th International Workshop on Trustworthy Embedded Devices, pp. 25–36. ACM, New York (2014)
10. Román, R., Arjona, R., Arcenegui, J., Baturone, I.: Hardware security for eXtended Merkle Signature Scheme using SRAM-based PUFs and TRNGs. In: Proceedings of 32nd International Conference on Microelectronics (ICM), pp. 1–4. IEEE, New York (2020)
11. Hülsing, A., Butin, D., Gazdag, S., Rijneveld, J., Mohaisen, A.: XMSS: extended Merkle signature scheme. RFC 8391, 1–74 (2018)
12. Wang, W., Bernhard, J., Wälde, J., Deng, S., Gupta, N., Szefer, J., Niederhagen, R.: XMSS and embedded systems. In: Paterson, K., Stebila, D. (eds.) Selected Areas in Cryptography - SAC 2019. LNCS, vol. 11959. Springer, Cham (2020)
13. Bernstein, D.J., Hülsing, A., Kölbl, S., Niederhagen, R., Rijneveld, J., Schwabe, P.: The SPHINCS+ signature framework. In: Conference on Computer and Communications Security, pp. 2129–2146. ACM, New York (2019)
14. Kumar, V.B.Y., Gupta, N., Chattopadhyay, A., Kaspert, M., Krauß, C., Niederhagen, R.: Post-quantum secure boot. In: 2020 Design, Automation and Test in Europe Conference & Exhibition (DATE), pp. 1582–1585. IEEE, Grenoble (2020)
15. Lamport, L.: Constructing digital signatures from a one-way function. Technical Report, CSL98, SRI International Palo Alto (1979)
16. Cooper, D., Apon, D., Dang, Q., Davidson, M., Dworkin, M., Miller, C.: Recommendation for stateful hash-based signature schemes. Technical report, National Institute of Standards and Technology (2019)
17. Buchmann, J., Dahmen, E., Schneider, M.: Merkle tree traversal revisited. In: Buchmann, J., Ding, J. (eds.) PQCrypto 2008. LNCS, vol. 5299, pp. 63–78. Springer, Heidelberg (2008)
18. ESP32 development-boards. https://www.espressif.com/en/products/hardware/development-boards. Accessed 8 Apr 2021
19. Krawczyk, H., Bellare, M., Canetti, R.: HMAC: keyed-hashing for message authentication. RFC 2104, 1–11 (1997)
20. Mössinger, M., Petschkuhn, B., Bauer, J., Staudemeyer, R.C., Wojcik, M., Pöhls, H.C.: Towards quantifying the cost of a secure IoT: overhead and energy consumption of ECC signatures on an ARM-based device. In: IoTSoS 2016, pp. 1–6 (2016)
21. Bauer, J., Staudemeyer, R.C., Pöhls, H.C., Fragkiadakis, A.: ECDSA on things: IoT integrity protection in practise. In: Lam, K.Y., Chi, C.H., Qing, S. (eds.) ICICS 2016. LNCS, vol. 9977. Springer, Cham (2016)
22. Silde, T.: Comparative study of ECC libraries for embedded devices. Technical report, Norwegian University of Science and Technology (2019)

On the Analysis of MUD-Files' Interactions, Conflicts, and Configuration Requirements Before Deployment

Vafa Andalibi, Eliot Lear, DongInn Kim, and L. Jean Camp

1 Introduction

The Internet of Things (IoT) has diffused across the globe, and the estimates of IoT devices in the home range from billions to tens of billions. Yet, security has lagged [1]. The security of IoT devices is such that they are used to participate in DDoS attacks [18], are vulnerable to ransomware [46], and enable information exfiltration from within homes [7]. Media reports of abusive strangers engaging with families through IoT devices are not uncommon, e.g., [2]. Given the complexity of IoT devices, the lack of technical support, the level of technical expertise in the home, and the complexity of access control, how can these devices be managed? Manufacturer Usage Description (MUD) is an Internet Engineering Task Force (IETF) standard created in response to the requirements for access control and device isolation for IoT devices [22]. It addresses multiple challenges regarding IoT security by relying on manufacturers providing an Access Control List (ACL) that identifies services and addresses for those services. The goal is to isolate devices, particularly those that cannot be relied upon to provide their own protection. Unlike more traditional verification approaches, MUD can work with devices that have highly limited processing power. In addition, rather than a single entity creating policy, each manufacturer creates the access control that defines the situation for their own devices. The second goal of the MUD standard is to provide an identifier so that updates to devices can be implemented only from authenticated

V. Andalibi · D. Kim · L. J. Camp
Indiana University, Bloomington, IN, USA
E. Lear
Cisco Systems, Zurich, Switzerland
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Nayyar et al. (eds.), The Fifth International Conference on Safety and Security with IoT, EAI/Springer Innovations in Communication and Computing, https://doi.org/10.1007/978-3-030-94285-4_9


V. Andalibi et al.

and authorized sources. With such functionality, errors in device configurations can be mitigated. We have chosen to focus on MUD because, in addition to being an IETF standard, MUD is also a core component of the National Institute of Standards and Technology (NIST) security for IoT initiatives, particularly the thrust focused on stopping DDoS [6]. MUD can defend IoT devices in a home from other compromised ones in the household and on the network, with a specific goal of blocking the access of compromised devices to command and control channels. One of the core components of MUD is the MUD-File, which is essentially an access control statement. The MUD-File enumerates the allowed (or specifically disallowed) services and sources for these services. In the MUD standard, it is defined as "a file containing YANG-based JSON that describes a Thing and associated suggested specific network behavior" [22]. MUD-Files can be long and complex, making reading, reviewing, and modifying them a laborious task if performed manually. In this chapter, we present MUD-Visualizer, which addresses this issue. MUD-Visualizer provides (1) protocol checking to avoid formatting errors in the MUD-File, (2) optimization of the MUD-File, which identifies internal inconsistencies and inefficiencies, and (3) visualization of the commands in MUD-Files. The first of these prevents coding errors. The second prevents logic errors. The third enables manufacturers and sysadmins to review, validate, and modify the MUD-Files prior to deployment. The focus on coding, logic, and contextual errors aligns with the sources of most vulnerabilities [20].

2 Manufacturer Usage Description (MUD)

Understanding the importance of the MUD-Files requires some understanding of the MUD standard. For the readers not familiar with the workflow of MUD, a brief summary of MUD workflow and its abstractions is presented in this section. Those familiar with MUD may want to continue to Sect. 3.

2.1 Components and Workflow

An implementation of MUD has six main components, as presented in Fig. 1:
1. MUD-File: A YANG-based JSON file (RFC 7951) that describes the expected network behavior of the device. It is created, signed with a public key signature, and distributed by the manufacturer.
2. MUD-file server: The server on which the MUD-File is hosted. The location of the file is embedded in the device as a uniform resource locator called the MUD-URI.

Analyzing MUD-Files Before Deployment


Fig. 1 MUD workflow in a LAN: the blue dotted line indicates the boundary of the LAN. Components shown: IoT Devices, AAA Server, NAD, MUD-Manager, Internet, and MUD File Server. Labeled steps: 1) to 3) MUD-URL; 4) and 5) Request MUD-File; 6) and 7) MUD-File (Internet connection); 8) Configure.


3. MUD-URI: This is used to locate and download the MUD-File to the local network.
4. AAA Server: The Authentication, Authorization, and Accounting (AAA) server enforces the traffic rules on the devices in the network. This server can be either an independent server or a built-in component of the Network Access Device (NAD).
5. Network Access Device (NAD): It acts as a router in the network and is usually equipped with an internal firewall component, which the MUD-Manager uses via the AAA server to control the traffic and enforce rules.
6. MUD-Manager: It is the core of the MUD architecture and is responsible for receiving the MUD-URI from the devices, retrieving the MUD-File from the MUD-file server, and communicating the MUD-File rules to the AAA server.

The MUD workflow illustrated in Fig. 1 begins with the IoT device authenticating with an X.509 certificate (although DHCP and LLDP are also available as means of authentication) and transmitting the MUD-URI embedded in the device to the NAD. The MUD-Manager then receives the MUD-URI, validates the signature, retrieves the MUD-File from the manufacturer's MUD-file server, and enforces the access control rules of the MUD-File on the network via the AAA server.
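To make the first component more concrete, the following minimal sketch loads an abbreviated MUD-File and lists its access control entries. The JSON layout is modeled on the structure defined in the MUD specification [22]; the URL, ACL and ACE names, and DNS name are illustrative placeholders, and the helper `list_aces` is hypothetical rather than part of any MUD implementation.

```python
import json

# Abbreviated MUD-File. All concrete values (URL, names, host) are placeholders.
MUD_FILE = """
{
  "ietf-mud:mud": {
    "mud-version": 1,
    "mud-url": "https://manufacturer.example.com/device.json",
    "last-update": "2021-04-08T00:00:00+00:00",
    "cache-validity": 48,
    "is-supported": true,
    "systeminfo": "Example IoT device",
    "from-device-policy": {
      "access-lists": {"access-list": [{"name": "from-device-acl"}]}
    }
  },
  "ietf-access-control-list:acls": {
    "acl": [
      {
        "name": "from-device-acl",
        "type": "ipv4-acl-type",
        "aces": {
          "ace": [
            {
              "name": "cloud-https",
              "matches": {
                "ipv4": {
                  "ietf-acldns:dst-dnsname": "cloud.manufacturer.example.com",
                  "protocol": 6
                },
                "tcp": {"destination-port": {"operator": "eq", "port": 443}}
              },
              "actions": {"forwarding": "accept"}
            }
          ]
        }
      }
    ]
  }
}
"""


def list_aces(mud_json):
    """Return (ACL name, ACE name, forwarding action) for every ACE in the file."""
    doc = json.loads(mud_json)
    rows = []
    for acl in doc["ietf-access-control-list:acls"]["acl"]:
        for ace in acl["aces"]["ace"]:
            rows.append((acl["name"], ace["name"], ace["actions"]["forwarding"]))
    return rows


aces = list_aces(MUD_FILE)
print(aces)
```

Even this single-entry example needs several levels of nesting, which gives a sense of why longer MUD-Files are laborious to review by hand.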

2.2 MUD-File ACL Abstractions

The ACLs in the MUD-Files are interpreted and enforced on the network by the MUD-Manager. The ability to implement such rules enables a range of policies. Based on the MUD specifications, the contents of the MUD-File may include a range of default policies and defined types. There are seven approaches to define the behavior and constraints of a device in a MUD instance. These include constraints on, or identification of, domain names, manufacturers, device classes, device models, and the local context of the device. These extensions to the IETF-ACL were all addressed in our implementation. Each extension has the potential to simplify the use of MUD for the manufacturer and adopter. Yet the existence of all these options also drives the need for MUD-Visualizer. A useful abstraction for cloud access is the domain-name. Of course, domain names are a global Internet namespace, which is not always well suited to a specific, geographically located IoT device. A second abstraction defined in the MUD specification is local-networks. With the local-networks abstraction, a node is matched against the nodes in the same network. This is particularly useful for designs where there is an IoT hub in the house that is compliant with the local devices. Another constraining abstraction is manufacturer. In this case, the hostname of the target node is matched against the authority component (e.g., the domain name) of the MUD-URL of another node. This constrains the reach of a device to other devices from particular manufacturers.

Analyzing MUD-Files Before Deployment

141

Beyond the basic domain name for every IoT device, MUD provides the same-manufacturer indicator. With this abstraction, multiple devices from the same manufacturer, e.g., the lightbulbs or outlets in a house, can communicate only with each other. In operational terms, when a device uses this extension, its authority component is checked against that of another node. More than one extension can be used in the same file. There is also an option that identifies a device requiring contextual configuration. The controller extension identifies the case where the network administrator needs to assign the target devices to a particular class. This may be particularly useful when a single device embeds multiple services, and access control depends on these services. For example, a doorbell that offers wireless installation may also offer remote control for the owner, access to a home security firm, face recognition for local control, real-time video activity monitoring, social interactions (e.g., sharing moments), caregiving, and other services that affect access delegation, network connections, data types, and port numbers. With the my-controller abstraction, the device leverages its MUD-URL and signals the MUD-Manager to use whatever mapping it has for this MUD-URL to a particular group of hosts that would be used to manage or control this class of device. This mode requires a local decision-maker in the loop, whether that is a human or a process. With this option, the node initiates communication with the MUD-Manager with a request that the MUD-Manager assign this node to a class. Classes of devices and devices from the same manufacturer may be inadequately constraining. The seventh abstraction, model, requires that model, manufacturer, and class all match. The potential for conflicts among this set of seven defined options argues for the importance of the visualization tool.
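The matching behavior of some of these abstractions can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the MUD-Visualizer implementation: the `Device` record, the `permits` function, and the example URLs are hypothetical; the model abstraction is approximated by equality of the full MUD-URL; and domain-name, controller, and my-controller are omitted because they depend on DNS resolution or administrator-defined mappings.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Device:
    name: str
    mud_url: str   # MUD-URL emitted by the device (placeholder values below)
    network: str   # identifier of the local network the device is attached to


def authority(mud_url):
    """Authority component (host and optional port) of a MUD-URL."""
    return urlparse(mud_url).netloc


def permits(src, abstraction, dst, value=None):
    """Return True if `dst` matches the abstraction used in an ACE of `src`."""
    if abstraction == "local-networks":
        return src.network == dst.network
    if abstraction == "same-manufacturer":
        return authority(src.mud_url) == authority(dst.mud_url)
    if abstraction == "manufacturer":
        # `value` is the manufacturer authority named in the ACE.
        return authority(dst.mud_url) == value
    if abstraction == "model":
        # `value` is the MUD-URL named in the ACE (assumed approximation).
        return dst.mud_url == value
    raise ValueError(f"abstraction not covered by this sketch: {abstraction}")


bulb = Device("bulb", "https://lights.example.com/bulb.json", "lan0")
outlet = Device("outlet", "https://lights.example.com/outlet.json", "lan0")
cooker = Device("cooker", "https://kitchen.example.net/cooker.json", "lan0")

print(permits(bulb, "same-manufacturer", outlet))   # same authority component
print(permits(bulb, "same-manufacturer", cooker))   # different authority component
print(permits(bulb, "local-networks", cooker))      # same local network
```

The last call illustrates how a broad abstraction such as local-networks can admit a device that the manufacturer-based abstractions would exclude, which is the kind of interaction that is hard to spot in text form.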

3 Motivation

As described above, MUD's role in enforcing the Principle of Least Privilege (PoLP) is to limit the devices' reachability to a bare minimum by leveraging the manufacturers' knowledge about their devices. For a given configuration, the ACLs should be validated (if not defined) manually to ensure PoLP, which makes this process prone to human error [27, 42]. Errors in the ACLs defined by the manufacturer will result in unwanted access control privilege escalation, which poses a security risk to the network. For instance, consider a MUD-compliant smart slow cooker that is only allowed to communicate with the manufacturer server and the manufacturer's site over the Internet. In addition, the slow cooker comes with a mobile application that allows controlling its functionality both online (via the Internet) and offline (locally). When used offline, the slow cooker will only use port 1300 for communication. Alice is a sysadmin in an enterprise network with more than 100 types of MUD-compliant devices. Bob decides to add the smart slow cooker to the kitchen of the office as a collegial act. A new MUD-File appears in Alice's MUD-Manager.


To understand all the possible connections, Alice would need an encyclopedic awareness of each MUD-File in the network and would have to read these files while understanding their interactions. Given the reality of the enterprise, it is almost impossible to prevent employees from bringing in personal fitness trackers, slow cookers, rice cookers, microwaves, or space heaters, all of which can be Internet-connected devices. Because of the number of devices and corresponding ACLs in the network, it is arguably beyond human cognition for Alice, working with only text files, to identify a possible unnecessary communication between the slow cooker and a smart bulb that puts the network at risk [42]. Although other tools have the potential to identify the connection between the light bulbs and slow cookers or between the slow cooker and an attacker, MUD-Visualizer goes one step further and tries to prevent such connections before they occur. Imagine instead that Alice has MUD-Visualizer. When each device is added, MUD-Visualizer can help her implement informed threat modeling and flag unwanted communications. Using MUD-Visualizer, Alice could easily identify the connection and isolate the slow cooker from the internal system by adding a rule. Beyond this example, previous research studies make several points that indicate the importance of different aspects of a tool like MUD-Visualizer. We mention the most important of these works, and the corresponding points associated with MUD-Visualizer, in the next section.

4 Related Work

In this section, we present related work in two categories: studies that focus on MUD research and tools, and research on the importance of Human–Computer Interaction (HCI) in mitigating human errors in access control. Usable access control is a significant challenge, where even the relative responsibilities of the platform, the final user, and the developer are contested, e.g., [36, 40]. The tool described here is developed as a complement to MUD [22]. At the time of writing this chapter, there are four main projects that implement the core of a MUD instantiation: the Cisco MUD-Manager [44], the Open-Source MUD-Manager [47], the MUD implementation of NIST [32], and CableLabs Micronets [31]. NIST has a special publication that thoroughly describes four different builds of MUD based on the abovementioned implementations for mitigating network-based attacks [6]. Besides the MUD-Manager, Cisco also offers the mudmaker,1 which can be used for creating MUD-Files. We used mudmaker to generate a comprehensive

1 https://www.mudmaker.org.


MUD-File for the tests reported in Sect. 7. MUD Pretty Printer [21] is another tool, developed to summarize the ACL information in MUD-Files. However, it does not provide any user interface (UI) or visualization. With regard to MUD-File modification, Cisco has a recent patent that discusses techniques for providing secure modification of MUD-Files based on the device applications [23]. Regarding MUD deployment, some studies focus on the effectiveness of MUD against DoS and DDoS attacks [4, 14, 29, 38]. Afek et al. [3] proposed an ISP-level system architecture that enforces the ACL upstream at the provider network to protect the IoT at scale. Additionally, they extended their MUD-interoperable architecture to support peer-to-peer protocols. With regard to combining Software-Defined Networking (SDN) and MUD, the authors in [33] present a scalable implementation of the MUD standard on OpenFlow-enabled SDN switches. Hamza et al. [15] attempted to create flow rules based on MUD policies so that they can be enforced using SDN. Matheu et al. [25] employed SDN techniques to make the MUD model more flexible, supporting additional aspects such as data privacy, channel protection, and resource authorization. In another work, an SDN-based architecture was proposed to secure the process of obtaining and enforcing MUD policies [12]. In a proposed expansion to MUD, Matthíasso presented the generation of contracts and their local evaluation [26]; this is complementary to MUD-Visualizer as it assumes the existence of non-conflicting MUD-Files. Some researchers focused on helping manufacturers in the process of creating MUD-Files. Hamza et al. [17] described MUDgee, which uses the traffic of an IoT device to generate a MUD profile for that IoT device.
Beyond the MUD-Files generated from mudmaker, we also performed some dry-run tests with the profiles provided by the MUDgee project.2 NIST also has an ongoing open-source project in their GitHub organization entitled MUD-PD [43], which is similarly targeted at profiling IoT devices in order to leverage the MUD architecture. Feraudo et al., in their systematization of knowledge about MUD, identified two challenges with MUD-Files that can be mitigated with the use of MUD-Visualizer [9]. The first of these is inconsistent implementations; MUD-Visualizer can be used on the underlying files to easily determine developer intent, thus simplifying differences in results. The other is the consistent generation of MUD-Files. In another survey paper, Mazhar et al. [28] detailed the role of MUD in the IoT ecosystem, including the implementation, its role in IoT security, and its integration in different security frameworks. They also review the benefits of MUD to the industrial IoT, telecommunication networks, smart homes, Fog and Edge computing, and mobile applications. MUD-Visualizer can facilitate the deployment of MUD in all of these applications. In the related area of configuration verification, Prabhu et al. [30] presented Plankton, a tool for network configuration verification. In another study, Fogel et al. [10] proposed a high-fidelity declarative model of low-level network configurations and implemented it as a tool called Batfish. Fayaz et al. [8] implemented ERA, which can be used for bug detection in reachability policies. With regard to routing, ARC [13] finds the possible impact of routing protocols on the network's data plane by abstracting their mechanics. Beckett et al. [5] presented Minesweeper, which can be used to ensure a wide range of intended properties, e.g., isolation among nodes, in the network. Unlike MUD-Visualizer, none of these offer visualizations. The earliest work in optimizing ACL interaction using HCI principles was by Maxion and Reeder [27]. They examined Windows XP file permissions and found that the visualizer tripled the rate of assigned task completion and reduced errors in those completed tasks by up to 94%. This illustrates the importance of visualization and interaction design in access control. Similar to Vaniea et al., our work is grounded in the recognition of the difficulty of translating policy rules into access control rules [41]. We integrated their recommendations into our design; in particular, we integrate visual feedback while the developer is drafting the MUD-File. SPARKLE [41] followed the common visualization process of presenting data in a table, which is the most commonly used visualization approach. For example, Reeder et al. [34] developed an interactive matrix visualization, the Expandable Grid, to enable improved file permissions in Windows XP. This ACL visualization informed the design of MUD-Visualizer, particularly in terms of understanding challenges to ease of cognition. In comparison, our approach provides a visualization based on flows, more similar to the graph visualization approach by Kolomeets et al. [19], rather than asking developers to read rows or columns. Salim et al. took a different view, examining access control as a case of decision-making under uncertainty [37].

2 https://iotanalytics.unsw.edu.au/mudprofiles.
They provided a formal method to quantify how much uncertainty is inherent in Role-Based Access Control (RBAC), one that illustrates the level of complexity required to provide reliably correct access in an organization. Xu and colleagues investigated the role of uncertainty in access control decisions [45]. They conducted qualitative investigations into how system administrators resolve access control conflicts, as these human errors are a known source of security vulnerabilities. Their fundamental finding was that a lack of feedback forced administrators into a trial-and-error mode. This confirms the need for a high-level view that provides information about access requirements and settings in a network; MUD-Visualizer provides real-time visual feedback about changes in access control. The complexity they documented may be far greater with the expansion of IoT. Smetters et al. [39] found that limitations in the UI lead to reluctance to change access control settings. This finding applies to MUD deployment as well; it would be difficult and time-consuming for system administrators to manually evaluate the interactions among tens of types of MUD-Files associated with their IoT devices. We believe MUD-Visualizer's UI is significantly easier to use than manual analysis, and we will evaluate this thoroughly in future work.


Besides the lack of a UI in the manual analysis of MUD-Files, the other issue with manual analysis is processing errors. Liginlal et al. [24] focused on the importance of analysis errors. They found that mistakes in the information processing stage constitute most cases of human error-related privacy breach incidents, confirming the importance of MUD-Visualizer in MUD-File interaction analysis. Another source of user errors is goal errors, i.e., failures of users to understand what to do. The main source of goal errors is found to be poor information representation in the interface. A study of highly skilled programmers found that even these participants struggled with access control [35]. This indicates the importance of the information representation of MUD-Visualizer compared to text-based tools like MUD Pretty Printer [21]. From another perspective, the issues with conflicts in MUD-Files are comparable to the challenges in the SDN flow information base (FLIB). None of the aforementioned studies address the verification of MUD-Files. However, the previous work on SDN verification and human subjects research on access control informed the design of MUD-Visualizer. The only work that attempts to help IoT manufacturers and adopters of these devices in the process of preparing or deploying MUD profiles is [16]. Their work differs from this study in two ways. First, similar to [17, 43], they focus on the automatic generation of MUD profiles based on network traffic. Second, their tool validates the consistency and compatibility of the generated profiles with organizational policies. Because those projects generate MUD-Files automatically, code-checking is less of a challenge; however, logic errors can still be embedded. Their work does not have a visualization or usability component. MUD-Visualizer is a complement to the projects that automatically generate MUD-Files.
To our knowledge, this is the only product targeted at developers or sysadmins seeking to define or validate a MUD-File for a product to be deployed. MUD-Visualizer is also unique in that it validates interactions and identifies possible conflicts prior to deployment (for the manufacturer) or at the time of deployment (for the user).

5 Methods

Recall from Sect. 2 that the MUD-File of each IoT device typically contains access controls in the form of a whitelist. Each list entry contains information about one or more protocols and often a corresponding identifier, e.g., a domain name accessible only via SSH. The whitelist provides the identifier appropriate for the network layers of the protocol. Entries in this MUD-File list are called the Access Control Entries (ACEs). The first task of MUD-Visualizer is to determine how the ACE information of different devices interacts. We call this process ACE Merging. When we merge a set of ACEs of two devices, it is often possible that duplicates appear in the final list of


Algorithm 1 Merging two protocol stacks
initialize empty protocol stack PS_out
procedure MERGEPROTOCOLSTACKS(PS_src, PS_dst)
    for each layer l in protocol stack do
        for each protocol P_l in layer l do
            if P_l^src ⊆ P_l^dst then
                PS_out ← PS_out + (P_l^src ∩ P_l^dst)
            end if
        end for
    end for
    if isFullStack(PS_out) then
        return PS_out
    else
        return
    end if
end procedure

merged protocol information. We address this issue by pruning the protocol stacks that are a subset of more generic protocol stacks. We call this process ACE Pruning. Both of these procedures are described in the following subsections.

5.1 ACE Merging

When the abstractions of two devices in the network allow them to communicate, e.g., two devices supporting local network connections, their ACEs should be inspected and merged accordingly. Even with matching specifications, two devices should only communicate if there is a common factor between their ACEs. Hence, one of the important tasks of the MUD-Visualizer (specifically the MUD-Network module described in Sect. 6) is to merge and validate the protocols of ACEs. This task is implemented by moving up the protocol stack and checking whether the source protocol (sender) is a subset of the destination protocol (receiver) in that layer. If so, the intersection of the protocols is added to the resulting protocol stack. This procedure is implemented in Algorithm 1. We illustrate an example of this process in Table 1, where simple ACLs for two devices are presented (two ACEs per device). In this table, an intersection between the ACEs of these two devices exists. All possible pairs of ACEs from source and destination devices are checked against one another, and the common factors are saved. The protocol stack in this example contains the transport layer and the network layer. The first ACE of the first device is [IPv4, UDP, any, any], representing the network protocol, transport protocol, source port, and destination port, respectively. The first ACE of the second device is [any, any, 5000, 400]. By merging these two ACEs, we find that these two devices can only communicate if the network layer protocol is IPv4, the transport layer protocol is UDP, and the source and destination ports are 5000 and 400, respectively. After a comprehensive matching of

Analyzing MUD-Files Before Deployment

Table 1 Example of protocol merging between two devices

         Network   Transport   Src port   Dst port
Dev1     IPv4      UDP         Any        Any
         Any       TCP         5000       Any
Dev2     Any       Any         5000       400
         IPv6      Any         Any        8080
Merged   IPv4      UDP         5000       400
         IPv6      TCP         5000       8080
         Any       TCP         5000       400

all possible combinations of the ACEs from both devices, the result is three merged protocols. This result is presented in the third row of Table 1.
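The merge of Table 1 can be sketched in a few lines of Python. This is a minimal sketch, not the MUD-Visualizer implementation: it assumes each ACE is a flat list [network, transport, source port, destination port] and treats "any" as a wildcard that intersects with every value, which is the behavior the worked example implies. The names merge_field, merge_aces, and merge_devices are hypothetical.

```python
from itertools import product

ANY = "any"  # wildcard value, as used in Table 1


def merge_field(src, dst):
    """Intersect one field of two ACEs; return None if the values conflict."""
    if src == ANY:
        return dst
    if dst == ANY or src == dst:
        return src
    return None


def merge_aces(src_ace, dst_ace):
    """Merge two ACEs field by field; None means there is no common protocol stack."""
    merged = [merge_field(s, d) for s, d in zip(src_ace, dst_ace)]
    return None if None in merged else merged


def merge_devices(src_aces, dst_aces):
    """Check every source/destination ACE pair and keep the non-empty merges."""
    results = []
    for src_ace, dst_ace in product(src_aces, dst_aces):
        merged = merge_aces(src_ace, dst_ace)
        if merged is not None and merged not in results:
            results.append(merged)
    return results


# ACEs from Table 1: [network, transport, source port, destination port]
dev1 = [["IPv4", "UDP", ANY, ANY], [ANY, "TCP", 5000, ANY]]
dev2 = [[ANY, ANY, 5000, 400], ["IPv6", ANY, ANY, 8080]]
merged = merge_devices(dev1, dev2)
for ace in merged:
    print(ace)
```

Under these assumptions the four ACE pairs yield the three merged entries of Table 1 (possibly in a different order); the pair [IPv4, UDP, any, any] and [IPv6, any, any, 8080] is dropped because the network protocols conflict.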

5.2 ACE Tree

When merging the protocols of two MUD-Files that contain many ACEs, redundant protocols are a likely result. For example, suppose two ACEs are merged to [IPv4, UDP, 400, 600] and another two ACEs are merged to [IPv4, UDP, any, any]. The result of the second merge is a superset of the first, and therefore the first one should be pruned to prevent redundancy and further confusion. To implement this efficiently for each IoT device, a tree structure is created and associated with each communication destination of that device. Each level of this tree contains information about a layer of the ACE protocol stack. Note that for each layer in the TCP/IP model, one could add more than one level in the tree, as we show in the next example. Moreover, at each level, we add a wild-card node in case the communication is allowed through multiple protocols in that particular layer. The recursive implementation of this procedure is presented in Algorithm 2. As an example for Algorithm 2, we present an ACE Tree built from the set of ACEs in the Original row of Table 2. In this example, the protocol stack has just two layers: the network layer and the transport layer. However, more than one level in the tree is associated with the transport layer, i.e., transmission protocols (TCP/UDP) and ports. In simple terms, for each ACE, the algorithm starts from the lowest layer (in this case the transport layer), gets the protocols associated with that layer, and, if there is more than one, adds a wild card to that level of the tree. The ACE Tree containing the information of the ACEs in the Original row of Table 2 is presented in Fig. 2.
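As an illustration of the tree described above, the following Python sketch stores the ACEs of the Original row of Table 2 as nested dictionaries (network, then transport, then a list of port-pair leaves). It is a simplified, iterative stand-in for the recursive Algorithm 2, not a transcription of it: the wild-card node is assumed to be the literal key "any", and the names build_ace_tree and leaves_of are hypothetical.

```python
ANY = "any"  # wild-card node label


def build_ace_tree(aces):
    """Insert each ACE as a root-to-leaf path: network -> transport -> (src, dst)."""
    tree = {}
    for network, transport, src_port, dst_port in aces:
        leaves = tree.setdefault(network, {}).setdefault(transport, [])
        leaf = (src_port, dst_port)
        if leaf not in leaves:
            leaves.append(leaf)
    return tree


def leaves_of(tree):
    """Root-to-leaf traversal, yielding one flat ACE per leaf."""
    for network, transports in tree.items():
        for transport, leaves in transports.items():
            for src_port, dst_port in leaves:
                yield [network, transport, src_port, dst_port]


# ACEs from the Original row of Table 2
original = [
    ["IPv4", "TCP", 80, 43],
    ["IPv4", "TCP", ANY, ANY],
    ["IPv4", "UDP", 800, 520],
    ["IPv4", ANY, 800, 520],
    ["IPv6", "UDP", 90, 120],
    [ANY, "TCP", 400, 480],
    [ANY, "UDP", 90, 120],
    [ANY, ANY, 400, 480],
]
tree = build_ace_tree(original)
for ace in leaves_of(tree):
    print(ace)
```

Because dictionaries preserve insertion order, the traversal returns the eight ACEs in the same left-to-right order as the leaves of Fig. 2.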


Algorithm 2 Building the ACE tree
    initialize node as a tree node
    for each ACE in list of ACEs do
        initialize node to root
        UpdateACETree(ACE, node, n)
    end for
    procedure UPDATEACETREE(ACE, node, n)
        if node is null then
            return
        end if
        protocols ← GetLayerProtocols(ACE, n)
        if Count(protocols) > 1 then
            add a wild-card child W_n to node
            node ← W_n
        else
            add a child C_n to node
            node ← C_n
        end if
        ACE ← ACE[1:]
        UpdateACETree(ACE, node, n-1)
    end procedure
    procedure GETLAYERPROTOCOLS(ACE, n)
        initialize protocols as an array
        for each protocol P in ACE do
            if P in layer then
                protocols ← protocols + P
            end if
        end for
        return protocols
    end procedure

Table 2 Result of the root-to-leaf tree traversal from the trees shown in Fig. 2 (Original) and Fig. 3 (Pruned)

           Network   Transport   Src port   Dst port
Original   IPv4      TCP         80         43
           IPv4      TCP         Any        Any
           IPv4      UDP         800        520
           IPv4      Any         800        520
           IPv6      UDP         90         120
           Any       TCP         400        480
           Any       UDP         90         120
           Any       Any         400        480
Pruned     IPv4      TCP         Any        Any
           IPv4      Any         800        520
           Any       UDP         90         120
           Any       Any         400        480


Algorithm 3 ACE pruning
    initialize L, S, C, A_L, A_C to null
    procedure PRUNEACETREE(PT)
        for each leaf L in the Protocol Tree PT do
            for each L's sibling S in the Protocol Tree PT do
                if L ⊆ S then
                    Prune(L)
                    continue to the next leaf L
                end if
            end for
            for each L's cousin C in the Protocol Tree PT do
                if L ⊆ C then
                    n ← 1
                    A_L ← nthAncestor(L, n)
                    A_C ← nthAncestor(C, n)
                    while A_L is not null and A_L ≠ A_C do
                        if A_L ⊂ A_C then
                            Prune(L)
                            continue to the next leaf L
                        end if
                        n ← n + 1
                        A_L ← nthAncestor(L, n)
                        A_C ← nthAncestor(C, n)
                    end while
                end if
            end for
        end for
    end procedure
    procedure NTHANCESTOR(Node, n)
        initialize A_Node to Node
        c ← 0
        while c < n do
            A_Node ← Parent(A_Node)
            c ← c + 1
        end while
        return A_Node
    end procedure

5.2.1 Pruning ACE Tree

In this section, we describe how the ACE Tree is pruned. An example is provided for each of the scenarios, and all the examples are based on the ACLs provided in the Original row of Table 2, which are depicted as an ACE Tree in Fig. 2. Consider the following notations:
– L: one of the leaves of the ACE Tree
– S_i: the ith sibling of the leaf L
– C_i: the ith cousin of the leaf L
– A_n(L): the nth ancestor of the leaf L
– A_n(C_i): the nth ancestor of the cousin node C_i


Destination
    IPv4
        TCP: [80, 43], [any, any]
        UDP: [800, 520]
        any: [800, 520]
    IPv6
        UDP: [90, 120]
    any
        TCP: [400, 480]
        UDP: [90, 120]
        any: [400, 480]

Fig. 2 The tree structure containing the protocol information for a particular destination. The first children are the network protocols, the second children are the transport protocols, and the leaves are pairs of source and destination ports. The nodes colored dark gray could be removed, as a superset of them already exists in the tree

Each leaf node L in the ACE Tree can be pruned if it satisfies one of the following conditions:
– L has a sibling S_i and L ⊂ S_i. In this case, L can be pruned with no further conditions. Example: Consider the leaf [80, 43] and its sibling [any, any]. Since [80, 43] ⊂ [any, any], the leaf [80, 43] can be pruned.
– L has a cousin C_i where L ⊆ C_i, and we traverse the tree starting from both L and C_i simultaneously up toward the root of the tree. L can then be pruned if at any point during the traversal A_n(L) becomes a sibling of A_n(C_i) and A_n(L) ⊂ A_n(C_i). Note that the upward tree traversal stops without any outcome either when we reach the root or when A_n(L) = A_n(C_i). Example: Consider the fifth leaf of the tree (counting from left to right), [90, 120], and its sixth cousin C_6 = [90, 120], which is the seventh leaf of the tree. Here [90, 120] ⊆ [90, 120], so the first condition holds, i.e., L ⊆ C_i. As we traverse the tree toward the root by visiting the ancestors of each of the two target nodes, we see that A_2(L), i.e., the IPv6 node, is a sibling of A_2(C_i), i.e., the any node, and IPv6 ⊂ any, so A_2(L) ⊂ A_2(C_i). Hence, the fifth leaf [90, 120] can be pruned. Another example of this case is the third leaf [800, 520] and its third cousin C_3, i.e., the fourth leaf, [800, 520]. Since [800, 520] ⊆ [800, 520], their first ancestors, i.e., parents, are siblings, and A_1(L) ⊂ A_1(C_3), the third leaf [800, 520] can be pruned.
We illustrate the pruned version of the ACE Tree of Fig. 2 in Fig. 3.
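The effect of the two pruning conditions can be checked with a flat subsumption test. The sketch below is not the tree-based Algorithm 3: it assumes that an ACE is redundant whenever some other ACE matches it field by field, with "any" acting as a wildcard, and the names covers and prune_aces are hypothetical. On the ACEs of the Original row of Table 2 it arrives at the Pruned row.

```python
ANY = "any"  # wildcard value


def covers(general, specific):
    """True if every field of `specific` is matched by `general` (equal or wildcard)."""
    return all(g == ANY or g == s for g, s in zip(general, specific))


def prune_aces(aces):
    """Drop each ACE covered by a different ACE (or by an identical earlier one)."""
    kept = []
    for i, ace in enumerate(aces):
        redundant = any(
            covers(other, ace) and (other != ace or j < i)
            for j, other in enumerate(aces)
            if j != i
        )
        if not redundant:
            kept.append(ace)
    return kept


# ACEs from the Original row of Table 2
original = [
    ["IPv4", "TCP", 80, 43],
    ["IPv4", "TCP", ANY, ANY],
    ["IPv4", "UDP", 800, 520],
    ["IPv4", ANY, 800, 520],
    ["IPv6", "UDP", 90, 120],
    [ANY, "TCP", 400, 480],
    [ANY, "UDP", 90, 120],
    [ANY, ANY, 400, 480],
]
pruned = prune_aces(original)
for ace in pruned:
    print(ace)
```

This pairwise check is quadratic in the number of ACEs; the tree traversal described in the text restricts the comparison to siblings and cousins instead.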


Destination
    IPv4
        TCP: [any, any]
        any: [800, 520]
    any
        UDP: [90, 120]
        any: [400, 480]

Fig. 3 Result of the protocol pruning

6 Implementation

We implemented MUD-Visualizer in JavaScript for two main reasons: the prevalence and capabilities of visualization libraries in JavaScript, and the possibility of creating both a stand-alone application and a web application with minimal changes to the codebase. The UML diagram of the MUD-Visualizer is presented in Fig. 4. As shown, the D3 library (footnote 3) is a vital component for visualizing the MUD-Files in MUD-Visualizer. The main function of MUD-Visualizer is to provide the appropriate data to the D3 library. There are four main internal components of the MUD-Visualizer: the MUD-File Processor, the visualization data generator, the rendering engine, and the stand-alone extension. The MUD-File Processor initially parses the MUD-Files and extracts the data needed for the identification of possible flows. This information includes the MUD-URL, manufacturer, incoming and outgoing ACEs, existence of my-controller or controller nodes, and associated data defined for the seven optional fields in Sect. 2.2. Using this data, the connection between the instances of the MUD-Files is analyzed. This includes whether or not two nodes should connect and, if so, which protocols are assumed to be allowed between them. Note that the rules for each extension in Sect. 2.2 might be different and are kept separately for further analysis in the MUD-File Processor component. In the case of a my-controller node, the required promises [11] are also saved so that they can be fulfilled by the user later on. Finally, the protocols are stored and merged in a way that minimizes redundant information, as previously described in the subsection on ACE Merging. Moreover, further information is requested from the user if needed, e.g., the configuration of my-controller nodes. This step of MUD-Visualizer requires that the developers explicitly recognize their assumptions about the user or contextual knowledge required for configuration.
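To make the parsing step concrete, the following Python sketch extracts the fields listed above from a small MUD-File. It is illustrative only: the actual tool is written in JavaScript, the sample document is hypothetical and only loosely follows the JSON layout of RFC 8520, and the names summarize_mud and has_controller are assumptions introduced here.

```python
import json

# Hypothetical MUD-File, loosely following the JSON layout of RFC 8520.
SAMPLE_MUD = json.loads("""
{
  "ietf-mud:mud": {
    "mud-version": 1,
    "mud-url": "https://example.com/lightbulb.json",
    "mfg-name": "Example Corp",
    "from-device-policy": {
      "access-lists": {"access-list": [{"name": "from-ipv4-lightbulb"}]}
    },
    "to-device-policy": {
      "access-lists": {"access-list": [{"name": "to-ipv4-lightbulb"}]}
    }
  },
  "ietf-access-control-list:acls": {
    "acl": [
      {"name": "from-ipv4-lightbulb", "type": "ipv4-acl-type",
       "aces": {"ace": [{
         "name": "cl0-frdev",
         "matches": {
           "ipv4": {"ietf-acldns:dst-dnsname": "cloud.example.com", "protocol": 6},
           "tcp": {"destination-port": {"operator": "eq", "port": 443}}},
         "actions": {"forwarding": "accept"}}]}},
      {"name": "to-ipv4-lightbulb", "type": "ipv4-acl-type",
       "aces": {"ace": [{
         "name": "cl0-todev",
         "matches": {
           "ipv4": {"ietf-acldns:src-dnsname": "cloud.example.com", "protocol": 6},
           "tcp": {"source-port": {"operator": "eq", "port": 443}}},
         "actions": {"forwarding": "accept"}}]}}
    ]
  }
}
""")


def has_controller(ace):
    """True if the ACE matches on a controller or my-controller abstraction."""
    mud_match = ace.get("matches", {}).get("ietf-mud:mud", {})
    return "controller" in mud_match or "my-controller" in mud_match


def summarize_mud(doc):
    """Extract MUD-URL, manufacturer, and the ACEs of each traffic direction."""
    mud = doc["ietf-mud:mud"]
    acls = {acl["name"]: acl for acl in doc["ietf-access-control-list:acls"]["acl"]}

    def aces_for(policy_key):
        refs = mud.get(policy_key, {}).get("access-lists", {}).get("access-list", [])
        return [ace for ref in refs for ace in acls[ref["name"]]["aces"]["ace"]]

    outgoing = aces_for("from-device-policy")
    incoming = aces_for("to-device-policy")
    return {
        "mud_url": mud["mud-url"],
        "manufacturer": mud.get("mfg-name"),
        "outgoing": outgoing,
        "incoming": incoming,
        "needs_controller": any(has_controller(a) for a in outgoing + incoming),
    }


summary = summarize_mud(SAMPLE_MUD)
print(summary["mud_url"], summary["manufacturer"])
print([ace["name"] for ace in summary["outgoing"]])
```

A summary of this kind could then be handed to the merging step; a my-controller match would be the point at which input from the user is required.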
The visualization data generator converts the data extracted from MUD-Files into structures for the following visualization. These structures include nodes, links, and direction of the links as well as the corresponding protocols and rules that are assigned to the MUD-Files. For instance, a simple MUD-File describing an IoT device with a domain-name abstraction that is only allowed to communicate

3 https://d3js.org/.


Fig. 4 UML diagram of the MUD-Visualizer. The dotted line used for the Electron UI is merely for the two UIs, i.e., Web and Electron, to be distinguishable. Labels in the diagram: MUD-Visualizer, Visualization Data Generator, Drawable-Nodes, Drawable-Links, Node, Link, MUD-file Processor, MUD-Network, MUD-file Parser, Abstractions, ACE-Promise, Protocol-Set, Protocol, Rendering Engine, Standalone Extension, Electron App UI, Web App UI, D3

with a remote server would generate several nodes and links including the IoT device and the remote server. In contrast, a device that should only communicate internally might seek to connect to a device that is constrained to only connect to a manufacturer. The concepts previously described in the Methods section, including building the ACE Tree in Sect. 5.2 and ACE Tree pruning in Sect. 5.2.1, are both implemented in the Abstractions module of this component. The rendering engine is the component that combines all data generated by the other components and creates the final visualization. This component has several responsibilities, including interfacing with the MUD-File Processor, the visualization libraries (i.e., D3), and the web app UI. If the application is running as a stand-alone app, it also communicates with the stand-alone extension. This extension is not used for the MUD-Visualizer web app; it enables a stand-alone version of the MUD-Visualizer. It consists of the components above, calls to those components, and the main script for the Electron UI application.
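The node and link structures mentioned above can be pictured as follows. This Python sketch is a guess at a plausible shape, not the tool's actual data model: the field names id, kind, source, target, and protocols are hypothetical, chosen to resemble the node/link lists that graph layouts commonly consume.

```python
def build_graph(devices):
    """Turn {device: [(destination, protocols), ...]} into node and link lists."""
    nodes = {}
    # Register every MUD-described device first so it keeps the "device" kind.
    for device in devices:
        nodes[device] = {"id": device, "kind": "device"}
    links = []
    for device, flows in devices.items():
        for destination, protocols in flows:
            # Destinations that are not known devices become remote nodes.
            nodes.setdefault(destination, {"id": destination, "kind": "remote"})
            links.append(
                {"source": device, "target": destination, "protocols": protocols}
            )
    return {"nodes": list(nodes.values()), "links": links}


# Hypothetical input: one bulb allowed to reach one remote server over TCP/443
graph = build_graph(
    {"lightbulb": [("cloud.example.com", ["IPv4", "TCP", "any", 443])]}
)
print(graph["nodes"])
print(graph["links"])
```

Each link is directed from source to target and carries the merged protocol information, which matches the description of nodes, links, link direction, and associated protocols in the text.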

7 Results

The UI of the MUD-Visualizer is shown in Fig. 5. This screenshot is consistent for both the stand-alone version and web app version of the tool. Given that the Electron framework also supports the DevTools, we used the Chrome Performance Analysis tool available as part of the DevTools for benchmarking the web application version of the MUD-Visualizer. We considered the


Fig. 5 Screenshot of the UI of the MUD-Visualizer

Fig. 6 Performance evaluation of MUD-Visualizer for first-time loading as well as importing 1 to 512 MUD-Files. Left: individual and total runtime for Painting, Rendering, and Scripting (x-axis: Number of Imported MUD-Files; y-axes: Log10(time) [ms] and Time [s]). Right: peak JavaScript heap memory usage (x-axis: Number of Imported MUD-Files; y-axes: Log10(Memory) [MB] and Memory [MB]). Both charts are presented in logarithmic scale with the primary axis indicating the logarithmic values and the secondary axis indicating actual values in seconds and megabytes, respectively. The maximum runtime is for loading 512 MUD-Files, which is equal to 526 seconds, and the corresponding memory usage is 1754 MB

time spent for Scripting, Rendering, and Painting as well as peak JavaScript heap memory usage. Our experiments were run on a late-2013 MacBook Pro with a 2.6 GHz quad-core Intel Core i7, 16 GB of 1600 MHz DDR3 RAM, and a 2880x1800 pixel display. The benchmark results are presented in Fig. 6. We evaluated the performance of MUD-Visualizer when loaded for the first time, indicated as Loading in the figure, as well as when importing MUD-Files. The number of MUD-Files used in the benchmark ranged from 1 to 512. To evaluate the scalability of MUD-Visualizer, we used copies of a relatively heavy MUD-File created by mudmaker, which includes five out of seven implemented abstractions, i.e., all abstractions except my-controller and model. Recall that my-controller requires the end user to manually enter data about their selected point of control. Had we included my-controller, the results would have been dominated by user response time. Examining user interaction is a component of our future work. Also, the model abstraction would cause all copies of the sample MUD-File to communicate with each other in a local network. Therefore, we decided not to include it, making the benchmark more rigorous by letting the MUD-Visualizer process all other abstractions.


Note that in a real-world setting, although enterprises might have thousands of IoT devices in their networks, the number of device types in those networks is significantly lower. For instance, a hospital that has 2k MUD-compliant smart bulbs in its network will have bulbs of only a few brands and types. In this case, a hospital system administrator who uses MUD-Visualizer will not need to load 2k MUD-Files but rather a handful of them. In other words, our benchmark shows the scalability of MUD-Visualizer with regard to the number of MUD-File types, not the number of MUD-compliant devices in the network. The threat model and risk posture of the enterprise will determine whether new devices should be added manually, whether MUD-Visualizer should interact with automated detection and identification, or whether some combination of these strategies should be used. The runtime benchmark data in Fig. 6 indicates that the total time gets very close to the scripting time as the number of MUD-Files increases. This is because, for each newly imported MUD-File, its relation and interaction with the other MUD-Files with respect to all MUD abstractions must be analyzed and processed. The maximum loading time is for 512 MUD-Files, which is slightly longer than 10 minutes, and the memory that the application needs is slightly higher than 1754 MB. Note that this benchmark is deliberately demanding: even enterprise networks rarely have 512 types of MUD-Files, each as heavy as the one we used in this benchmark. Moreover, verification of MUD-Files using the MUD-Visualizer is not performed at short intervals and is done only when, for instance, a new device is introduced.
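The two observations above (few device types, and analysis cost that grows with the interactions between imported MUD-Files) can be illustrated with a short Python sketch. The inventory is hypothetical, and the pairwise count is an assumption about how the interaction analysis scales, offered only as a rough intuition for why scripting time dominates; it is not a measurement.

```python
from math import comb


def distinct_mud_urls(inventory):
    """Collapse a device inventory to the distinct MUD-URLs it references."""
    return sorted({device["mud_url"] for device in inventory})


# Hypothetical inventory: 2000 smart bulbs spread over three models
inventory = [
    {"id": i, "mud_url": f"https://example.com/bulb-model-{i % 3}.json"}
    for i in range(2000)
]
urls = distinct_mud_urls(inventory)
print(len(inventory), "devices share", len(urls), "MUD-Files")

# Assumed cost model: every pair of imported MUD-Files is checked once.
pair_counts = {n: comb(n, 2) for n in (1, 8, 64, 512)}
for n, pairs in pair_counts.items():
    print(n, "MUD-Files ->", pairs, "pairs to analyze")
```

Under this assumption, 512 distinct MUD-Files imply 130816 pairs, whereas the three MUD-Files of the hypothetical hospital imply only three.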

8 Conclusions and Future Work

In this work, we described a tool for visualizing the interaction of MUD-Files which also explicitly identifies any information required by the use of the controller options. The challenge in visualizing MUD-Files is the way they interact with each other and how the ACLs of the devices affect the communication between them. We presented methods for addressing this challenge, implemented MUD-Visualizer, and made it open-source and publicly available on GitHub. The main purpose of MUD-Visualizer is to facilitate the review and validation phase of MUD deployment for developers, engineers, and system administrators. We also performed a benchmark for runtime and memory consumption of the tool and showed that it can be used on a personal computer to load hundreds of MUD-Files for evaluation. Our future work includes several phases. The first phase is a user study in which we examine the practicality of the MUD-Visualizer and the extent to which it can help the target audience. Second, there are a few points in the tool that we are particularly interested in improving, including visualizing the controllers’ MUD-Files and support for including network configuration in the visualization, e.g., the IP addresses of IoT devices and the corresponding controllers, DHCP configuration, etc. Third, we want to test the functionality that facilitates the local modification of the


MUD-Files without creating the opportunity for an attacker to move around these. Essentially, our goal is to allow decreased but not increased connectivity. Finally, we want to implement the abstraction analysis of MUD-Visualizer in parallel to improve the performance and scalability even more.

Availability MUD-Visualizer is made publicly available on GitHub at https://github.com/iotonboarding/mud-visualizer and can be used both as a stand-alone tool and as a tool integrated into web apps.

Acknowledgments This research was supported in part by the National Science Foundation awards CNS 1565375 and CNS 1814518, as well as the grant #H8230-19-1-0310, Cisco Research Support, Google Research, and the Comcast Innovation Fund. Any opinions, findings, and conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, Cisco, Comcast, Google, or Indiana University.

References

1. State of the IoT 2018: Number of IoT devices now at 7B – Market accelerating. [Online]. Available on: https://iot-analytics.com/state-of-the-iot-update-q1-q2-2018-number-of-iot-devicesnow-7b (2018)
2. Ring security camera hacks see homeowners subjected to racial abuse, ransom demands. [Online]. Available on: https://abcnews.go.com/US/ring-security-camera-hacks-homeownerssubjected-racial-abuse/story?id=67679790 (2019)
3. Afek, Y., Bremler-Barr, A., Hay, D., Goldschmidt, R., Shafir, L., Abraham, G., Shalev, A.: NFV-based IoT Security for Home Networks using MUD (2019). Preprint arXiv:1911.00253
4. Andalibi, V., Kim, D., Camp, L.J.: Throwing MUD into the FOG: Defending IoT and Fog by expanding MUD to Fog network. In: 2nd USENIX Workshop on Hot Topics in Edge Computing (HotEdge 19) (2019)
5. Beckett, R., Gupta, A., Mahajan, R., Walker, D.: A general approach to network configuration verification. In: Proceedings of the Conference of the ACM Special Interest Group on Data Communication, pp. 155–168 (2017)
6. Dodson, D., Polk, W., Souppaya, M., Barker, W., Lear, E., Weis, B., Fashina, Y., Grayeli, P., Klosterman, J., Mulugeta, B., et al.: Securing Small Business and Home Internet of Things (IoT) Devices: Mitigating Network-Based Attacks Using Manufacturer Usage Description (MUD). Technical Report, National Institute of Standards and Technology (2019)
7. D’Orazio, C.J., Choo, K.K.R., Yang, L.T.: Data exfiltration from Internet of Things devices: iOS devices as case studies. IEEE Int. Things J. 4(2), 524–535 (2016)
8. Fayaz, S.K., Sharma, T., Fogel, A., Mahajan, R., Millstein, T., Sekar, V., Varghese, G.: Efficient network reachability analysis using a succinct control plane representation. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pp. 217–232 (2016)
9. Feraudo, A., Yadav, P., Mortier, R., Bellavista, P., Crowcroft, J.: SoK: Beyond IoT MUD deployments–challenges and future directions (2020). Preprint arXiv:2004.08003


10. Fogel, A., Fung, S., Pedrosa, L., Walraed-Sullivan, M., Govindan, R., Mahajan, R., Millstein, T.: A general approach to network configuration analysis. In: 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15), pp. 469–483 (2015)
11. Friedman, D.P., Wise, D.S.: The impact of applicative programming on multiprocessing. Indiana University, Computer Science Department (1976)
12. García, S.N.M., Molina Zarca, A., Hernández-Ramos, J.L., Bernabé, J.B., Gómez, A.S.: Enforcing Behavioral Profiles through Software-Defined Networks in the Industrial Internet of Things. Appl. Sci. 9(21), 4576 (2019)
13. Gember-Jacobson, A., Viswanathan, R., Akella, A., Mahajan, R.: Fast control plane analysis using an abstract representation. In: Proceedings of the 2016 ACM SIGCOMM Conference, pp. 300–313 (2016)
14. Hamza, A., Gharakheili, H.H., Benson, T.A., Sivaraman, V.: Detecting volumetric attacks on IoT devices via SDN-based monitoring of MUD activity. In: Proceedings of the 2019 ACM Symposium on SDN Research, pp. 36–48 (2019)
15. Hamza, A., Gharakheili, H.H., Sivaraman, V.: Combining MUD policies with SDN for IoT intrusion detection. In: Proceedings of the 2018 Workshop on IoT Security and Privacy, pp. 1–7 (2018)
16. Hamza, A., Ranathunga, D., Gharakheili, H.H., Benson, T.A., Roughan, M., Sivaraman, V.: Verifying and monitoring IoTs network behavior using MUD profiles (2019). Preprint arXiv:1902.02484
17. Hamza, A., Ranathunga, D., Gharakheili, H.H., Roughan, M., Sivaraman, V.: Clear as MUD: generating, validating and applying IoT behavioral profiles. In: Proceedings of the 2018 Workshop on IoT Security and Privacy, pp. 8–14. ACM, New York (2018)
18. Kolias, C., Kambourakis, G., Stavrou, A., Voas, J.: DDoS in the IoT: Mirai and other botnets. Computer 50(7), 80–84 (2017)
19. Kolomeets, M., Chechulin, A., Kotenko, I., Saenko, I.: Access control visualization using triangular matrices. In: 2019 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 348–355 (2019). https://doi.org/10.1109/EMPDP.2019.8671578
20. Landwehr, C.E., Bull, A.R., McDermott, J.P., Choi, W.S.: A taxonomy of computer program security flaws. ACM Comput. Surv. 26(3), 211–254 (1994)
21. Lear, E.: MUD Pretty Printer. [Online]. Available on: https://github.com/iot-onboarding/mudpp (2020)
22. Lear, E., Droms, R., Romascanu, D.: Manufacturer Usage Description Specification. RFC 8520 (Mar 2019). https://doi.org/10.17487/RFC8520, https://rfc-editor.org/rfc/rfc8520.txt
23. Lear, E., Steck, C.S., Weis, B.: Secure modification of manufacturer usage description files based on device applications (Oct 17 2019), US Patent App. 15/954,875
24. Liginlal, D., Sim, I., Khansa, L.: How significant is human error as a cause of privacy breaches? An empirical study and a framework for error management. Comp. Secur. 28(3–4), 215–228 (2009)
25. Matheu, S.N., Robles Enciso, A., Molina Zarca, A., Garcia-Carrillo, D., Hernández-Ramos, J.L., Bernal Bernabe, J., Skarmeta, A.F.: Security Architecture for Defining and Enforcing Security Profiles in DLT/SDN-Based IoT Systems. Sensors 20(7), 1882 (2020)
26. Matthíasson, G., Giaretta, A., Dragoni, N.: IoT device profiling: From mud files to s×c contracts. Open Identity Summit 2020 (2020)
27. Maxion, R.A., Reeder, R.W.: Improving user-interface dependability through mitigation of human error. Int. J. Human-Comput. Stud. 63(1–2), 25–50 (2005)
28. Mazhar, N., Salleh, R., Zeeshan, M., Hameed, M.M.: Role of device identification and manufacturer usage description in IoT security: a survey. IEEE Access 9, 41757–41786 (2021)
29. Polk, W., Souppaya, M., Haag, W., Barker, W.: [Project Description] Mitigating IoT-based Distributed Denial of Service (DDOS). Technical Report, National Institute of Standards and Technology (2017)


30. Prabhu, S., Chou, K.Y., Kheradmand, A., Godfrey, B., Caesar, M.: Plankton: Scalable network configuration verification through model checking. In: 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20), pp. 953–967 (2020)
31. Pratt, C.: micronets Manufacturer Usage Description (MUD) tools. [Online]. Available on: https://github.com/cablelabs/micronets-mud-tools (2019)
32. Ranganathan, M.: Openflow SDN Manufacturer Usage Description (MUD) Server implementation on OpenDaylight Nitrogen Release. [Online]. Available on: https://github.com/usnistgov/nist-mud (2018)
33. Ranganathan, M., Montgomery, D., El Mimouni, O.: Soft MUD: Implementing Manufacturer Usage Descriptions on OpenFlow SDN Switches. In: ICN 2019, The Eighteenth International Conference on Networks. ThinkMind, Valencia (2019). https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=927289
34. Reeder, R.W., Bauer, L., Cranor, L.F., Reiter, M.K., Bacon, K., How, K., Strong, H.: Expandable grids for visualizing and authoring computer security policies. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1473–1482 (2008)
35. Reeder, R.W., Maxion, R.A.: User interface dependability through goal-error prevention. In: 2005 International Conference on Dependable Systems and Networks (DSN’05), pp. 60–69. IEEE, Piscataway (2005)
36. Roesner, F., Kohno, T., Moshchuk, A., Parno, B., Wang, H.J., Cowan, C.: User-driven access control: Rethinking permission granting in modern operating systems. In: 2012 IEEE Symposium on Security and Privacy, pp. 224–238. IEEE, Piscataway (2012)
37. Salim, F., Reid, J., Dawson, E., Dulleck, U.: An approach to access control under uncertainty. In: 2011 Sixth International Conference on Availability, Reliability and Security, pp. 1–8. IEEE, Piscataway (2011)
38. Schutijser, C.: Towards automated DDoS abuse protection using MUD device profiles. Master’s Thesis, University of Twente (2018)
39. Smetters, D.K., Good, N.: How users use access control. In: Proceedings of the 5th Symposium on Usable Privacy and Security, pp. 1–12 (2009)
40. Tahaei, M., Vaniea, K.: “developers are responsible”: What ad networks tell developers about privacy. In: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems Extended Abstracts (CHI’21 Extended Abstracts), pp. 1–12 (2021)
41. Vaniea, K., Karat, C.M., Gross, J.B., Karat, J., Brodie, C.: Evaluating assistance of natural language policy authoring. In: Proceedings of the 4th Symposium on Usable Privacy and Security, pp. 65–73 (2008)
42. Wang, M.: Accessible Access Control: a Visualization System for Access Control Policy Management. Michigan Technological University (2019)
43. Watrobski, P.: A tool for characterizing the network behavior of IoT devices. [Online]. Available on: https://github.com/usnistgov/MUD-PD (2019)
44. Weis, B.: MUD-Manager Version 3.0. [Online]. Available on: https://github.com/CiscoDevNet/MUD-Manager (2018)
45. Xu, T., Naing, H.M., Lu, L., Zhou, Y.: How do system administrators resolve access-denied issues in the real world? In: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, pp. 348–361 (2017)
46. Yaqoob, I., Ahmed, E., ur Rehman, M.H., Ahmed, A.I.A., Al-garadi, M.A., Imran, M., Guizani, M.: The rise of ransomware and emerging security challenges in the Internet of Things. Comput. Netw. 129, 444–458 (2017)
47. Yeich, K.: osMUD - Open Source MUD Manager. [Online]. Available on: https://github.com/osmud/osmud (2019)

Natural Scenes’ Text Detection and Recognition Using CNN and Pytesseract

Ashee Mahajan, Anand Nayyar, Rachna Jain, and Preeti Nagrath

1 Introduction

Over the past few decades, computer vision has gained tremendous popularity. A great deal of development is under way to make it as capable as human vision [1]. With increasing interest, research in this field is constantly growing. One of the most researched topics in computer vision is image-based sequence recognition, which includes the challenging task of natural scene text recognition. Reading and analyzing textual information in natural scene images requires the text to be detected first, which is not very difficult for typed documents. Text detection in natural scenes, however, is considerably harder, for the reasons explained below:
• Deformities: In the real world, text is printed on various objects which may or may not be planar. For example, text printed on a curved surface (say a juice bottle) can be easily read and interpreted by humans, but it is difficult for a machine to read this sort of text. A Character-Aware Neural Network (Char-Net) for recognizing distorted scene text is presented in [2] by Wei Liu.

A. Mahajan · P. Nagrath
Bharati Vidyapeeth College of Engineering, New Delhi, India
A. Nayyar
School of Computer Science, Duy Tan University, Da Nang, Vietnam
R. Jain
Bhagwan Parshuram Institute of Technology, New Delhi, India
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
A. Nayyar et al. (eds.), The Fifth International Conference on Safety and Security with IoT, EAI/Springer Innovations in Communication and Computing, https://doi.org/10.1007/978-3-030-94285-4_10


A. Mahajan et al.

• Blurred images: Pictures taken with smartphones that lack stabilization are often blurred, which makes the task challenging.
• Resolution: The image resolution also varies widely from one camera to another.
• Image noise: Traditional scanners produce much less noise than digital cameras. Also, cheap cameras have raw sensors which tend to interpolate pixels in order to obtain real colors in an image.
• Orientation of the text: The text in the captured images has arbitrary orientation because pictures are taken from different angles.
• Unknown layout: The layout of text in each image cannot be defined in advance. Text may be present at any position in natural scene images [3].
• Lighting conditions and reflective surfaces: The environmental conditions and the lighting in natural scene images are uncontrollable. The image may be partially dark or saturated due to sunlight, and the non-paper surfaces containing text may be reflective.

The most crucial step for text detection is the design of features that help the model differentiate between text and its background. Conventionally, these features were designed manually [4–6]. In deep learning [7, 8], however, the features are selected by the neural network itself based on what it learns from the training data. Text recognition using recurrent neural networks (RNN) needs preprocessing that converts image data to a sequence of image features [9]. These proposed works based on neural networks can perform well but contain several complex stages, including various preprocessing and post-processing steps. The objective of this methodology is to perform text recognition for natural scene images efficiently with fewer preprocessing and post-processing stages.
• The proposed method makes use of a fast detector and recognizes text accurately with a simple architecture.
• Intermediate steps such as candidate aggregation and word segmentation are eliminated.
• Image data is fed to a fully convolutional network, the output of which is given to the Non-Maximum Suppression (NMS) unit for accurate detection.
• The region of interest (ROI) of the image from the detector is passed to Pytesseract for text recognition.

Related work in this field is discussed in Sect. 2. Section 3 then describes the methodology used in the detection stage, the EAST (Efficient and Accurate Scene Text) detection model [10], and in the recognition stage, Tesseract. The remaining sections discuss the implementation details along with the results and evaluations. Finally, further research directions are discussed, followed by the conclusion.

Natural Scenes’ Text Detection and Recognition Using CNN and Pytesseract


2 Related Work

Many approaches have been proposed for scene text detection and recognition. Conventional approaches to text recognition rely on manually designed features. The work of Epshtein et al. [4] uses the Stroke Width Transform (SWT). Other methods, such as those based on Maximally Stable Extremal Regions (MSER) [6, 11], have also been used. These methods detect character candidates using edge detection. The system in [11], which uses the FAST keypoint detector for extraction, is a fast text detection system. Zhang et al. [12] used the local symmetry property of text to design features that help detect text regions. However, all these methods proved less effective than deep neural networks, in which the network itself learns the features during training.

Many algorithms based on deep neural networks [8, 13–16] have been proposed for text recognition in recent years. Huang et al. [14] determined character candidates using MSER and then applied deep neural networks for classification and pruning of false positives. Jaderberg et al. [8] produced a dense heatmap at each image scale by scanning the image in a sliding-window fashion with convolutional networks. Tian et al. [17] detected horizontal text lines using a joint CNN-RNN model. Deselaers et al. [18] proposed an Optical Character Recognition (OCR) system which detects handwritten characters and converts them into digital text. Desai [19] used an artificial neural network (ANN) to recognize handwritten Gujarati digits. Liao et al. [20] proposed a method to directly predict the text in a two-dimensional space. Luo et al. [21] proposed a multi-object rectification network and an attention-based sequence recognition network for general scene text recognition. Zhang et al. [16] proposed using a Fully Convolutional Network (FCN) to obtain a heatmap [22] and estimating orientation with the help of component projection.

All these methods perform well but involve multiple preprocessing and post-processing stages, e.g., candidate aggregation, word partitioning, and post-filtering to eliminate false positives. These stages increase the processing time of the complete pipeline and the complexity of the architecture, and they require careful, time-consuming tuning. This paper proposes a simple and fast methodology with good efficiency, which uses a two-stage pipeline for text detection and feeds the ROIs to Tesseract for recognition.

3 Methodology

The recognition process starts by feeding the input image to the pipeline that detects the ROI. This pipeline has a deep convolutional network which selects the image features and gives a dense per-pixel prediction for a given image. Then non-maximum suppression (NMS) is applied as a post-processing step. The output of the EAST detector is an ROI, which is fed to the recognition unit. A basic block diagram of the stages involved is shown in Fig. 1.

A. Mahajan et al.

Fig. 1 Block diagram of different stages involved in scene text recognition

3.1 The Detection Stage

The EAST detection model is used to detect the text present in an image. It is a simple and fast text detector which detects text accurately. A comparative analysis of three text detection algorithms, EAST, SWT, and Tesseract, is given by Olsson and Eriksson [23].

Pipeline and network design

The general view of the pipeline is shown in Fig. 2. The model uses a Fully Convolutional Network (FCN) that predicts multiple output channels, including a score map with pixel values in the range [0, 1]. The score indicates the confidence of the geometry predicted at that location. The use of an FCN eliminates various intermediate stages, and post-processing only includes thresholding and NMS on the predicted shapes.

When detecting text with neural networks, a few factors must be taken into account. The length of the text is not fixed: predicting the geometry of long text requires features from the late stages of the network, whereas predicting the geometry enclosing small words requires features from the early stages. To fulfil these requirements, the network must be able to use features from different stages according to the length of the word. HyperNet [24] can accomplish this, but it has a limitation: merging a large number of channels on a feature map gives rise to computation overhead. To avoid this, the U-shape idea [25] is adopted, in which the feature maps are merged gradually.


Fig. 2 Structure of text detection FCN [10]

Four levels of feature maps (f_i) are extracted from the stem. These are passed to the feature-merging branch shown in Fig. 2, where they are gradually merged according to the equations:

\[
m_i =
\begin{cases}
\mathrm{unpool}(h_i), & i \le 3 \\
\mathrm{conv}_{3\times 3}(h_i), & i = 4
\end{cases}
\tag{1}
\]

\[
h_i =
\begin{cases}
f_i, & i = 1 \\
\mathrm{conv}_{3\times 3}\left(\mathrm{conv}_{1\times 1}\left([m_{i-1}; f_i]\right)\right), & \text{otherwise}
\end{cases}
\tag{2}
\]

Here m_i is the merge base, h_i is the merged feature map, and [· ; ·] denotes concatenation along the channel axis. The output layer contains a one-channel score map F_s and a multi-channel geometry map. The geometry output can be either RBOX (rotated box) or QUAD (quadrangle).


Fig. 3 The left image depicts all the bounding boxes that were detected by the model and the image on the right shows the result after non-maximum suppression was done. (Image from ICDAR-2015 dataset)

NMS is the post-processing step applied after thresholding the geometries output by the neural network: all geometries that enclose the same ROI are merged, which ensures that a particular piece of text is detected only once. Figure 3 shows what happens when an image undergoes non-max suppression. The algorithm for non-max suppression is as follows:

    Bnms ← ∅
    for bi ∈ B do
        discard ← False
        for bj ∈ B do
            if same(bi, bj) > λnms then
                if score(c, bj) > score(c, bi) then
                    discard ← True
        if not discard then
            Bnms ← Bnms ∪ bi
    return Bnms

Explanation of Algorithm

1. Bnms is the list of proposals after filtering. It is initially empty.
2. B is the list of output proposals from the detector, each with a confidence score S. Set the overlap threshold N.
3. The proposal with the highest S value (bi) is popped from B and stored in Bnms.
4. The Intersection over Union (IOU) between the element just added to Bnms and each remaining element of B is calculated and compared with N.
5. An element of B whose IOU with bi is less than N is kept in B as a candidate for the next iteration; otherwise it is discarded.
6. Steps 3, 4, and 5 are repeated until B is empty.
7. The resulting Bnms is the final output of the NMS stage.
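The seven steps above can be sketched in a few lines of Python. This is only an illustrative sketch under simplifying assumptions: boxes are axis-aligned (x1, y1, x2, y2) tuples rather than the rotated boxes or quadrangles that EAST outputs, and the names iou, greedy_nms, and overlap_threshold, as well as the sample boxes and scores, are made up for the example.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), axis-aligned (assumption)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float],
               overlap_threshold: float) -> List[int]:
    """Return the indices of the proposals that survive suppression."""
    # B: proposal indices ordered by decreasing confidence score S.
    remaining = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []  # B_nms, initially empty
    while remaining:  # repeat until B is empty
        best = remaining.pop(0)  # pop the highest-scoring proposal
        kept.append(best)
        # Discard every remaining proposal that overlaps `best` too much.
        remaining = [i for i in remaining
                     if iou(boxes[best], boxes[i]) <= overlap_threshold]
    return kept


# Hypothetical proposals: the first two cover the same word.
boxes = [(10, 10, 110, 60), (12, 12, 112, 62), (200, 40, 300, 90)]
scores = [0.90, 0.75, 0.80]
kept = greedy_nms(boxes, scores, overlap_threshold=0.5)
```

With these sample values the second box is suppressed by the first, so only the first and third proposals remain.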

3.2 Recognition Stage

To recognize the text extracted from natural scene images, Tesseract, an open-source optical character recognition (OCR) engine, is used. The ROI generated by the detector stage is fed to Tesseract for text recognition. Tesseract also has a built-in text detector, but it is less effective, so detection is performed separately with EAST. Recognition is a two-pass process. The first pass tries to recognize each word of the text. Each satisfactorily recognized word is passed to an adaptive classifier as training data. Because the classifier keeps learning as it recognizes text, the words recognized in the early stages of training may not be accurate. The text is therefore fed to a second pass in order to recognize those words accurately. The architecture and the recognition stages of Tesseract are described in [26].

4 Experiments

4.1 Datasets

We use the pretrained EAST text detection model to detect the ROIs in the image. It has been trained and tested on four benchmark datasets: COCO-Text, MSRA-TD500, Street View Text (SVT), and ICDAR-2015. The model has been reported to outperform the previous state-of-the-art results, and it runs at 13.2 fps at a resolution of 720p.

The COCO-Text dataset was used in the training of the EAST model [27]. It is the largest dataset that was used, with about 63,000 images, and is based on the MSCOCO dataset. About 44,000 images are for training and the remaining roughly 20,000 images are for testing.

The MSRA-TD500 dataset was also used for training the EAST model [28]. This dataset fulfils the need to train the model on several languages rather than limiting it to English. It introduces the Chinese language to the model, hence broadening its scope. The dataset comprises only 500 images in total.

The ICDAR 2015 dataset [29] is taken from the fourth challenge of the widely known Robust Reading Competition. A total of 1500 images are provided, of which 1000 are for training and the rest are for testing. Images in the dataset suffer from various problems, including motion blur and low resolution. Ground truth is provided as a .txt file for each image, which lists the coordinates of each bounding box and the word contained in it.

The Street View Text (SVT) dataset [30] has high variability in scenes and contains low-resolution images, making it a good fit for testing the model. This dataset has word-level annotations, provided in an XML format similar to the one used in the ICDAR-2003 competition dataset.
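As an illustration of how such per-image ground-truth files might be loaded, the following sketch parses lines of the form x1,y1,x2,y2,x3,y3,x4,y4,word. This line layout is an assumption about the ICDAR-2015 annotation files, and parse_ground_truth, GroundTruthWord, and the sample lines are hypothetical names and data introduced only for this example.

```python
from typing import List, NamedTuple, Tuple


class GroundTruthWord(NamedTuple):
    corners: List[Tuple[int, int]]  # four (x, y) corner points of the box
    text: str                       # the word enclosed by the box


def parse_ground_truth(lines: List[str]) -> List[GroundTruthWord]:
    """Parse annotation lines, skipping entries with missing values."""
    words = []
    for raw in lines:
        line = raw.lstrip("\ufeff").strip()  # drop a byte-order mark if present
        if not line:
            continue
        parts = line.split(",", 8)  # the word itself may contain commas
        if len(parts) < 9 or any(p.strip() == "" for p in parts[:8]):
            continue  # incomplete bounding box: ignore the entry
        coords = [int(p) for p in parts[:8]]
        corners = list(zip(coords[0::2], coords[1::2]))
        words.append(GroundTruthWord(corners, parts[8]))
    return words


# Hypothetical annotation lines in the assumed layout.
sample = [
    "\ufeff377,117,463,117,465,130,378,130,Genaxis Theatre",
    "493,115,519,115,519,131,493,131,[06]",
    ",,,,,,,,",
]
words = parse_ground_truth(sample)
```

Splitting at most eight times keeps a transcription that itself contains commas in one piece.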

4.2 Implementation Details

Before testing on the datasets, preprocessing is done. The ground truth tables provided were cleaned by removing the entries of bounding boxes that had null values. The output of our model was saved in the same format as the ground truth tables, after which the F1 score was determined using a confusion matrix.

The model works as follows. It first takes the image to be tested as input. Several parameters (minimum confidence, padding, height, and width) can be provided in addition to the image to improve the output. The pretrained EAST text detection model is loaded and all the calculations for resizing the image are done. Next, the geometries of all candidate ROIs are passed through non-max suppression. This step eliminates the unnecessary ROIs, and the remaining ones are passed to the Tesseract library function, where the text is recognized. The three important flags supplied are -l for the language, --oem for the OCR Engine Mode, and --psm for the Page Segmentation Mode. English is taken as the standard language and the deep learning Long Short-Term Memory (LSTM) engine only is selected for oem. The two psm values that worked best were the one that assumes a single uniform block of text and the one that treats the image as a single text line. Finally, the ROIs are drawn on the image as red rectangles together with the output text, and the result is shown.
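The flag combination described above could be assembled as in the sketch below. The helper name tesseract_config is hypothetical; the numeric values are taken from Tesseract's command-line help, where OEM 1 selects the LSTM engine only, PSM 6 assumes a single uniform block of text, and PSM 7 treats the image as a single text line. Whether the authors used exactly these values is not stated, so the snippet should be read as an assumption.

```python
def tesseract_config(lang: str = "eng", oem: int = 1, psm: int = 7) -> str:
    """Build the option string passed to Tesseract for one ROI."""
    return f"-l {lang} --oem {oem} --psm {psm}"


# The two page segmentation modes mentioned in the text.
block_config = tesseract_config(psm=6)  # single uniform block of text
line_config = tesseract_config(psm=7)   # single text line

# With pytesseract and the Tesseract binary installed, a cropped ROI
# (a PIL image or NumPy array named `roi`, hypothetical here) could then
# be recognized with:
#   import pytesseract
#   text = pytesseract.image_to_string(roi, config=line_config)
```

Keeping the option string in one helper makes it easy to try both page segmentation modes on the same ROI and compare the recognized text.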

4.3 Results and Evaluations

Figure 4 shows some images for which the model produced correct output, and Fig. 5 shows images for which the output was partially or wholly wrong. The images are taken from both the SVT dataset and the ICDAR-2015 dataset. For the images of the two datasets, a dataframe is formed that contains the bounding boxes detected in each image and the word enclosed in each box. All the images are processed in a loop and their results are stored in this dataframe. After all images in both datasets have been processed, the results are compared with the ground truth information already provided for these images.


Fig. 4 Correctly identified text from images of ICDAR-2015 dataset

Fig. 5 Images with wrong output from SVT dataset

Then, using these values, precision, recall, and F-score are calculated. Precision (p) and recall (r) depend on the numbers of true positives (TP), false positives (FP), and false negatives (FN). The F1 score is calculated as the harmonic mean of p and r:

\[ p = \frac{TP}{TP + FP} \tag{3} \]

\[ r = \frac{TP}{TP + FN} \tag{4} \]

\[ F_1 = \frac{2 \times p \times r}{p + r} \tag{5} \]

Table 1 shows the performance of our model in comparison with other similar models that have been evaluated on the SVT dataset and the ICDAR-2015 dataset.

Table 1 Performances of other similar works done on scene text recognition

Approach                      SVT     ICDAR-2015
Jaderberg et al. 2015 [31]    71.7    –
Jaderberg et al. 2016 [15]    80.7    –
Shi et al. 2016 [32]          80.8    –
Shi et al. 2016 [33]          81.9    –
Cheng et al. 2018 [34]        82.8    68.2
Liu et al. 2016 [35]          83.6    –
Liu et al. 2018 [2]           84.4    60
Cheng et al. 2017 [36]        85.9    70.6
Shi et al. 2018 [37]          89.5    76.1
Zhan et al. 2019 [3]          90.2    76.9
Baek et al. 2019 [38]         –       71.8
OURS                          90.5    70.17
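Equations (3) to (5) translate directly into code. The sketch below is illustrative only: the function name precision_recall_f1 and the counts in the usage line are hypothetical and are not results from the paper.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    p = tp / (tp + fp) if tp + fp else 0.0      # Eq. (3)
    r = tp / (tp + fn) if tp + fn else 0.0      # Eq. (4)
    f1 = 2 * p * r / (p + r) if p + r else 0.0  # Eq. (5), harmonic mean
    return p, r, f1


# Hypothetical counts: 80 words matched, 20 spurious, 10 missed.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=10)
```

The guards against zero denominators are a design choice for the degenerate case in which nothing is detected or nothing is annotated.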

5 Conclusion and Future Work

A simple, fast, and accurate methodology is proposed for text recognition, which uses an EAST model based on an FCN followed by NMS. The ROI is then fed to an OCR engine, Tesseract, which recognizes the text extracted from natural scene images. The model is tested on the SVT dataset, with an F-score of 90.5, and on ICDAR-2015, with an F-score of 70.17. This architecture can detect and recognize text in natural scene images accurately, which is a challenging task. However, the proposed work is limited to still images and is not compatible with video text recognition for real-time applications. The future scope of this work includes:


• To create a language detector which can be further extended to translate the recognized words into the desired language. It will help a person read sign boards written in a foreign language.
• To extend this work to object detection.
• To build smart glasses for visually impaired users, making them more self-dependent.
• To integrate the model with a smart voice assistant which may read the text and translate it from one language to another.

References

1. AlSaid, H., AlKhatib, L., AlOraidh, A., AlHaidar, S., Bashar, A.: Deep learning assisted smart glasses as educational aid for visually challenged students. In: 2019 2nd International Conference on New Trends in Computing Sciences (ICTCS), pp. 1–6. IEEE, New York (2019)
2. Liu, W., Chen, C., Wong, K.Y.K.: Char-Net: a character-aware neural network for distorted scene text recognition. AAAI. 1(2), 4 (2018)
3. Zhan, F., Lu, S.: Esir: end-to-end scene text recognition via iterative image rectification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2059–2068 (2019)
4. Epshtein, B., Ofek, E., Wexler, Y.: Detecting text in natural scenes with stroke width transform. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2963–2970. IEEE, New York (2010)
5. Huang, W., Lin, Z., Yang, J., Wang, J.: Text localization in natural images using stroke feature transform and text covariance descriptors. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1241–1248 (2013)
6. Neumann, L., Matas, J.: A method for text localization and recognition in real-world images. In: Asian Conference on Computer Vision, pp. 770–783. Springer, Berlin (2010)
7. Coates, A., Carpenter, B., Case, C., Satheesh, S., Suresh, B., Wang, T., et al.: Text detection and character recognition in scene images with unsupervised feature learning. In: 2011 International Conference on Document Analysis and Recognition, pp. 440–445. IEEE, New York (2011)
8. Jaderberg, M., Vedaldi, A., Zisserman, A.: Deep features for text spotting. In: European Conference on Computer Vision, pp. 512–528. Springer, Cham (2014)
9. Su, B., Lu, S.: Accurate scene text recognition based on recurrent neural network. In: Asian Conference on Computer Vision, pp. 35–48. Springer, Cham (2014)
10. Zhou, X., Yao, C., Wen, H., Wang, Y., Zhou, S., He, W., Liang, J.: East: an efficient and accurate scene text detector. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5551–5560 (2017)
11. Neumann, L., Matas, J.: Real-time scene text localization and recognition. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 3538–3545. IEEE, New York (2012)
12. Zhang, Z., Shen, W., Yao, C., Bai, X.: Symmetry-based text line detection in natural scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2558–2567 (2015)
13. Gupta, A., Vedaldi, A., Zisserman, A.: Synthetic data for text localisation in natural images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2315–2324 (2016)
14. Huang, W., Qiao, Y., Tang, X.: Robust scene text detection with convolution neural network induced mser trees. In: European Conference on Computer Vision, pp. 497–511. Springer, Cham (2014)
15. Jaderberg, M., Simonyan, K., Vedaldi, A., Zisserman, A.: Reading text in the wild with convolutional neural networks. Int. J. Comput. Vis. 116(1), 1–20 (2016)
16. Zhang, Z., Zhang, C., Shen, W., Yao, C., Liu, W., Bai, X.: Multi-oriented text detection with fully convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4159–4167 (2016)
17. Tian, Z., Huang, W., He, T., He, P., Qiao, Y.: Detecting text in natural image with connectionist text proposal network. In: European Conference on Computer Vision, pp. 56–72. Springer, Cham (2016)
18. Deselaers, T., Gass, T., Heigold, G., Ney, H.: Latent log-linear models for handwritten digit classification. IEEE Trans. Pattern Anal. Mach. Intell. 34(6), 1105–1117 (2011)
19. Desai, A.A.: Gujarati handwritten numeral optical character reorganization through neural network. Pattern Recogn. 43(7), 2582–2589 (2010)
20. Liao, M., Zhang, J., Wan, Z., Xie, F., Liang, J., Lyu, P., et al.: Scene text recognition from two-dimensional perspective. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 8714–8721 (2019)
21. Luo, C., Jin, L., Sun, Z.: Moran: a multi-object rectified attention network for scene text recognition. Pattern Recogn. 90, 109–118 (2019)
22. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)
23. Olsson, O., Eriksson, M.: Automated system tests with image recognition: focused on text detection and recognition (2019)
24. Kong, T., Yao, A., Chen, Y., Sun, F.: Hypernet: towards accurate region proposal generation and joint object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 845–853 (2016)
25. Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer, Cham (2015)
26. Bisiach, J., Zabkar, M.: Evaluating methods for optical character recognition on a mobile platform: comparing standard computer vision techniques with deep learning in the context of scanning prescription medicine labels, Thesis (2020)
27. Veit, A., Matera, T., Neumann, L., Matas, J., Belongie, S.: Coco-text: dataset and benchmark for text detection and recognition in natural images. arXiv preprint arXiv:1601.07140 (2016)
28. Yao, C., Bai, X., Liu, W., Ma, Y., Tu, Z.: Detecting texts of arbitrary orientations in natural images. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1083–1090. IEEE, New York (2012)
29. Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., et al.: ICDAR 2015 competition on robust reading. In: 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 1156–1160. IEEE, New York (2015)
30. Wang, K., Belongie, S.: Word spotting in the wild. In: European Conference on Computer Vision, pp. 591–604. Springer, Berlin (2010)
31. Jaderberg, M., Simonyan, K., Vedaldi, A., Zisserman, A.: Deep structured output learning for unconstrained text recognition. arXiv preprint arXiv:1412.5903 (2014)
32. Shi, B., Bai, X., Yao, C.: An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans. Pattern Anal. Mach. Intell. 39(11), 2298–2304 (2016)
33. Shi, B., Wang, X., Lyu, P., Yao, C., Bai, X.: Robust scene text recognition with automatic rectification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4168–4176 (2016)
34. Cheng, Z., Xu, Y., Bai, F., Niu, Y., Pu, S., Zhou, S.: Aon: towards arbitrarily oriented text recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5571–5579 (2018)
35. Liu, W., Chen, C., Wong, K.Y.K., Su, Z., Han, J.: STAR-Net: a SpaTial attention residue network for scene text recognition. BMVC. 2, 7 (2016)
36. Cheng, Z., Bai, F., Xu, Y., Zheng, G., Pu, S., Zhou, S.: Focusing attention: towards accurate text recognition in natural images. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 5076–5084 (2017)
37. Shi, B., Yang, M., Wang, X., Lyu, P., Yao, C., Bai, X.: Aster: an attentional scene text recognizer with flexible rectification. IEEE Trans. Pattern Anal. Mach. Intell. 41(9), 2035–2048 (2018)
38. Baek, J., Kim, G., Lee, J., Park, S., Han, D., Yun, S., et al.: What is wrong with scene text recognition model comparisons? Dataset and model analysis. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4715–4723 (2019)

Assessing the Resistance of Internet of Things Applications Against Memory Corruption Attacks: A Case Study for Contiki and Tizen

Mohammad Basiri and Maryam Mouzarani

1 Introduction

The Internet of Things (IoT) is one of the important IT technologies that has attracted considerable attention. In this technology, everything is connected to the Internet to exchange information and operate more optimally. Applications of IoT in everyday life include smart homes, smart cities, and smart factories. One of the most important concerns in IoT is ensuring its security, which is a major challenge in many ways. For example, the security mechanisms typically used in ordinary digital devices cannot be applied to IoT devices because of their limited hardware resources. Also, IoT devices usually have considerable time constraints and have to operate in real time, and implementing common security mechanisms can slow the systems down and increase their response time.

Most studies on the security of IoT focus on the security of its networks, such as [1–3]. Some studies revolve around the security of operating systems and the detection of vulnerabilities in the source code of IoT operating systems, such as [4, 5], while the security of the applications running on these operating systems has received less attention. Given the embedded structure of IoT operating systems and the low-level access requirements of IoT applications, these applications are usually developed in C and C++. Because of the characteristics of these languages, memory corruption vulnerabilities may arise if the programmer pays little attention to security. In this paper, we study how memory corruption vulnerabilities appear in applications of two popular IoT operating systems, namely Contiki and Tizen, and whether it is possible to exploit them in these systems.

M. Basiri · M. Mouzarani () Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan, Iran e-mail: [email protected]; [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Nayyar et al. (eds.), The Fifth International Conference on Safety and Security with IoT, EAI/Springer Innovations in Communication and Computing, https://doi.org/10.1007/978-3-030-94285-4_11


Contiki is a lightweight open-source operating system, written in the C programming language, for the sensor nodes of wireless sensor networks. Numerous sensor systems such as Tmote Sky, TelosB, MCU AVR, and MCU MSP430 support Contiki. Contiki has a virtual machine called Instant Contiki, which provides an Ubuntu-based environment for Contiki programmers. This operating system includes useful tools, such as the Cooja simulator, which helps to simulate the nodes of wireless sensor networks. In this study, we have used Contiki version 2.7.

Tizen, powered by Samsung and Intel [6], is an operating system that supports various platforms such as mobile phones, smartwatches, and smart TVs, and is potentially capable of supporting other platforms [7]. In this operating system, developers create applications in one of three possible categories: Native (with C and C++), Web (with HTML, JavaScript, and CSS), and Hybrid (a combination of Native and Web). An integrated development environment called Tizen Studio is also provided, which offers numerous facilities for developing Tizen applications, such as static analysers and a run-time debugger. In this study, we have used Tizen Studio and have developed native vulnerable applications for Tizen version 5.5.

We inject five classes of memory corruption vulnerabilities into sample programs and execute them in the Contiki and Tizen operating systems. These vulnerability classes are stack-based buffer overflow, heap-based buffer overflow, buffer overread, format string, and use after free. Then, we attempt to exploit the vulnerabilities and analyse the strength of the compiler or operating system in defending against these attacks. We show that it is possible to exploit all these vulnerability classes in both operating systems. These operating systems are therefore not immune to such vulnerabilities, and developers should be aware of secure coding practices while developing applications and firmware for them.

This paper consists of the following sections: in Sect. 2, we review the related works. The vulnerable applications and their exploitation in Contiki and Tizen are presented in Sects. 3 and 4, respectively. Section 5 concludes the paper and Sect. 6 suggests some future works.

2 Related Works

A limited number of studies have been conducted on the security of applications in the Internet of Things. In [8], the authors analyse some reported control-flow hijacking vulnerabilities in different IoT firmware, according to various metrics such as OS architecture, CVE number, vulnerability risk, and impact. With this analysis, they alert the research community to the importance of developing robust defences for protecting IoT software. They argue that many vulnerabilities occur in IoT software because of IoT developers' lack of software security knowledge, the diverse hardware architecture of devices, and the limitation of resources. Moreover, the protection mechanisms that exist in IoT operating systems are usually too weak to resist related attacks. However, they do not analyse the strength of the security mechanisms of a particular operating system in depth.

The authors of [9] examine the resistance of the Free RTOS operating system against buffer overflow attacks. They demonstrate that the memory protection mechanisms in this system are not activated by default. The developers of Free RTOS suggest enabling these mechanisms only during the development and test phases of applications. Even when the memory protection mechanisms of this operating system are enabled, they have weaknesses that make them unable to protect the system properly. The authors managed to bypass the buffer overflow prevention mechanisms in Free RTOS and exploit a specific stack-based buffer overflow vulnerability. In this article, we perform a similar analysis of the memory protection mechanisms in the Tizen and Contiki operating systems.

In [4], the source code of three popular IoT operating systems, namely openWSN, Contiki, and TinyOS, is analysed statically. The researchers identified over 2800 insecure function calls in the C source code of these operating systems. Contiki ranked first with over 1800 insecure function calls, which showed that this system could be prone to several attacks. The security mechanisms of these operating systems for preventing memory corruption attacks are not studied in [4].

In [6], the structure of Tizen is described and two main security issues in version 2.2 of this operating system are mentioned. These issues arise in the OS memory protection mechanism and the stock Tizen browser. The authors demonstrate that the DEP [10] mechanism does not exist in Tizen and that the ASLR [10] mechanism is not enabled by default. We show that these mechanisms are implemented and enabled by default in Tizen 5.5, but that it is possible to bypass them.

In this study, we develop intentionally vulnerable applications for the Tizen and Contiki operating systems to analyse how the vulnerabilities arise in such applications and whether the operating systems prevent their exploitation. We focus on five classes of memory corruption vulnerabilities and manage to exploit all of them in both operating systems.

3 Security Assessment of Contiki OS

In order to execute vulnerable programs in Contiki, we ran Instant Contiki in VirtualBox. We injected vulnerabilities into the Hello World example code at ~/contiki-2.7/examples/hello-world. The following sections present the injected code for each of the vulnerability classes. Because of space limitations, the entire code is not presented and only the vulnerable parts are shown.

We compiled each vulnerable code for two platforms: Native and Tmote Sky. The Native platform translates the code into x86 assembly and is used to test the program in a terminal environment. Tmote Sky is one of the commonly used platforms for wireless sensor networks. The code compiled for this platform is simulated using the Cooja simulator. We used the "make" command with its default configuration to compile all the code and did not disable any security mechanism ourselves.

3.1 Stack-Based BOF

In Code 1, a function called overflow() is defined that copies its input argument into a 4-byte buffer using the strcpy() function. strcpy() does not consider the limits of the destination buffer and copies byte after byte, including the terminating null byte. Therefore, if overflow() is called with an input that does not fit into 4 bytes, a buffer overflow occurs.

Code 1 Stack-based buffer overflow vulnerability in a Contiki application

    void overflow(char *inbuf)
    {
      char buf[4];
      strcpy(buf, inbuf);   /* no bounds check on the 4-byte buffer */
    }

    PROCESS_THREAD(hello_world_process, ev, data)
    {
      overflow("hello");    /* 5 characters plus the null terminator */
    }

First, we compiled this code for the Native platform. We analysed the security mechanisms of the generated binary file using the GDB debugger and the open-source Peda tool with its checksec command, and found that the DEP, Canary [10], and Fortify [11] security mechanisms were activated. When executed, the program ended with a "buffer overflow detected" error. By analysing the assembly code of the program, we realized that the strcpy() function had been replaced with the __strcpy_chk() function, as shown in Fig. 1. This replacement is performed by the Fortify mechanism to check the buffer limits and detect overflows. There was also a call to __stack_chk_fail() at the end of each function in the assembly code, as shown in Fig. 1. This function is invoked when a modification of the stack canary is detected. We then replaced the strcpy() function in the vulnerable program with an equivalent loop. This time the program execution ended with a "stack smashing detected" error, due to modification of the canary. After examining the addresses of the libraries, we also found that ASLR was activated on this platform, as shown in Fig. 2.

Fig. 1 The assembly source of the compiled program in the Contiki Native platform

Fig. 2 Analysis of ASLR in Native platform, parts (a) and (b) show the addresses of libraries are different in two executions

Next, we compiled the program for the Tmote Sky platform and examined the security status of the generated code once again using the checksec command. This time, it showed that only the DEP mechanism was activated. Afterward, we executed the program using the Cooja simulator. The execution stopped because of an invalid instruction error, without detecting the overflow. Therefore, we attempted to perform a buffer overflow attack. To demonstrate the attack, we changed the input string of the overflow() function to "AAAABBBB". We debugged the program in the simulator using the Msp CLI and Msp Code Watcher tools. By putting a breakpoint at the beginning of the overflow() function and using the stack and mem commands, the stack frame was displayed. The return address was in the first two bytes of the stack. After stepping to the end of the function, we re-examined the stack. As Fig. 3 shows, the address of the next instruction to be executed had changed to 0x4242 (the equivalent of "BB"). In other words, buffer buf was filled with "AAAA" and the return address was overwritten with "BB".

Fig. 3 Performing stack-based BOF attack on the Tmote Sky platform

In the next step, we found the address of the puts() function in the Libc library using GDB and changed the input string as follows: four "A"s followed by the address of puts() in reverse byte order (because the Sky architecture is little-endian). After running the program with the new input, we successfully hijacked the control flow and executed the puts() function, as shown in Fig. 4. Therefore, it is concluded that Sky has no security mechanism for buffer overflow detection and prevention. Even if we overwrite the return address with an address within the stack, the processor tries to execute the content at that address, as shown in Fig. 5. Therefore, the DEP mechanism is not effective either, although checksec reports it as activated, and this platform does not contribute to the prevention of buffer overflow attacks.

Fig. 4 Result of Return-to-Libc attack in the Tmote Sky. The right part of figure shows that the control flow is hijacked into the start of puts function


Fig. 5 Bypassing DEP mechanism in Tmote Sky platform. In the left part of figure, current stack pointer value is retrieved by the “stack” command. Then the program execution is hijacked to the same address in the stack (38c2)
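The byte order mentioned above can be illustrated with a small sketch. It assumes a 16-bit little-endian target, as on the Tmote Sky; the helper names put_le16 and get_le16 are hypothetical. It shows why two “B” bytes in the input are read back as the address 0x4242, and why an address has to be supplied low byte first.

```c
#include <stdint.h>

/* Stores a 16-bit value in little-endian order: low byte first. */
static void put_le16(uint8_t out[2], uint16_t value)
{
    out[0] = (uint8_t)(value & 0xFF);  /* low byte at the lower address */
    out[1] = (uint8_t)(value >> 8);    /* high byte at the higher address */
}

/* Reads two bytes the way a little-endian CPU interprets them. */
static uint16_t get_le16(const uint8_t in[2])
{
    return (uint16_t)(in[0] | ((uint16_t)in[1] << 8));
}
```

For example, the hypothetical address 0x1234 would be stored as the bytes 0x34 0x12, and the two ASCII bytes of “BB” (0x42 0x42) are read back as 0x4242.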

3.2 Heap-Based BOF

In Code 2, a linked list is created with two nodes: head and tail. The data structure of each node consists of a 10-byte data buffer and a 4-byte pointer to the next node. The data in node tail is set to “BBB”, and the data buffer of node head is overflowed by copying 12 “A”s (“AAAAAAAAAAAA”) into it. In the last line of the code, node head is used to access node tail and print its data. Overflowing the data buffer of node head is expected to overwrite its next pointer with some “A”s, so that the next node in the linked list becomes inaccessible.

Code 2 Heap-based buffer overflow vulnerability in a Contiki application

struct node {
  char data[10];
  struct node *next;
};

PROCESS_THREAD(hello_world_process, ev, data)
{
  struct node *head = (struct node *)malloc(sizeof(struct node)),
              *tail = (struct node *)malloc(sizeof(struct node));
  head->next = tail;
  tail->next = NULL;
  strcpy(head->data, "AAAAAAAAAAAA");
  strcpy(tail->data, "BBB");
  printf("%s\n", head->next->data);
}

We executed this code on the Native platform and received an error indicating that a buffer overflow was detected. However, when we executed the code on the Tmote Sky platform, the data at address 0x4141 (equivalent to “AA”) was printed. Therefore, this vulnerability is exploitable on this platform.
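Why the copied “A”s end up in the next pointer can be sketched with offsetof. The struct below follows the description of the node in the text (a 10-byte data buffer followed by a pointer); the helper names are hypothetical, and the exact offset depends on the pointer size and padding chosen by each compiler.

```c
#include <stddef.h>

struct node {
    char data[10];
    struct node *next;
};

/* Number of bytes that can be written from the start of data before the
 * first byte of the next pointer is reached (data plus any padding). */
static size_t bytes_before_next(void)
{
    return offsetof(struct node, next);
}

/* Returns 1 if copying a string of text_len characters plus its '\0'
 * into data would touch the next pointer, 0 otherwise. Any copy longer
 * than sizeof data is already an overflow, even if it only hits padding. */
static int copy_reaches_next(size_t text_len)
{
    return text_len + 1 > bytes_before_next();
}
```

On a target with 2-byte pointers and no padding, which is what we would assume for the MSP430-based Tmote Sky, the offset would be 10, so the 11th and 12th “A” would land in next. That would be consistent with the address 0x4141 observed above.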


3.3 Buffer Overread

In Code 3, two local buffers, str1 and str2, are first defined with different data values. Then, the first buffer is read in a loop with an incorrect iteration count. Therefore, the number of characters read from str1 is higher than the buffer limit.

Code 3 Buffer overread vulnerability in a Contiki application

PROCESS_THREAD(hello_world_process, ev, data)
{
  char str1[3] = "hi", str2[3] = "AA";
  int i;
  for (i = 0; i < 10; i++)
    putchar(str1[i]);
  putchar('\n');
}

When we executed the code on the Native platform, the buffer was overread and a string starting with “hiAA” was printed. This means that it is possible to read other local buffers through this vulnerability. When we executed the program on the Tmote Sky platform, the output string contained only “hi”, not “AA”. We then changed the order of the buffer definitions and defined str2 before str1 in the code. When we executed the revised code, the output string started with “hiAA”. We attribute this to a difference in stack layout between the two platforms: the local buffers are placed in opposite orders on the stack. In this example, the second buffer contained an arbitrary string, while in real-world programs the content of other local buffers may include confidential information such as a password or an encryption key. If the attacker has control over the number of bytes read from the buffer, this information would be disclosed by exploiting this vulnerability.
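A bounded variant of the read loop in Code 3 might look like the sketch below. The function read_bounded is a hypothetical helper, not part of the tested programs; it caps the number of bytes read at the size of the source buffer, whatever count the caller requests.

```c
#include <stddef.h>

/* Copies at most buf_size bytes from buf to out, even if the caller asks
 * for more. Returns the number of bytes copied. The caller must provide an
 * output buffer of at least buf_size bytes. */
static size_t read_bounded(const char *buf, size_t buf_size,
                           size_t requested, char *out)
{
    size_t n = requested < buf_size ? requested : buf_size;

    for (size_t i = 0; i < n; i++) {
        out[i] = buf[i];
    }
    return n;
}
```

With str1 declared as in Code 3, a request for 10 bytes would return only the 3 bytes of str1, so the neighbouring buffer would not be disclosed.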

3.4 Format String

In Code 4, the format string “%x” is stored in buffer str1 and is passed as the only argument to the printf() function. In a real application, this string could be received from user input.

Code 4 Format string vulnerability in a Contiki application

PROCESS_THREAD(hello_world_process, ev, data)
{
  char str1[20] = "%x\n";
  char str2[25] = "AAA";
  printf(str1);
}

After executing the code on the Native platform, the stack content was displayed as a hexadecimal number. We changed the content of str1 to seven “%x”s and extracted the value of the local buffer str2 as 414141 in hexadecimal. On the Tmote Sky platform, the output of this code was 4141, as its architecture is 16-bit. Thus, it is possible to exploit the format string vulnerability on both platforms.

3.5 Use After Free

In Code 5, a use after free occurs by allocating and then freeing the str1 buffer. After this, str2 is allocated and the string “hello” is copied into it. Then, the content of str1 is printed with the printf() function to simulate the mistaken use of a freed pointer.

Code 5 Use after free vulnerability in a Contiki application

PROCESS_THREAD(hello_world_process, ev, data)
{
  char *str1 = (char *)malloc(100);
  free(str1);
  char *str2 = (char *)malloc(100);
  strcpy(str2, "hello");
  printf("%s\n", str1);
}

After executing the code on both platforms, the string “hello” was displayed in the output. This is because, after str1 was freed, the same address was allocated to str2. Therefore, by printing the buffer at the address of str1, we printed the content of the str2 buffer. In real-world programs, if attackers have access to the content of the second allocated buffer, they may be able to perform malicious activities such as hijacking the control flow. This example shows that this vulnerability is exploitable on both platforms.
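One common defensive habit against the pattern in Code 5 is to reset a pointer as soon as it is freed. The sketch below uses a hypothetical helper, free_and_clear, which is not part of Contiki. It only protects the pointer variable that is passed in, not other copies of the same address.

```c
#include <stdlib.h>

/* Hypothetical helper: frees *p and resets it to NULL, so that a later
 * accidental use can be caught with a NULL check instead of silently
 * reading memory that may now belong to another allocation. */
static void free_and_clear(char **p)
{
    free(*p);   /* free(NULL) is a no-op, so a second call is harmless */
    *p = NULL;
}
```

After free_and_clear(&str1), code that still wants to print str1 can test it against NULL first instead of printing the content of str2.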

4 Security Assessment of Tizen OS

To study the five memory corruption vulnerability classes on the Tizen platform, we created five C++ programs and compiled them for Samsung smartwatches. These programs consist of a layout containing a label, a text field, and a button. We created the vulnerabilities in an event listener function of the button, called OnButtonPressed(). The following sections present the OnButtonPressed() function of the vulnerable programs. Due to the lack of space, the entire programs are not presented. We tested these vulnerable programs using the Tizen emulator provided by Tizen Studio. The programs were compiled by pressing the “Build” button in Tizen Studio, without disabling the default security mechanisms.


4.1 Stack-Based BOF

In Code 6, there is a function called overflow() with a 4-byte local buffer, to which an input string is copied and then displayed in the text label.

Code 6 Stack-based buffer overflow vulnerability in a Tizen application

void overflow(const char *inbuf)
{
  char buf[4];
  strcpy(buf, inbuf);
  textLabel.SetProperty(TextLabel::Property::TEXT, buf);
}

bool OnButtonPressed(Button button)
{
  std::string fieldTextString =
      textField.GetProperty(TextField::Property::TEXT).Get<std::string>();
  overflow(fieldTextString.c_str());
  return true;
}

We built the program with Tizen Studio and first checked the generated binary file with the GDB debugger and the Peda tool on a Linux machine. It is worth stating that, due to the difference between the platforms, it was not possible to execute this file using GDB, and we could only review its assembly code and security status. Using the checksec command in Peda, we found that the DEP mechanism was enabled and that the stack canary and Fortify mechanisms were disabled for the program. We also reviewed the assembly code of the program and made sure that there were no __strcpy_chk() or __stack_chk_fail() calls for preventing buffer overflow exploitation.

Then, we executed the program in the Tizen emulator and entered 24 “A”s in the input text field; it terminated with a Segmentation Fault error. By analysing the log file provided by Tizen Studio, we found that the return address value (eip register) had been changed to 0x41414141 (i.e. “AAAA”), as shown in Fig. 6. In addition, the memory map of the binary program, which was presented in the log file, showed that there was no execution permission for the stack, as shown in Fig. 7. This confirmed that the DEP mechanism is activated for this application. We also re-executed the program and checked the addresses of the libraries. As shown in Fig. 7, the addresses changed in each execution. Therefore, the ASLR mechanism is enabled by default in Tizen version 5.5. However, by executing the program 100 times, we found that only the second and third bytes of the four-byte addresses changed across executions. Even if we do not consider this pattern, as the addresses are only 32 bits long, it is possible to bypass the ASLR mechanism with a brute-force attack.

Fig. 6 Overflowing the stack buffer and overwriting the return address in a Tizen application

Fig. 7 Proof of ASLR and DEP in Tizen OS. Parts (a) and (b) show the addresses and permissions of memory sections in two different executions

We attempted to perform a Return-to-Libc attack combined with brute force to bypass the DEP and ASLR mechanisms and exploit the buffer overflow vulnerability. We located the position of the return address in the stack frame using the GDB debugger console in Tizen Studio. Moreover, using the print command in this debugger, we obtained the address of the puts() function in the libraries of the binary. For simplicity, we hardcoded the input attack string in the program. We wrote a bash script to execute this program in the Tizen emulator shell in an infinite loop. After several rounds of execution, we successfully hijacked the control flow and printed the content of the stack by calling the puts() function. Therefore, it is possible to exploit this vulnerability in Tizen 5.5, despite the presence of the aforementioned mechanisms.
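The cost of the brute-force step can be bounded with a rough estimate. It rests on an assumption that was not measured in this study: each execution independently picks one of N equally likely library base addresses, and the attack string always targets the same fixed address. If only the second and third bytes of the address vary, N is at most 2^16.

```latex
P(\text{success within } k \text{ runs}) = 1 - \left(1 - \frac{1}{N}\right)^{k},
\qquad
\mathbb{E}[\text{runs until first success}] = N,
\qquad
N \le 2^{16} = 65536 .
```

Since the attack succeeded after several rounds of execution, the effective number of distinct layouts in the emulator may be considerably smaller than this upper bound.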


4.2 Heap-Based BOF

In Code 7, a heap-based buffer overflow happens in a scenario similar to Code 2 in Sect. 3.2. The only difference here is that the data of node head is received from the input text field.

Code 7 Heap-based buffer overflow vulnerability in a Tizen application

bool OnButtonPressed(Button button)
{
  node *head = (node *)malloc(sizeof(node));
  node *tail = (node *)malloc(sizeof(node));
  head->next = tail;
  tail->next = NULL;
  strcpy(tail->data, "Last");
  std::string fieldTextString =
      textField.GetProperty(TextField::Property::TEXT).Get<std::string>();
  strcpy(head->data, fieldTextString.c_str());
  textLabel.SetProperty(TextLabel::Property::TEXT, head->next->data);
  free(head);
  free(tail);
  return true;
}

We executed this program with an input of 10 “A”s to fill the data buffer of node head, followed by 6 “B”s to overflow the data buffer and overwrite the next pointer of this node. The execution ended with a Segmentation Fault error, and using the Tizen debugger we found that the value of the next pointer in the head node had been changed to 0x42424242 (the equivalent of “BBBB”). Thus, this vulnerability can be exploited in this operating system to modify data or hijack the control flow.

4.3 Buffer Overread

In Code 8, two string buffers, named str1 and secretBuffer, are defined for storing a user input and a secret password, respectively. In the sscanf() function call, two input arguments are received from the user: an integer for the cnt variable and a string for the str1 buffer. The value of cnt determines the number of characters that are copied from str1 to the buffer destBuffer. The copy operation is performed in a for loop with cnt as the iteration limit. Since there is no bounds checking in the loop, a buffer overread happens if the value of cnt exceeds the size of str1. As a result, the loop copies the content of str1 into destBuffer and then continues to copy the content of secretBuffer into destBuffer. When the data in destBuffer is printed in a text label, the secret information is revealed.

Code 8 Buffer overread vulnerability in a Tizen application

bool OnButtonPressed(Button button)
{
  char str1[5], secretBuffer[10];
  int cnt;
  strcpy(secretBuffer, "Password");
  std::string destBuffer;
  std::string fieldTextString =
      textField.GetProperty(TextField::Property::TEXT).Get<std::string>();
  sscanf(fieldTextString.c_str(), "%d %s", &cnt, str1);
  for (int i = 0; i < cnt; i++)
    if (str1[i])
      destBuffer.push_back(str1[i]);
  textLabel.SetProperty(TextLabel::Property::TEXT, destBuffer);
  return true;
}

We executed the program with input data consisting of an arbitrary string and an integer number that was larger than the size of buffer str1. Therefore, we overread the buffer and received the “Password” string in the output.
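The input handling of Code 8 could be hardened along the lines of the following sketch, written in plain C for brevity. The helper parse_request and the constant STR1_SIZE are hypothetical. The sketch limits the text read by sscanf() with a field width and clamps the count to the number of characters actually stored.

```c
#include <stdio.h>
#include <string.h>

#define STR1_SIZE 5   /* same size as str1 in Code 8 */

/* Hypothetical helper: parses "<count> <text>" as Code 8 does, but limits
 * the text to the buffer size and clamps the count to the stored length.
 * Returns 0 on success and -1 if the input does not match the format. */
static int parse_request(const char *input, int *cnt, char str1[STR1_SIZE])
{
    int len;

    /* "%4s" stores at most 4 characters plus the terminating '\0'. */
    if (sscanf(input, "%d %4s", cnt, str1) != 2) {
        return -1;
    }
    len = (int)strlen(str1);
    if (*cnt < 0) {
        *cnt = 0;
    }
    if (*cnt > len) {
        *cnt = len;   /* never read past the characters in str1 */
    }
    return 0;
}
```

With the input "100 abcdefgh", this sketch would keep only "abcd" and reduce the count to 4, so the copy loop could not reach secretBuffer.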

4.4 Format String

Although the printf() function is not used in the GUI of Tizen applications, it is used for debugging purposes in the console output. Programmers can also use the sprintf() and fprintf() functions with a format string argument to write formatted strings into a buffer or a file. Therefore, there is still a risk of format string vulnerabilities in Tizen applications. In Code 9, this vulnerability occurs when the input of a text field is copied into a string buffer, called str, using sprintf(). The buffer is printed in the next line to demonstrate the result of the attack. There is also a local integer variable, named x, with the value 123456. The code is vulnerable because the user input itself is passed as the format string argument of sprintf().

Code 9 Format string vulnerability in a Tizen application

bool OnButtonPressed(Button button)
{
  std::string fieldTextString =
      textField.GetProperty(TextField::Property::TEXT).Get<std::string>();
  char str[100];
  int x = 123456;
  sprintf(str, fieldTextString.c_str());
  textLabel.SetProperty(TextLabel::Property::TEXT, str);
  return true;
}

We executed the program with the input “%d” and it printed four bytes from the top of the stack in the format of an integer. We used GDB and checked the stack frame to find the location of variable x. Based on the distance of x from the top of the stack, we executed the program with an input of 11 “%d”s and obtained the number 123456 in the output. With appropriate format string arguments, it is also possible to change the value of x on the stack. Therefore, this vulnerability may occur in Tizen applications and can be exploited.
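A common way to avoid the problem in Code 9 is to keep the format string constant and pass the user text as data. The sketch below is in plain C; copy_as_text is a hypothetical helper, while snprintf() is the standard C function.

```c
#include <stdio.h>

/* Hypothetical replacement for the sprintf() call in Code 9: the user text
 * is passed as data for a constant "%s" format, and snprintf() bounds the
 * write to out_size bytes. Returns the value of snprintf(), that is, the
 * length the complete text would need without the terminating '\0'. */
static int copy_as_text(char *out, size_t out_size, const char *user_text)
{
    return snprintf(out, out_size, "%s", user_text);
}
```

With this call, an input such as “%d%d%d” would be copied literally instead of being interpreted, and an input longer than the buffer would be truncated rather than overflowing it.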

4.5 Use After Free

In Code 10, two string pointers, named str1 and str2, are defined. First, a buffer is dynamically allocated to str1 and freed immediately. Then, another buffer is allocated to str2 and the user input is stored in it. There is a use after free vulnerability in this code, as the content of str1 is printed into a text label although the str1 buffer has already been deallocated. When we executed this program, we could view the content of the second buffer in the output.

Code 10 Use after free vulnerability in a Tizen application

bool OnButtonPressed(Button button)
{
  char *str1 = (char *)malloc(20);
  free(str1);
  char *str2 = (char *)malloc(20);
  std::string fieldTextString =
      textField.GetProperty(TextField::Property::TEXT).Get<std::string>();
  strcpy(str2, fieldTextString.c_str());
  textLabel.SetProperty(TextLabel::Property::TEXT, str1);
  return true;
}

5 Discussion

In the previous sections, we exploited a number of vulnerabilities in two well-known IoT operating systems. Table 1 presents a summary of the assessment carried out in this study. In this table, the first row shows the default security mechanisms provided on each platform. The green pluses in the table cells mean that we could not exploit the vulnerability on the specified platform, and the red minuses mean that the operating system failed to prevent exploitation of that vulnerability.

Table 1 A summary and conclusion of the vulnerability assessments in the two operating systems. The red minus sign means that we managed to exploit the vulnerability while the green plus means that the vulnerability was not exploitable

Vulnerability        Contiki Native               Contiki Tmote-Sky   Tizen
Security mechanisms  ASLR, DEP, Canary, Fortify   Nothing             ASLR, DEP
Stack-based BOF      +                            -                   -
Format string        -                            -                   -
Use after free       -                            -                   -
Buffer overread      -                            -                   -
Heap-based BOF       +                            -                   -

As Table 1 shows, we could not exploit buffer overflow vulnerabilities in applications on the Contiki Native platform. On this platform, the Fortify mechanism replaces insecure function calls with checked ones, and the compiler also adds canary checks at compile time. Even in programs where the buffer overflow occurs in other operations, such as a copy operation in a loop, we could not bypass the ASLR, DEP, and Stack Canary mechanisms and exploit these vulnerabilities.

In contrast to Contiki Native, there are no security mechanisms provided on the Contiki Tmote-Sky platform by default. This is because of the limited hardware resources of this platform and the high overhead of memory protection mechanisms. Therefore, we could exploit the buffer overflow vulnerability: we performed a Return-to-Libc attack to exploit the stack-based buffer overflow and hijack the control flow of a vulnerable program to an arbitrary function. For the other memory corruption vulnerabilities, we showed on both platforms that it is feasible to exploit them by overwriting or reading data in memory. It is also possible to perform more advanced exploits and hijack the control flow of the program.

Tizen runs on more powerful devices than Contiki Tmote-Sky, such as cellphones, smart watches, and smart TVs, and therefore many memory protection mechanisms are implemented on this platform. However, the implemented protection methods were not strong enough and were bypassed by some attacks. As an example, we performed a Return-to-Libc attack to bypass the DEP mechanism and combined it with a brute-force attack to also bypass ASLR, and successfully hijacked the control flow of the program to an arbitrary function.

It is worth mentioning that, for illustration purposes, we hardcoded the exploit string in some vulnerable programs. As the applications in these operating systems are written in C or C++, real-world applications might receive the exploit string via various input methods, such as the main arguments, input files, sockets, environment variables, and the GUI. As an example, we revised the stack-based BOF vulnerable code so that it received the input string from the main arguments, and exploited the vulnerability by executing the shell command in Code 11.

Code 11 Exploiting stack-based buffer overflow in a Tizen application by a shell command

i=0
while [ $i -le 100 ]; do
  echo $i
  ./basicuiwithdali $(echo -e 'ATTACKED_WITH_BOF!!!!!!!\x50\xac\x58\xb7')
  ((i=i+1))
done

As a complete attack scenario, the above command might be executed by a malicious application that is installed in a Tizen smartwatch to exploit the vulnerability of another application.


6 Conclusion

The resistance of applications in IoT operating systems against memory corruption attacks should be studied case by case for each platform, as it depends heavily on the hardware capabilities of that platform. Our analyses showed that it is possible to exploit a number of memory corruption vulnerabilities in the Contiki and Tizen operating systems. To sum up, there are some security mechanisms for preventing buffer overflow in both operating systems. However, it is possible to bypass these mechanisms using common attack patterns. Also, there is no security mechanism for preventing the exploitation of the other studied vulnerability classes. Therefore, the programmer is directly responsible for securing programs and should be familiar with secure coding principles and apply them while coding. In addition, researchers studying Internet of Things security should give particular weight to software security in addition to network security.

7 Future Works

Since software security has received less attention in IoT, extensive work can be conducted on it in the future. Examples are as follows.

• In addition to Native applications, Tizen also supports web applications. Performing the same study for the vulnerabilities in Tizen web applications would be worthwhile.
• In this study, the tests were performed on simulators. Conducting these tests on actual hardware would provide real results, which are of significant importance.
• We have tested Contiki only for the Tmote Sky platform and Tizen on a smartwatch. These tests can be extended to other platforms in another study.

References

1. Chaabouni, N., Mosbah, M., Zemmari, A., Sauvignac, C., Faruki, P.: Network intrusion detection for IoT security based on learning techniques. IEEE Commun. Surv. Tutorials. 21(3), 2671–2701 (2019)
2. Rathore, S., Kwon, B.W., Park, J.H.: BlockSecIoTNet: blockchain-based decentralized security architecture for IoT network. J. Netw. Comput. Appl., 167–177 (2019)
3. Rizvi, S., Orr, R., Cox, A., Ashokkumar, P., Rizvi, M.R.: Identifying the attack surface for IoT network. Internet Things. 9 (2020)
4. Alnaeli, S.M., Sarnowski, M., Aman, M.S., Abdelgawad, A., Yelamarthi, K.: Vulnerable C/C++ code usage in IoT software systems. In: IEEE 3rd World Forum on Internet of Things (WF-IoT), Reston, VA, USA (2016)
5. McBride, J., Arief, B., Hernandez-Castro, J.: Security analysis of Contiki IoT operating system. In: EWSN (2018): International Conference on Embedded Wireless Systems and Networks (2018)
6. Abraham, A.: Hacking Tizen: the OS of everything. In: nullcon, Goa, India (2015)
7. Tizen: About Tizen [Online]. https://www.tizen.org/about
8. Mohanty, A., Obaidat, I., Yilmaz, F., Sridhar, M.: Control-hijacking vulnerabilities in IoT firmware: a brief survey. In: 1st International Workshop on Security and Privacy for the Internet-of-Things (IoTSec), Orlando, Florida (2018)
9. Mullen, G., Meany, L.: Assessment of buffer overflow based attacks on an IoT operating system. In: Global IoT Summit (GIoTS), Arhus, Denmark (2019)
10. Piromsopa, K., Enbody, R.J.: Survey of protections from buffer-overflow attacks. Eng. J. 2011, 31–52 (2011)
11. Sharma, S.: Enhance application security with FORTIFY_SOURCE (2014). https://access.redhat.com/blogs/766093/posts/1976213

IoT Geography Chain: Blockchain-Based Solution for Logistics Ecosystem

Malik Junaid Jami Gul and Anand Paul

M. J. J. Gul · A. Paul
Kyungpook National University, Daegu, South Korea
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. A. Nayyar et al. (eds.), The Fifth International Conference on Safety and Security with IoT, EAI/Springer Innovations in Communication and Computing, https://doi.org/10.1007/978-3-030-94285-4_12

1 Introduction

We can observe swarms of IoT devices in the form of gadgets, smart sensors, smart cars, and even smart homes. The IoT market is about to hit 267 billion in 2020 and shows an upward trend. This upward trend in the IoT market raises safety and security issues. Tracking logistics equipment is vital to ensure safety and transparency in logistics 4.0. IIoT provides a solid base to address the implications faced by logistics 4.0. Logistics 4.0 still requires human intervention to some extent in the supply chain process. Issues like tracking, transparency, and end-to-end trust building heavily depend on logistics. Integrity is also one of the challenges that logistics should focus on to improve quality of service. Logistics requires the automation of certain tasks while ensuring transparency and security, and blockchain-based frameworks can provide solutions. Frameworks for logistics should be applicable at a large scale: even if we consider logistics 4.0 as a part of industry 4.0, it can still be a separate and full-fledged business, for example, DHL. DHL is famous worldwide for its courier delivery system, but underneath the overall workflow, DHL requires secure and efficient logistics. Frameworks and applications based on logistics 4.0 can help companies like DHL to improve their logistics services.

Logistics is required for industry to procure raw material from different places and to make products ready to sell in shops. Logistics can impact the overall economy of a country, and its involvement in the economy can be inferred from the scale of the industry, which ranges from small companies where people send and receive goods from nearby locations to logistics between countries. Countries depend on logistics to sell their goods to other countries. Mostly, countries use logistics services based on ships and aeroplanes for long-distance trade. In the list of the most critical industries for economic growth, logistics holds third position. Logistics industries depend on transportation, which is the focus area for logistics industries to translate business into money. Logistics 4.0 along with blockchain is the future for such companies to operate their business securely and smoothly. Blockchain, being one of the top trending technologies, can provide security solutions to logistics, but there are still few frameworks that address the logistics domain.

The first section of this chapter emphasizes the importance of the logistics industry in the growth of the economy and provides an introduction to IoT and blockchain. Section 2 provides information about related work. Section 3 contains the proposed framework with GPS technology and provides a baseline to extend the work with graph analysis. Finally, the authors conclude the work.

2 Literature Review

IoT devices can be found everywhere nowadays, and smart cities are taking advantage of this emerging technology. Various environmental factors can be monitored with IoT devices, as in the study conducted by Ullo and Sinha [7], where the authors monitored pollution and temperature data to support decision-making for smart cities. A review [1] presents various features of IoT devices along with blockchain and artificial intelligence. The authors discussed a layered approach that connects the three domains so that they work in collaboration to enhance security. The layers were divided into application, IoT blockchain, network, and perception layers. Such an architecture can be effective, but the layers could be merged for better efficiency. The implications for smart logistics are important to understand [2], as there are challenges in sharing data across IoT devices. Secure communication, with visualization [5] of the data extracted from devices, can be provided between the communicating layers. The relation between logistics and the supply chain is vital, as logistics is moving from the traditional system to logistics 4.0 [6] due to the emergence of heterogeneous IoT devices. Blockchain already provides solutions for supply chain resilience [3], decreasing risk and uncertainty while building trust among participants. Industry 4.0 is affecting logistics 4.0 [4]; thus, logistics 4.0 requires more efficient frameworks, and such studies provide the baseline for our proposed work.

3 Proposed Framework

Figure 1 provides an overview of the proposed framework. The top layer of the framework consists of satellites and generates GPS data for the logistics devices according to the road links. GPS data consist of x and y coordinates, and links can be created according to the locations marked on the road. Our study also considers the coordinates of the locations that logistics transportation is bound to visit. The GPS data for these locations create a premise within which the smart contract can trigger the blockchain process.

Fig. 1 Proposed framework

Algorithm 1 Logistics GPS graph analysis
Input: Nodes and Edges
Output: Shortest Path
for traverse nodes and edges in graph do
  Discover shortest path in graph from point A to B = SP
end for
Save and share SP with logistics IoT = LST
Smart contract communication SC with LST
end()

The logistics IoT layer consists of the base station, where the initial path can be identified. Industries collect GPS data from the layer above and perform the shortest path analysis for efficient transportation. Once the path is identified, the data is sent to the blockchain and communicated to the IoT devices on the transportation vehicles. The vehicle follows the path determined by the shortest path algorithm. Premises are also identified by the industries, and their coordinates are shared with the blockchain and saved in smart contracts. A premise can be defined as the area where the logistics transport will deliver the products.

The blockchain layer receives the data from industries, which contain the location information represented as a graph. The automatic triggering mechanism of the smart contract resides in this layer. To automatically trigger the smart contract, IoT devices must provide GPS coordinates to authenticate to the triggering process. Table 1 contains the description of the abbreviations used in Algorithm 1. In our algorithm, we use Dijkstra’s algorithm to find the shortest path. The algorithm determines all the shortest paths (SPs) available in the graph.
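The shortest path step named in the text can be sketched in C as follows. This is a minimal array-based version of Dijkstra’s algorithm under stated assumptions, not the authors’ implementation: the graph is a small adjacency matrix of positive link weights, and the node count N_NODES, the NO_EDGE marker, and the function name are all hypothetical.

```c
#include <limits.h>

#define N_NODES 5   /* hypothetical number of road-link nodes */
#define NO_EDGE 0   /* 0 in the matrix means "no direct link" */

/* Computes the shortest distance from src to every node with Dijkstra's
 * algorithm (O(V^2) array version). prev[] receives the predecessor of each
 * node on its shortest path, or -1 if there is none. Weights must be
 * positive; unreachable nodes keep the distance UINT_MAX. */
static void dijkstra(const unsigned graph[N_NODES][N_NODES], int src,
                     unsigned dist[N_NODES], int prev[N_NODES])
{
    int done[N_NODES] = {0};

    for (int i = 0; i < N_NODES; i++) {
        dist[i] = UINT_MAX;
        prev[i] = -1;
    }
    dist[src] = 0;

    for (int iter = 0; iter < N_NODES; iter++) {
        /* Pick the closest reachable node that is not finalized yet. */
        int u = -1;
        for (int i = 0; i < N_NODES; i++) {
            if (!done[i] && dist[i] != UINT_MAX &&
                (u < 0 || dist[i] < dist[u])) {
                u = i;
            }
        }
        if (u < 0) {
            break;  /* the remaining nodes are unreachable */
        }
        done[u] = 1;

        /* Relax every link that leaves u. */
        for (int v = 0; v < N_NODES; v++) {
            unsigned w = graph[u][v];
            if (w != NO_EDGE && !done[v] && dist[u] + w < dist[v]) {
                dist[v] = dist[u] + w;
                prev[v] = u;
            }
        }
    }
}
```

Following prev[] backwards from point B to point A yields the sequence of nodes of the path SP, which could then be shared with the logistics IoT devices (LST) as described in Algorithm 1.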

Table 1 Abbreviation table

Serial  Abbreviation  Description
1       SP            Shortest path from point A to point B
2       LST           IoT devices involved in logistics
3       SC            Smart contract

4 Conclusion

The GPS-based smart contract triggering system automates the blockchain process, which enhances security for logistics. Fixed coordinates for premises support security, as it is difficult for attackers to identify the premises coordinates and launch attacks from them. Shortest paths make logistics transportation more efficient, and efficiency in transport is required for logistics to generate revenue. The path generated by the algorithm is referred to as a geo-chain, which effectively improves the location tracking of the logistics transport. The geo-chain can also be used as an authentication parameter, since it follows a trend that makes it identifiable when the IoT data of a logistics vehicle does not match the geo-chain data on the blockchain. The proposed system is effective in automating the blockchain process for logistics 4.0. Our study ensures security together with efficiency through the shortest path.

Acknowledgments This work is supported by the National Research Foundation of Korea (NRF) grants funded by the Korean government, grant number: 2020R1A2C1012196.

References

1. Atlam, H.F., Azad, M.A., Alzahrani, A.G., Wills, G.: A review of blockchain in internet of things and AI. Big Data Cogn. Comput. 4(4), 1–27 (2020). https://doi.org/10.3390/bdcc4040028
2. Humayun, M., Jhanjhi, N., Hamid, B., Ahmed, G.: Emerging smart logistics and transportation using IoT and blockchain. IEEE Int. Things Mag. 3(2), 58–62 (2020). https://doi.org/10.1109/iotm.0001.1900097
3. Min, H.: Blockchain technology for enhancing supply chain resilience. Business Horizons 62(1), 35–45 (2019). https://doi.org/10.1016/j.bushor.2018.08.012
4. Müller, J.M., Voigt, K.I.: The impact of Industry 4.0 on supply chains in engineer-to-order industries: an exploratory case study. IFAC-PapersOnLine 51(11), 122–127 (2018). https://doi.org/10.1016/j.ifacol.2018.08.245
5. Rubí, J.N.S., de Lira Gondim, P.R.: IoT-based platform for environment data sharing in smart cities. Int. J. Commun. Syst. 34(2) (2021). https://doi.org/10.1002/dac.4515
6. Sreedharan V, R., Unnikrishnan, A.: Moving towards industry 4.0: a systematic review. Int. J. Pure Appl. Math. 117(20), 929–936 (2017). https://www.mendeley.com/catalogue/731f08a6-6182-3885-8fdf-bcfeb97ba2b2/
7. Ullo, S.L., Sinha, G.R.: Advances in smart environment monitoring systems using IoT and sensors. Sensors 20(11) (2020). https://doi.org/10.3390/s20113113

Author Index

A Afek, Y., 143 Almutawa, M., 101–118 Andalibi, V., 137–155

B Bada, M., 35–43 Baek, J., 168 Basiri, M., 173–188 Baturone, I., 121–135 Beckett, R., 144 Boddupalli, H., 101–118

C Camp, L.J., 137–155 Castro, M., 68 Chandra, T.D., 68 Cheng, Z., 168 Cheung, W., 1–15, 19–32

D Desai, A.A., 161 Deselaers, T., 161 Dibbo, S.V., 1–15, 19–32

E Epshtein, B., 161

F Fayaz, S.K., 144 Feraudo, A., 143 Ferguson, I., 65–78 Fischer, M.J., 67 Fogel, A., 143

G Gul, M.J.J., 191–194

H Hamza, A., 143 Huang, W., 161 Hueihan, J., 50

J Jaderberg, M., 161, 168 Jain, R., 159–169 Jianan, L., 51

K Kim, D., 137–155 Kolomeets, M., 144

L Lamport, L., 67, 125 Lear, E., 137–155 Liao, M., 161 Liginlal, D., 145 Liu, W., 168 Luo, C., 161 Luu, L., 76

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. Nayyar et al. (eds.), The Fifth International Conference on Safety and Security with IoT, EAI/Springer Innovations in Communication and Computing, https://doi.org/10.1007/978-3-030-94285-4

M MacKenzie, B.A., 65–78 Mahajan, A., 159–169 Matheu, S.N., 143 Maxion, R.A., 144 Mazhar, N., 143 Mishra, S., 101–118 Mohammad, Z., 81–97 Mouzarani, M., 173–188 Muratyan, A., 1–15

N Nagrath, P., 159–169 Nakamoto, S., 68 Narang, V., 47–63 Nasim, K., 50 Nayyar, A., 159–169 Nicolas, B., 50 Nyangaresi, V.O., 81–97

P Paul, A., 191–194 Prabhu, S., 143

R Razaq, A., 65–78 Reeder, R.W., 144 Román, R., 121–135

S Salim, F., 144 Shi, B., 168 Sinha, G.R., 192 Smetters, D.K., 144 Solanki, A., 47–63

T Tasweer, A., 50 Tian, Z., 161

U Ullo, S.L., 192

V Vaniea, K., 144 Vhaduri, S., 1–15, 19–32 von Solms, B., 35–43

W Wang, T., 50

X Xin, C., 50 Xu, T., 144

Y Yan Hu, 49

Z Zhan, F., 168 Zhang, Z., 161

Subject Index

A AAA server, see Authentication, Authorization, and Accounting (AAA) server ACC, see Accuracy (ACC) Accelerometers, 15 Access Control Entries (ACEs), 145, 146 Access Control List (ACL), 137, 141, 142, 144, 146 Accuracy (ACC), 8, 27 ACE Merging, 145–147 ACEs, see Access Control Entries (ACEs) ACE Tree algorithm, 147–149 MUD-Files, 147 protocol, 147 pruning, 149–151 result, 148 ACL, see Access Control List (ACL) Alice, 141, 142 ANN, see Artificial neural network (ANN) Application firmware authenticity and integrity, 129 manufacturer signs, 130 signature, 130 stage, 130 Area under the curve—receiver operating characteristic (AUC-ROC), 8, 10, 12 Arlo Pro app audio and motion scenario category, 113 baseline scenario category, 112 field of view scenario category, 115 IR sensor, 112 modes, 111 notifications scenarios, 116

Artificial neural network (ANN), 30, 161 ASLR mechanism, 183 AUC-ROC, see Area under the curve–receiver operating characteristic (AUC-ROC) Audio data augmentation, 25–26 Augmentations, 21 Authentication, Authorization, and Accounting (AAA) server, 140 Awareness cybersecurity, 38–39, 42 for users, 43

B Balance authentication mechanism (BAM) environment, 69 security, 68 timing synchronisation, 68 Bark-frequency cepstral coefficients (BFCC), 26 Base station (BS), 81 Behavioral biometrics, 21 BFCC, see Bark-frequency cepstral coefficients (BFCC) Binary classifiers, 6, 7, 11, 12, 15, 57 Binary RF, 9–10, 12, 13 Biometrics, 20 behavioral, 2, 3, 21 passive behavioral, 21 physical, 21 physiological, 2 types, 3 user authentication techniques, 1 user identification models, 3


Blockchain, 192 Blockchain-based solution, logistics ecosystem abbreviation table, 193, 194 countries, 191 economy, 191 equipment, 191 frameworks, 191, 193 geo-chain, 194 GPS-based smart contract triggering system, 194 GPS data, 192, 193 GPS graph analysis, 193 implications, 192 industry, 191 industry 4.0 impact, 192 integrity, 191 IoT devices, 192 logistic IoT layer, 193 premise, 193 shortest paths, 194 smart contract automatic triggering mechanism, 193 supply chain relation, 192 Blockchain consensus algorithm IoT ecosystem, 67 protocol, 67 transaction, 67 Boolean isomorphic algorithms, 72 BootROM, 123, 129 Breathing patterns, 1, 2 Buffer overread vulnerability Contiki application, 180 Tizen application, 184–185 Byzantine general problem, 67

C CDF, see Cumulative distribution function (CDF) Character-Aware Neural Network (Char-Net), 159 Char-Net, see Character-Aware Neural Network (Char-Net) Chroma short-time Fourier transform (STFT), 26 CNN, see Convolutional neural network (CNN) COCO-Text dataset, 165 Computer vision techniques, 47 Consensus process BAM’s distributed network, 69 lead cell, 69 Consensus protocol, 68

Constant-Q chromagram (chroma), 26 Consumer-friendly privacy practices, 41 Contiki, 174, 175, 188 Contiki OS security assessment, memory corruption vulnerabilities buffer overread, 180 format string, 180–181 heap-based BOF, 179 Native platform, 175 stack-based BOF, 176–178 Tmote Sky, 175 use after free, 181 Convolutional neural network (CNN), 48–52, 61 augmentations, 21 behavioral biometrics, 21 candidate features, 26 data pre-processing audio data augmentation, 25–26 data segmentation and cleaning, 24–25 datasets, 24 feature sets/classes, 30, 31 FWPT, 22 GMM, 22 leave-one-event-out validation approach, 21 LSTM model, 21–22 MFCC, 21 natural scenes text detection and recognition (see Natural scenes text detection and recognition) non-speech human sounds, 21 passive behavioral biometrics, 21 physical biometrics, 21 TFL Auth app, 20, 22–24 training-testing accuracy, 29, 30 user authentication, 26–30 user’s breathing patterns, 20 vs. RNNs, 22 Cooja simulator, 176 Cumulative distribution function (CDF), 24–25 Cybersecurity awareness, 42 awareness of users, 38–39 fitness devices (see Fitness devices) GDPR, 42 guidelines consumers, 39 for manufacturers, 41–42 policy makers, 39 for users, 40–41 risks, 36

D D2D communication, 81 D2D performance gains, 81–83 Deep learning, 47, 160 Deep neural networks, 161 Disease monitoring, 20

E EAST, see Efficient and Accurate Scene Text (EAST) detection model ECDSA signature, 134 ECG, see Electrocardiogram (ECG) Efficient and Accurate Scene Text (EAST) detection model, 162, 165, 166, 168 Electrocardiogram (ECG), 3 Electroencephalography, 3 Electromyography, 3 Electrooculography, 3 Environmental Sound Classification (ESC-50), 24 Error detection, 75 Espressif’s ESP32 microcontroller, 132 Evistr digital voice recorder, 24

F Facial images, 1 False acceptance rate (FAR), 8 False rejection rate (FRR), 8 FAR, see False acceptance rate (FAR) Fast R-CNN, 48, 49, 62 analysis, 61–62 architecture, 51, 52 flowchart dividing the images for training and testing, 54 extracting images from videos, 53 labelling the image, 54 result, 54 testing the model, 54 training the model for normal and abnormal actions, 54 with SSD-Mobilenet architecture, 49 working and operation, 51–52 FCN, see Fully Convolutional Network (FCN) Fingerprints, 1 Fitbit, 3, 15, 22, 23, 32, 35–38, 41 Fitbit Ionic, 32 Fitbit watch and phone, 23 Fitness bands, 35 Fitness devices data security, 42 personal data collection, 36–37

risks, 36–38 and smart home devices, 36 Flash memory, 134 Format string vulnerability Contiki application, 180–181 Tizen applications, 185–186 Free RTOS operating system, 175 FRR, see False rejection rate (FRR) Fully Convolutional Network (FCN), 161, 162, 168 Fuzzy wavelet packet transform (FWPT), 22 FWPT, see Fuzzy wavelet packet transform (FWPT)

G Gait, 1, 2 Gait-based authentication system, 20 Gammatone-frequency cepstral coefficients (GFCC) feature extraction scheme, 22 GAR, see Genuine acceptance rate (GAR) Gateway node (GWN), 84 Gaussian mixtures model (GMM), 22 General Data Protection Regulation (GDPR), 42 Genuine acceptance rate (GAR), 8, 28, 30 Genuine rejection rate (GRR), 8, 28, 30 Geo-chain, 194 GMM, see Gaussian mixtures model (GMM) Goal errors, 145 Google home mini baseline scenario category, 105 experiments, 105 noise/conversation, 107 non-standard operating scenarios category, 109–110 observations, 111 soft music playing, 106 standard operating scenario category, 107–109 Google TensorFlow Lite framework, 23 GPS-based smart contract triggering system, 194 GPS data, 192 GRR, see Genuine rejection rate (GRR)

H Hash-based signatures ADRS, 126 digital signature, 125 digital signature schemes, 127 RSA and ECDSA signature, 125

sub-parameters, 126 WOTS+ scheme, 125 XMSS scheme, 125 HCI, see Human-Computer Interaction (HCI) Health monitoring, 20 Health monitors, 35 Heap-based BOF vulnerability Contiki application, 179 Tizen application, 184 Heart rate (HR) authentication approaches, 7 biometric, 15 data pre-processing, 4–5 features, 6 and HRSpO2 model, 11, 12 and oxygen data, 7 oxygen saturation with, 2 PPG sensors, 3 Wellue dataset, 4 zones, 4–6 Heart rate and oxygen saturation data-driven model (HRSpO2 model), 7 with average and standard deviation of performance measures, 11 binary RF classifier, 12, 13 relative performance-loss, 14 SVM RBF classifier, 15 unary SVM RBF classifier, 12, 13 Heart rate data-driven model (HR model), 7 with average and standard deviation of performance measures, 10 relative performance-loss, 13–14 Helper Data algorithm, 133 Hidden Markov model, 48 HMDB dataset, 50 abnormal classes, 59–60, 62–63 classified, 54 graphical representation binary classifier, 57 classification loss, 57 localization loss, 57 total loss, 57, 58 illustration, 55 normal class, 58–59 training loss at different epochs, 56 training process, 56 Home automation, 101 HR, see Heart rate (HR) Human abnormal behaviour detection CNN, 48, 50 computer vision techniques, 47 deep learning, 47

drawbacks, 48 dynamic image, 50 environment, 48 Hidden Markov model, 48 HMDB dataset, 50, 54–62 J-HMDB-21, 50 KTH, 48, 50, 51 least-squares SVM classifier model, 49 machine learning, 47 MTCNN, 50 object detection (see Object detection) physical feature, 49 pose estimation (see Pose estimation) security systems, 47 SSD MOBILE NET, 48 SVM classifier, 50 3D CNN, 50 three-stream CNN, 50 2D and 3D invariants, 48 UCF101-24, 50 UCF-Sports, 48, 50, 51 in video surveillance, 50 WSVM, 50 Human–Computer Interaction (HCI), 142 Hybrid D2D message authentication technique, 83 Hyper-parameter optimization, 9

I ICDAR-2015 dataset, 165, 167, 168 IETF, see Internet Engineering Task Force (IETF) Implicit authentication, 1–3, 15 Industry 4.0, 192 Initialization phase, 88 Instant Contiki, 174 Integrity, 191 Internet Engineering Task Force (IETF), 137, 138 Internet of Things (IoTs), 35 applications, 121 authenticating, 71 authentication, 3 blockchain’s security characteristics, 66 cryptographic primitives, 122 cryptographic services, 66 device, 74 double-spend/fraudulent misappropriation, 71 insecurities, 66 interconnected environments, 15 lack of technical support, 137 manufacturers, 101

and mobile networks, 1 network devices and protocols, 65 nodes, 71 operating systems, 173 post-quantum resistant, 122 requests, 70 security, 42, 66, 137, 173, 174 services/sectors, 2 SRAM memories, 123 technology, 66 IoT connectivity, 20 IoT geography chain, see Blockchain-based solution, logistics ecosystem IoTs, see Internet of Things (IoTs) IoT Security, 36, 73, 143

J J-HMDB-21, 50 Joints for HMDB dataset (J-HMDB), 50

K Key-phrase-based voice authentication, 20 Keystroke dynamics, 1 Knowledge-based authentication approaches, 19 Knowledge-based authentications, 1 KTH, 48, 50, 51

L Least-squares SVM classifier model, 49 Leave-one-event-out fashion, 27 Leave-one-event-out testing approach, 27 Leave-one-event-out validation approach, 21 Linear predictive coefficients (LPC), 26 Logistics 4.0, 192, 194 Long short-term memory (LSTM) model, 21–22, 166 LPC, see Linear predictive coefficients (LPC) LSTM, see Long short-term memory (LSTM)

M Machine learning, 47 Manufacturer Usage Description (MUD) access control decisions, 144 ACLs, 141, 142, 144 Alice, 141, 142 automatic generation, 145 challenges in MUD-Files, 143, 154 components, 138–140

definition, 138 deployment, 143, 144 electron framework, 152 future work, 154–155 goal, 137 HCI, 142 IETF, 137, 138 implementation, 151–152 IoT devices, 138, 154 IoT manufacturers, 145 IoT security, 137 lack of UI, 145 methods ACE Merging, 145–147 ACE Tree, 147–150 MUD-Visualizer, 145 MUD-File, 138, 143–145, 151, 153–155 MUD-File ACL abstractions controller extension, 141 default policies, 140 defined types, 140 domain-name, 140 local-networks, 140 manufacturer, 140 model, 141, 153 with my-controller abstraction, 141, 153 same-manufacturer, 141 MUDgee, 143 MUD-Visualizer, 142–144, 152–155 PoLP, 141 projects, 142 SDN, 143 SDN-based architecture, 143 smart slow cooker device, 141, 142 source of user errors, 145 UI of the MUD-visualizer, 153 workflow, 138–140 Many-time signature (MTS) scheme, 127 Maximally Stable Extremal Region (MSER), 161 McAfee survey, 19 Mel-frequency cepstral coefficients (MFCC), 21, 26, 29, 30, 32 Mel-Spectrogram (Mel-Spect.), 26 Memory corruption vulnerabilities, IoT operating systems Contiki, 174, 175 Contiki OS security assessment ASLR analysis, Native platform, 177 assembly source, compiled program, 177 buffer overread, 180 conclusion, 187

format string, 180–181 heap-based BOF, 179 Native platform, 175 securing programs, 188 stack-based BOF, 176–178 summary, 187 Tmote Sky, 175 Free RTOS operating system against buffer overflow attacks, 175 future work, 188 Tizen, 174, 175 Tizen OS security assessment ASLR and DEP, 183, 187 buffer overread, 184–185 conclusion, 187 format string, 185–186 heap-based BOF, 184 OnButtonPressed() function, 181 securing programs, 188 stack-based BOF, 182–183, 187 summary, 187 Tizen Studio, 181 use after free, 186 Merkle tree, 75–76 MFCC, see Mel-frequency cepstral coefficients (MFCC) MSER, see Maximally Stable Extremal Region (MSER) MSRA-TD500 dataset, 165 MTCNN, see Multi-task CNN model (MTCNN) MUD, see Manufacturer Usage Description (MUD) MUD-File ACL abstractions controller extension, 141 default policies, 140 defined types, 140 domain-name, 140 local-networks, 140 manufacturer, 140 model, 141, 153 with my-controller abstraction, 141, 153 same-manufacturer, 141 MUD-File Processor, 151, 152 MUD-Files, 138, 143–145, 151, 153, 154 MUD-file server, 138 MUDgee, 143 MUD-Manager, 140, 142 MUD-URI, 140 MUD-Visualizer, 142–146, 151–155 Multi-biometric models, 3 Multi-task CNN model (MTCNN), 50

N NAD, see Network Access Device (NAD) Natural scenes text detection and recognition blur images, 159 CNN-RNN joint model, 161 datasets COCO-Text, 165 ICDAR 2015, 165–166 MSRA-TD500, 165 SVT, 166 deep learning, 160 deep neural networks, 161 deformities, 159 detection stage pipeline and network design, 162–163 text detection FCN structure, 163 evaluations, 166–168 FCN, 161 future work, 168–169 ICDAR-2015 dataset, 167 image noise, 160 image resolution, 159 implementation, 166 lighting conditions and reflective surfaces, 160 MSER, 161 multi-object rectification network, 161 NMS, 162 performance, 168 preprocessing and post-processing stages, 160 recognition stage, 165 results, 166–168 RNN, 160 stages, 162 SVT dataset, 167 text orientation, 160 Network Access Device (NAD), 140 NGCC, see Normalized gammachirp cepstral coefficients (NGCC) NMS, see Non-maximum suppression (NMS) Noise superposition, 26 Non-maximum suppression (NMS), 162, 164–165, 168 Non-speech human sounds, 21 Non-volatile memory, 131 Normalized gammachirp cepstral coefficients (NGCC), 26 Notations, 87 O Object detection, 51, 53, 62 Fast R-CNN model, 48 tensorflow, 49

OnButtonPressed() function, 181 One-time programmable (OTP), 121 overflow() function, 176 Oxygen saturation (SpO2) authentication approaches, 7 data, 5–6 data pre-processing, 4–5 features, 6 heart rate data, 2 multi-factor fingerprint authentication system, 3 sensors, 2 Two-Sample T-tests, 2, 5 wearable user authentication, 2 Wellue dataset, 4 Oxygen saturation data-driven model (SpO2 model), 7 with average and standard deviation of performance measures, 11 relative performance-loss, 14 P Parameter initialization phase, 86 Passive behavioral biometrics, 21 PCA, see Principal Component Analysis (PCA) Performance evaluation communication costs comparisons, 95 communication overheads, 94 computation overheads comparisons, 96 D2D authentication, 94 message signing and verification latencies, 93 mutual authentication message exchanges, 95 signing and verification, 94 verification cost, 94 Performance measures CNN ACC, 27 F1 Score, 28 GAR, 28 GRR, 28 RMSE, 28 HR and SpO2 ACC, 8 area, 8–9 AUC-ROC, 8 binary model measures, 12, 13 F1 score, 8 GAR, 8 GRR, 8 RMSE, 8 unary model measures, 12, 13

Personal data collection fitness devices, 36–37 Photoplethysmogram (PPG) sensors, 3, 15 Physical biometrics, 21 Physically unclonable function (PUF), 86 PINs, 1 Pitch shift, 25–26 Place discovery, 20 Policy makers, 39 PoLP, see Principle of Least Privilege (PoLP) Pose estimation, 50, 62 learning semantics, 49 Principal Component Analysis (PCA), 6–7 Principle of Least Privilege (PoLP), 141 Privacy, 21, 36, 38, 39, 41–43 Privacy leakage, 116–117 Probability density function (PDF), 24–25 Proposed session key agreement protocol, 89 Pruning ACE Tree, 149–151 puts() function, 177, 178, 183, 185 Pytesseract natural scenes text detection and recognition (see Natural scenes text detection and recognition)

R Random forest (RF) classification model, 15 Rasta perceptual linear prediction coefficients (RPLP), 26 RBAC, see Role-Based Access Control (RBAC) Recurrent neural networks (RNN), 22, 160 Regions of Interest (RoI), 51 Rendering engine, 152 Resilience cloning attacks, 91 identity theft, 92 masquerade attack, 92 message replays, 92 simulation parameters, 91 source authentication, 92 Risk education on, 41 fitness devices, 36–38 Rivest–Shamir–Adleman (RSA), 85 RMSE, see Root mean square error (RMSE) RNN, see Recurrent neural networks (RNN) Role-Based Access Control (RBAC), 144 Root-mean-square (RMS), 26 Root mean square error (RMSE), 8, 28 RPLP, see Rasta perceptual linear prediction coefficients (RPLP)

S Scalability, 75–76 SDN, see Software-Defined Networking (SDN) Secure firmware update, 132 Security provisions authentication process, 74–75 consensus, 75 device, 73 IoT request, 74 TCP/IP protocol, 73 validation node, 75 Select the K-Best (SelectKBest), 7 Signature verification, 133 Sleep quality improvement, 20 Smart home devices evaluation methodology, 104 indoor and outdoor cameras, 102 internet-connected, 103 setup, 105 technique, 102 traffic analysis, 103 transport layer encryption, 104 Wi-Fi hotspot, 104 Smartphones, 20 Smartwatches, 35 Software Bootloader, 132, 133 Software-Defined Networking (SDN), 143 Spectrogram (Spect.), 26 SpO2, see Oxygen saturation (SpO2) sprintf() function, 185 SRAM cell classification, 130 SRAM chip, 123 SSD-MOBILE NET, 48 architecture, 52 flowchart dividing the images for training and testing, 54 extracting images from videos, 53 labelling the image, 54 result, 54 testing the model, 54 training the model for normal and abnormal actions, 54 working and operation, 51–52 Stack-based BOF vulnerability Contiki application, 176–178 Tizen application, 182–183 Stand-alone extension, 152 Strcpy() function, 176 Street View Testing (SVT) dataset, 166–168 Stress monitoring, 20 Stroke Width transform (SWT), 161 SVM classifier, 50

SVT dataset, see Street View Testing (SVT) dataset SWT, see Stroke Width transform (SWT) Symmetric cryptography, 84 System model D2D key agreement and authentication, 85 digital signature protocol, 85 PKI based approaches, 85

T Tensorflow object detection, 49 TensorFlow Lite (TFL) Auth app, 20, 22–24, 27, 30, 32 Text recognition, 160 3D CNN, 50 Three-stream CNN, 50 Time complexity, 77 Tizen, 174, 175, 188 Tizen OS security assessment, memory corruption vulnerabilities buffer overread, 184–185 conclusion, 187 format string, 185–186 heap-based BOF, 184 OnButtonPressed() function, 181 stack-based BOF, 182–183 summary, 187 Tizen Studio, 181 use after free, 186 Tizen Studio, 174, 181, 183 Tmote Sky, 175, 176, 188 bypassing DEP mechanism, 179 return-to-Libc attack, 178 stack-based BOF attack, 178 Tonnetz, 26 Training-testing set, 9 Transactional data, 74 Transport for London (TfL) transit system, 36 2D and 3D invariants, 48

U UCF101-24, 50 UCF-Sports, 48, 50, 51 Ultra-reliable low latency communication (URLLC), 82 Unary classifiers, 7 Unary SVM RBF, 9–10, 12 Uncorrelated features, 6 Urban Sound 8K (US-8K), 24 Use after free vulnerability Contiki application, 181 Tizen application, 186

User authentication CNN authentication model evaluation, 29–30 Google cloud platform, 26 Google TensorFlow Lite framework, 27 hyper-parameter optimization, 28–29 MFCC, 27 performance comparison measures, 27–28 TFL Auth app, 27 training-testing set, 27 hyper-parameter optimization, 9 model evaluation, 10–15 optimal count, 9–10, 12 performance measures, 8–9 training-testing set, 9 V Video surveillance, 50 Visualization data generator, 151 Voice, 1

W Wearable authentication, 3, 21 Wearable devices fitness bands, 35 health monitors, 35 physical and digital world, 35 security/privacy, 42 smartwatches, 35 warnings and instructions, 40 Wearable security gait and breathing patterns, 2 user authentication, 2, 3 Weighted Support Vector Machine (WSVM), 50 Well-being tracking, 20 Wellue dataset, 4 Wellue SleepU wrist-worn oxygen monitor, 4 Wireless connection, 36 WOTS-chain computation, 124 WSND-CH authentication, 90 WSVM, see Weighted Support Vector Machine (WSVM)