Data Protection in a Post-Pandemic Society: Laws, Regulations, Best Practices and Recent Solutions 3031340051, 9783031340055

This book offers the latest research results and predictions in data protection with a special focus on post-pandemic society.


English Pages 253 [246] Year 2023


Table of contents:
Preface
Contents
About the Editors
Post-Covid-19 Metaverse Cybersecurity and Data Privacy: Present and Future Challenges
1 Introduction
2 Previous Work
3 Research Challenges
Cybersecurity
Avatar Integrity
Device Security
Data Privacy
Data Collection
User Consent
Direct Marketing
Data Intermediaries
Health
Content Moderation
Children
User Education
Economy
Ownership
Advertising
Portability and Interoperability
Transparency and Accountability: AI
Laws and Regulation Landscape
EU/UK General Data Protection Regulations
Confidentiality
Responsibility and Liability
EU Digital Services Act
Consistency
Systematisation
Consumers Vs Traders
Transparency
UK Online Safety Bill
Big Tech Companies
UK Two-Tier System: Liability
Publishers
Online Intermediaries
Legal Clarity
Discussion and Summary
4 Future Research Directions and Potential Solutions for Cybersecurity
Avatar Integrity
Security Protocols
Cyber-Resilience
Data Privacy
Data Collection
Metaverse: Open and Decentralised
Data Protection Framework
Health
Content Moderation
Children
User Education
Economy
Ownership
Advertising
Data Portability and Interoperability
Transparency and Accountability: Blockchain
Laws and Regulation Landscape
EU/UK General Data Protection Regulations
Confidentiality
Responsibility and Liability
EU Digital Services Act
Consistency
Systematisation
Consumer Vs Trader
Transparency
UK Online Safety Bill
Big Tech Companies
UK Two-Tier System: Liability
Publishers
Online Intermediaries
Legal Clarity
5 Discussion
6 Conclusion
Recommendations
Policy and Notification
VDaaS: Notification, Consent, Policies, and Records System Update
Data Provenance and Integrity
Data Veracity and User Safety
Cybersecurity
AI Training Data
Data Privacy
Due Diligence and Best Practices
Avatars
User Consent and Age Verification
Tokenization-Validation
Limitations
Conflict of Interest
References
Keeping it Low-Key: Modern-Day Approaches to Privacy-Preserving Machine Learning
1 Introduction
2 The Great Privacy Awakening
3 Attacks on ML Systems
Data Access Attack
Membership Inference Attack
Input Inference Attack
Parameter Inference Attack
Property Inference Attack
4 Quantifying Privacy Risks in ML Systems
Membership Inference
Input Inference
Parameter Inference
Property Inference
5 Privacy-Preserving Machine Learning
6 Privacy-Preserving Techniques
Differential Privacy
Federated Learning
Synthetic Data
Data Condensation
Auxiliary Techniques
7 Privacy and Responsible ML
8 Conclusion
References
Security Analysis of Android Hot Cryptocurrency Wallet Applications
1 Introduction
Background/Context
Research Focus and Purpose
2 Literature Review
Current Developments and Related Work
Security Attacks
3 Research Methods
Research Methodology
Technical Background Overview
Cryptography, Encryption, and Principles
Blockchain, Types of Blockchain, and Transaction Process
Cryptocurrency
Crypto Wallet, Crypto Wallet Types, and Architecture
Types of Crypto Wallets
Android Crypto Wallet Vulnerabilities
Vulnerabilities of Crypto Wallets
Vulnerabilities of Android OS
Android Crypto Hot Wallets and Their Security Mechanisms
4 Investigation Results/Findings
Using Accessibility Services to Steal Information
Potential Defence Mechanisms
Using USB Debugging to Take Information from Backup File
Potential Defence Mechanisms
Taking Information from a Rooted Device
Potential Defence Mechanisms
5 Discussions
6 Conclusion
References
Exploring the Opportunities of Applying Digital Twins for Intrusion Detection in Industrial Control Systems of Production and Manufacturing – A Systematic Review
1 Introduction
Research Question
Objectives
Significance of the Study
Limitations of the Study
Chapter Outline
2 Literature Review
3 Research Methodology
Research Questions
Research Questions Formulated Through PICO Framework
Protocol and Eligibility Criteria
Information Sources
Search Strategy
Study Selection
4 Quality Assessment
5 Data Extraction Process
6 Data Presentation and Analysis
Familiarisation with Data
Coding
Searching for Themes
Reviewing Themes
Defining and Writing Up
7 Data Presentation
8 Conclusion and Recommendation
Discussion
Conclusion
References
Securing Privacy During a World Health Emergency: Exploring How to Create a Balance Between the Need to Save the World and People's Right to Privacy
1 Introduction
2 Pre-COVID-19 Data Protection Overview
3 Changes to Data Brought About by COVID-19
Rapid Growth in the Volume of Data
Expansion in the Scope of Data
Changes for Data Controllers and Processors
Weakened Control of Data by Data Subjects
Increased Impact of Data on Every Day People's Lives
New Data Technologies and Their Role
4 Challenges of Data Protection in Post-COVID Society
Handling of COVID-Related Data
Technical Challenges to Protect Privacy
Ethical Challenges of a Rapidly Evolving Digital Society
Legal Challenges of Data Protection
5 Solutions
Establish Mechanisms to Respond to Similar Crises
Promote Technological Innovation
Strengthen Industry Regulation
Improve Laws and Regulations
6 Conclusion
References
Federated Learning: Data Privacy and Cyber Security in Edge-Based Machine Learning
1 Introduction
2 Understanding Federated Learning
Topology
3 Data Privacy and Cyber Security in Federated Learning
Privacy Challenges and Federated Learning Threat Models
Federated Learning for Cyber Security
4 Case Study: Federated Learning for Intrusion Detection Systems
The CICIDS2017 dataset
CICIDS 2017: Training a federated model
Results
Discussion
5 Open Issues and Future Trends
6 Conclusion
References
Emerging Computer Security Laws and Regulations Across the Globe: A Comparison Between Sri Lankan and Contemporary International Computer Acts
1 Sri Lankan Context
Background of Sri Lanka and Technology
What Is a Crime? What Is a Computer Crime?
Overview of Computer Crimes Act 2007
Comparison on the Reports of 2011 and 2020
The Loopholes in the Computer Act 2007
2 International Context
Introduction to International Contemporary Laws
Estonia
Singapore
Review of the CSA
Updating the Cybersecurity Code of Practice (CCoP)
South Africa
United States of America
3 Comparison: Sri Lankan Computer Crime Act with Foreign Computer Crime Acts
4 Suggestions and Conclusion
References
Legal Considerations and Ethical Challenges of Artificial Intelligence on Internet of Things and Smart Cities
1 Introduction
2 IoTs in Cyberspace
3 Internet of Things (IoTs)
4 IoT and AI/ML
5 IoT Security and the Human Factor
6 IoT and the Cookie Monster
7 IoT and Covid-19 Impact
8 IoT-Perceptions of Security and Privacy
9 IoT and Smart Cities
10 Conclusion
References

Citation preview

Chaminda Hewage · Yogachandran Rahulamathavan · Deepthi Ratnayake, Editors

Data Protection in a Post-Pandemic Society: Laws, Regulations, Best Practices and Recent Solutions


Editors

Chaminda Hewage
Cybersecurity and Information Networks Centre
Cardiff Metropolitan University
Cardiff, UK

Yogachandran Rahulamathavan
Loughborough University
London, UK

Deepthi Ratnayake
University of Hertfordshire
Hatfield, UK

ISBN 978-3-031-34005-5
ISBN 978-3-031-34006-2 (eBook)
https://doi.org/10.1007/978-3-031-34006-2

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

Data protection is a key aspect of information security: personal and business data need to be protected from unauthorized access and modification. The number of emerging digital technologies that collect, transmit, and process privacy-sensitive data is on the rise. These technologies range from blockchain applications that perform private business data transactions on public ledgers to Internet of Things devices that collect personal healthcare, finance, and traffic data and send them to cloud computers to perform machine learning inference. The protection of personal data receives significant attention due to the increased number of violations of data subjects' rights. Stolen personal information has been used for many purposes, such as ransom, bullying, and identity theft. Due to the wider usage of the Internet and social media applications, people make themselves vulnerable by sharing personal data.

This book discusses key issues and challenges associated with personal data protection before, during, and after the COVID-19 pandemic. Some of these challenges are caused by technological advancements (e.g., Artificial Intelligence (AI)/Machine Learning (ML), ChatGPT, and Metaverse technologies), while others arise from the surveillance measures and significant online presence during the COVID-19 pandemic. The enactment of the EU General Data Protection Regulation (GDPR) provided a unique framework for data processors to process personal data fairly, legally, and transparently. The GDPR remains one of the most prominent and revolutionary data protection regulations, and it led other countries around the world to draft data protection laws with similar principles, although choices of principles related to data localization and subject rights (e.g., the right to be forgotten) differed. In addition, these regulations faced several issues and challenges due to the new norm created by the COVID-19 pandemic. This was most evident when sharing Coronavirus data for research and development. The chapters of this book address these issues where data protection laws are pushed to their limits, and present proposed amendments to the existing laws and regulations.

With the increased awareness of data protection and existing laws, researchers and developers are eager to address data privacy in different applications. In order to preserve the privacy of the data involved, novel techniques such as zero-knowledge proofs, fully homomorphic encryption, and multi-party computation are being deployed. The tension between data privacy and data utility drives innovation in this area, where numerous start-ups around the world have started receiving funding from government agencies and venture capitalists. This fuels the adoption of privacy-preserving data computation techniques in real applications, and the field is rapidly evolving. This book captures the state-of-the-art data privacy techniques used in important areas.

Cardiff, UK
Loughborough, UK
London, UK
February 2023

Chaminda Hewage
Yogachandran Rahulamathavan
Deepthi Ratnayake

About the Book

The purpose of this edited book, Data Protection in a Post-Pandemic Society: Laws, Regulations, Best Practices and Recent Solutions, is to establish the state of the art and set the course for future research in data protection. The scope of the book includes not only all aspects of data protection and privacy but also related areas, such as cybersecurity in the Metaverse and machine learning. The book serves as a central source of reference for data protection research and developments. It aims to publish thorough and cohesive overviews of specific topics in data protection in a post-pandemic society, as well as works that are larger in scope than survey articles and that contain more detailed background information. The book also provides a single point of coverage of advanced and timely topics, and a forum for topics that may not have reached a level of maturity to warrant a comprehensive textbook.


Contents

Post-Covid-19 Metaverse Cybersecurity and Data Privacy: Present and Future Challenges .......... 1
Vinden Wylde, Edmond Prakash, Chaminda Hewage, and Jon Platts
1 Introduction .......... 1
2 Previous Work .......... 5
3 Research Challenges .......... 7
4 Future Research Directions and Potential Solutions for Cybersecurity .......... 20
5 Discussion .......... 28
6 Conclusion .......... 32
References .......... 39

Keeping it Low-Key: Modern-Day Approaches to Privacy-Preserving Machine Learning .......... 49
Jigyasa Grover and Rishabh Misra
1 Introduction .......... 49
2 The Great Privacy Awakening .......... 50
3 Attacks on ML Systems .......... 51
4 Quantifying Privacy Risks in ML Systems .......... 56
5 Privacy-Preserving Machine Learning .......... 61
6 Privacy-Preserving Techniques .......... 61
7 Privacy and Responsible ML .......... 71
8 Conclusion .......... 74
References .......... 75

Security Analysis of Android Hot Cryptocurrency Wallet Applications .......... 79
Danyal Mirza and Yogachandran Rahulamathavan
1 Introduction .......... 79
2 Literature Review .......... 83
3 Research Methods .......... 86
4 Investigation Results/Findings .......... 98
5 Discussions .......... 106
6 Conclusion .......... 106
References .......... 107

Exploring the Opportunities of Applying Digital Twins for Intrusion Detection in Industrial Control Systems of Production and Manufacturing – A Systematic Review .......... 113
Nipuna Sankalpa Thalpage and Thebona Arachchige Dushyanthi Nisansala
1 Introduction .......... 113
2 Literature Review .......... 117
3 Research Methodology .......... 123
4 Quality Assessment .......... 129
5 Data Extraction Process .......... 130
6 Data Presentation and Analysis .......... 132
7 Data Presentation .......... 136
8 Conclusion and Recommendation .......... 139
References .......... 141

Securing Privacy During a World Health Emergency: Exploring How to Create a Balance Between the Need to Save the World and People's Right to Privacy .......... 145
Shasha Yu and Fiona Carroll
1 Introduction .......... 145
2 Pre-COVID-19 Data Protection Overview .......... 145
3 Changes to Data Brought About by COVID-19 .......... 147
4 Challenges of Data Protection in Post-COVID Society .......... 150
5 Solutions .......... 157
6 Conclusion .......... 160
References .......... 161

Federated Learning: Data Privacy and Cyber Security in Edge-Based Machine Learning .......... 169
Jonathan White and Phil Legg
1 Introduction .......... 169
2 Understanding Federated Learning .......... 170
3 Data Privacy and Cyber Security in Federated Learning .......... 175
4 Case Study: Federated Learning for Intrusion Detection Systems .......... 178
5 Open Issues and Future Trends .......... 187
6 Conclusion .......... 188
References .......... 189

Emerging Computer Security Laws and Regulations Across the Globe: A Comparison Between Sri Lankan and Contemporary International Computer Acts .......... 195
S. Y. Rajapaksha, L. G. P. K. Guruge, and S. L. P. Yasakethu
1 Sri Lankan Context .......... 195
2 International Context .......... 201
3 Comparison: Sri Lankan Computer Crime Act with Foreign Computer Crime Acts .......... 212
4 Suggestions and Conclusion .......... 214
References .......... 215

Legal Considerations and Ethical Challenges of Artificial Intelligence on Internet of Things and Smart Cities .......... 217
Nisha Rawindaran
1 Introduction .......... 217
2 IoTs in Cyberspace .......... 218
3 Internet of Things (IoTs) .......... 219
4 IoT and AI/ML .......... 221
5 IoT Security and the Human Factor .......... 222
6 IoT and the Cookie Monster .......... 224
7 IoT and Covid-19 Impact .......... 227
8 IoT-Perceptions of Security and Privacy .......... 230
9 IoT and Smart Cities .......... 233
10 Conclusion .......... 236
References .......... 237

About the Editors

Chaminda Hewage is a Reader (Associate Professor) in Data Security in the Cardiff School of Technologies at Cardiff Metropolitan University, UK, where he is also the Associate Dean of Research for the school. He is the founder and director of the Cybersecurity and Information Networks Centre (CINC). He received a B.Sc. in Engineering (First Class Honours) in Electrical and Information Engineering from the Faculty of Engineering, University of Ruhuna (Sri Lanka) in 2004, and a Ph.D. in Multimedia Communications from the University of Surrey (UK) in 2009. He was awarded the Gold Medal for best performance in Engineering by the University of Ruhuna for his achievements in undergraduate studies at the General Convocation held in 2004. After graduation, he joined Sri Lanka Telecom PLC (Sri Lanka) as a Telecommunication Engineer (2004). In September 2005, he was awarded the Overseas Research Scholarship (ORS) by the Higher Education Funding Council for England (Universities UK) to pursue his Ph.D. at the University of Surrey, UK. Upon completion of his PhD, he worked as a researcher at the University of Surrey, UK, and Kingston University, London, UK. In 2015, he joined Cardiff Metropolitan University, UK, as a Senior Lecturer. In 2014, he received a Postgraduate Certificate in Higher Education (HE) Teaching and Learning from Kingston University, London. He is a Fellow of the Higher Education Academy (HEA), UK, a Senior Member of the IEEE, and a Member of the Chartered Institute of Information Security (CIISec).

Yogachandran Rahulamathavan is a Senior Lecturer and the Programme Director for the MSc in Cyber Security and Data Analytics at Loughborough University's London campus in the UK. Yoga obtained his PhD from Loughborough University in 2012 in mathematical optimisation techniques for information processing. His research interest is in developing novel security protocols that advance machine learning techniques to solve complex privacy issues. His current focus is on post-quantum encryption techniques for developing privacy-preserving machine learning algorithms. Dr Rahul is currently coordinating a UK–India project between Loughborough University London, IIT Kharagpur, India, and City, University of London. He is a Senior Member of the IEEE and an Associate Editor of the IEEE Access journal.

Deepthi Ratnayake is a university academic, researcher, and author with nearly 30 years of experience in industry, defence, and academia in the areas of cybersecurity, networking, and information systems management. She currently serves as a Senior Lecturer in Computer Science (Cyber Security and Networks) and is a member of the Cyber Security Research Group within the School of Physics, Engineering and Computer Science (PECS) at the University of Hertfordshire, UK. Deepthi is also an executive officer of the Information Security Specialist Group of the British Computer Society (BCS-ISSG), an associate editor of Information Security Journal: A Global Perspective, and a cybersecurity columnist for the BCS ITNow quarterly magazine. Deepthi obtained her PhD on Probe Request Attack Detection in Wireless LANs using Intelligent Techniques. Her research interests are intruder detection and prevention using intelligent techniques, ISM automation, and security in humanoids. She has delivered several academic and professional talks, has published in refereed national and international conferences, and has authored high-impact journal articles. She also regularly contributes to policy advice and reaches wider audiences through her articles on current hot topics. She is keen on bridging the gap between business needs, cybersecurity education, and cyber-crime prevention, and has organised many collaborative events involving university academics and students, professional bodies, industry, and government agencies.

Post-Covid-19 Metaverse Cybersecurity and Data Privacy: Present and Future Challenges

Vinden Wylde, Edmond Prakash, Chaminda Hewage, and Jon Platts

V. Wylde · C. Hewage · J. Platts
Cardiff School of Technologies, Cardiff Metropolitan University, Cardiff, UK
e-mail: [email protected]; [email protected]; [email protected]

E. Prakash
Research Centre for Creative Technologies (R&I), University for the Creative Arts, Farnham, UK
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
C. Hewage et al. (eds.), Data Protection in a Post-Pandemic Society, https://doi.org/10.1007/978-3-031-34006-2_1

1 Introduction

This chapter (see Fig. 1) examines the SARS-CoV-2 (Covid-19) pandemic and the lessons learned from the urgent deployment of technological solutions such as tracking and tracing apps via the cloud, 5G networks, Big Data (BD), the Internet of Things (IoT), smartphones, sensors, and other devices. The authors emphasise the analysis of partnerships between technology and policy to enable consistency, proportionality, and transparency in service delivery. Current models of Data Provenance are not in full compliance with the General Data Protection Regulation 2016/679 (GDPR); the chapter therefore sets out changes to these models that add high-level granularity oversight (i.e., metadata) for inter-controller audits, and highlights the lack of appropriate safeguards available to the Data Controller when legally transferring data. Moreover, during the pandemic NHSX paid Serco £400 million and recruited an additional 21 private subcontractors, one of which, Intellin, required all employees to use geo-tracking (to mitigate working abroad and GDPR violations). Of BD's four V's, Veracity is the dimension that concerns data quality and accuracy, and it plays a core role in attaining user trust: high-quality data enables better decisions about resource and deployment strategies for a given problem [1, 2].

Fig. 1 Research and development

Blockchain (BC) and Smart Contracts (SC) are a proposed method of executing and demonstrating GDPR transparency across IoT and 5G networks, smart devices, and the cloud [3, 4].


At a higher level, however, the authors research the GDPR, Digital Services Act (DSA), and Online Safety Bill (OSB) regulatory frameworks to propose a User Interface (UI) to aid in the auditing of organisations, institutions, and companies.

Among the lessons learned from Covid-19, the unprecedented global public health emergency, coupled with the mandatory governmental requirement for the public to stay at home, put users' information and data privacy at a heightened risk of bias and violation. In [5], for example, the authors consider integrity violations involving social media posts, individuals, groups, and advertisers on social networks, which were potentially used as a vehicle for violating policies, including the present and future exploitation of children (i.e., grooming). The UK's draft OSB (set to proceed in spring 2023) was drafted in response to these factors, on the back of the European Union (EU), European Commission (EC), and European Parliament (EP) DSA (which comes into force on January 1st 2024). The DSA was delivered on the 16th of December 2020, in line with the European Digital Strategy of "Shaping Europe's Digital Future" [6], and was published as two main legislative proposals in the form of the DSA and the Digital Markets Act (DMA). As messaging services head toward end-to-end encryption, the responsibilities of Big Tech (BT) companies and the balance between privacy and encryption of private posts are briefly discussed.

In addition, [7] notes how everyone is talking about the Metaverse, and asks "...what about the monitorless 3D internet in practice?". According to an online study of 151 German managers from marketing and marketing-related occupations, conducted in spring 2022, less than 9% of the respondents were familiar with the Metaverse term, yet 78% of respondents had heard of the term in passing.


However, given the paradigm shift created in the wake of the Covid-19 pandemic and the way we consequently conduct social interactions, the Metaverse is becoming a necessity for the not too distant future and a major next step in the development of the internet [8]. Metaverse activities and interactions include professional, financial, commercial, leisure, gaming, and health interventions such as surgery [9]. For doctor consultations, for example, interactions can happen via the Metaverse from the other side of the world through Telemedicine, which is valuable in areas with shortages of medical professionals [10]. The Metaverse can help train health professionals and assist sufferers of conditions such as agoraphobia in overcoming traumatic experiences [9]. Health professionals such as psychiatrists and psychologists already utilise VR in aversion therapy, which helps patients interact in situations of anxiety [10].

An article by [11] pertains to the DSA's requirements for BT platforms (i.e., Instagram and Facebook [Meta], and YouTube [Google]) in the assessment and management of systemic risk arising from their services. This includes risks such as the spread of misinformation and the advocacy of hatred, and means that, in a "watershed moment" for internet regulation, BT companies will have to present annual independently verified audits [12], give platform access to civil society, regulators, and third-party researchers [13], and present insights into algorithmic "black-box" accountability, thus enabling greater oversight and scrutiny.

Chapter Structure

• Objective of this paper: This chapter builds upon the contributions of [1–4, 14–19], focusing specifically on the granular, layered developments of, and analysis relating to, the conceptual framework "EU/EC GDPR Audit Mechanism: VDaaS" [17]. This includes developments on data privacy and cybersecurity components for operating securely and safely in the Metaverse. Here, additional architectural mechanisms are developed to augment effective and proportionate decision making for both Cloud Service Providers (CSPs) and consumers within emerging digital ecosystems (i.e., the Internet, Metaverse, and Digital Economy). This is undertaken through the analysis of partnerships and governance between technology and policy, with insights from the interactions of principles, rights, and freedoms within society.

• Problem Description: As identified in the literature, this chapter aims to address and highlight subject areas such as cybersecurity, data protection, and Metaverse challenges, whilst analysing lessons from the unprecedented global Covid-19 pandemic and the swift deployment of emerging technologies. Currently, there is a lack of societal legal awareness, adequate user controls, ethical safeguarding, and robust, trustworthy notification validation mechanisms to help empower both the Data Controller and the Data Subject in the safe and secure use of Information Communication Technologies (ICT).

• Chapter Contributions: The contributions of this chapter include: current public health awareness and applications; social media platform precautionary control measures and obligations; cybersecurity and data privacy implications; the interoperability of digital assets; and the complexities of current and future regulatory framework governance.


We systematically select and survey secondary literature and related works from IEEE Xplore, ACM Digital Library, MDPI, BMJ, SAGE, SSRN, arXiv, and Springer, together with grey literature including commercial, institutional, and governmental guidance documents and regulatory frameworks.

Section 1 introduces an overall data collection and sharing architecture deployed between partnerships, in line with present legal instruments and technical requirements. This brings into focus the culture of data sharing practices between institutions, sub-contractors, BT companies, social media platforms, groups, and individuals, concerning data quality in effective decision making for the attainment of user trust in service delivery. As Covid-19 restrictions meant escalating interactions with the Metaverse, with a heightened risk of information bias and data privacy violation, future instruments such as the EU's new DSA and the UK's OSB will apply pressure to BT companies to present annually verified audits regarding their data protection practices. Blockchain and Smart Contracts are proposed as a mechanism to validate compliance with present and future legal instruments, with requirements to bring about aspects such as algorithmic accountability and justice for harmful online behaviour.

Section 2 sets out an overview of the evolving nature, conception, development, popularity, and utility of online platforms in the Metaverse through immersive Virtual Reality (VR) experiences. A brief snapshot is presented to highlight various instances of the types of data interactions and transactions that take place between the virtual and physical world. These instances present significant challenges for present and future technologies, legal frameworks, and overall societal cohesion concerning product consumption.

Section 3 presents a series of societal Metaverse–Information Communication Technology (ICT) challenges, including the interoperability and ownership of digital assets, with accountability thus promoting transparency; cybersecurity and data privacy instances that pertain to the legal standing of Metaverse avatars; and the ramifications of intermediaries and publishers in the proliferation of hate speech, misinformation, and disinformation. Next, the authors highlight the health ramifications for internet users, and what constitutes a breach of civil or criminal law in terms of abusive behaviours and harms towards women and children in the Metaverse. The authors bring attention to current and future iterations of legal frameworks that will contribute to augmenting principles such as consistency, legal clarity, and responsibility, as a consequence of the Covid-19 pandemic and in the not too distant future.

Section 4 emphasises decentralised protections such as attaining user age verification, locating a user or avatar, and the provision of appropriate notification services when transferring data internationally. This includes highlighting that development is needed for NFTs in the transference of digital assets to attain relationship/user trust.


The authors promote and analyse regulation to suggest product safety via granular options on products from inception, and then highlight Decentralised Autonomous Organisation (DAO) controls that could be built upon for legal certainty. Should an avatar have legal standing in society? Policymakers should take the initiative in ensuring that online intermediaries consider the cybersecurity and platform communication impacts of VR. Moreover, devices are encouraged to move towards GDPR, DSA, and OSB compliance whilst striking a balance between Metaverse user experience and mitigating data misuse.

Section 5 looks at the mitigation of data leakage and identity theft, and at maintaining effective security controls in VR and AR devices from product inception. This includes the means of data flow controls and interoperability for legal certainty, as the GDPR may be inadequate for challenges such as AI and unconscious behaviours. The authors analyse the DSA and shed light upon the future legal standing of VR avatars and crypto-digital assets, including cyber-enforcement, health outcomes, cyber-educational attainment, and pre-existing directives that may be too ambiguous to meet modern-day challenges. Finally, the authors survey the GDPR, DSA, and OSB to draw inferences from the literature with regard to establishing common rules, complementary legislation, and verification, and they establish the lack, thus far, of a duty of care for BT platforms. Imposing a duty of disclosure, for example, could be promoted and implemented homogeneously across all networks at local, regional, and international levels, encouraging data sovereignty, trust, and corporate, social, and business responsibility.

Section 6 demonstrates the need for, and the development and operation of, the VDaaS (Vinden Data as a Service) architecture (see Fig. 3), a cloud-based UI/software tool to manage and analyse data from the evidence presented. Risk factors are identified, including an emphasis on the data storage standpoint, AI governance that gives biased outcomes, and the distinguishing of harmful from unlawful content. Metrics are proposed to update the notification, recording, and consent attributes of the VDaaS UI. If made more effective and robust, these systems could augment relationships such as that between NHSX and Intellin, with consensus achieved through a decentralised BC network. However, the open-standards nature of the Metaverse requires more investigation of cross-chain and NFT interoperability challenges, and a current lack of skills and personnel is compounding the implementation of these solutions and advances. A minimal sketch of the record-keeping idea behind VDaaS follows below.
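To make the notification, consent, and records mechanism sketched for VDaaS more concrete, the following is a minimal illustrative sketch in Python. The chapter does not publish an implementation, so all class and field names here are assumptions; the sketch only demonstrates the tamper-evidence property that an append-only, hash-chained record store provides, on top of which a decentralised BC network would add consensus.

```python
import hashlib
import json
import time

class ConsentLedger:
    """Append-only, hash-chained store for consent/notification events.

    Each record embeds the hash of its predecessor, so any retroactive
    edit breaks the chain and is detectable on audit -- the tamper-evidence
    property that a blockchain-backed records system builds on.
    """

    def __init__(self):
        self.records = []

    def append(self, subject_id: str, controller: str, event: str) -> dict:
        prev_hash = self.records[-1]["hash"] if self.records else "0" * 64
        record = {
            "subject_id": subject_id,   # pseudonymous data-subject identifier
            "controller": controller,   # data controller logging the event
            "event": event,             # e.g. "notified:international_transfer"
            "timestamp": time.time(),
            "prev_hash": prev_hash,
        }
        payload = json.dumps(record, sort_keys=True).encode()
        record["hash"] = hashlib.sha256(payload).hexdigest()
        self.records.append(record)
        return record

    def verify(self) -> bool:
        """Recompute every hash; False means the recorded history was altered."""
        prev_hash = "0" * 64
        for record in self.records:
            body = {k: v for k, v in record.items() if k != "hash"}
            if body["prev_hash"] != prev_hash:
                return False
            payload = json.dumps(body, sort_keys=True).encode()
            if hashlib.sha256(payload).hexdigest() != record["hash"]:
                return False
            prev_hash = record["hash"]
        return True
```

In a real deployment the chain would be replicated across the decentralised network the authors describe, so that no single party (e.g., a sub-contractor such as Intellin) could quietly rewrite its own audit trail.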

2 Previous Work

This year, tech giants are investing heavily in the Metaverse. Meta (the Messenger, Instagram, and Facebook group), for example, has committed to investing $10 billion in the technology at its Facebook Reality Labs, a division responsible for the software, hardware, and content aspects (i.e., Oculus Quest headsets and virtual/mixed reality experiences).


To understand the opportunities created by these new technologies, it is important to identify key aspects and to remember that, although the idea is over 30 years old, the Metaverse is still in its early stages of development, even though entertainment companies, real estate, and retail are already capitalising on the Metaverse domain [20, 21].

As a successor to the mobile internet, the Metaverse concept (originating in the 1992 science fiction novel Snow Crash by Neal Stephenson [22]), typically described as an embodied version of the internet (see Fig. 2), has recently gained immense popularity [23]. Whereas we traditionally navigate the internet with a mouse cursor and keyboard, the Metaverse utilises novel technologies and solutions facilitated by Augmented Reality (AR), Virtual Reality (VR), and Extended Reality (XR), envisioned and powered by Artificial Intelligence (AI), BC technologies, and 5G peer-to-peer interactions. Online platforms such as Sandbox [23] and Decentraland [24] show the potential of decentralised tools first deployed in the Metaverse. These new service provisions, supported by decentralised ecosystems, blur the distinction between where the physical and virtual worlds begin and end. In realising the Metaverse, investment from tech giants such as Facebook (re-branded as "Meta" on the 28th of October 2021), Niantic [25], and Microsoft (with Mesh [26]) demonstrates commitment and signals development towards the Metaverse vision [27]. For example, in the evolution of the online virtual games world, AR, VR, and haptic technologies (i.e., the XR haptic glove [sensory: touch]) give virtual and physical experiences of a "lite-type" Metaverse on Massive Multiplayer Online (MMO) platforms such as Second Life [28] and Minecraft [29], while Fortnite [30] and Roblox [31] have recently facilitated online concerts that generated millions of views.

Fig. 2 Blockchain council 2022 [35]


As AR, VR, XR, and haptic technologies enable users to experience the Metaverse, they open up new opportunities in physical services such as remote surgery, facilitated by the VR Modelling Language (VRML), which controls the physics, animation, properties, and rendering of virtual assets/objects so that users experience smooth Metaverse transitions [32]. Moreover, AI is leveraged to support the efficient rendering of assets/objects, User Generated Content (UGC), and chatbots. An example is presented by Epic Games with the MetaHuman project, which quickly generates life-like digital characters that are deployed by Virtual Service Providers (VSPs) and utilised as conversational assistants in the Metaverse [33]. At the haptic/sensory level, XR devices provide a more immersive, realistic, and therefore better experience of the Metaverse, as these devices record large amounts of biometric and spatial data, such as bystanders' physical space and surroundings [25, 34].

3 Research Challenges

Cybersecurity

Due to the circulation of immense amounts of data in the Metaverse, ever-increasing risks arise for users in how this data can be utilised. Concerns regarding the connections between the Metaverse and the dark web are also prominent, prompting calls for an online criminal justice system to limit and prevent illegal online activities [36]. Additionally, researchers are questioning how offline criminality is treated in comparison with its virtual counterparts [37]. Current cyber-challenges, including hacking, malware attacks, and phishing, will increasingly continue and extend to Metaverse devices and avatars [38]. Therefore, the protection and integrity of avatars are of central interest with regard to any new forms of criminality, including malicious Smart Contracts (SCs) [39], illicit use of crypto-assets and currencies [40], and the creation and selling of fake Non-Fungible Tokens (NFTs) [41]. Moreover, the multi-layered structures of the Metaverse will present challenges concerning virtual crimes (i.e., sex offenders, terrorist groups, organised criminals, and hackers), as these layers will augment criminals' ability to hide via untraceable NFTs and encryption, making them extremely difficult to identify and pursue legally [36]. Consequently, security considerations and solutions need to be included in the ongoing development and operation of the Metaverse [42].

Avatar Integrity

An interoperability issue for an avatar means that there are risks of misuse, duplication, and identity theft.


Although BC can help (i.e., through decentralisation) [42], it cannot bridge the gap between social engineering and the targeting of online human behaviours [43]. Moreover, implementing a decentralised identification network, for example, may create excessive data that further increases the risk of avatar accounts becoming more vulnerable to cyber-attacks [44]. In addition, can punishment and misbehaviour regulations apply in the Metaverse? Horizons by Meta has encountered some of these issues in the misconduct of virtual avatars that utilise its channels to sexually harass other avatars [45].

Device Security

Research has shown that the characteristics of Metaverse-enabled devices present serious risks of sensitive data breaches, including facial-movement and voice-control data, all of which can be easily reproduced [46]. Via VR, for example, access to a user's consciousness and emotions [47] gives hackers the means to manipulate a potential victim's body and psyche [48]; hackers can also control what the victim is hearing and seeing, thereby creating serious security breaches (e.g., seeing inside their office or bedroom).

Data Privacy

Exerting an inevitable impact on our lives and society, the Metaverse is changing the way we trade online assets via online games, as users create avatars and trade virtual assets and accessories. However, the technologies utilised in creating and maintaining the Metaverse present privacy and governance concerns: sensing devices, for example, while providing a more immersive and realistic quality of experience (QoE), can be a threat to security, privacy, and user safety [8, 24, 25, 34]. Privacy regulations and practices should be transparent to all users of the Metaverse; however, as more users venture into the Metaverse, the attack surface presented and the potential for data leakage pose additional challenges. For example, biometric information, including gait, gaze, and heart rate, reveals psychological aspects of a given user [28], and as the Metaverse reflects our society [23], the governance of user behaviour [29] highlights regulatory compatibility challenges. While the data ascertained from users can be utilised to enhance user QoE, the introduction of stringent privacy laws, including the GDPR, means that the Metaverse will need to evolve alongside the preservation of user privacy. For example, there are challenges for XR platforms (i.e., system frameworks and device manufacturers) in providing effective privacy protections that apply consistently across all platforms and entities in the Metaverse.
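One concrete way a device vendor could "evolve alongside the preservation of user privacy" is to perturb biometric telemetry on-device before it ever leaves the headset. The sketch below applies the standard Laplace mechanism from differential privacy, a technique covered elsewhere in this book rather than prescribed by this chapter, to a heart-rate reading; the bpm range and epsilon values are illustrative assumptions.

```python
import numpy as np

def dp_heart_rate(value_bpm: float, epsilon: float = 1.0) -> float:
    """Report a heart-rate sample under local differential privacy.

    The raw value is clipped to a fixed physiological range so that the
    sensitivity (the maximum influence of one reading) is bounded, then
    Laplace noise with scale sensitivity/epsilon is added on-device.
    Smaller epsilon means stronger privacy but noisier data.
    """
    lo, hi = 40.0, 200.0                  # assumed plausible bpm range
    clipped = float(np.clip(value_bpm, lo, hi))
    sensitivity = hi - lo                 # one reading shifts the output by at most this
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return clipped + noise

# Example: a service can still learn population-level trends without any
# single user's true reading ever leaving the device.
print(dp_heart_rate(72.0, epsilon=0.5))
```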


Data Collection

In the collection of user data, it is in the best interest of Big Tech (BT) to promote and encourage users to spend more time online. However, access to sensitive data, including emotional reactions, could enable forms of profiling that cause harm (especially for vulnerable groups) [49], such as voter manipulation [50] and users potentially losing control over their life decisions [51]. Similarly, increasing state surveillance practices also give governments access to this shared Metaverse data [49]. Moreover, in the Metaverse workplace, employees participate in meta-enabled VR environments and simulations, thus generating employee physiological data. This could lead to challenges such as intrusive employer surveillance and perceptual experiences replacing reflexive decision-making, leading in turn to biased automated outcomes (i.e., in training and performance evaluation [52], and inequities in hiring [51]) [53].

User Consent

Whilst operating in the Metaverse, ascertaining user consent and providing privacy notices becomes challenging. For example, does this apply to each user/entity separately, to a particular Metaverse, or to the Metaverse in its entirety? [9]. The GDPR mandates that consent must be given for each purpose [54] and be unambiguous (GDPR Articles 4(11) and 6(1)) [55], which may be impractical when attending an online auction or concert; in addition, a user's data will be gathered in a much wider fashion in the Metaverse experience [56]. Unfortunately, there are claims that the collection of consent will be almost impossible, as data collection becomes continuous and involuntary [49].
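As a sketch of what per-purpose, unambiguous consent (GDPR Articles 4(11) and 6(1)) might look like as a data structure, the following illustration refuses processing by default and scopes every grant to one named purpose. The class and field names are assumptions for illustration, not a compliance recipe.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ConsentRecord:
    """One explicit, purpose-scoped consent grant or withdrawal."""
    purpose: str        # e.g. "direct_marketing", "eye_tracking_analytics"
    granted: bool
    timestamp: datetime

@dataclass
class SubjectConsent:
    subject_id: str
    records: list = field(default_factory=list)

    def grant(self, purpose: str) -> None:
        self.records.append(ConsentRecord(purpose, True, datetime.now(timezone.utc)))

    def withdraw(self, purpose: str) -> None:
        # Withdrawing must be as easy as granting (GDPR Article 7(3)).
        self.records.append(ConsentRecord(purpose, False, datetime.now(timezone.utc)))

    def may_process(self, purpose: str) -> bool:
        """Deny by default; only the latest record for this exact purpose counts."""
        for record in reversed(self.records):
            if record.purpose == purpose:
                return record.granted
        return False

user = SubjectConsent("avatar-0042")
user.grant("event_attendance")
print(user.may_process("direct_marketing"))   # False: no consent for that purpose
```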

Direct Marketing

With the use of geolocalisation and emotional responses in the Metaverse, direct marketing issues arise: users will be offered products and services based on their reactions and behaviours. In addition, the selling (or sharing) of data with third parties requires active and freely given consent under the GDPR [54], and these requirements must be maintained amid the increasing use of subliminal advertisements in the Metaverse [57]. Researchers have also found that, at a granular level, companies can obtain eye-tracking data for targeted advertising, meaning that special attention is required for the protection of vulnerable groups (i.e., children, via effective age verification) to mitigate the risk of personal data being provided unknowingly [9].


Data Intermediaries

In the collection of user data, data intermediaries serve as the links that bridge the connection between entities and people [58]. However, AI-enabled data agents can also be utilised to make decisions about users' data permissions and therefore need special attention [59]. The EU has opted for a more human-centric approach in the draft AI Act to help mitigate such unwelcome developments [60]. Additionally, in the draft Data Governance Act, the EU has established a consent control and data sharing management framework for people utilising data intermediary services, including personal data spaces/data wallets [61].

Health

If utilised excessively, the Metaverse can contribute to mental health disorders in the form of loneliness and a reduction in physical activity, which may ultimately compound other physical health conditions such as obesity; as a consequence, users may feel the desire to escape further from the real world. As a form of escapism, addictions to online gaming and social media are already established; the Metaverse reinforces these types of addiction and may also cause eye, head, and neck fatigue, dizziness, nausea, and motion sickness [62]. Moreover, the distractions caused whilst using the Metaverse can lead to harmful accidents involving the user or objects (i.e., furniture) in their local vicinity [63].

Content Moderation

In terms of content moderation challenges, the use of VR and AR in a virtual space raises issues such as hate speech, verbal assault, defamatory content, misinformation, and pornographic content (i.e., modelled on avatars). When users operate in the Metaverse via their avatars, situations can arise that could potentially constitute breaches of civil and criminal law. For example, on Meta's social media platform (i.e., Facebook), cases of sexual harassment against women have already been documented [64]; even though the incidents take place in VR, the person targeted often experiences feelings of a violating and real nature. Research has also pointed out that Metaverse platforms are a breeding ground for disinformation [65] and serve as a conduit for the promotion and expansion of extremist ideologies [66]. In relation to current EU content moderation amendments, there are questions surrounding whether their impact is applicable in addressing harmful and illegal Metaverse content challenges. For example, there are arguments [67] that the DSA will "likely" apply to businesses and developers that operate in the Metaverse [68], and that the draft AI Act [69] "may" regulate exploitative, manipulative, and subliminal techniques, including the use of biometrics [70].


As the topic of VR is not explicitly mentioned in the draft AI Act, the DSA, or the EU's forthcoming emerging technologies framework, further amendments to EU law are still needed to keep online users safe [71]. For companies such as Facebook [72], the development of a self-regulation approach to moderating Metaverse content has been viewed with skepticism [73].

Children

The Center for Countering Digital Hate reports that children are exposed through some social apps to pornographic content, racism, bullying, harassment, and abuse. This is highly problematic, as a child's sense of responsibility and reality is less developed [74]; a global survey found that 34% of respondents had, as children, been asked to do something online that was sexually explicit and uncomfortable. Unfortunately, the same survey indicates that the age at which these types of occurrences happen is getting lower [75]. For the Metaverse to be utilised alongside the real world, solutions need to be found for preserving traditional social relationships, such as child-to-child, caregiver-to-child, and teacher-to-child [76]. It is therefore especially important to identify user ages more effectively, and for parents to monitor their child's Metaverse activities responsibly, indeed not an easy task in VR [77]. The DSA proposal also protects minors and public health in the virtual world [78].

User Education

As a learning motivator, the Metaverse can give children immersive experiences for understanding the real world around them and how it operates, including hands-on experiences and the exploration of history and/or places they may not have been able to visit before [79]. However, these interactions, facilitated via a user's Metaverse avatar, although pseudo-realistic, should not replace actual human interaction [76]; and if the Metaverse remains unregulated, it could potentially contribute to significant childhood harms [80].

Economy

In the Metaverse (not to be confused with Web3) virtual environment, there are wide-ranging ways to undertake harmful and illegal practices and behaviours. The nature of the Metaverse presents many new challenges in relation to protecting Intellectual Property (IP) rights, misleading advertising practices, combating illegal and harmful practices, and addressing liabilities. In the Metaverse space, the boundaries are still relatively unknown; a main concern is therefore to address, prevent, and control this phenomenon [67].


Ownership

For the tracking, sale, validation, and proof of ownership of digital goods in the Metaverse, cryptocurrencies (i.e., Ethereum and Bitcoin) and NFTs are expected to underpin commercial transactions [77]. NFTs, for example, provide uniqueness and authenticity for virtual goods ownership and facilitate a peer-to-peer decentralised trading environment. Proof of ownership is reflected via NFTs (BC-enabled cryptographic assets) [81]: when a digital item is purchased in the Metaverse (i.e., a virtual decoration, avatar clothing, or an avatar itself), the purchase is recorded on a BC (i.e., an immutable, decentralised digital platform for the secure storage and recording of information and transactions), where the transaction/information cannot be altered or deleted. When a purchase is made in the real world (i.e., a physical item), the NFT is linked with the customer and the item. Major global brands have started to create new business models [82] for customers to buy physical items via this method.

In the Metaverse, there are ongoing legal and regulatory developments and disputes regarding the extent of NFT and ownership rights. Currently, ownership of Metaverse assets is governed by contractual law as opposed to property law [83]; many legal issues therefore arise [84] in terms of verification of ownership, which could give private Metaverse platforms contractual control advantages over key aspects of digital assets. With no clear regulatory frameworks for NFT ownership, NFTs can be created and sold without an owner's knowledge or permission, and there are various other ways to misuse NFTs, including unlawful access to digital wallets (storing NFTs/crypto-assets), hacking, malware, and fraudulent scams [85]. In addition, other legal issues concern money laundering and gambling when digital currency is virtually exchanged between avatars [86].
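To illustrate the ownership mechanics described above, here is a simplified, non-authoritative model of NFT-style bookkeeping: each token has exactly one owner, only the current owner can transfer it, and every event is appended to an auditable history. Real NFTs are implemented as smart contracts on a blockchain (e.g., the ERC-721 standard on Ethereum); this Python sketch only mirrors the record-keeping logic.

```python
class SimpleNFTRegistry:
    """Toy model of NFT ownership: one owner per token, auditable history."""

    def __init__(self):
        self.owner_of = {}   # token_id -> current owner address
        self.history = []    # append-only log of mints and transfers

    def mint(self, token_id: str, owner: str) -> None:
        if token_id in self.owner_of:
            raise ValueError("token already exists: uniqueness is the point of an NFT")
        self.owner_of[token_id] = owner
        self.history.append(("mint", token_id, None, owner))

    def transfer(self, token_id: str, sender: str, recipient: str) -> None:
        # Only the current owner may transfer -- the check a smart contract enforces.
        if self.owner_of.get(token_id) != sender:
            raise PermissionError("sender does not own this token")
        self.owner_of[token_id] = recipient
        self.history.append(("transfer", token_id, sender, recipient))

registry = SimpleNFTRegistry()
registry.mint("avatar-jacket-001", "0xAlice")
registry.transfer("avatar-jacket-001", "0xAlice", "0xBob")
print(registry.owner_of["avatar-jacket-001"])   # 0xBob
```

On a real chain the history is what cannot be altered or deleted; here it is append-only merely by convention, which is precisely the guarantee the BC consensus layer adds.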

Advertising In an immersive technological context, advertising strategies [87] are ever-increasing; however, [88] highlights the risks associated with the psychological effects of consumer manipulation, and notes that the impact of Metaverse advertising practices on the consumer is not yet clear. In the Metaverse environment, popular brands face challenges from the unauthorised use of registered trademarks [89], and experts state that IP enforcement presents significant challenges [90].

Portability and Interoperability Currently, there are no over-arching mechanisms to facilitate Metaverse environment interoperability or portability; for proprietary digital assets, each environment/platform therefore has to link its own NFTs accordingly [83]. This means that customers face challenges in transferring digital assets (i.e., avatars and property) from one virtual world to another [83]. As Metaverse content is replicated and distributed across BC-based platforms and decentralised networks, issues of jurisdiction and applicable law arise, and identifying criminals and their activities also presents major challenges.

Transparency and Accountability: AI At the Metaverse core are Deep Learning (DL) and Machine Learning (ML) AI algorithms and architectures [91], which could increasingly be utilised in the monitoring and tracking of users/customers. These technologies further augment a company’s active role and abilities (in real time) [92] in the Metaverse, which can reinforce these negative impacts. In this new phase of the “digital revolution”, inherent biases (i.e., in AI and ML outcomes) may lead to unfair automated decision-making; for example, [93] notes that algorithm ubiquity is causing increasing concern due to possible reputational and financial damage, out of which a new industry in the auditing and assurance of algorithms (to validate associated AI and ML algorithms) is growing.
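
As a concrete illustration of what such algorithm auditing can involve, the sketch below (all names and the 0.8 threshold are illustrative assumptions, not a standard drawn from the cited works) computes a demographic-parity style check: comparing a model’s positive-outcome rate across user groups.

```python
# Illustrative bias-audit check: positive-outcome rates per group.
from collections import defaultdict


def outcome_rates(decisions):
    """decisions: iterable of (group_label, approved: bool) pairs."""
    totals, approvals = defaultdict(int), defaultdict(int)
    for group, approved in decisions:
        totals[group] += 1
        approvals[group] += int(approved)
    return {g: approvals[g] / totals[g] for g in totals}


rates = outcome_rates([
    ("group_a", True), ("group_a", True), ("group_a", False),
    ("group_b", True), ("group_b", False), ("group_b", False),
])
ratio = min(rates.values()) / max(rates.values())
print(rates, "disparate impact ratio:", round(ratio, 2))
# An auditor might flag the model if the ratio falls below 0.8
# (the commonly used "four-fifths rule"; threshold assumed here).
```

A production audit would add statistical significance testing and further fairness metrics; the point is that such checks can be run externally, supporting the emerging assurance industry described above.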

Laws and Regulation Landscape

EU/UK General Data Protection Regulations Although many people will participate in the Metaverse via avatars and equipment such as VR headsets for an immersive experience, large collections of sensitive personal user data (i.e., biometric, emotional, and physiological data) are gathered, thus requiring the GDPR [54] to ensure explicit user consent and special attention to determine which purposes and data are appropriate [9].

Confidentiality With the immersive experience of the Metaverse, the integration of access points for content and services means that a user’s capacity to withhold personal data from collection becomes significantly diminished. This brings into focus questions of the confidentiality of personal correspondence in the Metaverse, user protection from state and commercial interests, and the redefinition of a private virtual space [53].

Responsibility and Liability Due to the vast number of entities operating in the Metaverse, the web of relationships it creates will make it extremely difficult to apportion liabilities and responsibilities [94]. For example, as a consequence of the highly intermingled Metaverse web, distinguishing between data controller and data processor could be a significant challenge (i.e., who does what on behalf of whom?) [56].

EU Digital Services Act In regard to platform governance, with a focus on mitigating digital economic harms, the EC has recently advanced many important regulatory proposals for the protection of markets and democracies alike [95]. The EC’s goals for cross-sectoral regulation involve the creation of many procedural and enforcement institutions [96]: for example, the DSA [97], the package on political advertising and disinformation [98], and the AI Act [99], to name but a few.

Consistency For consistency with other EU policies and frameworks (as explained in the DSA Explanatory Memorandum [100]), the DSA proposals revise Directive 2000/31/EC (e-Commerce) and update pre-existing rules on platform responsibilities in the provision of digital services. Together with the EC’s earlier Inception Impact Assessment, this aligns with the “Better Regulation” agenda of 2015 [101], which concentrates more on consultation and transparency in law-making whilst utilising increasingly evidence-based approaches to understand legislative impact [102].

Systematisation With regard to the concept of systematisation in EU law, issues concerning coherence and European harmonisation policies and rules are not new [103]. For example, as a result of the subsidiarity, conferral, and proportionality principles on which the EU’s competencies are based (Article 5 TEC [104]) [105], EU action has been mostly sectoral, primarily focusing on partial legislative harmonisation measures (i.e., directives and regulations) and reacting to nuanced market developments (i.e., technology).

Consumers Vs Traders In the DSA Explanatory Memorandum, the justification for cross-sectoral scope shows the limitations of earlier rules: procedurally, they incorporated insufficient procedures for managing illegal content; substantively, they were too focused on narrow issues (i.e., illegal hate speech, copyright, and child abuse material) and on specific platforms (i.e., [audiovisual] sharing platforms) [100]. Given this lack of coherence in EU law, the practical implications of the DSA rules can be seen in Article 22 regarding the traceability of traders: Section 3 of the DSA does not apply to small or micro enterprises, yet Article 22 imposes obligations on platforms intermediating between consumers and traders to obtain identifying trader information [106].

Transparency Pursuing transparency measures for service providers is not a new concept: Articles 5 and 6 of the e-Commerce Directive had already established service provider obligations to disclose information about themselves to public authorities and consumers.

UK Online Safety Bill The internet has become increasingly integrated into communal and individual lives: more than 90% of UK citizens are online [107]. Ongoing social challenges, such as the harassment of minority groups and women, terrorist propaganda, and child abuse, have transformed the internet and made it appear an unsafe place [108]. For example, in the UK this includes 153,383 cases of Child Sexual Abuse Material (CSAM) in 2020 [109]; 21% of UK women have experienced online harassment (i.e., misogynistic abuse) [110]; and around two in every three UK citizens are concerned about the proliferation of fake news [111].

Big Tech Companies At the time of writing, the UK’s OSB is at the report stage in Parliament, and according to [112], the OSB could have significant outcomes globally in relation to internet regulation. A joint committee spent six months analysing and scrutinising the bill, which protects online users from potentially harmful content; non-compliant platforms risk large fines from Ofcom (the Office of Communications: the government-approved regulatory and competition authority for the broadcasting, telecommunications, and postal industries of the United Kingdom). The OSB will impose a “duty of care” on online platforms, for example BT companies such as Twitter and Facebook, which could incur fines of up to around £18m or 10% of annual turnover, including having their sites blocked.

UK Two-Tier System: Liability In the UK at present, the regulation of online speech and its liabilities is fragmented into a complex two-tier system [113]. Tier one comprises mainstream media, anyone with a website (i.e., individual social media users and bloggers), and publishers (i.e., any organisation or individual that publishes online). Tier two concerns the sharing and dissemination of online content, applicable to intermediaries such as Twitter, Instagram, Facebook, many other social media platforms, and search engines (i.e., Bing and Google) [114], further augmented by a variety of voluntary self-regulatory schemes and initiatives.

Publishers In tier one, liability applies at the point of publication (i.e., illegal content: defamatory material, data protection breaches, copyright infringement, and criminal content [including hate speech]). However, utilising the current criminal regime to deal with online speech in this two-tier system causes additional fragmentation and complexity, leading the Crown Prosecution Service (CPS) [115] and the Sentencing Council [116] to issue extensive additional guidance. For example, where social media is not used to commit a substantive offence, consideration is given to utilising the communication offences under section 127(1) of the Communications Act 2003 [117] and section 1(1) of the Malicious Communications Act 1988 [118]. For hate crimes specifically, sections 29–32 of the Crime and Disorder Act 1998 can be utilised, covering harassment [119], public order offences [120], criminal damage [121], and religiously or racially aggravated forms of assault [122].

Online Intermediaries For tier two, the liability of online intermediaries, including social media platforms, is limited due to the “safe harbour” protections offered by Articles 12–15 of Directive 2000/31/EC (e-Commerce) to protect user privacy and free speech. This means that intermediaries are not subject to a duty at inception to ensure that indexed or hosted content is lawful; content liability (i.e., criminal, defamatory, or data protection breaches) can apply only if the intermediary has been notified that it is facilitating illegal content (including hate speech: there is no pre-emptive platform obligation to block unlawful content) and fails to remove the unlawful content expeditiously [123].

Legal Clarity With the increasing number of individuals publishing online content, and with inadequate efforts to address the clarity of online offences, current laws face growing challenges and risk under-criminalising. For example, many harmful and damaging online communications, such as abusive stalking and bullying, evade appropriate sanction [124]. However, universally applying standards such as “grossly offensive” and “indecent” to illegal content does not consider harm in its particular context [125], thus potentially over-criminalising [126]. In response, the UK Law Commission has recommended repealing section 127(1) of the Communications Act 2003 and section 1(1) of the Malicious Communications Act 1988 and replacing them with a consolidated harm-based model [124].

Discussion and Summary

Cybersecurity With the use of Metaverse devices, avatar security is at particularly increased risk. This is due to the sheer amount and circulation of data types (including the transference of malicious code): data leakage, the (decentralised) identification of users, and the production and processing of excessive data put avatar accounts at higher risk. This potentially includes an increased likelihood of identity theft, the development of malicious SCs, cyber-attacks, the compromising of VR device security, and ultimately a hacker gaining control in the Metaverse environment (i.e., a configurable personal space), thus affecting the integrity and security of avatars. However, can punishment for misbehaviour regarding the above offences be applied in the Metaverse? There are additional concerns over the dark web; hence, emphasis is on the restriction of content in the meantime, with the potential commissioning of online criminal justice systems. In addition, how can offline and online activities be considered separately whilst still bringing about legal certainty?

Data Privacy Technologies that create and maintain the Metaverse present regulatory, privacy, and governance challenges. Government and workplace surveillance practices, bias in recruitment (inequities), and company QoE practices, for example, may cause additional user privacy, security, ethical, and safety challenges. Introducing the GDPR means that the Metaverse will need to provide consistent user privacy across all platforms, with challenges in ascertaining user consent and displaying privacy notices (user, entity, Metaverse, Metaverses), as content moderation, VR and AR virtual spaces, hate speech and verbal assault, and avatar use can potentially breach criminal and civil law.

The DSA proposal protects minors and public health in the virtual world; however, the EU’s content moderation amendments (DSA) raise questions about its impact in addressing illegal and harmful Metaverse content (the DSA’s “likely” and “may” wording), which could embolden businesses and developers in potentially exploitative, manipulative, and subliminal advertising techniques. Transparent privacy regulation for the Metaverse is a must (governance of online behaviour and misbehaviour). Special attention is needed to protect vulnerable groups from providing personal data involuntarily (age verification), and to data intermediaries and data collection with regard to the EU AI Act in preventing AI-agent methods, for a more human-centric approach. However, the spread of user information across Metaverse platforms (i.e., online concerts and auctions) may cause traceability issues, leading to continual and involuntary data collection practices that make it difficult to attain user consent. The draft Data Governance Act (consent control, Data Sharing Management Framework) for data intermediary services in data wallets and personal data spaces could be an attractive alternative; however, VR is not explicitly mentioned in the AI Act, the DSA, or the emerging-technologies framework. Further amendments are still needed to keep online users safe, as online intermediaries’ self-regulation has been met with scepticism. This makes GDPR mandates for third parties, and GDPR mandates for each data purpose, extremely challenging to fulfil.

Health To augment a child’s (or any user’s) learning, the immersive Metaverse experience gives expansive access to world history, can take them to places they have never been, and ultimately gives hands-on experience of the environment to which they will contribute in the future. Even though these experiences are undertaken using an avatar, this pseudo-realism should not replace traditional relationships (i.e., teacher-to-child and child-to-child) and general human contact. If used excessively, the immersive experience exposes a vulnerable user to many facets of a currently under-regulated Metaverse, with the potential for significant harm. This includes mental health disorders and distractions that can both harm the user physically and crowd out positive human interactions that would otherwise enhance real-world participation and development. Moreover, the apps available to children involve exposure to online abuses that can further confuse and disorient the development of morality and a sense of responsibility. These exposures are well documented and increasing, meaning that controls such as robust age-verification systems and additional parental due diligence consequently become even more challenging.

Economy Currently in the Metaverse, there are many legal and regulatory disputes surrounding the extent of NFT ownership rights. This is due to a lack of clear mechanisms and legal frameworks, thus leaving them open to misuse. This includes unauthorised use of trademarks and IP challenges, alongside digital assets that are mainly governed by contractual law as opposed to property law. As a user experiences and interacts with the Metaverse, content is replicated across many different decentralised platforms; this presents additional legal jurisdiction issues and susceptibilities in user interoperability and portability between Metaverse environments and platforms, thereby compounding value and transference challenges for proprietary digital assets (avatars and property). Moreover, the use of this data in AI decision-making practices may lead to biased outcomes such as unfairness and financial and reputational damage, whilst simultaneously being utilised as a vehicle for monitoring and tracking customers to augment a company’s ability to advertise. Who is responsible: the avatar, its user, or a server? The effects on the consumer are unknown, and such practices highlight the psychological risks of potential consumer manipulation. This also gives private Metaverse businesses a strategic advantage in the accumulation of digital assets during this legal and regulatory uncertainty; arguably as a consequence, major global brands are creating business models around NFTs and BC.

EU/UK General Data Protection Regulations Large collections of sensitive personal data are gathered from VR/XR equipment when accessing and utilising the Metaverse, meaning that attaining user consent is the main priority with regard to data capture. This remains challenging due to the number of potential access points and the commercial interests of BT and the consumer; protecting confidentiality thus presents additional issues as the user switches between platforms and entities. As the number of Metaverse entities increases, their intertwined web of relationships creates further complexities in terms of liability and responsibility, and in identifying who is the data controller and who is the data processor.

EU Digital Services Act The creation of new procedural and enforcement institutions (i.e., DSA and DMA) means that the EC and EU are shaping the European digital future with strategies that update pre-existing rules. However, further challenges arise from the speed at which new Metaverse challenges emerge; retrospectively applying new law could put the DSA at a disadvantage in keeping up with emerging trends and technological innovation. In addition, as the coherence of EU law is harmonised, additional challenges come from partial legislative attempts that focus too narrowly on a small part of service and market developments. This cross-sectoral scope therefore potentially leaves out other considerations relevant to managing illegal content, traceability, obligations on intermediaries acting on behalf of consumers and traders, and when it is appropriate to report information to the local authorities.

UK Online Safety Bill With communal and individual internet audiences, social challenges continue to produce negative outcomes and thereby paint the Metaverse as a potentially hostile environment. The OSB aims to protect vulnerable groups and the general public by imposing a duty of care upon BT companies, with additional financial and operational sanctions from Ofcom. However, the UK two-tier system demonstrates the challenges of proportionately applying liability for publication and/or data sharing. On one hand, publishers who submit illegal content can utilise a number of defences from established regimes; on the other, online intermediaries can still utilise “safe harbour” protections (e-Commerce). This causes additional duty-of-care and content-liability challenges.

4 Future Research Directions and Potential Solutions for Cybersecurity

Avatar Integrity In terms of mitigating the risks of identity theft, BC could play a critical part in identity authentication, as a decentralised system is more resilient to cyber-attack than a centralised one [42]. Across platforms, the use of a decentralised identification network based upon international standards to enhance account verification and user confidence may be a way to mitigate this problem [44]. Moreover, to avoid harassment in the Metaverse, users could choose to utilise secondary avatars (i.e., clones) to obfuscate data, hiding online behaviour and actions to mitigate the leakage of information such as economic background, demographics, and culture [127]. This method prevents other avatars from identifying the real owner and from inferring additional behavioural information. Also, users may have configurable personal space options, as shown by Meta, which implemented similar options in the Horizon Worlds online social platform [8].
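
As a rough illustration of the decentralised identification idea, the sketch below (assuming the third-party Python `cryptography` package; not any DID standard’s actual API) shows challenge-response verification: an avatar proves control of its key pair without any central account database.

```python
# Illustrative challenge-response identity verification with Ed25519 keys.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# The avatar owner generates a key pair; only the public key is published
# (e.g., anchored on a decentralised registry).
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

# A platform issues a random challenge; the avatar signs it to prove control.
challenge = b"platform-nonce-1234"  # hypothetical nonce
signature = private_key.sign(challenge)

try:
    public_key.verify(signature, challenge)  # raises if the signature is forged
    print("avatar verified")
except InvalidSignature:
    print("verification failed")
```

A full DID scheme adds key registries, rotation, and revocation on top of this primitive, but the core account-verification step works without a central identity database.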

Security Protocols To help mitigate the risk of harmful code transference as users move between virtual spaces and platforms, a main technical challenge will be in building protocols [128]. Due diligence in the supply chain will be of the utmost importance in preserving platform security, meaning that users will need to consider not only their own security measures but also the security measures in place at other Metaverse entities, thus bringing further responsibilities in allocating interoperability [9].

Cyber-Resilience The proposed Network and Information Security (NIS2) Directive could help to further increase the EU’s national security and cyber-resilience capabilities [129]. However, the NIS2 Directive falls short in addressing consumer requirements for products. The AR and VR devices that enable Metaverse experiences are all covered in the proposal for a regulation on general product safety, which, for product protection, requires appropriate security features at product inception [130]. Additionally, the forthcoming Cyber-Resilience Act (CRA) will introduce consumer protections for digital products and ancillary services under common cybersecurity rules [131]. This augments the broader scope of NIS2 (which repeals NIS [132]), already applicable to critical sectors (i.e., medium and large public and private enterprises), to include medical devices, social networking platforms and providers, public electronic communications networks and services, and postal and courier services [133]. Regarding the growth of the Metaverse, investment bank Citi estimates that the Metaverse could account for 1% of the global economy, reaching an estimated $8–$13 trillion by 2030 [134]; therefore, as new companies and businesses become active in the Metaverse, the CRA and NIS2 give added protections for national infrastructure as cybersecurity challenges evolve.

Data Privacy

Data Collection The Future of Privacy Forum states that devices processing private data (i.e., XR hardware) should allow granular options, including controls on data flow from sensors, with visual cues indicating personal data transmission or collection [127]. However, regardless of the inclusion of virtual privacy protection tools (i.e., Horizon Worlds’ privacy bubble), users may not be fully aware of these options or may be unable to utilise them.
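
A minimal sketch of such granular, per-sensor controls is given below; every name is illustrative (this is not a real XR API), but it shows the pattern: per-sensor consent flags defaulting to deny, and a visible cue whenever data actually leaves the device.

```python
# Illustrative per-sensor consent gate with a visual transmission cue.
from dataclasses import dataclass, field


@dataclass
class SensorPrivacyGate:
    # Per-sensor consent flags, default deny (privacy by default).
    consents: dict = field(default_factory=lambda: {
        "eye_tracking": False,
        "heart_rate": False,
        "room_mapping": False,
    })

    def transmit(self, sensor: str, reading):
        if not self.consents.get(sensor, False):
            return None                        # data never leaves the device
        print(f"[cue] transmitting {sensor}")  # stand-in for a visual indicator
        return {"sensor": sensor, "value": reading}


gate = SensorPrivacyGate()
gate.consents["heart_rate"] = True          # user opts in to one stream only
gate.transmit("eye_tracking", (0.4, 0.7))   # blocked: no consent
gate.transmit("heart_rate", 72)             # allowed, with on-screen cue
```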

Metaverse: Open and Decentralised For interoperability and the universal operation of the Metaverse model, open standards [135] based on BC [136] are starting to take root [137], user-controlled via Decentralised Autonomous Organisations (DAOs) [138]. This Metaverse model could be investigated further [139] for addressing data protection challenges that are cumbersome to rectify in a centralised business model. However, there are still data protection tensions between BC and regulation; moreover, to augment legal certainty, researchers recommend the adoption of regulatory guidance with codes of conduct and certification [140].

Data Protection Framework As privacy and data protection frameworks apply in the Metaverse, the EP has called on the EC to ensure compliance with the current privacy and data protection frameworks by entities and companies operating inside the Metaverse [141]. As a consequence, there are calls to update and revise the GDPR [142], which is currently not sufficient to address some of the complexities [143] and challenges [144] the Metaverse presents, including interaction with AI and the regulation of unconscious behaviour [70].

Health To help mitigate adverse Metaverse health risks, the DSA also covers public health, with various proposals and solutions ranging from obligating companies to warn users of possible harm to establishing help centres and including distress buttons [77].

Content Moderation In the immersive environment, to ensure that law enforcement authorities are better equipped to identify and respond to illicit or dangerous content (i.e., defamatory content and non-consensual pornography), policymakers should take the initiative to ensure that online intermediaries consider the communication impacts of VR/AR use on their platforms and the applicable liability laws [145]. Questions also arise as to whether it is necessary to grant avatars legal standing in the Metaverse (i.e., make avatars responsible for their online actions), or to provide identification that differentiates between an avatar, the person who operates the avatar, and whether that person is a “true” legal person [143]. Furthermore, the EU recently proposed legislative acts to meet the challenges of age-inappropriate and illegal online content. The proposal of May 2022 (OSB) contains regulation to combat and prevent online child sexual abuse, obligating providers to detect, report, block, and remove child sexual abuse material from their services; in addition, providers must introduce age-verification mechanisms [146]. Similarly, the EC proposed the European Digital Identity Wallet to help with the identification of age [147]. Also in May 2022, the EC adopted the Better Internet for Kids (BIK) strategy, which builds upon the previous BIK strategy from 2012 [148] to ensure the protection of children from illegal and age-inappropriate content, and to empower and respect children online in parallel with the European Digital Rights and Principles (EDRP) [149, 150]. The EDRP strategy additionally proposes numerous actions for the EU, member states, and industry, including an EU code on age-appropriate design that takes into account the European Parliament’s Children’s Rights Resolution 2021 [9, 151].

Children The provisional political agreement (DSA) states that, to protect minors from exposure to illegal online content, providers of digital services must write their terms and conditions of service so clearly that children can understand them [152]. Moreover, to ensure the security and safety of children that use their services, all online platforms are prohibited from presenting targeted advertising derived from a minor’s personal data [152].

User Education In the fight to protect data, cyber-criminals tend to be innovative and therefore steps ahead of companies and regulators; education on preventative ways to protect assets and identities in the Metaverse will therefore be increasingly important. Moreover, the updated digital education plan expects up to 70% of 16–74-year-olds to have acquired basic digital skills by 2025 [153].

Economy

Ownership Currently, NFTs are not specifically subject to EU regulation; however, some compliance obligations arise via existing legislation and requirements, such as the Anti-Money-Laundering Directive [154] on virtual currency exchanges [155]. The EC’s view, though, is that unregulated crypto-assets expose investors and consumers to unnecessary risk. The EU Directive on Digital Content, for example, is believed to apply to digital assets purchased in the Metaverse; however, assessing whether consumer protection frameworks should be revised for the new virtual environment should be of great interest to policymakers [156]. Additionally, the proposed regulation on markets in crypto-assets could be an influential example for the economic governance of the Metaverse [157].

Advertising In addressing Metaverse implications, there is debate regarding the revision of advertising legislation [9]. For example, experts think that the advertising regulatory framework in the Metaverse will be informed by the current rules applicable to video games [44]. In addition, to clarify the rules within virtual universes, the French advertising authority recently updated its guidelines [158]. Moreover, current trademark law applies to the Metaverse in general, though experts state [89] that it would be more beneficial for the law to utilise specific references to the Metaverse. However, in terms of regulation and transparency, other experts believe that rules should be designed to narrow and limit the scope of emotion-responsive advertising, including restrictions on Metaverse virtual product placement [92].

Data Portability and Interoperability The movement of users, assets, and data between Metaverses brings attention to data portability and interoperability. Companies will need to initiate data-sharing agreements (i.e., companies preferentially assert proprietary rights over user data) in line with data protection requirements, including attaining user consent and undertaking privacy notification obligations; in a decentralised Metaverse model, this may prove challenging [56]. In addition, transferring data internationally will need clarity, alongside addressing jurisdictional challenges and implications such as the location of a user, their avatar, and/or the server [143]. Currently, it is challenging for virtual goods (including NFT-enabled goods) to maintain value outside their native platforms to facilitate trading, whereas BC can reduce reliance on platform centralisation; in terms of preserving the universality and value of such virtual goods, BC can provide the key to establishing economic systems within the Metaverse. However, the conception of the Metaverse comes with escalating data privacy concerns among users [27]. To facilitate seamless Metaverse operation, the multiple different parties involved will need to develop cross-chain technology to ensure secure data interoperability [27].

Transparency and Accountability: Blockchain To help prevent fraud and protect users’ identities in the Metaverse, decentralised technologies such as BC are suggested. One advantage of BC is that there is no interference from centralised institutions in the digital space; due to its heterogeneous architecture, it also enables interoperability of security to a reasonably high level, thus enhancing user trust by providing transparency, traceability, and security [159]. Additionally, SCs, which control prescribed online transactions and are embedded into BC code, can execute an agreed contract once criteria are met [160]. However, due to a lack of certainty regarding SCs [161], in 2020 the EP called upon the EC to reiterate its 2018 resolution and to propose a legal framework to address these issues [162, 163].
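
The following chain-free sketch illustrates that execute-once-criteria-are-met behaviour (plain Python standing in for on-chain contract code; the escrow scenario and all names are illustrative assumptions):

```python
# Illustrative "smart contract" simulation: settlement fires automatically,
# exactly once, when every agreed criterion holds.


class EscrowContract:
    def __init__(self, seller, buyer, price):
        self.seller, self.buyer, self.price = seller, buyer, price
        self.paid = False
        self.delivered = False
        self.settled = False

    def confirm_payment(self):
        self.paid = True
        self._try_settle()

    def confirm_delivery(self):
        self.delivered = True
        self._try_settle()

    def _try_settle(self):
        # Executes only when all criteria are met, and never twice.
        if self.paid and self.delivered and not self.settled:
            self.settled = True
            print(f"releasing {self.price} from {self.buyer} to {self.seller}")


contract = EscrowContract("avatar_seller", "avatar_buyer", "0.1 ETH")
contract.confirm_payment()    # nothing happens yet
contract.confirm_delivery()   # both criteria met: funds released
```

On a real chain the same logic would be deployed as contract code whose execution and state are replicated by the network, which is what removes the need for a trusted central intermediary.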

In response to the proliferation of national approaches and increasing threats, the EC has proposed the Digital Operational Resilience Act (DORA) and the draft regulation on Markets in Crypto-Assets (MiCA) [164, 165]. However, in building EU cyber-resilience, it will be important to address the shortage of cybersecurity professionals, utilising the European Cybersecurity Skills Framework (ECSF) to close the skills gap [166, 167]. As cyber-criminals are particularly attracted to the financial sector, the EP has also called upon the EC to propose legislative changes regarding cybersecurity and ICT requirements to increase the sector’s cyber-resilience [168].

Laws and Regulation Landscape

EU/UK General Data Protection Regulations Meanwhile, without overly compromising the Metaverse experience, researchers are creating solutions that help to ensure devices are GDPR-compliant and safe [169]. For example, [22] discusses the future of IoT and cellular networks in their transition from 5G to 6G. More specifically, decentralised AI can provide better models and solutions for 5G/6G services (i.e., IoT communications) without compromising privacy through an untrustworthy network, by keeping the data within the IoT device. In addition, ML can provide solutions for sharing insights rather than private information with third parties, as more accurate and up-to-date models are created with larger data quantities.
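
A toy sketch of the keep-data-on-device idea follows (illustrative Python; a one-parameter “model”, here simply a mean, stands in for real training): each device computes a local update and shares only parameters, which the server combines by federated averaging, so raw readings never leave the device.

```python
# Illustrative federated averaging: devices share parameters, not raw data.
import statistics


def local_update(readings):
    """Each IoT device trains locally; a one-parameter 'model' (a mean)."""
    return statistics.fmean(readings)


def federated_average(local_params, weights):
    """Server combines parameters weighted by each device's sample count."""
    total = sum(weights)
    return sum(p * w for p, w in zip(local_params, weights)) / total


device_data = [[0.9, 1.1, 1.0], [2.0, 2.2], [1.5, 1.4, 1.6, 1.5]]
params = [local_update(d) for d in device_data]   # raw data stays on-device
weights = [len(d) for d in device_data]
global_model = federated_average(params, weights)
print(round(global_model, 3))                     # only the aggregate is shared
```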

Confidentiality In the future, as the Metaverse presents more access points to content, providing an opt-out would not be practical; hence, regulatory solutions for both industry and government are needed [92].

Responsibility and Liability In addition, ongoing Metaverse challenges regarding the storage, handling, and safeguarding of data, and responsibility for data misuse and theft, also need to be addressed [142].

EU Digital Services Act Both the DSA and DMA aim (1) to promote fundamental rights in digital services and (2) to promote technological innovation in the European Single Market and beyond [100], via the establishment of common rules for digital service providers.

Consistency The horizontal approach of the DSA is to complement, and therefore leave unaffected, a series of existing legislative instruments [100], including Directive (EU) 2019/2161 (Omnibus Directive) [170] and Directive (EU) 2018/1808 [171]. However, many aspects of the DSA remain unclear as to how digital enforcement will work in practice [95].

Systematisation Although the DSA Explanatory Memorandum acknowledges the lack of systematisation in past efforts, current endeavours do not fundamentally improve the situation. For example, instruments include Regulation (EU) No 524/2013 (Online Dispute Resolution (ODR)) [172], Directive 2005/29/EC (Unfair Commercial Practices Directive (UCPD)) [173], and Directive 93/13/EEC (Unfair Contract Terms Directive (UCTD)) [174], all of which remain applicable to unlawful content and are equally cross-sectoral instruments in the context of consumer protection [95].

Consumer Vs Trader Additionally, under the procedural frameworks, platforms are to make “reasonable efforts” to verify the reliability of the information provided. Generally, platforms are not required to disclose this information to their users (i.e., consumers and traders); however, Articles 22(5) and 9 stipulate that the obtained information must be made available to national authorities [96].

Transparency The Consumer Rights Directive (CRD) also requires that online traders disclose information prior to the conclusion of any contract. In addition, Article 6a was introduced into the CRD by the Omnibus Directive specifically to impose duties on platforms with regard to consumers who initiate or engage with third parties whilst utilising the platform’s architecture [96].

UK Online Safety Bill Already seen as a step in the right direction for protecting vulnerable groups, the UK OSB builds upon the DSA whilst customising it for better intermediary service and enforcement delivery. However, there are issues pertaining to compatibility with the UK’s inherent legal system.

Big Tech Companies An article by Coe [175] states that, in the development of an online harms regime in the UK and elsewhere, placing responsibility only on platforms to mitigate hate speech may benefit societies; however, in regard to free speech, such laws could have insidious implications.

UK Two-Tier System: Liability Whilst this system appears fragmented and complex, the two tiers could potentially be harmonised to deliver certainty effectively and to set a robust and decisive UK framework in the future.

Publishers Under tier one, if a social media communication is deemed a substantive offence, an offence under section 127 of the Communications Act 2003, or an offence under section 1 of the Malicious Communications Act 1988, then, where there is hostility towards individuals and groups (i.e., on grounds of disability, transgender identity, sexual orientation, religion, or race), section 66(2) of the Sentencing Code [176] requires that, in determining a sentence, judges and magistrates treat the hostility characteristic of hate crimes as an aggravating factor [177].

Online Intermediaries Moreover, although being able to detect unlawful content beforehand could be useful, Article 15 forbids courts from making intermediaries undertake general monitoring of potentially unlawful content [178].

Legal Clarity Moreover, depending upon the outcome of the Law Commission’s recommendations, at this time very little has changed with regard to the plethora of criminal offences and the communications revolution. This is perhaps partly because section 127(1) of the Communications Act 2003 is based upon section 10(2)(a) of the Post Office (Amendment) Act 1935; criminal laws predating the mobile telephone and the internet are therefore inconsistent in their modern-day application [124].

5 Discussion

Cybersecurity In mitigating the likelihood of identity theft, BC is a more resilient technology that can play a role in identification and authentication; unlike the centralised approach, it is less susceptible to cyber-attack. For cross-platform utility, and to enhance account verification and user confidence, a decentralised identification network based upon international standards may be a good mitigation strategy. Unlike a centralised system with digital identity stored in a database (i.e., where users have no control), Self-Sovereign Identity (SSI) concepts are emerging as an increasingly popular, reliable, and secure identity solution, meaning that SSI users can take control of their identity details (digital wallet) and consent in a decentralised way (with additional SSI principles) [179, 180]. For example, in land registry systems, challenges include the inadequately coordinated exchange of information across government departments, potentially enabling unscrupulous officials to modify records [179]. Here, a BC-based SSI can be utilised to mitigate fraud by verifying all participants in a transaction [181]. In addition, utilising SSI for identity verification in event ticketing systems provides an effective ticket-to-visitor bind, thus enabling secondary-market control [182]. Alongside due diligence in the information supply chain, a user should consider their own and other Metaverse operators’ security measures, highlighting the interoperability challenge in allocating responsibilities. The proposed NIS2 Directive could help advance EU national security and cyber-resilience agendas; however, in addressing consumers, the Directive remains insufficient. In addition, the AR and VR devices that facilitate the Metaverse user experience are all covered by a proposal for a regulation on general product safety, which, for product protection, requires additional security measures at product inception. The forthcoming CRA, meanwhile, will address consumer protections for digital products and ancillary devices. To avoid data leakage (i.e., culture, economic background, and demographics) and harassment, secondary avatars (clones) can be deployed to obfuscate data and mask online behaviours and actions; this prevents other avatars from ascertaining the real owner and from making behavioural inferences. As shown in Horizon Worlds by Meta, configurable personal space options may also be useful for mitigating data leakage.

Data Privacy The Future of Privacy Forum states that devices processing private data (i.e., XR hardware) should allow granular options that control data flow from sensors, with visual notification of data transmission or collection. However, even with virtual privacy protection tools included (i.e., Horizon Worlds’ privacy bubble), users may not be aware of these options or able to utilise them. Fortunately, open standards for the interoperability and universal operation of the Metaverse model are taking root with BC, controlled by DAOs. This model could be investigated further for addressing data protection challenges that are cumbersome to rectify in a centralised business model. Additionally, there are still data protection issues between BC and regulation; to augment legal certainty, researchers recommend the adoption of regulatory guidance with codes of conduct and certification. As data protection and privacy frameworks apply in the Metaverse, the EP has called upon the EC to ensure the compliance of entities and companies operating in the Metaverse with current privacy and data protection frameworks. However, despite calls to revise it, the GDPR as it currently operates in the Metaverse is not sufficient to deal with the evolving complexities and challenges, including AI (i.e., ChatGPT, ChatSonic, Jasper AI, and Bard AI), for example in the regulation of unconscious behaviour.

Health For mitigating Metaverse health risks, the DSA encompasses public health measures that obligate companies to provide help centres, distress mechanisms, and health warnings. In addition, it further empowers authorities and law enforcement in more effective identification and response regarding online platforms, which further encourages policymakers to ensure that intermediaries are considerate of their actions (i.e., communications), product impact, and liabilities (i.e., AR/VR). Moreover, the question of granting an avatar legal standing, to differentiate between a legal person, AI, and cyber-criminals, needs to be addressed to protect health, identities, digital assets, and data. Preventative educational methods are available; however, these will have to increase significantly (i.e., with the Digital Education Plan) for the necessary and forthcoming legislation to be fully utilised. For example, the EC recently proposed legislative acts setting out numerous actions for EU member states and industry to meet age-inappropriate and illegal content challenges. These include age-identification provisions (i.e., the European Digital Identity Wallet), age-appropriate product design (i.e., the Children’s Rights Resolution 2021), and additional online child protections (i.e., BIK). Further measures are also proposed to obligate providers to detect, report, block, and remove harmful material, making sure that providers introduce age-verification mechanisms in line with the EDRP.

Economy Regulation of the Metaverse is a slow, uncertain, ambiguous, and ongoing process. For example, the EU Directive on Digital Content is “believed” by experts to be applicable to the Metaverse (purchased digital assets); this places emphasis on policymakers to determine whether current consumer protection frameworks need revising. Where current trademark law applies, it would be more beneficial to reference the term “Metaverse” when creating new legislation. Moreover, the number of proposals and legislative changes being put forward shows that proprietary rights and contractual law regarding digital assets remain an ongoing challenge. For example, the EC’s view is that unregulated crypto-assets expose investors and consumers to unnecessary risk, and NFTs are not subject to specific EU regulation. However, directives such as the Anti-Money-Laundering Directive (virtual currencies) do apply, and there is an additional draft regulation, Markets in Crypto-Assets (MiCA), which may be influential for the time being. Additionally, in the transference of digital assets, these concerns place more emphasis on companies to undertake data-sharing agreements (companies prefer asserting proprietary rights over user rights) in line with data protection requirements (attaining user consent, undertaking privacy notification obligations), which in a decentralised model remains very challenging. However, SC governance is being considered between the EP and EC in proposing a legal framework that should influence current and future data-sharing agreements. To help mitigate fraud (i.e., protect user identities), decentralised technologies are recommended: an advantage is that there is no interference from centralised institutions in the digital space (heterogeneous architecture), and they enable interoperability of security, thus enhancing user trust by providing transparency and traceability. There is debate around revising advertising legislation. Advertising regulatory frameworks in the Metaverse are said to be modelled on those for video games; however, the French Advertising Authority (FAA) has recently updated its guidelines in recognition of the ongoing challenges. Whilst commending the FAA on its guidance update, in terms of regulation and transparency this should also be used to narrow the focus upon emotion-responsive advertising and to restrict Metaverse virtual product placement. Transferring data internationally needs clarity in addressing jurisdictional challenges and implications (location of user, avatar, and/or server). This causes additional challenges for virtual goods (i.e., NFT-enabled) to maintain value outside their native platforms in order to trade effectively in the future. In the Metaverse, BC is key to establishing robust economic systems and can provide the agility to rely less upon centralisation in universally maintaining virtual digital asset value across platforms. However, multiple parties will need to develop cross-chain technologies to ensure seamless Metaverse operation (i.e., interoperability). For example, for online transaction legality and certainty, SCs can execute terms and agreements once criteria are met; due to a lack of SC legal clarity, the EP has asked the EC to propose a legal framework for smart contract governance. The EP has also asked the EC to propose legislative changes for cybersecurity requirements regarding ICT. In response to national threat proliferation approaches, the EC has proposed the Digital Operational Resilience Act (DORA). However, to implement these immense challenges, the EU will need to address the skills shortage in cybersecurity professionals and utilise the European Cybersecurity Skills Framework (ECSF) to close the skills gap.

EU/UK General Data Protection Regulations Researchers are continually looking for ways to reconcile the GDPR with the private operation of Metaverse devices. Regulatory solutions for industry and government are perceived as a remedy to protect confidentiality and encourage innovation and corporate/social responsibility.

However, the storage, handling, and safeguarding of data also need much work to augment and realise these ongoing efforts.

EU Digital Services Act Establishing common rules for digital service providers (the DSA and DMA) is a promising idea; however, complementing an already existing body of legislation presents enforcement challenges in practice. Additional guidance must therefore be presented and harmonised to give consumers and traders a comprehensive view in preparation for their Metaverse interactions. Moreover, in providing verification of business transactional information, it would be more helpful to give clients an option for full information disclosure. This would encourage better transparency practices between consumers and traders, and help to build trust between local authorities, traders, and third parties.

UK Online Safety Bill A main consideration in the implementation of the OSB is its effect on societal free speech. Due to the confusion of liability between publishers and intermediaries, placing responsibility on BT platforms to self-regulate creates the need for an external audit mechanism to differentiate and verify between the monitoring of potentially unlawful content and the publication of potentially offensive social media communications. Unfortunately, little has changed in this bill to facilitate legal clarity and accountability, or to deliver justice alongside the evolving nature of potential Metaverse criminality. Taking all the above into account, investigative duties regarding the DSA and transparency are highlighted in the platform’s obligation to retrieve trader information. The e-Commerce Directive and the CRD obligate traders to disclose this information to consumers; however, they contain no verification requirement, whereas the DSA specifies that platforms are obligated to take reasonable verification steps. In other words, after a platform gathers its information, the DSA does not impose a duty of disclosure unless requested to do so by public authorities (with appropriate procedural processes) [96]. Even though the DSA approach differs from the transparency rules of the previous legislation mentioned above, inconsistencies remain regarding consumer instruments (i.e., the UCPD and CRD). Moreover, it could be argued that existing liability systems taint the rationale and purpose of Directive 2000/31/EC (e-Commerce). This may be because, as shown in the UK tier-one regime, the system is unable to effectively target online publishers of harmful content due to the practicalities of the online environment [183]: with social media and the nature of the internet, the frequency and method of communication and the number and types of content/publishers that share and disseminate possibly harmful content have changed at a fundamental level [124]. Potentially, these examples warn of the over-criminalisation of speech and prevent the assessment of speech from being appropriately contextualised against any objective criminal legal standard, which could overwhelm the criminal justice system [124]. Taking into account the varying types and quantities of online publishers that operate across numerous jurisdictions, pseudonymously, anonymously, and at alarming publication frequency, locating and identifying said publishers would be challenging [184, 185]. In addition, Directive 2000/31/EC (e-Commerce) restricts liability for online publishers, and social media platforms in particular have frequently demonstrated a lack of regard for their self-regulation commitments. As the British Parliament considers the OSB, work on its purpose, to create “a new regulation regime to address illegal and harmful content online”, is ongoing [186].

6 Conclusion

Given the evolving legal responsibilities and obligations of BT and the unprecedented volume of internet traffic that significantly increased as a result of the Covid-19 pandemic, user privacy and information bias (i.e., widespread misinformation, disinformation, and advocacy of hatred) are under further threat. However, these same circumstances provide opportunities for the development and implementation of a BC external audit mechanism to record and verify BD veracity via a 5G/6G network, further augmenting the distinguishing of unlawful, offensive, and harmful content via AI. With regard to the GDPR, however, the storage, handling, and safeguarding of data need more work (i.e., storage specification), including addressing current inadequacies in AI governance pertaining to algorithmic bias and unconscious behaviour. Additionally, the same technology could be utilised successfully between contractors and subcontractors in maintaining data quality, accuracy, and accountability (i.e., geo-tracking by NHSX and Intellin) prior to any deployment of contractual obligations facilitated by SC and IoT architectures (i.e., the Track and Trace App). However, in the development of the Metaverse environment, there is uncertainty regarding the empowerment of a digital avatar, its legal standing, and the differentiation between a true living person, an AI bot, and cyber-criminals, in protecting user identities, assets, and data. This creates ongoing challenges regarding accountability and transparency. Indeed, for mitigating the likelihood of identity theft and fraud, BC is a resilient disruptive technology that can play a vital role in effective identification and authentication (centralised vs decentralised) and provide a robust economic system with agility across platforms and less susceptibility to cyber-attack. Consequently, with no interference from centralised institutions, BC enables interoperability; SC consensus could enhance user trust for transparency and provide traceability (i.e., agreement, location, server, avatar) in the balance between various aspects of user privacy and encryption. However, for the universal operation of the Metaverse, open standards with BC should be investigated further, with a decentralised identification network based upon international standards. Moreover, this will require more development of cross-chain technologies to ensure seamless operations; for example, transferring data internationally warrants further clarity in addressing jurisdictional challenges. This can ultimately also help NFT virtual goods in the supply chain to maintain value in future interoperability between platforms. For this to be realised, additional harmonised guidance is needed for consumers and traders in preparation for Metaverse interactions. For example, the EU Directive on Digital Content needs more certainty regarding its applicability to the Metaverse (purchased digital assets), and future legislation should utilise the term “Metaverse”, including better definitions of what constitutes the Metaverse. Nonetheless, crypto-assets still need further clarity and protections, including a shift in priorities from proprietary rights to user rights in line with data protection (i.e., attaining user consent, undertaking privacy notification obligations). Finally, the EP has already instigated conversations with the EC to propose a legal framework for SC governance and additional legislative action on cybersecurity measures regarding all ICTs. However, the current lack of skills and personnel makes implementation, alongside current national threat proliferation approaches, very challenging.

Recommendations

Policy and Notification

Researchers recommend the adoption of regulatory guidance with codes of conduct and certification. This could mean providing verification mechanisms in business transactions and the ability to opt for full disclosure, encouraging transparency and building trust between local authorities, traders, and third parties. Likewise, given ongoing evidence of neglect in social media platform self-regulation, Local Authorities (LA) could allow for the voluntary imposition of permanent disclosure within an LA area, whilst providing specialist training centres and utilising prescribed and accredited verification tools. This could be implemented similarly to the DSA investigative duties to be placed upon traders, consumers, intermediaries, and publishers concerning robust verification mechanisms prior to network admittance. The scheme could go further to promote data sovereignty and, by attracting schools and universities, produce wide local, regional, and global support whilst generating inter-competitiveness for the safest and most trustworthy operation and delivery of the internet/Metaverse infrastructure. This would ultimately help in targeting and verifying publisher compliance; for example, warning notices of jurisdiction, potentially harmful content, and a list of accredited publishers with their locations could be utilised as future metrics in BC verification architectures. For societal cohesiveness, this may also require revising advertising legislation, with a narrow focus upon emotionally responsive advertising to restrict Metaverse product-placement practices, including age verification mechanisms (OSB) whilst granting additional child protections.
This could be offset by allowing for granular options (i.e., GDPR privacy-by-design principles) to control data flows from sensors and XR hardware, with additional security measures implemented at product inception. For example, to maintain due-diligence practices during the transfer of code between platforms, a notice of the user-interaction security protection level is advised, together with a cross-platform configurable space for mitigating digital data leakage. Incidentally, the enforcement of product-impact liabilities needs additional codes of practice so that authorities can encourage intermediaries' proactive duty-of-care contributions, such as SC controls that bridge the LA, EP, and EC in executing legal frameworks, upholding data-sharing agreements, and ensuring the legality of online transactions by executing terms and agreements once a set of criteria is met. Finally, to support transparency and accountability principles (cybersecurity and data protection), the terms and conditions of platform access could be simplified, with separate sections for adults and children. This also means that the confusion of the British two-tier system could benefit from clearly defined liability mechanisms between intermediaries and publishers, alongside reconsidering assessment methodologies for contextualising speech against an agreed minimal legal standard. This will provide more overall internet/Metaverse clarity locally, regionally, and internationally, upholding the obligations of CSPs for the legal detection, reporting, blocking, and removal of harmful material.

VDaaS: Notification, Consent, Policies, and Records System Update

As shown in [4] and [19], VDaaS is a cloud-based set of tools for verifying overall data pipeline traffic and for augmenting effective and proportionate decision-making for both CSPs and consumers. Its development was informed by an analysis of SMEs, partnerships, and governance between technology and policy, with insights from the interactions of principles, rights, and freedoms within society. There, the authors identify experiments to be undertaken in the detection, capture, processing, and storage of data, achieved through the design of a BC framework architecture that isolates packet data against cybersecurity, data privacy, and transparency objectives. Later, the authors in [17] discuss attaining consent (GDPR) and ensuring clarity with cookie banners (DSA), and propose an integrity-violation detection and audit framework for social media posts, advertisers, individuals, and groups, in an effort to balance the privacy of posts with encryption. This chapter's iteration of VDaaS focuses on the granular, layered development of intermediary security and privacy governance, provenance, audit compliance (notification, records, and policies), and analysis relating to emerging digital ecosystems (i.e., the Metaverse), as shown in Fig. 3.


Fig. 3 Update from previous [17] iteration

Data Provenance and Integrity

It has long been recognised that ICTs have significant impacts upon the social and economic aspects of societies; there can be direct and indirect harms to users and societies, and it is therefore necessary to ensure system safety, reliability, and trust [15]. Here we present an updated conceptual architectural framework to bolster user rights and freedoms with respect to demonstrating and verifying overall data pipeline traffic (including the Metaverse), intermediary security and privacy governance, provenance, and, ultimately, audit compliance. We apply controls for assessing technique efficacy in identifying data/content violations, and later include further methods of integrity enforcement (e.g., covering the potential acquisition of large collections of sensitive personal user data [biometric, emotional, and physiological data] and undertaking inter-controller auditing).
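To make the provenance idea concrete, the sketch below shows one minimal way a hash-chained audit log could record pipeline events so that retrospective tampering is detectable. It is an illustrative sketch of the BC-style integrity principle, not the VDaaS implementation itself, and all names (`AuditLog`, `append`, `verify`) are hypothetical.

```python
import hashlib
import json
import time

class AuditLog:
    """Minimal hash-chained log: each entry commits to its predecessor,
    so altering any stored record breaks the chain and is detectable."""

    def __init__(self):
        self.entries = []

    def append(self, event: dict) -> None:
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        record = {"ts": time.time(), "event": event, "prev": prev_hash}
        payload = json.dumps(record, sort_keys=True).encode()
        record["hash"] = hashlib.sha256(payload).hexdigest()
        self.entries.append(record)

    def verify(self) -> bool:
        prev_hash = "0" * 64
        for record in self.entries:
            body = {k: record[k] for k in ("ts", "event", "prev")}
            payload = json.dumps(body, sort_keys=True).encode()
            if record["prev"] != prev_hash or \
               record["hash"] != hashlib.sha256(payload).hexdigest():
                return False
            prev_hash = record["hash"]
        return True

# Hypothetical inter-controller events captured from the data pipeline
log = AuditLog()
log.append({"controller": "platform-A", "action": "collect", "category": "biometric"})
log.append({"controller": "platform-B", "action": "share", "category": "location"})
assert log.verify()  # remains True until any stored entry is altered
```

In a deployed system the chain head would be anchored on a public ledger, so that no single controller can rewrite the history it has already reported.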

Data Veracity and User Safety

Data veracity concerns the reliability and quality of data: where it was collected, how it was collected, and how it will be utilised in analysis [2]. Metadata veracity (uncertainty/quality) auditing applies to institutions, organisations, and companies. The OSB will impose a "duty of care" on online platforms, and Articles 5 and 6 of the e-Commerce Directive had already established service providers' obligations to disclose information about themselves to public authorities and consumers when undertaking privacy notification obligations; such obligations could be transposed from legal frameworks via SC OPCODES. Surveys indicate that consumers and the public are
unaware of their data output value; however, BC provides a method of ensuring that the availability of data is always verifiable and trustworthy [16].

Cybersecurity

As shown in GDPR and SC policy, cryptography as a cybersecurity control method gives transparency, protected agency, and responsibility to the public, financial markets, business professionals, and legal representatives in conducting valid and transparent actions or investigations on behalf of the directorate or client [4, 19]. Transferring data internationally will need clarity, alongside addressing jurisdictional challenges and implications such as the location of a user, their avatar, and/or their server (GDPR Territorial Scope, Article 3): a traffic management system could investigate origin (i.e., port 80: HTTP) [1, 4]. Outlier detection could be utilised for detecting fraudulent transactional data and network intrusion [1]; a minimal sketch follows the list below. A Network Location and Consent Manager (integrating access points for content and services so that a user has the capacity to withhold personal data collection) would surface:

• Awareness: BT Advice and Guidance to Secure Computer Account
• Awareness: Changing Network/Access Points
• Awareness: Changing location
• Warning: Inconsistent Jurisdictional Standard
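The outlier-detection point above can be illustrated with a standard isolation forest; the transaction features and contamination rate here are assumptions chosen purely for the example, not a production intrusion-detection configuration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Illustrative features per transaction: [amount, hour-of-day, km-from-home]
normal = rng.normal(loc=[50.0, 14.0, 5.0], scale=[20.0, 4.0, 3.0], size=(500, 3))
fraud = rng.normal(loc=[900.0, 3.0, 400.0], scale=[100.0, 1.0, 50.0], size=(5, 3))
X = np.vstack([normal, fraud])

# contamination = assumed share of anomalous transactions/traffic
detector = IsolationForest(contamination=0.01, random_state=0).fit(X)
flags = detector.predict(X)  # -1 = outlier (candidate fraud), 1 = inlier
print(f"{int((flags == -1).sum())} transactions flagged for review")
```

The same pattern applies to network-intrusion features (ports, packet rates, origins) in place of transaction features.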

AI Training Data

In real-case scenarios and applications of BD methodologies, a main issue is the misspelling of training samples, which can significantly affect statistical classification accuracy [1]. For example, this includes relabelling the same cookie many times for different and/or contradictory purposes, along with additional undeclared, sometimes unclassified, cookies (a minimal consistency check is sketched after the list):

• Awareness: Potential detection of AI Bot
• Awareness: Potentially misleading cookie expiration times
• Awareness: Potential incorrect cookie category assignments
• Awareness: Deletion of Emails and Contacts (deleting and updating data)
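As a lightweight illustration of the relabelling problem, the sketch below scans a hypothetical cookie registry for the same cookie declared under contradictory purposes or with inconsistent expiry times; the registry format and category names are assumptions for the example.

```python
from collections import defaultdict

# Hypothetical declarations: (cookie_name, declared_purpose, expiry_days)
declared = [
    ("session_id", "strictly_necessary", 1),
    ("session_id", "advertising", 365),  # same cookie, contradictory purpose
    ("prefs", "functional", 180),
    ("prefs", "functional", 30),         # inconsistent expiry declarations
]

purposes, expiries = defaultdict(set), defaultdict(set)
for name, purpose, days in declared:
    purposes[name].add(purpose)
    expiries[name].add(days)

for name in purposes:
    if len(purposes[name]) > 1:
        print(f"Warning: '{name}' declared for contradictory purposes: {sorted(purposes[name])}")
    if len(expiries[name]) > 1:
        print(f"Awareness: '{name}' has inconsistent expiry declarations: {sorted(expiries[name])}")
```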

Data Privacy

This layer concerns integrity violations involving social media content and Personally Identifiable Information (PII); an off-chain storage sketch follows the list below.

• Data Modification and Deletion (Articles 16–18): provision of an off-chain mechanism that stores PII and non-PII
• Default Protection by Design (Article 25): privacy protections for XR (system frameworks and device manufacturers)
• Controllers'/Processors' Responsibilities (Articles 24, 26, and 28)
• Lawfulness and Principles (Articles 5, 6, and 12)
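One way the first bullet's off-chain mechanism could work is sketched below: the PII itself lives in mutable, erasable off-chain storage, while only a salted digest is committed to the append-only ledger, so an erasure request (Articles 16–18) can be honoured without rewriting immutable history. The structure and names are illustrative assumptions.

```python
import hashlib
import os

offchain_store = {}   # mutable PII store: can honour erasure requests
onchain_ledger = []   # append-only: salted digests only, never raw PII

def record_pii(subject_id: str, pii: bytes) -> None:
    salt = os.urandom(16)
    offchain_store[subject_id] = salt + pii
    # Only a salted commitment touches the immutable ledger
    onchain_ledger.append(hashlib.sha256(salt + pii).hexdigest())

def erase_pii(subject_id: str) -> None:
    # Right to erasure: drop the off-chain record; the remaining on-chain
    # digest is unlinkable once the salt and data are gone
    offchain_store.pop(subject_id, None)

record_pii("user-42", b"avatar biometric template")
erase_pii("user-42")
assert "user-42" not in offchain_store and len(onchain_ledger) == 1
```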

Due Diligence and Best Practices

Logging and Reporting (deployment of sound and/or visual cues): maintain the information supply chain and limit the scope of emotionally responsive advertising, to include health warnings and restrictions on Metaverse virtual product placement; for children that use their services, all online platforms are to be prohibited from presenting targeted advertising derived from a minor's personal data. A minimal logging sketch follows the list:

• Warning: Eye, head, and neck fatigue, dizziness, nausea, and motion sickness
• Warning: Viewing of Profile (log: tracking metric/produce records [transparency])
• Warning: Violating Behaviours (passive, invasive, and duplicitous)
• Warning: Unauthorised Account Access by a Partner/User
• Warning: Distress 'buttons'
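One plausible shape for the "Viewing of Profile" item is a small event recorder that both produces a transparency record and raises the corresponding user-facing cue; the event schema and function names are assumptions for illustration.

```python
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
TRANSPARENCY_RECORDS = []  # auditable trail: who viewed what, and when

def on_profile_view(viewer: str, subject: str) -> None:
    record = {"ts": time.time(), "event": "profile_view",
              "viewer": viewer, "subject": subject}
    TRANSPARENCY_RECORDS.append(record)        # produce record (transparency)
    # In XR this would trigger a sound/visual cue rather than a log line
    logging.info("Warning cue -> %s: your profile was viewed by %s",
                 subject, viewer)

on_profile_view(viewer="avatar_193", subject="avatar_007")
```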

Avatars

At the user-interface level it can be unclear whether an avatar represents a user, another entity, or a secondary avatar. To mitigate the likelihood of identity theft, and to avoid data leakage (i.e., culture, economic background, and demographics) and harassment, secondary avatars (clones) can be deployed to obfuscate data and mask online behaviours and actions. To mitigate the risk of harmful code transference as users move between virtual spaces and platforms, creating and maintaining security protocols presents additional technical challenges.

User Consent and Age Verification

Consent Management (GDPR Article 7) ensures explicit user consent, with special attention to determining which purposes and which data are appropriate when attaining that consent; a minimal sketch follows the list below. Similarly, the EC has proposed the European Digital Identity Wallet to help with age identification.

• Warning: Adult Operational Policy/Child Operational Policy (separate adult and child terms and conditions: clear definitions of data collection and storage methods)
• Warning: Age and Consent (Verification)
• Awareness: User Account: Child/Adult
• Warning: Full Disclosure Mode
• Awareness: Granular options, including controls on data flow from sensors (create user awareness of high levels of granularity)
• Awareness: Personal data transmission or collection in VR
• Awareness: Potential emotional recording of data in VR
• Awareness: Potential profiling
• Awareness: Data Sharing (pictures: consent for children or vulnerable people)
• Awareness: QoE opt-out, end-to-end encryption, advertising
• Warning: Potential sharing of data with third parties
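A minimal sketch of purpose-bound consent in the spirit of Article 7 follows: consent is recorded per purpose with a timestamp (as proof of when it was given), is checked before processing, and can be withdrawn as easily as it was granted. The schema is an assumption for illustration.

```python
import time

class ConsentManager:
    """Purpose-bound consent: granted per purpose, checkable, withdrawable."""

    def __init__(self):
        self._consents = {}  # (user, purpose) -> timestamp of grant

    def grant(self, user: str, purpose: str) -> None:
        self._consents[(user, purpose)] = time.time()

    def withdraw(self, user: str, purpose: str) -> None:
        self._consents.pop((user, purpose), None)  # as easy as granting

    def allowed(self, user: str, purpose: str) -> bool:
        return (user, purpose) in self._consents

cm = ConsentManager()
cm.grant("user-7", "personalised_advertising")
assert cm.allowed("user-7", "personalised_advertising")
cm.withdraw("user-7", "personalised_advertising")
assert not cm.allowed("user-7", "personalised_advertising")
```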

Tokenization-Validation

As with Track and Trace, it is not just a matter of deploying technology to mitigate risk and enable consistency, transparency, and proportionality in service delivery; it is also a matter of trust in, and the scalability of, partnerships and technology [3]. With no clear regulatory framework for NFT ownership, NFTs can be created and sold without an owner's knowledge or permission. However, with the verification of VDaaS traffic and data cleaning, an issued user Unique ID could be utilised (similar to tokenization for vaccines [3]), and NFTs (BC-enabled cryptographic assets) could be used for validation within a strict rotating time period, as sketched below.
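The rotating time-period idea can be sketched with an HMAC token bound to a coarse time window, in the spirit of TOTP (RFC 6238): the token validates only during its window (plus a small grace window for clock skew) and then expires automatically. The window size and key handling are assumptions.

```python
import hashlib
import hmac
import time

WINDOW = 300  # assumed validity window, in seconds

def issue_token(secret: bytes, user_id: str) -> str:
    window = int(time.time() // WINDOW)
    return hmac.new(secret, f"{user_id}:{window}".encode(), hashlib.sha256).hexdigest()

def validate_token(secret: bytes, user_id: str, token: str) -> bool:
    window = int(time.time() // WINDOW)
    # Accept the current window and the immediately previous one (clock skew)
    for w in (window, window - 1):
        expected = hmac.new(secret, f"{user_id}:{w}".encode(), hashlib.sha256).hexdigest()
        if hmac.compare_digest(expected, token):
            return True
    return False

secret = b"shared-verification-key"  # assumed to be provisioned securely
token = issue_token(secret, "avatar-owner-9")
assert validate_token(secret, "avatar-owner-9", token)  # valid within its window
```

An NFT-based variant would anchor the rotating commitment on-chain rather than deriving it from a shared secret.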

Limitations

Additional requirements may be necessary to regulate the consistency, quality, and security of information disseminated at a global scale. This will typically depend upon national and international agreements regarding directives and legal frameworks for carrying out interventions and management on behalf of the public. Additional risks continue post-pandemic in regard to the cloud, BD, and AI in the evolution, effectiveness, and delivery of fundamental and strategic planning and of accurate real-time information [14]. However, VR is not explicitly mentioned in the draft AI Act, the DSA, or the EU's emerging-technologies framework. Additionally, the Center for Countering Digital Hate is looking for better ways to effectively identify user ages and to help parents responsibly monitor their child's Metaverse activities.

Conflict of Interest

The authors declare that they have no conflicts of interest.


References 1. V. Wylde, E. Prakash, C. Hewage, J. Platts, Data cleaning: challenges and novel solutions. In AMI - The 4th Advances in Management and Innovation Conference 2020 (Cardiff Metropolitan University, 2020) 2. V. Wylde, E. Prakash, C. Hewage, J. Platts, Data cleaning: challenges and novel solutions for big data analytics and visualisation. In 3MT RITA - The 8th International Conference on Robot Intelligence Technology and Applications 2020 (Cardiff Metropolitan University, 2020) 3. V. Wylde, E. Prakash, C. Hewage, J. Platts, Covid-19 crisis: is our personal data likely to be breached? In AMI 2021 - The 5th Advances in Management and Innovation Conference 2021 (Cardiff Metropolitan University, 2021) 4. V. Wylde, N. Rawindaran, J. Lawrence, R. Balasubramanian, E. Prakash, A. Jayal, I. Khan, C. Hewage, J. Platts, Cybersecurity, data privacy and blockchain: a review. SN Comput. Sci. 3(2), 1–12 (2022) 5. A. Halevy, C. Canton-Ferrer, H. Ma, U. Ozertem, P. Pantel, M. Saeidi, F. Silvestri, V. Stoyanov, Preserving integrity in online social networks. Commun. ACM 65(2), 92–98 (2022) 6. Shaping Europe’s Digital Future. https://ec.europa.eu/info/strategy/priorities-2019-2024/ europe-fit-digital-age/shaping-europe-digital-future_en. Accessed 27 Jul 2022 7. Research Report: Managers’ view on the Metaverse. https://www.philipprauschnabel.com/ en/research/research-report-managers-view-on-the-metaverse/. Accessed 21 Jul 2022 8. C.B. Fernandez, P. Hui, Life, the metaverse and everything: an overview of privacy, ethics, and governance in metaverse. Preprint. arXiv:2204.01480 (2022) 9. Metaverse: opportunities, risks and policy implications. https://www.europarl.europa.eu/ RegData/etudes/BRIE/2022/733557/EPRS_BRI(2022)733557_EN.pdf. Accessed 27 Jul 2022 10. The amazing possibilities of healthcare in the metaverse. https://www.forbes.com/sites/ bernardmarr/2022/02/23/the-amazing-possibilities-of-healthcare-in-the-metaverse/?sh= 53a7151c9e5c. Accessed 30 Jul 2022 11. European Union: Digital services act agreement a ‘watershed moment’ for Internet regulation. https://www.amnesty.org/en/latest/news/2022/04/european-union-digital-services-actagreement-a-watershed-moment-for-internet-regulation/. Accessed 01 May 2022 12. A. Peukert, M. Husovec, M. Kretschmer, P. Mezei, J. Quintais, European Copyright Societycomment on copyright and the digital services act proposal. Available at SSRN 4016208 (2022) 13. A. Turillazzi, F. Casolari, M. Taddeo, L. Floridi, The digital services act: an analysis of its ethical, legal, and social implications. Legal, and Social Implications (January 12, 2022) (2022) 14. V. Wylde, E. Prakash, C. Hewage, J. Platts, Covid-19 era: trust, privacy and security. In Privacy, Security and Forensics in the Internet of Things (IoT) (Springer, 2022), pp. 31–49 15. V. Wylde, E. Prakash, C. Hewage, J. Platts, Ethical challenges in the use of digital technologies: AI and big data. In Digital Transformation in Policing: The Promise, Perils and Solutions (Springer, 2022) 16. V. Wylde, E. Prakash, C. Hewage, J. Platts, The use of AI in managing big data analysis demands: status and future directions. In Artificial Intelligence and National Security (2022), pp. 47–67 17. V. Wylde, E. Prakash, C. Hewage, J. Platts, EU/EC GDPR audit mechanism: VDaaS. In AMI - The 6th Advances in Management and Innovation Conference (Cardiff Metropolitan University, 2022) 18. V. Wylde, E. Prakash, C. Hewage, J. Platts, Sensor data analytics: security and privacy management in IoMT sensor network. 
In Indian Institute of Information Technology, Kottayam: Presentation (Cardiff Metropolitan University, 2021)


19. V. Wylde, N. Rawindaran, J. Lawrence, R. Balasubramanian, E. Prakash, A. Jayal, I. Khan, C. Hewage, J. Platts, Cybersecurity, cloud security, data privacy and blockchain: open challenges. In 7th International Conference on Cyber Security and Privacy in Communication Networks (ICCS) (Cardiff Metropolitan University, 2021) 20. Metaverse, Future of the Internet or simply a buzz? https://www.headmind.com/en/ metaverse-future-of-the-internet-or-simply-a-buzz/. Accessed 05 Dec 2022 21. Facebook is spending at least $10 billion this year on its metaverse division. https:// www.headmind.com/en/metaverse-future-of-the-internet-or-simply-a-buzz/. Accessed 05 Dec 2022 22. C. Sandeepa, B. Siniarski, N. Kourtellis, S. Wang, M. Liyanage, A survey on privacy for B5G/6G: new privacy challenges, and research directions. J. Ind. Inf. Integr., 30, 100405 (2022) 23. L.H. Lee, T. Braud, P. Zhou, L. Wang, D. Xu, Z. Lin, A. Kumar, C. Bermejo, P. Hui, All one needs to know about metaverse: a complete survey on technological singularity, virtual ecosystem, and research agenda. Preprint. arXiv:2110.05352 (2021) 24. F. Roesner, T. Kohno, D. Molnar, Security and privacy for augmented reality systems. Commun. ACM 57(4), 88–96 (2014) 25. J.A. De Guzman, A. Seneviratne, K. Thilakarathna, Unravelling spatial privacy risks of mobile mixed reality data. Proc. ACM Interactive Mob. Wearable Ubiquitous Technol. 5(1), 1–26 (2021) 26. K. Lebeck, K. Ruth, T. Kohno, F. Roesner, Securing augmented reality output. In 2017 IEEE Symposium on Security and Privacy (SP) (IEEE, 2017), pp. 320–337 27. W.Y.B. Lim, Z. Xiong, D. Niyato, X. Cao, C. Miao, S. Sun, Q. Yang, Realizing the metaverse with edge intelligence: a match made in heaven. Preprint. arXiv:2201.01634 (2022) 28. P. Renaud, J.L. Rouleau, L. Granger, I. Barsetti, S. Bouchard, Measuring sexual preferences in virtual reality: a pilot study. CyberPsychol. Behav. 5(1), 1–9 (2002) 29. S. Ølnes, J. Ubacht, M. Janssen, Blockchain in government: benefits and implications of distributed ledger technology for information sharing, 34(3), 355–364 (2017) 30. M. Xu, W.C. Ng, W.Y.B. Lim, J. Kang, Z. Xiong, D. Niyato, Q. Yang, X.S. Shen, C. Miao, A full dive into realizing the edge-enabled metaverse: visions, enabling technologies, and challenges. Preprint. arXiv:2203.05471 (2022) 31. H. Alves, G.D. Jo, J. Shin, C. Yeh, N.H. Mahmood, C. Lima, C. Yoon, N. Rahatheva, O. Park, S. Kim, et al. Beyond 5G URLLC evolution: new service modes and practical considerations. Preprint. arXiv:2106.11825 (2021) 32. J. Park, M. Bennis, URLLC-eMBB slicing to support VR multimodal perceptions over wireless cellular systems. In 2018 IEEE Global Communications Conference (GLOBECOM) (IEEE, 2018), pp. 1–7 33. F. Guo, F.R. Yu, H. Zhang, H. Ji, V.C.M. Leung, X. Li, An adaptive wireless virtual reality framework in future wireless networks: a distributed learning approach. IEEE Trans. Veh. Technol. 69(8), 8514–8528 (2020) 34. J.A. De Guzman, K. Thilakarathna, A. Seneviratne, Security and privacy approaches in mixed reality: a literature survey. ACM Comput. Surv. (CSUR) 52(6), 1–37 (2019) 35. Web 3.0 vs. metaverse: a detailed comparison. https://www.blockchain-council.org/ metaverse/web-3-0-vs-metaverse/. Accessed 01 Nov 2022 36. When the dark web meets the metaverse. https://futuristspeaker.com/business-trends/whenthe-dark-web-meets-the-metaverse/. Accessed 03 Aug 2022 37. Insight: regulating the metaverse. https://www.global-counsel.com/insights/blog/regulatingmetaverse. Accessed 03 Aug 2022 38. 
Exploring the metaverse and the digital future. https://www.gsma.com/asia-pacific/wpcontent/uploads/2022/02/270222-Exploring-the-metaverse-and-the-digital-future.pdf. Accessed 03 Aug 2022 39. Malicious smart contracts: how they’re built, and how they steal your money. https:// medium.com/harpie-io/malicious-smart-contracts-how-theyre-built-and-how-they-stealyour-money-c69ec6ffc773. Accessed 03 Aug 2022


40. Cryptocurrency crime: preventing the misuse of virtual assets by organized crime for money laundering. https://www.interpol.int/en/News-and-Events/News/2021/Cryptocurrencycrime-preventing-the-misuse-of-virtual-assets-by-organized-crime-for-money-laundering. Accessed 03 Aug 2022 41. Tech: The counterfeit NFT problem is only getting worse. https://www.theverge.com/ 22905295/counterfeit-nft-artist-ripoffs-opensea-deviantart. Accessed 03 Aug 2022 42. Why the fate of the metaverse could hang on its security. https://venturebeat.com/2022/01/ 26/why-the-fate-of-the-metaverse-could-hang-on-its-security/. Accessed 03 Aug 2022 43. European Union Agency for Cybersecurity. What is “social engineering”? https://www. enisa.europa.eu/topics/csirts-in-europe/glossary/what-is-social-engineering. Accessed 03 Aug 2022 44. Metaverse make-up: are advertising rules the same in the metaverse? https://togetherwith. osborneclarke.com/metaverse-report/metaverse-make-up/. Accessed 02 Aug 2022 45. J. Hu, A. Iosifescu, R. LiKamWa. LensCap: split-process framework for fine-grained visual privacy control for augmented reality apps. In Proceedings of the 19th Annual International Conference on Mobile Systems, Applications, and Services, pp. 14–27 (2021) 46. Rutgers researchers discover security vulnerabilities in virtual reality headsets. https:// www.rutgers.edu/news/rutgers-researchers-discover-security-vulnerabilities-virtual-realityheadsets. Accessed 03 Aug 2022 47. R. Dremliuga, A. Iakovenko, N. Prisekina, Crime in virtual reality: discussion. In 2019 International Conference on Cybersecurity (ICoCSec) (IEEE, 2019), pp. 81–85 48. F. Roesner, T. Kohno, Security and privacy for augmented reality: our 10-year retrospective. In VR4Sec: 1st International Workshop on Security for XR and XR for Security (2021) 49. Opinion: the challenges of protecting data and rights in the metaverse. https://www. devex.com/news/sponsored/opinion-the-challenges-of-protecting-data-and-rights-in-themetaverse-103026. Accessed 30 Jul 2022 50. How Cambridge Analytica sparked the great privacy awakening. https://www.wired.com/ story/cambridge-analytica-facebook-privacy-awakening/. Accessed 01 Aug 2022 51. 1. Concerns about human agency, evolution and survival. https://www.pewresearch.org/ internet/2018/12/10/concerns-about-human-agency-evolution-and-survival/. Accessed 01 Aug 2022 52. B. Egliston, M. Carter, Critical questions for Facebook’s virtual reality: data, power and the metaverse. Internet Pol. Rev. 10(4), 1 (2021) 53. Metaverse: virtual realities or augmented collections? https://linc.cnil.fr/fr/metavers-realitesvirtuelles-ou-collectes-augmentees. Accessed 30 Jul 2022 54. European Union. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing directive 95/46/EC (General Data Protection Regulation) (Text with EEA relevance). https://eur-lex.europa.eu/eli/reg/2016/679/ oj. Accessed 30 Jul 2022 55. EU: Privacy and security concerns in the metaverse. https://www.dataguidance.com/opinion/ eu-privacy-and-security-concerns-metaverse. Accessed 05 Dec 2022 56. The metaverse: the evolution of a universal digital platform. https://www.nortonrosefulbright. com/de-de/wissen/publications/5cd471a1/the-metaverse-the-evolution-of-a-universaldigital-platform. Accessed 30 Jul 2022 57. Meta’s metaverse with likely be filled with marketing and manipulation. https://dataethics. 
eu/metas-metaverse-with-likely-be-filled-with-marketing-and-manipulation/. Accessed 01 Aug 2022 58. Advancing digital agency: the power of data intermediaries. Insight report February 2022. https://www3.weforum.org/docs/WEF_Advancing_towards_Digital_Agency_2022. pdf. Accessed 01 Aug 2022 59. Data privacy: what data privacy could look like in the metaverse. https://www.protocol.com/ enterprise/data-privacy-intermediaries-metaverse-web3. Accessed 01 Aug 2022


60. European Parliament. Artificial Intelligence Act. https://www.europarl.europa.eu/RegData/ etudes/BRIE/2021/698792/EPRS_BRI(2021)698792_EN.pdf. Accessed 01 Aug 2022 61. European Parliament. Data Governance Act. https://www.europarl.europa.eu/RegData/ etudes/BRIE/2021/690674/EPRS_BRI(2021)690674_EN.pdf. Accessed 01 Aug 2022 62. D.I.D. Han, Y. Bergs, N. Moorhouse, Virtual reality consumer experience escapes: preparing for the metaverse. Virtual Reality, 26, 1–16 (2022) 63. S. Mystakidis, Metaverse. Encyclopedia 2(1), 486–497 (2022) 64. The metaverse has a groping problem already. https://www.technologyreview.com/2021/12/ 16/1042516/the-metaverse-has-a-groping-problem/. Accessed 02 Aug 2022 65. Cybersecurity: the metaverse will be the next breeding ground for bad info. https:// www.govtech.com/security/the-metaverse-will-be-the-next-breeding-ground-for-bad-info. Accessed 02 Aug 2022 66. Global Terrorism Index 2022: measuring the impact of terrorism. https://www. visionofhumanity.org/wp-content/uploads/2022/03/GTI-2022-web.pdf#page=75. Accessed 02 Aug 2022 67. Regulation: Online safety in the metaverse - what will platforms need to think about? https:// togetherwith.osborneclarke.com/metaverse-report/regulation-part-3/. Accessed 02 Aug 2022 68. European Parliament. Digital Services Act. https://www.europarl.europa.eu/thinktank/en/ document/EPRS_BRI(2021)689357. Accessed 02 Aug 2022 69. European Parliament. Artificial Intelligence Act. https://www.europarl.europa.eu/thinktank/ en/document/EPRS_BRI(2021)698792. Accessed 02 Aug 2022 70. Users: Data privacy, handling data and artificial intelligence. https://togetherwith. osborneclarke.com/metaverse-report/users-part-3/. Accessed 01 Aug 2022 71. European Commission. Commission collects views on making liability rules fit for the digital age, artificial intelligence and circular economy. https://digital-strategy.ec.europa. eu/en/news/commission-collects-views-making-liability-rules-fit-digital-age-artificialintelligence-and. Accessed 02 Aug 2022 72. Building the metaverse responsibly. https://about.fb.com/news/2021/09/building-themetaverse-responsibly/. Accessed 02 Aug 2022 73. Mark Zuckerberg’s metaverse unlocks a new world of content moderation chaos. https://www. lawfareblog.com/mark-zuckerbergs-metaverse-unlocks-new-world-content-moderationchaos. Accessed 02 Aug 2022 74. Facebook’s metaverse: one incident of abuse and harassment every 7 minutes. https:// counterhate.com/research/facebooks-metaverse/. Accessed 30 Jul 2022 75. Estimates of childhood exposure to online sexual harms and their risk factors. https://www. weprotect.org/economist-impact-global-survey/#report. Accessed 30 Jul 2022 76. K. Hirsh-Pasek, J. Zosh, H.S. Hadani, R.M. Golinkoff, K. Clark, C. Donohue, E. Wartella, A whole new world: education meets the metaverse. Policy, 1–13 (2022) 77. Safeguarding the metaverse. https://www.theiet.org/media/9836/safeguarding-the-metaverse. pdf. Accessed 30 Jul 2022 78. European Union. Proposal for a regulation of the european Parliament and of the council on a single market for digital services (Digital Services Act) and Amending Directive 2000/31/EC. https://eur-lex.europa.eu/legal-content/en/TXT/?uri=COM%3A2020%3A825 %3AFIN. Accessed 30 Jul 2022 79. H. Tobar-Muñoz, S. Baldiris, R. Fabregat, Augmented reality game-based learning: enriching students’ experience during reading comprehension activities. J. Educ. Comput. Res. 55(7), 901–936 (2017) 80. N. 
Goltz, ESRB warning: use of virtual worlds by children may result in addiction and blurring of borders – the advisable regulations in light of foreseeable damages. Advis. Regul. Light Damages Univ. Pittsburgh J. Technol. Law Pol. 11(2), 1–63 (2010) 81. What is an NFT? Non-fungible tokens explained. https://edition.cnn.com/2021/03/17/ business/what-is-nft-meaning-fe-series/index.html. Accessed 02 Aug 2022 82. Why you can’t have the metaverse without a blockchain. https://www.weforum.org/agenda/ 2022/01/metaverse-crypto-blockchain-virtual-world. Accessed 02 Aug 2022


83. Can you truly own anything in the metaverse? A law professor explains how blockchains and NFTs don’t protect virtual property. https://theconversation.com/can-you-truly-ownanything-in-the-metaverse-a-law-professor-explains-how-blockchains-and-nfts-dontprotect-virtual-property-179067. Accessed 02 Aug 2022 84. The metaverse: what are the legal implications. https://www.cliffordchance.com/content/ dam/cliffordchance/briefings/2022/02/the-metaverse-what-are-the-legal-implications.pdf. Accessed 02 Aug 2022 85. N. Kshetri, Scams, frauds, and crimes in the nonfungible token market. Computer 55(4), 60–64 (2022) 86. Y.K. Dwivedi, L. Hughes, A.M. Baabdullah, S. Ribeiro-Navarrete, M. Giannakis, M.M. AlDebei, D. Dennehy, B. Metri, D. Buhalis, C.M.K. Cheung, et al. Metaverse beyond the hype: multidisciplinary perspectives on emerging challenges, opportunities, and agenda for research, practice and policy. Int. J. Inf. Manag. 66, 102542 (2022) 87. Leadership: a giant leap for advertising kind: how the metaverse could lead to a next-gen brand experience. https://www.forbes.com/sites/forbescommunicationscouncil/2022/05/ 16/a-giant-leap-for-advertising-kind-how-the-metaverse-could-lead-to-a-next-gen-brandexperience/?sh=464d19302f6b. Accessed 01 Aug 2022 88. B. Heller, A. Bar-Zeev, The problems with immersive advertising: in AR/VR, nobody knows you are an ad. J. Online Trust Saf. 1(1), 1–14 (2021) 89. European Commission. Intellectual property in the metaverse. Episode II: trade marks. https://intellectual-property-helpdesk.ec.europa.eu/news-events/news/intellectualproperty-metaverse-episode-ii-trade-marks-2022-03-29_en. Accessed 02 Aug 2022 90. Protecting and enforcing IP rights in the metaverse. https://www.gov.uk/government/news/ online-safety-law-to-be-strengthened-to-stamp-out-illegal-contenthttps://www.afslaw.com/ perspectives/alerts/protecting-and-enforcing-ip-rights-the-metaverse. Accessed 02 Aug 2022 91. T. Huynh-The, Q.V. Pham, X.Q. Pham, T.T. Nguyen, Z. Han, D.S. Kim, Artificial intelligence for the metaverse: a survey. Preprint. arXiv:2202.10336 (2022) 92. L. Rosenberg, Regulation of the metaverse: a roadmap. In Proceedings of the 6th International Conference on Virtual and Augmented Reality Simulations (ICVARS 2022), Brisbane, Australia, vol. 1 (2022) 93. A. Koshiyama, E. Kazim, P. Treleaven, Algorithm auditing: managing the legal, ethical, and technological risks of artificial intelligence, machine learning, and associated algorithms. Computer 55(4), 40–50 (2022) 94. Reed Smith guide to the metaverse: data protection and privacy. https://www.reedsmith.com/ en/perspectives/metaverse/2021/05/data-protection-and-privacy. Accessed 30 Jul 2022 95. C. Goanta, T. Bertaglia, A. Iamnitchi, The case for a legal compliance API for the enforcement of the EU’s digital services act on social media platforms. Preprint. arXiv:2205.06666 (2022) 96. C. Cauffman, C. Goanta, A new order: the digital services act and consumer protection. Eur. J. Risk Regul. 12(4), 758–774 (2021) 97. The Digital Services Act package. https://digital-strategy.ec.europa.eu/en/policies/digitalservices-act-package. Accessed 27 Jul 2022 98. Press Corner. https://ec.europa.eu/commission/presscorner/home/en. Accessed 27 Jul 2022 99. Regulatory framework proposal on artificial intelligence. https://digital-strategy.ec.europa. eu/en/policies/regulatory-framework-ai. Accessed 27 Jul 2022 100. Regulation of the European Parliament and of the council. https://eur-lex.europa.eu/legalcontent/EN/TXT/PDF/?uri=CELEX:52020PC0825&from=en. 
Accessed 27 Jul 2022 101. A. Alemanno, How much better is better regulation?: Assessing the impact of the better regulation package on the European Union–a research agenda. Eur. J. Risk Regul. 6(3), 344– 356 (2015) 102. Better regulation agenda: enhancing transparency and scrutiny for better EU law-making. https://ec.europa.eu/commission/presscorner/detail/en/IP_15_4988. Accessed 27 Jul 2022 103. J. Smits, Full harmonization of consumer law? A critique of the draft directive on consumer rights. Eur. Rev. Private Law 18(1), 5–14 (2010)


104. European Union. 12008M005: consolidated version of the treaty on European Union TITLE I: COMMON PROVISIONS - Article 5 (ex Article 5 TEC). https://eur-lex.europa.eu/ LexUriServ/LexUriServ.do?uri=CELEX:12008M005:EN:HTML. Accessed 27 Jul 2022 105. S. Garben, I. Govaere, The Division of Competences Between the EU and the Member States: Reflections on the Past, the Present and the Future (Bloomsbury Publishing, 2017) 106. The digital services act: risk-based regulation of online platforms. https://policyreview.info/ articles/news/digital-services-act-risk-based-regulation-online-platforms/1606. Accessed 27 Jul 2022 107. M. Trengove, E. Kazim, D. Almeida, A. Hilliard, S. Zannone, E. Lomas, A critical review of the online safety bill. Patterns, 3, 100544 (2022) 108. Consultation outcome: online harms white paper. https://www.gov.uk/government/ consultations/online-harms-white-paper/online-harms-white-paper. Accessed 26 Jul 2022 109. IWF annual report 2020 - face the facts. https://www.iwf.org.uk/about-us/who-we-are/ annual-report-2020/. Accessed 26 Jul 2022 110. Amnesty reveals alarming impact of online abuse against women. https://www.amnesty.org/ en/latest/press-release/2017/11/amnesty-reveals-alarming-impact-of-online-abuse-againstwomen/. Accessed 26 Jul 2022 111. N. Newman, R. Fletcher, A. Schulz, S. Andi, C.T. Robertson, R.K. Nielsen, Reuters Institute digital news report 2021. Reuters Institute for the study of Journalism (2021) 112. BCS Policy and PR. The online safety bill and the tech agenda for 2022. ITNOW 64(1), 28–29 (2022) 113. A. Scott, An unwholesome layer cake: intermediary liability in English defamation and data protection law. In The Legal Challenges of Social Media (Edward Elgar Publishing, 2017), pp. 222–246 114. Video sharing platforms that have notified. https://www.ofcom.org.uk/cymru/online-safety/ information-for-industry/vsp-regulation/notified-video-sharing-platforms. Accessed 28 Jul 2022 115. Hate crime. https://www.cps.gov.uk/crime-info/hate-crime. Accessed 28 Jul 2022 116. Sentencing and the council. https://www.sentencingcouncil.org.uk/sentencing-and-thecouncil/. Accessed 28 Jul 2022 117. Guidelines on prosecuting cases involving communications sent via social media. https://www.cps.gov.uk/sites/default/files/documents/legal_guidance/Guidelines%20on %20Prosecuting%20Cases%20Involving%20Communications%20Sent%20via%20Social %20Media._0.pdf. Accessed 28 Jul 2022 118. J. Herring, Criminal Law Statutes 2012-2013 (Routledge, 2013) 119. Crime and Disorder Act 1998 c.37 Section 32(1). https://www.legislation.gov.uk/ukpga/1998/ 37/section/32. Accessed 28 Jul 2022 120. Crime and Disorder Act 1998 c.37 Section 31(1). https://www.legislation.gov.uk/ukpga/1998/ 37/section/31. Accessed 28 Jul 2022 121. Crime and Disorder Act 1998 c.37 Section 30(1). https://www.legislation.gov.uk/ukpga/1998/ 37/section/30. Accessed 28 Jul 2022 122. Crime and Disorder Act 1998 c.37 Section 29(1). https://www.legislation.gov.uk/ukpga/1998/ 37/section/29. Accessed 28 Jul 2022 123. Stepping up the EU’s efforts to tackle illegal content online. https://ec.europa.eu/commission/ presscorner/detail/en/MEMO_17_3522. Accessed 28 Jul 2022 124. Modernising communications offences: a final report. https://s3-eu-west-2.amazonaws.com/ lawcom-prod-storage-11jsxou24uy7q/uploads/2021/07/Modernising-CommunicationsOffences-2021-Law-Com-No-399.pdf. Accessed 28 Jul 2022 125. Understanding and reporting online harms on your online platform. 
https://www.gov.uk/ guidance/understanding-and-reporting-online-harms-on-your-online-platform. Accessed 28 Jul 2022 126. Online safety law to be strengthened to stamp out illegal content. https://www.gov. uk/government/news/online-safety-law-to-be-strengthened-to-stamp-out-illegal-content. Accessed 28 Jul 2022


127. B. Falchuk, S. Loeb, R. Neff, The social metaverse: battle for privacy. IEEE Technol. Soc. Mag. 37(2), 52–61 (2018) 128. Cybersecurity: Nascent metaverse raises complex cybersecurity questions. https:// togetherwith.osborneclarke.com/metaverse-report/regulation-part-4/. Accessed 03 Aug 2022 129. The NIS2 directive: a high common level of cybersecurity in the EU. https://www.europarl. europa.eu/RegData/etudes/BRIE/2021/689333/EPRS_BRI(2021)689333_EN.pdf. Accessed 03 Aug 2022 130. General Product Safety Regulation. https://www.europarl.europa.eu/RegData/etudes/BRIE/ 2021/698028/EPRS_BRI(2021)698028_EN.pdf. Accessed 03 Aug 2022 131. The New European Cyber Resilience Act. https://www.europarl.europa.eu/legislative-train/ theme-a-europe-fit-for-the-digital-age/file-european-cyber-resilience-act. Accessed 03 Aug 2022 132. Data Protection Report. https://www.dataprotectionreport.com/2022/10/the-proposed-eucyber-resilience-act-what-it-is-and-how-it-may-impact-the-supply-chain/. Accessed 05 Dec 2022 133. Cybersecurity: The EU parliament approved the new NIS2 directive. https://portolano.it/en/ newsletter/portolano-cavallo-inform-digital-ip/cybersecurity-eu-parliament-approved-newnis2-directive. Accessed 05 Dec 2022 134. Metaverse and cybersecurity, what challenges for the future? https://www.gamingtechlaw. com/2022/06/metaverse-cybersecurity-challenges-future.html. Accessed 05 Dec 2022 135. Metaverse, standards and interoperability. https://www.metapunk.co.uk/metablog/metaversestandards-interoperability. Accessed 02 Aug 2022 136. The metaverse is money and crypto is king – why you’ll be on a blockchain when you’re virtual-world hopping. https://theconversation.com/the-metaverse-is-money-and-crypto-isking-why-youll-be-on-a-blockchain-when-youre-virtual-world-hopping-171659. Accessed 02 Aug 2022 137. IP Helpdesk: intellectual property in the metaverse. Episode 1. https://intellectual-propertyhelpdesk.ec.europa.eu/news-events/news/intellectual-property-metaverse-episode-1-202202-25_en. Accessed 02 Aug 2022 138. What are DAOs, or Decentralised Autonomous Organisations? https:// www.economist.com/the-economist-explains/2022/01/26/what-are-daos-ordecentralised-autonomous-organisations?gclid=EAIaIQobChMI4Y736_yMAIVyPZRCh2Q2wd8EAAYASAAEgKyqfD_BwE&gclsrc=aw.ds. Accessed 02 Aug 2022 139. The foundation of the metaverse: centralization versus decentralization. https://www.td.org/ atd-blog/the-foundation-of-the-metaverse-centralization-versus-decentralization. Accessed 02 Aug 2022 140. European Parliament. Blockchain and the general data protection regulation. https://www. europarl.europa.eu/RegData/etudes/STUD/2019/634445/EPRS_STU(2019)634445_EN.pdf. Accessed 02 Aug 2022 141. European Parliament. European Parliament resolution of 5 May 2022 on competition policy – annual report 2021 (2021/2185(INI)). https://www.europarl.europa.eu/doceo/document/TA9-2022-0202_EN.html. Accessed 01 Aug 2022 142. The metaverse – what does it mean for data privacy and information security? https://www. jdsupra.com/legalnews/the-metaverse-what-does-it-mean-for-2751284/. Accessed 01 Aug 2022 143. The metaverse: three legal issues we need to address. https://theconversation.com/themetaverse-three-legal-issues-we-need-to-address-175891. Accessed 01 Aug 2022 144. The importance of creating an interoperable metaverse. https://www.accesspartnership.com/ how-can-we-protect-those-participating-in-an-interoperable-metaverse/. Accessed 01 Aug 2022 145. 
Content moderation in multi-user immersive experiences: AR/VR and the future of online speech. https://itif.org/publications/2022/02/28/content-moderation-multi-user-immersiveexperiences-arvr-and-future-online/. Accessed 02 Aug 2022


146. European Union. Proposal for a regulation of the European Parliament and of the council laying down rules to prevent and combat child sexual abuse. https://eur-lex.europa.eu/legalcontent/EN/TXT/?uri=COM%3A2022%3A209%3AFIN&qid=1652451192472. Accessed 30 Jul 2022 147. European Commission. Commission proposes a trusted and secure digital identity for all Europeans. https://ec.europa.eu/commission/presscorner/detail/en/ip_21_2663. Accessed 30 Jul 2022 148. European Union. Communication from the commission to the European Parliament, the council, The European Economic and Social Committee and the Committee of the Regions European strategy for a better internet for children. https://eur-lex.europa.eu/legal-content/ EN/ALL/?uri=CELEX:52012DC0196. Accessed 30 Jul 2022 149. European Commission. European digital rights and principles). https://digital-strategy.ec. europa.eu/en/policies/digital-principles. Accessed 30 Jul 2022 150. European Commission. A European strategy for a better Internet for kids (BIK+). https:// digital-strategy.ec.europa.eu/en/policies/strategy-better-internet-kids. Accessed 30 Jul 2022 151. European Parliament. European Parliament Resolution of 11 March 2021 on children’s rights in view of the EU strategy on the rights of the child. https://www.europarl.europa.eu/doceo/ document/TA-9-2021-0090_EN.html. Accessed 30 Jul 2022 152. European Commission. Digital services act: commission welcomes political agreement on rules ensuring a safe and accountable online environment. https://ec.europa.eu/commission/ presscorner/detail/en/IP_22_2545. Accessed 30 Jul 2022 153. Updating the digital education action plan. https://www.europarl.europa.eu/legislative-train/ theme-a-europe-fit-for-the-digital-age/file-digital-education-action-plan. Accessed 03 Aug 2022 154. Anti-money laundering and countering the financing of terrorism. https://ec.europa.eu/info/ business-economy-euro/banking-and-finance/financial-supervision-and-risk-management/ anti-money-laundering-and-countering-financing-terrorism_en. Accessed 03 Aug 2022 155. Non-fungible tokens: what are the legal risks? https://www.dlapiper.com/en/us/insights/ publications/2021/09/non-fungible-tokens-what-are-the-legal-risks/. Accessed 03 Aug 2022 156. C. Goanta, Selling LAND in Decentraland: the regime of non-fungible tokens on the ethereum blockchain under the digital content directive. In Disruptive Technology, Legal Innovation, and the Future of Real Estate (Springer, 2020), pp. 139–154 157. European Union. Proposal for a regulation of the European Parliament and of the council on markets in crypto-assets, and amending directive (EU) 2019/1937). https://eur-lex.europa.eu/ legal-content/EN/TXT/?uri=CELEX%3A52020PC0593. Accessed 03 Aug 2022 158. ARPP Codes: ARPP recommendations are the rules of ethics applicable to advertising in France. https://www.arpp.org/nous-consulter/regles/codes-in-english/. Accessed 02 Aug 2022 159. T.R. Gadekallu, T. Huynh-The, W. Wang, G. Yenduri, P. Ranaweera, Q.V. Pham, D.B. da Costa, M. Liyanage, Blockchain for the metaverse: a review. Preprint. arXiv:2203.09738 (2022) 160. An introduction to smart contracts and their potential and inherent limitations. https://corpgov. law.harvard.edu/2018/05/26/an-introduction-to-smart-contracts-and-their-potential-andinherent-limitations/. Accessed 03 Aug 2022 161. Will NFTs push regulators to regulate the metaverse? https://barlaw.co.il/blog/regulation/ will-nfts-push-regulators-to-regulate-the-metaverse. Accessed 03 Aug 2022 162. 
European Parliament resolution of 20 October 2020 with recommendations to the commission on a digital services act: adapting commercial and civil law rules for commercial entities operating online (2020/2019(INL)). https://www.europarl.europa.eu/doceo/document/TA-92020-0273_EN.html. Accessed 27 Jul 2022 163. European Parliament resolution of 3 October 2018 on distributed ledger technologies and blockchains: building trust with disintermediation (2017/2772(RSP)). https://www.europarl. europa.eu/doceo/document/TA-8-2018-0373_EN.html. Accessed 27 JUl 2022


164. Digital finance: provisional agreement reached on DORA. https://www.consilium.europa. eu/en/press/press-releases/2022/05/11/digital-finance-provisional-agreement-reached-ondora/. Accessed 27 Jul 2022 165. Proposal for a regulation of the European Parliament and of the council on markets in cryptoassets. https://www.europarl.europa.eu/legislative-train/theme-a-europe-fit-for-the-digitalage/file-crypto-assets-1. Accessed 27 Jul 2022 166. European Union Agency for Cybersecurity. Addressing the EU cybersecurity skills shortage and gap through higher education. https://www.enisa.europa.eu/publications/addressingskills-shortage-and-gap-through-higher-education. Accessed 03 Aug 2022 167. European Union Agency for Cybersecurity. European cybersecurity skills framework. https://www.enisa.europa.eu/topics/cybersecurity-education/european-cybersecurityskills-framework. Accessed 03 Aug 2022 168. Digital Finance: Emerging risks in crypto-assets - regulatory and supervisory challenges in the area of financial services, institutions and markets. https://www.europarl.europa.eu/doceo/ document/TA-9-2020-0265_EN.html. Accessed 27 Jul 2022 169. Come the metaverse, can privacy exist? https://www.wsj.com/articles/come-the-metaversecan-privacy-exist-11641292206. Accessed 30 Jul 2022 170. European Union. Directive (EU) 2019/2161 of the European Parliament and of the council of 27 November 2019 amending council directive 93/13/EEC and directives 98/6/EC, 2005/29/EC and 2011/83/EU of the European Parliament and of the council as regards the better enforcement and modernisation of union consumer protection rules (Text with EEA relevance). https://eur-lex.europa.eu/eli/dir/2019/2161/oj. Accessed 27 Jul 2022 171. European Union. Directive (EU) 2018/1808 of the European Parliament and of the council of 14 November 2018 amending directive 2010/13/EU on the coordination of certain provisions laid down by law, regulation or administrative action in member states concerning the provision of audiovisual media services (audiovisual media services directive) in view of changing market realities. https://eur-lex.europa.eu/eli/dir/2018/1808/oj. Accessed 27 Jul 2022 172. European Union. Regulation (EU) No 524/2013 of the European Parliament and of the council of 21 May 2013 on online dispute resolution for consumer disputes and amending regulation (EC) No 2006/2004 and Directive 2009/22/EC (Regulation on consumer ODR). https://eurlex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32013R0524. Accessed 27 Jul 2022 173. European Union. Directive 2005/29/EC of the European Parliament and of the council of 11 May 2005 concerning unfair business-to-consumer commercial practices in the Internal market and amending council directive 84/450/EEC, directives 97/7/EC, 98/27/EC and 2002/65/EC of the European Parliament and of the council and regulation (EC) No 2006/2004 of the European Parliament and of the council (‘Unfair Commercial Practices Directive’) (Text with EEA relevance). https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX %3A32005L0029. Accessed 27 Jul 2022 174. European Union. Council directive 93/13/EEC of 5 April 1993 on unfair terms in consumer contracts OJ L 95, 21.4.1993, p. 29–34. https://eur-lex.europa.eu/legal-content/EN/TXT/? uri=celex%3A31993L0013. Accessed 27 Jul 2022 175. P. Coe, The Draft Online Safety Bill and the regulation of hate speech: have we opened Pandora’s box? J. Media Law, 14, 1–26 (2022) 176. Sentencing Act 2020 c.17 Section 66. https://www.legislation.gov.uk/ukpga/2020/17/section/ 66. 
Accessed 28 Jul 2022 177. Sentencing council: hate crime. https://www.sentencingcouncil.org.uk/explanatory-material/ magistrates-court/item/hate-crime/. Accessed 28 Jul 2022 178. K. Noti, Injunctions and Article 15 (1) of the E-Commerce Directive: the pending Glawischnig-Piesczek v. Facebook Ireland limited preliminary ruling (2018) 179. M. Shuaib, N.H. Hassan, S. Usman, S. Alam, S. Bhatia, P. Agarwal, S.M. Idrees, Land registry framework based on self-sovereign identity (SSI) for environmental sustainability. Sustainability 14(9), 5400 (2022)


180. M. Shuaib, N.H. Hassan, S. Usman, S. Alam, S. Bhatia, A. Mashat, A. Kumar, M. Kumar, Self-sovereign identity solution for blockchain-based land registry system: a comparison. Mob. Inf. Syst. 2022, 1 (2022) 181. T. Antonio, P. Lilyana, Directive (EU) 2018/843 of the European parliament and of the council. Off. J. European Union 648, 32 (2018) 182. R. Soltani, U.T. Nguyen, A. An, A new approach to client onboarding using self-sovereign identity and distributed ledger. In 2018 IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData) (IEEE, 2018), pp. 1129–1136 183. P. Coe, The social media paradox: an intersection with freedom of expression and the criminal law. Inf. Commun. Technol. Law 24(1), 16–40 (2015) 184. P. Coe, Media Freedom in the Age of Citizen Journalism (Edward Elgar Publishing, 2021) 185. P. Coe, Anonymity and pseudonymity: free speech’s problem children. Media Arts Law Rev. 22(2), 173–200 (2018) 186. K. Barker, G. Noto La Diega, R. Flaherty, A. Diker Vanberg, Draft Online Safety Bill. Written evidence submitted by the British and Irish Law, Education and Technology Association (BILETA)(OSB0073). BILETA (2021)

Keeping it Low-Key: Modern-Day Approaches to Privacy-Preserving Machine Learning

Jigyasa Grover and Rishabh Misra

1 Introduction

The past few years have witnessed the permeation of Machine Learning (ML) into diverse aspects of humankind. The explosive growth and adoption of ML, with applications like shopping recommendations, playlist personalization, resume shortlisting, custom email responses, financial portfolio management, self-driving cars, automated medical diagnosis, etc., can be owed to various factors like information overload, the need to automate mundane tasks, advancing the current state of technology, and sometimes just curiosity about the extent of possibilities [1]. Holding the power to discern patterns from humongous amounts of data with minimal human intervention, ML is one of the most fascinating emerging technological trends of the present age. That said, the bloom of ML has accelerated tremendously in the post-pandemic era. The COVID-19 pandemic altered the standard groove of the majority worldwide, with the conjecture of it being a deadly virus with an obscure cure. The initial guidance of isolation disrupted the lives of many individuals and industries; however, the digitization of many essential services, fueled by technologies like Machine Learning in healthcare, finance, education [2], and others, kept the wheels turning and stopped the world from coming to a complete standstill. Consequently, the purview of these ML-powered services and products is amplifying in the post-pandemic world. This propulsion is further powered by the volume and variety of data being generated at lightning speed these days [3].


Unquestionably, more data improves the analytical capability of these systems and increases accuracy and granularity. The variety enables distinctive and unsought inferences and predictions, more so with private user data [4]. Leveraging social media activities, camera surveillance, digitized healthcare records, financial transactions, identification documents, etc. under the guise of personalized services is debatable. Furthermore, unchecked usage of such data, without accounting for it in the algorithm, ushers in potential bias. With the increased scope of impact of these intelligent systems, this evolution highlights the need for privacy of both the data and the underlying algorithm. A slight crack in the design of modern-day ML systems can lead to disastrous socio-economic outcomes [5], propelled by cyber-attacks, reverse engineering, and leakage of sensitive data like personal conversations, financial transactions, medical history, and so on. Therefore, it is imperative to retain the confidentiality of data, maintain the privacy of proprietary designs, and stay compliant with the latest regulations and policies. In this chapter, we will cover varied types of privacy-defying attacks on ML systems, with some case studies of privacy breaches that were detrimental to society. We will then examine the interpretation and value of Privacy-Preserving Machine Learning in ensuring trust amongst all stakeholders. We will further discuss some tools to quantify and effectively measure the privacy risks in an ML system, along with some privacy-enhancing techniques. In the end, we will close out with some recent data-regulation policy developments and pointers for the readers regarding emerging trends in this domain.

2 The Great Privacy Awakening

In March of 2018, The New York Times, along with The Observer of London and The Guardian, was able to gain access to a few cached documents of Cambridge Analytica, a British data and consulting firm. With the assistance of a former employee, it was revealed that Cambridge Analytica had access to the personal data of Facebook users. Using this purchased data of tens of millions of Americans without their knowledge, Cambridge Analytica was suspected to have built voter profiles to influence the 2016 US elections. This type of technique was nothing short of a "psychological warfare tool" [6]. The groundwork for this tool was laid in 2013, when Kosinski et al. [7] published their research on predicting the private traits and attributes of users using digital records of their online behavior, specifically using freely accessible Facebook data. Back then as well, this research work was deemed threatening, since it invaded privacy and leveraged personal data without consent, but little heed was paid. The data of close to 87 million Facebook users was collected using a cloud-based Facebook quiz app called thisisyourdigitallife, curated by a psychology professor at the University of Cambridge. The quiz aimed to extract personality traits using a given set of questions, not only from the targeted users but also from everyone on their friend lists. The data from the quiz app was further joined with
information from their social media profiles, like gender, age, relationship status, hometown, current location, events attended, follows, likes, etc. Since there was no other privacy-enhancing processing happening at Facebook's end at that time, other than data anonymization, it made Cambridge Analytica's job much simpler. Further investigation revealed that Cambridge Analytica had enough data points and attributes to match users to all other publicly available records and build highly accurate psychographic profiles of approximately 30 million people. Apart from the 2016 US elections, Cambridge Analytica came under scrutiny for its supposed ties with Russia and the Brexit movement, owing to being one of the key providers of massive data for political targeting [8]. In 2020, Facebook paid US$550 million to settle a class-action lawsuit over the use of its facial recognition technology in the state of Illinois; using the data of millions of users to train the face-matching algorithm behind its smart photo tag suggestions product got the company in trouble. There have been various other cases of privacy breaches, where data was leaked due to hacking, poor cloud security, or scraping, at companies like LinkedIn, Pixlr, Wattpad, Target, etc. However, the Facebook-Cambridge Analytica fiasco was the most blood-curdling one. This scandal truly put the spotlight on privacy standards and regulations, thereby sparking a public awakening moment worldwide.

3 Attacks on ML Systems

Owing to the substantial information possessed by ML systems, they are often prey to varied attacks. Some examples of such assaults include bypassing spam filters, disfiguring features to defeat facial recognition systems, doctoring road signs to mislead autonomous vehicles, forging voice commands for digital assistants, playing with words to befool sentiment analyzers, and so on. These attacks can be targeted toward the data, model, and infrastructure aspects of an ML-powered framework, thereby exploiting their vulnerabilities. Technically, attacks on ML systems can be categorized based on the intent of the attacker and the stage at which the attack is made [9]. If an attacker obtains information about the ML system and uses it to further plan their attack, it is termed espionage. If the goal is to disrupt the system, it is called sabotage, whereas the endeavor to game the system for one's own advantage is called fraud. These attacks can be carried out either at the training stage or at the inference stage (Fig. 1).

Fig. 1 Categories of attacks on ML systems, organized by attacker goal (espionage, sabotage, fraud) and by stage (training vs. inference)

Evasion is one of the most common types of attacks. It focuses on fabricating an input to deceive the model during inference. For instance, in an image classification system, an attacker can introduce just a slight amount of noise in the input image such that it is easily recognizable by a human, but not by the ML system [10]. Poisoning attacks are similar, but target models that train on streaming data, as in online training. Attackers furnish input examples that shift the decision boundaries in their favor, for example to bypass an email spam filter. This can be achieved by label modification, data injection, data modification, or logic corruption. Polyakov [11] puts forward that in a poisoning attack there is no access to the model or the data, and the strategy is simply to add more data points or modify the initial dataset. In a Trojaning attack, however, the attacker gains access to the model and its parameters, using which they reverse engineer the training data and retrain the model with trojan-stamped data, injecting malicious behaviors into the model; this has historically been seen in cases of Transfer Learning. Backdooring is a type of adversarial attack where the model is intentionally trained to misclassify a given input carrying an added trigger (a secret pattern constructed from a set of neighboring pixels, for example a white square) to a specific target label [12]. In Reprogramming, the original model is repurposed to perform a task desired by the attacker, without the attacker needing to compute the specific desired output themselves; for example, weak access controls on a facial recognition endpoint can lead to it being abused as a deepfakes generator [13].

The most relevant types of attacks for our synopsis are Privacy Attacks, which seek to expose the model or the data. These attacks are extremely threatening, especially in cases where the underlying training data or the learning logic is highly sensitive. Before diving into the categorization of these privacy attacks, we need to define the meaning of privacy in ML and what it means for an ML model to suffer a breach in privacy.

Inference about members of the population: We can leverage Dalenius's desideratum to put forward a pragmatic definition of privacy in ML, by stating that a model should not disclose any attributes of an input to which it is applied, other than what would have been known without applying the model [14]. Or, in simpler terms, anything that could be extracted about an input should be possible without access to the model itself. Subsequently, a breach occurs if an attacker can reverse engineer the model's outputs to infer unwarranted sensitive attributes of the input. Specifically, in cases where the model discerns a correlation between an input's highly sensitive attributes and the target variable, a breach of a single input's attribute can lead to the exposure of all other inputs as well. Since highly generalizable models predict accurately for samples beyond the training set, the resultant correlation applies to all members of the population, even those not used for training, leading to a privacy breach. Types of attacks that come under this umbrella include Input Inference and Parameter Inference.

Inference about members of the training dataset: Defining and maintaining privacy for the entire population is cumbersome, hence customarily the focus is on protecting the confidentiality of samples in the training set. Since the members of the entire population are a superset of the members of the training set, it is essential to comprehend what attributes of an individual training data point a model exposes, as compared to the remaining members of the population. The idea is to quantify the liability of a data point being in the training set, which is referred to as the membership risk. Membership Inference and Property Inference attacks fall into this category.


The idea is to quantify the likelihood that a given data point was in the training set, referred to as the membership risk. Membership Inference and Property Inference attacks fall into this category.

Privacy concerns in ML usually arise from using unprocessed public datasets, using confidential data without anonymization or encryption, sharing trained model snapshots in an unprotected way, exposing parameter configurations, revealing more information to users than necessary (including raw outputs), and so on [15]. In the following subsections we discuss the different types of privacy attacks based on the goal of the attacker, the technique applied, and the ultimate result.

Data Access Attack

The data access attack is one of the most elementary kinds of attack, where the aim is simply to gain access to confidential information. The sensitive data may be used either for training purposes or for discerning other useful correlations. The attacker can tap the data pipelines or leverage continuous model-update pipelines to gather additional information about the private inputs.

Membership Inference Attack

A Membership Inference Attack enables an adversary to deduce from an ML system's responses whether a given data point is a member of the training dataset. This is achieved by recognizing differences between the model's predictions on inputs it was trained on and on inputs it was not. In other words, the confidence score dispensed by a model is typically higher on its training examples than on unseen examples. Using this characteristic, attackers are able to determine which data points were used to train the model without having access to any of its parameters. A standard technique is to fabricate an attack model that utilizes the target model's predictions to recognize differences in the target model's behavior and use them to distinguish members from non-members of the target model's training dataset (refer to Fig. 2). For a multi-class target model, the attack model can be a collection of models, one per output class. This improves the accuracy of the membership inference because, depending on the input's actual label (or class), the target model produces different distributions over its output classes.

This identification of a training data point is a prime example of information leakage through the model, and it can lead to direct or indirect breaches of privacy. A straightforward example is the disclosure of a person's medical history based on their clinical records having been used for training [16]. Despite the increased adoption of Machine Learning-as-a-Service (MLaaS), which provides a decent abstraction of the model and its underlying logic, Membership Inference attacks continue to make headway. Membership inference attacks are among the most popular types, have severe privacy ramifications, and oftentimes lay the foundation for other attacks such as evasion, profiling, and property inference. In their study, Cristofaro et al. [17] bring out an interesting yet positive use case of the Membership Inference Attack: it can be leveraged by regulators to support the suspicion that a model was trained on personal data without an adequate legal basis, or for a purpose incompatible with the data collection. For instance, using a Membership Inference attack, it was found that DeepMind had used personal medical records provided by the UK's National Health Service for purposes beyond direct patient care [18].

Fig. 2 Membership Inference attack against MLaaS in a blackbox setting
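To make the confidence-gap intuition above concrete, the following is a minimal sketch of the simplest (thresholding) variant of the attack. It assumes only black-box access through `target_predict_proba`, a hypothetical stand-in for the victim model's prediction API; the threshold value is illustrative and would in practice be calibrated on data known to be outside the training set.

```python
# Minimal sketch of a confidence-threshold membership inference test.
# `target_predict_proba` is a hypothetical black-box prediction function
# returning class-probability vectors, one row per input.
import numpy as np

def membership_scores(target_predict_proba, x, y_true):
    """Score each example by the confidence the model assigns to its
    true label; training members tend to receive higher scores."""
    probs = target_predict_proba(x)               # shape: (n, n_classes)
    return probs[np.arange(len(y_true)), y_true]  # confidence on true class

def infer_membership(target_predict_proba, x, y_true, threshold=0.9):
    """Flag examples whose true-class confidence exceeds the threshold
    as suspected members of the training set."""
    return membership_scores(target_predict_proba, x, y_true) > threshold
```

The shadow-model attack described above replaces the fixed threshold with a learned attack model, but the underlying signal it exploits is the same prediction gap.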

Input Inference Attack

Also known as the Model Inversion attack, the Input Inference attack was first conceptualized by Fredrikson et al. [19]. The attack uses a model's outputs to infer sensitive, concealed attributes of its inputs. Private features used to train a highly sensitive model can be recovered and used to reconstruct data previously inaccessible to the attacker. This is achieved by finding an input that maximizes the returned confidence level, subject to the classification matching the target [20]. A successful attack is able to generate realistic and diverse samples that accurately describe each of the classes in the private dataset. In the specific case of facial recognition, where the output class is unique to each person, model inversion attacks are the most successful: the attacker is able to extract images and other confidential information corresponding to a given identity using the classifier itself (refer to Fig. 3).


Fig. 3 Left: An image recovered using a Model Inversion attack. Right: An image from the training data. In this case, the attacker has access to the name of the person and to the facial recognition system, which returns a class confidence score

Another characteristic of this attack is that it cannot determine whether a particular data point was used for training or not. In most cases, model inversion produces nearly identical results for members and non-members, with only a slight variation that is impossible to spot without prior information. Shokri et al. [14] summarize the Model Inversion attack as one that produces the average of the attributes, or features, that at best characterize an entire output class, without reconstructing any specific data point of the training set.
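The "find an input that maximizes the confidence" formulation above lends itself to gradient ascent. A minimal sketch follows; note that while the attack is often described against black-box confidence scores, this sketch assumes white-box (gradient) access to a differentiable classifier for brevity, and the input shape and hyperparameters are illustrative.

```python
# Minimal sketch of gradient-based model inversion: optimize an input
# to maximize the classifier's confidence for one chosen class.
# `model` is assumed to be any differentiable classifier returning logits.
import torch

def invert_class(model, target_class, shape=(1, 1, 28, 28), steps=500, lr=0.1):
    model.eval()
    x = torch.zeros(shape, requires_grad=True)  # start from a blank image
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        logits = model(x)
        loss = -logits[0, target_class]         # maximize target-class logit
        loss.backward()
        opt.step()
        x.data.clamp_(0, 1)                     # keep pixels in a valid range
    return x.detach()                           # class-representative image
```

Consistent with Shokri et al.'s summary, the recovered image is a class-average representative rather than any specific training record.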

Parameter Inference Attack

A Parameter Inference attack, also known as a Model Extraction or Model Stealing attack, is a kind of exploratory attack whose goal is to extract the underlying model architecture [21] or hyperparameters [22] of a model. The idea is to assemble an attack model whose predictive performance on validation data is similar to the target model's. This is done by making prediction queries on the target model with adversarial examples and reconstructing the model parameters from the responses (refer to Fig. 4). The process is further accelerated in scenarios where there is no limit on the number of prediction queries one can make in a given period of time. Beyond simply replicating the model and stealing its functionality, the results inferred from this attack can further be used for evasion attacks, for making inferences about the training dataset, or for recovering feature information.

Fig. 4 Model f allows external entities to make prediction queries. An adversary exploits this and makes q prediction queries to extract another model f̂ where f̂ ≈ f [23]
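The extraction loop itself can be very short. Below is a minimal sketch in which `query_api` is a hypothetical stand-in for the victim's prediction endpoint; random probes are used here purely for illustration, whereas a real attacker would craft adversarial or informative queries as described above.

```python
# Minimal sketch of model extraction against a black-box prediction API.
# `query_api` is a hypothetical function: it takes a batch of feature
# vectors and returns the victim model's predicted labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

def extract_model(query_api, n_queries=5000, n_features=20):
    """Train a surrogate on (probe input, API label) pairs."""
    x = np.random.randn(n_queries, n_features)  # probe inputs
    y = query_api(x)                            # victim's predicted labels
    surrogate = LogisticRegression(max_iter=1000).fit(x, y)
    return surrogate                            # mimics the victim's behavior
```

With unlimited queries, the surrogate's decision boundary converges toward the victim's, which is exactly why rate limits and coarsened outputs (discussed in Sect. 4) matter as defenses.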

Property Inference Attack

A Property Inference attack seeks to infer properties of an ML system regarding its training data, learning algorithm, or target variable, beyond the ones explicitly encoded as features. Such properties are neither captured in any of the attributes nor part of the learning task, yet are unintentionally learned by the model. An elementary example of such a property is the class distribution in the training data [15] or the optimization algorithm used for model training [24]. The attack is carried out by training an attack model to infer whether the target model exhibits the property in question. Orthogonal to the other types of attacks discussed previously, which focus on individual data points and attributes of the input samples, a property inference attack targets sensitive global properties of the training dataset and the training process.
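To illustrate the attack-model idea, here is a deliberately toy sketch: many "shadow" models are trained on synthetic datasets that either do or do not have a hidden property (class imbalance, in this made-up example), and a meta-classifier then learns to read the property off a model's flattened parameters, in the spirit of the approach in [31].

```python
# Toy sketch of property inference via a meta-classifier. The "property"
# here (class imbalance of the training data) and all data are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

def shadow_params(imbalanced, n=1000, d=10):
    """Train one shadow model; return its flattened parameter vector."""
    frac = 0.9 if imbalanced else 0.5            # hidden property to infer
    y = (np.random.rand(n) < frac).astype(int)
    x = np.random.randn(n, d) + y[:, None]       # simple separable data
    m = LogisticRegression(max_iter=1000).fit(x, y)
    return np.hstack([m.coef_.ravel(), m.intercept_])

# Meta-training set: parameter vectors labeled with the property.
feats = np.array([shadow_params(i % 2 == 0) for i in range(200)])
labels = np.array([i % 2 == 0 for i in range(200)], dtype=int)
meta = LogisticRegression(max_iter=1000).fit(feats, labels)
# `meta` now predicts the hidden training-data property from a model's
# parameters alone, without ever seeing that model's training data.
```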

4 Quantifying Privacy Risks in ML Systems

Previously, we discussed the various attacks on ML systems, particularly the details of prevalent privacy attacks. In order to wield ML safely, it is vital to quantitatively assess the privacy risks in various parts of the system. This section puts forward different types of tools and techniques for gauging the privacy risks in an ML system.

Membership Inference

One of the foundational studies on this topic [14] reveals that the success of a Membership Inference Attack (MIA) is highly associated with the generalization capability of the model along with the diversity of its underlying training data. Where the model overfits and does not generalize well to inputs beyond its training data, or where the training data is not representative of the overall data the model will infer on, the model has a tendency to leak information about its training inputs. Thus, overfitting is not only harmful to a model's predictive power but also increases the risk of leaking sensitive information about the training data. Furthermore, since different types of ML models may memorize varying amounts of information about the training data due to differences in architecture, the success rate of MIAs is also affiliated with the type of model used.


Fig. 5 Left: Loss values of a model A_S on training data S. Right: Loss values of the same model A_S on test data S̄. Since the average loss (i.e. the area) across both is the same, the model A_S does not overfit according to the standard notion of generalization. However, some populations are penalized more in the test data than others. This discrepancy is captured by distributional generalization [25]

Some recent works highlight that while overfitting is an important factor when evaluating MIAs, it may not be strictly necessary for MIA vulnerability [26]. A study by Kulynych et al. [25] shows that vulnerability to MIAs can be characterized using an extended notion of generalization. It further demonstrates that disparity is bounded by the difference in levels of distributional generalization across population subgroups: to reduce susceptibility to MIAs via a certain feature set, the distribution of that feature set outside the training data has to be close to its distribution over the training examples (refer to Fig. 5). A target model therefore has to learn the distribution of the specific feature set to avoid vulnerability, which is a stricter requirement than what is typically necessary for its main task (i.e. performance in terms of accuracy or average error). Furthermore, in many cases attackers are not restricted to a single feature set, so the target model must learn such distributions for various combinations of adversarial features. Lastly, to prevent disparity in vulnerability, the distribution of feature sets has to be learned across population subgroups. These tasks are all challenging; however, they can provide sufficient protection against MIAs, and these evaluation criteria can be treated as yardsticks for how robust the target model is. ML Privacy Meter, a tool developed by Murakonda et al. [27], analyzes the susceptibility of ML models to MIAs (Fig. 6). The tool operates by generating different types of attacks on a target model and the corresponding inference accuracy, assuming either Blackbox or Whitebox access to the model. Whitebox attacks can exploit the gradients of the target model's parameters, intermediate layer outputs, or the model's predictions to infer training-set membership of an input, while Blackbox attacks use only the target model's predictions to identify membership. The tool provides privacy risk scores that help identify the data records at high risk of being revealed through the model parameters or predictions.


Fig. 6 Overview of the ML Privacy Meter, which is a Python library that enables quantifying the privacy risks of machine learning models to members in the training data [27]

Input Inference

As discussed previously, Input Inference or Model Inversion attacks are able to deduce the values of certain sensitive attributes, particularly Personally Identifiable Information (PII), that are part of the training data. It has been demonstrated that several factors can contribute to an increased risk of model inversion, including overfitting and feature influence [28]. Influence here refers to the impact a feature has on the output of the ML system. Not all influential features have the same level of privacy or sensitivity; sensitive features that are also highly influential are the most favorable candidates for inversion. Goldsteen et al. [29] put forward the idea of measuring the sensitivity of the training data attributes to quantify the risk of model inversion and thereafter reducing their influence, especially in tree-based models. Their work also shows that in many cases a model can be trained in multiple ways without impacting its performance, while yielding different rankings of features in terms of influence. Leveraging this, we can quantify the risk of model inversion attacks by capturing the reliance of the model on the most sensitive attributes in the training data.

Parameter Inference

As discussed previously, a Parameter Inference attack exploits a target model's endpoints by injecting synthetic queries to create a surrogate model, with which the attacker steals or reconstructs the functionality of the target model in a blackbox setting. It is highly critical to quantify the risk of these attacks, especially for present-day industrial applications that use ML-as-a-service systems, accept partial feature vectors as inputs, and even include confidence values with output predictions. Tramèr et al. [23] suggest monitoring the amount of information returned by these prediction APIs and obfuscating or minimizing fine-grained details while still maintaining the functionality of the black-box endpoints. A simple example is to reduce the decimal precision of the model's output prediction or to hide the confidence values. Furthermore, a thorough evaluation needs to be performed on how well-formed prediction API queries must be for a given level of response [30]: enforcing completeness and a fixed query format reduces the chance of an external (attacker-crafted) query giving away details that facilitate model stealing.
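As a concrete illustration of the output-coarsening defense just mentioned, the sketch below rounds the returned probabilities and optionally exposes only the top class. The response format is illustrative, not a prescription from [23].

```python
# Minimal sketch of output truncation for a prediction API: reduce
# decimal precision and hide per-class confidences to make
# equation-solving extraction attacks harder.
import numpy as np

def harden_response(probs, decimals=1, top_only=True):
    """Coarsen a prediction-API response before returning it."""
    probs = np.round(np.asarray(probs, dtype=float), decimals)
    if top_only:
        k = int(np.argmax(probs))
        return {"label": k, "confidence": float(probs[k])}
    return probs.tolist()
```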

Property Inference

In a Property Inference attack, the adversary attempts to infer properties of the ML model, viz. details of the training dataset, learning algorithm, objective function, or learning target, using only the parameters of the trained model as prior knowledge. Since this attack is model oriented, it is pragmatic to quantify the risk based on the type and structure of the model in question. For example, for Hidden Markov Models (HMMs) and Support Vector Machine (SVM) classifiers with flattened parameter vectors as the feature representation, Ateniese et al. [31] exhibit the relative ease of training a meta-classifier to distinguish whether a model has a certain property or not. In Deep Neural Networks (DNNs), however, two equivalent networks can have very different vector representations (refer to Figs. 7 and 8), making it difficult for a meta-classifier to capture the common patterns among them and infer target properties. To further protect such models, Wang [24] recommends techniques like model compression and fine-tuning. For model compression, some connections between neurons can be removed to compress the target model, which improves the model's generalizability as well as reduces the chances of a successful attack. For fine-tuning, in order to reduce the tendency of deep neural networks to memorize arbitrary information, the model parameters can be tuned further on a separate dataset.

Fig. 7 Neuron Permutation: Two equivalent networks whose neurons have different orders [24]

Fig. 8 Weight Multiplication: Two equivalent neurons obtained by weight multiplication [24]

To perform a holistic risk assessment of these inference privacy attacks against ML models, Liu et al. [32] have designed ML-DOCTOR. It is a modular framework geared to evaluate the different types of attacks, thereby enabling ML model owners to assess the potential security and privacy risks of deploying their models. The framework has four components (refer to Fig. 9): a Data Processing Module (to process datasets for mounting different attacks), an Attack Module (to perform the actual inference attacks), a Defense Module (to deploy mitigation techniques against attacks), and an Evaluation Module (to evaluate the performance of attacks and defenses). This modular design allows for easy integration of additional types of attacks and defense mechanisms, as well as plugging in any dataset or model. Additionally, this open-source framework serves as a benchmark tool for researchers and practitioners.

Fig. 9 Overview of the ML-DOCTOR framework [32]


5 Privacy-Preserving Machine Learning

Machine Learning has the stunning capability to mimic human intelligence and solve complex problems when trained effectively. Past studies have shown that ML models are only as good as the data they ingest [1]. Along with quality, it is imperative to have a large quantity of such data to enable ML models to discern compelling and explanatory patterns. One of the conventional ways of assembling an exhaustive training dataset for use cases like text completion and object detection is to pool data from multiple sources. One infamous yet powerful example is the previously discussed Facebook-Cambridge Analytica data scandal. Case studies and past research indicate that ML models trained on such large quantities of sensitive data can implicate privacy in unexpected ways. For instance, pre-trained public language models that are fine-tuned on confidential data can be misused to recover private information, and Carlini et al. [33] showcase generative sequence models unintentionally memorizing unique training-data sequences that may capture Personally Identifiable Information (PII). As we studied above, the ability to infer even a single user's identity and conclude whether they were part of the training dataset is a breach of privacy. Unprecedented data leaks and privacy attacks on ML systems, especially in this era of cloud-based services, present interesting challenges for organizations seeking to secure data and preserve its confidentiality. Researchers, developers, investors, policymakers, and all the stakeholders involved in the fabrication of AI-driven, data-powered systems have started to put privacy at the forefront. The various techniques applied to ensure trust between service providers and users, retain the confidentiality of data, maintain the privacy of proprietary designs, and stay compliant with the latest regulations and policies fall under the umbrella of Privacy-Preserving Machine Learning [34].

6 Privacy-Preserving Techniques

As discussed so far, it is vital to mitigate privacy risks in ML pipelines, and this section presents a few tools and techniques to assist in this task. Though not exhaustive, we provide an overview of notable techniques applied at different stages of a typical ML workflow, from data processing to model training, to enhance privacy guarantees while having minimal impact on model performance.


Differential Privacy

In applications such as medical diagnosis, we may not want the learning algorithm to memorize sensitive information about the training data, such as the specific medical histories of patients. The notion of Differential Privacy (DP) allows quantifying the degree of privacy protection provided by an algorithm on the underlying (sensitive) dataset it operates on [35]. It works on the intuition of introducing randomness into the learning algorithm such that it obscures the contribution of any individual record without clouding the critical statistical patterns embedded in the data. The randomness is achieved by adding carefully tuned noise during the computation, making it difficult for attackers to identify any user. This addition of noise causes some decline in model accuracy, so there is a trade-off between accuracy and the privacy protection offered. The level of privacy is parameterized by epsilon, which is inversely related to the strength of protection: the higher the epsilon value, the weaker the protection of the data and the greater the probability that sensitive information is revealed. In simple terms, an algorithm is differentially private if, by looking at its output, one cannot ascertain whether any individual's information was included in the training of the model. It is worth noting that DP works better on larger-scale data: as the number of users grows, the effect of any single individual on a given aggregate statistic diminishes.

A popular approach to DP in Machine Learning is Differentially Private Stochastic Gradient Descent (DP-SGD). DP-SGD modifies the model updates computed by the most common optimizer used in deep learning, Stochastic Gradient Descent (SGD) [36]. At a high level, DP-SGD makes two modifications to obtain differential privacy: the way gradients are computed, and the addition of noise. Gradients are computed on a per-example basis rather than averaged over multiple examples, and are first clipped to control their sensitivity. Spherical Gaussian noise is then added to the sum of gradients to obtain the indistinguishability needed for DP. Despite its many positives, DP-SGD is not a foolproof way to provide privacy protection, and some recent studies focus on tackling its limitations. Chen et al. [37] illustrate that the clipping operation creates a geometric bias in the optimization trajectory of the loss landscape for DP-SGD and suggest adding Gaussian noise before clipping (referred to as pre-noising) to mitigate the geometric bias of the mini-batch gradients. Knolle et al. [38] recently showed that DP-SGD can yield poorly calibrated and overconfident deep learning models. They highlight and exploit parallels between DP-SGD and Stochastic Gradient Langevin Dynamics (SGLD), a scalable Bayesian inference technique for training deep neural networks, in order to train differentially private Bayesian Neural Networks (BNNs) with minor adjustments to the original DP-SGD algorithm. Their approach provides considerably more reliable uncertainty estimates than DP-SGD, as demonstrated empirically by a reduction in expected calibration error.
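For reference, the intuition above corresponds to the standard (ε, δ) definition: a randomized mechanism M is (ε, δ)-differentially private if, for all datasets D and D′ differing in a single record and for all sets of outputs S,

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta
```

The two DP-SGD modifications (per-example clipping, then Gaussian noise on the summed gradients) also fit in a few lines. Below is a minimal sketch of a single step for a generic parameter vector; the clipping norm C, noise multiplier sigma, and learning rate are illustrative hyperparameters, and computing the per-example gradients is assumed to happen elsewhere.

```python
# Minimal sketch of one DP-SGD step: clip each example's gradient to
# L2 norm at most C, sum, add spherical Gaussian noise scaled to the
# sensitivity C, then average and apply the update.
import numpy as np

def dp_sgd_step(w, per_example_grads, lr=0.1, C=1.0, sigma=1.0):
    # 1. Per-example clipping controls each record's contribution.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads / np.maximum(1.0, norms / C)
    # 2. Gaussian noise on the sum provides the indistinguishability.
    noisy_sum = clipped.sum(axis=0) + np.random.normal(
        0.0, sigma * C, size=w.shape)
    # 3. Average over the batch and take the gradient step.
    return w - lr * noisy_sum / len(per_example_grads)
```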


Fig. 10 PATE framework: An ensemble of teachers is trained on disjoint subsets of the sensitive data and the student model is trained on the public data labeled using the ensemble [39]

Another way to achieve differential privacy is through model-agnostic private learning. Papernot et al. [39] first demonstrated a generally applicable approach to providing strong privacy guarantees for training data, which they call Private Aggregation of Teacher Ensembles (PATE). Their approach combines, in a black-box fashion, multiple models trained on disjoint datasets (refer to Fig. 10), for instance data from different clusters of users. Since these individual models rely directly on sensitive data, they are not published but instead used as "teachers" for a "student" model. Without directly accessing an individual teacher or its underlying data or parameters, the student model learns to predict an output chosen by noisy voting among all of the teachers. The differential privacy properties hold even if an adversary can not only query the student but also inspect its internal workings. However, Liu et al. [40] report that PATE falls short in explaining its own success, specifically in generic cases where the error rate of the optimal classifier is bounded away from zero. They fill in the gap by introducing the Tsybakov Noise Condition (TNC) and establishing stronger and more interpretable learning bounds. These bounds provide new insights into when PATE works and improve over existing results even in the narrower realizable setting. Recently, Majmudar et al. [41] proposed a simple, easy-to-interpret, model-agnostic, and computationally lightweight perturbation mechanism for Large Language Models that can be applied to an already trained model at the decoding stage. The perturbation is obtained by linear interpolation between the original output distribution (which offers the least privacy) and the uniform distribution (which offers the maximum privacy). They provide a theoretical analysis showing that the proposed mechanism is differentially private, and experimental results exhibiting the privacy-utility trade-off.

The idea of differential privacy was initially developed by cryptographers and hence draws much of its language and semantics from that field. Lying at the intersection of mathematics, cryptography, and ML, this area is seeing consistent advancement. Due to its success on several public-purpose considerations, such as data utility, data accuracy, data privacy, and security, it is attracting a lot of traction from policymakers and policy-focused audiences interested in the social opportunities and risks of the technology.
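The "noisy voting" at the heart of PATE is simple to sketch. The following shows the Laplace-noised vote aggregation that produces a privacy-protected label for the student; the noise parameter gamma is illustrative, and the surrounding machinery (teacher training on disjoint shards, student training on public data) is omitted.

```python
# Minimal sketch of PATE's noisy label aggregation: each teacher votes,
# Laplace noise is added to the vote counts, and the noisy argmax
# becomes the label used to train the student model.
import numpy as np

def noisy_aggregate(teacher_preds, n_classes, gamma=0.05):
    """teacher_preds: integer class votes, one per teacher."""
    counts = np.bincount(teacher_preds, minlength=n_classes).astype(float)
    counts += np.random.laplace(0.0, 1.0 / gamma, size=n_classes)
    return int(np.argmax(counts))   # privacy-protected consensus label
```

Because no single teacher (and hence no single data shard) can swing a strongly agreed-upon noisy vote, the label leaks little about any individual record.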


Federated Learning

Typically, when training an ML model, data is collected from various sources, for example across all users and their multiple devices, and processed in one central location for ease of management, training, and validation, thereby yielding better performance. However, this approach presents several privacy challenges owing to the transmission and accumulation of sensitive data. Federated Learning takes a decentralized approach to model training that protects user data without ever exchanging it, under the assumption that the data is heterogeneous and not IID. With Federated Learning, models are trained on third-party data by sending the model artifacts and training instructions to the location where the data resides, for example a mobile device. The original data stays in its original location, reducing data privacy concerns and network bandwidth requirements. The model is trained on this localized data, and after training only the updated model with the latest weights and parameters is returned (refer to Fig. 11). The returned values are then averaged with the results of similar training runs performed at other locations, thereby diminishing the risk of data traceability. This technique addresses critical issues such as data privacy, data security, data access rights, and access to heterogeneous data [43].

Fig. 11 Collaborative ML without Centralized Training Data. (a) The device updates the model locally. (b) Model updates are aggregated. (c) Updates are synced to form a consensus change to the shared model, after which the procedure is repeated [42]

Federated Learning can enhance model training by reaching greater amounts of data in distributed locations that may be impossible to transfer to a central location, as in the case of self-driving vehicles. It is also useful in situations where data privacy is crucial, such as in the healthcare industry. The prime privacy benefit of Federated Learning is that data remains with its owner, i.e. retention of sovereignty, while still enabling the training of learning algorithms on the data. Note that Federated Learning differs from Distributed Learning: the latter aims at parallelizing computing power and assumes the data is IID, whereas in Federated Learning the data at each location can be highly diverse. Federated Learning is also less reliable than Distributed Learning, since its participants can be battery-operated edge devices, IoT gadgets, or smartphones, whereas the latter runs in proper data centers with robust power backups.

Federated Learning, also referred to as Collaborative Learning, can be implemented in a centralized or decentralized way. In the centralized approach, a central server orchestrates the steps and coordinates the process. The server is responsible for distributing the model at the beginning of training and for aggregating the returned models. Because all updates are sent to a single entity, the central server often becomes a bottleneck. In the decentralized approach, the isolated locations or individual devices, often referred to as nodes, coordinate amongst themselves on the model update and distribution process. Since model updates are exchanged only between interconnected nodes, without orchestration from a central server, this approach avoids a single point of failure. In either approach, continuous online availability is not a prerequisite: training can be performed offline and results returned later.

Due to the privacy benefits it provides, Federated Learning is transforming various industries where data confidentiality must be maintained. In banking and financial services, Federated Learning techniques are being used to optimize pricing and expense ratios in portfolio management. Federated Learning enables asset managers, financial advisors, and robo-advisors to maintain their clients' confidentiality regarding the components of a portfolio, and allows them to connect with other investment banks that can provide a fair price when buying or selling a client's portfolio. In healthcare, a recent study led by Mass General Brigham and NVIDIA [44] used sensitive medical data from 20 institutions around the world to train an electronic medical record (EMR) chest X-ray AI model using Federated Learning. Drawing on data from multiple sources globally while maintaining data anonymity, the model successfully predicts the future oxygen requirements of symptomatic COVID-19 patients from vital signs, laboratory data, and chest X-rays. Furthermore, recent studies by Rieke et al. [45] and Kaissis et al. [46] highlight the importance of Federated Learning in digital health and medical imaging, respectively.

While Federated Learning does not need access to data in a central location and resolves issues like data governance and ownership, it is essential to understand that it does not guarantee complete security and privacy unless combined with other methods like Differential Privacy. Many instances illustrate how privacy can be compromised in a Federated Learning environment. For example, a lack of encryption can allow attackers to steal personally identifiable data directly from the source or interfere with the inter-node communication process [46].
This communication requirement can be burdensome for large-scale ML models or high-volume data sources. If the local learning algorithms are not encrypted or the updates are not securely aggregated, data can leak, or algorithms can be tampered with, reconstructed, or stolen (the Parameter Inference attack discussed earlier), which is unacceptable from the viewpoint of intellectual property, patent restrictions, or asset protection [47]. Moreover, neural networks act as a form of memory mechanism, with compressed representations of the training data stored within their weights, a form of unintended memorization. It is therefore possible to reconstruct parts of the training data from the algorithm weights themselves on a decentralized node [33]. Fredrikson et al. [19] illustrate how images can be reconstructed with impressive accuracy and detail, emphasizing that model inversion or reconstruction attacks can cause catastrophic data leakage. Along with the adoption of relevant tools, applying Federated Learning requires ML practitioners to consider refined ways of developing, training, and evaluating models with no direct access to, or labeling of, raw data, and with communication cost as a limiting factor. Contemporary use cases indicate that the advantages of Federated Learning from the user and privacy perspectives are tremendous and make tackling the technical challenges worthwhile.
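The averaging step at the heart of the centralized approach (often called FedAvg) is compact enough to sketch. Here `local_train` is a hypothetical routine that runs a few epochs on one node's private data and returns updated weights; only parameter vectors ever leave the nodes.

```python
# Minimal sketch of one centralized federated-averaging round: every
# node trains locally on its own data, and the server averages the
# returned weights, weighting each node by its data volume.
import numpy as np

def federated_round(global_w, nodes, local_train):
    updates, sizes = [], []
    for node_data in nodes:
        w = local_train(global_w.copy(), node_data)  # raw data never leaves
        updates.append(w)
        sizes.append(len(node_data))
    sizes = np.asarray(sizes, dtype=float)
    # Consensus model: data-weighted average of the local models.
    return np.average(np.stack(updates), axis=0, weights=sizes / sizes.sum())
```

As the section cautions, this averaging alone is not a privacy guarantee; the exchanged updates themselves can leak and are typically combined with secure aggregation or Differential Privacy.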

Synthetic Data

The primary reason for the success of ML in fields like computer vision and speech recognition is the availability of high-volume, quality data for training complex deep neural networks [1]. However, many of the datasets used to train highly personalized services include sensitive information that raises privacy concerns. Privacy-preserving techniques and conventional data anonymization methodologies are useful to protect users and their sensitive data, on the condition that they still support the corresponding ML and data analytics workstreams; such techniques are expected not to interfere with the data's utility and to remain scalable. Existing privacy-preserving and anonymization techniques often struggle to strike a balance between robust privacy protection and fostering agility and innovation [48], and this is where synthetic data uniquely fills the gap.

Synthetic data is artificially generated, annotated information produced by learning algorithms or simulations. Contrary to other privacy-preserving techniques, it does not try to modify, obscure, or encrypt the underlying data; instead it uses ML to generate new data altogether. Since this artificially generated data tries to retain all the attributes, characteristics, and correlations of the original data and leads to similar results from the predictive model, it is much safer to make public, with few privacy risks or fears of violating data protection regulations. A common use case is the processing of customer data in the post-GDPR era (refer to section 'Privacy and Responsible ML' for more on this), which enforces strict compliance and governance rules for companies. In these situations, synthetic data gives companies more dexterity and autonomy in leveraging and releasing data in a compliant way.

The concept of synthetic data is one of the earliest for preserving privacy. It stems from a technique devised by researchers in the 1990s to share the US Decennial Census without disclosing any confidential information, and the US Census Bureau (https://www.census.gov/library/working-papers/2018/adrm/SIPP-Synthetic-Beta.html) has since been actively working on generating synthetic data for public release. The last decade has seen increased interest from both the public and private sectors, especially in healthcare and financial services. Ultimately, synthetic data is intended to maintain the balance between privacy protection and data utility, where utility can be defined as the analytical completeness and validity of the new data. The artificially generated data should be as close to the original as possible in its attributes and other statistical properties, and should allow drawing similar insights using the same analytical methodologies as on the original data. Depending on how much of the original data is present in the synthetic data, we can categorize it into two types.

Partial Synthetic Data contains a selection of the initial data in which only the attributes at higher risk of disclosure are imputed. Generally, this data offers higher analytical utility owing to the change in only a few attributes. Drechsler et al. [49] verify that partially synthetic data provides higher data quality, in terms of lower deviation from the true estimates and higher confidence-interval overlap between estimates from the original and the synthetic data, while highlighting that this higher quality comes with a higher risk of disclosure.

Fully Synthetic Data, on the other hand, contains none of the original data and by this virtue provides stronger privacy protection; however, its data quality is lower than that of partially synthetic data. Drechsler et al. [49] provide suggestions on how to decide which of the two is better for a given use case. If the original data contains only a few attributes and the imputation models are easy to set up, building a fully synthetic dataset may be preferable, since it provides the strongest confidentiality protection. If the number of attributes is comparatively high and the data contains many skip patterns, logical constraints, and values unique to certain subsets, it may be better to create a partially synthetic dataset for public release and include a detailed disclosure-risk study in the data quality evaluation.

Synthetic Data Generation

Producing synthetic data requires a robust learning algorithm that can accurately model the original data. Synthetic data points are generated based on the probability of a given data point belonging to the original data. Since neural networks generalize well by learning the underlying data distribution proficiently, they are an ideal choice for this purpose: they are able to generate data points similar to, but not identical to, the original data points. The quality of the synthetic data depends heavily on the quality of the underlying model used for the imputation, and for many attributes it is not easy to devise good models.


A few state-of-the-art neural networks used to generate synthetic data include Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), and Neural Radiance Fields (NeRFs). One of the most popular domains where synthetic data is leveraged is computer vision. Along with providing faster and more cost-effective data creation at scale, it also helps build edge cases that may be very rare occurrences in real life and would otherwise contribute to dataset imbalance. Deepfakes [13], when used with caution for a positive use case, are the perfect example of synthetically generated images and videos, often of people who do not exist. In a recent study, Soufleri et al. [50] devise a method using Batch Normalization (BN) layers, pre-trained on original data, to generate synthetic data (refer to Fig. 12). For each class in the original dataset, they use a subset of samples (approximately 25% of the total training samples) of the corresponding class to fine-tune the pre-trained network, and record the running mean and running variance from each layer. The synthetic data is initialized as Gaussian noise and then optimized to match the recorded BN statistics (mean and variance). Using CIFAR-10 and ImageNet, the authors demonstrate that synthetic data generated by this method maintains data utility, in terms of ML performance, when compared to the original data. They further compare the synthetic images with the original images using Mean Squared Error (MSE), Image Quality Metrics (IQMs), the Structural Similarity Index Measure (SSIM), and the Haar Wavelet-based Perceptual Similarity Index (HaarPSI), and the results show a high degree of visual dissimilarity between the original images and the synthetic images obtained from their method.

Fig. 12 Synthetic Image Generation [50]


It should be noted, however, that visual dissimilarity does not imply protection of the sensitive attributes of the original data. The authors therefore complement the image quality analysis with rigorous privacy-leakage attacks, such as Gradient Matching attacks, Model Memorization attacks, and GAN-based attacks, aimed at reconstructing the original images of the training dataset. In their experiments, they observe that even when the synthetic dataset and/or a network trained on it is readily accessible, the attacker may not be able to leak visual information about the original data. Based on these results, the authors vouch for the effectiveness of their method in generating secure synthetic data for privacy-preserving ML applications.

Going beyond neural networks, which are often computationally heavy and difficult to optimize, Kalay [51] puts forward an approach named Local Resampler (LR) that utilizes k-Nearest Neighbors (kNN) to create synthetic samples. LR is a sequential algorithm that iterates through subsamples defined by the neighbors produced using kNN, drawing synthetic values from locally estimated distributions. It produces efficient results with minimal hyperparameter tuning and can replicate non-linear and non-convex distributions; it can be further optimized using non-parametric distributions and more efficient algorithms. LR has been demonstrated to replicate real samples very efficiently, especially under a multivariate normal distribution.

Though synthetic data offers highly appealing advantages, realizing them is not without complications. Generating synthetic data demands a specialized skill set with advanced AI knowledge: an in-depth understanding of how the data is composed and how it impacts the system, along with expertise in sophisticated tools to generate and analyze this synthesized data. Other challenges faced when creating and leveraging synthetic data are realism, bias, and privacy (see https://datagen.tech/guides/synthetic-data/synthetic-data). It is tricky to stay true to its definition by accurately representing real-world data while protecting privacy; if the synthetic data is not sufficiently precise, no amount of ML sophistication will yield quality insights. By and large, bias is a prevalent issue in ML-powered systems, and the underlying data composition is its leading cause: by mimicking biased original data, the synthetic data may be biased too. Along with corrective steps to mitigate bias in the data, ML models should be adjusted to account for bias when training on synthetic data. Despite being one of the trending research topics in the privacy-preserving realm, the application of synthetic data is still at an experimental stage. While substantial research analysis is performed on original data, synthetic data is generally used for exploratory data analysis or for the public release of data, for example census data or data used for education purposes. At present, organizations exercise extreme vigilance and perform thorough studies of data utility before making synthetic data a standard for tasks beyond preliminary research, which can and should change as we probe further into the viability of its application.
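To make the kNN-based idea above concrete, here is a rough sketch in the spirit of the Local Resampler: pick a seed record, find its k nearest neighbors, fit a local Gaussian, and draw a synthetic record from it. The details (distribution choice, iteration order, regularization) are simplifications and differ from Kalay's actual algorithm.

```python
# Rough sketch of local resampling for tabular synthetic data: draw each
# synthetic record from a Gaussian fitted to a kNN neighborhood.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def local_resample(data, n_synthetic=100, k=10):
    nn = NearestNeighbors(n_neighbors=k).fit(data)
    seeds = data[np.random.randint(len(data), size=n_synthetic)]
    _, idx = nn.kneighbors(seeds)
    synthetic = []
    for neighbors in idx:
        local = data[neighbors]                       # local neighborhood
        mu = local.mean(axis=0)
        cov = np.cov(local, rowvar=False) + 1e-6 * np.eye(data.shape[1])
        synthetic.append(np.random.multivariate_normal(mu, cov))
    return np.asarray(synthetic)
```

Because each draw mixes k real records, no synthetic row reproduces any single original record, while local correlations are approximately preserved.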

Fig. 13 DC-synthesized data can be used for privacy-preserving model training and cannot be recovered through MIA and visual comparison analysis [52]

Data Condensation

One-of-its-kind research by Dong et al. [52] puts forward the idea of utilizing Dataset Condensation (DC), originally meant for improving training efficiency, as a means to generate privacy-preserving datasets, thereby pulling two weeds with a single yank. The idea is based on the concept of data distillation, which aims to condense a large training set into a small synthetic set that can train deep neural networks efficiently with only a moderate decrease in accuracy (refer to Fig. 13). Note that this is contrary to synthetic data generation, whose goal is to fabricate a comparable dataset in terms of both statistical features and volume; data condensation techniques instead generate informative training samples for data-efficient learning and are usually not suitable for drawing insights comparable to the original data. The authors leverage prior work on dataset condensation, including Differentiable Siamese Augmentation and Distribution Matching by Zhao et al. [53] and Kernel Ridge-Regression by Nguyen et al. [54], to illustrate the results. They further bridge the concepts of dataset condensation and differential privacy, and theoretically prove the advantages of dataset condensation using both linear and non-linear feature extractors, thereby providing a better trade-off between privacy and utility. In particular, they observe that the accuracy of models trained on data generated by the state-of-the-art DP generator (DP-Sinkhorn) is still lower than on images synthesized using distribution matching. Additionally, through empirical evaluations on standard image datasets like CIFAR-10, FashionMNIST, and CelebA, the authors demonstrate that data condensation methods reduce the adversary's advantage in membership inference attacks to zero. Furthermore, the condensed data points are perceptually different from the original data in terms of similarity metrics, particularly L2 and the Learned Perceptual Image Patch Similarity (LPIPS).


Auxiliary Techniques

The approaches discussed above are just a few of the emerging ones in this domain; there are also some elementary techniques that can protect confidentiality in a straightforward manner. The simplest is to remove granular information from the existing data, a technique referred to as generalization. Unquestionably, this protects privacy, but it causes loss of information and may no longer serve a valuable purpose. Techniques like perturbation instead modify the existing data by adding random noise to the attributes that need to be shielded; the original aggregate tallies remain accessible, but various biases can creep in if unmonitored. Rather than adding noise, strategies like pseudonymization replace PII fields with one or more artificial identifiers, or pseudonyms. Note that this technique does not safeguard against re-identification, so utmost care must be taken when applying it. Encrypting data is another way of concealing sensitive data; however, traditional encryption approaches usually do not permit insightful processing of the encrypted data, implying the need for decryption before any use. A form of encryption that enables computations on encrypted data without decrypting it first is homomorphic encryption. Ongoing research shows that Fully Homomorphic Encryption (FHE), where ciphertexts can be processed with arbitrarily deep boolean or arithmetic circuits without access to the data, is a fitting tool for privacy-preserving ML, as it ensures strong security in the cryptographic sense while satisfying the brevity of communication [55]. Apart from these, several other techniques and variations of the above are being researched. One can choose the most suitable strategy to mitigate the risks of privacy infringement based on the problem domain, the use case, the level of effort, and the scope of impact.
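Two of these elementary techniques fit in a few lines each. The sketch below shows pseudonymization via a keyed hash and perturbation via additive Gaussian noise; the salt constant and function names are illustrative, and as noted above, a keyed hash alone does not protect against re-identification through the remaining attributes.

```python
# Minimal sketches of pseudonymization (replace a PII field with a
# stable artificial identifier) and perturbation (noise a numeric
# attribute while roughly preserving its aggregates).
import hashlib
import numpy as np

SECRET_SALT = b"rotate-and-store-separately"  # illustrative placeholder

def pseudonymize(value: str) -> str:
    """Replace a direct identifier with a keyed-hash pseudonym."""
    return hashlib.sha256(SECRET_SALT + value.encode()).hexdigest()[:16]

def perturb(values, scale=1.0):
    """Mask individual values; means and totals stay approximately intact."""
    return np.asarray(values, dtype=float) + np.random.normal(0, scale, len(values))
```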

7 Privacy and Responsible ML

Past privacy breaches and data leaks have ignited a sense of concern in individuals across the world, making them cautious about how and to whom they provide their personal data. Government officials, regulators, organizations, researchers, and practitioners from various disciplines are increasingly getting involved in data and ML governance (https://iapp.org/news/a/privacy-and-responsible-ai/). The effort is particularly focused on preparing a framework for the responsible and ethical use of data in ML systems in order to maintain the trust of users and society.


Privacy is one of the key pillars of building a responsible ML system, particularly when sensitive data like health records, financial transactions, and personal conversations is used. Apart from elementary principles of privacy like purpose cataloging, data collection and limitation, anonymization, quality checks, accountability, and individual participation, it is imperative to stay compliant with the government-enforced privacy regulations, both at a global as well as regional level. Non-compliant systems pose a threat to the individuals who are the end users of these services and are often the ones whose privacy is compromised, as well as to the owning entity, which is subject to hefty penalties and forced deletion of data, models, and algorithms. As of today, 137 out of 194 countries4 have put in place legislation to secure the protection of data and privacy. Amongst all the data security and privacy laws out there, the ones by the European Union (EU) and the United States of America (USA) are considered to be the most comprehensive ones, as they cover almost every aspect of the vulnerability known so far. The EU’s General Data Protection Regulation (GDPR) came into effect in 2018, after the infamous Facebook-Cambridge Analytica data scandal shook the world. GDPR aims to enrich the control and rights of an individual over their confidential data, and goes beyond data privacy by also addressing the transfer of personal data outside the EU and other territories that come within their economic spans. It thereby sets a strict standard on personal data handling, particularly obtaining the concerned individual’s due permissions while ensuring transparency and security during data collection. It also provides individuals the right to learn what personal data is collected about them along with providing them the right to access it and request its permanent deletion. Irresponsible processing of personal data is the main trigger for the data protection laws and not the models since they are typically thought to be primarily governed by varying intellectual property rights such as trade secrets [56]. It is important to note that GDPR is a regulation and not a directive. It is directly binding and applicable, however, provides flexibility for certain aspects of the regulation to be adjusted by individual member states. Strict enforcement of GDPR has proved worthwhile and many service providers have been heavily fined for violating its provisions. A few instances5 include Amazon for US$823.9 Million in 2021 for tracking user data without acquiring appropriate consent from users, WhatsApp for US$247 Million in 2021 for unclear privacy policies and a lack of transparency in how it was using user data, and Google for US$66 Million in 2021 for failing to give users simple ways to refuse cookies on YouTube. GDPR’s success in ensuring privacy has motivated other countries like Turkey, Mauritius, Chile, Japan, Brazil, South Korea, South Africa, Argentina, and Kenya, who have their data protection and privacy laws now modeled after GDPR. The US counterpart to GDPR is the California Consumer Privacy Act (CCPA), though it is applicable only to the residents of the state of California. CCPA took

4 https://unctad.org/page/data-protection-and-privacy-legislation-worldwide 5 https://termly.io/resources/articles/biggest-gdpr-fines/

Keeping it Low-Key: Modern-Day Approaches to Privacy-Preserving Machine. . .

73

effect in 2020 and is a state statute intended for the improvement of privacy rights and consumer protection. CCPA shares many of its standards with the GDPR, including an individual’s knowledge of personal data collection, condemning the sale of any personal data, requesting deletion, and most importantly, the right to not be discriminated against owing to one’s privacy preferences. A notable difference between GDPR and CCPA, other than the geographic limitation, is the way how financial damages are assessed. CCPA violations have an upper limit of a civil penalty of US$7,500 for each intentional violation and US$2,500 for each unintentional violation, which is quite insignificant in comparison to GDPR. Another contrasting aspect of CCPA is how it offers California residents the right to sue businesses for damages if there is a violation of their consumer rights. CCPA is not the first such data protection and privacy act to be passed by the US government. There are approximately 20 industry or sector-specific federal laws and more than 100 privacy laws at the state level, with 25 privacy related laws in California alone.6 The timeline dates back to around 48 years when the federal government passed the US Privacy Act of 1974 to enhance individual privacy protection. This act established rules and regulations for the collection, use, and disclosure of personal information particularly regarding US government agencies. Following up, the Health Insurance Portability and Accountability Act (HIPAA) came into effect in 1996 and is a federal privacy protection law to safeguard one’s medical information. In 1998, the Congress enacted Children’s Online Privacy Protection Act (COPPA) to protect the privacy of minors aged 13 and below. It specifically applies to apps, websites, and other online services that collect, use, or disclose personal information from children. In 1999, the US government signed the Gramm-Leach-Bliley Act (GLBA) to protect consumers from any financial institution that collects, uses, or discloses personal information. Apart from these key federal laws, many states like California, Colorado, Connecticut, Maryland, Massachusetts, New York, and Virginia, have their own regional laws as well. Though all these privacy laws are aimed at protecting an individual’s privacy rights, based on what their target criteria are, privacy laws can be categorized either with a horizontal focus or a vertical focus [57]. Horizontal privacy laws oversee how organizations use information, regardless of their context. They can encompass data like biometric data, retina scans, fingerprints, and other PII such as names and addresses. These laws reduce the need for sector-specific privacy rules and can easily be extended globally thereby facilitating cross-border data flow and local data-driven economic activity. Vertical privacy laws on the other hand safeguard medical records or financial data, including details such as an individual’s health and financial status. Across the globe as well, many countries have their own regulations to ensure the trust amongst corporations, agencies, and end users is maintained. Some of the examples include include Australia’s Privacy Act, Canada’s Personal Information Protection and Electronic Documents Act (PIPEDA), China’s Personal Information

6 https://www.i-sight.com/resources/a-practical-guide-to-data-privacy-laws-by-country


Examples include Australia's Privacy Act, Canada's Personal Information Protection and Electronic Documents Act (PIPEDA), China's Personal Information Protection Law (PIPL), Germany's Bundesdatenschutzgesetz, India's Information Technology Act (ITA), Indonesia's Electronic Information and Transactions (EIT) Law, Israel's Basic Law: Human Dignity and Liberty, Italy's Garante, Japan's Act on the Protection of Personal Information (APPI), South Africa's Protection of Personal Information (PoPI) Act, Switzerland's Federal Act on Data Protection (FADP), and the UK's implementation of GDPR. With so many privacy protection laws regulating present-day ML-powered systems, balancing them against the ceaseless developments in technology is a challenging task. Additionally, as more hands are involved in the digital handling of private data, it is vital to understand the interplay between privacy regulations and technological advancement for a future where trust and harmony are maintained.

8 Conclusion

As researchers, developers, and policymakers continue to apply and advance privacy-preserving ML processes to strengthen privacy guarantees, the scope for impact and growth keeps expanding. In this section, we discuss a few emerging trends to fuel further research and development in this domain. A crucial direction is to adapt existing Machine Learning approaches to evolving regulations around privacy and confidentiality. On a global level, the Organization for Economic Co-operation and Development (OECD) has counted about 700 AI policy initiatives across 60+ countries.7 For instance, the EU has advanced the Artificial Intelligence Act, under which high-risk AI systems are explicitly regulated. In the United States, President Biden's administration has released the blueprint of the 'AI Bill of Rights', along with other key agency actions to advance tech accountability and protect the rights of the American public, with data privacy being one of its five key principles. Additionally, the new California Privacy Protection Agency will likely be charged with issuing regulations governing AI by 2023, which can be expected to have a far-reaching impact. Apart from development guided by these principled regulations, on the technical side there is a need for supplementary research into techniques that enhance privacy at each step of the ML pipeline, including data curation, data governance, data architecture, and algorithmic implementation. Once formulated, such techniques should be made as accessible as possible for easy integration into product-driven implementations in order to drive greater adoption. The adoption of decentralized learning is another focus area in the field, as it avoids sourcing data in one place and thereby brings intrinsic privacy and confidentiality benefits. Apart from efficiency improvements such as faster training with minimal computation and network resource allocation, it also substantially assists in maintaining privacy.

7 https://iapp.org/news/a/privacy-and-responsible-ai/


This purview encompasses training algorithms for private federated learning, combining causal and federated learning, applying federated reinforcement learning principles, federated optimization, and so on. In addition to the aforementioned directions in the Privacy-Preserving Machine Learning space, there continue to be open questions worth pursuing to stimulate technical advancement [34]. A few examples include "Is it possible to have tighter theoretical bounds for Differential Privacy training and enable improved privacy-utility trade-offs?", "Is training ML models using only synthetic data viable?", and "Is it possible to integrate privacy and confidentiality guarantees into the Deep Neural Network design itself?" Undoubtedly, advances in technology present tremendous opportunities for improvement along with potentially equally significant risks. As technologists, we should devise leading-edge tools for realizing technology ethics from principles to practice, engage at the intersection of policy and technology, and ensure that the continued advancement of technology is trustworthy, privacy protective, and beneficial to our society.

References

1. R. Misra, J. Grover, Sculpting Data for ML: The First Act of Machine Learning. ISBN 9798585463570 (2021)
2. R. Misra, News category dataset. arXiv preprint arXiv:2209.11429 (2022)
3. C.F. Kerry, Protecting privacy in an AI-driven world. www.brookings.edu/research/protecting-privacy-in-an-ai-driven-world (2020)
4. R. Misra, M. Wan, J. McAuley, Decomposing fit semantics for product size recommendation in metric spaces, in Proceedings of the 12th ACM Conference on Recommender Systems, pp. 422–426 (2018)
5. R. Misra, P. Arora, Sarcasm detection using hybrid neural network. arXiv preprint arXiv:1908.07414 (2019)
6. I. Lapowsky, How Cambridge Analytica Sparked the Great Privacy Awakening. www.wired.com/story/cambridge-analytica-facebook-privacy-awakening (2019)
7. M. Kosinski, D. Stillwell, T. Graepel, Private traits and attributes are predictable from digital records of human behavior. Proc. Natl. Acad. Sci. U. S. A. 110(15), 5802–5805 (2013). https://doi.org/10.1073/pnas.1218772110
8. BBC News, Cambridge Analytica 'not involved' in Brexit referendum, says watchdog. www.bbc.com/news/uk-politics-54457407 (2020)
9. O. Onyango, Artificial Intelligence and its Application to Information Security Management. https://doi.org/10.13140/RG.2.2.12066.09921 (2021)
10. F.L. de Mello, A survey on machine learning adversarial attacks. J. Inf. Secur. Cryptogr. (Enigma) 7(1), 1–7 (2020)
11. A. Polyakov, How to Attack Machine Learning (Evasion, Poisoning, Inference, Trojans, Backdoors). towardsdatascience.com/how-to-attack-machine-learning-evasion-poisoning-inference-trojans-backdoors-a7cb5832595c (2019)
12. A. Salem, R. Wen, M. Backes, S. Ma, Y. Zhang, Dynamic backdoor attacks against machine learning models, in 2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P), (IEEE, 2022), pp. 703–718


13. R. Misra, J. Grover, Do not 'Fake It Till You Make It'! Synopsis of trending fake news detection methodologies using deep learning, in Deep Learning for Social Media Data Analytics, (Springer, Cham, 2022), pp. 213–235
14. R. Shokri, M. Stronati, C. Song, V. Shmatikov, Membership inference attacks against machine learning models, in 2017 IEEE Symposium on Security and Privacy (SP), pp. 3–18 (2017)
15. E.U. Soykan, L. Karacay, F. Karakoc, E. Tomur, A survey and guideline on privacy enhancing technologies for collaborative machine learning. IEEE Access 10, 97495–97519 (2022)
16. J. Fowler, 2.5 Million Medical Records Leaked By AI Company. securethoughts.com/medical-data-of-auto-accident-victims-exposed-online (2021)
17. E. De Cristofaro, An overview of privacy in machine learning. arXiv preprint arXiv:2005.08679 (2020)
18. J. Hayes, L. Melis, G. Danezis, E. De Cristofaro, LOGAN: Membership inference attacks against generative models. Proc. Priv. Enhanc. Technol. (PoPETs) 2019(1) (2019)
19. M. Fredrikson, S. Jha, T. Ristenpart, Model inversion attacks that exploit confidence information and basic countermeasures, in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security (CCS '15), (Association for Computing Machinery, New York, NY, USA, 2015), pp. 1322–1333
20. A. Salem, Y. Zhang, M. Humbert, P. Berrang, M. Fritz, M. Backes, ML-Leaks: Model and data independent membership inference attacks and defenses on machine learning models. arXiv preprint arXiv:1806.01246 (2018)
21. S.J. Oh, B. Schiele, M. Fritz, Towards reverse-engineering black-box neural networks, in Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, (Springer, Cham, 2019), pp. 121–144
22. B. Wang, N.Z. Gong, Stealing hyperparameters in machine learning, in 2018 IEEE Symposium on Security and Privacy (SP), (2018)
23. F. Tramèr, F. Zhang, A. Juels, M.K. Reiter, T. Ristenpart, Stealing machine learning models via prediction APIs, in 25th USENIX Security Symposium (USENIX Security 16), pp. 601–618 (2016)
24. T. Wang, Property Inference Attacks on Neural Networks using Dimension Reduction Representations (2019)
25. B. Kulynych, M. Yaghini, G. Cherubin, M. Veale, C. Troncoso, Disparate vulnerability to membership inference attacks. Proc. Priv. Enhanc. Technol. (PoPETs) (2022)
26. Y. Long, L. Wang, D. Bu, V. Bindschaedler, X. Wang, H. Tang, C.A. Gunter, K. Chen, A pragmatic approach to membership inferences on machine learning models, in 2020 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 521–534 (2020)
27. S.K. Murakonda, R. Shokri, ML privacy meter: Aiding regulatory compliance by quantifying the privacy risks of machine learning. arXiv preprint arXiv:2007.09339 (2020)
28. S. Yeom, I. Giacomelli, M. Fredrikson, S. Jha, Privacy risk in machine learning: Analyzing the connection to overfitting, in 2018 IEEE 31st Computer Security Foundations Symposium (CSF), pp. 268–282 (2018)
29. A. Goldsteen, G. Ezov, A. Farkash, Reducing risk of model inversion using privacy-guided training. arXiv preprint arXiv:2006.15877 (2020)
30. A. Marshall, J. Parikh, E. Kiciman, R.S.S. Kumar, Threat Modeling AI/ML Systems and Dependencies. learn.microsoft.com/en-us/security/engineering/threat-modeling-aiml (2022)
31. G. Ateniese, G. Felici, L.V. Mancini, A. Spognardi, A. Villani, D. Vitali, Hacking smart machines with smarter ones: How to extract meaningful data from machine learning classifiers. arXiv preprint arXiv:1306.4447 (2013)
32. Y. Liu, R. Wen, X. He, A. Salem, Z. Zhang, M. Backes, E. De Cristofaro, M. Fritz, Y. Zhang, ML-Doctor: Holistic risk assessment of inference attacks against machine learning models, in 31st USENIX Security Symposium (USENIX Security 22), pp. 4525–4542 (2022)
33. N. Carlini, C. Liu, U. Erlingsson, J. Kos, D. Song, The secret sharer: Evaluating and testing unintended memorization in neural networks, in 28th USENIX Security Symposium (USENIX Security 19), pp. 267–284 (2019)


34. V. Ruehle, R. Sim, S. Yekhanin, N. Chandran, M. Chase, D. Jones, K. Laine, B. Kopf, J. Teevan, J. Kleewein, S. Rajmohan, Privacy preserving machine learning: Maintaining confidentiality and preserving trust. www.microsoft.com/en-us/research/blog/privacy-preserving-machine-learning-maintaining-confidentiality-and-preserving-trust (2021)
35. C. Dwork, F. McSherry, K. Nissim, A. Smith, Calibrating noise to sensitivity in private data analysis, in Theory of Cryptography Conference, (Springer, Berlin, Heidelberg, 2006), pp. 265–284
36. S. Song, K. Chaudhuri, A.D. Sarwate, Stochastic gradient descent with differentially private updates, in Proceedings of the 2013 IEEE Global Conference on Signal and Information Processing (GlobalSIP'13), Washington, DC, USA, (IEEE Computer Society, 2013), pp. 245–248
37. X. Chen, S.Z. Wu, M. Hong, Understanding gradient clipping in private SGD: A geometric perspective. Adv. Neural Inf. Process. Syst. 33, 13773–13782 (2020)
38. M. Knolle, A. Ziller, D. Usynin, R. Braren, M.R. Makowski, D. Rueckert, G. Kaissis, Differentially private training of neural networks with Langevin dynamics for calibrated predictive uncertainty. arXiv preprint arXiv:2107.04296 (2021)
39. N. Papernot, M. Abadi, U. Erlingsson, I. Goodfellow, K. Talwar, Semi-supervised knowledge transfer for deep learning from private training data. arXiv preprint arXiv:1610.05755 (2016)
40. C. Liu, Y. Zhu, K. Chaudhuri, Y.-X. Wang, Revisiting model-agnostic private learning: Faster rates and active learning, in International Conference on Artificial Intelligence and Statistics, pp. 838–846 (2021)
41. J. Majmudar, C. Dupuy, C. Peris, S. Smaili, R. Gupta, R. Zemel, Differentially private decoding in large language models. arXiv preprint arXiv:2205.13621 (2022)
42. B. McMahan, D. Ramage, Federated learning: Collaborative machine learning without centralized training data. ai.googleblog.com/2017/04/federated-learning-collaborative.html (2017)
43. M. McNamara, What is federated learning in AI? www.netapp.com/blog/federated-learning (2022)
44. I. Dayan, H.R. Roth, A. Zhong, et al., Federated learning for predicting clinical outcomes in patients with COVID-19. Nat. Med. 27, 1735–1743 (2021)
45. N. Rieke, J. Hancox, W. Li, F. Milletari, H.R. Roth, S. Albarqouni, S. Bakas, et al., The future of digital health with federated learning. NPJ Digit. Med. 3(1), 1–7 (2020)
46. A.G. Kaissis, M.R. Makowski, D. Rückert, R.F. Braren, Secure, privacy-preserving and federated machine learning in medical imaging. Nat. Mach. Intell. 2, 305–311 (2020)
47. R. Tomsett, K. Chan, S. Chakraborty, Model poisoning attacks against distributed machine learning systems, in Proceedings of Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications, vol. 11006, (SPIE, 2019)
48. S. Nappo, Synthetic data vs other privacy preserving technologies. www.datomize.com/resources/synthetic-data-vs-other-privacy-preserving-technologies
49. J. Drechsler, S. Bender, S. Rassler, Comparing fully and partially synthetic data sets for statistical disclosure control in the German IAB Establishment Panel: supporting paper for the Work Session on Data Confidentiality 2007 in Manchester (UNECE/Programmes, 2007)
50. E. Soufleri, G. Saha, K. Roy, Synthetic dataset generation for privacy-preserving machine learning. arXiv preprint arXiv:2210.03205 (2022)
51. A.F. Kalay, Generating synthetic data with the nearest neighbors algorithm. arXiv preprint arXiv:2210.00884 (2022)
52. T. Dong, B. Zhao, L. Lyu, Privacy for free: How does dataset condensation help privacy? arXiv preprint arXiv:2206.00240 (2022)
53. B. Zhao, H. Bilen, Dataset condensation with differentiable siamese augmentation, in International Conference on Machine Learning, (PMLR, 2021), pp. 12674–12685
54. T. Nguyen, Z. Chen, J. Lee, Dataset meta-learning from kernel ridge-regression. arXiv preprint arXiv:2011.00050 (2020)
55. J.-W. Lee, H.C. Kang, Y. Lee, W. Choi, J. Eom, M. Deryabin, E. Lee, et al., Privacy-preserving machine learning with fully homomorphic encryption for deep neural network. IEEE Access 10, 30039–30054 (2022)


56. M. Veale, R. Binns, L. Edwards, Algorithms that remember: Model inversion attacks and data protection law. Philos. Trans. A Math. Phys. Eng. Sci. 376(2133), 20180083 (2018)
57. D. Harrington, U.S. Privacy Laws: The Complete Guide. www.varonis.com/blog/us-privacy-laws (2022)

Security Analysis of Android Hot Cryptocurrency Wallet Applications

Danyal Mirza and Yogachandran Rahulamathavan

1 Introduction

Background/Context

As the years go by, technology keeps evolving, and the Cryptocurrency market is one sector that rose in popularity during and after the COVID-19 pandemic, a time when everything in the world was affected: many businesses were forced to close due to a lack of customers and sales, and traditional physical systems were replaced by online digital systems. The COVID-19 pandemic led businesses such as grocery shops and restaurants to remove physical cash payments and allow only card payments, in an effort to limit the spread of the virus. Additionally, the public had to use online shopping systems to order groceries to their homes, video-conferencing applications such as Zoom for school lessons and meetings, and online banking applications to make money transactions. As this digital transition took place, all kinds of people, including the elderly, discovered how easy technology is to use and became exposed to secure ledger systems such as the Blockchain, and to Cryptocurrency. Individuals became fascinated with Cryptocurrency because it runs on a secure distributed ledger system known as the Blockchain, allowing them to send and receive virtual cash digitally and immediately. Many people were attracted to Cryptocurrency during the pandemic because it let them securely send digital funds to others without having to trust or rely on a third-party central bank that could experience financial problems and close down.

D. Mirza · Y. Rahulamathavan () Institute for Digital Technologies, Loughborough University London, London, UK e-mail: [email protected]; [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 C. Hewage et al. (eds.), Data Protection in a Post-Pandemic Society, https://doi.org/10.1007/978-3-031-34006-2_3


The use of technology has benefited people massively over the years; however, many, such as the elderly, did not fully trust or know how to use technical devices or systems. The COVID-19 pandemic was one of the biggest factors in the rise of Cryptocurrency usage, as it forced everyone onto digital platforms. Below is background information regarding Blockchain, Cryptography, Cryptocurrency, and Cryptocurrency wallets.
The Blockchain is a digitally distributed and decentralised system that makes use of cryptographic mechanisms such as encryption and digital signatures to deliver high-level security to the individuals who use the network. It offers a large amount of accessible storage and allows transactions to be processed very quickly, without the need for an external third-party authority to verify them as in traditional transactions. The security mechanisms developed and adopted within the Blockchain network help prevent potential attackers from gaining unauthorised access to the network and/or modifying data. Cryptography aids Cryptocurrency through techniques such as encryption and multi-factor authentication, ensuring that data is protected and individuals are kept safe from external threats.
Cryptocurrency is stored on the Blockchain and is decentralised, meaning it is not run on a central server or by a central authority. Examples include Bitcoin and Ethereum. Bitcoin was created by Satoshi Nakamoto in 2009, after the great recession of 2008, when individuals grew distrustful of banking companies [22]. Today, Bitcoin is one of the most popular and highly valued cryptocurrencies: it was valued at $1 in 2011 and reached $60,000 a decade later in 2021 [20]. Additionally, from 2012 to 2020, Bitcoin gained a total of 193,639.36%, illustrating how popular Cryptocurrency has become over the last decade [73]. When Cryptocurrency is bought, the individual is given two keys: a public key, which is shareable with others to send/receive crypto cash, and a private key, a unique code that must be kept secret and known only to the individual.
Cryptocurrency has been used on a global scale, in countries such as India, Pakistan, Ukraine, Kenya, and Nigeria. According to Statista, Vietnam is the top-ranked developing country in terms of Cryptocurrency value and transactions, thanks to its use as an investment tool [13]. Table 1, compiled by Chainalysis in 2021, ranks 20 developing countries by their Cryptocurrency adoption rates, with Vietnam highest, followed by India and Pakistan [33]; it shows how the use of Cryptocurrency has risen over the years, including its adoption by developing countries.
Cryptocurrency wallet applications have been developed over the last decade to enable users to gather all of their digital currency assets in one place and to help protect a user's private key from attackers. There is a wide range of crypto wallet types, which will be described below; however, the main focus of this article is on non-custodial hot Cryptocurrency wallets, as this type of wallet is the most used and the most prone to hackers, since it requires an Internet connection to run. Cryptocurrency wallets also enable users to view their overall balances and their list of past transactions, and to send and receive cryptocurrencies.


Table 1 Cryptocurrency country adoption rates [33]. The last three columns give each country's ranking for the individual weighted metrics feeding into the Global Crypto Adoption Index.

Country | Index score | Overall index ranking | On-chain value received | On-chain retail value received | P2P exchange trade volume
Vietnam | 1.00 | 1 | 4 | 2 | 3
India | 0.37 | 2 | 2 | 3 | 72
Pakistan | 0.36 | 3 | 11 | 12 | 8
Ukraine | 0.29 | 4 | 6 | 5 | 40
Kenya | 0.28 | 5 | 41 | 28 | 1
Nigeria | 0.26 | 6 | 15 | 10 | 18
Venezuela | 0.25 | 7 | 29 | 22 | 6
United States | 0.22 | 8 | 3 | 4 | 109
Togo | 0.19 | 9 | 47 | 42 | 2
Argentina | 0.19 | 10 | 14 | 17 | 33
Colombia | 0.19 | 11 | 27 | 23 | 12
Thailand | 0.17 | 12 | 7 | 11 | 76
China | 0.16 | 13 | 1 | 1 | 155
Brazil | 0.16 | 14 | 5 | 7 | 113
Philippines | 0.16 | 15 | 10 | 9 | 80
South Africa | 0.14 | 16 | 18 | 16 | 62
Ghana | 0.14 | 17 | 32 | 37 | 10
Russian Federation | 0.14 | 18 | 8 | 6 | 122
Tanzania | 0.13 | 19 | 60 | 45 | 4
Afghanistan | 0.13 | 20 | 53 | 38 | 7

These Cryptocurrency wallets bring many benefits in terms of security, as they protect user data and ensure only authorised users can access the wallet application [62]. However, there is no such thing as a perfectly secure information system; there are always potential threats and vulnerabilities that can negatively affect the system and open doors to hackers intent on malicious activity and harm, underlining the importance of high-level security. Hence, this article identifies the security mechanisms within current and trending hot wallets, the vulnerabilities these types of apps have, how hackers may carry out attacks on them, and how users can stay safe from those attacks.

Research Focus and Purpose

Security is a major factor that always needs to be monitored, checked, and assessed, as systems face potential threats over time and information systems store vast amounts of data that is sensitive to businesses and customers.


If security were not considered to a high standard in a Cryptocurrency wallet app, the internal sensitive data could be exposed and hacked, not only damaging the company's reputation but also causing the wallet's users to lose control of their wallets and digital assets. A Cryptocurrency holder has a set of keys: the public key, used for sending and receiving cryptocurrencies from party to party, and the private key, used for unlocking incoming transactions and proving ownership of the individual's digital assets. The two main security vulnerabilities in crypto wallets are losing the private key, and external attackers hacking into the crypto wallet application and stealing the private key and other sensitive information. If Cryptocurrency owners lose their private key, they may never be able to access and retrieve their digital assets again, which is why keeping the private key safe is critically important. If the crypto wallet platform were hacked, the individual would lose their digital assets, and the crypto wallet company's reputation would suffer, costing it money and customers in the long term. Year by year, more Cryptocurrency security breaches and attacks occur, leading to individuals' sensitive information being hacked, stolen, and misused. A BBC article states that in 2021, hackers in North Korea stole £219 million of Cryptocurrency across seven different attacks on crypto systems [8]. The attacks used phishing, code exploits, and malware to move the sensitive data from the hacked hot wallets into the attackers' own addresses. Hot wallets are connected to the internet, which is their main vulnerability: they are less secure than cold wallets, which are disconnected from the internet [18]. Storing digital assets in a cold wallet is deemed more secure; however, not all individuals take this safer option, and those relying solely on a hot wallet can end up getting hacked and having their data stolen [11]. Hot crypto wallets are somewhat secure in general, but as such attacks show, no hot wallet is 100% safe and secure, motivating the need to analyse and investigate exactly how secure today's hot crypto wallet apps really are [58]. New Cryptocurrency wallets are developed year by year, improving security levels to protect users' data against external attackers and to earn users' trust. However, even the most popular and seemingly most secure crypto wallet apps are still targeted, whether through brute-force attacks or malware and phishing attacks, for example. The focus and purpose of this article is to use research and investigation to analyse the security measures that current Android crypto hot wallets put in place to protect sensitive data from external attackers such as hackers and unauthorised users. This review also explores how secure these applications are, gathering suggestions for how users can protect their wallet from external threats aiming to steal their sensitive information and digital assets. The aim of this chapter is to find, evaluate, and analyse the security of hot crypto wallet Android applications, to investigate potential vulnerabilities that may exist in these apps, and to demonstrate how attackers can take malicious


advantage of these vulnerabilities to access and steal data. The article also suggests improvements that can be made to minimise the risk posed by these potential threats and vulnerabilities.

2 Literature Review

Current Developments and Related Work

Uddin et al. [70] carried out a security analysis of Android crypto wallet apps using semi-automated assessment tools on over 300 apps, and manually investigated 18 trending apps from Google's Play Store. They found many vulnerabilities; one was the plaintext key being exposed in over 100 crypto wallets, which could result in the user losing ownership of the wallet [70]. He et al. [37] carried out a security analysis of Cryptocurrency wallet applications on the Android OS, leading attack tests to identify security weaknesses in these apps and show the need for high-level security. The two applications used for this analysis were "Huobi Wallet" and "imToken", and the two attack experiments tested whether sensitive data could be captured from the device's screen display. He et al. used the accessibility service, which tracked actions made on the device, allowing user data to be stolen. The second experiment tested whether user input data could be captured through USB debugging; He et al. did in fact manage to read user input data this way [37]. Li et al. [46] conducted a security analysis of Cryptocurrency applications in Android's Google Play Store, putting attack measures in place to test the apps' level of security, identifying their risks, and presenting potential prevention methods. They used a Huawei Honor 7X phone and ran three main tests on eight wallet applications, attempting to gain access to confidential information from a backup file through USB debugging, to find confidential information in screenshots through local servers, to find and change confidential information, and to steal confidential information through the accessibility service. The crypto wallet applications included 'Blockchain Wallet', 'Bitcoin.com Wallet', 'Bitcoin Wallet', 'BitPay', 'Mycelium Bitcoin Wallet', 'MEWconnect', 'Coinbase Wallet', and 'Trust Wallet'. Li et al. then introduced potential defence mechanisms to prevent possible security issues, e.g. giving users an on/off toggle for hazardous features, enabling them to turn off specific features and/or keep others on, depending on the security issues they are notified about [46]. Rajendra Sai et al. [59] investigated Cryptocurrency wallet apps in terms of privacy and security, comparing them with the level of security in financial and banking applications. Forty-eight different Cryptocurrency wallets and ten banking/trading applications were used for the experiments. They used static code analysis to examine the applications' source code, and network data analysis to track network traffic and identify in-app interaction activity.


They found that crypto wallets have more security vulnerabilities than finance and banking apps, two of them being insufficient cryptography and insecure data storage. They also found that the static code analysis method had many limitations and was not a valuable tool for evaluating levels of security [59]. Ghaffar Khan et al. [30] produced a research paper analysing the security of hardware wallets, including the use of QR codes. Hardware wallets are more secure than hot software-based wallets; however, they found a number of threats through which hackers could potentially steal users' private keys and digital assets, including brute-force attacks and double-spending attacks. They then proposed a solution to these threats, explaining how a cold wallet can be used for private key storage for cross-checking and verification purposes, while the hot wallet can be used for sending and receiving transactions through the mobile app, copying addresses, and viewing transaction history. This method enables private keys to be stored very securely while still allowing users to easily make and view transactions on the go [30]. Mukhopadhyay et al. [53] presented a survey of Cryptocurrency systems, covering Cryptocurrency, Blockchain, Cryptocurrency mining methods, and different types of cryptographic algorithms, as well as the problems that have occurred. They mentioned security breaches showing the lack of security in some Cryptocurrency exchanges and the need for stronger security measures. They touch upon Cryptocurrency problems and vulnerabilities but mainly focus on the efficiency and effectiveness of mining systems [53]. Tomas Kovalcik [42] wrote a research thesis on digital forensics and Cryptocurrency wallets. He aimed to explore the security of cryptocurrencies and create a tool to investigate whether any sensitive data artefacts were accessible and could be stolen from two crypto wallet applications, Exodus and Electrum. Different types of crypto wallets, algorithms, and methodologies were explored and used to carry out the forensics experiments. He found that both Cryptocurrency wallets were good at securing sensitive information, but that Exodus was better, as its data could be extracted only in a form that was not human-readable [42].

Security Attacks

Andrew Norry of Blockonomi gathered, in 2020, the history of the Mt. Gox hack, in which the then largest global bitcoin exchange went bankrupt in 2014, with Mt. Gox users losing their funds and some users' funds remaining untouchable for a number of years. Approximately 740,000 BTC was taken from customers in this hack, and a further 100,000 BTC from the company itself. The BTC funds were reportedly stolen from Mt. Gox's hot (online) wallets [56].


According to the Bitcoinist website, in 2015, BTER, a Cryptocurrency exchange located in China, had 7170 bitcoins stolen from its cold wallet. Since cold wallets have no connection to the internet, it was unknown how hackers were able to break in and steal Cryptocurrency from the offline wallet [49]. According to NewsBTC, also in 2015, CBE Kipcoin, a bitcoin wallet company, suffered a hack that cost it around 3000 bitcoins and forced the company to shut down [54]. Hackread reported how the well-known European Bitcoin exchange BitStamp had 19,000 BTC stolen through phishing attacks. Hackers used Skype to send phishing messages, tailored to each recipient and looking very legitimate, to a number of BitStamp's employees. Through these phishing attacks the hackers were able to get into BitStamp's hot wallet, steal data such as user IDs, and ultimately steal the vast amount of Bitcoin from the company [39]. CoinDesk reported in 2016 that the Hong Kong Cryptocurrency exchange Gatecoin lost 185,000 Ethers and 250 Bitcoins, equalling around US$2 million. Gatecoin stated that the hackers attacked and gained access to its users' hot wallets, and attempted to break into its cold wallets but failed, the cold wallets being harder to access [38]. TechCrunch reported in 2018 that Coinrail, a Korean Cryptocurrency exchange, suffered a hack that cost it over $40 M in altcoins. The hackers' wallet address was found, which helped Coinrail identify exactly what was taken: $19.5 M worth of NPXS tokens, $13.8 M from Aston X, $5.8 M from Dent tokens, and $1.1 M from a project named Tron. After the breach of its hot wallets, Coinrail went offline to move all of its assets to cold wallets and to manage and inspect the breach [64]. Bloomberg UK reported in 2018 that Zaif, a Japanese Cryptocurrency exchange, was hacked and lost $60 M. On September 14th, the hackers gained unauthorised access to Zaif for more than 2 hours, during which they hacked into the hot wallets where Zaif stored all of its digital funds. Zaif took its exchange offline after discovering the breach [61]. The Verge reported in 2022 that Trezor hardware wallet users received phishing emails from hackers who had gained access to an internal tool within the emailing platform Mailchimp. Mailchimp reported that more than 100 of its users had data stolen through these phishing emails. The emails notified users that their current version of Trezor was outdated and directed them to a 'newer version' of the application, actually a fake version created by the hackers to steal users' personal information and access their hardware wallets. The attack led Trezor to deactivate the compromised accounts; however, this was not fully effective, as the hackers still held information on around 300 user accounts in total [26].


According to CNBC, in August 2021, Poly Network, a company known for storing, controlling, and connecting Blockchains, was hacked and had $600 M worth of Cryptocurrency stolen. The hackers identified and exploited a vulnerability in the network: a piece of Poly Network's code that enabled them to send Poly Network's assets to themselves. After announcing the attack, Poly Network issued a statement urging the hackers to return the stolen cryptocurrencies, and, unexpectedly, the hackers returned almost half of the total amount [40]. According to the BBC, Coincheck was subject to one of the largest digital currency hacks, losing $534 M worth of assets stored in its hot wallets. The attack occurred during the night, and Coincheck only discovered it around 8 hours later. Coincheck investigated the hack, identified how many accounts had been compromised, and reported that it knew where the digital assets had been sent but was unable to trace and recover them. This hack caused the value of Bitcoin and other cryptocurrencies such as Ripple to decrease [7]. According to CNBC, in May 2019, Binance, one of the world's largest Cryptocurrency exchanges and crypto wallet providers, was hacked and lost over 7000 BTC, equalling around $40 M; the hackers also managed to access users' sensitive data by inspecting multi-factor authentication code files. Binance announced that the attack targeted its hot wallet, where the hackers used multiple attack techniques to steal this large sum of funds [41].

3 Research Methods

Research Methodology

This research is conducted by collecting data through secondary research: reviewing similar past research articles and academic resources, using news articles to identify recent related attacks, and consulting books to understand how systems work and what vulnerabilities exist in crypto wallets overall. The research is a mixture of qualitative and quantitative: the majority of the work is text-based, and points are evaluated and supported with statistical figures and evidence from real-world scenarios. The work not only identifies the risks and vulnerabilities that crypto wallets have, but also investigates how hackers may attack and gain access to users' hot crypto wallets and steal their sensitive information. This research makes use of an available Sony Xperia XZ1 phone, running Android 9, to demonstrate how hackers manage to attack wallets, and later includes methods users should keep in mind to prevent attackers from hacking their wallet.


Some investigations consist of using accessibility services to steal sensitive information and taking advantage of USB debugging to steal data from backup files, as well as examining how rooted phones can be hacked and have information stolen from them; a sketch of the USB-debugging backup technique follows below. Other researchers have used similar techniques for security analysis of crypto wallets; however, this chapter goes into more depth and demonstrates clearly how hackers may attack hot crypto wallets.
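As a minimal sketch of the USB-debugging route, the Python snippet below pulls an app backup over ADB and unpacks it. It assumes USB debugging is enabled on the device, that the (hypothetical) wallet package allows backups (android:allowBackup), and that the resulting backup is unencrypted and zlib-compressed; the package name is illustrative.

import subprocess, zlib

PACKAGE = "com.example.wallet"  # hypothetical package name

# 1. Pull an app backup over ADB (the device user must confirm on screen).
subprocess.run(["adb", "backup", "-f", "wallet.ab", PACKAGE], check=True)

# 2. Convert the .ab file to a plain tar archive for inspection.
with open("wallet.ab", "rb") as f:
    data = f.read()
payload = data.split(b"\n", 4)[4]      # skip the 4-line .ab header
with open("wallet.tar", "wb") as f:
    f.write(zlib.decompress(payload))  # app files, possibly including key material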

Technical Background Overview

Cryptography, Encryption, and Principles

Cryptography refers to the security techniques put in place to ensure data is protected and unintelligible to unauthorised individuals through the use of encryption, which turns plain messages into scrambled text. Symmetric and asymmetric cryptography are the two types of cryptography; the type that Cryptocurrency exchanges and Cryptocurrency wallets use is asymmetric. Symmetric cryptography is an encryption type that can encrypt and decrypt data quickly, no matter how large the data is. It makes use of one single key that is shared with receivers, and encrypts or decrypts data using a stream cipher or a block cipher. A stream cipher turns plaintext messages into single bits and transforms them into ciphertext (encrypted text), whereas a block cipher turns plaintext messages into blocks of data and transforms these into ciphertext with the use of a key. Stream ciphers are generally faster than block ciphers, as they encrypt individual bits or bytes of data at a time, whereas block ciphers encrypt large blocks of data [44]. The main drawback of symmetric encryption is its single key: if the key were lost or stored in a vulnerable location, hackers could retrieve it and decrypt all encrypted messages, leaving the user or company in a very bad state. Asymmetric cryptography, also known as public key cryptography, instead uses two keys, a public key and a private key, for encryption and decryption when sending and receiving messages. If an individual called Andy wanted to send a message using asymmetric encryption to another individual called Bella, they would exchange public keys, and Andy would use Bella's public key to encrypt the message. Andy would then send the encrypted message over to Bella, who can decrypt and read it using her private key, this key being kept safe and unknown to external parties. Examples of asymmetric encryption schemes include Rivest-Shamir-Adleman (RSA), the Diffie-Hellman exchange method, and Elliptic Curve Cryptography (ECC).
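As a concrete illustration of the Andy/Bella exchange, the following minimal Python sketch uses RSA with OAEP padding from the widely used cryptography package; the message text and variable names are illustrative.

from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes

# Bella generates a key pair and shares only the public half with Andy.
bella_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
bella_public = bella_private.public_key()

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# Andy encrypts with Bella's public key; only Bella's private key can decrypt.
ciphertext = bella_public.encrypt(b"hello Bella", oaep)
assert bella_private.decrypt(ciphertext, oaep) == b"hello Bella"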


RSA
RSA multiplies two large random prime numbers together to create the public key, and the same two primes are later used to derive the private key [67]. The key lengths used in RSA are very large, from 1024 up to 15,360 bits.

Diffie-Hellman exchange method
DH is a protocol, developed in the 1970s, that allows two individuals to securely establish a shared secret even if the sender has not previously shared a key with the intended recipient [45]. The shared secret is then typically used with symmetric encryption to protect large amounts of data.

ECC
ECC is an alternative to RSA that makes use of mathematical elliptic curves for public key encryption of data. It uses a small key size, meaning it is faster at generating keys and signatures, and it offers a high level of security, as it can create keys that are almost impossible to crack. The Cryptocurrency market makes use of ECC: transactions are digitally signed with ECC to verify that the user is the rightful individual to hold, send, or receive the digital cash [4], as the sketch below illustrates. The main difference between the two encryption types is that symmetric encryption is faster at encrypting and decrypting, while asymmetric is slower because it is much more complex. However, asymmetric is used more in today's world, as it is considered far more secure: the public key used to encrypt data is accessible to others, but the private key is kept hidden, and only the rightful recipient can decrypt and read the message [21].
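To make the role of ECC concrete, the following minimal sketch signs and verifies a mock transaction with ECDSA over secp256k1, the curve used by Bitcoin and Ethereum, via the third-party ecdsa package; the transaction string is purely illustrative.

from ecdsa import SigningKey, SECP256k1

sk = SigningKey.generate(curve=SECP256k1)   # private key: proves ownership
vk = sk.get_verifying_key()                 # public key: anyone can verify

transaction = b"send 0.5 BTC to <recipient address>"
signature = sk.sign(transaction)

# Verification succeeds only if the signature matches both the key and the data.
assert vk.verify(signature, transaction)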

Cryptography Security Principles

There are four main security properties that cryptography helps to ensure in a system: confidentiality, integrity, authentication, and non-repudiation [15].

Confidentiality Protecting information from unauthorised users. Neglecting data confidentiality can result in users having their information stolen and used for fraudulent activity [27]. Encryption algorithms such as AES and RSA are used for data confidentiality, turning data into ciphertext (encrypted data) so that no one can view or understand it except the authorised users holding the key to open and read it.

Integrity Ensuring that data within a system is accurate and legitimate, and has not been tampered with or modified by attackers [72]. Hashing is used to ensure data integrity: an algorithm digests the data into a fixed-length text string, and if the data is altered, the hash changes unpredictably.

Authentication The process of verifying someone's actions in a system, confirming that users actually are who they say they are and not attackers. Authentication methods include multi-factor authentication protocols for accessing a system, such as an additional pattern passcode, or biometrics such as fingerprint scanners [23].


Non-repudiation The process of preventing individuals from denying an action within a system, whether denying that they sent a message or that they signed it, when they actually did. Digital signatures help ensure non-repudiation: a digital signature is created with the signer's private key and verified with the public key, so the individual cannot later deny the action, which has been tracked [5].
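As a minimal illustration of the integrity property above, the sketch below hashes a record with SHA-256 from Python's standard hashlib; the record text is illustrative.

import hashlib

record = b"transfer 1.0 BTC from Alice to Bob"
digest = hashlib.sha256(record).hexdigest()

# Any modification, however small, yields a completely different digest.
tampered = b"transfer 9.0 BTC from Alice to Bob"
assert hashlib.sha256(tampered).hexdigest() != digest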

Blockchain, Types of Blockchain, and Transaction Process

The Blockchain is a highly secure distributed ledger system responsible for storing, recording, sending, and receiving blocks of data along a decentralised network, meaning no central server is present. This makes the Blockchain secure: the parties involved in sending and receiving data communicate directly with each other, and no unauthorised user can attack a central server system to steal or misuse data. All records of messages/data sent on the Blockchain are recorded and encrypted with asymmetric encryption, enabling only authorised parties to receive and access information through their pair of public and private keys [60] (Fig. 1). A transaction on the Blockchain proceeds as follows. An individual first requests a transaction, which is identified as a block of data and broadcast to every existing node on the network. The transaction is then validated and verified using a cryptographic mechanism known as hashing, to ensure data authenticity and integrity. The block of data is then added to the Blockchain after the miner nodes

Fig. 1 Blockchain transaction process [17]


are rewarded with Cryptocurrency. The recipient then successfully receives the transaction, and the transaction is completed. The Blockchain is secured with high-level cryptographic mechanisms, such as digital signatures and hashing, which ensure data integrity, data authenticity, and non-repudiation. Blockchain technology consists of two main components: transactions and blocks. Transactions are the movement of value across the network; a transaction occurs when a user on the network sends an amount of digital currency to another individual through the Blockchain. Blocks are where permanent records of transaction data are located; in Bitcoin, blocks are mined roughly every 10 minutes and include information such as the transaction creation date [34].
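A minimal sketch of this chaining idea is given below: each block stores the hash of its predecessor, so tampering with an earlier block breaks the link. The field names are illustrative and do not follow any specific Blockchain's format.

import hashlib, json

def block_hash(block):
    # Serialise the block deterministically, then hash its contents.
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

genesis = {"index": 0, "transactions": [], "prev_hash": "0" * 64}
block1 = {
    "index": 1,
    "transactions": [{"from": "Alice", "to": "Bob", "amount": 0.5}],
    "prev_hash": block_hash(genesis),   # the link back to the previous block
}

# Tampering with the earlier block breaks the link stored in its successor.
genesis["transactions"].append({"from": "Bob", "to": "Mallory", "amount": 99})
assert block_hash(genesis) != block1["prev_hash"]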

Types of Blockchain Networks

Most cryptocurrencies make use of public Blockchain networks; however, there are other types of Blockchain networks, including private Blockchains, hybrid Blockchains, and consortium Blockchains [57]. The available types of Blockchain networks are described in more detail below [55] (Fig. 2).

Public

A public Blockchain is a decentralised and permissionless Blockchain network that enables any individual to join and become a node, and to carry out public Blockchain

Fig. 2 Blockchain transaction process [24]


processes such as creating blocks of data and viewing and verifying transaction records. Public Blockchain networks make use of Proof of Work and Proof of Stake consensus methods to prove the authenticity of data blocks created on the Blockchain. Cryptocurrencies such as Bitcoin use public Blockchain networks over other types because they allow individuals to hold Cryptocurrency digitally and/or make transactions without exposing their sensitive information, which would otherwise be a possible attack route for hackers; this makes public Blockchain networks highly secure. Public Blockchain networks do, however, experience scalability issues and reduced performance, owing to the large number of nodes able to join and execute tasks and processes, and the time the network takes to reach transaction consensus.

Private

A private Blockchain is a permissioned, partially decentralised, smaller-scale network managed by one central server or authority, which selects which individuals can become nodes on the network and what rights each individual has. Each individual has a specific identity that determines which features they can access on the Blockchain. Central authorities in private Blockchains are capable of altering any transaction they come across if needed [71]. Due to the smaller size of private Blockchain networks, transaction speeds are much faster than on public Blockchain networks. Additionally, only authorised nodes on the Blockchain can view or share certain information, keeping data safe and private from unauthorised individuals. However, private Blockchains offer a lower level of security, because of the small number of nodes on the network: if some nodes turn out to be bad actors, they could threaten and undermine the consensus methods, allowing hackers to gain unauthorised access and steal sensitive data for fraudulent purposes.

Hybrid

A hybrid Blockchain is a fully customisable Blockchain network that combines features of both public and private Blockchains. Hybrid Blockchains are managed by a single organisation or authority; all transaction records on the network are private, but individuals are allowed to verify them. Transactions made on the network cannot be altered in any way, even by the owner of the network. The network owner chooses who has access to which features of the Blockchain and decides which data is kept private and which is publicly accessible. Transaction data is protected from any external individual throughout its processing [29].


Hybrid Blockchains benefit users, as transactions are faster and cheaper, with fees as low as US$0.01. They deliver controlled access while still providing high-level security, scalability, and decent transparency. A minor drawback is that the single controlling authority may struggle to keep up with all of the nodes on the system, along with the roles they hold and the features they are allowed to access. Nevertheless, hybrid Blockchains are often regarded as the best type of network in today's world [74].

Consortium

A consortium Blockchain is similar to a private Blockchain in that it is permissioned, but rather than being managed by a single central authority, it is managed by a number of authorities on a decentralised network. Consortium Blockchains tend to use features from both public and private Blockchain networks [6]. All consensus methods are set by pre-established nodes, which also send, receive, and verify outgoing transactions along the network. Consortium Blockchains are very secure, as they restrict which users may join the network, protecting data from unauthorised users. Additionally, this type of network is very scalable in terms of how many transactions can be performed and sent over the network per second, and the delivery of transactions is also very fast. However, organisational and managerial issues may arise from having several organisational authorities sharing the maintenance of the network, and clashes between authorities would negatively affect the reliability of the network [51].

Cryptocurrency

Cryptocurrencies are a form of digital currency that runs, and is recorded, on the publicly accessible Blockchain network. Cryptocurrencies allow individuals to quickly and securely send and receive funds online, no matter where they are in the world, thanks to the cryptographic mechanisms in place. All transactions carried out are publicly and permanently recorded on the Blockchain [75]. There are many current cryptocurrencies, such as Bitcoin, Ethereum, and Litecoin; Bitcoin was the first Cryptocurrency ever created. In 2008, during a financial crisis, Satoshi Nakamoto concluded that banks had too much control over people's money, and so, the following year, created Bitcoin, a peer-to-peer online payment system in which individuals can send money to each other instantly without the involvement of a centralised bank [25]. Cryptocurrencies are bought and sold through Cryptocurrency exchanges, which list hundreds of different cryptocurrencies. Each Cryptocurrency requires a


Table 2 Main differences between hot wallets and cold wallets

Hot wallet | Cold wallet
Connected to Internet | Not connected to Internet (offline)
Vulnerable to security attacks (hacks) | Very secure
Access to high amount of funds | Less accessible funds
Easy to use and inexpensive | More difficult to use, and costly
Suited to new Cryptocurrency users | Suited to experienced users
Used for sending assets quickly | Used for storing large assets over time

node on the Blockchain. Nodes enable transactions to be made and help keep track of them, confirming whether a transaction has been sent successfully or not [1]. Cryptocurrency holders are not required to give any personal data about themselves to make transactions, which is part of what makes Cryptocurrency so secure: it decreases the risk of hackers gaining access to and stealing sensitive data [36].

Crypto Wallet, Crypto Wallet Types, and Architecture

Cryptocurrency wallets are wallet systems through which users can send, receive, and store multiple cryptocurrencies in one place simultaneously. There are two types of crypto wallets, hot wallets and cold wallets; this article focuses on mobile non-custodial hot wallets and their security. When a crypto wallet is installed, a unique address is created that represents the user's identity when sending, receiving, or monitoring payment transactions; the address is derived from the wallet's public key, while the corresponding private key controls access to the funds [47] (Table 2). Crypto wallets use asymmetric cryptography, as they contain a private key and a public key, which the wallet holder receives when the wallet is created. The public key is the unique address shared with other parties in order to receive transactions, whereas the private key is a unique password that should never be shared with anyone and that allows users to log in to the app and use their Cryptocurrency [12]. For example, if John wanted to send Amy Cryptocurrency, Amy would have to make John aware of her public key address. Once John has received it, he can insert her public key address into his crypto wallet's transaction field, along with how much digital cash he wants to send her, and Amy will receive the Cryptocurrency securely in a matter of minutes. In case technical issues ever arise, users are typically given a seed phrase as an emergency security mechanism to recover and regain access to the Cryptocurrency in their crypto wallet. A seed phrase consists of several random words that users must remember or store safely in order to regain access to their wallets if they run into any errors [50].
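A simplified sketch of key and address generation is shown below, using the third-party ecdsa package. The address derivation here is illustrative only; real wallets use coin-specific schemes (Bitcoin, for instance, hashes the public key with SHA-256 and RIPEMD-160 and adds a checksum encoding).

import hashlib
from ecdsa import SigningKey, SECP256k1

private_key = SigningKey.generate(curve=SECP256k1)   # keep secret
public_key = private_key.get_verifying_key()         # safe to share

# Simplified "address": a truncated hash of the public key.
address = hashlib.sha256(public_key.to_string()).hexdigest()[:40]
print("share this address to receive funds:", address)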


Types of Crypto Wallets

Hot Wallet

Hot wallets are crypto wallets that are connected to the internet. They are fairly easy to use, but being connected to the internet makes them more prone to hackers. Examples include online Cryptocurrency exchange platforms, smartphone/tablet application-based wallets, and web-based wallets accessible through a web browser [63]. One type of hot wallet is custodial, where a wallet provider is responsible for managing the private keys of the crypto wallet. The other type is non-custodial (a DeFi wallet), where the wallet owner is responsible for their own funds and private keys. Non-custodial hot wallets are much more popular, as users tend to like having full control over their wallet. However, security issues may arise if non-custodial hot wallet users do not appropriately secure their private keys: if a private key is stolen, hackers can use it to gain access to the user's wallet and steal their funds.

Types of Hot Crypto Wallets

Desktop wallets Crypto wallets that are downloaded and installed on a desktop computer or laptop. This type of wallet is accessible to the user when using the computer and is quite secure, especially when users add anti-virus software and/or a firewall to their desktop. Without additional anti-virus software, the computer is more likely to catch a virus or be hacked, potentially losing all of the user's Cryptocurrency funds.

Mobile wallets Crypto wallets that users download from the application store on their smartphone. Mobile wallets are simpler and easier to use and understand than other hot wallets. They are the best-known form of crypto wallet and are now even used in retail stores as a form of payment.

Web (online) wallets Crypto wallets that users access on the web, using any computer-type device, wherever they are in the world. Private keys in web wallets are also stored and managed online by a third-party provider, which makes them more prone to security threats from hackers.

Cold Wallet
Cold wallets are crypto wallets that are not connected to the internet. They are more secure than hot wallets because they prevent hackers from entering the wallets through the internet; however, cold wallets are generally more complex to maintain, so they are not for inexperienced users [48].



Types of Cold Wallets
Paper wallets: A piece of paper with keys and QR codes printed onto it that can be used for sending and receiving Cryptocurrency. Paper wallets can make transactions when the keys and QR code on the paper are scanned, or entered manually into the Cryptocurrency site or app being used. Paper wallets are a secure cold wallet option because they stay away from the internet altogether.
Hardware wallets: A hardware device, usually in the shape of a USB stick, that stores the private keys inside it. To make a transaction, the device must be inserted into a computer system; the user then enters a password and sends the digital cash, and the transaction is completed. However, the user is only safe from hackers once they eject the device from their computer.

Architecture of Hot Crypto Wallets
As seen, a user's cryptocurrencies can all be placed in one location, their crypto wallet. The digital assets are not actually stored within the wallet itself, however; they are stored on the Blockchain, and the user interacts with them using their private key, allowing them to send and/or receive Cryptocurrency and view past transaction history [69] (Fig. 3).

Fig. 3 Architecture of hot crypto wallets


When making transactions, the user sending the funds needs the recipient's public key, as this is required to complete the transaction through the Blockchain. If a user wants to receive funds from another user, they need to send their public key to the sender, allowing the sender to make a transaction to them. The private key is the key that must be kept safe by the user, as it gives them access to their account and assets; the private key also verifies incoming transactions to them. It is very dangerous for a user's private key to be lost or exposed, as hackers can steal it and eventually steal the user's digital assets as well. Cryptocurrency wallet attackers also use phishing tactics to fool users into entering their passwords into fake websites and into installing malicious applications that steal their sensitive information [35].
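To make the role of the two keys concrete, the sketch below signs a transaction payload with a private key and verifies it with the matching public key, using only the standard JVM crypto API. It is illustrative rather than wallet code: real wallets use the secp256k1 curve and blockchain-specific transaction serialisation, and the payload string here is made up.

```kotlin
import java.security.KeyPairGenerator
import java.security.Signature

fun main() {
    // Generate an asymmetric (EC) key pair, as a wallet does on setup:
    // the public key can be shared to receive funds, the private key
    // must never leave the user's control.
    val keyPair = KeyPairGenerator.getInstance("EC")
        .apply { initialize(256) }
        .generateKeyPair()

    val transaction = "send 0.5 units to <recipient address>".toByteArray()

    // The sender authorises the transaction by signing it with the private key.
    val signer = Signature.getInstance("SHA256withECDSA")
    signer.initSign(keyPair.private)
    signer.update(transaction)
    val signature = signer.sign()

    // Anyone holding the public key can verify the signature, which is how
    // the network confirms authorisation without ever seeing the private key.
    val verifier = Signature.getInstance("SHA256withECDSA")
    verifier.initVerify(keyPair.public)
    verifier.update(transaction)
    println("Signature valid: ${verifier.verify(signature)}")
}
```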

Android Crypto Wallet Vulnerabilities
Vulnerabilities of Crypto Wallets
Key storage
Key storage is massively important for crypto users to keep in mind: if they lose their key, share it with someone, or store it in an unsafe location that is exposed to hackers, the key can become compromised, resulting in their account being hacked and their digital assets stolen. When users create crypto wallet accounts, a private key is automatically generated that gives the user access to their digital assets. Most crypto wallets also require users to create a PIN or password to access the wallet, which must likewise be kept safe. Confidentiality could be at risk for custodial wallet users if crypto wallet developers have not encrypted users' private keys effectively; hackers could then see the keys in plaintext, causing a security breach in which the private key is stolen and used to steal the users' digital assets [65]. Additionally, hackers are able to monitor user screens or read user keyboard input, which is a threat to crypto wallet users because in some cases they need to type passwords or private keys into the wallet; this is another method hackers can take advantage of to steal digital assets through weak private key management.
Transaction process
When users send cryptocurrency to other users, the transaction is verified using their private key, then connects to the Blockchain, creating a block of data, and waits for the recipient to approve and receive the funds to finalise the transaction. Confidentiality and integrity are both at risk here because the Blockchain network and wallet servers require a valid online network connection between each other. This opens a door for attackers: they can monitor the screen and the user's keyboard input to inspect and tamper with the transaction data, misleading users by displaying wrong information to them.
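One common mitigation on Android is to keep key material inside the hardware-backed Android Keystore, so the raw key bytes are never exposed to the app process or written to disk in plaintext. The sketch below, with a hypothetical function name, generates such a key and ties its use to the device lock screen; it is a minimal illustration, not the key-management scheme of any particular wallet.

```kotlin
import android.security.keystore.KeyGenParameterSpec
import android.security.keystore.KeyProperties
import javax.crypto.KeyGenerator

// Generates an AES key inside the AndroidKeyStore. The key can be used for
// encryption/decryption, but its raw material cannot be extracted by the app,
// which blunts the plaintext-key-theft attack described above.
fun createWalletEncryptionKey(alias: String) {
    val keyGenerator = KeyGenerator.getInstance(
        KeyProperties.KEY_ALGORITHM_AES, "AndroidKeyStore"
    )
    keyGenerator.init(
        KeyGenParameterSpec.Builder(
            alias,
            KeyProperties.PURPOSE_ENCRYPT or KeyProperties.PURPOSE_DECRYPT
        )
            .setBlockModes(KeyProperties.BLOCK_MODE_GCM)
            .setEncryptionPaddings(KeyProperties.ENCRYPTION_PADDING_NONE)
            // Require the user's PIN/biometric before the key can be used.
            .setUserAuthenticationRequired(true)
            .build()
    )
    keyGenerator.generateKey()
}
```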


Vulnerabilities of Android OS
Root privilege
On Android, users are capable of gaining root privilege, with which they can read and write code and alter or modify settings on the device system that an ordinary user could not. Although this is a beneficial feature for some users who want complete control over their Android device, it increases the risk of hackers misusing and potentially compromising the device and stealing the user's data. Rooting removes the standard security mechanisms already put in place by the device's system, leaving the device more vulnerable to potential attacks. When a user roots an Android device, the device becomes completely unlocked and is exposed to untrusted software applications and malware such as viruses, worms and Trojans. These types of malware cause harm to the device, make it crash continuously, and can delete files from it [43]. Also, when a device is rooted, hackers are more likely to gain unauthorised access to it and then reach sensitive areas such as data backups and the encryption keys that help to protect data on the device [19].
USB debugging
USB debugging is a tool that enables users and developers to perform different types of operations when connecting their Android device to a laptop or desktop computer through a USB cable. ADB (Android Debug Bridge) is the command tool responsible for allowing the computer to debug the connected Android device [16]; the term debugging means removing errors from hardware or software. ADB gives users root-privilege-style commands such as installing packages, debugging applications, and sending and receiving files of data. When a user connects their Android device to their desktop computer, they are automatically asked whether they wish to proceed with the debugging or not. USB debugging opens up potential security vulnerabilities: if the device is connected to the desktop computer with USB debugging mode left on, the desktop is in charge of managing the device [31]. This means that if the desktop system has any malware, such as Trojans, the malware could move itself onto the Android device, corrupting and/or stealing the data on it.
Accessibility service
Android OS provides a long-running service, known as the accessibility service, that helps users with disabilities, or users who are new to smartphones in general, to make the best use of their device [28]. One example function is TalkBack, where the device reads out all content shown on the screen, which is mainly useful for visually impaired individuals. Another is Voice Access, which enables users to control their device with their voice, for example opening applications with spoken commands, benefiting users who may struggle to control the device with their hands. When the service is turned on, it stays running in the background and monitors the user's input and behaviour with the device.


Many different types of applications make use of this service to make the experience easier for users, even those without disabilities. Although the service brings many benefits, it also brings security risks, as these applications take control of the device's operations, which can open the way to several types of malware and routes for attackers to hack into the device's system. One example is clickjacking, where hackers send users an image that urges them to click on it; if they do, the hackers gain access to the device, as the image is actually a fake link for fraudulent purposes. Another security risk is banking Trojan attacks, where hackers use the accessibility service to read users' keyboard input as they enter sensitive banking information into real banking websites. These attacks can cause users not only to lose control over their device but to lose all of their sensitive information to hackers, who steal it and use it for fraudulent purposes [76].

Android Crypto Hot Wallets and Their Security Mechanisms
All crypto wallets serve the purpose of enabling Cryptocurrency users to store, send and receive all types of cryptocurrencies in one location, whether online, on smartphones, tablets or computer-based systems. The private keys given to users when crypto wallets are downloaded or accessed act as the user's password, which must be kept safe from every other individual, or the crypto wallet may be hacked and the digital assets stolen. There are hundreds of different crypto wallets on the market today that provide a number of security measures to keep their users and their digital assets safe from hackers and external individuals. Below are real Android crypto wallet apps and the security mechanisms they provide to their users (Table 3).

4 Investigation Results/Findings
Research and analysis have been carried out to explain how hackers manage to break into crypto hot wallet systems and take data, exploiting the vulnerabilities mentioned earlier, and how hackers use these to their advantage in their malicious activity. A number of ways users can minimise the risk of these attacks have also been identified and recommended. This analysis uses a Sony Xperia XZ1 running Android 9 for the following investigation and research of these attack methods, showing how attackers may carry out these methods to gain unauthorised access to a user's crypto wallet and steal information from them.

Table 3 Security mechanisms of real crypto wallets

MetaMask (Version 5.5.0; 10M Google Play installations; Ethereum/Bitcoin Blockchain)
1. 12-word seed recovery phrase to use in cases of recovering digital assets
2. Servers do not hold users' sensitive information, preventing hackers from taking information if unauthorised access is gained
3. All data is encrypted and can only be decrypted with the user's password, helping to prevent hackers from stealing the sensitive information [52]

Exodus (Version 8.4.8; 1M Google Play installations; Ethereum/Bitcoin Blockchain)
4. User data is not stored on servers; all private key and other data is stored within the user's device, ensuring security from external attackers
5. Users are required to create a password to log in to the wallet and perform operations, and are advised to choose good passwords that include capital letters in the middle of the password rather than at the start or end [68]

Coinomi (Version 1.24.3; 1M Google Play installations; Ethereum/Bitcoin Blockchain)
6. A 12-word seed (secret recovery) phrase is given to users for recovering digital assets or to regain access to the wallet if, for example, the device is stolen
7. Users are able to set up two-factor authentication to make the wallet even more secure from attackers
8. Users' private keys are protected with encryption within the user's device, helping to prevent hackers from stealing the user's information and digital assets
9. Users are required to set up a password for logging into the wallet and making transactions
10. Key-loggers are incapable of reading keyboard inputs, as Coinomi uses its own keyboard for entering passwords or seed phrases
11. Data is protected by not tracking transactions, having no IP address logging, no KYC (Know Your Customer) checking, and providing maximum user anonymity [9]

BoltX (Version 1.21.13; 10K Google Play installations; Ethereum Blockchain)
12. Users are required to fill in a 12-word seed phrase to recover their wallet and/or digital assets if, for example, they have accidentally deleted the app
13. Users must create and set up a secure password to log in and access the features of the wallet, helping to prevent hackers from stealing the user's data and assets [32]

3S Wallet (Version 1.0.10; 10K Google Play installations; Ethereum Blockchain)
14. Users are required to fill in a 12-word seed phrase if they ever lose or forget their PIN code, to recover their wallet and keep their digital assets safe
15. Users are required to set up and always input a PIN code before they can access the wallet and start storing and managing their Cryptocurrency, adding another layer that prevents hackers from gaining unauthorised access to the user's wallet [10]


Fig. 4 Screenshot of accessibility service within Android device


Using Accessibility Services to Steal Information
As stated before, the accessibility service is a helpful tool that disabled users, or users who are not familiar with technology and devices, can use to get the best use out of their Android phones. However, when accessibility permissions are turned on, this opens up doors for hackers, as the feature monitors and tracks the content displayed on the screen of the user's device. Everything is monitored except the password fields that users are required to fill in, which helps to protect against external attackers; however, hackers have techniques to acquire the passwords that are entered (Fig. 4). Attackers can capture user keyboard input from third-party password manager applications such as '1Password', where users may type and store their passwords to keep them safe and auto-fill them into their crypto wallet apps when required. Some of these applications use their own custom keyboard that is insecure, which opens up doors for hackers to listen in on user input. When users input data into a password manager app that uses its own custom keyboard, visual feedback is produced and displayed, which triggers TYPE_WINDOW_STATE_CHANGED events. This allows hackers to listen in and monitor user input when users type their Cryptocurrency wallet passwords, gain access to the user's wallet and steal digital assets.
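For illustration, the skeleton below (with a hypothetical class name) shows the callback through which an accessibility service receives these events; it only logs them, but the same hook is what a malicious service would abuse by correlating the events with a known keyboard layout.

```kotlin
import android.accessibilityservice.AccessibilityService
import android.util.Log
import android.view.accessibility.AccessibilityEvent

// Minimal accessibility service skeleton. Legitimate services use this
// callback to assist users; a malicious one can use the same events,
// such as TYPE_WINDOW_STATE_CHANGED fired by an insecure in-app
// keyboard's visual feedback, to infer what is being typed.
class WindowEventObserver : AccessibilityService() {
    override fun onAccessibilityEvent(event: AccessibilityEvent?) {
        if (event?.eventType == AccessibilityEvent.TYPE_WINDOW_STATE_CHANGED) {
            Log.d("Observer", "Window state changed in ${event.packageName}")
        }
    }

    override fun onInterrupt() {
        // Required override; nothing to clean up in this sketch.
    }
}
```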


If a user were to use 1Password as their password management app to store their private key password or seed phrase, hackers simply learn the layout of 1Password's keyboard so they can recognise which key is being pressed at any moment, allowing them to work out the user's crypto wallet password and causing the user to lose access to their account and their digital assets. Whether this works depends on the crypto wallet Android application used: apps that do not allow users to input data through other third-party applications are safe from hackers in this case, whereas if the crypto wallet does let users input data from third-party applications, then the user and their sensitive data are not safe from potential hackers. Coinomi is one crypto wallet that was found to use its own keyboard, and it therefore prevents attackers from reading keyboard inputs when its users fill in their passwords; however, not all crypto wallet apps have their own secure keyboard.

Potential Defence Mechanisms
This attack is very dangerous, as attackers are able to learn exactly what a user's crypto wallet password is. However, the risk can be minimised if users do not store passwords in a third-party application and instead keep them safe on paper, so that when they need to input PINs or passwords they can read and type them into the crypto wallet application first-hand, without risking having their private keys and passwords compromised through third-party password management apps.

Using USB Debugging to Take Information from Backup Files
Hackers are also able to take information from backup files with the USB debugging tool. As stated above, USB debugging is a tool that users activate by connecting their Android device to a laptop or desktop computer with a USB cable to carry out different operations, including removing errors from software. The command tool that enables the debugging function is ADB (Android Debug Bridge). Users can turn on USB debugging either through a USB cable connected to a desktop device, or by going into their device settings and tapping 'build number' seven times to unlock the developer options, which contain a switch for the tool (Fig. 5). A vulnerability of debugging an Android device connected to a desktop is that the computer becomes the controller of the device: if attackers gain access to the computer through malware, the device can become compromised, and the user would have no idea about this, as no notification appears on their device's screen.
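A wallet app cannot stop the user from enabling USB debugging, but it can detect the setting and warn them. A hedged sketch, using the standard Settings API (the function name is ours):

```kotlin
import android.content.Context
import android.provider.Settings

// Returns true if USB debugging (ADB) is currently enabled on the device,
// which a security-conscious wallet could surface as a warning to the user.
fun isUsbDebuggingEnabled(context: Context): Boolean =
    Settings.Global.getInt(
        context.contentResolver, Settings.Global.ADB_ENABLED, 0
    ) == 1
```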


Fig. 5 Screenshot of USB debugging tool in developer options on Android device

Fig. 6 Android allow backup parameter within Android devices [3]

The ability to create a backup of application data on Android devices is automatically turned on with the parameter 'android:allowBackup = true'. This backup functionality can be turned off by replacing 'true' with 'false', i.e. 'android:allowBackup = false'. When users create backups themselves, they are uploaded to their Google Drive (Fig. 6). However, because backups are on by default, most applications' backup features are left on, which means attackers can use this to their advantage and steal data: by gaining control of the user's computer through malware while it is debugging the connected Android device, they can create backups of the crypto wallet application data, which contain sensitive information such as passwords, private keys, and recovery seed phrases.
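An app can also check at runtime whether it was built with backups allowed; the hypothetical helper below reads the corresponding flag from its own ApplicationInfo. Developers, of course, fix the issue at the source by setting 'android:allowBackup = false' in the manifest.

```kotlin
import android.content.Context
import android.content.pm.ApplicationInfo

// Returns true if this app's data may be captured with `adb backup`,
// i.e. android:allowBackup was left at its default value of true.
fun isBackupAllowed(context: Context): Boolean =
    (context.applicationInfo.flags and ApplicationInfo.FLAG_ALLOW_BACKUP) != 0
```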


Not only are hackers able to take newly created application data through backups, but they may also look into local backups that the user has previously made and stored on their device, and steal the files that contain sensitive information such as crypto wallet private keys and passwords. Once an attacker has taken the backup files, they extract them to inspect and access their contents. Once the files are extracted, the attacker has access to the user's private keys, recovery seed phrases and other types of sensitive information stored in the crypto wallet application by the user.

Potential Defence Mechanisms
This attack is also very dangerous, as attackers can take advantage of the backup feature in applications, which is turned on by default in the Android system. However, users can keep hackers out of their device by only turning on USB debugging when they actually need it and turning it off when they are not using it, protecting the device from being hacked and having malicious software spread on it. Additionally, the backup feature can be turned off by changing the parameter to 'android:allowBackup = false', preventing backup files from being created for applications on the user's device.

Taking Information from a Rooted Device
As stated above, rooting is a process that allows users to access more features than their device allows from the factory. Some of these features include having full control over applications on the device, enabling users to read and write code and alter many more system settings. There is a range of rooting applications, including KingRoot, RootMaster and One-Click-Root, offering different features and ways to root devices (Fig. 7). However, hackers are able to tamper with and take information from a rooted device because rooting removes a number of security features that helped to prevent malicious attacks. Rooting opens doors for hackers and malicious software that could put a user's device at great risk, especially with a crypto wallet installed holding their private key information and access to their digital assets [14]. This malicious software includes worms, viruses, spyware and Trojans, which can take data from the user's device, delete data, and slow the device down. Rooting makes devices much more vulnerable: the device is less secure, enabling hackers to compromise it much more easily and gain access to, for example, backups of data and crypto wallet private keys. Any user is able to root their device, and there are ways to still protect devices after rooting, but not all users may be aware of this, which makes rooting a big risk, especially when the user has a crypto wallet installed on their device without the factory security features in place [66].


Fig. 7 Screenshot of KingRoot rooting application for Android devices

When a user roots their device, they are able to install applications from anywhere on the internet, not only Android's Google Play Store. This may seem convenient but is very risky: a crypto wallet application installed from outside the Google Play Store is more likely to be insecure, and in some cases may even be a hacker's app designed to lure users in and then steal their data.

Potential Defence Mechanisms
To prevent malicious software such as worms, viruses, spyware and Trojans from getting onto the device, the user can install trusted anti-virus software from the Google Play Store. This will keep the user's device safe from having its data tampered with, deleted or hacked into by attackers. This form of attack can lead to malware removing or corrupting user data; however, users can create backups of their overall device data in case they ever need to recover or re-download it. This comes in handy especially if malware gets onto a user's crypto wallet application, corrupting it and disabling the user's access to the wallet and their assets. Creating a backup of all this data allows users to restore it and keep their passwords and transaction history files available to them [2].
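Wallet developers can also check for signs of rooting and warn the user or restrict functionality. The naive heuristic below simply looks for the su binary in common locations; real root-detection libraries combine many more signals, and a determined attacker on a rooted device can still bypass such checks, so this is a sketch only.

```kotlin
import java.io.File

// Naive root heuristic: look for the `su` binary in common locations.
fun isLikelyRooted(): Boolean =
    listOf("/system/bin/su", "/system/xbin/su", "/sbin/su", "/su/bin/su")
        .any { File(it).exists() }
```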


5 Discussions
The basis of Cryptocurrency and the Blockchain has been explained, covering the transaction processes of sending and receiving cryptocurrencies as well as the importance of reliable key storage. Real-life Android hot wallets were identified and their security measures documented, showing how wallet providers aim to protect their users' data and digital assets. An example of a widely used security measure, found in the majority of the popular apps, is the 12-word recovery seed phrase, a set of words that users can enter to recover their private key in emergency cases such as accidentally uninstalling the wallet or losing the private key. The vulnerabilities of Android crypto wallets were found and analysed in two parts. The first covered vulnerabilities of crypto wallets in general, including the transaction process: transactions require internet connections, opening doors for hackers to listen in and read user keyboard input. The second vulnerability found within crypto wallets concerned key storage: if users do not keep their private key to themselves and in a safe location, their wallet can potentially become compromised by hackers. Vulnerabilities of Android's operating system have also been identified, explaining how accessibility services, USB debugging and root privilege are all tools that can be taken advantage of by hackers if users do not turn them off or secure their device appropriately. Not only were these vulnerabilities identified, but investigations were made to demonstrate how hackers could steal sensitive information from users' wallets through them. The results show that all of these vulnerabilities could cause users to lose control of their wallets and have their private keys and digital assets stolen. After the investigation, defence mechanisms were proposed to give users measures they can take to prevent hackers from compromising their wallets and stealing their sensitive information and digital assets.

6 Conclusion
As the years go by, more and more people invest and trade in cryptocurrencies and own Cryptocurrency wallets to store their digital assets in one place, showing the need for high-level security within these applications. There are alternatives to hot wallets that protect users from attackers better; however, users today are drawn to the idea of an application or website they can access through the mobile phone or desktop computer they use every day. That is why this article focused on hot wallets and measured how secure these types of Android Cryptocurrency wallets really are in this day and age. In terms of future directions, this security analysis can be advanced by physically attempting to extract sensitive information from a number of crypto wallets on devices first-hand, and inspecting whether the data can be read or not.


This would show the in-depth security of applications that are used today and could help users to decide and question where they store their digital assets. This chapter shows the need for, and importance of, high-level security measures in the rising world of Cryptocurrency. Researchers in the resources mentioned had the time and knowledge to extract data from real-life crypto wallets, showing that it is possible and suggesting how much easier it may be for professional hackers. This is not to say that Android Cryptocurrency hot wallets are not secure: millions of individuals around the world invest in Cryptocurrency and store their assets in wallets and experience no problems at all. However, the massive attacks and security breaches at Cryptocurrency exchanges reported in news articles over the last decade show that no information system is safe. Even the largest and most popular systems can still be hacked, which demonstrates not only the need for a higher level of security measures from Cryptocurrency exchanges and wallet developers, but also the need for crypto wallet users to be cautious about which Cryptocurrency wallet they store their digital assets in, to understand how they can keep their digital assets even safer, and to act upon it.

References
1. N. Amiet, Blockchain Vulnerabilities in Practice (2021) [online], Available at: https://dl.acm.org/doi/pdf/10.1145/3407230. Accessed 5 Sept 2022
2. AndroidRecovery.com, How to Keep Android Phone Safe and Secure after Root (2016) [online], Available at: https://www.androidrecovery.com/blog/keep-android-phone-safe-after-root.html. Accessed 5 Sept 2022
3. N. Arora, Android's Attribute Android:AllowBackup Demystified (2020) [online], Available at: https://betterprogramming.pub/androids-attribute-android-allowbackup-demystified-114b88087e3b. Accessed 5 Sept 2022
4. Avi Networks, What is Elliptic Curve Cryptography? Definition & FAQs | Avi Networks (n.d.) [online], Available at: https://avinetworks.com/glossary/elliptic-curve-cryptography/. Accessed 5 Sept 2022
5. R. Awati, What Is Nonrepudiation and How Does It Work? (2021) [online], Search Security, Available at: https://www.techtarget.com/searchsecurity/definition/nonrepudiation. Accessed 5 Sept 2022
6. A. Banerjee, Everything You Need to Know About Consortium Blockchain (2021) [online], Blockchain-council.org, Available at: https://www.blockchain-council.org/blockchain/everything-you-need-to-know-about-consortium-blockchain/. Accessed 5 Sept 2022
7. BBC News, Coincheck: World's Biggest Ever Digital Currency 'Theft' (2018) [online], Available at: https://www.bbc.co.uk/news/world-asia-42845505. Accessed 5 Sept 2022
8. BBC News, North Korea Hackers Stole $400m of Cryptocurrency in 2021, Report Says (2022) [online], Available at: https://www.bbc.co.uk/news/business-59990477. Accessed 5 Sept 2022
9. O. Beigel, Coinomi Review – My Personal Experience (2022 Updated) (2021) [online], 99 Bitcoins, Available at: https://99bitcoins.com/bitcoin-wallet/coinomi-review/. Accessed 5 Sept 2022
10. BHO Network – Bring blockchain to life, Consumer-Friendly Features 3S Wallet Brings to Crypto Users (n.d.) [online], Available at: https://bho.network/en/consumer-friendly-features-3s-wallet-brings-to-crypto-users. Accessed 5 Sept 2022
11. Bitcoin.org, Securing Your Wallet – Bitcoin (2022) [online], Available at: https://bitcoin.org/en/secure-your-wallet. Accessed 5 Sept 2022
12. H. Boydaş Hazar, The Importance of Regulations on Cryptocurrency Transactions (2019) [online], Available at: http://drhulya.com/images/Regulations_on_Cryptocurrency.pdf. Accessed 5 Sept 2022
13. K. Buchholz, Infographic: Where Cryptocurrency Is Most Heavily Used (2022) [online], Statista Infographics, Available at: https://www.statista.com/chart/26757/cryptocurrency-adoption-world-map/. Accessed 5 Sept 2022
14. Bullguard.com, The Risks of Rooting Your Android Phone – BullGuard (2012) [online], Available at: https://www.bullguard.com/bullguard-security-center/mobile-security/mobile-threats/android-rooting-risks.aspx. Accessed 5 Sept 2022
15. G.C. Kessler, An Overview of Cryptography (2016) [online], Available at: https://commons.erau.edu/cgi/viewcontent.cgi?article=1137&context=publication. Accessed 5 Sept 2022
16. Cisco Meraki, Enabling Device Owner Mode using Android Debug Bridge (ADB) (2022) [online], Available at: https://documentation.meraki.com/SM/Device_Enrollment/Enabling_Device_Owner_Mode_using_Android_Debug_Bridge_(ADB). Accessed 5 Sept 2022
17. Commons.wikimedia.org, File:Blockchain-Process.png – Wikimedia Commons (2017) [online], Available at: https://commons.wikimedia.org/wiki/File:Blockchain-Process.png. Accessed 5 Sept 2022
18. L. Conway, What Are the Safest Ways to Store Bitcoin? (2021) [online], Investopedia, Available at: https://www.investopedia.com/news/bitcoin-safe-storage-cold-wallet/. Accessed 5 Sept 2022
19. Ctl.io, Why to Not Trust the Root User with Your Data – CenturyLink Cloud Developer Center (2017) [online], Available at: https://www.ctl.io/developers/blog/post/why-to-not-trust-the-root-user-with-your-data. Accessed 5 Sept 2022
20. L. Daly, What Is Bitcoin? Definition and How It Works (2022) [online], The Motley Fool, Available at: https://www.fool.com/investing/stock-market/market-sectors/financials/cryptocurrency-stocks/bitcoin/. Accessed 5 Sept 2022
21. B. Daniel, Symmetric vs. Asymmetric Encryption: What's the Difference? (2021) [online], Trentonsystems.com, Available at: https://www.trentonsystems.com/blog/symmetric-vs-asymmetric-encryption. Accessed 5 Sept 2022
22. W. Duggan, The History of Bitcoin, the First Cryptocurrency (2022) [online], Available at: https://money.usnews.com/investing/articles/the-history-of-bitcoin. Accessed 22 Aug 2022
23. M.E. Shacklett, L. Rosencrance, What Is Authentication? (2021) [online], Search Security, Available at: https://www.techtarget.com/searchsecurity/definition/authentication. Accessed 5 Sept 2022
24. K.E. Wegrzyn, E. Wang, Types of Blockchain: Public, Private, or Something in Between (2021) [online], Available at: https://www.foley.com/en/insights/publications/2021/08/types-of-blockchain-public-private-between. Accessed 5 Sept 2022
25. Euromoney.com, Blockchain Explained: The Difference Between Blockchain and Bitcoin | Euromoney Learning (2021) [online], Available at: https://www.euromoney.com/learning/blockchain-explained/the-difference-between-blockchain-and-bitcoin. Accessed 5 Sept 2022
26. C. Faife, Hackers Breached Mailchimp to Phish Cryptocurrency Wallets (2022) [online], The Verge, Available at: https://www.theverge.com/2022/4/4/23010317/hackers-mailchimp-trezor-cryptocurrency-phishing. Accessed 5 Sept 2022
27. GeeksforGeeks, The CIA Triad in Cryptography – GeeksforGeeks (2021a) [online], Available at: https://www.geeksforgeeks.org/the-cia-triad-in-cryptography/. Accessed 5 Sept 2022
28. GeeksforGeeks, What Is Accessibility Service in Android? – GeeksforGeeks (2021b) [online], Available at: https://www.geeksforgeeks.org/what-is-accessibility-service-in-android/. Accessed 5 Sept 2022
29. D. Geroni, Hybrid Blockchain: The Best of Both Worlds (2021) [online], 101 Blockchains, Available at: https://101blockchains.com/hybrid-blockchain/. Accessed 5 Sept 2022
30. A. Ghaffar Khan, A. Hussain Zahid, M. Hussain, U. Riaz, Security of Cryptocurrency Using Hardware Wallet and QR Code (2019) [online], Available at: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8966739. Accessed 5 Sept 2022
31. A. Ghosh, Why You Should Not Keep Your Smartphone USB Debugging Enabled (2021) [online], The Customize Windows, Available at: https://thecustomizewindows.com/2021/08/why-you-should-not-keep-your-smartphone-usb-debugging-enabled/. Accessed 5 Sept 2022
32. B. Global, Experience BOLT X Like Never Before (2021) [online], Medium, Available at: https://medium.com/bolt-global/experience-bolt-x-like-never-before-e567e35510a5. Accessed 5 Sept 2022
33. Go.chainalysis.com, The 2021 Geography of Cryptocurrency Report (2021) [online], Available at: https://go.chainalysis.com/rs/503-FAP-074/images/Geography-of-Cryptocurrency-2021.pdf. Accessed 5 Sept 2022
34. M.H. Miraz, M. Ali, Applications of Blockchain Technology Beyond Cryptocurrency (2018) [online], Available at: https://arxiv.org/abs/1801.03528. Accessed 5 Sept 2022
35. H. Hasanova, U. Baek, M. Shin, K. Cho, M. Kim, A Survey on Blockchain Cybersecurity Vulnerabilities and Possible Countermeasures (2019) [online], Available at: https://onlinelibrary.wiley.com/doi/pdf/10.1002/nem.2060. Accessed 5 Sept 2022
36. M. Hashemi Joo, Y. Nishikawa, K. Dandapani, Cryptocurrency, A Successful Application of Blockchain Technology (2019) [online], Available at: https://www.emerald.com/insight/content/doi/10.1108/MF-09-2018-0451/full/html. Accessed 5 Sept 2022
37. D. He, S. Li, C. Li, S. Zhu, S. Chan, W. Min, N. Guizani, Security Analysis of Cryptocurrency Wallets in Android-Based Applications (2020) [online], Available at: https://ieeexplore.ieee.org/abstract/document/9143206. Accessed 5 Sept 2022
38. S. Higgins, Gatecoin Claims $2 Million in Bitcoins and Ethers Lost in Security Breach (2016) [online], Available at: https://www.coindesk.com/markets/2016/05/16/gatecoin-claims-2-million-in-bitcoins-and-ethers-lost-in-security-breach/. Accessed 5 Sept 2022
39. F. Hussain, Hackers Steal $5 Million Worth of Bitcoin with A Simple Phishing Attack (2015) [online], HackRead, Available at: https://www.hackread.com/5-mil-bitstamp-bitcoin-hacked-phishing-attack/. Accessed 5 Sept 2022
40. A. Kharpal, R. Browne, Hackers Return Nearly Half of the $600 Million They Stole in One of the Biggest Crypto Heists (2021) [online], Available at: https://www.cnbc.com/2021/08/11/cryptocurrency-theft-hackers-steal-600-million-in-poly-network-hack.html. Accessed 5 Sept 2022
41. A. Kharpal, Hackers Steal Over $40 Million Worth of Bitcoin from One of the World's Largest Cryptocurrency Exchanges (2019) [online], Available at: https://www.cnbc.com/2019/05/08/binance-bitcoin-hack-over-40-million-of-cryptocurrency-stolen.html. Accessed 5 Sept 2022
42. T. Kovalcik, Digital Forensics of Cryptocurrency Wallets (2022) [online], Available at: https://www.diva-portal.org/smash/get/diva2:1671204/FULLTEXT02. Accessed 5 Sept 2022
43. K. Kulkarni, A.Y. Javaid, Open Source Android Vulnerability Detection Tools: A Survey (2018) [online], Available at: https://arxiv.org/ftp/arxiv/papers/1807/1807.11840.pdf. Accessed 5 Sept 2022
44. K. Kumar Panigrahi, Difference Between Block Cipher and Stream Cipher (2022) [online], Tutorialspoint.com, Available at: https://www.tutorialspoint.com/difference-between-block-cipher-and-stream-cipher. Accessed 5 Sept 2022
45. J. Lake, What Is the Diffie–Hellman Key Exchange and How Does It Work? (2021) [online], Comparitech, Available at: https://www.comparitech.com/blog/information-security/diffie-hellman-key-exchange/. Accessed 5 Sept 2022
46. C. Li, D. He, S. Li, S. Zhu, S. Chan, Y. Cheng, Android-based Cryptocurrency Wallets: Attacks and Countermeasures (2020) [online], Available at: https://ieeexplore.ieee.org/document/9284708. Accessed 5 Sept 2022
47. A. Lielacher, What is a Cryptocurrency Wallet and How Does It Work? (2021a) [online], Axi.com, Available at: https://www.axi.com/uk/blog/education/blockchain/what-is-cryptocurrency-wallet. Accessed 5 Sept 2022
48. A. Lielacher, What is a Cryptocurrency Wallet and How Does It Work? (2021b) [online], Axi.com, Available at: https://www.axi.com/uk/blog/education/blockchain/what-is-cryptocurrency-wallet#what-is-a-crypto-wallet. Accessed 5 Sept 2022
49. J. Maxim, Hackers Steal 7,170 Bitcoins From Chinese Exchange BTER (2014) [online], Available at: https://bitcoinist.com/hackers-steal-7170-bitcoins-chinese-exchange-bter/. Accessed 5 Sept 2022
50. L. Mearian, What's a Crypto Wallet (and How Does It Manage Digital Currency)? (2019) [online], Computerworld, Available at: https://www.computerworld.com/article/3389678/whats-a-crypto-wallet-and-does-it-manage-digital-currency.html. Accessed 5 Sept 2022
51. Medium, Exploring the 4 Types of Blockchain Technology (2022) [online], Available at: https://medium.datadriveninvestor.com/exploring-the-4-types-of-blockchain-technology-eafb1e2d5394. Accessed 5 Sept 2022
52. metamask.zendesk.com, Basic Safety and Security Tips for MetaMask (2022) [online], Available at: https://metamask.zendesk.com/hc/en-us/articles/360015489591-Basic-Safety-and-Security-Tips-for-MetaMask. Accessed 5 Sept 2022
53. U. Mukhopadhyay, A. Skjellum, O. Hambolu, J. Oakle, L. Yu, R. Brooks, A Brief Survey of Cryptocurrency Systems (2016) [online], Available at: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7906988. Accessed 5 Sept 2022
54. NewsBTC.com, CBE Kipcoin Shut Down After Claims of Losing 3000 BTC to Hackers (2014) [online], Available at: https://www.newsbtc.com/news/chinese-bitcoin-exchange-kipcoin-shuts-claims-losing-3000-btc-hackers/. Accessed 5 Sept 2022
55. M. Niranjanamurthy, B.N. Nithya, S. Jagannatha, Analysis of Blockchain Technology: Pros, Cons and SWOT (2019) [online], Available at: https://link.springer.com/article/10.1007/s10586-018-2387-5. Accessed 5 Sept 2022
56. A. Norry, The History of the Mt Gox Hack: Bitcoin's Biggest Heist (2020) [online], Blockonomi, Available at: https://blockonomi.com/mt-gox-hack/. Accessed 5 Sept 2022
57. C. Parizo, What Are the 4 Different Types of Blockchain Technology? (2021) [online], SearchCIO, Available at: https://www.techtarget.com/searchcio/feature/What-are-the-4-different-types-of-blockchain-technology. Accessed 5 Sept 2022
58. J. Potapenko, A. Hil, A. Voitova, Crypto Wallets Security as Seen by Security Engineers | Cossack Labs (2021) [online], Cossack Labs, Available at: https://www.cossacklabs.com/blog/crypto-wallets-security/. Accessed 5 Sept 2022
59. A. Rajendra Sai, J. Buckley, A. Le Gear, Privacy and Security Analysis of Cryptocurrency Mobile Applications (2019) [online], Available at: https://ieeexplore.ieee.org/document/8686583. Accessed 5 Sept 2022
60. E. Rajy Latifa, E. Kiram My Ahmed, E. Ghazouni Mohamed, A. Omar, Blockchain: Bitcoin Wallet Cryptography Security, Challenges and Countermeasures (2017) [online], Available at: https://www.icommercecentral.com/open-access/blockchain-bitcoin-wallet-cryptography-security-challenges-and-countermeasures.php?aid=86561. Accessed 5 Sept 2022
61. G. Reidy, S. Jackman, Hackers Steal $60 Million From Japanese Crypto Exchange Zaif (2018) [online], Bloomberg.com, Available at: https://www.bloomberg.com/news/articles/2018-09-19/tech-bureau-says-6-7b-yen-in-cryptocurrency-lost-in-zaif-hack. Accessed 5 Sept 2022
62. T. Rodgers, H. Smith, How Does Cryptocurrency Work? – Times Money Mentor (2022) [online], Times Money Mentor, Available at: https://www.thetimes.co.uk/money-mentor/article/how-cryptocurrency-works/. Accessed 5 Sept 2022
63. A. Rosic, Cryptocurrency Wallet Guide: A Step-By-Step Tutorial (2020) [online], Blockgeeks, Available at: https://blockgeeks.com/guides/cryptocurrency-wallet-guide/. Accessed 5 Sept 2022
64. J. Russell, Korean Crypto Exchange Coinrail Loses Over $40M in Tokens Following a Hack (2018) [online], Techcrunch.com, Available at: https://techcrunch.com/2018/06/10/korean-crypto-exchange-coinrail-loses-over-40m-in-tokens-following-a-hack/. Accessed 5 Sept 2022
65. C. Sephton, Why Private Keys Are Important – And How to Keep Crypto Safe (2021) [online], Currency.com, Available at: https://currency.com/why-private-keys-are-important-and-how-to-keep-crypto-safe. Accessed 5 Sept 2022
66. J. Snyder, What Are the Security Risks of Rooting Your Smartphone? (2022) [online], Samsung Business Insights, Available at: https://insights.samsung.com/2022/07/28/what-are-the-security-risks-of-rooting-your-smartphone-4/. Accessed 5 Sept 2022
67. H. Staff, RSA Encryption Explained – Everything You Need To Know (2021) [online], History-Computer, Available at: https://history-computer.com/rsa-encryption/. Accessed 5 Sept 2022
68. Support.exodus.com, The Importance of a Good Password – Exodus Support (2021) [online], Available at: https://support.exodus.com/article/935-the-importance-of-a-good-password. Accessed 5 Sept 2022
69. S. Suratkar, M. Shirole, S. Bhirud, Cryptocurrency Wallet: A Review (2020) [online], Available at: https://ieeexplore.ieee.org/iel7/9314948/9315188/09315193.pdf. Accessed 5 Sept 2022
70. M. Uddin, M. Mannan, A. Youssef, Horus: A Security Assessment Framework for Android Crypto Wallets, Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (2021) [online], Available at: https://link.springer.com/chapter/10.1007/978-3-030-90022-9_7. Accessed 5 Sept 2022
71. M. Wąchal, Private Blockchain Benefits Explained (2021) [online], SoftwareMill, Available at: https://softwaremill.com/private-blockchain-benefits-explained/. Accessed 5 Sept 2022
72. L. Wagner, Achieving Data Integrity Using Cryptography (2020) [online], Blog.boot.dev, Available at: https://blog.boot.dev/bitcoin/achieving-data-integrity-using-cryptography/. Accessed 5 Sept 2022
73. Wisebitcoin.com, Bitcoin Adoption – Who is Setting the Trend? | Wisebitcoin (2021) [online], Available at: https://www.wisebitcoin.com/newsroom/blog/bitcoin-adoption. Accessed 5 Sept 2022
74. ZebPay, Advantages And Disadvantages Of Different Types Of Blockchain | Zebpay (2022) [online], Available at: https://zebpay.com/blog/advantages-and-disadvantages-of-different-types-of-blockchain. Accessed 5 Sept 2022
75. P. Zimmerman, Blockchain Structure and Cryptocurrency Prices (2019) [online], Available at: https://www.researchgate.net/publication/332969913_Blockchain_structure_and_cryptocurrency_prices. Accessed 5 Sept 2022
76. ZoneAlarm Security Blog, The Risk of Accessibility Permissions in Android Devices | ZoneAlarm Security Blog (2020) [online], Available at: https://blog.zone-alarm.com/2020/12/the-risk-of-accessibility-permissions-in-android-devices/. Accessed 5 Sept 2022

Exploring the Opportunities of Applying Digital Twins for Intrusion Detection in Industrial Control Systems of Production and Manufacturing – A Systematic Review Nipuna Sankalpa Thalpage and Thebona Arachchige Dushyanthi Nisansala

1 Introduction
As technology advances toward Industry 4.0, most industries and operations around the world are migrating to new technological tools and integrations for easier operations, efficiency, and convenience. In the industrial environment, this has effectively boosted communication, automation, and digitization. As a result, Industrial Control Systems (ICS), which integrate Information Technology (IT), Operational Technology (OT) systems, and the physical devices of industry, have recently become more exposed to cyber threats. Some recent cyber-attacks against ICS include those on the Israeli industrial cybersecurity company OTARIO (2022), the Russian meat provider Miratorg (2022), the Florida water plant, USA (2021), Colonial Pipelines, USA (2021), and the CPC Corp., Taiwan (2020). Although Kaspersky was able to safeguard 40% of all ICS in energy companies from malware attacks [1], the Kaspersky Industrial Control System – Cyber Emergency Response Team (ICS-CERT) reports that cyberattacks against ICS have increased significantly in the last few years, and the main source has been the Internet [19]. In H1 2022, around 102,000 malware variants were blocked by Kaspersky, and malware objects were blocked on 31.8% of global ICS computers [19] (Chart 1).

N. S. Thalpage () Cardiff School of Technologies, Cardiff Metropolitan University, Cardiff, UK e-mail: [email protected] T. A. D. Nisansala Department of Information Technology, Faculty of Information Technology and Sciences, International College of Business and Technology, Colombo, Sri Lanka e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 C. Hewage et al. (eds.), Data Protection in a Post-Pandemic Society, https://doi.org/10.1007/978-3-031-34006-2_4


Chart 1 Cyber-attack percentages on ICS [19]. Percentage of ICS computers on which malicious objects were blocked, by half-year: H1 2020: 32.6; H2 2020: 33.4; H1 2021: 33.8; H2 2021: 31.4; H1 2022: 31.8.

Production and manufacturing is one of the main industries that use ICS for their operations, and it hence relies heavily on the Internet of Things (IoT) and advanced technological methods. Unfortunately, the IoT integration process of Industry 4.0 has made the production and manufacturing industry more vulnerable to cyber threats. IBM reports show that manufacturing was the industry most affected by cyber-attacks in 2021, accounting for around 23% of attacks, and that victim organizations still have not patched 47% of the vulnerabilities that caused those attacks [26]. Gartner has also predicted that 45% of organizations around the world will experience cyber-attacks on their supply chain management software by 2025 [39]. These numbers show that cyber threats are on the rise in the manufacturing industry and are primarily attributable to vulnerabilities in firms' ICS; they also indicate that current cyber threat countermeasures are insufficient to prevent such attacks. Digital twin is a new technology associated with Industry 4.0 principles that can provide solutions against these cyber threats. It is currently being used in various industries and businesses, and it was declared one of the top 10 strategic technology trends for 2019 by Gartner [14]. It has also been highly rated as a strategic technology by many other influential market intelligence firms, such as International Data Corporation (IDC), Allied Business Intelligence (ABI) Research and BITKOM [4]. The National Aeronautics and Space Administration (NASA) was the first to utilise digital twin technology, which they defined as "A digital twin is an integrated Multiphysics, multiscale, probabilistic simulation of an as-built vehicle or system that uses the best available physical models, sensor updates, fleet history, etc., to mirror the life of its corresponding flying twin" [16]. Gartner defined digital twins as "A digital twin is a digital representation of a real-world entity or system. The implementation of a digital twin is an encapsulated software object or model that mirrors a unique physical object, process, organization, person, or other abstraction" [15].


So, it is fairly obvious that a digital twin is a system that uses real-time and historical data to recreate a physical object in a virtual environment. Even though numerous cybersecurity safeguards for ICS in manufacturing and other industries are in place these days, recent figures suggest that cybercriminals are still able to penetrate these systems and obtain access to important data and operations. The digital twin is a new disruptive technology that can merge the digital and physical worlds, and it is one of the technologies cybersecurity professionals and experts are striving to deploy to stay one step ahead of these cybercriminals [11]. This study was primarily conducted to identify opportunities for using digital twin technology to combat the large number of cyber threats occurring in the production and manufacturing industries, as well as to find ways to secure industrial control systems by employing digital twin technology as a tool for intrusion detection.

Research Question
• How can digital twin technology be applied to detect cyber-attacks on ICS, focusing mainly on the production and manufacturing industry?

Objectives
• Identify the differences between applying digital twins and conventional security methods for ICS security.
• Identify the different methods and areas of applying digital twins for ICS security.
• Identify the major issues and weaknesses of ICS that make them vulnerable to cyber threats.
• Recommend the use of digital twins as a novel technology to detect and counter cyber threats against ICS.

Significance of the Study
Since cyber threats have become a major global issue, it is critical to identify ways to counter them. This paper attempts to contribute to cybersecurity solutions for ICS, which will benefit both individuals and organisations. Investigators around the world who are striving to create cybersecurity solutions will gain knowledge about the digital twin technology proposed as a solution in this study. Due to the lack of studies on this technology, this study will serve as an inspiration for many researchers to begin experimenting with it, which will be extremely beneficial to the research field of cybersecurity.


Additionally, researchers can establish new research paths and add further scientific contributions by utilising the new knowledge and research gaps identified as a result of this study. This study will also benefit practitioners in the field of cybersecurity, who have been searching for innovative ways to safeguard their enterprises from cyber threats. Individuals who have been impacted by cyber-attacks in the past might consider implementing digital twins for real-time protection and for testing their systems for security flaws, by utilising the information provided in this study. Furthermore, with so many warfare scenarios targeting ICS around the world these days, government bodies can look into implementing this novel technology as a security measure and a strategy to protect against the effects of cyber-attacks.

Limitations of the Study
The findings of this study have to be interpreted carefully, as the research was conducted under some limitations:
• Time constraint – As this is a systematic review, the study ideally needs a longer duration than a conventional literature review with a conventional framework. However, due to the submission deadline, it had to be completed within the given time frame by applying relevant changes.
• Country context – Sri Lanka is a developing country, and most of the latest technologies and trends take some time to implement and flourish, so data relevant to the local context could not be applied to the study.
• Uniqueness of the intervention used for the study – The digital twin is a very novel concept that has mainly been in the experimental stages, so a limited (though growing) number of researchers and research papers are available on this technology.
• Inaccessible journals and papers – Some valuable papers that could have been useful for the study were not accessible.
• Limited reviewers – Although two reviewers were involved in some sections of the study, the majority of the review was done by just one reviewer.

Chapter Outline
The organisation of the chapters of the dissertation is as follows.
Section 1: Introduction
The Introduction provides background on the cybersecurity issues of ICS and the impact caused by cyber threats in the industrial context.


Section 2: Literature Review
The literature review provides an overall idea of the research conducted on cyber threats against ICS, focusing mainly on the manufacturing industry, and of the kinds of solutions researchers are working on. This section also investigates the literature on the digital twin concept, ways of applying it to ICS, and some weaknesses of the digital twin technology.
Section 3: Research Methodology
Section 3 describes the methodologies used to conduct this study, including the rapid review methods and the use of PRISMA guidelines for formulating research questions, formulating the search strategy, study selection and screening, quality assessment and data extraction. This section also describes the software and tools used in these processes.
Section 6: Data Presentation and Analysis
The data analysis process is described in Sect. 6, covering the steps of thematic analysis; the results are presented as the theories and themes that emerged during the analysis.
Section 8: Conclusions and Recommendation
Section 8 contains a discussion of the findings, contributions, and future directions of the study, as well as a concluding summary, which verifies whether the research questions have been answered and recommends future work.

2 Literature Review
ICS are comprised of a variety of control components that work together to achieve a specific industrial goal [1]. With the advent of Industry 4.0, automation of ICS is becoming a frequent trend in the manufacturing industries. ICS are rapidly evolving from traditional electromechanical-based systems to modern information and communication technology (ICT)-based systems, resulting in a close relationship between cyber and physical components [23]. Although ICS automation has benefited industries in many ways, such as efficiency, communication and digitisation, it has the potential to increase cyber risks on ICS. According to figures from ICS-CERT, the cyber-attack surface against ICS has grown significantly in recent years, primarily in 2012, 2014, and 2016 [9]. One of the most famous ICS attacks is the "Stuxnet worm" attack that targeted Iran's nuclear power plants, which caused significant global harm by affecting 60% of Iran's, 18% of Indonesia's and 8% of India's computers, as it infected over 200,000 devices and physically harmed thousands of them [36].


of Iran's, 18% of Indonesia's and 8% of India's computers, infecting over 200,000 devices and physically harming thousands of them [36]. The attack on a pipeline in Turkey in 2008 was another massive cyber-attack, which produced a powerful explosion and released approximately 30,000 barrels of oil above a water aquifer; it cost British Petroleum around $5 million a day in transit tariffs [23]. A further massive attack happened more recently against the Ukrainian power grid in 2015, targeting three main power companies in Ukraine and resulting in a power failure that affected around 225,000 households and even posed a threat to public safety [9]. According to Mclaughlin et al. [23], the main reason for the success of these cyber-attacks is that ICS rely on the security-through-obscurity principle. This was also borne out by a study done by Dragos Inc. in 2018, which found that 64% of the patches for vulnerabilities discovered in ICS did not fully eliminate the underlying weakness [9]. Also, ICS contain several components that may have a life span of more than 15 years and may rely on obsolete technologies; they can also be tough to update or replace. As a result, they may be difficult to protect against cyber-attacks, as their security mechanisms are more outdated [10]. Moreover, because ICS use common operating systems (e.g. Windows) and open standard networking protocols, they are even vulnerable to unintentional intrusions, such as the Slammer worm that infected a power plant in Ohio [23]. Furthermore, most network protocols for ICS were designed with a focus on reliability and real-time requirements, with less consideration given to security [9], which is another vulnerability that exposes ICS to cyber threats. Researchers have lately begun investigating the digital twin concept to build stronger security mechanisms for ICS as a solution to the increasing number of cyber-attacks. According to an article by Matthias Eckhart and Andreas Ekelhart, there are several possible ways to implement digital twins for ICS, such as intrusion detection, system testing and simulation, detecting misconfigurations, and penetration testing [8]. Further, relevant to the manufacturing industry, Negri et al. [25] have identified three more use cases: supporting health analysis for improving maintenance and planning, digitally mirroring the life cycle of the asset, and supporting decision making. The implementation of a novel concept like digital twins will thus open up many opportunities and ways to protect ICS from cyber threats [30]. According to Akbarian et al. [1], using digital twin technology will lessen the negative effects on a live system when deploying security tests, and it will also use fewer computing resources than a security test procedure run on the physical object. Furthermore, they have presented an intrusion detection system that consists of two methods of safeguarding physical systems: attack detection and attack classification [1]. Even in the famous Stuxnet incident, if the system had had digital twins monitoring the status of the frequency converters, it would


have detected the anomalous centrifuge speeds as the malicious code started changing the centrifuge values to a harmful level [7]. According to Mclaughlin et al. [23], conventional security methods do not address the interconnectivity of the ICS's cyber and physical components, as ICS differ in many ways from conventional IT systems. Among the differences they state: the purpose of ICS is to keep the integrated industrial process running smoothly, with continuous procedures and high availability; ICS are designed to focus on specific industrial processes and may lack the resources to provide additional features such as security; human response is critical for ICS sensors; and replacing components may take significant time compared to IT system components [23]. Another benefit of using digital twin technology for intrusion detection is that it can operate in several modes, in comparison to conventional security systems, which mainly use historical data analytics for security procedures. According to Dietz and Pernul [7], digital twins can operate in modes like simulation, historical data analytics/optimisation, and replication. Figure 1 shows the modes and how the data is used as a basis for the procedure. Table 1 shows the differences between those operation modes with their advantages and disadvantages, which also indicates the breadth of security areas covered by digital twins. Further, according to Dietz and Pernul [7], with conventional security measures, different parties handle security in different lifecycle phases of the assets. Due to insufficient sharing, this results in duplicate data and information as well as wasted resources. The digital twin paradigm tackles this issue by continuously integrating all lifecycle stages, organising information throughout the asset's life; this also means that digital twins can be used throughout the life cycle of an asset, which saves time and other resources. Michael Grieves, who is considered the founder of the digital twin concept, has also supported this claim, further stating that the digital twin will evolve with its physical asset and increase in accuracy and complexity as the life cycle progresses [17].
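As a concrete, minimal illustration of the monitoring idea above (a twin flagging anomalous centrifuge speeds), the following Python sketch compares measurements from a physical asset against the values predicted by its digital twin and raises an alert when the residual exceeds a tolerance. The setpoint, tolerance, and sample trace are illustrative assumptions, not values from any of the cited systems.

# Minimal sketch: residual-based anomaly detection with a digital twin.
# The twin predicts the expected centrifuge speed; readings that deviate
# beyond a tolerance are flagged as potential intrusions. All names and
# values here are illustrative assumptions.

def twin_expected_rpm(t: float) -> float:
    """Digital twin model: nominal centrifuge speed (hypothetical constant setpoint)."""
    return 63000.0  # nominal operating speed in RPM (illustrative)

def detect_anomalies(readings, tolerance=500.0):
    """Flag samples whose residual against the twin's prediction exceeds the tolerance."""
    alerts = []
    for t, measured in readings:
        residual = abs(measured - twin_expected_rpm(t))
        if residual > tolerance:
            alerts.append((t, measured, residual))
    return alerts

if __name__ == "__main__":
    # Simulated sensor trace: the last two samples mimic malicious overspeed/underspeed commands.
    trace = [(0, 63010), (1, 62980), (2, 63005), (3, 84600), (4, 120)]
    for t, rpm, res in detect_anomalies(trace):
        print(f"t={t}s: measured {rpm} RPM deviates by {res:.0f} RPM from the twin's prediction")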

Fig. 1 Modes and data as the basis for security operations with Digital Twins [7]

Table 1 Comparison of security operations modes of Digital Twins [7]

Operation mode: Analytics/optimisation
  Database: State data
  Techniques: Statistical analyses, machine learning and so on
  Applications: Data queries, network-traffic analysis, outlier detection, and so forth
  Advantage: AS-IS state analysis; alerts for current security incidents; broad user base through the prominence of the techniques
  Disadvantage: Temporal dependencies; analysis of potential future conditions not possible; database size influences the functionality of most of the algorithms

Operation mode: Simulation
  Database: Specification data
  Techniques: Emulation, simulation
  Applications: Vulnerability analysis, system-security testing
  Advantage: Time independence; reproducibility; repeatability; analysis of potential future conditions; security-by-design support
  Disadvantage: Hypothetical conditions; AS-IS state not known; isolated view of the system; complexity requires professional users

Operation mode: Replication
  Database: State data, specification data
  Techniques: Emulation, stimuli reproduction, differential algorithms
  Applications: Attack and threat detection
  Advantage: AS-IS state analysis; alerts for current security incidents; digital tracing of real-world stimuli/events
  Disadvantage: Temporal dependencies; stimuli/events to be known in advance; complexity requires professional users



Many researchers have tried different methods to explore the possibilities of implementing digital twins for ICS security. Three methods for detecting intrusions have been provided in an article by Eckhart et al. [10]: indirectly monitoring physical objects (without having to affect a live system); comparing the physical device's input and output with those of its digital twin and flagging any anomaly as a possible intrusion; and identifying the cause of an infection, and how it occurred, by using the historical data gathered in the process of replicating the physical objects with their digital twins. According to Akbarian et al. [1], although Eckhart et al. [10] used a passive monitoring approach, this may result in important data being missed; moreover, the digital twin will not follow the physical asset continuously, so it may not be able to imitate unexpected changes such as system faults. Considering this, and the synchronisation challenges that occurred in many of the other models proposed by researchers, they presented three architectures that keep the digital twin synchronised with the physical asset by illustrating which signals need to be sent from the physical domain to the digital domain, under the assumption that the real-system model will be replicated in the virtual domain by its identification algorithms. Becue et al. [4] have suggested a model for enhancing cybersecurity in the industrial production environment by using digital twins in combination with a cyber range, their term for the virtual environments involved. Figure 2 depicts the model they have suggested. Although digital twins appear to be a novel and promising method for protecting ICS against cyber threats, as with many other technologies, researchers have discovered some weaknesses. According to Eckhart and Ekelhart [8], it is sometimes difficult to implement digital twins that faithfully replicate their physical assets: due to differences in the network stack implementation, discrepancies may show up in the network traffic, and timing differences may cause digital twins to fall out of sync with their physical assets. According to many of the studies, synchronisation between the digital twin and the physical object has become an issue for many researchers. In the article [8], the authors avoided this issue intentionally by assuming that the specifications of the digital twins match the specifications of the physical asset. Although Eckhart et al. [10] proposed a passive replication synchronisation method in their next article after identifying this issue, that method only copies a limited amount of data from physical assets to the digital twin, which means the digital twin does not follow the physical object continuously [1]. As with many novel concepts, digital twins, which are still in the development and testing stages, are also challenging to execute efficiently for security purposes [7]. Another issue that comes with the simulation process of digital twins is the need for expert knowledge and tools. A lack of expert knowledge and instruments may produce inaccurate information and data, which can have a negative influence, particularly on decision-making and other activities in security procedures [7].
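The synchronisation concern raised by Akbarian et al. [1] can be made concrete with a small Python sketch: a passively replicated twin only receives a subset of the physical signals, so its state can silently drift from the asset's. The drift metric, class, and signal names below are illustrative assumptions, not part of any cited architecture.

# Minimal sketch of the synchronisation concern discussed above: a passively
# replicated twin only sees the signals that are actually mirrored, so its
# state can drift from the physical asset's. Names and values are illustrative.

class PassiveTwin:
    def __init__(self):
        self.state = {}

    def ingest(self, signal_name, value):
        # Passive replication: only mirrored signals ever update the twin.
        self.state[signal_name] = value

def drift(twin_state, physical_state):
    """Fraction of physical signals the twin is missing or holds stale values for."""
    mismatched = [k for k, v in physical_state.items() if twin_state.get(k) != v]
    return len(mismatched) / len(physical_state)

physical = {"valve_pos": 0.42, "pump_rpm": 1450, "temp_c": 71.3}
twin = PassiveTwin()
twin.ingest("pump_rpm", 1450)  # only one of three signals is mirrored

if drift(twin.state, physical) > 0.0:
    print("twin out of sync: unexpected changes (e.g. faults) may go unnoticed")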


Fig. 2 Digital twin and cyber range for enhanced cybersecurity [4]

According to Dietz and Pernul [7], ICS interconnectivity also constitutes an additional problem when using digital twins, because most physical objects are linked, so one object's safety issues or weaknesses also affect other objects. They also highlighted that several parties contribute information on the physical assets through the life cycle, which affects the consistency of the information available to the digital twins across that life cycle. Considering this literature, the use of digital twins for cybersecurity will benefit ICS, and it is very applicable in the production and manufacturing industry. Although more work remains to be done on synchronisation, simulation, and the need for expert knowledge, these issues can be expected to improve with development, as the digital twin is still very much a novel concept. In terms of future directions, more researchers need to engage with synchronisation, improvements to the necessary technologies, simulation, and so on. Overall, in fairness to this literature, digital twins have made a serious case as a technology that can be used to counter ICS cyber threats.


3 Research Methodology This study was conducted using the method of systematic review, as it is a convenient and simple way to find the best studies and approaches by detecting current research gaps and study limitations [33]. According to Manterola et al. [22], a systematic review is necessary to integrate a massive amount of data and give a rational basis for decision-making. In addition, they state the scenarios that justify a systematic review: when there is doubt about an intervention's impact and evidence against its true usefulness, when the level of an intervention's impact needs to be identified, and when the behaviour of an intervention in subject subgroups is to be studied [22]. According to an article by Selçuk [32], a systematic review approach reduces the biases of a study and increases its scientific validity. Further, according to Linares-Espinós et al. [21], a systematic review comprises a thorough and reproducible explanation of the findings of the research. Mainly considering the time available, the number of reviewers, and the limitations of the reviewers, this study was adapted into a rapid systematic review. Although there is no official definition of a rapid review, it can be described as a method of synthesising knowledge by omitting certain phases of a systematic review in order to identify solutions in a short period [35]. As a result, some of the steps of a systematic review according to the PRISMA framework have been omitted in this study, mainly considering the time, the number of reviewers, reviewer limitations, the qualitative nature of the study, and the study type and objectives. This research can also be described as a systematic mapping review, as it is a process of identifying theories and themes based on the research question and objectives and then looking for new directions [27]. Quality assessment of the study was conducted by adapting Critical Appraisal Skills Programme (CASP) checklists to the type and research questions of this study. The risk-of-bias assessment was excluded from this study because just one additional reviewer was engaged, with the restriction of only auditing the study's crucial phases. For identifying themes in the analysis and synthesis process, thematic analysis was used.

Research Questions The research questions were formulated using PICO, a tool commonly used for systematic reviews. PICO stands for Population, Intervention, Comparison, and Outcome [29]. Although other models similar to PICO are available, such as SPIDER (sample, phenomenon of interest, design, evaluation, research type) and SPICE (setting, perspective, intervention, comparison, evaluation), PICO has been the most widely used model for both question formulation and literature search [12]. It was therefore used for this study, as it is a simple framework that can be applied to any kind of study design. Table 2 illustrates how the PICO model was used to develop the research questions.

Table 2 Research question and objectives formulation using the PICO model
  P (Population): ICS, Manufacturing Industry
  I (Intervention): Digital Twins
  C (Comparison): Conventional security methods
  O (Outcome): Intrusion Detection

Research Questions Formulated Through the PICO Framework
• Identify the difference between applying digital twins instead of conventional security methods for ICS security.
• Recommend the use of digital twins as a novel technology to detect and counter cyber threats against Industrial Control Systems.
• Identify the different methods and areas of applying digital twins for ICS security.
• Identify the major issues and weaknesses of ICS that make them vulnerable to cyber threats.

Protocol and Eligibility Criteria The structure of this study was derived using the guidelines of the PRISMA framework. The PRISMA framework was originally developed in 2009 by the PRISMA Group, which mainly consisted of Cochrane authors; it comprises a four-phase flow diagram and a 27-item checklist [32]. The PRISMA 2009 statement was extended in 2020 to guide the reporting of network meta-analyses (NMA), meta-analyses of individual participant data, systematic reviews of consequences, systematic reviews of diagnostic test accuracy studies, and scoping reviews, all of which are recommended to follow the guidelines of the PRISMA 2020 statement [28]. The PRISMA 2020 framework has therefore been used for this study. The inclusion and exclusion criteria for searching were set as follows.
Inclusion Criteria
IC 01 – Published between the years 2014 and 2021


IC 02 – Written only in English
IC 03 – Peer-reviewed journals, reports and technical documents, conference proceedings
IC 04 – Include answers to the research questions
IC 05 – Studies that directly address the cybersecurity of ICS
Exclusion Criteria
EC 01 – Unpublished theses and dissertations
EC 02 – Case studies
EC 03 – Studies that have not been cited
EC 04 – Studies not relevant to cybersecurity
EC 05 – Similar articles and duplicate studies
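For illustration, the machine-checkable criteria above (IC 01–IC 03 and EC 01–EC 03) could be applied to a paper's metadata as in the following Python sketch; IC 04, IC 05, EC 04, and EC 05 require human judgement and are therefore not encoded. The field names and the dictionary format are assumptions for demonstration only.

# Illustrative filter applying the machine-checkable criteria to a paper's
# metadata. Field names are hypothetical.

ALLOWED_TYPES = {"journal", "report", "technical document", "conference"}  # IC 03
# (EC 01/EC 02 - theses and case studies - fall outside ALLOWED_TYPES.)

def passes_criteria(paper):
    return (2014 <= paper["year"] <= 2021          # IC 01
            and paper["language"] == "English"     # IC 02
            and paper["type"] in ALLOWED_TYPES     # IC 03 / EC 01 / EC 02
            and paper["citations"] > 0)            # EC 03

print(passes_criteria({"year": 2018, "language": "English",
                       "type": "conference", "citations": 12}))  # True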

Information Sources Google Scholar was the database utilised for the study's initial search, as it is a free search engine that provides a wide range of articles across many subject areas [37]. Following the identification of the most relevant papers for the study, the citations of those papers were used to locate further publications relevant to the study's research topics. Since some of the publications were not located, or their full texts were not available, through Google Scholar, websites and databases including ResearchGate, IEEE, and the ACM Digital Library had to be accessed. The initial search began in June 2021, and the final search was concluded in July 2021.

Search Strategy A four-step procedure was used as the search strategy, as follows:
• Identify the synonyms for the keywords in the research topic using the PICO model
• Merge the identified synonyms using the "OR" Boolean operator
• Connect all the combinations of terms using the "AND" Boolean operator
• Search for the references inside the articles
Table 3 summarises the search strategy used for the study.
Identify the synonyms for the keywords in the research topic using the PICO model
The PICO framework was used to identify the required keywords for searching, as it can be used to create an effective search strategy when conducting a systematic review [34].


Table 3 Summary of the search strategy used (synonyms within a row are merged with OR; the rows are connected with AND)
  Population: ICS OR Industrial Control Systems OR Supervisory Control and Data Acquisition systems (SCADA)
  Intervention: Digital Twins
  Comparison: Conventional security methods on ICS
  Outcome: Intrusion Detection OR cybersecurity

Also, PICO is a very useful search strategy, as it searches for every possible combination of the search keywords [31]. Table 4 shows the keywords used for the search.

Table 4 Search terms according to the PICO model
  P: Industrial Control Systems, ICS, SCADA
  I: Digital Twin
  C: (No keywords used for conventional methods)
  O: Intrusion Detection, Cybersecurity

Merge the identified synonyms using the "OR" Boolean operator
The keywords identified in step 1 were merged using the OR operator, e.g. Industrial Control Systems "OR" ICS "OR" SCADA.
Connect all the combinations of terms using the "AND" Boolean operator
For the final search, all the merged combinations of the words in the PICO elements were connected, e.g. (Industrial Control Systems "OR" ICS "OR" SCADA) AND (Digital Twin) AND (Intrusion Detection "OR" IDS "OR" Cybersecurity).
Search for the references inside the articles
Finally, a search utilising the references from the selected articles produced more papers related to the research questions.
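As a minimal sketch of this construction, the following Python snippet merges the synonyms of each PICO element with OR and connects the element groups with AND, reproducing the form of the example query above. The term lists follow Tables 3 and 4; the function itself is illustrative, not a tool used in the study.

# Illustrative sketch of the search-string construction described above:
# synonyms within each PICO element are merged with OR, and the element
# groups are then connected with AND.

pico_terms = {
    "population": ["Industrial Control Systems", "ICS", "SCADA"],
    "intervention": ["Digital Twin"],
    "outcome": ["Intrusion Detection", "IDS", "Cybersecurity"],
}

def build_query(terms_by_element):
    groups = []
    for synonyms in terms_by_element.values():
        quoted = [f'"{term}"' for term in synonyms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)

print(build_query(pico_terms))
# ("Industrial Control Systems" OR "ICS" OR "SCADA") AND ("Digital Twin")
# AND ("Intrusion Detection" OR "IDS" OR "Cybersecurity")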

Study Selection A specialised software program called Covidence was utilised for the study selection procedure. Covidence can be used for many processes during a systematic review, including importing and screening citations, study selection, data extraction and exportation, and quality assessments. Covidence primarily assists


reviewers by saving them a significant amount of time during the systematic review process [3]. According to the results of a study conducted by Harrison et al. [18], Covidence and Rayyan are the software packages providing the best user experience for reviewers undertaking title and abstract screening. It is also very inexpensive software that allows multiple reviewers to work efficiently through the steps of a systematic review [3]. Study selection was conducted in three phases, the first two of which involved Covidence. The three phases are as follows.
Phase 01 – Title and abstract screening using the Covidence software
After the initial search produced 158 studies, all 158 papers were used for the title and abstract screening, since no duplicate studies were discovered. Out of the 158 studies, 138 were excluded after checking against the inclusion criteria: not relevant to the research questions (n = 125), only the abstract available (n = 10), and language issues (n = 03). Figures 3 and 4 show screenshots from before and after the title and abstract screening process.

Fig. 3 Before the Title and abstract screening

Fig. 4 After the Title and abstract screening


Fig. 5 Before the full-text review

Fig. 6 After the full-text review

Phase 02 – Full-text review using the Covidence software
After excluding 138 studies in the title and abstract screening process, 20 studies were used for the full-text review. Mendeley reference manager, a free software package with both desktop and web versions that extracts references from PDFs and the web [6], was utilised to upload the full-text papers into the Covidence software. In the full-text screening process, 17 studies were excluded for wrong intervention (n = 08), wrong outcomes (n = 05), wrong population (n = 02), and wrong study design (n = 02). Figures 5 and 6 show screenshots of the full-text review.
Phase 03 – Citation tracking
Citation tracking is a method that uses various techniques to collect related references, directly or indirectly, from an initial source's references, which are also


called the "seed references". These references are evaluated according to the inclusion and exclusion criteria used at the start of the review process, and the seed references used for citation tracking are the ones filtered through the initial study screening processes [13]. Citation tracking was utilised in this study to find further important papers relevant to the research questions. The three papers that remained after the title and abstract screening and full-text screening were used as seed references for citation tracking. Firstly, the reference lists of those three studies were searched, and similar studies were located based on the inclusion and exclusion criteria, a process known as backward citation tracking. Thereafter, Google Scholar was used to search for papers that cited the initial seed references, which is known as forward citation tracking. Following that, the citation tracking method was applied to articles newly discovered via the citation tracking of the initial seed articles, which also uncovered valuable relevant articles based on the inclusion and exclusion criteria. This iterative process of finding new papers through forward and backward citation tracking is also called snowballing; a sketch of the loop is given below. It is always important to check each paper against the inclusion and exclusion criteria before using it in this process [38]. After all these phases, 10 eligible papers were found for this review. Detailed information on the study selection process is shown in Fig. 7 using the PRISMA flow diagram.
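The following Python sketch captures the snowballing loop just described. The helper arguments get_references and get_citing_papers are hypothetical stand-ins for the manual backward and forward look-ups performed in Google Scholar, and meets_criteria stands for the inclusion/exclusion check.

# Minimal sketch of the snowballing loop described above. The look-up
# functions are hypothetical stand-ins for manual searching.

def snowball(seeds, get_references, get_citing_papers, meets_criteria):
    included, frontier = set(seeds), list(seeds)
    while frontier:
        paper = frontier.pop()
        # Backward tracking (the paper's own reference list) plus
        # forward tracking (papers that cite it).
        for candidate in get_references(paper) + get_citing_papers(paper):
            if candidate not in included and meets_criteria(candidate):
                included.add(candidate)
                frontier.append(candidate)  # newly found papers are tracked in turn
    return included

# Toy usage with stub look-up tables:
refs = {"A": ["B"], "B": [], "C": []}
cites = {"A": ["C"], "B": [], "C": []}
print(snowball(["A"],
               lambda p: refs.get(p, []),
               lambda p: cites.get(p, []),
               lambda p: True))  # {'A', 'B', 'C'}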

4 Quality Assessment A commonly used quality assessment tool for qualitative studies, the Critical Appraisal Skills Programme (CASP), was used to assess study quality in this review. CASP has separate checklists for different study designs, such as systematic reviews, randomised control trials, qualitative studies, case-control studies, economic designs, cohort studies, and clinical prediction rules [24]. Therefore, a manually modified checklist was created for this study using criteria from all those checklists. Another reason for adopting CASP as the quality assessment tool is that, although there are several quality assessment tools to choose from, the CASP tools are simple and effective in covering the areas required for critical evidence appraisal [24]. A checklist of seven questions was created from the CASP checklists to assess the quality of the papers used for the study; the questions were as follows.

• Did the study address a focused research question?
• Was the research design appropriate to address the aims of the research?
• Were the data collection methods clear, and did they address the research issue?
• Is there a clear statement of findings?


Fig. 7 Screenshot of the PRISMA flow diagram designed for the study selection process

• Would the experimental intervention provide solutions to the addressed problem?
• Are the methods used to construct and validate the outcomes clearly described?
• Do the results of this study fit with other available evidence?
According to the quality assessment conducted, the majority of the studies (9/10) were high-quality studies, and only one (1/10) was rated as a medium-quality study, indicating that the papers used possess good standards according to the quality assessment process.
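As an illustration of how such a checklist could be tallied, the Python sketch below counts "yes" answers across the seven questions and maps the count to a rating. The high/medium/low thresholds are assumptions for demonstration, not the cut-offs used in this study.

# Illustrative tally for the seven-question checklist above. The rating
# thresholds are illustrative assumptions.

def rate_study(answers):
    """answers: list of booleans, one per checklist question (True = 'yes')."""
    score = sum(answers)
    if score >= 6:
        return "high"
    if score >= 4:
        return "medium"
    return "low"

print(rate_study([True] * 7))                                      # high
print(rate_study([True, True, True, True, False, False, False]))   # medium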

5 Data Extraction Process All the papers that passed the quality assessment process were used for the data extraction phase. The data were extracted by focusing on the research topics as well as on the subsequent data analysis and synthesis procedures. The supervisor of this study was involved as the second reviewer for the quality assessment process and also observed this process to reduce bias in the data extraction.

Table 5 Types of qualitative data extracted from the studies
  Context data: Title; Author; Year
  Qualitative data: Research questions and aims; Interventions; Methods; Outcomes; Comparisons; Conclusions; Recommendations

Fig. 8 Snapshot of the Data extraction table

The extracted data were saved in a Microsoft Word document for the data analysis procedure performed later. Focusing on the research questions, key qualitative data were extracted from the selected studies. The data extraction table that was created was shared with the study's supervisor, who acted as the second reviewer. Most of the themes and qualitative data were extracted as direct quotations imported from the studies, which appear inside quotation marks in the data extraction table. Table 5 shows the main qualitative data extracted from the studies according to the categories (Fig. 8).


6 Data Presentation and Analysis Data analysis was undertaken using the widely used method of thematic analysis, which is defined as a method of identifying, analysing, and reporting patterns in a qualitative dataset [5]. Thematic analysis was chosen mainly because this study is a systematic mapping exercise that requires a deductive approach to data analysis; thematic analysis allows researchers to analyse data using both a researcher-driven deductive approach and a data-driven inductive approach [20], so it suits this study better than other qualitative data analysis methods. Further, it is a less complicated method and can be applied to a wide range of conceptual frameworks and research questions [5]. Since this is a systematic review using a mapping process, the deductive approach to thematic analysis, which uses predefined theories, frameworks, and other focuses of the researcher, has been the key to building up theories and themes [20], and it was applied as the more fitting methodology considering the nature of the study. The themes created during this study were therefore mainly focused on the research question and the objectives. The six-phase thematic analysis introduced by Braun and Clarke [5] was used for this study, comprising the following phases:

• Familiarisation with data
• Coding
• Searching for themes
• Reviewing themes
• Defining and naming themes
• Writing up

Braun and Clarke [5] further state that this six-phase model of thematic analysis is not necessarily a strictly sequential process in which one phase must be fully completed before moving to the next, but the analysis process must be continuous.

Familiarisation with Data Following the model by Braun and Clarke [5], familiarity with the data was gained by going through the selected papers several times.

Coding After getting familiar with the papers, the data were coded by finding meaningful data chunks, including definitions, views, findings, etc. Figure 9 shows a screenshot of a sample of the coding done in the selected papers.


Fig. 9 Screenshot of coding for important data in papers

The coding process was also shared with the supervisor of the study, who took part in reviewing the codes. Initially, the data were quoted directly from the papers and then extracted into a table created with Microsoft Word.
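As a toy illustration of this deductive step, the Python sketch below tags quoted data chunks (codes) against predefined categories using simple keyword rules, in the spirit of the grouping described in the next subsection. The keywords, category labels, and sample quotes are illustrative assumptions; the actual grouping in the study was done manually.

# Minimal sketch of deductive grouping: quoted data chunks are assigned to
# predefined research-objective categories by keyword rules (illustrative).

category_keywords = {
    "conventional vs digital twins": ["traditional", "conventional"],
    "methods and areas of application": ["forensics", "testing", "design phase"],
    "ICS weaknesses": ["outdated", "vulnerab", "15 years"],
}

def group_codes(codes):
    grouped = {name: [] for name in category_keywords}
    for code in codes:
        for name, keywords in category_keywords.items():
            if any(k in code.lower() for k in keywords):
                grouped[name].append(code)
    return grouped

codes = ["Traditional IT security solutions fail to address the coupling...",
         "ICS may consist of components which have a lifecycle of more than 15 years"]
for category, matched in group_codes(codes).items():
    print(category, "->", matched)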

Searching for Themes After the coding process, the data codes were categorised into predefined groups relevant to the research questions and objectives, which are: (i) comparing digital twins with conventional security methods; (ii) looking for different areas and ways of applying digital twins for ICS security; (iii) identifying the weaknesses and issues of ICS that make them vulnerable to cyber threats; and (iv) looking for the possibilities of digital twins as a solution for ICS security. Figure 10 shows a screenshot of the categorised codes for those four categories. As this is a systematic mapping review with a deductive approach, not all the initial codes extracted from the papers could be utilised, and only the codes relevant to the four main research objectives were grouped to identify the themes. By connecting codes in the different categories, initial themes were identified. The themes identified for each category are listed in Table 6.

Fig. 10 Screenshot of categorised data codes relevant to research objectives

Table 6 Initial themes created
Category 01 – Use of digital twins instead of conventional methods:
  1. Digital twins have the ability to operate in several modes in comparison to traditional security methods.
  2. Use of traditional IT security methods is not enough to provide security for ICS from cyber threats.
  3. Utilising digital twins to conduct security tests is more efficient than using traditional security testing methods.
Category 02 – Different methods and ways of applying digital twins:
  1. Digital twins can be applied in different ways and areas for cybersecurity.
  2. Digital twins can be applied in different stages of the life cycle of an asset.
Category 03 – Challenges and weaknesses of ICS security:
  1. Synchronisation is important for the effective operation of digital twins.
  2. ICS are mainly vulnerable to cyberattacks due to the application of outdated technologies and principles.
Category 04 – Recommending digital twins as a solution for ICS security:
  1. The advent of digital twins as a novel simulation technology has opened up numerous areas of cybersecurity solutions.

Reviewing Themes In the reviewing process, theme no. 1 of Category 01 was removed, as it was addressed in only one paper. Themes no. 1 and no. 2 of Category 02 were combined into one theme after reviewing the facts from the papers and renamed "Digital twins can be applied in different cybersecurity areas through the life cycle of an asset", with a subtheme of "Digital twins can be used as a tool for Digital Forensics". Further, theme no. 1 of Category 03 was omitted due to the limited number of papers providing direct facts to support it, and theme no. 2 of Category 03 was revised as "ICS are vulnerable to cyber threats primarily because of software issues and outdated technologies" after reading the contents of the codes.

Defining and Writing Up After the reviewing process, the finalised themes are as follows:
1. Use of traditional IT security methods is not enough to provide security for ICS from cyber threats.
2. Utilising digital twins to conduct security tests is more efficient than using traditional security testing methods.
3. Digital twins can be applied in different cybersecurity areas through the life cycle of an asset.
   3.1 Digital twins can be used as a tool for Digital Forensics.
4. ICS are vulnerable to cyber threats primarily because of software issues and outdated technologies.
5. The advent of digital twins as a novel simulation technology has opened up numerous areas of cybersecurity solutions.
Figure 11 shows a screenshot of the thematic map used in creating the themes.


Fig. 11 Screenshot of the thematic map

7 Data Presentation Theme 01 – Use of traditional IT security methods is not enough to provide security for ICS from cyber threats
According to four of the papers used for this review, applying traditional security methods against the cyber threats that occur in ICS does not provide enough security. Traditionally, IT security relies mainly on methods such as anti-virus software, firewalls, and security updates; however, considering the complexities of ICS, these methods are not as effective when used for ICS. Paper [7] states this clearly: "although number of security mechanisms (e.g. Firewalls and air gaps for network segregation) exist for industrial ecosystems, they are usually not sufficient to reach a proper security level. In fact, by compromising Windows engineering systems of contractors with physical access to the targeted systems, Stuxnet vividly demonstrated how to overcome air gaps. To conclude, mechanisms beyond conventional information security are typically required to achieve the desired security level", citing the significant "Stuxnet worm" ICS attack as an example of cyber criminals penetrating traditional IT defences.


Because ICS are primarily focused on matters like industrial operations, continuity, and availability, security has never been addressed as a significant factor for ICS, which is another reason why mostly traditional security measures have been adopted for them. Paper [9] addresses this point: "While the typical (business) IT systems tend to place more weight on the confidentiality and integrity of data, OT systems (i.e., ICSs) primarily focus on the availability of industrial operations (Knowles et al. 2015). For instance, most industrial network protocols have not been designed with security in mind". A factor of ICS that is not addressed properly by traditional IT systems is the integration of the devices within the ICS. Sometimes different parties are assigned to handle different sections of the ICS, and they may implement separate networks for those sections, which also does not help the security of the ICS. Paper [23] addresses this as "Traditional IT security solutions fail to address the coupling between the cyber and physical components of an ICS". The facts given above therefore suggest that traditional IT systems are not capable enough to provide security for ICS.
Theme 02 – Utilising digital twins to conduct security tests is more efficient than using traditional security testing methods
Implementing security tests on real systems can negatively affect those systems and can reduce their efficiency. By using digital twins, security testing procedures such as penetration testing can be done in a virtual environment, which does not affect the real systems. This theme has been highlighted in papers [2, 8, 9]. Paper [8] states: "Security analysts can explore a production clone, instead of relying on documentation and theoretical attack vectors. Furthermore, real devices can be tested by first connecting them in the virtual environment", which means the security testing procedure is safer and easier with digital twins, as they create a virtual clone that allows testers to conduct security testing. The use of digital twins also allows security testers to apply testing with more computing resources and with cost-reducing benefits as an extra advantage. Eckhart and Ekelhart [9] have noted: "In essence, digital twins enable penetration testers to perform security tests virtually, i.e., on the digital twins instead of on real systems. In this way, it can be ensured that the execution of these tests does not negatively affect the operation of live systems while also sparing operators from having to deal with the costs associated with testbeds", which also supports the points above.
Theme 03 – Digital twins can be applied in different cybersecurity areas through the life cycle of an asset
Another theme that emerges from these papers, especially from [7–9, 17], is the ability of digital twins to be applied in various ways for cybersecurity in the different life cycle phases of an asset. In paper [9], the authors present the table reproduced in Fig. 12, which shows the different ways of applying digital twins through the life cycle of an asset. Further, papers [7, 9, 17] recommend implementing digital twins in the design phase, as it benefits the system in many ways: "We need to determine the


Fig. 12 Using digital twins through the life cycle of an asset [9]

modes of failure when the system is in use. We need all of this information before the physical system is actually produced. This will reduce failures of the physical system when it is deployed and in use, reducing expenses, time, and most importantly harm to its users" [17].
Subtheme 3.1 – Digital twins can be used as a tool for Digital Forensics
Another theme arising from the papers is the possibility of using digital twins for digital forensics, as they provide evidence related to security incidents through their virtual environment and can be used in conjunction with deception technologies, such as honeypots, which are used in digital forensics [7]. The evidence of activities provided by digital twins can also be used for compliance with security standards: "Security and legal compliance recently, Tauber and Schmittner (2018) published an article that highlights the importance of monitoring the CPS's security and safety posture during operation. The authors emphasise that this activity could provide evidence of meeting security standards (e.g., IEC 62443 (IEC 2009)), which would, in turn, assist organisations in complying with legal requirements" [9].
Theme 04 – ICS are vulnerable to cyber threats primarily because of software issues and outdated technologies
Many ICS have been designed for operational and integration purposes, and in the process security was not addressed as a key factor. Therefore, the components of ICS are not updated as regularly as those of IT systems. The authors of papers [9, 10, 23] demonstrate this fact: "ICS frequently rely on security through obscurity" [23]; "In contrast to IT systems, ICSs may consist of components which have a lifecycle of more than 15 years" [10]. Also, issues in the software used in ICS are another weakness addressed in the papers: "most of attacks originated internal to a company. Recently attacks external to a company are becoming frequent. This is due to the use of Commercial Off-The-Shelf (COTS) devices, open applications and operating


systems, and increasing connection of the ICS to the Internet." [23]; "According to ICS-CERT, the highest percentage of vulnerabilities in ICS products is improper input validation by ICS software also known as the buffer overflow vulnerability" [23]. In addition, software management failures and authentication issues make ICS more vulnerable to cyber threats: "Poor management of credentials and authentication weaknesses are second and third respectively" [23].
Theme 05 – The advent of digital twins as a novel simulation technology has opened up numerous areas of cybersecurity solutions
Most novel technologies provide new opportunities and paths for solving real-world problems. The digital twin is likewise a unique technology that provides solutions for cybersecurity and many other areas. Protecting data and assets is a key consideration for cybersecurity in the manufacturing industry, where new concepts like digital twins can be used to great effect. This theme appears in papers [2, 4, 7, 9, 30]. Akbarian et al. [2] stress that "Digital twin is one of the main concepts associated with smart manufacturing, and it opens up new possibilities in terms of monitoring, simulating, and optimizing and predicting the state of cyber-physical systems (CPSs)", which suggests that the digital twin is connected with new trends such as smart manufacturing and provides numerous ways of countering cyber threats. Association with the integration principles of Industry 4.0 helps digital twins overcome the challenges that come with traditional IT systems and take cybersecurity to a wholly new level. Paper [30] discusses this fact: "Another aspect to consider is the integration of physical and virtual processes within the industry, giving birth to novel services such as the "digital twins". This opens up both new opportunities (detection of anomalies through analysis of simulations) and challenges (control of virtualised environments)".

8 Conclusion and Recommendation Discussion The focus of the study was on four primary topics related to the use of digital twins for ICS cybersecurity, with a particular consideration of the production and manufacturing industry. Those topics are: (a) identify the difference between applying digital twins instead of conventional security methods for ICS security; (b) identify the different methods and areas of applying digital twins for ICS security; (c) identify the major issues and weaknesses of ICS that make them vulnerable to cyber threats; (d) recommend the use of digital twins as a novel technology to detect and counter cyber threats against ICS. The theories that emerged from a thorough analysis of the data in the articles chosen for the review uncovered some major points related to the usage of digital twins for ICS cybersecurity, including the following:


(1) Traditional IT methods do not meet the security needs of ICS, as evidenced by the data obtained during the review: traditional IT systems do not address key points such as integrity in ICS, and the primary focus of ICS on operations and integration does not help their security. (2) Security testing can be conducted in a safer, more efficient, and more cost-effective manner thanks to digital twins, which provide a virtual environment for testing; this also frees security testers from depending on the basic approaches to security testing and lets them increase the computing resources required for more effective testing. (3) Digital twins can be utilised for cybersecurity in a variety of ways during an asset's life cycle phases of design, operation, and end of life, in ways such as detecting hardware and software misconfigurations, security testing, and security and legal compliance; they can also be used as a tool for digital forensics, another important factor in controlling cyber threats. (4) The findings of this review also suggest that ICS are becoming more vulnerable to cyber threats due to software issues and outdated technologies. In most cases this occurs when using commercial off-the-shelf devices that cannot be modified with the necessary OS and software, and when using open operating systems and software; furthermore, unlike IT components, which can be upgraded frequently, most ICS components have a life cycle of over a decade. (5) The data found in this review show that digital twins, as a new technology associated with Industry 4.0, offer a variety of new opportunities and approaches to cybersecurity, opening up possibilities in areas such as monitoring, simulating, optimising, and predicting that take the cybersecurity of ICS to another level.
Considering these findings, this study provides significant evidence of the applicability of digital twins for cybersecurity in ICS, particularly in the manufacturing industry. Additionally, the findings include information on many areas and methods of using digital twins for ICS cybersecurity, and the study uncovered some information on the ICS weaknesses and issues that make them vulnerable to cyber threats. The information uncovered in this study will be useful mainly for researchers, security employees, industrial workers involved with ICS, and so on. For researchers, this information can be utilised to conduct further research on digital twins and ICS, mainly considering the cybersecurity factor. In particular, a further systematic review can be undertaken that overcomes the limitations of this study by lengthening the period and increasing the number of reviewers. Further research can also be conducted on the synchronisation of the physical and virtual environments of digital twins, which, according to the facts from this review, has not been addressed properly. Researchers can also work on developing solutions for the ICS weaknesses and issues relevant to cybersecurity identified in this study.


Conclusion In conclusion, this study is a systematic review conducted using rapid review methodologies and a systematic mapping approach. Papers for the review were selected with a search strategy built on a range of inclusion and exclusion criteria and by utilising citations. The review was conducted according to the PRISMA checklists, and the analysis was done using a thematic analysis process. The review findings identified key points mostly related to the applicability of digital twins for ICS, the different areas and methods of applying them, and the weaknesses and issues of ICS. Following the study's research questions, these findings have provided answers through this review, and the study objectives have been met. There are also certain areas where more research is needed, such as synchronisation and providing solutions for ICS security weaknesses and challenges.

References
1. F. Akbarian, E. Fitzgerald, M. Kihl, Intrusion detection in digital twins for industrial control systems, in 2020 28th International Conference on Software, Telecommunications and Computer Networks, SoftCOM 2020 (2020), pp. 2–5. https://doi.org/10.23919/SoftCOM50211.2020.9238162
2. F. Akbarian, E. Fitzgerald, M. Kihl, Synchronization in Digital Twins for Industrial Control Systems (2020). Available at: http://arxiv.org/abs/2006.03447. Accessed 25 Feb 2022
3. J. Babineau, Product review: Covidence (systematic review software). J. Can. Health Libr. Assoc. 35(2), 68 (2014). https://doi.org/10.5596/c14-016
4. A. Becue et al., CyberFactory#1 – Securing the industry 4.0 with cyber-ranges and digital twins, in IEEE International Workshop on Factory Communication Systems – Proceedings, WFCS, June 2018 (2018), pp. 1–4. https://doi.org/10.1109/WFCS.2018.8402377
5. V. Braun, V. Clarke, Using thematic analysis in psychology. Qual. Res. Psychol. 3(2), 77–101 (2006). https://doi.org/10.1191/1478088706qp063oa
6. A. Butros, S. Taylor, Managing information: evaluating and selecting citation management software, a look at EndNote, RefWorks, Mendeley and Zotero (Mar del Plata, Argentina, 2011)
7. M. Dietz, G. Pernul, Unleashing the digital twin's potential for ICS security. IEEE Secur. Priv. 18(4), 20–27 (2020). https://doi.org/10.1109/MSEC.2019.2961650
8. M. Eckhart, A. Ekelhart, Towards security-aware virtual environments for digital twins, in Proceedings of the 4th ACM Workshop on Cyber-Physical System Security (2018), pp. 61–72
9. M. Eckhart, A. Ekelhart, Digital Twins for Cyber-Physical Systems Security: State of the Art and Outlook (2019). https://doi.org/10.1007/978-3-030-25312-7
10. M. Eckhart, A. Ekelhart, A specification-based state replication approach for digital twins (2018). https://doi.org/10.1145/3264888.3264892
11. Engstler, The Cyber Digital Twin Revolution (2021). Available at: https://www.forbes.com/sites/forbestechcouncil/2021/02/25/the-cyber-digital-twin-revolution/?sh=3e19b9dc24f2. Accessed 11 Oct 2021


12. M.B. Eriksen, T.F. Frandsen, The impact of patient, intervention, comparison, outcome (PICO) as a search strategy tool on literature search quality: A systematic review. J. Med. Libr. Assoc. 106(4), 420 (2018). https://doi.org/10.5195/JMLA.2018.345
13. H. Ewald et al., Using citation tracking for systematic literature searching – Study protocol for a scoping review of methodological studies and a Delphi study. F1000Research 9, 1–25 (2021). https://doi.org/10.12688/f1000research.27337.3
14. Gartner.com, Top 10 Strategic Technology Trends for 2019 | Gartner (2018). Available at: https://www.gartner.com/en/newsroom/press-releases/2018-10-15-gartner-identifies-the-top-10-strategic-technology-trends-for-2019. Accessed 20 Jul 2021
15. Gartner.com, Definition of Digital Twin – IT Glossary | Gartner (2021). Available at: https://www.gartner.com/en/information-technology/glossary/digital-twin. Accessed 18 Jul 2021
16. E.H. Glaessgen, D.S. Stargel, The digital twin paradigm for future NASA and U.S. Air Force vehicles, in Collection of Technical Papers – AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics and Materials Conference (2012). https://doi.org/10.2514/6.2012-1818
17. M. Grieves, J. Vickers, Digital twin: Mitigating unpredictable, undesirable emergent behavior in complex systems, in Transdisciplinary Perspectives on Complex Systems (2017), pp. 85–113. https://doi.org/10.1007/978-3-319-38756-7
18. H. Harrison et al., Software tools to support title and abstract screening for systematic reviews in healthcare: An evaluation. BMC Med. Res. Methodol. 20(1), 1–12 (2020). https://doi.org/10.1186/s12874-020-0897-3
19. Ics-cert.kaspersky.com, Threat Landscape for Industrial Automation Systems. Statistics for H1 2022 | Kaspersky ICS CERT (2022). Available at: https://ics-cert.kaspersky.com/publications/reports/2022/09/08/threat-landscape-for-industrial-automation-systems-statistics-for-h1-2022/. Accessed 9 Nov 2022
20. M.E. Kiger, L. Varpio, Thematic analysis of qualitative data: AMEE Guide No. 131. Med. Teach. 42(8), 846–854 (2020). https://doi.org/10.1080/0142159X.2020.1755030
21. E. Linares-Espinós et al., Methodology of a systematic review. Actas Urol. Esp. (English Edition) 42(8), 499–506 (2018). https://doi.org/10.1016/j.acuroe.2018.07.002
22. C. Manterola et al., Systematic reviews of the literature: What should be known about them. Cir. Esp. (English Edition) 91(3), 149–155 (2013). https://doi.org/10.1016/j.cireng.2013.07.003
23. S. McLaughlin et al., The cybersecurity landscape in industrial control systems. Proc. IEEE 104(5) (2016). https://doi.org/10.1109/JPROC.2015.2512235
24. S. Nadelson, L.S. Nadelson, Evidence-based practice article reviews using CASP tools: A method for teaching EBP. Worldviews Evid.-Based Nurs. 11(5), 344–346 (2014). https://doi.org/10.1111/wvn.12059
25. E. Negri, L. Fumagalli, M. Macchi, A review of the roles of digital twin in CPS-based production systems. Procedia Manuf. 11, 939–948 (2017). https://doi.org/10.1016/j.promfg.2017.07.198
26. newsroom.ibm.com, IBM Report: Manufacturing Felt Brunt of Cyberattacks in 2021 as Supply Chain Woes Grew – Feb 23, 2022 (2022). Available at: https://newsroom.ibm.com/2022-02-23-IBM-Report-Manufacturing-Felt-Brunt-of-Cyberattacks-in-2021-as-Supply-Chain-Woes-Grew. Accessed 5 Nov 2022
27. A. O'Cathain et al., What can qualitative research do for randomised controlled trials? A systematic mapping review. BMJ Open 3(6), e002889 (2013). https://doi.org/10.1136/BMJOPEN-2013-002889
28. M.J. Page et al., Updating guidance for reporting systematic reviews: Development of the PRISMA 2020 statement. J. Clin. Epidemiol. 134, 103–112 (2021). https://doi.org/10.1016/j.jclinepi.2021.02.003
29. L. Pontes et al., Security in Smart Toys: A Systematic Review of Literature, Lecture Notes in Computer Science (Springer, 2019). https://doi.org/10.1007/978-3-030-21935-2_3


30. J.E. Rubio et al., Analysis of intrusion detection systems in industrial ecosystems, in ICETE 2017 – Proceedings of the 14th International Joint Conference on e-Business and Telecommunications, vol. 4 (2017), pp. 116–128. https://doi.org/10.5220/0006426301160128
31. A. Sayers, Tips and tricks in performing a systematic review. Br. J. Gen. Pract. 58(547), 136 (2008). https://doi.org/10.3399/BJGP08X277168
32. A.A. Selçuk, A guide for systematic reviews: PRISMA. Turk. Arch. Otorhinolaryngol. 57(1), 57–58 (2019). https://doi.org/10.5152/TAO.2019.4058
33. E. Steiger, J.P. de Albuquerque, A. Zipf, An advanced systematic literature review on spatiotemporal analyses of Twitter data. Trans. GIS 19(6), 809–834 (2015). https://doi.org/10.1111/tgis.12132
34. C. Stern, Z. Jordan, A. McArthur, Developing the review question and inclusion criteria. Am. J. Nurs. 114(4), 53–56 (2014). https://doi.org/10.1097/01.NAJ.0000445689.67800.86
35. A.C. Tricco et al., A scoping review of rapid review methods. BMC Med. 13(1), 1–15 (2015). https://doi.org/10.1186/S12916-015-0465-6/TABLES/6
36. B. Vigliarolo, Stuxnet: The Smart Person's Guide – TechRepublic (2017). Available at: https://www.techrepublic.com/article/stuxnet-the-smart-persons-guide/
37. W.H. Walters, Google Scholar coverage of a multidisciplinary field. Inf. Process. Manag. 43(4), 1121–1132 (2007). https://doi.org/10.1016/j.ipm.2006.08.006
38. C. Wohlin, Guidelines for snowballing in systematic literature studies and a replication in software engineering. ACM Int. Conf. Proceeding Ser. (2014). https://doi.org/10.1145/2601248.2601268
39. www.gartner.com, Gartner Identifies Top Security and Risk Management Trends for 2022 (2022). Available at: https://www.gartner.com/en/newsroom/press-releases/2022-03-07-gartner-identifies-top-security-and-risk-management-trends-for-2022. Accessed 9 Nov 2022

Securing Privacy During a World Health Emergency: Exploring How to Create a Balance Between the Need to Save the World and People’s Right to Privacy Shasha Yu and Fiona Carroll

1 Introduction It is two, almost three, years since COVID-19 first ravaged the world, and it is safe to say that people have inevitably been affected by it. For example, it has caused more "mass trauma" than even World War II [29]. Just as human beings' post-traumatic stress reactions have long-lasting effects, the problems revealed by COVID-19 will have far-reaching effects on the global economy, politics, healthcare, education and other aspects of life. Among them, in stark contrast to the global economic pause, is the rapid growth in data volume brought about by the pandemic, and the resulting set of problems prompting the need for strategies on how to better implement data protection in the post-pandemic era. The following section of this chapter will give a pre-COVID-19 data protection overview. The chapter will then discuss the changes to data brought about by COVID-19, highlight the challenges that we are now facing in a post-COVID-19 world and, finally, suggest some solutions.

2 Pre-COVID-19 Data Protection Overview

There are different academic and legal definitions for "data protection". Petrocelli [80] argues that it means "protecting important data from damage, alteration, or loss". Room et al. [87] propose that it can be viewed in three different ways:


In an operational sense, it means achieving predefined outcomes during data processing;
In a legal sense, it means the regulatory framework that governs these activities;
In a colloquial and limited sense, it is sometimes viewed as a synonym for the security of data.

Legally, "data protection" is often used interchangeably with terms such as "privacy protection" and "personal data protection". According to the General Data Protection Regulation (GDPR), "personal data" means any information relating to an identified or identifiable natural person ("data subject") [110]. In US legislation, data protection melds the areas of data privacy (i.e., how to control the collection, use, and dissemination of personal information) and data security (i.e., how to protect personal data from unauthorized access or use, and how to respond to such unauthorized access or use) [71]. Traditional data protection technologies include anonymization, fuzzification, and cryptography [118], but in the big data environment they are rarely successful [101]. Almost any data element in a big data environment is identifiable and, when combined with other data elements stored in the same or different locations, can reveal sensitive information [65]; a short sketch of this kind of linkage follows at the end of this section.

To protect personal data, most governments have regulations in place governing the collection, processing, and dissemination of data, and they penalize organizations that fail to adequately protect it. The GDPR came into force in 2016 after passing the European Parliament and became mandatory for all organizations on May 25, 2018 [34]. The UK enacted the Data Protection Act 2018 in the same year, which is the UK's implementation of the GDPR [38]. Unlike the EU's uniform legislative model, there is no uniform data protection law at the federal level in the US. Instead, the US has adopted a sectoral, decentralized legislative model with specific data protection legislation in areas such as telecommunications, finance, health, education, and children's online privacy [71]. Canada introduced the Privacy Act in 1983, which governs how the federal government handles personal information collected, used or disclosed from the public and its employees [82]. In 2000, the Personal Information Protection and Electronic Documents Act (PIPEDA) received royal assent; amended in 2015, it sets out the basic rules for how companies must handle personal information in the course of their business activities [81]. The Law of the People's Republic of China on Cybersecurity, in force since 2017, regulates the processing of personal information and online data security, but at that time China had no specific data protection law [22]. In contrast, some countries, including Brazil, South Korea and Japan, have enacted comprehensive data protection legislation [71]. As of January 2019, 132 countries worldwide had enacted data privacy laws covering both the private and public sectors, and at least 28 other countries had official bills for such laws at various stages of progress [40].
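To make the linkage risk noted above concrete, the following minimal Python sketch joins a toy "anonymized" health table with a toy public register on shared quasi-identifiers (ZIP code, birth year, sex). All names and records are fabricated for illustration; no real dataset or API is implied.

```python
# Illustrative linkage attack: an "anonymized" health dataset is joined with
# a public voter-roll-style register on quasi-identifiers, re-identifying
# individuals without any direct identifier. All records are fabricated.

anonymized_health = [
    {"zip": "02138", "birth_year": 1954, "sex": "F", "diagnosis": "COVID-19"},
    {"zip": "02139", "birth_year": 1990, "sex": "M", "diagnosis": "asthma"},
]

public_register = [
    {"name": "A. Example", "zip": "02138", "birth_year": 1954, "sex": "F"},
    {"name": "B. Sample", "zip": "02139", "birth_year": 1990, "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")

def link(records, register):
    """Match each 'anonymous' record to register entries that share the
    same quasi-identifier values."""
    index = {}
    for person in register:
        key = tuple(person[q] for q in QUASI_IDENTIFIERS)
        index.setdefault(key, []).append(person["name"])
    for rec in records:
        key = tuple(rec[q] for q in QUASI_IDENTIFIERS)
        matches = index.get(key, [])
        if len(matches) == 1:  # a unique match is a re-identification
            yield matches[0], rec["diagnosis"]

for name, diagnosis in link(anonymized_health, public_register):
    print(f"{name} -> {diagnosis}")
```

With only three quasi-identifiers, both toy records resolve to a unique named individual, which is precisely why combinations of individually innocuous attributes are treated as potentially identifying.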


3 Changes to Data Brought About by COVID-19

Rapid Growth in the Volume of Data

COVID-19 significantly impacted the amount of data that was and is created online. As Jovic et al. [49, p. 1] noted: "Coronavirus disease of 2019 and related containment measures have grossly affected the daily living and created a need for alternative ways of social communication and entertainment". Indeed, the COVID-19 pandemic and its many lock-downs meant that people used the Internet more than ever to stay connected with others, on both a personal and a work basis. Furthermore, Forbes [8] highlighted that millions of people went online for entertainment, with total internet hits surging by between 50% and 70% in 2020. In addition, the UN [105] estimated that an additional 782 million people came online during the pandemic (2020 and 2021), but with that also came unequal access, incendiary hate speech and cybercrime. They believe that this will see the cost of data breaches top $5 trillion by 2024 [105]. In their study, Masaeli et al. [64, p. 1] "found an increase in Internet-based addictive behaviours during the COVID-19 pandemic mostly due to financial hardships, isolation, problematic substance use, and mental health issues such as depression, anxiety, and stress". The analysis of Twitter data shows a significant increase in abusive content generated during the lock-downs [5]. Alongside this growth in internet use, a lot of work was also undertaken to make sense of what was happening to people as their lives changed and they used the internet more. For example, Daughton et al. [23] used social media data from Twitter to identify human behaviors relevant to COVID-19 transmission, as well as the perceived impacts of COVID-19 on individuals. With this come many challenges for the protection of people's data online.

Expansion in the Scope of Data

COVID-19 has not only upended data science practices but has also shown the world the power of data. In fact, "a staggering 19,389 articles about COVID-19 were shared in the first four months of the pandemic, a third of which were preprints, unvetted and unfiltered for all to see" [113, p. 1]. This rapid data sharing during COVID-19 has not only changed science forever [113] but also widened the scope of data (i.e. taken the meaning of data sharing to a new level). Indeed, data can now be seen to provide the means to support the response to new global threats such as COVID-19. "Electronic health record (EHR) data are a valuable resource to mitigate the COVID-19 pandemic" [88, p. 1]. As Mueller and Papenhausen [70, p. 1] highlight, the data helped "local governments to plan the allocation of testing kits, testing stations, and primary care units"; it also helped them "in setting guidelines for residents, such as the need for social distancing, the use of face masks, and when to open local businesses that enable human contact" [70, p. 1]. In addition, social media played a crucial role in disseminating health information and tackling infodemics and misinformation [104]. New technologies had a huge role to play here, with applications of AI and Big Data being central in the global effort to manage the pandemic. However, Hickok [43, p. 1] points out that "exigent circumstances must not lead us down the road of trading ethics for techniques and technologies that impact democracy under the guise of 'public health'". Data protection and ethics need to remain at the forefront of all these new changes.

Changes for Data Controllers and Processors

As mentioned, over a hundred countries have some form of privacy laws for data protection to ensure that citizens and their data are protected [102]. Many of these were inspired by the General Data Protection Regulation (GDPR), a regulation in EU law on data protection and privacy [27]. Essentially, GDPR is a set of rules (with clearly defined data controllers and processors) designed to give EU citizens more control over their personal data. However, despite these and other privacy rules, the coronavirus pandemic affected people and their data in ways that no one could have foreseen. In detail, governments had to make hard decisions around the tracking and containment of the spread of the coronavirus whilst also harnessing the power of data (and using people's data) to come up with solutions to control the virus. It was apparent that data, and people's data in particular, were at the heart of this. Moreover, Majeed et al. [59] highlighted that every "country has implemented digital solutions in the form of mobile applications, web-based frameworks, and/or integrated platforms in which huge amounts of personal data are collected for various purposes (e.g., contact tracing, suspect search, and quarantine monitoring)". But again, it was and is a fine balance, as many of these technologies can collect very sensitive data such as people's movements, spatio-temporal activities, travel history, visits to churches/clubs, purchases, and social interactions [59]. Data controllers, the entities (persons, organizations, etc.) that determine the why and the how of processing personal data, and data processors, who process personal data only on behalf of a controller [27], often needed to adapt or change their practices of data collection. As Giménez [36, p. 1] pointed out, "COVID-19 is allowing the adoption of all kinds of exceptional measures that would not be justified in a normal state situation". When the coronavirus pandemic was first declared, many data controllers and processors questioned whether it was acceptable to request information from their employees and customers in order to assess the risk factors of spreading the virus. However, as the virus progressed, many changes were made to ensure that a correct balance was maintained between the prevention of infections and the collection, treatment and transfer of personal data that can promote the identification of specific people [36].


Weakened Control of Data by Data Subjects

Every day, more data about each of us is being generated and, as Kerry [52] notes, "it's a losing game both for individuals and for our legal system." In the aftermath of COVID-19, there still needs to be a balance between the benefits of access to our personal data (i.e. location data etc.) and the risks to our privacy and human rights. However, as we saw many times during this period (2020–2021), there was a reduction in data protection and privacy standards to enable the data flows required by the COVID-19 solutions. As Zwitter et al. [121, p. 1] highlighted: "such large-scale incursion into privacy and data protection is unthinkable during times of normalcy". However, as they discuss, "in times of a pandemic the use of location data provided by telecom operators and/or technology companies becomes a viable option" [121, p. 1]. As a result of this government drive, and of increased technological complexity, it was hard for people to gain control over their own personal data. Furthermore, Nickel et al. [72, p. 1] show that "data subjects' trust in the institutions and organizations that control their data, and their ability to know their own moral obligations in relation to their data, are undermined by significant uncertainties regarding the what, how, and who of mass data collection and analysis". This is becoming ever more the case as we move into the collection of biometric data, which is especially sensitive as it can reveal intimate information about data subjects [54]. In fact, as Veliz et al. [109, p. 1] point out: "Patients should not be forced into giving up any more personal information than what is strictly necessary to receive an adequate treatment". Furthermore, at a systems level, Veale et al. [108, p. 1] show that "some confidentiality-focused Data protection by design (DPbD) strategies used by large data controllers leave data reidentifiable by capable adversaries while heavily limiting controllers' ability to provide data subject rights."

Increased Impact of Data on Everyday People's Lives

Big data has affected most people's daily experiences (i.e. education, health, travel, shopping etc.) in ways that they don't even realise. On the positive side, it has improved the quality of life, saving them time, money and energy; countering this, it has also made them lazier and has caused huge invasions of their privacy and security. In terms of COVID-19, as we have seen, "controlling such epidemic requires understanding its characteristics and behavior, which can be identified by collecting and analyzing the related big data" [3, p. 1]. Data was key here to understanding the impact of the virus and how people needed to adapt their behaviours. As Carroll et al. [16, p. 1] note, "global disease trackers quantifying the size, spread, and distribution of COVID-19 illustrate the power of data during the pandemic". However, as discussed, this requires a balance between controlling the spread of the virus and the risks to people's privacy. Interestingly, data was also collected and used to investigate the mental impact of COVID-19. As Zhang [119, p. 1] highlighted: "These exploratory analyses revealed the specific emotions that people experienced and the topics that people are concerned about during the pandemic". Essentially, data was being collected to understand the virus but also to understand how people were coping with it. As Carroll et al. [16, p. 1] show: "There are dual concerns about the availability and suppression of COVID-19 data; due to historic and ongoing racism and exclusion, publicly available data can be both beneficial and harmful". Moreover, post-pandemic and now working to save the planet, governments around the world propose to improve urban life by creating smart cities filled with sensors and other digital technologies that would collect large amounts of data about citizens and their activities [93]. Again, it is safe to say that this can have a big impact on people's lives; in particular, concerns about privacy, questions of ownership, and other issues have made smart city technology and the data it collects controversial [93].

New Data Technologies and Their Role

Without a doubt, technology innovation is critical to all kinds of businesses around the world, yet its ever-evolving nature constantly impacts (positively and negatively) on our society. As Srinavin et al. [96] highlight: "The world is currently driven by data". However, many companies currently need to deal with the profound changes this creates, particularly in the way they manage their business, their customers and their business models, since they are overrun by a data-driven revolution in management [84]. In fact, big data and its technologies have the potential to 'disrupt' the senior management of organisations, prompting directors to make decisions more rapidly and to shape their capabilities to address environmental changes [67]. As a result, developing big data applications that work for and align within companies has become increasingly important in the last few years [75]. As companies become more and more dependent on the insights extracted from these huge volumes of data, it is important that they are aligned to the business's and society's codes of ethics. In their research, Garcia et al. [33] explore Big Data-driven Artificial Intelligence (AI) applied to social systems; in detail, they explore the concept of artificial intelligence as an enabler of novel social solutions. Indeed, as we have seen through COVID-19, the impact of these new technologies is great; however, as we move into a post-COVID-19 world, we need to make sure that these technologies are aligned to humans in terms of our moral values and ethical principles [47]. Otherwise, the impact on society might not be a good one.

4 Challenges of Data Protection in Post-COVID Society

The urgency of the fight against COVID-19 led most governments to reprioritize during the pandemic, making data privacy a secondary concern [39]. Some governments have treated COVID-19 as a special case in which sensitive data can be collected to protect public health [74]. However, these policies and measures, introduced on an ad hoc basis in response to major emergencies, may pose long-term privacy risks [78]. In addition, the pandemic has brought about fundamental changes in the way people work and live [15], with telecommuting, tele-education, telemedicine and online shopping increasingly being chosen, which exposes more of people's private data to the Internet and thus creates new data protection challenges.

Handling of COVID-Related Data

In the face of public health emergencies, society has taken a more tolerant approach to the collection and use of data. Even the GDPR, which has extremely strict requirements for the protection of personal data, recognises in Articles 6 and 9 that data processing is justified without the consent of the person concerned when it is in the public interest [110]. In order to limit the spread of the virus and speed up recovery, countries around the world have expanded and broadened the scope of data collection, processing and sharing to include sensitive data involving personal privacy, such as geolocation data and biometrics [74]. For example, South Korean government agencies used surveillance camera footage, smartphone location data and credit card purchase records to help trace the recent movements of coronavirus patients and establish virus transmission chains [17]. In addition, Italy, Israel and Singapore have all been reported to trace people exposed to potential risks based on their mobile phone locations [94]. Governments have even monitored the spread of the virus with additional mobile data collected from email, banking services, social media and postal services. Some invasive means of data collection, such as drones, face recognition and thermal scans, were also used in many places [11, 42].

Cell phone location-based applications were developed during the pandemic to keep users informed about the epidemic in their area and to send alerts about the risk of infection; once permission was obtained from the user, these applications could legally access the user's geographic location [30, 89]. The vast majority of users do not actively remove these apps or turn off their location permissions even after the pandemic ends, so their location data will continue to be recorded for a long time [77]. Where users visit, their daily routines, who they interact with, and so on can all be easily obtained from location data, even without sophisticated technical analysis [107]; the sketch below illustrates how little analysis is needed. This location data could leak users' privacy as if they were under surveillance. For example, the New York Times' Privacy Project was able to de-anonymize location data in just minutes with publicly available information, based on anonymous cell phone location datasets from 2016 to 2017, tracking the whereabouts of President Trump; even the travel and personal information of Secret Service agents, Defense Department officials and Supreme Court technicians could be easily obtained [103]. Prior to the outbreak, people could choose whether and how to share their data for privacy purposes, keeping the dissemination of personal data to a necessary minimum [78]. However, the large amount of data collected during the epidemic greatly reduced the control people had over their personal data. Even after the epidemic has ended, some data may be retained after desensitization for scientific research, in response to future needs in a similar crisis. Thanks to big data and artificial intelligence technologies, this "de-identified" data can easily be turned back into identifiable data [66]. Recent research has shown that there exists a plethora of techniques for re-identifying individuals using seemingly anonymous information [35]. With more such technologies being developed every day, this data is like a Pandora's box that can be opened at any time, leading to a privacy disaster.

Governments often partner with communications or IT companies to obtain data, or leave it to private companies to develop applications related to epidemics [37, 44]. As a result, private companies have access to vast amounts of personal data, but their profit-seeking nature raises concerns about whether they will properly handle this collected data. Because much of the data associated with COVID-19 is obtained without the consent of the individuals involved, the public does not even know which companies are collecting the information or what they are doing with it, let alone being able to assert individual rights over the data. The vast amount of healthcare data collected during the pandemic is tied to people's health, safety, and privacy, and a breach could have serious consequences. A report from IBM shows that healthcare has had the highest average data breach costs of any industry for 12 consecutive years, with costs rising 42% since 2020 [46]. People cede their privacy to data controllers at special times in the public interest, and data controllers are expected to process personal information collected during those times in a timely manner once the special circumstances have passed, in order to protect the individual's "right to be forgotten". However, there are no industry norms or best practices on how to handle such data, and some companies may treat it as a data asset for long-term storage, or even trade it for profit. For example, Google's SensorVault location logging feature, which has been available since 2009, regularly collects data from GPS, cell towers, nearby Wi-Fi devices and Bluetooth beacons and keeps it indefinitely unless the user deletes it [106]. The British consulting firm Cambridge Analytica was also exposed as having collected personal data from millions of Facebook users without their consent, mainly for political advertising [20]. Facial recognition technology is also used for contact tracing in China, but little is known about how these facial images are stored and used, who may use or misuse them, and to what extent [100]. Uncontrolled use of data can even have long-term effects on patients. For example, some recovered patients have been treated in a discriminatory fashion in the workplace and have been denied employment because of their infection records [92].
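As a concrete illustration of how little analysis such location data requires, the short Python sketch below infers a plausible "home" and "work" area from fabricated hourly pings simply by counting which area dominates night-time versus daytime hours. The pings, area identifiers and hour windows are illustrative assumptions, not drawn from any real application.

```python
# Minimal sketch: the modal night-time cell identifies a likely home, the
# modal daytime cell a likely workplace. All ping data is fabricated.
from collections import Counter

pings = [  # (hour of day, cell/area id)
    (1, "cell-A"), (2, "cell-A"), (3, "cell-A"), (23, "cell-A"),
    (10, "cell-B"), (11, "cell-B"), (14, "cell-B"), (16, "cell-B"),
    (19, "cell-C"),
]

def modal_cell(hours):
    """Return the most frequent cell among pings falling in the given hours
    (reads the module-level `pings` list)."""
    counts = Counter(cell for hour, cell in pings if hour in hours)
    return counts.most_common(1)[0][0] if counts else None

likely_home = modal_cell(set(range(0, 6)) | {22, 23})  # night hours
likely_work = modal_cell(set(range(9, 18)))            # working hours

print("likely home area:", likely_home)  # cell-A
print("likely work area:", likely_work)  # cell-B
```

A handful of counting operations is enough to recover a daily routine, which is why raw location logs are treated as highly sensitive even when they carry no name.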


Technical Challenges to Protect Privacy

Risks from Remote Work

As data collection explodes in the era of the epidemic, the technology to protect privacy is under pressure. Because of the need for social isolation in an epidemic, more and more companies are conducting business through remote work, and some have even shifted their operations from offline to online on a permanent basis [10]. However, the telecommuting model of working faces significant risks of data leakage. In the offline office mode, workplaces usually have professional data protection solutions, with specialized technicians, professional firewalls and other equipment. In remote work mode, however, employees tend to abandon regular office security practices, and most companies do not have specific security training for remote workers [60]. Human error, security breaches, hacking and the like in remote work can cause data breaches [61]. For example, Dedalus Biologie, a French company, had the health data of nearly 500,000 people stolen online from approximately 30 medical laboratories as a result of multiple breaches [26]. Employees working remotely and using their own or company-equipped computers to connect to the company network are a weak link that can be exploited or attacked by hackers to steal data [31]. According to Bloomberg, companies with more than 60% of their employees working remotely have the highest average cost of data breaches [95]. Work-from-home was widely used as a response to lockdown policies during the pandemic and has become an option for many in its aftermath. This blurs the boundary between public and private and makes it easier to expose personal privacy. Vulnerabilities in video conferencing software such as Zoom, Teams, etc. also put user privacy at risk [12]. The impact of social isolation and lockdown has also forced some businesses involving a high degree of personal privacy to move from offline to online, such as telemedicine and tele-psychological counseling [63]. Social isolation measures have had a widespread psychological impact on people, and the narrowed access to services available to individuals (e.g., closure or restriction of services in hospitals, clinics, and counselors' studios) has forced them to turn to other channels [91]. To cope with the huge demand for psychological counseling, some underqualified online platforms and apps have been developed [6]. These platforms may have loopholes in technology and management and cannot guarantee data security, creating hidden dangers for user data [63].

Risks from Big Data and Artificial Intelligence Inference Analysis

In the era of Big Data, sophisticated electronic devices are constantly recording personal data for analysis, and the power of technology can increase the likelihood of personal privacy violations. A growing body of research has shown that inferential analysis based on data aggregation is rendering many of the most advanced pseudonymous/anonymous data set practices meaningless [86]. The UN Special Rapporteur on the Right to Privacy has also highlighted the risks associated with the combination of "closed" and "open" datasets [13]. Since new information is created when data is aggregated, the more raw data there is, the more new information is created as a result of the aggregation [41]. Even if the original data is anonymized and non-sensitive, sensitive personal information may be obtained after aggregation [66]. Data re-identification studies have shown that a variety of publicly available digital footprints, including text instances, browsing history or Facebook likes, alone are sufficient to infer personal characteristics such as sexual orientation, race, religious and political views, personality and intelligence [19]. In the context of a pandemic, large amounts of personal data are collected from various sources and exchanged and shared among different entities to enable collaborative epidemic control. Few people know what purpose this data will serve [56]. Therefore, one of the challenges to be faced in the post-COVID-19 era is how to enhance the protection of data while achieving its normal circulation and use.

Risks from Cloud Services

With the increasing amount of remote work, more and more data needs to be stored or processed in the cloud; in particular, the large amount of data related to COVID-19 is usually stored in cloud space, which also faces the risk of data leakage [61]. Cloud computing provides many services to its customers, such as computing resources, data storage, networking technologies, and software applications [69]. Some attributes of cloud services, such as on-demand self-service, pay-per-use resources, mobile access to resources, resource pooling, virtualization, and rapid elasticity [76], make it an ideal platform for storing and processing data during a pandemic. The advantages of cloud computing over on-premise technologies are high scalability, low cost, low maintenance, unlimited storage capacity, access from anywhere via the Internet, and more [69]. However, this also makes the cloud vulnerable to risks and threats from attackers who try to exploit any vulnerability in the system to compromise the security objectives, i.e., confidentiality, integrity and availability [45]. Many organizations are also facing security challenges, with risks of data leakage due to the sudden use of cloud platforms without adequate precautions [61]. Some system administrators do not have sufficient cloud management skills to implement effective protection of the data they manage, which can also easily lead to data leakage [4]. In June 2022, a hacker sold the personal data of nearly 1 billion people in China, taken from Alibaba Cloud servers, for 20 bitcoins on a dark web platform, and the media verified that the data sold was real [120]. This is just one example of the many data breaches in the cloud, where 45% of data breaches occur, according to IBM's report [46].

Ethical Challenges of a Rapidly Evolving Digital Society

To control the spread of the epidemic, some countries have adopted contact-tracing programs, health codes, and similar tools. This has advanced the digital society, but it also poses ethical challenges [78].

Discrimination and Inequality Caused by Labeling

According to labeling theory, people attach labels to others to facilitate their understanding of the social world around them [7]. In pandemics, government departments divide people into uninfected, suspected, and infected for pandemic preparedness, which leads to stigma and discrimination. In China, for example, the government used health codes to mark the health status of residents and whether they had visited high-risk areas [57, 79]. A green code indicates no risk of infection, a yellow code indicates risk of exposure to infection, and a red code indicates an infected person [21]. Residents need to display a green code on their cell phones in order to travel, and must be quarantined immediately once the health code turns red [79]. This measure amounts to a digital tool that labels people in a variety of ways, in effect subjecting many to discriminatory treatment; some may be denied services or access to public places simply because the trip code indicates they are from a high-risk area. Some people who do not know how to use a smartphone, or do not have one, are de facto denied access to public places and public transportation because they cannot show a green health code [56]. Worse, people who pass through risky places even for a short time are segregated by the code, due to the opaque mechanism of code assignment [21]. This function may be improperly used by companies or individuals to assign red codes for purposes not directly or specifically related to the COVID response, violating basic human rights and freedoms [117]. In India, people are stigmatized and treated in a discriminatory manner simply because their infection status, race, occupation, or religious identity is somehow related to COVID. Suspected and actual infected people are negatively labeled as "new coronavirus carriers" and are stigmatized and abandoned, while health care workers, police and municipal workers at the forefront of the fight face social ostracism and abuse, and even hostility, segregation and violence based on geographic or religious identity as a result of COVID [9]. In fact, multiple studies have shown that discrimination and stigma associated with COVID are prevalent in Asia, Europe, Africa and North America [28, 48, 50, 58, 62, 68]. For example, a study from the U.S. National Institute on Minority Health and Health Disparities (NIMHD) showed that people from all racial/ethnic minority groups in the United States reported experiencing more COVID-19-related discrimination than white adults, with Asian and American Indian/Alaska Native adults being the most likely to experience such discrimination [99]. This nationally representative online survey shows that limited English proficiency, lower education, lower income, and living in a big city or in the East South Central census division also increase the prevalence of discrimination [99]. The research suggests that the COVID-19 pandemic has exacerbated existing resentment toward racial and ethnic minorities and other minority populations [2, 73].

Conflict Between the Right to Group Safety and Individual Privacy

In the context of an epidemic, when group safety and individual privacy cannot be reconciled, controlling the spread of disease and safeguarding public health becomes the higher priority. The European Data Protection Board's official statement on the processing of personal data in the context of the COVID-19 outbreak begins by stating that "Data protection rules (such as the GDPR) do not hinder measures taken in the fight against the coronavirus pandemic. The fight against communicable diseases is a valuable goal shared by all nations and therefore, should be supported in the best possible way [25]." Since the beginning of the epidemic, its high contagiousness has caused panic, and as cases spread rapidly, personal information, family backgrounds, travel records, and other personal details of infected people were disseminated by the news media and social media [85]. With the continuous development of electronic tracking, people's information is recorded and exposed more often. For example, Hong Kong uses electronic bracelets to track the movements of people isolated at home [55], and more countries have developed contact tracing applications that reveal the location of individuals and groups through data generated by telephone networks, WiFi connections, satellite-based radio navigation systems and other surveillance combinations. The European Commission also shares cell phone location data from multiple telecom providers to track the spread of the coronavirus [18]. In many cases, once people are diagnosed with the infection, information such as their age, gender, address, and the specific locations they recently visited, together with the times of their visits, is made public to alert potential close contacts for early screening [53]. This information can easily be traced to specific individuals, and even more personal information is dug up by internet users, who subject the individuals concerned to cyber violence [98]. As the pandemic continues to spread, more personal privacy is sacrificed in exchange for group safety.

Legal Challenges of Data Protection

Although many national data protection laws provide for privacy protection, there are still many legal gaps in privacy protection in the era of big data. First, many current laws focus on the illegal collection or disclosure of data, but barely address the misuse of data after it has been collected [111]. There is no solution to the typical big-data problem of "lawful acquisition and improper abuse". Take the Personal Information Protection Law of China as an example. While the law emphasizes the importance of protecting personal data, it does not provide specific guidance on data protection practices, such as how and where to store sensitive facial data, whether and what kind of encryption should be used, and who should be the data auditor, rulemaker, or enforcer [100].

Second, most data protection laws regulate the process of data collection, transmission, processing, and preservation, but this applies to data obtained from data subjects; data controllers can in fact generate new information by inference, and this is not fully protected by data protection laws [112]. Data generated by Big Data inference in fact plays the same role as the original data from the data subject, and may even have higher application value because it is often the result of some kind of analysis. For example, in countries where the risk of infection is determined by trip codes, some people are given a "yellow code" for medium risk or a "red code" for high risk because they receive signals from the same mobile base station as an infected person. These health codes are not collected from the data subject, but are automatically generated by the system according to some algorithm [117]. This inference represents a possibility rather than a certainty, and is not verifiable, making it impossible for the data subject to exercise the right of correction [112]. That is, when a person is marked by the algorithm as having a yellow code with a moderate risk of infection, it is difficult for that person to prove that they do not have that risk and request a change to a green code. This vulnerability can be abused by data controllers, thereby infringing on the rights of data subjects. For example, some bank depositors in Henan, China protested against not being able to withdraw their deposits from the bank, only to find that their trip codes had turned red, forcing them to quarantine in their homes; this happened even to depositors living thousands of miles away in another city, Guangzhou, whose codes were also set to red [117].

Third, the standard for data desensitization rules needs to be clarified. The de-identification of sensitive data is conducive to the protection of personal privacy and to remedying infringements of personal privacy, but legislation in various countries has not yet set specific standards for the degree of anonymization of data; indeed, there is still controversy over how sensitive data should be defined [83]. The U.S. Health Insurance Portability and Accountability Act (HIPAA) only mentions the concept of "de-identification" and requires that anonymous data cannot be combined with other data to identify the subject [1]. However, this requirement is becoming more and more fragile in the current technological environment. As computing power increases further, so does the likelihood that data controllers will combine various innocuous data sets to form inferences equivalent to sensitive data [83], which can in fact bypass legal regulation. One common way of making this residual risk measurable is sketched below.
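One widely used way of quantifying re-identification risk, though not one mandated by HIPAA or any of the laws discussed above, is k-anonymity: the size of the smallest group of records sharing the same combination of quasi-identifiers. The minimal Python sketch below, using fabricated records and an assumed choice of quasi-identifiers, shows how a data controller could compute this worst-case group size before releasing a "desensitized" dataset.

```python
# Sketch of measuring k-anonymity: the smallest equivalence class over the
# chosen quasi-identifiers bounds the worst-case re-identification risk.
# Dataset and quasi-identifier choice are illustrative assumptions.
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records sharing the same
    quasi-identifier values; the dataset is k-anonymous for this k."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

dataset = [
    {"zip": "021**", "age_band": "50-59", "diagnosis": "COVID-19"},
    {"zip": "021**", "age_band": "50-59", "diagnosis": "flu"},
    {"zip": "021**", "age_band": "20-29", "diagnosis": "asthma"},
]

print(k_anonymity(dataset, ("zip", "age_band")))  # 1 -> one record is unique
```

A result of k = 1 means at least one record is unique on the chosen attributes and could be singled out by linkage, which is precisely the scenario that vague desensitization standards fail to rule out.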

5 Solutions

The COVID-19 outbreak will eventually end, but the pandemic is a wake-up call to human society as to whether we are prepared to respond well enough when the next global public emergency arrives.

Establish Mechanisms to Respond to Similar Crises

The pandemic has exposed the fact that there is still no adequate mechanism for strengthening data protection globally in the face of sudden emergencies. Such a mechanism ultimately involves the participation of data subjects, data collectors, data transmitters, data processors, policy makers, and industry regulators, and is based on the long-term goal of data protection; it should take the form of a series of effective measures and systems constituting a proactive rather than a reactive data protection program.

Past human history shows that when governments take extraordinary measures to resolve unexpected crises, these measures may remain in place on a regular basis even when the crisis has lifted. These measures often tend to expand government power and are invasive of citizens' privacy [51]. For example, after the 9/11 attacks, the United States rushed the Patriot Act through Congress. The Act provides that the U.S. government may access any information stored in U.S. data centers or stored by U.S. companies without the prior consent of the data subject [14]. Under the Act, data subjects are highly unlikely to be aware of the government's access to their information, and the Act has been accused of abusing government power in the "war on terror" [116]. In South Korea, following the outbreak of Middle East Respiratory Syndrome in 2015, data protection laws were amended to allow authorities to collect large amounts of personal data for contact tracing under the Communicable Disease Prevention and Control Act. This includes credit card transactions, public transportation data, and medical and prescription records [32]. In Poland, cell phone location data collected during the outbreak may be retained for up to six years after the outbreak [90]. In China, some local governments have moved to use the health codes developed during the epidemic as a long-term digital management tool tied to other public services [21].

In the midst of this pandemic, we have already seen some countries and governments increasingly collecting personal and private data on their citizens. The European Commission has issued an order requiring communications companies to turn over aggregated data on movement trajectories in order to track the trajectory of the pandemic and to monitor whether people are complying with stay-at-home orders. Some European countries have also adopted independent information controls in response to the crisis: Germany has included provisions in the national legislation implementing the European General Data Protection Regulation that explicitly allow for the use of personal data in the event of a pandemic; Italy has passed emergency legislation requiring all people who have recently visited an area at risk of an epidemic to inform the health authorities, either voluntarily or through a doctor. China's "health codes" and "trip codes" are widely used to assess whether people are healthy and immune and to determine who can return to work [21]. While these measures have certainly played a role in the response to the pandemic, their violation of privacy is also evident. It is therefore predictable that when the next similar crisis comes, these "successful" measures will most likely be replicated, and people will again have to surrender their privacy rights in exchange for the public good.

These temporary measures are not conducive to the long-term health of social institutions; rather, only a proactive approach and the establishment of a standing crisis response mechanism will allow us to be truly prepared for the next crisis. Such a mechanism should have detailed regulations corresponding to the level and scope of the crisis, the body responsible for implementation, the way information is used, the duration of use, and the termination criteria. Likewise, the circumstances under which collected information can be integrated with other information, as well as the way, subject, and period of its use, should be clearly defined. For example, it should be made clear that unconventional restrictions on personal privacy will only be used in the current epidemic, and only in cases where alternatives serving the same purpose are difficult to come by. Individuals' consent and authorization cannot be relied upon alone to protect personal privacy. Nor should the right to restrict the use of personally authorized data be "waivable".


Promote Technological Innovation

In the future, the explosive growth of data volume will remain an unchangeable trend, and the ensuing privacy concerns will become more and more prominent. Thanks to technological advances, we can in fact choose more privacy-friendly technologies to achieve the same or similar purposes. Digital Proximity Tracing (DPT) is considered a better solution than GPS-based approaches, protecting people's privacy while effectively controlling the spread of an outbreak. Digital Proximity Tracing is a method of capturing anonymized interactions between individuals and subsequently sending alerts, usually using a smartphone or purpose-built device. When the smartphones of two users, Person A and Person B, come into close proximity, they exchange their respective anonymous keys. If Person A tests positive for COVID-19, his or her status is updated in the app. Person A's status can be notified to Person B via a decentralised or a centralised method. Using the decentralised method, the smartphone uploads only Person A's keys to the back-end server; all keys from infected users are downloaded by the app on Person B's phone, key matching is performed locally, and alerts are sent accordingly to notify contacts that they have been exposed. Using the centralised approach, the smartphone uploads Person A's keys together with the other keys collected from their previous contacts; key matching is performed on a centralised server, and contacts who may be at risk are subsequently notified. Any data captured using a DPT solution should not include the identity or geographical coordinates of an individual, and it therefore effectively protects people's personal privacy [115]. The decentralised variant is sketched in code below.
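The decentralised matching flow described above can be modelled in a few lines of Python. This is a deliberately simplified sketch: real protocols such as DP-3T or the Google/Apple Exposure Notifications system derive rotating keys cryptographically and broadcast them over Bluetooth Low Energy, whereas the random tokens and the in-memory list standing in for the back-end server here are illustrative assumptions.

```python
# Simplified model of decentralised Digital Proximity Tracing: phones swap
# anonymous daily keys on encounter; a diagnosed user uploads only their own
# keys; exposure matching happens locally on each handset.
import secrets

class Phone:
    def __init__(self):
        self.own_keys = [secrets.token_hex(16) for _ in range(14)]  # one per day
        self.observed = set()  # keys heard from nearby phones

    def encounter(self, other, day):
        # Phones in close proximity exchange the day's anonymous keys.
        self.observed.add(other.own_keys[day])
        other.observed.add(self.own_keys[day])

    def check_exposure(self, published_keys):
        # Matching is local; the contact graph never leaves the device.
        return bool(self.observed & set(published_keys))

server_published = []  # keys voluntarily uploaded by diagnosed users

alice, bob, carol = Phone(), Phone(), Phone()
alice.encounter(bob, day=3)              # Alice and Bob met; Carol did not

server_published.extend(alice.own_keys)  # Alice tests positive and uploads

print(bob.check_exposure(server_published))    # True  -> notify Bob
print(carol.check_exposure(server_published))  # False -> no alert
```

The design point the sketch preserves is that the server only ever sees the keys of users who test positive and volunteer them; because who-met-whom is resolved on the handsets, the system can avoid collecting identities or geographical coordinates.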

Strengthen Industry Regulation

The World Health Organisation has issued a document on ethical considerations for digital proximity tracking technologies for COVID-19 contact tracing, providing guidance to governments, public health agencies, non-state actors (NGOs, charities, foundations) and companies on the ethical and appropriate use of digital tracing technologies in response to COVID-19. These suggestions comprise seventeen principles: time limitation, testing and evaluation, proportionality, data minimization, use restriction, voluntariness, transparency and explainability, privacy-preserving data storage, security, limited retention, infection reporting, notification, tracking of COVID-19 positive cases, accuracy, accountability, independent oversight, and civil society and public engagement [114]. These principles provide good guidance for strengthening industry regulation. Indeed, industry regulators should strengthen regulation not only of the use of particular technology applications, but also of the use of data, particularly the use of intrusive data collection tools such as facial recognition and mobile phone location tracing programs. Second, they should also regulate whether the data collected is used for its original purpose. Data is often contextually relevant: an insensitive dataset used in a different context may be sensitive [83], and we need appropriate governance frameworks to ensure that this data is generated, analyzed, stored, and shared in a legal and responsible manner. In light of the COVID-19 pandemic, location data may be very useful for epidemiological analysis; but in the context of a political crisis, the same location data may threaten the rule of law, democracy, and the enjoyment of human rights. For example, German police were reported to have misused the COVID contact tracing app to track down witnesses to a local crime, when the data should have been accessible only to local health authorities [24]. When both industry self-regulation and external regulation fail, public trust is effectively destroyed. In contrast, in China, some officials have been punished for abusing the health code system by changing the health codes of bank depositors, which in a way serves as a warning to those who would abuse such data [97].
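Several of the WHO principles listed above, notably time limitation and limited retention, translate directly into code that a regulator can audit. The minimal Python sketch below purges contact-tracing records older than a retention window; the 14-day figure and the field names are illustrative assumptions, not a standard set by the WHO document.

```python
# Minimal sketch of the "limited retention" principle: records carry a
# collection timestamp and anything past the retention window is purged.
from datetime import datetime, timedelta

RETENTION = timedelta(days=14)  # illustrative window, not a mandated standard

def purge_expired(records, now=None):
    """Drop every record older than the retention window."""
    now = now or datetime.now()
    return [r for r in records if now - r["collected_at"] <= RETENTION]

records = [
    {"id": 1, "collected_at": datetime.now() - timedelta(days=20)},  # expired
    {"id": 2, "collected_at": datetime.now() - timedelta(days=2)},   # retained
]

print([r["id"] for r in purge_expired(records)])  # [2]
```

Running such a purge on every access or on a schedule makes the retention promise verifiable, rather than leaving it as a policy statement with no technical backing.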

Improve Laws and Regulations

Although more than 130 countries around the world had enacted privacy laws and regulations by the time of the pandemic [40], we have seen in the past two years or so that people's privacy has not been adequately protected. This means that our legal systems still need to be improved in detail to make them more workable. It is often said that data is the new oil. In an era of rapid development in artificial intelligence and big data technology, the mining of data will generate new value, but at the cost of risks to people's privacy. Traditional data protection laws or privacy protection laws still focus on the protection of existing data, while failing to provide protection for new information generated in the process of aggregation, analysis, and inference [112]. The law should provide more detail on the processes of aggregation and inference and on how their results may be applied. In addition, while data desensitization is provided for in many laws, there is no clear definition of the standards that need to be met, which effectively relieves data controllers of their data confidentiality obligations [83]. The authors of this paper argue that when data controllers provide desensitized data to outside parties, they should have a reasonable expectation of the possibility of data re-identification and take proactive measures to prevent it. Legislation on data protection involves not only legal issues but also technical ones, and legislatures should listen to the opinions of scholars, especially privacy protection experts, during the legislative process in order to develop better privacy protection laws.

6 Conclusion

The COVID-19 pandemic has created an unprecedented global crisis. The epidemic and the accompanying prevention and control policies, such as embargoes and movement restrictions, have affected many areas of society and every aspect of everyone's life and work, and will have a long-term impact for many years to come. In the digital age, data exists in every aspect of people's lives. The sudden outbreak of a pandemic that forced people to disconnect physically and turn to digital connections has raised concerns about data security like never before. As we have discussed, the pandemic has changed people's lives dramatically, including the rapid growth of data volumes, the expansion of the scope of data, changes for data controllers and processors, and the weakening of data subjects' control over data, all of which have a growing impact on the privacy of people's daily lives. In the post-COVID-19 era, we are faced with a new set of challenges, including how to learn from our mistakes: how to handle COVID-related data and how to address the technical, ethical and legal challenges of protecting privacy. The authors of this paper call on all parties concerned to work together to establish a sound and standing mechanism to deal with the challenges of a possible similar crisis (pre- and post-pandemic). We need to advance technological innovation in privacy protection, strengthen industry regulation of data controllers, and promote legislation and enforcement of privacy protection to protect people and ensure a safer and more private society.

References

1. Health Insurance Portability and Accountability Act of 1996. Public Law 104-191 (1996)
2. I.Y. Addo, Double pandemic: racial discrimination amid coronavirus disease 2019. Soc. Sci. Humanit. Open 2(1), 100074 (2020)
3. S.J. Alsunaidi, A.M. Almuhaideb, N.M. Ibrahim, F.S. Shaikh, K.S. Alqudaihi, F.A. Alhaidari, I.U. Khan, N. Aslam, M.S. Alshahrani, Applications of big data analytics to control covid-19 pandemic (2021). https://doi.org/10.3390/s21072282
4. M.A.M. Ariffin, M.F. Ibrahim, Z. Kasiran, API vulnerabilities in cloud computing platform: attack and detection. Int. J. Eng. Trends Technol. 1, 8–14 (2020)
5. P. Babvey, F. Capela, C. Cappa, C. Lipizzi, N. Petrowski, J. Ramirez-Marquez, Using social media data for assessing children's exposure to violence during the covid-19 pandemic. Child Abuse Negl. 116 (2021). https://doi.org/10.1016/j.chiabu.2020.104747
6. S. Bassan, Data Privacy Considerations for Telehealth Consumers Amid COVID-19, vol. 7 (Oxford University Press, Oxford, 2020)
7. H.S. Becker, Outsiders: Studies in Sociology of Deviance (New edition) (Free Press, New York, 1997)
8. M. Beech, Covid-19 pushes up internet use 70% and streaming more than 12%, first figures reveal (2020). https://www.forbes.com/sites/markbeech/2020/03/25/covid-19-pushes-upinternet-use-70-streaming-more-than-12-first-figures-reveal/?sh=381d7d4f3104. Last Accessed 15 Dec 2022
9. D. Bhanot, T. Singh, S.K. Verma, S. Sharad, Stigma and discrimination during covid-19 pandemic. Front. Public Health 8, 577018 (2021)
10. A. Bick, A. Blandin, K. Mertens, et al., Work from home after the COVID-19 outbreak. Federal Reserve Bank of Dallas, Research Department (2020)
11. S. Braithwaite, Italian police can now use drones to monitor people's movements, aviation authority says (2020). https://www.cnn.com/world/live-news/coronavirus-outbreak-03-2420-intl-hnk/h_b5c13ce244635a6e5b945f6462b4a374. Last Accessed 15 Dec 2022


12. T. Brewster, Microsoft teams and zoom hacked in $1 million competition (2021). https://www.forbes.com/sites/thomasbrewster/2021/04/08/microsoft-teams-and-zoom-hacked-in1-million-competition/?sh=61c60f6b68f6. Last Accessed 15 Dec 2022
13. J.A. Cannataci, Report of the special rapporteur on the right to privacy. Human Rights Council (2016)
14. M. Carlisle, How 9/11 radically expanded the power of the U.S. Government (2021). https://time.com/6096903/september-11-legal-history/. Last Accessed 15 Dec 2022
15. E. Caroppo, M. Mazza, A. Sannella, G. Marano, C. Avallone, A.E. Claro, D. Janiri, L. Moccia, L. Janiri, G. Sani, Will nothing be the same again?: changes in lifestyle during covid-19 pandemic and consequences on mental health. Int. J. Environ. Res. Public Health 18(16), 8433 (2021)
16. S.R. Carroll, R. Akee, P. Chung, D. Cormack, T. Kukutai, R. Lovett, M. Suina, R.K. Rowe, Indigenous peoples' data during covid-19: from external to internal. Front. Sociol. 6 (2021). https://doi.org/10.3389/fsoc.2021.617895
17. S. Cha, S. Korea to test AI-powered facial recognition to track covid-19 cases (2021). https://www.reuters.com/world/asia-pacific/skorea-test-ai-powered-facial-recognition-track-covid19-cases-2021-12-13/. Last Accessed 15 Dec 2022
18. F.Y. Chee, Vodafone, Deutsche Telekom, 6 other telcos to help EU track virus (2020). https://www.reuters.com/article/us-health-coronavirus-telecoms-eu-idUSKBN21C36G. Last Accessed 15 Dec 2022
19. F. Chen, N. Wang, J. Tang, D. Liang, H. Feng, Self-supervised data augmentation for person re-identification. Neurocomputing 415, 48–59 (2020)
20. N. Confessore, Cambridge analytica and facebook: the scandal and the fallout so far (2018). https://www.nytimes.com/2018/04/04/us/politics/cambridge-analytica-scandal-fallout.html. Last Accessed 15 Dec 2022
21. W. Cong, From pandemic control to data-driven governance: the case of China's health code. Front. Polit. Sci. 3, 627959 (2021)
22. R. Creemers, P. Triolo, G. Webster, Translation: cybersecurity law of the People's Republic of China (effective June 1, 2017) (2018). https://digichina.stanford.edu/work/translation-cybersecurity-law-of-the-peoples-republic-of-china-effective-june-1-2017/. Last Accessed 15 Dec 2022
23. A.R. Daughton, C.D. Shelley, M. Barnard, D. Gerts, C.W. Ross, I. Crooker, G. Nadiga, N. Mukundan, N.Y.V. Chavez, N. Parikh, T. Pitts, G. Fairchild, Mining and validating social media data for covid-19-related human behaviors between January and July 2020: infodemiology study. J. Med. Internet Res. 23 (2021). https://doi.org/10.2196/27059
24. DW, German police under fire for misuse of covid contact tracing app (2022). https://www.dw.com/en/german-police-under-fire-for-misuse-of-covid-contact-tracing-app/a-60393597. Last Accessed 15 Dec 2022
25. EDPB, Statement on the processing of personal data in the context of the covid-19 outbreak (2020). https://edpb.europa.eu/news/news/2020/statement-processing-personal-data-context-covid-19-outbreak. Last Accessed 15 Dec 2022
26. EDPB, Health data breach: Dedalus Biologie fined 1.5 million euros (2022). https://edpb.europa.eu/news/national-news/2022/health-data-breach-dedalus-biologie-fined-15-million-euros_en. Last Accessed 15 Dec 2022
27. EU, What is a data controller or a data processor? (2022). https://ec.europa.eu/info/law/law-topic/data-protection/reform/rules-business-and-organisations/obligations/
28. X. Fernández-i-Marín, C.H. Rapp, C. Adam, O. James, A. Manatschal, Discrimination against mobile European Union citizens before and during the first covid-19 lockdown: evidence from a conjoint experiment in Germany. Eur. Union Polit. 22(4), 741–761 (2021)
29. W. Feuer, WHO says pandemic has caused more 'mass trauma' than WWII (2021). https://www.cnbc.com/2021/03/05/who-says-pandemic-has-caused-more-mass-trauma-than-wwii-and-will-last-for-years.html. Last Accessed 5 March 2021

30. Forbes, Covid-19 phone location tracking: yes, it's happening now—here's what you should know (2020). https://www.forbes.com/sites/zakdoffman/2020/03/27/covid-19-phone-location-tracking-its-moving-fast-this-is-whats-happening-now/?sh=f594a2711d31. Last Accessed 15 Dec 2022
31. S. Furnell, J.N. Shah, Home working and cyber security–an outbreak of unpreparedness? Comput. Fraud Secur. 2020(8), 6–12 (2020)
32. W. Gallo, South Korea balances privacy, public health in virus fight (2020). https://www.voanews.com/a/east-asia-pacific_south-korea-balances-privacy-public-health-virus-fight/6188556.html. Last Accessed 15 Dec 2022
33. P. Garcia, F. Darroch, L. West, L. Brookscleator, Ethical applications of big data-driven AI on social systems: literature analysis and example deployment use case. Information (Switzerland) 11 (2020). https://doi.org/10.3390/INFO11050235
34. GDPR.EU, What is GDPR, the EU's new data protection law? (2022). https://gdpr.eu/what-is-gdpr/. Last Accessed 15 Dec 2022
35. G. Georgiadis, G. Poels, Towards a privacy impact assessment methodology to support the requirements of the general data protection regulation in a big data analytics context: a systematic literature review. Comput. Law Secur. Rev. 44, 105640 (2022)
36. A.O. Giménez, Covid-19: a challenge for the personal data protection. Actualidad Juridica Iberoamericana (2020)
37. Google, Exposure notifications: help slow the spread of covid-19, with one step on your phone (2022). https://www.google.com/covid19/exposurenotifications/. Last Accessed 15 Dec 2022
38. GOV.UK, Data protection (September 2018). https://www.gov.uk/data-protection. Last Accessed 15 Dec 2022
39. GPA, GPA covid-19 taskforce: compendium of best practices in response to covid-19 (2020). https://www.pcpd.org.hk/english/news_events/media_statements/files/compendium.pdf. Last Accessed 15 Dec 2022
40. G. Greenleaf, Global data privacy laws 2019: 132 national laws & many bills. SSRN (May 2019). https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3381593
41. N. Gruschka, V. Mavroeidis, K. Vishi, M. Jensen, Privacy issues and data protection in big data: a case study analysis under GDPR, in 2018 IEEE International Conference on Big Data (Big Data). IEEE (2018), pp. 5027–5033
42. D. Harwell, Thermal scanners are the latest technology being deployed to detect the coronavirus. But they don't really work (2020). https://www.washingtonpost.com/technology/2020/05/11/thermal-scanners-are-latest-technology-being-deployed-detect-coronavirus-they-dont-really-work/. Last Accessed 15 Dec 2022
43. M. Hickok, Ethical AI and big data in times of pandemic. J. Leadersh. Account. Ethics 17 (2020). https://doi.org/10.33423/jlae.v17i4.3100
44. HRW.ORG, Mobile location data and covid-19: Q&A (2020). https://www.hrw.org/news/2020/05/13/mobile-location-data-and-covid-19-qa. Last Accessed 15 Dec 2022
45. S.A. Hussain, M. Fatima, A. Saeed, I. Raza, R.K. Shahzad, Multilevel classification of security concerns in cloud computing. Appl. Comput. Inform. 13(1), 57–65 (2017)
46. IBM, Cost of a data breach 2022. https://www.ibm.com/reports/data-breach. Last Accessed 15 Dec 2022
47. IEEE, Ethically aligned design [from the editor]. IEEE Control Syst. 38 (2018). https://doi.org/10.1109/mcs.2018.2810458
48. R. Jaspal, B. Lopes, Discrimination and mental health outcomes in British Black and South Asian people during the covid-19 outbreak in the UK. Ment. Health Relig. Cult. 24(1), 80–96 (2021)
49. J. Jovic, M. Pantovic-Stefanovic, M. Mitkovic-Voncina, B. Dunjic-Kostic, G. Mihajlovic, S. Milovanovic, M. Ivkovic, A. Fiorillo, M. Latas, Internet use during coronavirus disease of 2019 pandemic: psychiatric history and sociodemographics as predictors. Indian J. Psychiatry 62(Suppl 3), S383–S390 (2020)
50. E. Katana, B.O. Amodan, L. Bulage, A.R. Ario, J.N.S. Fodjo, R. Colebunders, R.K. Wanyenze, Violence and discrimination among Ugandan residents during the covid-19 lockdown. BMC Public Health 21(1), 1–13 (2021)

164

S. Yu and F. Carroll

51. L. Kemp, The ‘Stomp Reflex’: when governments abuse emergency powers (2021). https://www.bbc.com/future/article/20210427-the-stomp-reflex-when-governments-abuseemergency-powers. Last Accessed 15 Dec 2022 52. C.F. Kerry, Why protecting privacy is a losing game today—and how to change the game. Brookings Institution (2018) 53. N. Kim, ‘more scary than coronavirus’: South Korea’s health alerts expose private lives (2020). https://www.theguardian.com/world/2020/mar/06/more-scary-than-coronavirussouth-koreas-health-alerts-expose-private-lives. Last Accessed 15 Dec 2022 54. M. Kupiec, Protection of students’ personal data in times of development of biometric technologies as a challenge for universities in Poland. Białostockie Studia Prawnicze 25 (2020). https://doi.org/10.15290/bsp.2020.25.04.06 55. R. Kwan, Hong Kong to electronically tag covid patients as it adopts China’s health code system (2022). https://www.theguardian.com/world/2022/jul/12/hong-kong-to-electronicallytag-covid-patients-as-it-adopts-chinas-health-code-system. Last Accessed 12 July 2022 56. V.Q. Li, L. Ma, X. Wu, Covid-19, policy change, and post-pandemic data governance: a case analysis of contact tracing applications in East Asia. Policy Soci. 41(1), 01–14 (2022) 57. F. Liang, Covid-19 and health code: How digital platforms tackle the pandemic in China. Soc. Media Soc. 6(3), 2056305120947657 (2020) 58. N.M. Lou, K.A. Noels, S. Kurl, Y.S.D. Zhang, H. Young-Leslie, Covid discrimination experience: Chinese Canadians’ social identities moderate the effect of personal and group discrimination on well-being. Cult. Divers. Ethn. Minor. Psychol. 29, 132–144 (2023) 59. A. Majeed, S.O. Hwang, A comprehensive analysis of privacy protection techniques developed for covid-19 pandemic. IEEE Access 9 (2021). https://doi.org/10.1109/ACCESS.2021. 3130610 60. F. Malecki, Overcoming the security risks of remote working. Comput. Fraud Secur. 2020(7), 10–12 (2020) 61. S. Mandal, D.A. Khan, A study of security threats in cloud: passive impact of covid19 pandemic, in 2020 International Conference on Smart Electronics and Communication (ICOSEC) (2020), pp. 837–842. https://doi.org/10.1109/ICOSEC49089.2020.9215374 62. M. Marchi, F.M. Magarini, A. Chiarenza, G.M. Galeazzi, V. Paloma, R. Garrido, E. Ioannidi, K. Vassilikou, M.G. de Matos, T. Gaspar, et al., Experience of discrimination during covid-19 pandemic: the impact of public health measures and psychological distress among refugees and other migrants in Europe. BMC Public Health 22(1), 1–14 (2022) 63. N. Martinez-Martin, I. Dasgupta, A. Carter, J.A. Chandler, P. Kellmeyer, K. Kreitmair, A. Weiss, L.Y. Cabrera, et al., Ethics of digital mental health during covid-19: crisis and opportunities. JMIR Ment. Health 7(12), e23776 (2020) 64. N. Masaeli, H. Farhadi, Prevalence of internet-based addictive behaviors during covid-19 pandemic: a systematic review. J. Addict. Dis. 39 (2021). https://doi.org/10.1080/10550887. 2021.1895962 65. A. McMahon, A. Buyx, B. Prainsack, Big data governance needs more collective responsibility: the role of harm mitigation in the governance of data use in medicine and beyond. Med. Law Rev. 28(1), 155–182 (2019). https://doi.org/10.1093/medlaw/fwz016 66. A. McMahon, A. Buyx, B. Prainsack, Big data governance needs more collective responsibility: the role of harm mitigation in the governance of data use in medicine and beyond. Med. Law Rev. 28(1), 155–182 (2020) 67. A. Merendino, S. Dibb, M. Meadows, L. Quinn, D. Wilson, L. Simkin, A. 
Canhoto, Big data, big decisions: the impact of big data on board level decision-making. J. Bus. Res. 93 (2018). https://doi.org/10.1016/j.jbusres.2018.08.029 68. D. Miconi, Z.Y. Li, R.L. Frounfelker, V. Venkatesh, C. Rousseau, Socio-cultural correlates of self-reported experiences of discrimination related to covid-19 in a culturally diverse sample of Canadian adults. Int. J. Intercult. Relat. 81, 176–192 (2021) 69. A. Montazerolghaem, M.H. Yaghmaee, A. Leon-Garcia, Green cloud multimedia networking: NFV/SDN based energy-efficient resource allocation. IEEE Trans. Green Commun. Netw. 4(3), 873–889 (2020)

Securing Privacy During a World Health Emergency: Exploring How to Create. . .

165

70. K. Mueller, E. Papenhausen, Using demographic pattern analysis to predict covid-19 fatalities on the us county level. Digit. Gov.: Res. Pract. 2 (2021). https://doi.org/10.1145/3430196 71. S.P. Mulligan, C.D. Linebaugh, W.C. Freeman, Data protection and privacy law: an introduction (2019). https://sgp.fas.org/crs/misc/IF11207.pdf. Last Accessed 9 May 2019 72. P.J. Nickel, The ethics of uncertainty for data subjects (2019). https://doi.org/10.1007/978-3030-04363-6_4 73. NIH, People from racial, ethnic, and other groups report frequent covid-19-related discrimination (Feb 2022). https://www.nih.gov/news-events/news-releases/people-racial-ethnic-othergroups-report-frequent-covid-19-related-discrimination. Last Accessed 15 Dec 2022 74. OECD, Ensuring data privacy as we battle covid-19. (2020). https://www.oecd.org/ coronavirus/policy-responses/ensuring-data-privacy-as-we-battle-covid-19-36c2f31e/. Last Accessed 15 Dec 2022 75. A. Oussous, F.Z. Benjelloun, A. Ait Lahcen, S. Belfkih, Big data technologies: a survey. J. King Saud Univ. Comput. Inf. Sci. 30(4), 431–448 (2018). https://doi.org/10.1016/j.jksuci. 2017.06.001,https://www.sciencedirect.com/science/article/pii/S1319157817300034 76. F.K. Parast, C. Sindhav, S. Nikam, H.I. Yekta, K.B. Kent, S. Hakak, Cloud computing security: a survey of service-based models. Comput. Secur. 114, 102580 (2022) 77. H. Park, J. Eun, J. Lee, Why do smartphone users hesitate to delete unused apps?, in Proceedings of the 20th International Conference on Human-Computer Interaction with Mobile Devices and Services Adjunct (2018), pp. 174–181 78. M.J. Parker, C. Fraser, L. Abeler-Dörner, D. Bonsall, Ethics of instantaneous contact tracing using mobile phone apps in the control of the covid-19 pandemic. J. Med. Ethics 46(7), 427– 431 (2020) 79. R.Z. Paul Mozur, A. Krolik, In coronavirus fight, China gives citizens a color code, with red flags (2022). https://www.nytimes.com/2020/03/01/business/china-coronavirus-surveillance. html. Last Accessed 14 June 2022 80. T. Petrocelli, Data Protection and Information Lifecycle Management (Prentice Hall PTR, Hoboken, 2005) 81. PRIV.GC.CA, Pipeda legislation and related regulations (2018). https://www.priv.gc.ca/en/ privacy-topics/privacy-laws-in-canada/the-personal-information-protection-and-electronicdocuments-act-pipeda/r_o_p/. Last Accessed 15 Dec 2022 82. PRIV.GC.CA, The privacy act legislation and regulations (2019). https://www.priv.gc.ca/en/ privacy-topics/privacy-laws-in-canada/the-privacy-act/r_o_a/. Last Accessed 15 Dec 2022 83. P. Quinn, The Difficulty of Defining Sensitive Data–The Concept of Sensitive Data in the EU Data Protection Framework, vol. 22 (Cambridge University Press, Cambridge, 2021) 84. E. Raguseo, Big data technologies: an empirical investigation on their adoption, benefits and risks for companies. Int. J. Inf. Manage. 38 (2018). https://doi.org/10.1016/j.ijinfomgt.2017. 07.008 85. P.D. Reuven Fenton, B. Golding, NYC lawyer with coronavirus in ‘severe’ condition: Health department (2020). https://nypost.com/2020/03/03/nyc-lawyer-with-coronavirus-in-severecondition-health-department/. Last Accessed 15 Dec 2022 86. L. Rocher, J.M. Hendrickx, Y.A. De Montjoye, Estimating the success of re-identifications in incomplete datasets using generative models. Nat. Commun. 10(1), 1–9 (2019) 87. S. Room, Data Protection and Compliance: Second Edition. G - Reference, Information and Interdisciplinary Subjects Series, BCS Learning & Development Limited (2021). 
https:// books.google.com/books?id=DYUuzQEACAAJ 88. B.A. Satterfield, O. Dikilitas, I.J. Kullo, Leveraging the electronic health record to address the covid-19 pandemic (2021). https://doi.org/10.1016/j.mayocp.2021.04.008 89. Science, Cellphone tracking could help stem the spread of coronavirus. Is privacy the price? (2020). https://www.science.org/content/article/cellphone-tracking-could-help-stem-spreadcoronavirus-privacy-price?cookieSet=1. Last Accessed 15 Dec 2022 90. M. Scott, Z. Wanat, Poland’s coronavirus app offers playbook for other governments (2020). https://www.politico.eu/article/poland-coronavirus-app-offers-playbook-forother-governments/. Last Accessed 15 Dec 2022

166

S. Yu and F. Carroll

91. C. Shachar, J. Engel, G. Elwyn, Implications for telehealth in a postpandemic future: regulatory and privacy issues. J. Am. Med. Assoc. 323(23), 2375–2376 (2020) 92. C. Si, People who have recovered from covid should not be discriminated (2022). https:// www.chinadaily.com.cn/a/202208/02/WS62e94817a310fd2b29e6fde1.html. Last Accessed 15 Dec 2022 93. M. Simeone, The smart city as a library. Portal 20 (2020). https://doi.org/10.1353/pla.2020. 0011 94. N. Singer, C. Sang-Hun, As coronavirus surveillance escalates, personal privacy plummets (2020). https://www.nytimes.com/2020/03/23/technology/coronavirus-surveillancetracking-privacy.html. Last Accessed 15 Dec 2022 95. M. Singh, Home working is creating dangers, new business for cybersecurity (2021). https://www.bloomberg.com/news/articles/2021-10-06/work-from-home-cybersecurityrisks-create-these-new-dangers-opportunities. Last Accessed 15 Dec 2022 96. K. Srinavin, W. Kusonkhum, B. Chonpitakwong, T. Chaitongrat, N. Leungbootnak, P. Charnwasununth, Readiness of applying big data technology for construction management in Thai public sector. J. Adv. Inf. Technol. 12 (2021). https://doi.org/10.12720/jait.12.1.1-5 97. R. Staff, Chinese officials punished for changing health codes of bank depositors - state media (2022). https://www.reuters.com/article/china-banks-henan-idINL4N2YA03D. Last Accessed 15 Dec 2022 98. F. Stockman, What it’s like to come home to the stigma of coronavirus (2020). https://www. nytimes.com/2020/03/04/us/stigma-coronavirus.html. Last Accessed 15 Dec 2022 99. P.D. Strassle, A.L. Stewart, S.M. Quintero, J. Bonilla, A. Alhomsi, V. Santana-Ufret, A.I. Maldonado, A.T. Forde, A.M. Nápoles, Covid-19-related discrimination among racial/ethnic minorities and other marginalized communities in the United States. Am. J. Public Health 112(3), 453–466 (2022) 100. Z. Su, A. Cheshmehzangi, D. McDonnell, B.L. Bentley, C.P. Da Veiga, Y.T. Xiang, Facial recognition law in China. J. Med. Ethics 48, 1058–1059 (2022) 101. L. Sun, H. Zhang, C. Fang, Data security governance in the era of big data: status, challenges, and prospects. Data Sci. Manage. 2, 41–44 (2021). https://doi.org/10.1016/j.dsm.2021.06. 001,https://www.sciencedirect.com/science/article/pii/S2666764921000163 102. Thales, Beyond GDPR: employing ai to make personal data useful to consumers (2020). https://doi.org/10.7551/mitpress/12439.003.0014 103. S.A. Thompson, C. Warzel, How to track president trump (2019). https://www.nytimes.com/ interactive/2019/12/20/opinion/location-data-national-security.html. Last Accessed 15 Dec 2022 104. S.F. Tsao, H. Chen, T. Tisseverasinghe, Y. Yang, L. Li, Z.A. Butt, What social media told us in the time of covid-19: a scoping review (2021). https://doi.org/10.1016/S25897500(20)30315-0 105. UN, As internet user numbers swell due to pandemic, UN Forum discusses measures to improve safety of cyberspace - United Nations sustainable development. (2021). https://www.un.org/en/desa/internet-user-numbers-swell-due-pandemic-un-forumdiscusses-measures-improve-safety-cyberspace. Last Accessed 15 Dec 2022 106. J. Valentino-DeVries, Google’s Sensorvault is a boon for law enforcement. This is how it works (2019). https://www.nytimes.com/2019/04/13/technology/google-sensorvaultlocation-tracking.html. Last Accessed 15 Dec 2022 107. J. Valentino-DeVries, How your phone is used to track you, and what you can do about it (2020). https://www.nytimes.com/2020/08/19/technology/smartphone-location-trackingopt-out.html. Last Accessed 15 Dec 2022 108. M. Veale, R. 
Binns, J. Ausloos, When data protection by design and data subject rights clash. Int. Data Privacy Law 8 (2018). https://doi.org/10.1093/idpl/ipy002 109. C. Véliz, Not the doctor’s business: privacy, personal responsibility and data rights in medical settings. Bioethics 34 (2020). https://doi.org/10.1111/bioe.12711

Securing Privacy During a World Health Emergency: Exploring How to Create. . .

167

110. P. Voigt, A. Von dem Bussche, The EU General Data Protection Regulation (GDPR). A Practical Guide (1st edn.) (Springer International Publishing, Cham, 2017). 10(3152676), 10-5555 111. S. Wachter, Data Protection in the Age of Big Data, vol. 2 (Nature Publishing Group, Berlin, 2019) 112. S. Wachter, B. Mittelstadt, A right to reasonable inferences: re-thinking data protection law in the age of big data and AI. Colum. Bus. L. Rev. (2019), p. 494 113. C. Watson, Rise of the preprint: how rapid data sharing during covid-19 has changed science forever (2022). https://doi.org/10.1038/s41591-021-01654-6 114. WHO, Ethical considerations to guide the use of digital proximity tracking technologies for covid-19 contact tracing (2020). https://www.who.int/publications/i/item/WHO-2019-nCoVEthics_Contact_tracing_apps-2020.1. Last Accessed 15 Dec 2022 115. WHO, Indicator framework for the evaluation of the public health effectiveness of digital proximity tracing solutions (2021). https://www.who.int/publications/i/item/9789240028357. Last Accessed 15 Dec 2022 116. K.C. Wong, The making of the USA patriot act II: public sentiments, legislative climate, political gamesmanship, media patriotism. Int. J. Sociol. Law 34(2), 105–140 (2006) 117. T. Wong, Henan: China covid app restricts residents after banking protests (2022). https:// www.bbc.com/news/world-asia-china-61793149. Last Accessed 14 June 2022 118. C. Yin, J. Xi, R. Sun, J. Wang, Location privacy protection based on differential privacy strategy for big data in industrial internet of things. IEEE Trans. Ind. Inf. 14(8), 3628–3636 (2018). https://doi.org/10.1109/TII.2017.2773646 119. X. Zhang, Y. Wang, H. Lyu, Y. Zhang, Y. Liu, J. Luo, The influence of covid-19 on the well-being of people: big data methods for capturing the well-being of working adults and protective factors nationwide. Front. Psychol. 12 (2021). https://doi.org/10.3389/fpsyg.2021. 681091 120. S. Zheng, Hackers claim theft of police info in China’s largest data leak. https://www. bloomberg.com/news/articles/2022-07-04/hackers-claim-theft-of-police-info-in-china-slargest-data-leak?leadSource=uverify%20wall. Last Accessed 14 Dec 2022 121. A. Zwitter, O.J. Gstrein, Big data, privacy and covid-19 –learning from humanitarian expertise in data protection. J. Int. Humanit. Action 5 (2020). https://doi.org/10.1186/s41018-02000072-6

Federated Learning: Data Privacy and Cyber Security in Edge-Based Machine Learning

Jonathan White and Phil Legg

1 Introduction

Machine learning is now seen as a key component within many data processing applications that are embedded in our daily lives, from applications such as facial recognition for accessing our devices and authenticating our actions, through to recommendation systems that predict purchasing habits or suggest what we may wish to pursue for leisure activities. Forecasting tools rely on machine learning techniques for predicting stock markets, weather, crime rates, and many other aspects of our society. Image-based systems gather data about vehicle driving habits and behaviours. Even our health care systems use machine learning to build predictive models for successful surgery, or to assist the detection of disease in patients. The twenty-first century is very much the information age, with data at its heart, underpinning and enabling many of the activities and actions that influence our societal interactions and our daily lives. IoT and smart devices mean that data generation and collection can be embedded in all aspects of society, from banking to travel, and from entertainment to personal security, but fundamental questions remain: where is this data stored, where should it be stored, and how should it be used? Smartphones and IoT devices have become ubiquitous, much as the field of ubiquitous computing once predicted they would. We voluntarily carry devices that capture our location using GPS, that have high-quality camera and microphone systems, and that are continuously connected to the Internet via high-speed 5G connectivity. These devices may also know our banking and financial details, our health care records,

J. White · P. Legg
University of the West of England, Bristol, UK
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
C. Hewage et al. (eds.), Data Protection in a Post-Pandemic Society, https://doi.org/10.1007/978-3-031-34006-2_6


our personal connections and conversations, and many more aspects about ourselves than we may even imagine. As the results of machine learning often improve with a larger dataset, sharing local training data with a centralised learning model has been demonstrated to improve performance [55]. The sharing of data with a central location is not without problems. Datasets can contain confidential and personally identifiable information, and with society being increasingly aware of privacy issues, along with the introduction of legal data privacy frameworks entering into law, such as the European Union's General Data Protection Regulation (GDPR) [2] and the California Consumer Privacy Act (CCPA) [1], data aggregation is less feasible, and it has become necessary to find ways to build a common shareable learning model without the need to share data with a centralised server. In this chapter, we explore the advancements of federated learning with a focus on cyber security and data privacy. We provide a case study for applying federated learning to cyber security analytics of a distributed intrusion detection system. We also examine the challenges associated with federated learning and distributed data privacy. This chapter contributes a comprehensive study on the topic, both in terms of the prior literature and an applied case study, and provides a road map for possible future research challenges.

2 Understanding Federated Learning

Traditional classification-based machine learning models are developed through the use of training data. This data must be readily available and centralised, such that batches of data can be iteratively used to refine the model parameters and minimise the loss function between predicted classes and their actual classes. This requirement to have a centralised pool of training data is the key distinction between traditional machine learning and federated learning, since the confidentiality of the data when it is transferred, processed and stored centrally cannot be assured by the end user. Federated learning, first introduced by Google AI researchers in 2016 [41], takes a distributed approach to refining the model parameters without the need for transmitting and gathering individual user data together in a centralised manner. This is based on a distributed client-server model where multiple clients communicate with a centralised server. A global model is stored on a central server (typically a cloud provider), whilst each client serves as an edge device that trains its own localised model on its own data. These 'edge' devices could be anything from mobile phones, IoT devices, or full compute systems; however, the data remains within the bounds of the edge device, keeping it local and private from other users. Data remains at the point of creation, a localised machine learning model is trained, and then only the localised model parameters are transmitted across the network. These parameters are then used to develop a global model, typically through averaging techniques, which can then be


shared back to each local system. In this way, no personal data ever leaves the point where it was generated, meaning that there is no centralised collection of raw data, nor is personal data transmitted over the network. The original use case was predictive text messaging. Predictive text has been around for a number of years, with early T9 predictive text using a dictionary-based lookup [64]. However, this was limited to initially providing the same dictionary to each user, and improvements to the predictions for a given user were per device. How can one improve the quality of predictive text by harnessing the collective intelligence of all users, without exposing their individual information between users? That was the driving force behind federated learning, and Google applied this to the GBoard predictive keyboard used on Android smartphones. In the prototypical federated learning setting, during each iteration of training the global model is pushed to a subset of nodes. Each of these nodes performs a few epochs of stochastic gradient descent (SGD) to fine-tune the user's model. Each fine-tuned model update is sent back to the central server, which aggregates the updates and combines them into a new global model derived from the collection of local models (Fig. 1) [4, 33]. In the case of the Google GBoard application, the user's smartphone device is used to fine-tune the local model based on the typing behaviour of the device owner. The local models are occasionally pushed back to a central aggregation server to update the global word prediction model, and this improved model is subsequently pushed back out to end users. The benefit of such a system is that global user behaviour is integrated with the specific data from the local model. In this way, the model can be trained on a potentially wider overall distribution of real-world data without exposing the private data of users, thus potentially solving privacy concerns. Contributions from each local user will often differ, and this management of the local updates is done through device-specific properties. For example, a phone may retrain a model whilst it is plugged in and on charge overnight. It could also provide an updated local model, or retrieve an updated global model, during this time. In this way, the model updates also do not impact the daily usage of the phone from the end-user perspective. The advantage of such a system is that researchers can train models using private and sensitive data without any concern regarding directly processing the data. The data strictly remains on the device and only learned model updates are transferred between the central server and the data owners. Therefore, such systems are compliant with data protection regulations such as GDPR. Communication costs of the federated learning approach can be significantly smaller than those of traditional machine learning, as the model update parameters are significantly smaller than the raw data used to train the model; but federated networks can potentially comprise a massive number of devices, for example millions of smartphones, and communication can become a critical bottleneck in federated networks [10]. It is therefore important to develop efficient communication methods that reduce the number of communication rounds and minimise the transmitted message size.
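The aggregation described here is most commonly federated averaging (FedAvg) [41]: the server computes a data-weighted average of the clients' parameters. The following is a minimal sketch of that server-side step; the function and variable names are illustrative rather than drawn from any particular library.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Combine per-client model parameters into a new global model.

    client_weights: one entry per client, each a list of numpy arrays
                    (one array per model layer).
    client_sizes:   the number of local training samples at each client,
                    used to weight that client's contribution.
    """
    total = sum(client_sizes)
    num_layers = len(client_weights[0])
    new_global = []
    for layer in range(num_layers):
        # Data-weighted sum of this layer's parameters across clients.
        layer_avg = sum((n / total) * w[layer]
                        for w, n in zip(client_weights, client_sizes))
        new_global.append(layer_avg)
    return new_global
```

Clients that trained on more samples contribute proportionally more to the new global model, which is the standard FedAvg weighting.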
Some other disadvantages of federated learning systems are that the cost of implementing FL is higher than that of collecting the information and processing it


Fig. 1 Federated learning

Fig. 2 Horizontal federated learning

centrally, especially during the initial iterative phases of training. It is also necessary for the devices that own the data to perform the computation for training, and these devices may have limited computation capacity. Therefore, it may not be possible or economic to conduct local training, or it may lead to fairness issues when devices drop out of a given iteration due to connectivity or energy constraints [10]. Federated learning methods must therefore be robust enough to handle dropped devices during communication/training, anticipate low or partial participation in a given iteration, and tolerate heterogeneous hardware. The distribution characteristics of the data, that is, the common and differentiating factors across the heterogeneous data and clients participating in FL, can be broadly characterised into three groups: Horizontal FL, Vertical FL and Transfer Learning. Horizontal Federated Learning shares the same feature space, but across multiple samples [66]. Figure 2 shows an example where each entry is associated with a specific person. A classic example of horizontal federated learning is the Google keyboard app (Gboard) that predicts the next possible words when a user is inputting text. The


Fig. 3 Vertical federated learning

training data for this model is potentially privacy sensitive, but the set of features used to derive it would be common across all clients. Vertical Federated Learning, or feature-based federated learning, is applicable when the datasets share the same sample ID space but differ in the feature space [22]. Figure 3 shows an example for two clients: Client A is a movie reseller that captures data regarding a customer's movie purchases, and Client B is a movie review site that has information regarding the customer's movie reviews. Both capture different features, but they share a common sample set of the customer. Data from the two different domains can be utilised to better serve the customer, providing better movie recommendations based on the customer's review history. Federated Transfer Learning extends traditional ML transfer learning techniques, which apply when two datasets differ in both the sample and feature space, and yet a model can be trained on one dataset and still applied to the other. It allows a model that has been trained on a similar dataset, for solving a different problem, to be used as the starting point for training a new model with a new requirement. Training from a pre-trained model yields better results in comparison to training a fresh model [36]. The FL approach has to deal with smaller and potentially biased non-independently-identically-distributed (non-IID) datasets, which is challenging as it affects how long convergence may take, due to clients with different distributions providing opposing gradient updates [32]. Research has tried to combat the degradation in performance by augmenting the local data for classes which the


client is missing, yielding an IID dataset [26]. Non-IID data can also be dealt with by having personalised models. Fallah, Mokhtari, and Ozdaglar [14] train a global model that can be personalised to each of the clients with a few steps of gradient descent. Rather than training one model across all clients, or a model for each of the clients, they employ a hybrid approach and train a model for each cluster of devices. These clusters can be determined based on the topology of the network; for example, models can be based on region, such as the UK versus the USA.
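To make the non-IID setting concrete, experimental studies commonly simulate label-skewed clients by drawing each client's class proportions from a Dirichlet distribution; a smaller concentration parameter yields more skewed, more strongly non-IID clients. A sketch of this common partitioning trick follows (the function name and defaults are illustrative):

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha=0.5, seed=0):
    """Assign sample indices to clients with Dirichlet label skew.

    Lower alpha -> each client receives a more skewed mix of classes,
    i.e. a more strongly non-IID federation.
    """
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        idx = np.where(labels == cls)[0]
        rng.shuffle(idx)
        # Fraction of this class that each client receives.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cut_points = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, chunk in enumerate(np.split(idx, cut_points)):
            client_indices[client].extend(chunk.tolist())
    return client_indices
```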

Topology

There are two typical federated learning architectures: Centralised and Decentralised (Peer-to-Peer). With centralised FL, the clients must implicitly trust the central server that manages clients, receives the local model updates, and updates and distributes the global model. This places a single point of failure in the system if it were to suffer a malicious attack or system failure. Moreover, if there are a high number of participating clients, the central server must be able to handle a potentially large amount of communication. As such, decentralised FL has recently emerged as a method to reduce the communication overheads of the busiest node. In this topology, no global model exists and, as shown in Fig. 4b, clients exchange model updates to reach a consensus model.

Fig. 4 Federated learning workflows. (a) Centralised federated learning. (b) Decentralised federated learning


3 Data Privacy and Cyber Security in Federated Learning

Having introduced the concept of federated learning and how this can be utilised, here we now explore federated learning in the context of cyber security. We look at this from two perspectives: the security and privacy challenges of using federated learning, and how federated learning can further progress cyber security data analytics.

Privacy Challenges and Federated Learning Threat Models

Although federated learning prevents private data from being shared with others, recent works have demonstrated that FL may not always provide sufficient privacy guarantees and may be vulnerable to other attacks [11, 20, 39, 46]. There are several privacy attacks that allow the local model to be reconstructed or user data to be inferred [40]. This section discusses several privacy attacks and possible mitigations for them.

Membership Inference Attacks allow an adversary to infer if a particular data point has already been seen and used to train the global model. FL expands the range of attacks: an adversary can not only determine whether the data has been previously seen, but also infer whether a given sample belongs to the private training data of a specific participant [43, 61]. These attacks can either be active or passive. In a passive attack, the adversary observes the global model updates and makes inferences without modifying the learning process. With active attacks, the adversary participates in the learning process. One such attack exploits the stochastic gradient descent algorithm to observe if the gradient loss is reduced in subsequent rounds; if so, it is very likely that the sample was in the training set [48].

Reconstruction Attacks have shown the possibility of reconstructing local data by inversion of the model gradients sent from the client to the server. Geiping et al. [19] show high-resolution reconstruction of both single and multiple images from knowledge of the parameter gradients, effectively demonstrating a loss of privacy, even for trained deep networks. Other studies have demonstrated, without actively interfering in the training process and only from passively observing, the reconstruction of the local client model [65]. This enables other attacks such as membership or attribute inference attacks. As discussed in the topology section, with a centralised federated learning architecture, clients must implicitly trust the central server. If the server is compromised or accessed by malicious actors, all local models sent by the clients can be intercepted and analysed, allowing for reconstruction of the original client data. To mitigate these attacks, several studies have proposed defensive strategies. Differential privacy involves adding carefully selected noise to the outputs [3]. This can be added at the individual client level, at the server level, or via a hybrid approach. Ideally, noise should be added at the client level so that the server never sees the un-noised updates; however, in practice, adding noise at the client is impractical as the clients have


comparatively little data. Adding noise at the server level to a large set of updates does not impact the accuracy as much. These approaches often provide privacy at the cost of reduced model performance or system efficiency [42], and do not protect data privacy against a malicious server. Truex et al. [58] presented an alternative hybrid approach that combined differential privacy and Secure Multiparty Computation (SMC) to prevent differential attacks, but this work did not consider attacks on the data by hidden adversaries during the parameter upload stage. Secure aggregation is a lossless cryptographic technique that ensures that the server can only see the aggregate of thousands of updates rather than the individual model updates, whilst retaining the original accuracy. However, the resultant method incurs significant extra communication cost [9]. Additional techniques such as randomising the device check-in, or shuffling the model updates sent to the server, can further increase privacy. Moving to a decentralised or hierarchical FL architecture can help protect against a compromised or malicious central server. As well as considering privacy attacks in federated learning, there are other threat models that attack the CIA triad of Confidentiality, Integrity and Availability in the FL domain.

Poisoning Attacks The objective of a poisoning attack can either be a random (untargeted) attack that aims to reduce the overall accuracy of the FL model through misclassification, or a targeted attack whose objective is to cause the FL model to output a specific target label for a given, and incorrect, input [8]. The clients can either poison the data, where the adversarial nodes manipulate their local datasets so that the local models acquire undesirable properties, or the local model, where the adversary replaces the local model with one that exhibits a certain behaviour. Model poisoning attacks look less natural but are shown to be more effective in [6] and [7].

Backdoor Attacks Malicious clients may act as honest clients and influence the training model performance by sending erroneous data. There has been some dispute over the past two years regarding backdoor attacks. Bagdasaryan et al. [6] presented a paper entitled How to Backdoor Federated Learning, where they established that it was possible to hard-wire backdoors in federated learning models to ensure the global model mis-predicts on specific subtasks (e.g., classify green cars as frogs). Follow-up work by Google [56] questions the effectiveness of these attacks. They state that simple defence mechanisms, such as norm clipping of the models or additive differential-privacy-style noise, can significantly limit their effectiveness. This claim is disputed by Wang et al. [62] in the paper Attack of the Tails: Yes, You Really Can Backdoor Federated Learning. They prove that edge-case backdoors can be easily built, causing misclassification by the predictive global model, in tasks ranging from image classification to next-word prediction. They also demonstrate that these backdoors can bypass state-of-the-art defence mechanisms proposed in the current literature. The authors demonstrate new security mechanisms that attempt to improve the security and robustness of FL systems, but these result in unfair


treatment of the clients: they can filter out data from users whose data is diverse compared to the average user [27]. This leads to a trade-off between fairness and robustness. These works cast serious doubts on the feasibility of predictions by FL systems that are both robust and fair in the presence of edge-case data.
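As a concrete illustration of the norm-clipping and noise defence referred to above, the sketch below bounds each client update to a maximum L2 norm and adds Gaussian noise before aggregation. This is illustrative only: the names and the noise scale are assumptions, and a deployed system would calibrate the noise against a formal differential privacy budget.

```python
import numpy as np

def clip_and_noise(update, clip_norm=1.0, noise_std=0.01, rng=None):
    """Norm-clip one client's model update and add Gaussian noise.

    update: list of numpy arrays (parameter deltas from one client).
    Clipping bounds any single client's influence on the global model;
    the noise blunts fine-grained inference on individual updates.
    """
    rng = rng or np.random.default_rng()
    flat = np.concatenate([u.ravel() for u in update])
    scale = min(1.0, clip_norm / (np.linalg.norm(flat) + 1e-12))
    return [scale * u + rng.normal(0.0, noise_std, size=u.shape)
            for u in update]
```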

Federated Learning for Cyber Security

Security research for federated learning often focuses on the vulnerability of federated learning itself, such as attacks on privacy, or on model and data poisoning. Federated learning applications have been proposed for areas such as facial recognition, autonomous vehicles, medical diagnosis, and preventing financial fraud. More recently, research has turned to addressing how federated learning can be applied to the cyber security domain. Cyber security protects the properties of Confidentiality: the property that information is not made available or disclosed to unauthorised individuals, entities or processes; Integrity: the property of safeguarding the accuracy and completeness of assets; and Availability: the property of being accessible and usable upon demand by an authorised entity. Sharing knowledge of various cyber attacks, including spoofing, intrusion detection, anomaly detection, and Denial-of-Service/Distributed-Denial-of-Service (DoS/DDoS), acquired in a timely manner and widely disseminated, makes it easier to develop and improve cyber defence models and methods. Therefore, FL has enormous potential to successfully safeguard cyberspace at both the device and network levels. Traditional machine learning classifiers have been adopted by many researchers to build Intrusion Detection Systems (IDS), identifying patterns in network traffic and associating these with malicious users or intruders in the network, or identifying anomalies. IDS can build a strong platform that can identify known attacks using signature-based detection, but they struggle to identify novel attacks if the signatures do not match. Novel attacks can be identified with anomaly detection systems, but these often have high false-positive detection rates. Numerous machine learning (ML) techniques have been developed to identify and stop intrusions and anomalies as a result of the unprecedented growth of heterogeneous Internet of Things (IoT) and Industrial IoT (IIoT) devices. The features of federated learning, where the edge device executes, trains and improves the local model and shares update data with others, whilst preserving the privacy of the network data it sees and removing the need to share it, make this an ideal area for applying FL. Recent works on applying federated learning for IDS and anomaly detection are discussed here. Rahman et al. [51] proposed an IDS for an IoT environment using the NSL-KDD dataset. This dataset captures a small IoT network environment and tries to solve some of the inherent problems of the ageing KDD'99 dataset that has been widely used by many researchers. They performed some limited testing, distributing the data across four nodes and performing five rounds of testing. They also partitioned the data with both equal and random distributions of attack types. They achieved accuracy close to that of the centralised learning approach, and end devices were able


to detect attacks that were not present in their local dataset, which is a key advantage of such a system in a real-world application. The downside of the proposed method is that only accuracy measures were examined, and the sample size was limited; other performance indicators were not taken into account. Liu et al. [37] incorporated an attention-mechanism-based convolutional neural network with long short-term memory to detect anomalies in time-series data for an IIoT application. The attention-based mechanism captures fine-grained features, preventing memory loss and gradient dispersion problems. Additionally, the authors proposed a gradient compression mechanism to improve communication efficiency, and a large number of experiments on real datasets showed an effective communication overhead reduction of 50% when compared to conventional FL frameworks. Various other studies have also explored federated learning for intrusion or anomaly detection for Internet of Things (IoT) [16, 17, 20, 28, 47, 49, 50] and Industrial IoT (IIoT) devices [24, 29, 34, 57, 63], or for generalised IDS [5]. A novel application of federated learning for cyber security was shown for Security Operations Centre (SOC) collaboration for malicious URL detection [30]. As sharing raw data from different customers is problematic due to it containing confidential personally identifiable information (PII), smaller organisations and clients can find themselves at a disadvantage, as their machine-learned threat detection model will perform worse than the same model in a larger organisation due to the volume of data seen. The authors distinguished URLs leading to phishing sites, defacement sites, malware-spreading sites, and spam sites from those that were benign. A dataset containing more than 700K malicious and benign URLs was collected. Analysing the performance when the data was split with varying sample sizes, a 30% improvement in accuracy with collaborative learning was seen for small agents. However, large-agent performance was slightly worse (0.5%). Another application of FL to the cyber security space, by Hahanov et al. [23], is malware detection, where the authors employed a federated ML architecture of sandbox computing to demonstrate reduced communication overheads by processing suspicious samples locally, whilst improving the global performance of the aggregate cloud model. Other works have also applied FL to the subject of malware detection, such as [25, 35, 52].

4 Case Study: Federated Learning for Intrusion Detection Systems

To further motivate the discussion in this chapter, we explore a case study of applying federated learning in the context of intrusion detection systems. The CICIDS2017 dataset, originally developed by the Canadian Institute for Cybersecurity at the University of New Brunswick, has become a popular benchmarking


dataset for many previous researchers, and so makes for a suitable dataset to use. To the best of our knowledge, this dataset has not previously been used for exploring federated learning models, and so we consider how it could be structured as a federated problem. We first conduct an investigation into the dataset, including a reflection on recent works that identify some shortcomings with the original dataset as proposed. We deploy a traditional machine learning framework to establish performance metrics within what may be considered traditional usage. We then deploy a federated approach, and trial various distributions of the data that would reflect how data collection could be managed in practice. We establish how a model could then be trained within the federated learning paradigm and how this compares with a traditional centralised approach. We use a typical Python-based implementation for our assessment, using the recent Flower library (https://flower.dev/) for developing our federated learning approach. Flower is an intuitive and flexible library for federated learning that supports many major existing ML platforms including scikit-learn, Tensorflow and PyTorch. For our experimentation, we use scikit-learn for our traditional machine learning processing, and we use a Tensorflow deployment for our federated learning model.
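To give a flavour of how little code the client side of such a study requires, below is a minimal sketch of a Flower client wrapping a Keras model. It assumes the Flower 1.x NumPyClient API and is illustrative rather than the exact code used in this chapter; the constructor arguments would be supplied by the experiment harness.

```python
import flwr as fl

class IDSClient(fl.client.NumPyClient):
    """Federated client that trains a local Keras model on local flows."""

    def __init__(self, model, x_train, y_train, x_val, y_val):
        self.model = model
        self.x_train, self.y_train = x_train, y_train
        self.x_val, self.y_val = x_val, y_val

    def get_parameters(self, config):
        return self.model.get_weights()

    def fit(self, parameters, config):
        # Start from the current global model; train on local data only.
        self.model.set_weights(parameters)
        self.model.fit(self.x_train, self.y_train, epochs=1, verbose=0)
        return self.model.get_weights(), len(self.x_train), {}

    def evaluate(self, parameters, config):
        self.model.set_weights(parameters)
        loss, acc = self.model.evaluate(self.x_val, self.y_val, verbose=0)
        return loss, len(self.x_val), {"accuracy": acc}
```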

The CICIDS2017 dataset

The CICIDS 2017 dataset consists of 2,830,743 labelled instances that relate to observed packet captures across a synthetic corporate network, as presented by Sharafaldin et al. [53]. The data consists of 15 distinct classes, where 14 different attack types are represented as well as a class that represents benign network activity. Naturally, the dataset exhibits strong class imbalance towards benign activity, as would be typical. The dataset has proven popular with many researchers in recent years; however, Engelen et al. [13] recently examined the correctness of the dataset, identifying errors in how the network flows and features had been derived. The CICFlowMeter tool that is used for converting PCAP data to flow communications was found to exhibit bugs that caused the feature sets to be derived incorrectly. With this knowledge, we therefore use the revised CICIDS2017 feature data made available by Engelen et al. from their website. Table 1 shows the number of instances associated with each class in both the original and the corrected datasets. The dataset consists of 84 columns that denote the features about each communication flow from the original packet capture data. A communication flow is described as a conversation between a source IP and a destination IP, such as a complete TCP transaction stream, from when the stream is opened to when the stream is closed. This provides a manageable way of handling large packet capture data, and also a way to assign a label to the class of each flow (i.e., was the flow



Table 1 CICIDS 2017 dataset, showing the number of instances per class. For the corrected dataset, some classes have been split into successful and attempted attacks; the number in brackets shows the samples labelled as attempted rather than successful

Class                       Original dataset   Corrected dataset
BENIGN                      2,273,097          1,657,693
DoS Hulk                    231,073            158,469 (593)
PortScan                    158,930            159,151
DDoS                        128,027            95,123
DoS GoldenEye               10,293             7567 (80)
FTP-Patator                 7938               3973 (11)
SSH-Patator                 5897               2980 (8)
DoS slowloris               5796               4001 (1731)
DoS Slowhttptest            5499               1742
Bot                         1966               738 (1470)
Web Attack—Brute Force      1507               151
Web Attack—XSS              652                27 (652)
Infiltration                36                 32 (16)
Web Attack—Sql Injection    21                 12
Heartbleed                  11                 11
Total                       2,830,743          2,100,814

benign, or was there an attack of some kind present). We remove five columns in total: flow ID, source IP, destination IP, timestamp, and class label. This resulted in 79 statistical features that characterise the flow, derived from attributes such as flow duration, total forward packets, total backward packets, packet lengths, inter-arrival times, flag counters (PSH, URG, RST, CWR, ECE, ACK), packet headers, and packet segment sizes. It also consists of nominal data for source port, destination port, and protocol used. Whilst the original dataset accounted for 14 attack types plus the benign class, the revised dataset provides 25 classes that distinguish between successful and attempted attacks for 8 of the original attack types. These are cases where the data had been labelled as the attack type, but no data had actually been obtained from the target. We prepare the dataset for our experimentation by dropping all rows with NaN and Inf values, and by decoupling the label column, resulting in 2,827,876 rows by 78 columns. We use the scikit-learn library to normalize each column vector using the StandardScaler function, and use the LabelEncoder function to prepare the output labels. We then perform a stratified train_test_split to prepare our train and test cases; given the large class imbalance towards benign instances, this ensures that relative class frequencies are approximately preserved in each train and validation fold. This results in a training set of 1,680,016 instances (80%), with a test set of 420,005 (20%). We train three classifiers: DecisionTreeClassifier, MLPClassifier and XGBClassifier, and we report on the weighted precision, recall and f1-scores (macro scores in brackets) (Table 2).
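A condensed sketch of this preprocessing and training pipeline is shown below. The file path and identifier column names are assumptions based on the dataset description rather than verified headers, and XGBClassifier (omitted here) comes from the separate xgboost package.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

# Hypothetical file path and column names; adjust to the actual CSV export.
df = pd.read_csv("cicids2017_corrected.csv")
df = df.replace([np.inf, -np.inf], np.nan).dropna()  # drop NaN/Inf rows

y = LabelEncoder().fit_transform(df.pop("Label"))
X = StandardScaler().fit_transform(
    df.drop(columns=["Flow ID", "Src IP", "Dst IP", "Timestamp"],
            errors="ignore"))

# Stratified split preserves the heavy benign/attack class imbalance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

clf = DecisionTreeClassifier().fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test), digits=4))
```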


Table 2 Classification results on the improved CICIDS 2017 dataset using 3 models (macro scores in brackets)

Model                   Accuracy   Precision         Recall            F1-Score
DecisionTreeClassifier  0.9995     0.9995 (0.8990)   0.9995 (0.9234)   0.9995 (0.9058)
MLPClassifier           0.9943     0.9943 (0.7747)   0.9943 (0.7831)   0.9943 (0.7753)
XGBClassifier           0.9996     0.9996 (0.9823)   0.9996 (0.9395)   0.9996 (0.9534)

Class imbalance is challenging to mitigate in this use case, since any real-world model would naturally see significantly more benign traffic than malicious. A naive classifier that merely predicted all classes to be benign would likely achieve an accuracy score of around 80%, given the properties of our test sample, hence the need to also examine the precision, recall and f1 statistics. The runtimes of the three classifiers, using a CPU implementation of scikit-learn, were 4 min 59 s, 52 min 54 s, and 1 h 25 min 37 s respectively, making DecisionTreeClassifier a significantly faster approach to employ, whilst XGBClassifier provides the greatest macro scores in terms of precision and recall.

CICIDS 2017: Training a federated model

We now consider how the CICIDS 2017 dataset holds up as a federated learning problem. As described, the dataset provides details of a variety of attacks observed from packet capture flows across a corporate network. Adopting a centralised approach may not be suitable for a number of reasons, which could include operating over multiple sites, sub-contracting, and multi-national regulation. Furthermore, for a large corporate organisation, it may not be ideal to share such diagnostic information centrally where it impacts on the throughput of other organisational services. Finally, with data remaining at the edge of the network, at or near the point of creation, there are security benefits in that data cannot be intercepted in transit. Khramtsova et al. [30] provide a case of using federated learning across multiple Security Operations Centres (SOCs), where organisations may want to collectively share a model of threat intelligence without disclosing any intellectual property that may be present in their own monitoring and analysis practice. We consider two cases: 10 federated clients and 100 federated clients. For each, we study different means by which the data may be distributed across multiple parties. This can include having clients that only observe a specific or small group of possible attacks, but that want to benefit from the global model that represents all known attacks. The federation may also be split based on subnets of the organisation, or based on temporal aspects of when the observations are made. Our experimentation is to assess whether a federated approach can achieve a classification model that is comparable to that of the centralised learning model


Table 3 Details about the network topology used to model CICIDS2017

Device                              IP address
Firewall                            205.174.165.80, 172.16.0.1
DNS + DC Server                     192.168.10.3
External: Kali                      205.174.165.73
External: Win                       205.174.165.69, 70, 71
Internal: Web server 16 Public      192.168.10.50, 205.174.165.68
Internal: Ubuntu server 12 Public   192.168.10.51, 205.174.165.66
Internal: Ubuntu 14.4, 32B          192.168.10.19
Internal: Ubuntu 14.4, 64B          192.168.10.17
Internal: Ubuntu 16.4, 32B          192.168.10.16
Internal: Ubuntu 16.4, 64B          192.168.10.12
Internal: Win 7 Pro, 64B            192.168.10.9
Internal: Win 8.1, 64B              192.168.10.5
Internal: Win Vista, 64B            192.168.10.8
Internal: Win 10, pro 32B           192.168.10.14
Internal: Win 10, 64B               192.168.10.15
Internal: macOS                     192.168.10.25

reported previously. A model of similar performance (97% accuracy or above) would be regarded as highly desirable, given the performance on the classification task, whilst also operating in a distributed, privacy-preserving manner that can help to ensure the confidentiality and integrity of data, given that the original data no longer has to be transmitted across the network. We use the Flower library to modify our previous pipeline to support a federated approach. The CICIDS2017 dataset is provided with a clear topology of the network that details the following information (Table 3). For our study, we can separate the CICIDS2017 dataset across a set of distributed edge clients. Whilst the simplest approach would be to randomly distribute data across each client node, in practice one may question why this would be appropriate. We therefore take a more structured approach for this distribution, based on the known topology of the network infrastructure. We also compare this to a stratified sampling approach, whereby each client is provided with a subset of the original training data that is representative of all classes. We consider five schemes:

1. Each client holds a stratified subset of the original training data (we demonstrate this across 10 clients).
2. Each client holds data only related to its own IP address (a sketch of this split follows the list).
3. Each client holds data related to a specific Operating System group (Ubuntu, Windows, macOS).
4. Each client holds data only related to an individual attack class.
5. Each client holds data related to a specific attack group (DoS, PortScan, Patator, Web attack, Other).
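As an illustration of scheme 2 (and, with a grouping step, scheme 3), the sketch below partitions flows by device IP using pandas. The column names are assumptions based on the dataset description rather than verified headers:

```python
import pandas as pd

def split_by_ip(df: pd.DataFrame, client_ips: list) -> dict:
    """Give each client every flow where its IP is source or destination.

    Column names are assumed; note that flows between two monitored hosts
    appear in both clients' shares, mirroring what each device would
    genuinely see on its own interface.
    """
    return {ip: df[(df["Src IP"] == ip) | (df["Dst IP"] == ip)]
            for ip in client_ips}
```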


Finally, we consider temporal aspects of federated learning. Whilst we could in theory include this as part of our study, to examine the performance of FL over time, due to how the attacks are distributed across this dataset it would not provide a suitable or reliable means of testing. Nevertheless, we note this as an important aspect for future work: how the model improves over time in an FL context, and also what the privacy implications of temporal models are. The FL model uses a fully connected neural network developed in Tensorflow. We use a simple model for the purposes of our demonstration, with an input of 79 features that are fed to a dense layer with 256 nodes, a dense layer with 128 nodes, and a final dense layer with 25 nodes that denotes our expected classes. Sigmoid activations are performed at each layer of the network. For our FL experimentation, we do not conduct experimentation to modify the model parameters or architecture further, as this is not the core focus of this research. All testing was performed on a single system with an Intel Core i7-8665U CPU @ 1.90 GHz and 32 GB RAM, using a Flower simulation process to model the distributed client connections.
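The model just described is small enough to sketch directly. The text applies sigmoid activations at every layer; in the sketch below the hidden layers use sigmoid, but a conventional softmax output is assumed so as to pair with the sparse categorical cross-entropy loss noted in the Results, so treat the output activation as an assumption.

```python
import tensorflow as tf

def build_model(num_features=79, num_classes=25):
    """Fully connected classifier trained at each federated client."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(num_features,)),
        tf.keras.layers.Dense(256, activation="sigmoid"),
        tf.keras.layers.Dense(128, activation="sigmoid"),
        # Softmax output assumed here to pair with the sparse categorical
        # cross-entropy loss; the text applies sigmoid at every layer.
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```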

Results

We present five different strategies for curating a federated learning approach using the CICIDS2017 dataset, and provide accuracy results for the centralised server model under these conditions. We describe the rationale of the data distributions across the clients, and relate this to how a distributed Security Operations Centre (SOC) could function if tasked with the monitoring of multiple end-users, where privacy should be maintained between end-users. For our testing we use a simple Sequential Keras model that consists of 3 layers of densely-connected nodes, with 256, 128 and 25 nodes respectively. The revised CICIDS2017 dataset provides 25 class labels due to differentiating between attempted attacks and successful attacks. We use the Adam optimizer, with sparse categorical cross-entropy as our loss function. We run our federated learning simulation over 5 rounds, where for each round we train a local model on all clients, and we choose a random sample of 50% of clients to perform evaluation on. All clients then contribute their model weights to the central server, where federated averaging is applied, and a final evaluation is performed on the server after each training round, using the test data partition as used in our earlier machine learning experimentation. The distribution of training data used to initialise each client is drawn from the original training data split, as per the earlier machine learning experimentation, based on one of the five schemes outlined below. We report the accuracy results of each training round as determined by the server. All scripts used for the purpose of experimentation are available online at https://github.com/pa-legg/federated_learning/. Below we outline the five different federated schemes used for experimentation, with accuracy results for each presented in Table 4.
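Before detailing the schemes, a sketch of the corresponding simulation driver is shown below, reusing the IDSClient and build_model sketches above. It assumes the Flower 1.x simulation API; load_client_data is a hypothetical helper implementing one of the five distribution schemes, and the released scripts in the repository noted above should be treated as authoritative.

```python
import flwr as fl

def client_fn(cid: str):
    """Build the simulated client `cid` with its share of the data.

    load_client_data is a hypothetical helper. Newer Flower releases
    expect IDSClient(...).to_client() to be returned instead.
    """
    x_train, y_train, x_val, y_val = load_client_data(int(cid))
    return IDSClient(build_model(), x_train, y_train, x_val, y_val)

strategy = fl.server.strategy.FedAvg(
    fraction_fit=1.0,        # train on all clients each round
    fraction_evaluate=0.5,   # evaluate on a random 50% of clients
)

history = fl.simulation.start_simulation(
    client_fn=client_fn,
    num_clients=10,
    config=fl.server.ServerConfig(num_rounds=5),
    strategy=strategy,
)
```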


Table 4 Results for five different federated schemes. We report the accuracy score on the server test set for each iteration, for a total of five iterations

Scheme                Round 0   Round 1   Round 2   Round 3   Round 4   Round 5
Stratified sampling   0.06823   0.98576   0.99155   0.99218   0.99244   0.99239
Individual IP         0.0       0.98478   0.99117   0.99206   0.99219   0.99239
IP group              0.00071   0.99126   0.99235   0.99247   0.99222   0.99310
Individual attack     0.0       0.99159   0.99234   0.99307   0.99338   0.99335
Attack group          0.0       0.99185   0.99248   0.99322   0.99326   0.99352

– Split by stratified sampling (10 clients): We use the StratifiedKFold function from scikit-learn to obtain N = 10 groups based on the original centralised training dataset used previously, which are then distributed to each of the client devices. The central server holds the test dataset that was previously used. Each client provides a model trained on its respective data, and then the model parameters are transmitted centrally and combined using federated averaging.
– Split by individual IP: For this test, we split the dataset across 12 clients, where each client receives a subset of the original training data. The data is split based on each individual IP as given in the network topology. Given a client IP address, we create a subset that contains all data instances where that IP occurs as either a source IP or a destination IP. This results in 12 groups of 77,735, 110,885, 106,630, 165,510, 130,653, 114,619, 104,803, 87,468, 89,821, 85,393, 52,044, and 387,245 instances respectively.
– Split by IP group: For this test, we split the dataset across 4 distinct groups based on the IP addresses. The network topology provided by CICIDS2017 shows Ubuntu, Windows, and macOS devices, as well as two devices that act as web servers. We therefore split based on these four groupings, giving 77,735, 628,297, 419,530, and 387,261 instances in each group respectively.
– Split by individual attack: For this test, we split the dataset so each individual attack is observed independently by a separate client. This results in 25 clients, with 1,325,654, 3178, 2384, 9, 6, 3201, 1365, 1394, 2694, 126,775, 463, 6053, 9, 64, 121, 971, 13, 26, 522, 22, 10, 1176, and 590 instances respectively.
– Split by attack group: We split the dataset so that similar attacks are grouped to be observed by the same client. We define the following groups: DDoS/DoS (218,043), Patator (5577), Heartbleed (9), Web Attacks (1646), Bot (1766), PortScan (127,218), Infiltration (39), and benign (1,325,654).

Table 5 shows the results per class for two of the schemes outlined previously: the stratified sampling scheme and the group-by-attack scheme. We choose these two schemes as they exhibited the greatest accuracy difference in the previous experiment, with grouping by attack showing the greatest improvement in accuracy. Whilst a number of classes are able to achieve high accuracy (>99%) for both schemes, there are other key details to examine from this result.


Table 5 Results for stratified sampling and for attack group, per attack class. The table shows the number of instances of each class in the test data (y_test), the total predicted cases, the number of predicted cases that are correct, and the percentage of correct cases compared with the test data. Rows denoted by ∗ indicate the 'attempted' classes from the improved CICIDS2017 dataset

| Attack | y_test | Total (stratified) | Correct (stratified) | % (stratified) | Total (by attack) | Correct (by attack) | % (by attack) |
|---|---|---|---|---|---|---|---|
| BENIGN | 331,415 | 331,424 | 329,794 | 99.51 | 331,158 | 329,980 | 99.57 |
| PortScan | 31,805 | 32,510 | 31,026 | 97.55 | 32,280 | 30,975 | 97.39 |
| DoS Hulk | 31,694 | 31,698 | 31,687 | 99.98 | 31,694 | 31,689 | 99.98 |
| DDoS | 19,025 | 19,017 | 19,017 | 99.96 | 19,018 | 19,017 | 99.96 |
| DoS GoldenEye | 1514 | 1512 | 1502 | 99.21 | 1504 | 1503 | 99.27 |
| DoS slowloris | 800 | 772 | 756 | 94.5 | 795 | 792 | 99.00 |
| FTP-Patator | 795 | 794 | 791 | 99.5 | 797 | 794 | 99.87 |
| DoS Slowhttptest ∗ | 673 | 767 | 671 | 99.7 | 766 | 672 | 99.85 |
| SSH-Patator | 596 | 590 | 584 | 97.99 | 590 | 589 | 98.83 |
| DoS Slowhttptest | 348 | 336 | 326 | 93.68 | 355 | 344 | 98.85 |
| DoS slowloris ∗ | 341 | 325 | 323 | 94.72 | 342 | 340 | 99.71 |
| Bot ∗ | 294 | 0 | 0 | 0 | 0 | 0 | 0 |
| Web Attack—Brute Force ∗ | 243 | 0 | 0 | 0 | 384 | 241 | 99.18 |
| Bot | 148 | 141 | 134 | 90.54 | 147 | 143 | 96.62 |
| Web Attack—XSS ∗ | 130 | 0 | 0 | 0 | 0 | 0 | 0 |
| DoS Hulk ∗ | 116 | 92 | 91 | 78.45 | 142 | 114 | 98.28 |
| Web Attack—Brute Force | 30 | 27 | 19 | 63.33 | 29 | 23 | 76.67 |
| DoS GoldenEye ∗ | 16 | 0 | 0 | 0 | 0 | 0 | 0 |
| Infiltration | 6 | 0 | 0 | 0 | 2 | 2 | 33.33 |
| Web Attack—XSS | 5 | 0 | 0 | 0 | 0 | 0 | 0 |
| Infiltration ∗ | 3 | 0 | 0 | 0 | 0 | 0 | 0 |
| Heartbleed | 2 | 0 | 0 | 0 | 2 | 2 | 100 |
| FTP-Patator ∗ | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
| SSH-Patator ∗ | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
| Web Attack—SQL Injection | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
| Total | 420,005 | 420,005 | 416,721 | 52.34 | 420,005 | 417,220 | 63.85 |
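The per-class figures in Table 5 can be derived from the test labels and the server model's predictions along the following lines; this is our own minimal sketch (array and function names are illustrative), not the authors' exact evaluation code.

```python
import numpy as np

def per_class_report(y_test, y_pred, class_names):
    """For each class: test instances, predicted total, correct count, and %."""
    rows = []
    for c, name in enumerate(class_names):
        actual = int(np.sum(y_test == c))                     # y_test column
        predicted = int(np.sum(y_pred == c))                  # Total column
        correct = int(np.sum((y_test == c) & (y_pred == c)))  # Correct column
        pct = round(100.0 * correct / actual, 2) if actual else 0.0
        rows.append((name, actual, predicted, correct, pct))
    return rows
```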

Firstly, we can see that there are some classes that never appear in the labels predicted by the model, such as 'Web Attack—SQL Injection' and 'Web Attack—XSS'. This is also true for some of the 'attempted' classes denoted by the asterisk, which are part of the improved CICIDS2017 dataset, such as 'SSH-Patator ∗'. It is important to note that the dataset does suffer from class imbalance; however, this is naturally inherent in this problem domain, with benign samples significantly outweighing the malicious attack types. The stratified sampling scheme uses 10 clients, yet Table 1 shows that the dataset contains only 8 instances of the attempted SSH-Patator and 11 instances of the attempted FTP-Patator. It is therefore likely that some of the client groups have very few, or possibly even no, instances of these particular attacks. In comparison, when we group the client data by attack type, we find that some classes can then be classified correctly, such as 'Web Attack—Brute Force', 'Infiltration', and 'Heartbleed', where the percentages of correct labels are 99.18, 33.33, and 100 respectively, a marked improvement from zero in the previous case. Between these two schemes for client data distribution, we observe an overall improvement in the server model's correct classifications from 416,721 to 417,220, an additional 499 correct classifications. The average accuracy across the classes increases from 52.34% to 63.85%.

Discussion

We have provided a practical case study of applying federated learning to a traditional IDS dataset. The experimentation shows that all models achieved accuracy in the region of 99% for this task; however, we have investigated this further to show the performance differences between different schemes for client data distribution. Table 5 shows this investigation for two of the schemes described: stratified sampling, and grouping by attack. Whilst a confusion matrix could be used to examine the performance, this becomes quite difficult given the number of classes and the imbalance of the benign class. Generally, where misclassification does occur, attacks tend to be classed as benign: despite an accuracy score of 99.57% for the benign class, we find 1178 samples that are incorrectly labelled as benign. Having said this, it is important to recognise that our federated learning approach is trained for only five rounds, and further training rounds could well improve this. It is also important to note that the federated learning approach can achieve accuracy scores of the same order as the centralised approach for this problem domain, with only a minor performance impact, whilst individual clients are only made aware of a fraction of the original data. This is a particularly useful result, since it shows that we can train partial models based on the attacks observed on one system, and couple these with a partial model that has observed different attacks on another system. Both systems can then benefit from this collective knowledge, as seen where we distribute the client data based on specific attack groups: one client may have observed only denial-of-service attacks whilst another may have observed only web attacks, and yet both can benefit from a federated model that encapsulates both, without exposing their observation data to the server. We envisage that SOC providers who manage security across multiple customers could benefit greatly from a distributed federated learning approach, combining the learning of a classification model across all customers without requiring any customer to transmit their own data to the central provider.
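To make the aggregation step concrete, a minimal sketch of federated averaging over client weights might look like the following. The function name and the optional size-weighting are our own illustrative choices rather than the authors' exact implementation (their scripts are available at the repository linked above).

```python
import numpy as np

def federated_average(client_weights, client_sizes=None):
    """Average per-client model weights into a single set of global weights.

    client_weights: one list of numpy arrays per client, e.g. from
    keras_model.get_weights(). client_sizes: optional training-set sizes;
    if given, clients are weighted by their share of the data.
    """
    n = len(client_weights)
    if client_sizes is None:
        coeffs = [1.0 / n] * n
    else:
        total = float(sum(client_sizes))
        coeffs = [s / total for s in client_sizes]
    # For each layer, take the (weighted) element-wise mean across clients.
    return [
        sum(c * w for c, w in zip(coeffs, layer))
        for layer in zip(*client_weights)
    ]
```

The server would then apply the result with `server_model.set_weights(...)` before evaluating on the held-out test partition.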


5 Open Issues and Future Trends

Federated learning offers an emerging and innovative framework to facilitate collaborative learning. It is an active and ongoing area of research that has already shown real-world application and benefits. Whilst recent works have begun to address some of the issues and challenges discussed in this chapter, there are a number of open issues and areas for future consideration, which we discuss here.

Fairness in FL focuses on equalising performance across the different participating devices or silos. Models may perform better on clients that have more data, marginalising the minorities in the distribution. Moreover, the process of federated learning itself introduces biases, either favouring devices that partake more often in federated learning, or favouring faster devices, as these are less likely to be stragglers and thus be dropped by a federated learning algorithm. Some approaches have modified the aggregation algorithms of the initial FedAvg model to improve fairness, such as that by Li et al. [31], who proposed q-Fair Federated Learning (q-FFL); this tunes a hyperparameter so that the worst-performing clients carry more weight in the overall loss, making the resulting model fairer (a sketch of the objective is given at the end of this discussion). Mohri, Sivek and Suresh [44] introduced the learning scenario of agnostic federated learning, in which the overall loss function is shaped to incentivise the model to perform equally on all devices, although this method has only been applied at a small scale. However, in a meritocracy, the devices that participate more in the training process should be rewarded accordingly: giving better models to those devices would incentivise people to contribute to the federated learning process instead of keeping their data to themselves. Zhang et al. [67] look to give the best models to the devices that have contributed the most, and the worst-performing models to those that have contributed the least. Alternatively, a collaborative fairness model weights contributions based on how useful the gradient updates from a given device were in training [38]. This partially addresses another form of attack, the free-riding attack, in which some of the clients participating in the training process are passive, contributing nothing to the global model while still receiving it. From analysis of the works on improving fairness, and of those that address improving the robustness of FL, these goals appear to be largely mutually exclusive, with few works attempting to address both aspects. It remains an open challenge to find a suitable balance and compromise between improving the fairness of models and improving their robustness.

Security and privacy are key drivers of federated learning, and yet how these are fully realised will continue to be an active area of research. The distributed nature of federated learning raises additional problems that must be considered, such as inference attacks, or the poisoning of local data or models. Research has proposed solutions to these challenges by adopting techniques such as differential privacy and secure aggregation algorithms [9, 21, 42, 66]. However, these approaches have been shown to be detrimental to model performance or to require additional computational resources; efficient algorithms that can operate with high accuracy and minimal computational requirements are therefore still needed.
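As a sketch of the q-FFL idea mentioned above, our rendering of the objective from [31] is the following (the notation is ours and may differ from the original):

$$\min_{w} \; f_q(w) = \sum_{k=1}^{m} \frac{p_k}{q+1} \, F_k^{\,q+1}(w),$$

where $F_k(w)$ is the local empirical loss of client $k$, $p_k$ is that client's share of the data, and $q \ge 0$ controls the fairness trade-off: $q = 0$ recovers the standard FedAvg objective, while larger $q$ places more weight on the clients with the highest loss.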


Hierarchical clustering has recently been proposed as an innovative approach to cluster clients that share similar behaviours. By building a generalised model that incorporates clusters in an equal fashion, learning bias can be reduced [12, 67]. Such approaches could potentially assist with detecting model poisoning by flagging abnormal updates, as poisoned data typically introduces bias (one way such clustering might be instantiated is sketched at the end of this section). Poisoned data and poisoned models can significantly alter the prediction outcome of the global model; therefore, when such dramatic changes occur, backward traceability mechanisms should be able to efficiently detect which client's aggregated update introduced the change of prediction to the global model. To address this, further research is required into how explainable AI [45] could be utilised to track changes within the global model whilst also preserving the privacy of the clients, and more broadly, how explainable AI can help better interpret the complexities of federated learning clients in a privacy-preserving fashion.

Partially local federated learning is another emerging research direction [54], whereby clients share only a subset of their knowledge in order to further enhance the security and privacy of model curation; the goal is to learn and serve a personalised model for each client, adapting the computational and communication costs to the client's capabilities. Related research by [15] introduces a federated few-shot learning framework that is able to classify unseen data classes with only a few data samples. This was tested on vision and language tasks, but the approach has not been tested in the cyber security domain or applied to intrusion detection systems. Few-shot learning systems are likely to increase in popularity as a means of learning with fewer data samples, especially in cases where the acquisition of data samples may be challenging.

Another important research direction that has recently emerged, proposed by Turina et al. [59, 60], combines federated learning and split learning for deep learning, as a means to reduce training time whilst increasing the privacy of clients. An end-to-end performance analysis of federated and split learning on resource-constrained IoT devices was performed by [18] using speech command datasets, which showed that for non-IID data the training cost was too expensive. It has not yet been investigated how this approach would apply to IDS or other cyber security related domains, or whether the same constraints apply in other domains.
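As an illustration only, clustering flattened client updates to flag abnormal (possibly poisoned) contributions might be sketched as follows; this is our own minimal example under assumed inputs, not a method taken from the works cited above.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def cluster_client_updates(client_updates, distance_threshold=1.0):
    """Group clients by the similarity of their (flattened) weight updates.

    client_updates: list of 1-D numpy arrays, e.g. the concatenated
    difference between each client's uploaded weights and the previous
    global weights. Returns one cluster label per client; singleton
    clusters are candidates for closer inspection as abnormal updates.
    """
    X = np.stack(client_updates)
    Z = linkage(X, method="ward")  # agglomerative hierarchical clustering
    return fcluster(Z, t=distance_threshold, criterion="distance")
```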

6 Conclusion

With increasingly distributed systems and greater reliance on edge computing, smartphones, and IoT, there is a real prospect of shifting where computation is performed. Furthermore, this shift enables us to rethink where data should reside in order to ensure end-user privacy. Federated learning has already been adopted in smartphone devices for this very purpose, as we have discussed, and it has since started to emerge in a variety of other domains, including cyber security and smart cities.


There are clear challenges ahead in the landscape of machine learning and cyber security: how machine learning can be used to enhance cyber security, whilst also ensuring that security and privacy are factored into machine learning deployment. Federated learning represents a notable shift from the centralised machine learning paradigm that helps work toward this goal. Notably, however, there is an increasing expectation of explainability in how machine learning decisions are made, whilst mechanisms such as federated learning deliberately support only a partial view of the information, which runs counter to this notion of full transparency over the data. Both developers and consumers increasingly expect trusted computing, yet federated learning systems arguably hide aspects of the overarching process, which could potentially impact how human trust and explainability are achieved. The conflict between end-user privacy and the broader corporate or government level of security has long been felt. We have set out a discussion of what a roadmap of future directions for federated learning may look like; it remains to be seen how we can operate with explainable models of security in a world where end-user privacy preservation is also required.

References

1. California Consumer Privacy Act (CCPA) (2018). https://oag.ca.gov/privacy/ccpa
2. General Data Protection Regulation (GDPR) (2022). https://gdpr-info.eu/
3. S. Abdulrahman, H. Tout, H. Ould-Slimane, A. Mourad, C. Talhi, M. Guizani, A survey on federated learning: the journey from centralized to distributed on-site learning and beyond. IEEE Internet Things J. 8(7), 5476–5497 (2021). https://doi.org/10.1109/JIOT.2020.3030072
4. M. Alazab, R.M. Swarna Priya, M. Parimala, P.K.R. Maddikunta, T.R. Gadekallu, Q.V. Pham, Federated learning for cybersecurity: concepts, challenges, and future directions. IEEE Trans. Ind. Inf. 18(5), 3501–3509 (2022). https://doi.org/10.1109/TII.2021.3119038
5. O. Aouedi, K. Piamrat, G. Muller, K. Singh, Fluids: federated learning with semi-supervised approach for intrusion detection system, in 2022 IEEE 19th Annual Consumer Communications & Networking Conference (CCNC) (2022), pp. 523–524. https://doi.org/10.1109/CCNC49033.2022.9700632
6. E. Bagdasaryan, A. Veit, Y. Hua, D. Estrin, V. Shmatikov, How to backdoor federated learning, in Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 108. PMLR (26–28 Aug 2020), pp. 2938–2948. https://proceedings.mlr.press/v108/bagdasaryan20a.html
7. A.N. Bhagoji, S. Chakraborty, P. Mittal, S. Calo, Analyzing federated learning through an adversarial lens, in Proceedings of the 36th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 97. PMLR (09–15 Jun 2019), pp. 634–643. https://proceedings.mlr.press/v97/bhagoji19a.html
8. A. Blanco-Justicia, J. Domingo-Ferrer, S. Martínez, D. Sánchez, A. Flanagan, K.E. Tan, Achieving security and privacy in federated learning systems: survey, research challenges and future directions. Eng. Appl. Artif. Intell. 106, 104468 (2021). https://doi.org/10.1016/j.engappai.2021.104468
9. K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H.B. McMahan, S. Patel, D. Ramage, A. Segal, K. Seth, Practical secure aggregation for privacy-preserving machine learning, in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. CCS '17. Association for Computing Machinery, New York (2017), pp. 1175–1191. https://doi.org/10.1145/3133956.3133982


10. K. Bonawitz, H. Eichner, W. Grieskamp, D. Huba, A. Ingerman, V. Ivanov, C. Kiddon, J. Konečný, S. Mazzocchi, B. McMahan, T. Van Overveldt, D. Petrou, D. Ramage, J. Roselander, Towards federated learning at scale: system design, in Proceedings of Machine Learning and Systems, vol. 1 (2019), pp. 374–388. https://proceedings.mlsys.org/paper/2019/file/bd686fd640be98efaae0091fa301e613-Paper.pdf
11. N. Bouacida, P. Mohapatra, Vulnerabilities in federated learning. IEEE Access 9, 63229–63249 (2021). https://doi.org/10.1109/ACCESS.2021.3075203
12. C. Briggs, Z. Fan, P. Andras, Federated learning with hierarchical clustering of local updates to improve training on non-IID data, in 2020 International Joint Conference on Neural Networks (IJCNN) (2020), pp. 1–9. https://doi.org/10.1109/IJCNN48605.2020.9207469
13. G. Engelen, V. Rimmer, W. Joosen, Troubleshooting an intrusion detection dataset: the CICIDS2017 case study, in 2021 IEEE Security and Privacy Workshops (SPW) (2021), pp. 7–12. https://doi.org/10.1109/SPW53761.2021.00009
14. A. Fallah, A. Mokhtari, A. Ozdaglar, Personalized federated learning with theoretical guarantees: a model-agnostic meta-learning approach, in Advances in Neural Information Processing Systems, vol. 33. Curran Associates, Inc. (2020), pp. 3557–3568. https://proceedings.neurips.cc/paper/2020/file/24389bfe4fe2eba8bf9aa9203a44cdad-Paper.pdf
15. C. Fan, J. Huang, Federated few-shot learning with adversarial learning, in 2021 19th International Symposium on Modeling and Optimization in Mobile, Ad hoc, and Wireless Networks (WiOpt) (2021), pp. 1–8. https://doi.org/10.23919/WiOpt52861.2021.9589192
16. Y. Fan, Y. Li, M. Zhan, H. Cui, Y. Zhang, IoTdefender: a federated transfer learning intrusion detection framework for 5G IoT, in 2020 IEEE 14th International Conference on Big Data Science and Engineering (BigDataSE). IEEE (2020), pp. 88–95. https://doi.org/10.1109/BigDataSE50710.2020.00020
17. M.A. Ferrag, O. Friha, D. Hamouda, L. Maglaras, H. Janicke, Edge-IIoTset: a new comprehensive realistic cyber security dataset of IoT and IIoT applications for centralized and federated learning. IEEE Access 10, 40281–40306 (2022). https://doi.org/10.1109/ACCESS.2022.3165809
18. Y. Gao, M. Kim, S. Abuadbba, Y. Kim, C. Thapa, K. Kim, S.A. Camtepe, H. Kim, S. Nepal, End-to-end evaluation of federated learning and split learning for internet of things. arXiv preprint arXiv:2003.13376 (2020). https://doi.org/10.48550/arXiv.2003.13376
19. J. Geiping, H. Bauermeister, H. Dröge, M. Moeller, Inverting gradients – how easy is it to break privacy in federated learning?, in Advances in Neural Information Processing Systems, vol. 33. Curran Associates, Inc. (2020), pp. 16937–16947. https://proceedings.neurips.cc/paper/2020/file/c4ede56bbd98819ae6112b20ac6bf145-Paper.pdf
20. B. Ghimire, D.B. Rawat, Recent advances on federated learning for cybersecurity and cybersecurity for federated learning for internet of things. IEEE Internet Things J. 9(11), 8229–8249 (2022). https://doi.org/10.1109/JIOT.2022.3150363
21. R. Gosselin, L. Vieu, F. Loukil, A. Benoit, Privacy and security in federated learning: a survey. Appl. Sci. 12(19) (2022). https://doi.org/10.3390/app12199901
22. B. Gu, A. Xu, Z. Huo, C. Deng, H. Huang, Privacy-preserving asynchronous vertical federated learning algorithms for multiparty collaborative learning. IEEE Trans. Neural Netw. Learn. Syst. (2021), pp. 1–13. https://doi.org/10.1109/TNNLS.2021.3072238
23. V. Hahanov, A. Saprykin, Federated machine learning architecture for searching malware, in 2021 IEEE East-West Design & Test Symposium (EWDTS) (2021), pp. 1–4. https://doi.org/10.1109/EWDTS52692.2021.9581000
24. M. Hao, H. Li, X. Luo, G. Xu, H. Yang, S. Liu, Efficient and privacy-enhanced federated learning for industrial artificial intelligence. IEEE Trans. Ind. Inf. 16(10), 6532–6542 (2020). https://doi.org/10.1109/TII.2019.2945367
25. R.H. Hsu, Y.C. Wang, C.I. Fan, B. Sun, T. Ban, T. Takahashi, T.W. Wu, S.W. Kao, A privacy-preserving federated learning system for android malware detection based on edge computing, in 2020 15th Asia Joint Conference on Information Security (AsiaJCIS). IEEE (2020), pp. 128–136. https://doi.org/10.1109/AsiaJCIS50894.2020.00031


26. E. Jeong, S. Oh, H. Kim, J. Park, M. Bennis, S.L. Kim, Communication-efficient on-device machine learning: federated distillation and augmentation under non-IID private data. arXiv preprint arXiv:1811.11479 (2018). https://doi.org/10.48550/arXiv.1811.11479
27. P. Kairouz, H.B. McMahan, B. Avent, A. Bellet, M. Bennis, A.N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummings, R.G.L. D'Oliveira, H. Eichner, S.E. Rouayheb, D. Evans, J. Gardner, Z. Garrett, A. Gascón, B. Ghazi, P.B. Gibbons, M. Gruteser, Z. Harchaoui, C. He, L. He, Z. Huo, B. Hutchinson, J. Hsu, M. Jaggi, T. Javidi, G. Joshi, M. Khodak, J. Konecný, A. Korolova, F. Koushanfar, S. Koyejo, T. Lepoint, Y. Liu, P. Mittal, M. Mohri, R. Nock, A. Özgür, R. Pagh, H. Qi, D. Ramage, R. Raskar, M. Raykova, D. Song, W. Song, S.U. Stich, Z. Sun, A.T. Suresh, F. Tramèr, P. Vepakomma, J. Wang, L. Xiong, Z. Xu, Q. Yang, F.X. Yu, H. Yu, S. Zhao, Advances and open problems in federated learning. Found. Trends Mach. Learn. 14(1–2), 1–210 (2021). https://doi.org/10.1561/2200000083
28. L.U. Khan, W. Saad, Z. Han, E. Hossain, C.S. Hong, Federated learning for internet of things: recent advances, taxonomy, and open challenges. IEEE Commun. Surv. Tutorials 23(3), 1759–1799 (2021). https://doi.org/10.1109/COMST.2021.3090430
29. T.V. Khoa, Y.M. Saputra, D.T. Hoang, N.L. Trung, D. Nguyen, N.V. Ha, E. Dutkiewicz, Collaborative learning model for cyberattack detection systems in IoT industry 4.0, in 2020 IEEE Wireless Communications and Networking Conference (WCNC). IEEE (2020), pp. 1–6. https://doi.org/10.1109/WCNC45663.2020.9120761
30. E. Khramtsova, C. Hammerschmidt, S. Lagraa, R. State, Federated learning for cyber security: SOC collaboration for malicious URL detection, in 2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS) (2020), pp. 1316–1321. https://doi.org/10.1109/ICDCS47774.2020.00171
31. T. Li, M. Sanjabi, A. Beirami, V. Smith, Fair resource allocation in federated learning. arXiv preprint arXiv:1905.10497 (2019). https://doi.org/10.48550/arXiv.1905.10497
32. X. Li, K. Huang, W. Yang, S. Wang, Z. Zhang, On the convergence of FedAvg on non-IID data. arXiv preprint arXiv:1907.02189 (2019). https://doi.org/10.48550/arXiv.1907.02189
33. T. Li, A.K. Sahu, A. Talwalkar, V. Smith, Federated learning: challenges, methods, and future directions. IEEE Signal Process. Mag. 37(3), 50–60 (2020). https://doi.org/10.1109/MSP.2020.2975749
34. J. Li, L. Lyu, X. Liu, X. Zhang, X. Lyu, FLEAM: a federated learning empowered architecture to mitigate DDoS in industrial IoT. IEEE Trans. Ind. Inf. 18(6), 4059–4068 (2021). https://doi.org/10.1109/TII.2021.3088938
35. K.Y. Lin, W.R. Huang, Using federated learning on malware classification, in 2020 22nd International Conference on Advanced Communication Technology (ICACT) (2020), pp. 585–589. https://doi.org/10.23919/ICACT48636.2020.9061261
36. Y. Liu, Y. Kang, C. Xing, T. Chen, Q. Yang, A secure federated transfer learning framework. IEEE Intell. Syst. 35(4), 70–82 (2020). https://doi.org/10.1109/MIS.2020.2988525
37. Y. Liu, S. Garg, J. Nie, Y. Zhang, Z. Xiong, J. Kang, M.S. Hossain, Deep anomaly detection for time-series data in industrial IoT: a communication-efficient on-device federated learning approach. IEEE Internet Things J. 8(8), 6348–6358 (2020). https://doi.org/10.1109/JIOT.2020.3011726
38. L. Lyu, X. Xu, Q. Wang, H. Yu, Collaborative Fairness in Federated Learning. Springer International Publishing, Cham (2020), pp. 189–204. https://doi.org/10.1007/978-3-030-63076-8_14
39. L. Lyu, H. Yu, Q. Yang, Threats to federated learning: a survey. CoRR abs/2003.02133 (2020). https://doi.org/10.48550/arXiv.2003.02133
40. C. Ma, J. Li, M. Ding, H.H. Yang, F. Shu, T.Q.S. Quek, H.V. Poor, On safeguarding privacy and security in the framework of federated learning. IEEE Netw. 34(4), 242–248 (2020). https://doi.org/10.1109/MNET.001.1900506
41. B. McMahan, E. Moore, D. Ramage, S. Hampson, B.A.y. Arcas, Communication-efficient learning of deep networks from decentralized data, in Proceedings of the 20th International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 54. PMLR (20–22 Apr 2017), pp. 1273–1282. https://proceedings.mlr.press/v54/mcmahan17a.html


42. H.B. McMahan, D. Ramage, K. Talwar, L. Zhang, Learning differentially private recurrent language models. arXiv preprint arXiv:1710.06963 (2017). https://doi.org/10.48550/arXiv.1710.06963
43. L. Melis, C. Song, E. De Cristofaro, V. Shmatikov, Exploiting unintended feature leakage in collaborative learning, in 2019 IEEE Symposium on Security and Privacy (SP). IEEE (2019), pp. 691–706. https://doi.org/10.1109/SP.2019.00029
44. M. Mohri, G. Sivek, A.T. Suresh, Agnostic federated learning, in Proceedings of the 36th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 97. PMLR (09–15 Jun 2019), pp. 4615–4625. https://proceedings.mlr.press/v97/mohri19a.html
45. R.K. Mothilal, A. Sharma, C. Tan, Explaining machine learning classifiers through diverse counterfactual explanations, in Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. FAT* '20. Association for Computing Machinery, New York (2020), pp. 607–617. https://doi.org/10.1145/3351095.3372850
46. V. Mothukuri, R.M. Parizi, S. Pouriyeh, Y. Huang, A. Dehghantanha, G. Srivastava, A survey on security and privacy of federated learning. Futur. Gener. Comput. Syst. 115, 619–640 (2021). https://doi.org/10.1016/j.future.2020.10.007
47. V. Mothukuri, P. Khare, R.M. Parizi, S. Pouriyeh, A. Dehghantanha, G. Srivastava, Federated-learning-based anomaly detection for IoT security attacks. IEEE Internet Things J. 9(4), 2545–2554 (2022). https://doi.org/10.1109/JIOT.2021.3077803
48. M. Nasr, R. Shokri, A. Houmansadr, Comprehensive privacy analysis of deep learning: passive and active white-box inference attacks against centralized and federated learning, in 2019 IEEE Symposium on Security and Privacy (SP). IEEE (2019), pp. 739–753. https://doi.org/10.1109/SP.2019.00065
49. T.D. Nguyen, S. Marchal, M. Miettinen, H. Fereidooni, N. Asokan, A.R. Sadeghi, DÏoT: a federated self-learning anomaly detection system for IoT, in 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS). IEEE (2019), pp. 756–767. https://doi.org/10.1109/ICDCS.2019.00080
50. T.D. Nguyen, P. Rieger, M. Miettinen, A.R. Sadeghi, Poisoning attacks on federated learning-based IoT intrusion detection system, in Proceedings of Workshop on Decentralized IoT Syst. Secur. (DISS) (2020), pp. 1–7
51. S.A. Rahman, H. Tout, C. Talhi, A. Mourad, Internet of things intrusion detection: centralized, on-device, or federated learning? IEEE Netw. 34(6), 310–317 (2020). https://doi.org/10.1109/MNET.011.2000286
52. V. Rey, P.M.S. Sánchez, A.H. Celdrán, G. Bovet, Federated learning for malware detection in IoT devices. Comput. Netw. 204, 108693 (2022). https://doi.org/10.1016/j.comnet.2021.108693
53. I. Sharafaldin, A.H. Lashkari, A.A. Ghorbani, Toward generating a new intrusion detection dataset and intrusion traffic characterization, in ICISSP (2018)
54. K. Singhal, H. Sidahmed, Z. Garrett, S. Wu, J. Rush, S. Prakash, Federated reconstruction: partially local federated learning, in Advances in Neural Information Processing Systems, vol. 34. Curran Associates, Inc. (2021), pp. 11220–11232. https://proceedings.neurips.cc/paper/2021/file/5d44a2b0d85aa1a4dd3f218be6422c66-Paper.pdf
55. Y. Sun, H. Ochiai, H. Esaki, Intrusion detection with segmented federated learning for large-scale multiple LANs, in 2020 International Joint Conference on Neural Networks (IJCNN). IEEE (2020), pp. 1–8. https://doi.org/10.1109/IJCNN48605.2020.9207094
56. Z. Sun, P. Kairouz, A.T. Suresh, H.B. McMahan, Can you really backdoor federated learning? arXiv preprint arXiv:1911.07963 (2019). https://doi.org/10.48550/arXiv.1911.07963
57. R. Taheri, M. Shojafar, M. Alazab, R. Tafazolli, Fed-IIoT: a robust federated malware detection architecture in industrial IoT. IEEE Trans. Ind. Inf. 17(12), 8442–8452 (2020). https://doi.org/10.1109/TII.2020.3043458
58. S. Truex, N. Baracaldo, A. Anwar, T. Steinke, H. Ludwig, R. Zhang, Y. Zhou, A hybrid approach to privacy-preserving federated learning, in Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security. AISec'19. Association for Computing Machinery, New York (2019), pp. 1–11. https://doi.org/10.1145/3338501.3357370


59. V. Turina, Z. Zhang, F. Esposito, I. Matta, Combining split and federated architectures for efficiency and privacy in deep learning, in Proceedings of the 16th International Conference on emerging Networking EXperiments and Technologies (2020), pp. 562–563. https://doi.org/10.1145/3386367.3431678
60. V. Turina, Z. Zhang, F. Esposito, I. Matta, Federated or split? A performance and privacy analysis of hybrid split and federated learning architectures, in 2021 IEEE 14th International Conference on Cloud Computing (CLOUD) (2021), pp. 250–260. https://doi.org/10.1109/CLOUD53861.2021.00038
61. Z. Wang, M. Song, Z. Zhang, Y. Song, Q. Wang, H. Qi, Beyond inferring class representatives: user-level privacy leakage from federated learning, in IEEE INFOCOM 2019 – IEEE Conference on Computer Communications. IEEE (2019), pp. 2512–2520. https://doi.org/10.1109/INFOCOM.2019.8737416
62. H. Wang, K. Sreenivasan, S. Rajput, H. Vishwakarma, S. Agarwal, J.y. Sohn, K. Lee, D. Papailiopoulos, Attack of the tails: yes, you really can backdoor federated learning. Adv. Neural Inf. Process. Syst. 33, 16070–16084 (2020). https://proceedings.neurips.cc/paper/2020/file/b8ffa41d4e492f0fad2f13e29e1762eb-Paper.pdf
63. X. Wang, S. Garg, H. Lin, J. Hu, G. Kaddoum, M.J. Piran, M.S. Hossain, Toward accurate anomaly detection in industrial internet of things using hierarchical federated learning. IEEE Internet Things J. 9(10), 7110–7119 (2021). https://doi.org/10.1109/JIOT.2021.3074382
64. G. Wilkinson, P. Legg, "What did you say?": extracting unintentional secrets from predictive text learning systems, in 2020 International Conference on Cyber Security and Protection of Digital Services (Cyber Security) (2020), pp. 1–8. https://doi.org/10.1109/CyberSecurity49315.2020.9138882
65. C. Xu, G. Neglia, What else is leaked when eavesdropping Federated Learning?, in CCS Workshop on Privacy Preserving Machine Learning (PPML), Seoul (Nov 2021), virtual, contributed talk. https://doi.org/10.1145/1122445.1122456
66. Q. Yang, Y. Liu, T. Chen, Y. Tong, Federated machine learning: concept and applications. ACM Trans. Intell. Syst. Technol. 10(2) (2019). https://doi.org/10.1145/3298981
67. J. Zhang, C. Li, A. Robles-Kelly, M. Kankanhalli, Hierarchically fair federated learning. arXiv preprint arXiv:2004.10386 (2020). https://doi.org/10.48550/arXiv.2004.10386

Emerging Computer Security Laws and Regulations Across the Globe: A Comparison Between Sri Lankan and Contemporary International Computer Acts

S. Y. Rajapaksha, L. G. P. K. Guruge, and S. L. P. Yasakethu

1 Sri Lankan Context

In order to evaluate Sri Lankan computer security laws against the emerging international laws, a clear understanding of Sri Lanka and its existing legislation is required. This section provides a detailed explanation of Sri Lanka, its technological background, and its legislation on computer security. Further explanation of the cyberthreat landscape is also included, to elaborate on the existing cybersecurity infrastructure in the country.

Background of Sri Lanka and Technology

The island nation of Sri Lanka, historically known as Ceylon, is located in the Indian Ocean and is separated from the Indian subcontinent by the Palk Strait. It has a maximum length of 268 miles (432 km) and a maximum width of 139 miles. Due to its location at the intersection of several sea lanes that span the Indian Ocean, Sri Lanka has been influenced culturally by other Asian civilizations. Greek geographers named it Taprobane; Arabs referred to it as Serendib. Later European cartographers called it Ceylon, a name still sometimes used in trade. In 1972, it was given the name Sri Lanka. Sri Lanka gained independence in 1948, ending nearly 150 years of British control, and was admitted to the UN seven years later. The nation belongs to both the South Asian Association for Regional Cooperation and the Commonwealth.


The population of Sri Lanka is about 20 million. Sri Lanka is gradually adopting new advancements alongside the rest of the world, and technology is one of them. Compared with global technological progress, Sri Lanka has developed slowly, in part because the country has had to face other social, economic and political circumstances. As for digital literacy, the Department of Census and Statistics estimated that 38.7% of Sri Lankans were digitally literate in 2017, defined as being able to use a computer, laptop, tablet, or smartphone without assistance. The rates of digital literacy are 36.4% and 16.4% in the urban and suburban sectors, respectively.

Even though Sri Lanka is not as highly developed technologically, the impact of technological advancement has affected Sri Lanka as well. Mobile devices are now nearly ubiquitous, and everyone, in one way or another, directly or indirectly, works or deals with a computer system. As a coin has two sides, technology has not advanced only in its ethical uses; unethical practices of technology have also developed over time. Because technology is embedded in almost everyone's day-to-day life, people are highly likely to be affected by attacks and crimes in the digital and cyber space.

Sri Lanka is ranked 79 out of 190 members of the United Nations in terms of its progress toward a digital administration, according to the E-Government Development Index (EGDI). Sri Lanka received a score of 0.6522 in the online service index (OSI), which examines all digital government applications, 0.2445 in the telecommunication infrastructure index (TII), which examines the state of the Internet's facilities and usage, and 0.7363 in the human capital index (HCI), which examines adult literacy and educational attainment. Therefore, Sri Lanka outperforms the worldwide average in the online service index and human capital index, but falls short of the average in the telecommunication infrastructure index [1].

What Is a Crime? What Is a Computer Crime?

The word "crime" can mean different things in different settings. The Sri Lankan penal code (Sec. 38-1) uses the term "offense" to designate such acts and omissions, although in other jurisdictions the term "crime" is used to denote an act or omission banned under penal legislation. Computers have become significant assets for everyone, and practically every aspect of the computer industry, both domestically and internationally, is fueling modern society's technical advancement in both the social and economic spheres. As computer usage increases, the nature and the number of offenses have started to rise in tandem. The broad definition of computer crime, according to Furnell (2002), is a criminal offense in which the offender uses specialized technical skills.


In a computer crime, a computer or network may be the source or the target of the criminal action. Computer crime can be broadly characterized as illegal conduct involving computer network infrastructure, illicit access, device abuse, electronic fraud, and forgery. The crime can target computer networks or devices directly, or be facilitated by them. Because the criminal provisions of the Sri Lankan penal code deal with crimes against people and property, the word "property" does not cover computer devices or cyberspace, and the definitions given are inadequate for addressing technological system crimes, the Sri Lankan Computer Crimes Act was proposed and introduced in 2007 [4]. There is only a thin line between computer crimes and cybercrimes: cybercrimes use computer devices together with the Internet, communication resources or any other resource in cyberspace, while computer crimes are non-traditional crimes involving digital devices more generally.

Overview of Computer Crimes Act 2007

The Computer Crimes Act of Sri Lanka was proposed and enacted in 2007. The Act is divided into three main parts: computer crime, investigations, and miscellaneous. Under the first part, the identified crimes are categorized as offenses concerning:

• Unauthorized access (section 03)
• Ulterior intent offenses (section 04)
• Unauthorized modification (section 05)
• National security and public security (section 06)
• Illegal obtaining of data (section 07)
• Illegal interception (section 08)
• Illegal computer devices (section 09)
• Unauthorized disclosures (section 10)
• Attempts to commit offenses (section 11)

The second part (investigation) focuses on the investigation of offenses in the digital space. Pages 15–24 of the Act set out the laws related to computer crime investigation. Page 16 of the Act provides that any computer-related crime that is cognizable will be dealt with under the Criminal Procedure Act 1979, which means that the offender can be arrested without a warrant. For a crime to be investigated properly, there must be a very strong forensic crime unit that is not easily compromised. The minister in charge of science and technology is to appoint the members of the expert team who will handle the investigation; for example, experts can be identified from among university employees, preferably the Vice-Chancellor (VC). The non-warrant search mainly addresses cases where the investigation needs to be carried out quickly, so that evidence does not get destroyed or tainted. The Act imposes a set of duties on, and mainly empowers, the experts and the police officers [3].

Table 1 The table of incidents by SL CERT annual report 2011 [7]

| Type of incident | Number |
|---|---|
| Phishing | 6 |
| Abuse/Privacy | 2 |
| Scams | 3 |
| Malware | 1 |
| Defacements | 20 |
| Hate/Threat mail | 3 |
| Unauthorized access | 3 |
| Intellectual Property Violations | 5 |
| DoS/DDoS | 1 |
| Fake accounts | 1425 |
| Total | 1469 |

When it comes to cyber and computer crime rates and statistics in Sri Lanka, it can be seen that several types of attacks and offenses have increased over time. Sri Lanka CERT has released annual reports on the state of cyber incidents and their statistics.

Comparison of the Reports of 2011 and 2020

From the Sri Lanka CERT archive, the closest report to the release of the Computer Crime Act is the 2011 annual report, and the latest annual report considered for comparison is the 2020 report. The statistics given on the Sri Lanka CERT website divide the incident types into a few categories, such as availability, intrusions, information content security, fraud, malicious code, abusive content (harassment) and other incidents. The 2011 report states that "Incidents reported to Sri Lanka CERT increased to 1469 in the year 2011. In the year 2010, only 151 incidents were reported. This can be seen as a major increase in the reported incidents compared to the year 2010" [7]. Table 1 shows the counts of incidents contained in the report. In 2011, the most common incident type was fake accounts, with 1425 incidents; the least common were malware and DoS, with 1 incident each. Here, fake accounts refers to social media accounts used for identity theft. Furthermore, the report contains a graph showing the increase in computer and cybercrime incidents from 2006, one year before the Computer Crime Act was introduced; Fig. 1 reproduces this graph. The graph shows a drastic increase from 2010 to 2011 in the use of devices as well as the crime rate. The 2020 annual report is the latest released by SL CERT, and it shows clearly that the number of incidents has increased drastically. In 2011 the highest annual number of incidents, for social media, was 1425, while by 2020 this had increased to "on average more than 1000 cases reported each month".


Fig. 1 Graphical representation of crime increment – taken from SL CERT annual report 2011 [7]

Table 2 The incidents that occurred in 2020, compared with 2019, according to SL CERT [6]

| Incident type | No. of incidents – 2019 | No. of incidents – 2020 |
|---|---|---|
| DDoS | 2 | 1 |
| Ransomware | 6 | 24 |
| Abuse/Hate/Privacy violation | 3 | 70 |
| Malicious software issues | 8 | 9 |
| Phone hacking | 1 | 6 |
| Scams | 5 | 157 |
| Phishing | 5 | 17 |
| Website compromise | 175 | 85 |
| Financial/Email frauds | 7 | 57 |
| Intellectual property violations | 1 | 1 |
| Server compromised | 2 | 6 |
| Social media | 2662 | 15,895 |
| Other | 364 | 48 |

It is, however, explained that "Among the social media incidents, as usual Facebook related incidents were the highest. This may be due to increased use of social media, due to COVID-19 pandemic situation." Table 2 shows the numbers of incidents, taken from the SL CERT report 2020. Even after a decade, social media incidents remain in first place as the most numerous type of security incident in Sri Lanka. In addition to the social media issues, other types of computer crime, such as scams, malware, DoS and phishing, have also increased. There are two newly added entries that are missing from the 2011 report: financial frauds and ransomware. It can be clearly seen that not only has the number of computer-related security incidents increased, but new types of threats and attacks are also emerging with time. Fig. 2 reproduces the graph from the 2020 report on the overall trend of incidents throughout the decade.


Fig. 2 The overall state of security incidents throughout a decade – SL CERT report 2020 [6]

Sri Lanka CERT has summarized its observations about 2020 computer crimes in comparison to other years, which are given below as stated originally:

I. "Number of reported cases related to privacy violations has decreased during the year 2020."
II. "Financial frauds targeting local importers and exporters have seen a massive upturn during the year 2020 compared to 2019."
III. "There has been a significant increase in the spread of ransomware and malicious software during the year of 2020, where sensitive data belonging to both individuals as well as corporate businesses have been made unavailable through encrypting, erasing, or modifying data."
IV. "A significant number of web site compromises targeting government and private sector organizations were recorded in 2020. However, there is a notable 48% decrease when compared to the year 2019."
V. "Incidents reported to Sri Lanka CERT have increased to 16,376 in the year 2020. In the year 2019, 3566 incidents were reported. This is nearly a 460% increase in reported incidents compared to the year 2019." [5]

The state of computer crime has evolved rapidly throughout these years, from 2007 up to today. New types of security violations have appeared, and the number of security misuses has also increased. Unfortunately, the laws and regulations that define the boundary between ethical and unethical behavior have not been updated since their initial release in 2007. Generally, where there are no laws or rules with consequences for people's lives, behavior naturally bends towards leniency; it is human nature, and appropriate rules and regulations are needed to control it. The increase in the number of incidents and computer crimes could be a result of these non-updated policies, which make it easier to commit a crime: offenders know there is no rule to punish them even if they are caught.

The Loopholes in the Computer Act 2007

The problems, or loopholes, of the Sri Lankan Computer Crimes Act of 2007 have arisen mainly because it has not been updated in the nearly two decades since its enactment, while technology, and attacks, have developed very rapidly. The issues in the current Computer Crime Act 2007 are discussed according to the points given below [4].

• Narrow scope (the scope of the definitions)
• Convictions are not strong enough or up to date
• The Act focuses mostly on computers and computer programs
• It focuses more on integrity and confidentiality issues, but not availability
• Proper incident response strategies are not defined
• Attacks are not defined
• Sections on policy updates, auditing and maintenance are not included
• The power is centralized in one person

These points will be discussed further in the section comparing the Act with international computer crime acts.

2 International Context

This section explains the emerging cybersecurity laws around the world. Furthermore, the security legislation of four selected countries will be evaluated, focusing on the acts which enhance cyber-defenses.

Introduction to International Contemporary Laws

Cyber security is a growing concern for countries at all levels of development, since people in every sector can be affected by cyber-attacks. With the increase in universal access to the Internet, the growth of social network usage and the development of digital government services, the threats from foreign powers, terrorists and criminals in cyberspace are also increasing rapidly. This issue spans the governments of all countries, their agencies and contractors across the globe, and the requirement for states to create legal frameworks and agencies to protect information and provide legal advice to businesses and citizens has also grown, to ensure sufficient legal protection for the victims of such cyber-related incidents.


According to the records, Europe has been at the forefront of cybersecurity legislation. The member states of the European Council signed the Strasbourg Convention in 1981, formally the "Convention for the Protection of Individuals with regard to Automatic Processing of Personal Data" [19]. The world's first Internet regulatory law, the "Information and Communication Services Regulation Act", was then passed by Germany in 1997 [19]. The "Convention on Cybercrime", initiated by the European Union in 2001, was the world's first international treaty against cybercrime. Known as the Budapest Convention, it has served as the model international instrument for cybercrime legislation in many other countries in the world [19].

The United Nations Conference on Trade and Development (UNCTAD) [8] has tracked, as of December 2021, the 194 UNCTAD member nations' adoption of e-commerce law in the areas of e-transactions, consumer protection, data protection/privacy, and cybercrime, recording whether a specific nation has passed legislation or has a draft law waiting to be passed. According to the report [8], 81% of countries have e-transaction laws, 59% have consumer protection laws, 71% have privacy laws, and 80% have cybercrime laws, which together ensure security in the cyberspace of the respective countries. The evolving cybersecurity legislation landscape is shown in Table 3.

The adoption of cybercrime legislation varies by region: Europe has the highest adoption rate (91%) while Africa has the lowest (72%). Law enforcement organizations and prosecutors face a major difficulty because of the changing nature of cybercrime and the resulting skills gaps, particularly when it comes to cross-border enforcement. With the increased use of social media platforms and online economic platforms, the use and sharing of personal information with third parties has also increased. Therefore, to regularize the sharing of this information with the notice or consent of consumers, the importance of adopting privacy and data protection legislation is increasingly recognized; even so, the regions of Africa and Asia show comparatively low percentages of adoption of such legislation.

Table 3 Adoption of cybersecurity legislation worldwide

| Legislation type | Legislation (%) | Draft legislation (%) | No legislation (%) | No data (%) |
|---|---|---|---|---|
| Cybercrime Legislation | 80 | 5 | 13 | 1 |
| Data Protection and Privacy Legislation | 71 | 9 | 15 | 5 |
| Online Consumer Protection Legislation | 59 | 5 | 9 | 27 |
| E-transactions Legislation | 81 | 7 | 4 | 8 |

Source: United Nations Conference on Trade and Development – December 2021 [8]


Despite the trust and confidence required between the consumer and the business in the e-commerce sector, the adoption of online consumer protection legislation is still significantly low. Nearly 27% of the 194 countries do not have proper data on these legislations, which implies that consumer protection is not fully addressed. An e-transaction law is essential for conducting commercial transactions online, and such legislation has been adopted by 81% of the 194 countries. A significant number of developing and least-developed countries are included within this 81%, which implies good enforcement of e-transactions worldwide.

The COVID-19 pandemic since 2019 sparked a massive humanitarian disaster that negatively affected almost all nations in the world, and governments' health security measures compelled businesses and individuals to adopt new behaviors, such as social isolation and working from home. As a result, a growing number of people started making purchases, conducting business, managing their offices, and even attending school online, making them extremely exposed to cyberattacks [2]. All citizens, professionals, politicians and, more broadly, all decision makers now share a general concern for cyber security because of these changed behaviors. The strain these changes have placed on existing systems has also brought to light some major problems that may require government policy attention. Therefore, during the pandemic, the governments of many countries have been taking the time to review these issues and to find ways to update and strengthen their current policies to better defend people from cyber-attacks.

The following cybersecurity legislation overview covers four jurisdictions from four major regions of the world: Estonia from the European region, Singapore from the Asian region, South Africa from the African region, and the USA from the American region will be reviewed, since each can be identified as a country with relatively strong cybersecurity legislation within its region. For the purposes of this overview, the term cybersecurity legislation refers to legislation on electronic transactions, consumer protection, privacy and data protection, and cybercrime. Table 4 shows the legislation existing in the selected countries, which could be adopted to upgrade Sri Lankan cybersecurity legislation.

Table 4 Existing legislations in selected countries

| Country | Electronic transactions | Consumer protection | Privacy and data protection | Cybercrime |
|---|---|---|---|---|
| Estonia | Yes | Yes | Yes | Yes |
| Singapore | Yes | Yes | Yes | Yes |
| South Africa | Yes | Yes | Yes | Yes |
| USA | Yes | Yes | Yes | Yes |
| Sri Lanka | Yes | Yes | No | Yes |

Source: United Nations Conference on Trade and Development – December 2021 [8]


Estonia

At the end of April 2007, Estonia was subjected to a wave of significant cyber-attacks. The attacks were extensive and severe, focusing on media websites, banks, government offices, and the parliament. Before the incident, cyber-attacks had not been taken seriously as a present danger to the state or its population, and there was no established code of conduct or consensus among decision-makers. After being negatively impacted by this incident, Estonia released its first National Cybersecurity Strategy in 2008 [19], indicating that cybersecurity had grown to be a crucial component of its national security. This document, created by the Ministry of Defence, emphasizes the necessity of regionally coordinated efforts while noting the interdisciplinary nature of cybersecurity. Estonia later published two updates to its National Cybersecurity Strategy, in 2014 and 2019 respectively [19]. As the Ministry of Economic Affairs and Communication was responsible for drafting the latter paper, it was made clear that Estonia would enhance its information security capabilities and create essential infrastructure, rather than only deal with cyber-attacks. Table 5 lists the major cybersecurity laws of Estonia covering electronic transactions, consumer protection, privacy and data protection, and cybercrime.

Nearly 10 years after the incident, Estonia has established itself as a world authority on cybersecurity, advising many other nations on the subject. The country has signed agreements with Austria, Luxembourg, South Korea, and NATO to expand training and collaboration in the field [19]. The current state of Estonia's cybersecurity is strengthened by a well-functioning e-government infrastructure, a trustworthy digital identity, a requirement for a minimum level of security for all government agencies, and a centralized system for tracking, reporting, and resolving issues. Important service providers must evaluate and control their ICT risks. Most essential, there is a shared awareness that cooperation and a combined effort are needed at all levels (the state, the business sector, and individuals) if cybersecurity is to be ensured. There are several dedicated and administrative measures regulating IT security, as well as specific dedicated laws regulating IT security for critical infrastructures. There are also a few general legal principles, such as those found in criminal and civil law, that have an impact on IT security. Court cases specifically addressing IT security norms are not common.

Table 5 Cybersecurity legislation of Estonia [8]

| Legislation | Existing acts |
|---|---|
| Cybercrime Legislation | Criminal Code |
| Data Protection and Privacy Legislation | The General Data Protection Regulation (Regulation (EU) 2016/679) (GDPR); Data Protection Act |
| Online Consumer Protection Legislation | Consumer Protection Act (in English) |
| E-transactions Legislation | Digital Signature Act, adopted on March 8, 2000 |


According to the 2021 Cybersecurity Review from 'e-Estonia' [9], 2021 was dubbed 'the year of major security vulnerabilities' due to the Log4j zero-day vulnerability. In one Estonian incident, an attacker took advantage of a security flaw in a system of the Information System Authority (RIA) to unlawfully obtain almost 300,000 document photos from the database of identity documents. The security vulnerability was swiftly patched after the incident was discovered, the criminal was apprehended, and the downloaded information was seized within a few days using the nation's existing cybersecurity infrastructure; nevertheless, the case shows that in-state cyber conflict is present in Estonia as well [9]. However, the number of incidents reported within 2021 was 2237, a clear decrease compared with 2019 and 2020 [9], years with high numbers of reported cyber incidents worldwide due to the COVID-19 pandemic. The number of reported ransomware attacks also decreased, from 33 in 2020 to 30 in 2021 [9], even as ransomware has become a global epidemic in the online space. Beyond this highly improved cybersecurity infrastructure, according to the RIA [10], Estonia's small size, its language variety, and the nation's steadily improving level of cyber hygiene appear to be major factors in its relatively low number of cyberattacks compared with other nations.

Singapore

Singapore's Cybersecurity Strategy outlines the country's priorities, goals, and vision for cybersecurity. It promotes coordinated action and international collaboration for a reliable and secure cyberspace. The National Cyber Security Centre (part of the Cyber Security Agency of Singapore, CSA), in accordance with Singapore's Cybersecurity Strategy, works with sector regulators to provide a national-level response and facilitate quick alerts to cross-sector threats, while the Singapore Computer Emergency Response Team (SingCERT) [12] responds to cybersecurity incidents on behalf of its Singapore constituents. SingCERT was created to make it easier to discover, address, and stop cybersecurity-related online problems. Table 6 lists the main cybersecurity laws of Singapore covering electronic transactions, consumer protection, privacy and data protection, and cybercrime. To improve the cyber resilience of the Critical Information Infrastructure (CII) sectors and better secure cyberspace, the CSA has launched two new initiatives: reviewing the Cybersecurity Act (CS Act) and updating the Cybersecurity Code of Practice (CCoP) for the 11 CII sectors [12]. These two initiatives aim to strengthen Singapore's cyber defenses by aligning the CS Act with the rapidly evolving digital space and by combating new and emerging threats more effectively, such as ransomware and domain-specific dangers like 5G.


Table 6 Cybersecurity legislation of Singapore [8]

Legislation                                  Existing acts
Cybercrime Legislation                       Cybersecurity Act No. 9/2018
Data Protection and Privacy Legislation      Personal Data Protection Act No. 26 of 2012
Online Consumer Protection Legislation       Consumer Protection (Fair Trading) Act
E-transactions Legislation                   Electronic Transactions Act 2010, Cap 88 [adopted from the UNCITRAL Model Law on Electronic Commerce (1996) and the United Nations Convention on the Use of Electronic Communications in International Contracts (NY, 2005)]

Review of the Cybersecurity Act
A legislative foundation for Singapore's national cybersecurity monitoring and management was established under the Cybersecurity Act [11], which took effect in August 2018; since then, there has been a noticeable rise in reliance on digital infrastructure and services. The CIIs that enable the provision of critical physical services like water and electricity have been the focus of the CS Act. The new review considers enlarging the CS Act to cover virtual assets (such as cloud-hosted systems) as CII if they provide important services, and to increase awareness of dangers across Singapore's cyberspace [12]. Moreover, the CS Act review [12] also covers essential digital services, such as applications, and vital digital infrastructure that are necessary to support the digital way of life and enable the digital economy.

Updating the Cybersecurity Code of Practice (CCoP)
To maintain a solid cybersecurity foundation for the CII sectors, the Cybersecurity Act provides a framework for the identification of CII, and CII Owners throughout the 11 important sectors are required to adhere to the mandated cyber hygiene procedures within the CCoP [11]. To improve the level of cybersecurity for Operational Technology (OT) CII, a set of mandatory OT-specific cybersecurity practices was added to the CCoP in December 2019 [12]. However, basic cyber hygiene procedures may no longer be enough for CII Owners to fight against threats that continue to advance and become more sophisticated. Specifically, ransomware has developed into a significant and pervasive danger that can compromise national security and impair vital services. Every CII sector also faces cybersecurity concerns unique to its digital environment, such as the adoption of 5G technology or the move to the Cloud [12]; cyber hygiene procedures that apply to all crucial industries would not be able to address such concerns. The review of the CS Act aims to improve the current CCoP to accomplish objectives such as assisting CIIs to increase their chances of protecting themselves from cyber threat actors deploying sophisticated threats, enabling CIIs to be more responsive to newly developing threats in particular domains, and improving coordinated defenses across the public and private sectors to quickly recognize, understand, and react to cyberthreats and/or assaults. Adopting a threat-based strategy to pinpoint the typical strategies and methods that threat actors utilize during a cyberattack is one example enhancement of the CCoP [12]; it allows the CSA to decide on measures, create new procedures, and/or improve current processes to prevent and impede the activities of threat actors during a cyberattack. The government's continued efforts to engage with people, the corporate sector, and the public sector to improve security awareness and develop the skills and attitude necessary to become a Smart Nation are included in these latest initiatives to strengthen cybersecurity.

According to the Singapore Cyber Landscape (SCL) released in July 2021 by the Cyber Security Agency of Singapore [12], a significant increase in cyberattacks such as ransomware and online scams has been recorded. With the COVID-19 pandemic, increased digitalization created a rapidly evolving cyber threat landscape within Singapore, mirroring the global threat landscape during the period. Table 7 shows the total number of cases handled by SingCERT [12] over 3 years; the number of incidents nearly doubled between 2018 and 2019 and rose further in 2020 with the onset of the pandemic.

Table 7 Total number of cases handled by the SingCERT within each year [12]

Year    Number of cases
2018    4977
2019    8491
2020    9080

In 2020, the Cyber Security Agency noticed that international threat actors had profited from the anxiety and fear surrounding the pandemic, which had an impact on both people and businesses [13]. These threat actors made themselves known, focusing on areas such as contact tracing activities, data security, vaccine-related research, and e-commerce. These incidents also occurred as Work-from-Home (WFH) arrangements became more popular and people and companies embraced new technologies to ensure business continuity. The significance of the updated cybersecurity legislation has therefore been highlighted by the rapid pace of digitalization and the scale and sophistication of cyberthreats. The additional section of the four pillars of Singapore's cybersecurity strategy, which was unveiled in 2016 [13], covers the Cyber Security Agency's partnerships with various public and private sector stakeholders to make Singapore's cyberspace safer, as well as its efforts to collaborate with international partners to co-create a rules-based multilateral order in cyberspace. Under the existing regulations, organizations must implement a strong cybersecurity framework consisting of policies, procedures, and practices to ensure identification, protection, and detection of cybersecurity threats, as well as adequate response to and recovery from cybersecurity incidents, as the commercial and reputational risk of cybersecurity issues in Singapore continues to grow.


South Africa

Like other nations, South Africa developed a range of e-government strategies at the national, provincial, and local levels, nominally all under the Department of Public Service and Administration. Since 1997 [14], cybersecurity legislation has developed through a gradual process of adoption and consultation aimed at boosting government productivity and efficiency and enhancing convenience for citizens. Owing to low implementation capacity and the unwillingness of ministers and officials to engage with the difficulties, these goals were frequently not met. Despite the risks to human rights from the exploitation of the vast amounts of personal data held by the government, or its theft by cybercriminals, cybersecurity received little attention. To guarantee data privacy, the Protection of Personal Information (POPI) Act established the Information Regulator in 2013 [14]. The POPI regime contains overly broad national security exemptions and is being implemented only slowly. In terms of government coordination, corporate and citizen engagement, cybersecurity legislation, and the availability of trained labor, South Africa falls behind industrialized economies [14]. Due to delays, it has been unable to learn from the experiences of other, faster-moving developing nations or to benefit from the changes they have made to their policies and, particularly, their implementation. Parliament has neither demanded that the government move more quickly nor investigated potential instances of the use of authority that violates human rights. Table 8 lists the cybersecurity legislation implemented in South Africa over time.

Table 8 Cybersecurity legislation of South Africa [8]

Legislation                                  Existing acts
Cybercrime Legislation                       Electronic Transactions and Communication Act 2002 [adopted from the Budapest Convention on Cybercrime]
Data Protection and Privacy Legislation      Protection of Personal Information Act 4 of 2013
Online Consumer Protection Legislation       Consumer Protection Act; Electronic Communications and Transactions Act, 2002
E-transactions Legislation                   Electronic Communications and Transactions Act, updated in 2010 [adopted from the UNCITRAL Model Law on Electronic Commerce (1996)]

Since cybersecurity is a growing concern with the advancement of technology and the digitalization of services, in 2015 the South African government responded by implementing the National Cybersecurity Policy Framework (NCPF) [14] under the Ministry of State Security. The NCPF, initially drafted in 2010 [14], acknowledged the inadequacy of the legal measures then available to combat and prosecute cybercrime, as well as the lack of coordination within the government. The NCPF was developed to assist in the creation of the institutions needed to support cybersecurity, ensure that cybersecurity risks and vulnerabilities are reduced, encourage coordination and cooperation between the public and commercial sectors, expand global collaboration, build capacity and foster a cybersecurity culture, and encourage adherence to the necessary operational and technical cybersecurity standards.

By the end of 2019, South Africa was reported to have the sixth-highest cybercrime density in the world, according to the South African news website Independent Online (IOL) [15]. Given these concerns, it was essential for South Africa to define cybercrimes precisely in order to regulate and prosecute them effectively; to address this requirement, the South African Cybercrimes Act 19 [16] came into effect in 2020. The legislation affects every person and organization in South Africa that uses the internet for communication or data processing, and compliance will fall to experts in cyber risk and governance. The country started enforcing most of the Act's provisions on 1 December 2021, following years of discussion, reviews, and modifications that began in 2015 [15]. The growth in cyberattacks during the lockdown period beginning in early 2020 and the increased use of technology for communication, particularly during the COVID-19 pandemic [15], have reinforced the government's desire to tighten regulation of cybercrime. Financial services organizations and electronic communications service providers are required under the South African Cybercrimes Act 19 of 2020 [16] to report all offenses to the South African Police Service within 72 hours of becoming aware of a violation. The Act enables investigators and prosecutors to identify, track down, and prosecute cybercriminals, since Chapter 4 of the Act [16] grants law enforcement the authority to use a search warrant to investigate, search, access, and seize digital devices. Some of the Act's provisions went into effect shortly after the attack on the justice department in September 2021 [15]. The Protection of Personal Information Act (POPIA) [15], which protects the confidentiality, security, and integrity of personal and private information, is also connected to the Cybercrimes Act, since experts frequently need access to data from devices during cybercrime investigations to contextualize the issues at hand.

The multiple cyberattacks on South African organizations have encouraged further cybercrime. Fraudsters have accessed personally identifiable information for nefarious purposes by exploiting data collected in prior cyberattacks, such as the TransUnion ransomware incident in March 2022 [15]. Although catching cybercriminals has been difficult, there have been notable successes with the recent advancement of cybersecurity legislation in South Africa, such as the recent capture of suspects accused of money laundering, scams, and online fraud. According to an INTERPOL article from 5 April 2022 [15], a fraud group believed to have defrauded a US-based corporation of about EUR 455,000 was busted in raids across Johannesburg by detectives from the Hawks Serious Commercial Crimes Unit, US Secret Service officers, and INTERPOL. According to reports [15], the operation was part of a global initiative carried out within the framework of INTERPOL's Global Financial Crime Task Force (IGFCTF), in which 14 nations, including South Africa and the US, work closely together to combat the threat of internet-enabled financial crime.

United States of America

The initiation of the cybersecurity legislation in effect in the USA today was taken by the US Department of Defense (DoD) with the release of guidance called the 'Department of Defense Strategy for Operating in Cyberspace' []. This guidance outlined five major objectives:

• Treating cyberspace as an operational domain
• Employing new defensive concepts to protect DoD networks and systems
• Collaborating with other agencies and the private sector in pursuit of a "whole-of-government cybersecurity strategy"
• Working with international allies in support of collective cybersecurity
• Supporting the development of a cyber workforce capable of rapid technological innovation

Protecting the federal government's information systems and the country's essential cyber infrastructure was noted as a government-wide high-risk area in a US Government Accountability Office (GAO) report from March 2011, which added that federal information security had been so designated since 1997. Since 2003, methods for securing critical infrastructure, also known as cyber-critical infrastructure protection, have been added to the systems protecting critical infrastructure. Table 9 lists the principal cybersecurity legislation implemented in the USA.

Table 9 Cybersecurity legislation of USA [8]

Legislation                                  Existing acts
Cybercrime Legislation                       Computer Fraud and Abuse Act 1986; Title 18 – Crimes and Criminal Procedure
Data Protection and Privacy Legislation      Privacy Act of 1974
Online Consumer Protection Legislation       Federal Trade Commission Act 15; Undertaking Spam, Spyware and Fraud Enforcement with Enforcers Beyond Borders Act of 2006 – U.S. Safe Web Act
E-transactions Legislation                   Electronic Signatures in Global and National Commerce Act (E-SIGN), 15 U.S.C. §§ 7001-7003

Under domestic security requirements, the United States has started implementing industry standards [17] to improve security for designated "critical infrastructure" and to share incident information to improve responses. Guidelines are also being created for US public agencies, and other industry-specific laws contain cybersecurity provisions to some extent. The National Institute of Standards and Technology (NIST) is obligated under the Cybersecurity Enhancement Act of 2014 [17] to continue creating industry-based standards and best practices for "critical infrastructure" for private sector users. Additionally, while protecting confidentiality, privilege, and immunity from liability and antitrust laws, the Cybersecurity Act of 2015 encourages private operators to exchange information about attacks with other operators and the authorities [17], though only defensive security measures are permitted. The National Cybersecurity Protection Act of 2014 mandates the Department of Homeland Security's National Cybersecurity and Communications Integration Center to gather and disseminate information regarding risks and incidents to the public and private sectors on behalf of public authorities [17]. The Department of Homeland Security is required by the Federal Cybersecurity Enhancement Act of 2016 and the Federal Information Security Modernization Act of 2014 (FISMA 2014) to establish intrusion assessment plans for federal authorities, among other things [17]. Under the Safeguards Rule of the Gramm-Leach-Bliley Act of 1999, financial institutions must, among other things, guarantee data integrity and notify customers when their personal information is compromised [17], and the Health Insurance Portability and Accountability Act (HIPAA) regulations mandate information security for healthcare organizations [18]. Third parties are also prohibited from eavesdropping on or revealing communications without authorization under the Electronic Communications Privacy Act (ECPA) [17].

The US is becoming more active in reviewing potential foreign investments to make sure that foreign owners will not harm the national interest. Trade and export restrictions are in place to limit the spread of sensitive cybersecurity technologies. Through the Appropriations Act, the US apparently also mandated that NASA and the Justice Department purchase information security solutions only from vendors approved by federal law enforcement personnel. Additionally, Executive Order 13694 places restrictions on significant malicious cyber-enabled activities and empowers the US Treasury Department's Office of Foreign Assets Control (OFAC) to freeze the assets of those responsible for cyberattacks that endanger US national security, foreign policy, the economy, or financial stability [17]. Cyber-specific criminal statutes [17], such as provisions of the United States Code (USC), designate acts such as online identity theft, hacking, infiltration of computer systems, child pornography, and intellectual property infringement as crimes. Furthermore, US cybersecurity legislation covers the areas of essential services, critical infrastructure, data privacy, banking and financial regulation, foreign investment review, export control, sanctions against cyberattacks, and cyber-enabled and cyber-specific crimes.


3 Comparison: Sri Lankan Computer Crime Act with Foreign Computer Crime Acts

This section further explains the loopholes of Sri Lanka's Computer Crime Act of 2007 by comparing it with international computer crime acts.

Narrow Scope (The Scope of Definitions)
The definitions and explanations in the sections of the Computer Crime Act 2007 are narrow. The definitions should be updated to include fuller explanations, rather than being left so vague that almost any reading is possible. For example, consider "Any person who intentionally does any act, in order to secure for himself or for any other person": what if someone argues that the act was unintentional? The definition should either cover unintentional actions or define how intent is to be measured.

Convictions Are Not Strong Enough or Updated
The prescribed penalties for committing a crime are not convincing deterrents to cybercrime, and most of the crime definitions do not cover newly emerging attacks or threats. As a result, some crimes committed in cyberspace, and computer-related offenses, lack a proper prescribed response. For example, the Computer Crime Act states that a person committing "an offense under this Act shall be guilty of an offense and shall on conviction be liable to a fine not less than one hundred thousand rupees and not exceeding three hundred thousand rupees or to imprisonment of either description for a term not less than six months and not exceeding three years, or to both such fine and imprisonment".

The Act Mostly Focuses Only on Computers and Computer Programs
Computer crime, and cybercrime more broadly, does not concern only computers and computer programs; it has many more areas and aspects. In the Act, a computer is defined as follows: "'computer' means an electronic or similar device having information processing capabilities", which is not enough given current cyber and computer threat vectors. Computers and computer programs, along with networks, the internet, IoT, and any other type of digital device, application, or environment, should be considered in a computer crime act.

Focuses More on Integrity and Confidentiality Issues but Not Availability
The Act's focus is almost entirely on the confidentiality and integrity elements of the CIA triad, where it should include availability as well. A few sections of the Act that reflect the confidentiality and integrity aspects of computer laws are listed below:

I. Securing unauthorized access to a computer is an offense (law 3).
II. Doing any act to secure unauthorized access to commit an offense (law 4).
III. Causing a computer to perform a function without lawful authority is an offense (law 5).


IV. Dealing with unlawfully obtained data is an offense (law 7).
V. Illegal interception of data is an offense (law 8).
VI. Unauthorized disclosure of information enabling access to a service is an offense (law 10).

However, no crime is defined for a situation where availability is lost. Availability is also an important aspect for an organization. For example, if an organization (government or public sector) is infected with malware or ransomware and its systems cannot be used by the intended users, and that organization is a hospital, then the lives of patients are in jeopardy; if that organization is the stock market, the result can be an economic crisis inside the country. Therefore, availability is as important as the other factors of the CIA triad, confidentiality and integrity, and there should be laws to protect this aspect of security as well.

Proper Incident Response Strategies Are Not Defined
When an incident occurs or a crime happens, there should be a proper incident response order. If the Computer Crime Act included a common framework, or a common set of actions that an institution must follow, it would by law become a requirement that organizations could not ignore; they would be obliged to have an incident response team or plan.

Attacks Are Not Defined
Throughout the Act, there is no single entry that defines malicious attacks or how to react if someone deploys such an action. The cyber and computer crime incidents reported by Sri Lanka CERT show that social media issues, ransomware issues, and malware issues are rising annually. The Act may have omitted some attacks because they rarely occurred at the time it was drafted, but an Act, especially one governing a rapidly developing area, should be updated at least every 2 years. These laws and regulations have not been updated since 2007, which is almost two decades.

Updates on Policies, Auditing, and Maintenance Sections Are Not Included
The Act does not mention a period within which it should be updated. It would be better to include a section on the update and maintenance of its own policies so that it does not lack the necessary regulations.

The Power Is Centralized in One Person
The Act consistently entrusts decision-making power to a single person. For example, "The Minister in charge of the subject of Science and Technology may, in consultation with the Minister in charge of the subject of Justice, appoint by Order published in the Gazette any public officer having the required qualification and experience in electronic engineering or software technology (hereinafter referred to as "an expert") to assist any police officer in the investigation of an offense under this Act." (Part III, law 17 - SL Computer Crime Act 2007) and "The Minister may make regulations under this Act for the any matter authorized or required to be made under this Act, or which in required to be prescribed under this Act, or for the purpose of carrying out or giving effect to the principles and provisions of this Act." (Part III, law 17 - SL Computer Crime Act 2007). Such powers should not rest with a single person; centralization can lead to misuse of power and even a single point of failure. It is therefore better to entrust these responsibilities to a team of experts from the related areas.

4 Suggestions and Conclusion

Evaluating the emerging cybersecurity laws of other countries against Sri Lanka's makes clear that the entire body of cyber-related security legislation should be reviewed and updated to establish a proper cybersecurity infrastructure within the country. This section offers a few suggestions that can be adopted to review and update the existing Sri Lankan cybersecurity legislation.

Review and Update the Computer Crime Act
Since the existing acts do not properly address areas such as the adoption of new technologies, critical infrastructures, and cybercrime, the legislation covering every aspect of computer crime should be updated. The given definitions should be revised in line with advancing technologies, and all possible meanings of those definitions should be included. Areas of cybercrime such as hacking, data theft, data corruption, computer sabotage, computer fraud, receiving and disposing of stolen data, computer espionage, digital trespassing, faking of technical documentation, and violating postal secrecy and the secrecy of telecommunication should be addressed with additional laws and/or administrative procedures providing criminal sanctions for violations, including a failure to deploy cybersecurity measures.

Implement New Legislation for Domestic and National Security Requirements
In Sri Lanka's security legislation, areas such as privacy and data protection can be clearly identified as requiring new legislation. Several new acts and administrative measures should therefore be implemented to ensure the efficiency of the nation's cybersecurity infrastructure, including in the areas listed below:

• Laws regulating IT security
• Dedicated laws or administrative measures regulating IT security for critical infrastructures
• Authorities responsible for enforcing the regulations, and non-binding organizations/alliances dealing with IT security
• Laws regarding privacy and data protection, including obligations concerning IT security
• Laws primarily covering other fields of regulation, such as financial institutions and healthcare insurance, that also impose obligations concerning IT security
• Specifications of common standards in civil law, criminal law, etc. that have an impact on IT security
• Contractual standards and obligations regarding IT security in IT service agreements, IT outsourcing agreements, cloud agreements, etc.


Set an Adequate Timeline to Update the Laws in Future
Even after the nation's cybersecurity infrastructure has been re-established by reviewing existing legislation and imposing new legislation, frequent reviews and updates will be required as technologies advance and are adopted. Timely evaluation and updating is therefore an essential requirement in maintaining a vulnerability-free cybersecurity legal infrastructure.

References

1. COVID 19 and Cyber Security. http://www.insssl.lk/index.php?id=37. Last accessed 2022/11/02
2. Guidelines and Impact of COVID-19 on Cybersecurity: A Model for Protecting Businesses in the Digital Universe. SLJoT_2021_Sp_Issue_007.pdf (seu.ac.lk). Last accessed 2022/10/20
3. Computer Crime Act. Computer Crime Act (srilankalaw.lk). Last accessed 2022/09/25
4. Computer Crime Act, No. 24 of 2007. ACT, No. 24 OF 2007 – LawNet. Last accessed 2022/09/20
5. Total Incidents. https://www.cert.gov.lk/2?lang=en&id=3. Last accessed 2022/11/05
6. Annual Activity Report 2020. https://cert.gov.lk/view?lang=en&articleID=164. Last accessed 2022/11/05
7. Annual Activity Report 2011. https://cert.gov.lk/view?lang=en&articleID=156. Last accessed 2022/11/05
8. UNCTAD – Summary of Adoption of E-Commerce Legislation Worldwide. https://unctad.org/page/cybercrime-legislation-worldwide. Last accessed 2022/11/05
9. Estonia and the world: Cyber security 2021 in review. https://e-estonia.com/estonia-and-the-world-cyber-security-2021-in-review/. Last accessed 2022/11/10
10. Republic of Estonia – Information System Authority. https://www.ria.ee/en/cyber-security/estonian-information-security-standard.html. Last accessed 2022/11/10
11. Singapore Cybersecurity Act 2018. https://sso.agc.gov.sg/Acts-Supp/9-2018/. Last accessed 2022/11/05
12. Singapore Government Agency Website. https://www.mci.gov.sg/portfolios/cyber-security/cybersecurity-act. Last accessed 2022/11/05
13. Cybersecurity Laws and Regulations. https://iclg.com/practice-areas/cybersecurity-laws-and-regulations/. Last accessed 2022/11/08
14. S. Ewan, Governance of cybersecurity – the case of South Africa. Afr. J. Inf. Commun. ISSN 2077-7205
15. How the South African Cybercrimes Act 19 of 2022 will affect individuals and businesses. https://www.controlrisks.com/our-thinking/insights/how-the-south-african-cybercrimes-act-19-of-2022-will-affect-individuals-and-businesses. Last accessed 2022/11/03
16. Cybercrimes Act 19 of 2020 (English/Afrikaans). https://www.gov.za/documents/cybercrimes-act-19-2020-1-jun-2021-0000. Last accessed 2022/11/03
17. A.F. Eric, Federal Laws Relating to Cybersecurity: Overview of Major Issues, Current Laws, and Proposed Legislation. Congressional Research Service, December 12, 2014
18. Cybersecurity Regulations Impacted by COVID-19. https://www.cybersaint.io/blog/cybersecurity-regulations-impacted-by-covid-19. Last accessed 2022/11/03
19. T. Jiang, Y. Shen, Report on the Cybersecurity Legislation of Major European Countries. Fudan Report Series, 2020 No. 8(31)

Legal Considerations and Ethical Challenges of Artificial Intelligence on Internet of Things and Smart Cities

Nisha Rawindaran
Cardiff Metropolitan University, Cardiff, UK
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
C. Hewage et al. (eds.), Data Protection in a Post-Pandemic Society, https://doi.org/10.1007/978-3-031-34006-2_8

1 Introduction

"Whatever happens Marty, don't go back to 2020! Great Scott!". These were the famous last words from the cult classic 80s movie "Back to the Future II", spoken between Doc and Marty McFly. The movie's futuristic plot was based on the ability to time travel to the year 2015 [1]. Cinema, through the medium of science fiction, has used various platforms of imagination to convert fiction into reality. It can depict and bring to life what humans deem "impossible scenarios and inventions", in terms of the technologies used in a film and the kinds of impossible devices imagined to work in the future. Cinema, with its inventions of "unbelievable" storylines and escapism from reality, has a way of presenting technologies which are important and hold a promise for a better, sought-after future. "Back to the Future II" in particular anticipated the concept of the Internet of Things (IoT) very early on, in its ideas of how technology was imagined to be used. The movie introduced many devices connected to the internet within a Smart City environment, in a wide range of applications and connections between them. The many IoT devices seen in this film include the famous hoverboards, the flying cars, fingerprint biometrics and, most importantly, the goggles worn by Marty to get information from the internet, which challenged the grounds of ethics when the information was misused. This wearable tech device, highly resembling today's Google Glass, offers an Augmented Reality (AR) experience and allows the user to access the internet, navigate, and take pictures or videos, crossing the boundaries of ethics yet again. "Back to the Future II" makes rich use of these technologies and of AR very early on in concept, and some of these devices have actually started appearing in our day-to-day lives due to the advancement of technology [2].

Internet blogger Rangaiah, M. (2020) discussed the linkages between cinematic fiction and reality and pointed to another movie that introduced the concepts of IoT, "Minority Report". This movie saw fiction overlaid with elements of the story holding true to reality: Tom Cruise's character is presented with various IoT devices that help his mission within the setting of Smart Cities around the world. Many of the ideas introduced in this film are already familiar and present in today's devices, such as the self-driving cars later pursued commercially by Elon Musk's Tesla. The movie also featured various voice-automated home devices that worked much like Amazon Echo and Apple HomeKit, amongst many others [2]. Rangaiah also discussed the movie Star Wars and how the IoT in this movie had a supreme galactic connection to many devices sprawled across planets, stars, and universes beyond: from the notion of "The Force", the energy field created by all living entities that connects everything in the galaxy, to the Droids, the robotic entities of the Star Wars universe, everything exists in a "Smart System". George Lucas in the 1970s was able to visualise IoT in an extreme working condition, laid out by technology implemented in a fictional world. This clearly shows how humans have long held imaginative scenarios of what the world would look like in the future.

Coming back to reality, and leaving fiction where it is, understanding how humans currently behave towards IoT has gained real traction today. Much different from the ideas of "In a galaxy far, far away . . . " in Star Wars, the age of the internet has given vision to a "Cyberspace" which allows us to mimic fiction and replay its ideas back into reality. There is an extensive array of Internet of Things (IoT) applications within Smart Cities that speak to the evolution and role of IoT in contemporary human usage. The interconnection and interaction of IoT devices offer a compelling narrative exploring their boundless possibilities and potentials within Smart Cities, unveiling a captivating continuation of this story.

2 IoTs in Cyberspace

"Cyberspace" is a term coined by one of the greatest authors of science fiction and robotics of our time. William Gibson was known for his imaginative theories of "Cyberspace" and robotics, alongside ideas also led by Isaac Asimov and Arthur C. Clarke [3]. In a famous passage from his 1984 novel Neuromancer, Gibson wrote:

"Cyberspace. A consensual hallucination experienced daily by billions of legitimate operators, in every nation, by children being taught mathematical concepts . . . A graphic representation of data abstracted from banks of every computer in the human system. Unthinkable complexity. Lines of light ranged in the non-space of the mind, clusters, and constellations of data. Like city lights, receding..." [4]

In an article published by The Sydney Morning Herald in 2012, Gibson's theory of "consensual hallucination" was described very aptly as a method in which our imaginations fill in the gaps, creating layers of experience on top of what is ultimately non-existent space. In the same article he explains how "Cyberspace" is our way of understanding and giving shape to our technology. The article goes on to define hallucination as an experience involving the perception of something not really present [5]. According to the Herald, the internet has become the consensual hallucination in which various structures have been created to replicate our physical reality in digital environments, particularly in the design of websites that recreate shops, leisure, education, and social activities, duplicating our real world into the virtual world and AR and projecting human expectations of what things should be. Fast-forwarding Gibson's theory to the year 2021, the internet has become our daily bread and is believed to be changing humans' expectations of real life and how those expectations shape the internet and "Cyberspace". This crossover will bear a new, internet-born cultural construct that will influence our behaviours towards each other, a space humans still know little about. What started off as layering reality over the internet is now moving towards the internet being layered over reality, as the Herald quite persuasively argues. Whilst consensual hallucination gives structure to the internet to form this "Cyberspace", it also raises the question of whether real-life scenarios of how humans conduct themselves are replicated across this space. IoTs are formally introduced in the next sections and their concepts and applications explained. Whilst humans learn to digest fiction versus reality in IoT concepts, this serves as a benchmarking exercise for dreaming of the "what if" scenarios of our future. The sections that follow will also look at how humans react to real IoT devices and how they are used and understood from a security and privacy perspective. Human perceptions of these IoT devices are key to how humans can control and secure them; if humans learn to control these elements correctly, they can be protected in "Cyberspace", as will be seen and explored.

3 Internet of Things (IoTs)

According to Biju, S.M., 2020, the Marvel Avengers series, showcasing Tony Stark as "Iron Man", very much closes the gap between fiction and reality, in that many IoT devices used in this fictional story resemble what is currently accessible in 2021. It is a perfect example of how cyberspace is used as the central navigation point in which data is circulated and shared, with IoT devices pumping out Big Data across the internet. The Avengers series highlights the super levels of Artificial Intelligence (AI) and Machine Learning (ML) techniques used in creating these IoT devices. For example, the character Tony Stark uses IoT in his home computing system J.A.R.V.I.S (Just A Rather Very Intelligent System), which allows him to control the home heating and cooling systems and analyse engines, and which assists him in projects, with the system managing everything in the house [6, 7]. The term IoT was coined in 1999 and was initially meant to describe the following situation:

"Today computers and, therefore, the Internet are almost wholly dependent on human beings for information. The problem is people have limited time, attention, and accuracy – all of which means they are not very good at capturing data about things in the real world. ... Users need to empower computers with their own means of gathering information, so they can see, hear, and smell the world for themselves..." [8]

The 'Internet of Things' (IoT) and people tend to generate 'Big Data'. Cox and Ellsworth [9] were among the first to coin the term 'Big Data', referring to the use of larger volumes of data for visualization: datasets bigger than a "normal" dataset. Over time, humans have not been very good at keeping data intact or managing their data, and data input has grown as fast as the data being output. 'Big Data' has now evolved to include a range of characteristics, such as integrating different types of data and analyses [10]. As IT facilities expanded, more devices were introduced and connected to the internet so that they could access data freely, on the assumption that users had a good internet connection [8]. WiFi technology also made these devices mobile, creating the ability to obtain information from anywhere in the world. These networks of devices connected to the internet were examined in a study by Ashton [11], which defined this group of devices as the 'Internet of Things' (IoT). To visualise IoT within the context of everyday life, Hernandez [12] discussed examples of devices that help collate data positively in industry. An example from the health industry showed how a sensor is an ideal IoT device. Hernandez mentioned that sensors could include pacemakers, location identifiers using the global positioning system (GPS), and individual identification devices, such as radio frequency identification (RFID) tags. Sensors can provide different information characteristics, typically those of interest in the particular setting, and may indicate time and location. These sets of devices are cleverly connected to each other through a stream of networks connected to the internet, continuously collecting data. In a medical setting, a pacemaker could capture information such as heart rate, the status of the patient's vitals, what it is monitoring, and the number of mobile phones to which its applications have been downloaded to track and trace the particular patient, and link this back to the hospital software for analysis. Many medical devices are also able to capture patient records, dates of illness, and recovery information in line with various statistical models, perhaps to help develop vaccines to eradicate pandemics such as Covid-19. The variables captured would require some consideration of the events, situations, or settings of interest, the current climate, and the speed of processing this data. With the development of "Cloud" storage, this information then becomes accessible, from anywhere in the world, to anyone with the right permissions to work on the data.

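To make the pacemaker example concrete, the minimal sketch below shows how such a sensor reading might be serialised as JSON before transmission to hospital software or cloud storage. It is illustrative only: the field names, device identifier, and reading values are assumptions, not taken from any cited system.

import json
import time
from dataclasses import dataclass, asdict

@dataclass
class PacemakerReading:
    """One telemetry sample from a hypothetical pacemaker-style IoT sensor."""
    device_id: str        # RFID-style individual identification
    timestamp: float      # when the sample was taken (epoch seconds)
    latitude: float       # GPS location of the patient
    longitude: float
    heart_rate_bpm: int   # the vital sign being monitored
    battery_pct: int      # device health, useful for maintenance alerts

def to_cloud_payload(reading: PacemakerReading) -> str:
    """Serialise a reading as JSON, the form in which it might be
    transmitted to hospital software or cloud storage for analysis."""
    return json.dumps(asdict(reading))

if __name__ == "__main__":
    sample = PacemakerReading(
        device_id="PM-000123",       # illustrative identifier
        timestamp=time.time(),
        latitude=51.4816,
        longitude=-3.1791,
        heart_rate_bpm=72,
        battery_pct=88,
    )
    print(to_cloud_payload(sample))

The payload is deliberately small; a real device would additionally authenticate itself and encrypt the channel, which is precisely where the security and privacy questions of the following sections arise.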

The next section looks at IoT and how ML and AI are making these devices smarter and more responsive in the ways they are used.

4 IoT and AI/ML

Machine Learning (ML) techniques and algorithms have contributed to how data can be classified, labelled, and ultimately managed under the umbrella of Artificial Intelligence (AI). The ability to use techniques such as Supervised and Unsupervised Learning has helped process this Big Data through various classification, regression, and clustering activities, which allow outcomes to be predicted [13]. Various mathematical algorithms under classification, such as Support Vector Machines (SVM), Decision Trees, and Neural Networks, all contribute to how data is treated and managed to produce the outcomes and predictability required to support economic growth in societies moving forward. ML applications have evolved into everyday industrial use through humans' everyday interactions over the internet, with superpower companies such as Amazon and eBay investing in these intelligent technologies. Real-life examples of ML include calculating temperature, quoting insurance premiums, setting the prices of goods, and matching the number of workers to the revenue a business needs for acceptable economic growth. Other ML techniques, such as Deep Learning, allow algorithms to continuously learn and beat humans at their favourite games, ranging from Atari video games to the classic board game Go [14]. Their capabilities go far beyond conquering human hobbies and extend into everyday chores and events in our daily lives. Other real-life examples of ML usage lie in industries focusing on identifying fake news, implementing spam filters, identifying fraudulent or criminal activity online, and improving marketing campaigns. Looking at how IoT and ML have clearly moved forward positively, improving our way of life and making it easier to manage, humans perhaps could not even imagine what life would be like if these technologies were removed. As AI and ML progress, could humans go "back in time" to how we used to live? Hard as that is to imagine, the realisation has gone one step further: the COVID-19 pandemic of 2020 accelerated the usage of IoT and its applications of AI and ML into new realms humans perhaps cannot even begin to understand. Even Chatbots have emerged to manage online interactions linked to the use of AI applications. Chatbots [15] have replaced people online, and ML is now learning everything about us and how humans behave. ML, in its integration into IoT, is now evolving in how we interact online and adapting to our needs and surroundings. The desire for humans to interact with machines is vital, so it is no wonder that people's awareness is being put to the test. The next section moves away from AI, ML, and IoT devices, and explores research on human perceptions of security and privacy and how IoT owners and bystanders interact and talk to each other within the home or social environment.
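
Before moving on, the following minimal sketch illustrates the supervised classification workflow described above, using scikit-learn on synthetic data; the dataset, the choice of an SVM, and the train/test split are illustrative assumptions, not a reproduction of any study cited in this chapter.

# A minimal supervised-learning sketch: train an SVM classifier on
# synthetic data and measure how well it predicts unseen labels.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Synthetic stand-in for sensor or business data: 500 samples, 10 features.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Hold out 25% of the data to estimate predictive performance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

model = SVC(kernel="rbf")        # Support Vector Machine, as named above
model.fit(X_train, y_train)      # supervised learning: features plus labels

print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))

The same pattern (fit on labelled training data, evaluate on held-out data) underlies the insurance-quote and pricing examples mentioned above; only the features and labels change.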


5 IoT Security and the Human Factor

The previous section emphasised the use of ML in enhancing IoT, giving humans the comfort needed to manage data and life whilst being entertained and educated by it. Disturbingly, cyber threats and attacks grew in number, with their focus shifting from regular computers and servers to IoT devices on the front line of this cyberwarfare. According to CSO Online's March 2020 report of cybersecurity statistics at a glance [16], attacks on IoT devices tripled in the first half of 2019. An article by Haney, J. (2021) follows the growth, evolution, and usage of these IoT devices within the home environment as smart home technology [17]. Examples of these devices include smart watches, children and baby monitoring devices, animal monitoring devices, smart home appliances, and any devices that can be connected to your internet and, with advances in ML technology, give feedback to the consumer. Haney's paper explains that whilst these devices offer a J.A.R.V.I.S-like experience, they add increased risk to home network security, information privacy, and physical safety. Home users may lack understanding of the privacy and security implications; additionally, manufacturers often just want to bring their product to market, thus failing to provide transparency and configuration options in accordance with government-provided guidelines for products yet to be widely adopted. Whilst home users hope for fully integrated systems that work at the touch of a button, offering the super AI and ML experience, many of these devices are independently made, raising compatibility issues and creating silos amongst IoT devices. This situation leaves little room for mitigating action to protect home users' security and privacy. The paper goes on to explain that it is currently unclear where the perceived responsibility for the privacy and security of these smart home devices lies. To answer this question, an in-depth interview study was conducted with 40 smart home adopters to explore where they assigned responsibility, and how homeowners' perceptions of responsibility related to these concerns. The results revealed that participants' perceptions of responsibility reflected an interdependent relationship between consumers, manufacturers, and third parties such as the government, with users remaining concerned about their security and privacy; participants assumed some personal responsibility but also assigned responsibility to manufacturers, government, and third-party applications. The conclusion was that a more balanced relationship would ease the burden on home users and enable better support from manufacturers, leading to less vulnerable systems and greater adoption of smart home technologies.


In another study, Chatterjee, S. (2020) identified factors that impact human behaviour and perceptions of security and privacy in IoT amongst Indian consumers [18]. The researcher used the Technology Acceptance Model (TAM) developed by Davis, F.D. (1989) [19]. TAM has proved to be a model often used when focusing on security and privacy issues [20]. For conceptualization, Innovation Diffusion Theory (IDT) was used [21]. In the methodology, responses from 232 usable respondents were used to test the hypotheses and validate the conceptual model. The results showed that perceived usefulness, perceived ease of use, compatibility, and cost affected the behavioural intention of consumers to use IoT-enabled devices in India, and drew attention to how security and privacy issues were managed. In India, many businesses have already adopted IoT technology for their business solutions, with cross-usage into their home environments [10].

A recent paper by Alraja et al. (2019) explored users' attitudes towards using IoT technologies in healthcare services. The study examined how humans receive healthcare services and the ways in which IoT devices are connected in delivering them. An integrated framework was developed to investigate the impact of security and privacy concerns and users' trust in the IoT devices being used. This framework enabled the measurement of risk perception as a mediator between user trust and attitudes towards using the IoT. A sample of 387 respondents was used for data collection, with exploratory and confirmatory analysis and structural equation modelling as the methodology. The study found that levels of security, privacy, and familiarity affected the underlying attitude of trust in the IoT, and that levels of trust in the IoT affected both users' perceptions of risk in, and their attitudes towards, using the IoT [22].

Studies have shown that trust plays a crucial role in humans' decisions to adopt IoT technologies and services. Such decisions have helped users overcome perceptions of risk and uncertainty and enhance their level of acceptance and adoption intention, as mentioned in a study by AlHogail, A. (2018). The main goal of AlHogail's study was to examine the factors that influence consumer trust and their role in the adoption of IoT technology. A conceptual trust model encompassing the major factors affecting trust in IoT technology adoption was presented. The model is composed of three dimensions of factors assumed to influence the level of trust: product-related factors, social influence-related factors, and security-related factors. By surveying consumers' opinions, this framework also validated views and feedback regarding the factors influencing their trust in this technology. The model helped the research further investigate trust issues and trustworthiness as a guide to IoT product development and marketing strategies according to consumers' requirements. According to the survey analysis, the security and privacy of IoT products and services were amongst the highest priorities for ensuring consumer trust, yet they remain a challenge in IoT technology. Social-related factors, such as the user's network, were equally important [23].

Marky et al. (2020) also confirmed the studies discussed on the theories and models applied to user perceptions of IoT from the security and privacy angle. Built on the same Technology Acceptance Model (TAM), the result of this paper is a conceptual framework that classified the factors into three main domains, namely product-related factors, social influence-related factors, and security-related factors. IoT devices, and how they are used, are constantly evolving. A device no longer acts in a singular environment but is exposed to other IoT devices introduced by visitors or family members in the main device's vicinity, denoted as bystanders. These bystanders can also collect data and observe what other IoT devices do, putting their outputs to the test. Marky's research interviewed 42 young adults to better understand how this situation affected the privacy of device owners and bystanders, and how their privacy could be protected. The results showed that owners of IoT devices wished to adjust their devices' output when visitors were present; visitors, in turn, wished to be made aware of the data collected about them, to express their privacy needs, and to take protective measures. Based on these results, there is strong demand for scalable solutions that address the tensions arising between the increasing discreetness and number of IoT devices and the requirement to preserve the self-determination of owners and bystanders at the same time. The market for IoT devices is growing, and alongside the benefits offered by such devices, new privacy risks are introduced into users' homes. This does not only concern the user of the smart home device but also any person present in the smart home environment. The presence of bystanders can therefore result in privacy violations: the privacy of the bystander might be affected by the data collection in their surroundings, and the user's privacy might be affected by the bystander observing the output of devices. Marky's work aimed to shed light on these potential privacy violations [24].

Security and privacy have long-standing challenges that add to the barriers of how we navigate the internet. This navigation has now given way to the inception of "Cookies". Yes, "Cookies" . . . not the Cookie Monster from the children's 80s series Sesame Street, for those still in cinematic mode from the 80s, so let us read on.

6 IoT and the Cookie Monster

Web cookies are widely used by publishers and third parties to track users and their behaviours. Web cookies were invented in 1994 as a mechanism for enabling state to be maintained between clients and servers, as stated in a study by Cahn, A. (2016) [25]. A cookie is a text string that is placed on a client browser when it accesses a given server; the cookie is transmitted back to that server in the header of subsequent requests. Cookies have remained a central component of the web, and Cahn describes them as an intrinsic element of web applications. Their use has important implications for user privacy. One of the primary goals of data brokerage firms and online advertisers is to amass as much information as possible about users toward the goal of delivering targeted ads, and concerns over user privacy and web cookies have been voiced over the years. A 'cookie' is a text file containing small amounts of information which a server downloads to your IoT device when you visit a website, thereby increasing the potential risk of something being downloaded to capture information. This is certainly something to be aware of and a cause for concern.
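
As a minimal sketch of the mechanism just described, the snippet below uses Python's standard http.cookies module to build the Set-Cookie response header on the server side and to parse the Cookie header echoed back by the client; the cookie name and values are illustrative only.

# Minimal sketch of the cookie mechanism: the server places a text string
# on the client via a Set-Cookie header; the client echoes it back in the
# Cookie header of subsequent requests.
from http.cookies import SimpleCookie

# Server side: build the Set-Cookie header for the response.
response_cookie = SimpleCookie()
response_cookie["session_id"] = "abc123"          # illustrative value
response_cookie["session_id"]["path"] = "/"
response_cookie["session_id"]["max-age"] = 3600   # expires in one hour
print(response_cookie.output())
# e.g. "Set-Cookie: session_id=abc123; Max-Age=3600; Path=/"

# Client side: parse the Cookie header sent back on the next request.
request_cookie = SimpleCookie()
request_cookie.load("session_id=abc123")
print(request_cookie["session_id"].value)          # -> "abc123"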


A study by Narayanan, L. (2020) focused on the factors influencing customer attitudes towards website cookie consent amongst internet users [26]. The factors likely to influence consent for accepting all cookies were examined using data collected from 132 internet users through an online survey questionnaire shared on social media networks. The results showed that the majority of respondents had a more than moderate level of awareness about cookies and were more likely to accept cookies for quick access or task completion. Acceptance of cookies varied across different categories of online activity, and given a choice, respondents were more likely to opt out of third-party cookies, which are widely used for targeted advertising. According to Narayanan, cookies serve four purposes in this collection of data. Below are extracts from the study that state the four purposes a user can choose among when consenting to use a particular website.

Firstly, necessary cookies are essential for websites to function properly. For the user to have a stable experience on a website, these cookies are configured by the website owner. Websites usually need consent from the user for these cookies and must inform the user about their usage and purposes.

Secondly, preference cookies are used to personalize the browsing experience for the individual user by remembering website preferences such as login information, form data for autofill, language preferences, and other user settings. These cookies tend to be the most beneficial to users, as they enhance the user's experience of the website.

Thirdly, statistics cookies are used by the website directly or through a third party (e.g., Google Analytics) to measure different types of user browsing activity for analytics, such as the number of visitors to the website, page views, link clicks, location, device, language, duration of visit, and entry and exit pages. These cookies may not benefit the user but are very beneficial to the website owner for understanding the browsing behaviour of users, generating insights to optimize the website further in terms of design, content, and technology.

Lastly, marketing cookies are often configured by third-party ad vendors to collect multiple data points of user activity, similar to statistics cookies, but use those data points to create detailed user profiles and serve highly targeted ads which generate advertising revenue for the website owners and ad vendors. These cookies are considered the most invasive when it comes to the online privacy of users [27].

A 2008 study showed that trust among adult online shoppers decreased when cookie usage and data collection were detected; however, prior disclosure of cookie usage, especially under high-risk conditions, enhanced trust, as explained by Miyazaki (2008) [28]. Websites started using cookie consent pop-ups or banners to inform users about cookie usage. Since 2009, cookie consent notifications have been mandatory, and this was expected to have a significant impact on the usage of third-party cookies and behavioural advertising [29].
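
As a purely illustrative sketch (the category names and helper function are hypothetical, not taken from Narayanan's study), a site might record a visitor's consent across these four categories as follows, with necessary cookies always enabled and the other three gated on explicit opt-in.

# Illustrative sketch only: recording a visitor's consent choices across
# the four cookie categories described above. Necessary cookies are
# always set; the rest depend on the visitor's explicit opt-in.
from typing import Dict, List

def allowed_categories(consent: Dict[str, bool]) -> List[str]:
    """Return the cookie categories the site may use for this visitor."""
    return ["necessary"] + [c for c, ok in consent.items() if ok]

visitor_consent = {
    "preferences": True,    # personalisation (login, language, autofill)
    "statistics": False,    # analytics (page views, duration, device)
    "marketing": False,     # third-party ad profiling, most invasive
}

print(allowed_categories(visitor_consent))
# -> ['necessary', 'preferences']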

The biggest cookie concern arose around consent regulation. As online advertising and usage grew, the original Data Protection Directive issued in 1995 had to be revisited and amended in 2009 to discipline organisations and regulate their cookie usage. The amended directive required websites to provide visitors with a clear description of all the parties serving cookies and any other tracking mechanisms, to install cookies only after obtaining consent, and to explain to the user how the collected information would be used. Cookie usage itself is not a direct violation, as many websites require cookies to function properly, especially where a personalised website experience is provided or user preferences must be tracked during a browsing session, for instance adding or removing items from a shopping cart during an online shopping session or automatically signing the user in to their email account [30]. While this kind of activity tracking by the website is intended to provide a better user experience and is not invasive, the collection and usage of such data for other purposes not directly concerning the user’s activity constitutes an invasion of privacy [28].

There are various types of attacks that can come through the cookie channel. Websites like YouTube use persistent cookies, which are passed to the user’s computer as soon as they visit the YouTube website. Whether the user plays a video or not, YouTube writes a cookie to the user’s hard disk to track the user’s interests and, in turn, to provide related results to the user. Privacy activists have consistently pressed YouTube to change its privacy rules, and the site now states that it does not track visitors who are not playing videos. However, a newly implemented cookie-lite feature in YouTube continues to set long-lasting Flash cookies on the user’s hard disk even when the user does not click play to watch a video. Flash cookies may remain on the user’s computer indefinitely, as they carry no expiration date [31].

With the introduction of the General Data Protection Regulation (GDPR), which came into effect in 2018, the biggest visual impact was the cookie consent request on every website visited. GDPR was the first major piece of legislation in Europe to tackle the issue of internet privacy and the collection of data online. Web cookie usage was suddenly transformed into the platform through which consent was collected, and consumers gained the right to be informed when they were tracked, to access the data collected about them, to delete their data, and to transfer that data to another platform. Cookies were used as the method of consent gathering on the internet because of their ability to track users online [32].
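To make the distinction concrete between the session cookies that support a shopping basket and the persistent cookies used for long-term tracking, here is a minimal Flask sketch; the route, cookie names and lifetimes are hypothetical illustrations rather than any real site’s behaviour.

```python
# Session vs persistent cookies: the only difference is an expiry date.
from datetime import datetime, timedelta, timezone

from flask import Flask, make_response

app = Flask(__name__)

@app.route("/")
def index():
    resp = make_response("cookies set")
    # Session cookie: no Expires/Max-Age, so it lasts only until the
    # browser closes (e.g. a shopping-cart identifier).
    resp.set_cookie("cart_id", "cart-42")
    # Persistent cookie: an explicit expiry keeps it on disk for a year,
    # the pattern behind the long-term tracking discussed above.
    resp.set_cookie(
        "visitor_id", "visitor-7",
        expires=datetime.now(timezone.utc) + timedelta(days=365),
    )
    return resp

if __name__ == "__main__":
    app.run()
```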

The next section focuses attention back on issues of security and privacy in the context of IoT. Major security and privacy matters are revisited, and possible strategies are drawn from pre-existing research, emphasising the challenges and prevailing solutions for future work. These challenges and barriers include how the Covid-19 pandemic caused a shift in how we worked and the impact it made on our society in transforming our daily activities.

7 IoT and Covid-19 Impact

One striking example of the increasing use of IoTs was the onset of the Covid-19 pandemic, which saw a jump in businesses having to move from working in the office to working from home. This opened a can of worms, in that some mixed households, e.g., children and working parents, were using the internet for various sources of entertainment as well as official work, compromising the security of the home environment. In a paper by Rawindaran et al., a survey questionnaire was administered to UK SMEs during the pandemic, collecting data from 122 participants who commented on how businesses worked within these boundaries and how they coped during these challenging times [33]. The survey asked what went well, what went badly, and what businesses could have done better during the pandemic. The results were passed through the analysis tools Qualtrics [34] and NVivo [35], both of which handle qualitative data analysis and help organise, analyse, and find insights in unstructured qualitative answers.

Figure 1 below showed insights into what went well for businesses during Covid-19. Positive descriptors such as “increased working from home”, “flexibility”, “collaborations”, and “business continuity and efficiency” were amongst many that gave a good account of how technology transitioned from office to home. SME businesses offered comments, shown in Fig. 1, such as “Clients being open to remote working, the flexibility improved the work life balance”, “We are lucky in that remote working didn’t impact the services we provide . . . ” and “More use of Zoom and Teams for remote working”. Figure 2 below, on the other hand, showed negative descriptors of Covid-19 impacting business, with words such as “uncertainty”, “unable to work”, “lack of time”, “communication and working environments”, “risk increase in business revenue”, and “a lot of work and staffing issues” amongst many. Some narratives that followed Fig. 2 involved comments such as “Lack of real time, face to face communication and uncertainty of projects to be completed”.

Fig. 1 Positive Word Cloud descriptors

Fig. 2 Negative Word Cloud descriptors

Fig. 3 Business – What could have been done better

Further comments included “Lack of Cyber Staff to hire”, “ . . . we could have done it better in terms of staff having laptops set up to work from home”, “Lack of new business/Opportunity”, “Too much screen time”, “Systems crashing, data breaches” and “IT equipment was a lot harder to source”.

Figure 3 below gave insight into what could have been done better in business. Words such as “staff support”, “better visibility of business systems and processes”, and “dealing with the pandemic having now had the experience” featured. Narratives from these questions included statements such as “In the NHS, cost prevents us from using an overarching consistent digital system and associated security features”, “providers accepting electronic signatures and electronic submission of paperwork”, “having extra staff to cope with demand” and “Having processes and procedures in place”. Some comments went deeper: “ . . . I feel there could have been more constructive and meaningful dialogues about how businesses could be sustained over a long term, instead of taking reactive measures to cut down costs due to panic . . . ”

Figure 4 below gave insight into what could have been done better for employees. Whilst employees suddenly had the comfort of their own home to contend with, challenges arose when, for example, internet speed and availability came into question.

Fig. 4 Employees – What could have been done better

Fig. 5 Government: What could have been done better

Figure 4 highlighted themes around “working from home – better staff communication, balance of work and personal life, and most importantly internet speed of connection and equipment quality and security”. Figure 5 below showed what could have been done better from a government perspective. Words such as “better decision making, clarity of communication, grants and availability to funding to help secure businesses” were amongst the thoughts and ideas put forward. Figure 5’s main concern was the government being able to lead by example and operate a top-down model to help businesses thrive through uncertain times.

The Figure 6 word cloud showed lessons learnt through “better working conditions”, “more interaction and communication online”, “better infrastructure for internet home connection”, “security and cyber safety” and “ensuring better investment is laid out on the onset of having to work from home again”. These lessons can only be carried forward if businesses, employees, and government play their part at every level to maintain a safe and secure place to work and live at the same time. Figure 7 below gave a good summary of the survey, in which keywords such as “remote working”, “home security systems”, and “communication and investment” are amongst the key takeaway messages from the questionnaire survey of these UK SMEs.

When asked how well the SME’s IT function responded in shifting the business from the office environment to working from home during Covid-19, Figure 8 showed 31.8% stating that the transition went “Very Well” and 29.5% stating “Extremely Well”.

Fig. 6 Lessons Learnt
Fig. 7 Summary of key takeaway words

A further 22.7% said “Well”, leaving 15.9% saying “Somewhat Well”. None said that the transition did not go well, a positive outcome in the way these SMEs were supported during the pandemic. It was clear from the results that the SMEs who answered this section had reliable IT support to bridge the gap in knowing what to do in a pandemic. This may not be true of SMEs who did not get the help they required and felt the weight of the pandemic on their shoulders and on their business. The next section looks at perceptions of security and privacy around IoTs, towards an understanding of how data is protected, stored, and used in IoT devices.

8 IoT: Perceptions of Security and Privacy

Humans see security and privacy through their experience, how they operate a device, and what they use it for.

Fig. 8 Response of the SME company in transition from office to home, by IT

For the purpose of understanding these concepts, “security is defined as a company’s ability to prevent unauthorized access to customer data and financial accounts and privacy depends on the degree to which a company shares customer data after it has been collected and secured” [36]. According to the Cambridge Dictionary, “perception is a belief or opinion, often held by many people based on how things seem”. As further discussed in Carbon’s (2014) paper, sensory perception is often the most striking proof of something factual: when we perceive something, we interpret it and take it as “objective” or “real”. This is most obvious with eyewitness testimony: if an eyewitness has “seen it with the naked eye”, judges, jury members and attendees take the reports of these perceptions not only as strong evidence but usually as fact, despite the active and biasing processes underlying perception and memory [37].

To understand smart devices and how they are perceived, consider the growing presence of IoTs in consumer households. According to Zheng et al. (2018), learning thermostats, energy-tracking switches, video doorbells, smart baby monitors, and app- and voice-controlled lights, shades, and speakers are all increasingly available and affordable. These connected devices use embedded sensors and the Internet to collect and communicate data with each other and their users, seamlessly integrating the physical and digital worlds inside the home [38]. With this rise in household equipment using the advanced technologies of AI and ML together with the IoT technology of connected devices, policing these IoTs suddenly becomes a challenge, and questions such as the following arise:

• What data do smart home IoT devices collect?
• Where is the data stored?
• Who has ownership of and access to the data?
• How is the data used?

Zheng et al. answer these questions based on previous research investigating user interactions with IoTs, together with users’ attitudes towards, and awareness of, end-user privacy protection. The methodology adopted semi-structured interviews with smart-home owners in the United States about their long-term experiences of living with IoTs. The analysis showed that users prioritised convenience and connectedness over privacy in their opinions and behaviours. User opinions about who should have access to their smart home data depended on the perceived benefit from entities external to the home, the bystanders, that create, track, regulate, or manage IoT devices and their data. Users also assumed their privacy was protected based on trust in IoT device manufacturers, but were unaware of the potential for ML inference to reveal sensitive information from non-audio/visual data. New evidence emerged of users’ IoT-specific privacy considerations and suggested the need for improved privacy notifications and user-friendly settings, as well as industry privacy standards. The study’s findings and recommendations contributed to the broader understanding of users’ evolving attitudes towards privacy in smart homes and the legal and ethical handling of IoTs.

A 2016 report published by the Broadband Internet Technical Advisory Group (BITAG) [39, 40] set out a range of observations, compiled by several working groups convened to deal with IoTs. The observations made in this report include:
• Security Vulnerabilities: Some IoT devices ship “from the factory” with software that is either already outdated or becomes outdated over time, leading to vulnerabilities being discovered later. A device becomes less secure over time unless it has a mechanism to update its software automatically (a minimal update-check loop is sketched after this list).
• Insecure Communications: Most security functions designed for general-purpose computers are difficult to implement on IoT devices, and security flaws have been identified, including unencrypted communications and data leaks from IoT devices. These devices sometimes come with neither authentication nor encryption.
• Data Leaks: IoT devices may leak private user data, both from the cloud (where data is stored) and between IoT devices themselves.
• Exposure to Malware Infection and Other Abuse: Malware and other forms of abuse can disrupt IoT device operations, gain unauthorized access, or launch attacks, causing potential loss of availability or connectivity. This is a huge risk in the context of, for example, a home alarm system deactivating when connectivity is lost.
• Device Security and Privacy Problems Persist: IoT device security issues are likely to persist because many devices may never receive a software update, either because the manufacturer (or another party in the IoT supply chain, or the IoT service provider) does not provide updates or because consumers do not apply the updates that are available.
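The update-check loop referred to in the first observation might look like the following minimal sketch; the feed URL, JSON fields and function names are hypothetical, and a real device would additionally verify a cryptographic signature before flashing anything.

```python
# A minimal sketch of a periodic firmware update check, assuming a
# hypothetical manufacturer endpoint that returns the latest version
# and a signed image URL. Not a real vendor API.
import json
import time
import urllib.request

CURRENT_VERSION = (1, 4, 2)
UPDATE_FEED = "https://updates.example-vendor.com/thermostat/latest.json"  # hypothetical

def check_for_update() -> dict:
    """Fetch the manufacturer's update feed and return it as a dict."""
    with urllib.request.urlopen(UPDATE_FEED, timeout=10) as resp:
        return json.load(resp)

def update_loop(poll_seconds: int = 24 * 3600) -> None:
    """Daily check: act only if the feed advertises a newer version."""
    while True:
        try:
            feed = check_for_update()
            latest = tuple(int(x) for x in feed["version"].split("."))
            if latest > CURRENT_VERSION:
                # In a real device: download feed["image_url"], verify its
                # signature against the vendor's key, then flash and reboot.
                print("Newer firmware available:", feed["version"])
        except OSError:
            pass  # network failure: stay on current firmware, retry later
        time.sleep(poll_seconds)
```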

Following this report, the BITAG Technical Working Group issued further recommendations on how to protect IoTs from a security and privacy angle. Legal compliance should be consistent, with IoTs shipped with current software practices that comply with standards and regulations such as the General Data Protection Regulation (GDPR); this includes fixing software bugs and providing mechanisms for automated software updates. IoT devices should use strong authentication by default, for example password protection, and should not ship with common, easily guessed passwords (a minimal sketch of such a check closes this section). BITAG also recommends that manufacturers test the security of each device with a range of possible configurations, as opposed to simply the default configuration. IoT devices should follow security and cryptography best practices from inception through development and deployment, should not rely on the network firewall alone to restrict communication and, most importantly, should remain usable and continue functioning if internet connectivity is disrupted. While many IoTs go out of date quickly once manufactured and sold to consumers, they should support future addressing and naming best practices when deployed. Lastly, BITAG suggests that the IoT supply chain should play its part in addressing IoT security and privacy issues and should consider an industry cybersecurity programme when launching any IoTs going forward.

Organisations such as BITAG were born out of a series of roundtable discussions hosted by the Silicon Flatirons Centre at the University of Colorado School of Law. A broad cross-section of the Internet community saw the need for, and value of, a technical advisory group to discuss technical issues pertaining to the operation of the Internet, as a means of bringing transparency and clarity to network management processes as well as the interaction among networks, applications, devices, and content.
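As promised above, here is a minimal sketch of “strong authentication by default” at device setup; the blocklist, password policy and function names are illustrative assumptions rather than BITAG specifications.

```python
# Reject factory-common or weak passwords, and give each unit its own
# credential instead of a shared default. All names are hypothetical.
import secrets
import string

COMMON_DEFAULTS = {"admin", "password", "12345678", "root", "default"}

def is_acceptable(password: str) -> bool:
    """Reject short or factory-common passwords."""
    return len(password) >= 12 and password.lower() not in COMMON_DEFAULTS

def factory_password() -> str:
    """Generate a unique per-device password at manufacture time."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(16))

# A leaked shared default compromises a whole product line;
# a unique per-unit credential does not.
print(is_acceptable("admin"))             # False
print(is_acceptable(factory_password()))  # True
```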

The next section brings IoTs together with Smart Cities, and the challenges and barriers in setting up communities that are meant to help technology advance yet question our ethics of data collection and the legality of data privacy and security.

9 IoT and Smart Cities

For Smart Cities to be built on strong foundations and to offer up-to-date, secure and private connections for these IoTs, the challenges and barriers discussed in this chapter will need to be overcome. Collaboration across the supply chain, as discussed in earlier sections, will need to be addressed, and policy put in place for government to align the market to produce IoTs that are “fit for purpose” in a Smart City environment. Ziosi et al. (2022) explored the notion of the term “Smart City”, referring to it as “technological additions to existing cities, or entirely new cities built with ‘smartness’ in mind” [41]. Ziosi’s paper explored the definitions and labels that define Smart Cities and concluded with four dimensions that make up a Smart City.

Firstly, the city’s network infrastructure, which involves concerns over control, surveillance, data privacy and ownership. Secondly, post-political governance and the frequent disputes between private and public sector bodies in decision making. Thirdly, social inclusion of the citizens of the city, whereby participation is inclusive and free of inequality and discrimination. Finally, sustainability, with a specific focus on the environment to protect the future.

According to Sourbati and Behrendt (2021), the technologies used in Smart Cities, such as ML, AI and Big Data, make public services such as transportation, rubbish collection and street repairs easier to manage and monitor in real time, increasing the efficiency of the council and other sectors combined [42]. Smart Cities can certainly fix the problems of traditional cities and make the environment an advanced one to live in. Yigitcanlar et al. (2020) give good examples of technologies used in Smart Cities that provide efficient services yet collect more data than anticipated in a traditional setting. For example, trains run more efficiently while enabling city officials to collect information about train schedules and passengers through various techniques including facial recognition scans, gait recognition, body temperature and much more, crossing the boundaries of ethical data collection and the GDPR [43]. Such urban innovation challenges legal thinking: only once risk is identified can remediation begin, based on legal and ethical actions granted alongside the change. In a study by Kirytopoulos et al. (2022), a total of 65 risks are identified and grouped into the following nine categories: economic, social, organisational, environmental, technological and technical, strategic, political, legal, and security; they are also presented in a Risk Breakdown Structure (RBS) [44].

Johnson et al. (2022) discussed data governance frameworks from the perspective of the legal considerations of Smart Cities. These frameworks help local governments and councils deploy technological solutions, including cameras and sensors that collect and analyse data for purposes such as reducing traffic congestion, improving vehicle and pedestrian safety, enhancing public security and emergency services, providing accessible transportation services, improving civic planning and design, and facilitating research and development. The paper explores the notion of data governance emerging as a response to the need for rules and parameters that regulate public authorities or set up self-policing by “private actors, data collection efforts aimed at producing more efficient and dynamic interactions between local governments, their partners, citizens, and the services upon which those citizens rely” [45].

One case study from Johnson’s paper was a public-centred Smart City project in Bristol, England. In 2013, the city of Bristol received funding from the UK Government to conduct smart city research and development, as a joint venture between Bristol City Council and the University of Bristol. The project evaluated IoT and “big data” applications, rolled out efficiency and mobility solutions, and deployed 5G, closed-circuit television, and IoT sensors to manage events. The project came under the sole control of the government, and Bristol’s work became part of an integral and official Smart City strategy, the “One City Plan”, which aims to make Bristol the UK’s “most digitally connected city”. Johnson’s study highlights the challenges these Smart City initiatives will eventually face.

General resource constraints limit a local government’s ability to sustain a smart city initiative. In Bristol, financial difficulties and loss of staff were apparent in the implementation of the Bristol IoT strategy, resulting in many developments failing. Private involvement does bring benefits of money, resourcing and technological capability; however, it raises cross-boundary questions of collaboration and of the intent to sustain these cities under traditional management.

The question of the law is a hot topic not only in the Smart City environment but also around IoTs and the data being collected. CCTV surveillance, for example, raises concerns over the equity and equality of data preserved for future predictions, especially in the law enforcement and public safety context. In a project cited in Johnson’s paper, San Diego in 2016 introduced “Smart Streetlights” to replace high-energy streetlights; the streetlights also served as Smart City sensor platforms, collecting pictures, sound, and data. San Diego’s “Department Instruction” document specified that whilst the Sustainability Department would oversee data and metadata collection, the San Diego Police Department would have exclusive access to the camera feeds. It was also made clear that any sensor-collected information made public would be anonymised to protect personal data. Concern was raised during Black Lives Matter protests, when it was reported that the streetlights were being used to monitor the protests, violating norms of fairness and equity. The project was paused until the council adopted a surveillance ordinance. San Diego’s Smart Streetlights certainly raised questions about the ethics and legality of a project that went public with little private involvement and whose surveillance infringed on individual rights.

In another case in Johnson’s paper, Barcelona as a Smart City developed in 2017 an “Ethical Digital Standards” and an “Open Digitisation Plan” to build trust through community involvement and transparency. The city developed an open-source Ethical Digital Standards toolkit so that the data collected by the IoTs used by the councils would be more transparent in the development of digital policies. Ethical core values included technological and data sovereignty, citizen digital rights, interoperability and accessibility, collaborative development, stakeholder participation in technological development and governance, and transparency and privacy, directing projects that use the city’s data to follow specific principles for the ethical use of data, including transparency, tracing, diligence, privacy, trust, responsibility, and benefit.

In conclusion to this section, it is vital that the private and public sectors work hand in hand to streamline the technology, resources and financing of Smart Cities. It is also vital that surveillance and equity concerns involving law enforcement are addressed, and that communication about how data is handled and safeguarded is ongoing. Lastly, local laws and regulations have a huge impact in establishing a strategy for transparency and accountability, building trust among the people within the Smart City.

10 Conclusion

This chapter reveals how Internet of Things (IoT) devices have grown, how humans interact and live with IoTs, and how the security and privacy behind these devices can be understood. IoTs, such as smart devices including phones and appliances in the home and in industry, are a growing route to getting information faster across “Cyberspace”. Their transfer of data now presents numerous challenges, including those related to privacy, security and data breaches, and those pertaining to ethical, legal and jurisdictional matters. IoT devices work alongside ‘Big Data’ and lean on Artificial Intelligence (AI) and Machine Learning (ML) techniques and algorithms to process and perform the analysis that enables data management.

Human factors have a large influence on the security of IoTs, and human perceptions of security and privacy concerning these devices were explored. The Technology Acceptance Model (TAM) was an important lens for understanding the three main domains that affect human perception of IoT security, namely product-related factors, social-influence-related factors and security-related factors, and most importantly the issue of trust between humans and IoTs. The concept of “cookies” as a tracking tool for web surfing, and its safety measures, was also explored, and the same human-perception concepts were found to apply. Technical advisory groups also contributed to the understanding of IoTs and how humans cope and live alongside them. Humans are always on the path to awareness; however, exerting control can be a struggle, since much of it is out of their hands from a manufacturing point of view. This leads to the argument that the supply chain of IoTs should be easy to understand, clear and transparent to the humans who end up as end users and who are subjected to security and privacy issues.

When it came to understanding bystanders, humans often prioritised convenience and access to information over bystanders when placing unfamiliar devices in their homes. When confronted with privacy concerns, they expressed the need for detailed controls to adjust the output. Both views showed that bystanders were already beginning to be considered during the design of smart home devices. The increasing number of IoT devices and their discreetness raise new challenges for the design of future smart environments, which must respect both users’ and bystanders’ data as it is collected and used. The sentiment “I don’t know how to protect myself” remains very much an unanswered question until there is stability and consistency in the supply chain of IoTs.

In the building of Smart Cities, the role of IoTs becomes a point of conflict between private and public bodies over how data is collected, managed and kept safe, leaning heavily on the laws and enforcement of the country and state. Ethics poses a large challenge in making sure equity and equality remain in focus when data is scrutinised, for policing for example. Future work should address this gap by providing scalable solutions that reduce the burden on humans and bystanders, Smart Cities and IoTs. Research will be a continuous cycle as long as there are humans using technology in a world of cyber threats and cybersecurity. The consensual hallucination explained earlier in this study is also an important perspective to reflect on, as a vital factor in how we react and behave in the virtual world and how it replicates the real world of these Smart City concepts. Humans need to be vigilant in their movements in the virtual space and behave there just as they would to protect themselves and their data in the real world. The overall message of this study is that there is still a long way to go before the real world overlaps with Cyberspace in keeping data safe and secure.

References 1. ‘Don’t go to 2020’ – Sound advice from Dublin’s newest Back to the Future mural. Available at: https://lovindublin.com/dublin/dont-go-to-2020-sound-advice-from-dublins-newestback-to-the-future-mural (Accessed on 10 June 2021) 2. Role of Internet of Things (IoT) in Movies Blog Mallika Rangaiah Mar 26, 2020, Available at: https://www.analyticssteps.com/blogs/role-internet-things-iot-movies (Accessed on 10 June 2021) 3. www.mindbounce.com, Who Coined The Term ‘Cyberspace’? – MindBounce (n.d.) [online]. Available at: https://www.mindbounce.com/trivia/who-coined-the-term-cyberspace/ (Accessed 3 Aug 2022) 4. www.goodreads.com, A quote from Neuromancer (n.d.) [online]. Available at: https:/ /www.goodreads.com/quotes/14638-cyberspace-a-consensual-hallucination-experienceddaily-by-billions-of-legitimate. (Accessed on 08.03.21) 5. The Sydney Morning Herald, Towards a consensual hallucination (2012) [online]. Available at: https://www.smh.com.au/technology/towards-a-consensual-hallucination-201205241z7d7.html. (Accessed on 09.03.21) 6. S.M. Biju, A. Mathew, Internet of Things (IoT): Securing the Next Frontier in Connectivity (2020) 7. J. Yarnold, P. Henman, C. McEwan, A. Radke, K. Hussey, As you enter your driverless vehicle, your Chatbot reminds you to pick up milk. Your milk can be traced through its supply chain back to the farm where a robot milked the cow. The milk contains genetically engineered enzymes that were designed by predictive computer software to improve your health. Welcome to now, or at least, very soon from now 8. D.E. O’Leary, ‘Big Data’, the ‘Internet of Things’, and the ‘Internet of Signs’. Intell. Syst. Account. Finance Manag. 20(1), 53–65 (2013) 9. M. Cox, D. Ellsworth, Managing Big Data for Scientific Visualization (ACM Siggraph, 1997) 10. Gartner, (2020). Available at: https://www.gartner.com/en/information-technology/glossary/ big-data/ (Accessed on 10 June 2021) 11. K. Ashton, That ‘Internet of Things’ thing (2009). Available at: http://www.rfidjournal.com/ article/view/4986 (Accessed on 10 June 2021) 12. P. Hernandez, App employs context for Big Dataanalytics efficiency, Enterprise Apps Today, 18 September, (2012). Available at: http://www.enterpriseappstoday.com/businessintelligence/ app-employs-context-for-big- data-analyticsefficiency.html (Accessed on 10 June 2021) 13. What Is Machine Learning: Definition, Types, Applications and Examples, Available at: https://www.potentiaco.com/what-is-machine-learning-definition-typesapplicationsand-examples/ (Accessed: 11 June 2021) 14. Robocop Machine Learning Expense Fraud. Available at: https://www.cbronline.com/ emerging-technology/robo-cop-machine-learningexpense-fraud/ (no date) (Accessed: 11 June 2021)

15. G. Murtarelli, A. Gregory, S. Romenti, A conversation-based perspective for shaping ethical human–machine interactions: The particular challenge of chatbots. J. Bus. Res. (2020) 16. Top Cybersecurity facts figures and statistics. Available at: https://www.csoonline.com/article/ 3153707/topcybersecurity-facts-figures-and-statistics.html (Accessed: 11 June 2021) 17. J. Haney, Y. Acar, S. Furman, “ It’s the Company, the Government, You and I”: User perceptions of responsibility for smart home privacy and security, in 30th {USENIX} Security Symposium ({USENIX} Security 21), (2021) 18. S. Chatterjee, Factors impacting behavioral intention of users to adopt IoT in India: From security and privacy perspective. Int. J. Inf. Secur. Priv. 14(4), 92–112 (2020) 19. F.D. Davis, Technology acceptance model: TAM, in Information Seeking Behavior and Technology Adoption, ed. by M.N. Al-Suqri, A.S. Al-Aufi, (IGI Global, Hershey, 1989), pp. 205–219 20. P. Jayashankar, S. Nilakanta, W.J. Johnston, P. Gill, R. Burres, IoT adoption in agriculture: the role of trust, perceived value, and risk. J. Bus. Ind. Mark. (2018). https://doi.org/10.1108/ JBIM-01-2018-0023 21. E.M. Rogers, Diffusion of innovations. Fourth edition (Free Press, New York, 1995) 22. M.N. Alraja, M.M.J. Farooque, B. Khashab, The effect of security, privacy, familiarity, and trust on users’ attitudes toward the use of the IoT-based healthcare: the mediation role of risk perception. IEEE Access 7, 111341–111354 (2019) 23. A. AlHogail, Improving IoT technology adoption through improving consumer trust. Technologies 6(3), 64 (2018) 24. K. Marky, A. Voit, A. Stöver, K. Kunze, S. Schröder, M. Mühlhäuser, “I don’t know how to protect myself”: Understanding privacy perceptions resulting from the presence of bystanders in smart environments, in Proceedings of the 11th Nordic Conference on Human-Computer Interaction: Shaping Experiences, Shaping Society, pp. 1–11 (2020, October) 25. A. Cahn, S. Alfeld, P. Barford, S. Muthukrishnan, An empirical study of web cookies, in Proceedings of the 25th International Conference on World Wide Web, pp. 891–901 (2016, April) 26. L. Narayanan, Cookies ‘n’consent: An empirical study on the factors influencing customer attitudes towards cookie consent among internet users in EU (Doctoral dissertation, Dublin Business School, 2020) 27. M.S. Ackerman, L.F. Cranor, J. Reagle, Privacy in e-commerce: examining user scenarios and privacy preferences, in Proceedings of the 1st ACM conference on Electronic commerce, pp. 1–8 (1999) 28. A.D. Miyazaki, Online privacy and the disclosure of cookie use: Effects on consumer trust and anticipated patronage. J. Public Policy Mark. 27(1), 19–33 (2008). https://doi.org/10.1509/ jppm.27.1.19 29. A. McStay, I consent: An analysis of the Cookie Directive and its implications for UK behavioral advertising. New Media Soc. 15(4), 596–611 (2013). https://doi.org/10.1177/ 1461444812458434 30. S. Chapman, G.S. Dhillon, Privacy and the internet: The case of the DoubleClick, Inc., in Social Responsibility in the Information Age: Issues and Controversies, (IGI Global, 2002), pp. 75–88 31. Risk associated with cookies – Infosec Resources (infosecinstitute.com) (Accessed: 11 June 2021) 32. Accept All Cookies | Master of Media (uva.nl) (Accessed: 11 June 2021) 33. N. Rawindaran, A. Jayal, E. Prakash, Machine learning cybersecurity adoption in small and medium enterprises in developed countries. Computers 10, 150 (2021). https://doi.org/10.3390/ computers10110150 34. 
Qualtrics [Online Software]: Provo, UT, USA. Available online: www.qualtrics.com (Accessed on 3 Mar 2020). 35. D.B. Allsop, J.M. Chelladurai, E.R. Kimball, L.D. Marks, J.J. Hendricks, Qualitative methods with Nvivo Software: A practical guide for analyzing qualitative data. Psych 4(2), 142–159 (2022)

36. C.W. Turner, M. Zavod, W. Yurcik, Factors that affect the perception of security and privacy of e-commerce web sites, in Fourth International Conference on Electronic Commerce Research, Dallas TX, pp. 628–636 (2001, November) 37. C.C. Carbon, Understanding human perception by human-made illusions. Front. Hum. Neurosci. 8, 566 (2014) 38. S. Zheng, N. Apthorpe, M. Chetty, N. Feamster, User perceptions of smart home IoT privacy, in Proceedings of the ACM on Human-Computer Interaction, 2(CSCW), pp.1–20 (2018) 39. Broadband Internet Technical Advisory Group, Internet of Things (IoT) Security and Privacy Recommendations. Technical Report (2016) 40. BITAG History, (2016). Available at: https://bitag.org/ (Accessed: 12 June 2021) 41. M. Ziosi, B. Hewitt, P. Juneja, M. Taddeo, L. Floridi, Smart Cities: Mapping their Ethical Implications. Available at SSRN 4001761 (2022) 42. M. Sourbati, F. Behrendt, Smart mobility, age and data justice. New Media Soc. 23(6), 1398– 1414 (2021). https://doi.org/10.1177/1461444820902682 43. T. Yigitcanlar, K. Desouza, L. Butler, F. Roozkhosh, Contributions and risks of artificial intelligence (AI) in building smarter cities: Insights from a systematic review of the literature. Energies 13, 1473 (2020). https://doi.org/10.3390/en13061473 44. K. Kirytopoulos, T. Christopoulos, E. Dermitzakis, Smart cities: Emerging risks and mitigation strategies, in Building on Smart Cities Skills and Competences, (Springer, Cham, 2022), pp. 123–139 45. J. Johnson, A. Hevia, R. Yergin, S. Karbassi, A. Levine, J. Ortiz, Data governance frameworks for smart cities: Key considerations for data management and use. J. Law Mobil. 2022(1), 1 (2022)