Innovations in Digital Forensics (ISBN 9811273197, 9789811273193)



English · Pages 344 [343] · Year 2023


Table of contents:
Contents
Preface
About the Editors
About the Contributors
1. Digital Forensics for Emerging Technologies: Present and Future
1. Introduction
2. Background on Digital Forensics
3. Challenges in IoT Forensics
3.1. Interconnected devices
3.2. Heterogeneous ecosystem
3.3. Resource constraints
3.4. Lack of standardization
3.5. Privacy-preserving data sharing
4. Outline of This Book
5. Conclusion
References
2. Evaluating Deleted File Recovery Tools per NIST Guidelines: Results and Critique
1. Introduction
2. Background
2.1. Metadata-based deleted file recovery
2.1.1. FAT file system
2.1.2. NTFS file system
2.1.3. Recovering deleted files
2.2. File carving
2.3. NIST CFTT guidelines
2.3.1. For metadata-based DFR
2.3.2. For file carving
3. Objectives
4. Approach
4.1. Metadata-based tools
4.1.1. Designing recovery scenarios
4.1.2. Creating test images
4.1.3. Challenges
4.1.4. Recovering files
4.1.5. Results
4.2. Carving-based tools
4.2.1. CFTT test cases
4.2.2. Recovering files
4.2.3. Evaluating results
4.2.4. Results
5. Discussion
5.1. Critique of DFR tools
5.1.1. Performance of metadata-based tools
5.1.2. Conditions for success of metadata-based tools
5.1.3. Performance for file carving tools
5.1.4. Conditions for success of file carving tools
5.2. Critique of NIST guidelines
5.2.1. FAT fragmentation and metadata-based tools
5.2.2. Incompatible core features for metadata-based tools
5.2.3. False-positives from file carving
6. Related Work
7. Conclusion
References
3. Optimized Feature Selection for Network Anomaly Detection
1. Introduction
2. Background
2.1. Particle swarm optimization
2.2. Ensemble methods
3. Approach Overview
4. Methodology
4.1. Optimized feature selection
4.2. Deep learning-based anomaly detection
5. IoT-Zeek Dataset Generation
5.1. Maliciousness classification
5.2. System adaptation
6. Evaluation Results and Discussion
6.1. Experimental setup
6.2. Benchmark datasets description
6.3. Feature selection results
6.4. Anomaly detection results
6.5. Comparative study
6.6. Efficiency
7. Related Works
7.1. Feature selection using optimization
7.2. Deep learning and anomaly detection
8. Concluding Remarks and Limitations
References
4. Forensic Data Analytics for Anomaly Detection in Evolving Networks
1. Introduction
2. Background
2.1. Service targeting attacks in evolving networks
2.2. Digital forensic analytics
3. Literature Review
3.1. Network anomaly detection
3.2. Forensic data analytics
3.3. Service targeting attack detection
3.4. Cybercrime-related entity detection
3.5. Research gaps
4. Multi-perspective as Intelligence for Anomaly Detection
4.1. Security posture support in evolving networks
4.2. Digital forensic analytics framework for anomaly detection
4.3. System deployment
5. Data Pre-processing and Feature Engineering
5.1. Data collection and description
5.2. Data normalization
5.3. Feature engineering
5.4. Attack patterns
6. Unsupervised Anomaly Detection
6.1. Malicious IPs and content fingerprinting
6.2. Compromised service nodes identification
7. Anomaly Detection Result Correction
7.1. Cross-perspective analysis
7.2. Time-series analysis
7.3. Offering analysis
7.4. Results summary
8. Summary
Acknowledgment
References
5. Offloading Network Forensic Analytics to Programmable Data Plane Switches
1. Introduction
2. Related Literature
2.1. P4-enabled analytics
2.2. Traditional network forensics
3. Background
3.1. A primer on programmable switches
3.2. Motivating line-rate network forensics
4. In-network Forensic Use Cases
4.1. Assessing DDoS
4.1.1. Slow DDoS
4.1.2. Volumetric analysis
4.2. Fingerprinting IoT devices
4.2.1. Switch-based constraints
4.2.2. Meeting hardware restrictions
4.2.3. P4-specific features
4.2.4. Parallel processing
4.2.5. Match table mapping
4.2.6. Device fingerprinting
4.2.7. Automating program configuration
5. Evaluation
5.1. Environmental setup
5.2. DDoS detection results
5.3. IoT fingerprinting assessment
5.3.1. Experimental setup
5.3.2. Classification results
6. Conclusion
Acknowledgment
References
6. An Event-Driven Forensic Auditing Framework for Clouds
1. Introduction
2. Related Work
3. Preliminaries
3.1. Security property
3.2. Security-related events
3.3. Security-related attributes
4. Event-Driven Cloud Auditing Framework
4.1. Overview
4.2. Framework architecture
4.3. Formal modeling and verification
5. Implementation
5.1. Overview
5.2. Event listening
5.3. Data collection
5.4. Data processing
5.5. Compliance verification
5.6. Challenges and limitations
6. Prototype Setup
7. Experiments
7.1. Scenario 1: Intra compute node attack
7.2. Scenario 2: Inter compute node attack
7.3. Feasibility of our cloud auditing framework
8. Conclusion
Acknowledgments
References
A. Appendix: Attack Scenarios
A.1. Steps of Intra Compute Node Attack
A.2. Steps of Inter Compute Node Attack
B. Appendix: Malicious Flows Fabrication
B.1. Fabrication of Malicious Flows for Outgoing Unicast
B.2. Fabrication of Malicious Flows for Incoming Unicast
B.3. Multicast and Broadcast Malicious Flows’ Fabrication
7. Multi-level Security Investigation for Clouds
1. Introduction
2. Preliminaries
2.1. Background on cloud levels
2.2. Background on dependency model
2.3. Major challenges in building predictive models
2.4. Threat model
3. Security Investigation System for Clouds
3.1. Overview
3.2. Prediction
3.3. Multi-level proactive verification
4. Implementation
4.1. Architecture
4.2. Implementation details
5. Adapting to Other Cloud Platforms
6. Experiments
6.1. Experimental settings
6.2. Experimental results
7. Discussion
8. Related Work
8.1. Comparative study
8.2. Existing investigation approaches
9. Conclusions
References
8. Digital Evidence Collection in IoT Environment
1. Introduction
2. Definitions
2.1. Evidence
2.2. Evidence collection
2.3. IoT digital evidence collection
3. Digital Forensics
3.1. Traditional digital forensics
3.1.1. Evidence seizure
3.1.2. Evidence deconstruction and analysis
3.1.3. Forensic judgment and reporting
3.2. IoT digital forensics
3.2.1. Sources of digital evidence in IoT digital forensics
3.2.2. Challenges of IoT forensics
4. Digital Evidence Collection in IoT Systems
4.1. Computer evidence collection
4.2. IoT digital evidence collection
4.3. Cloud digital evidence collection
5. IoT Forensic Tools and Frameworks
5.1. Attributes of the IoT forensics tools
5.1.1. Forensics phases
5.1.2. Enablers
5.1.3. Networks
5.1.4. Sources of evidence
5.1.5. Investigation modes
5.1.6. Digital forensics models
5.1.7. IoT forensics data processing
5.1.8. Forensics layers
5.2. IoT forensics tools
5.3. IoT forensic frameworks
5.4. Discussion
6. Conclusion
References
9. Optimizing IoT Device Fingerprinting Using Machine Learning
1. Introduction
2. Related Work
3. Problem Statement
4. Proposed Methodology
4.1. Overview
4.1.1. Data pre-processing
4.1.2. Data training
4.1.3. Data analysis and prediction
5. Experimentation
5.1. Precision and recall
5.2. F1-score (harmonic mean)
5.3. Complexity
6. Conclusion
References
10. Conclusion


World Scientific Series in Digital Forensics and Cybersecurity - Vol.2

Innovations in Digital Forensics

World Scientific Series in Digital Forensics and Cybersecurity
Print ISSN: 2661-4278 · Online ISSN: 2661-4286
Series Editor: Sanjay Goel, The State University of New York at Albany

This book series covers the latest research in the field of digital forensics as well as the state-of-the-art practice in the field. Eminent researchers and practitioners have been selected to work on different volumes of the series, which will be announced and released in sequence.

Published:
Vol. 2 Innovations in Digital Forensics, edited by Suryadipta Majumdar, Paria Shirani and Lingyu Wang
Vol. 1 SecureCSocial: Secure Cloud-Based Social Network, by Pradeep K. Atrey and Kasun Senevirathna

Editors

Suryadipta Majumdar, Concordia University, Canada
Paria Shirani, University of Ottawa, Canada
Lingyu Wang, Concordia University, Canada

World Scientific
New Jersey · London · Singapore · Beijing · Shanghai · Hong Kong · Taipei · Chennai · Tokyo

Published by World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE

Library of Congress Cataloging-in-Publication Data
Names: Majumdar, Suryadipta, editor.
Title: Innovations in digital forensics / editors, Suryadipta Majumdar, Concordia University, Canada, Paria Shirani, University of Ottawa, Canada, Lingyu Wang, Concordia University, Canada.
Description: Hackensack, NJ : World Scientific, [2023] | Series: World Scientific series in digital forensics and cybersecurity, 2661-4278 ; vol 2 | Includes bibliographical references and index.
Identifiers: LCCN 2023006251 | ISBN 9789811273193 (hardcover) | ISBN 9789811273209 (ebook) | ISBN 9789811273216 (ebook other)
Subjects: LCSH: Digital forensic science. | Computer crimes--Investigation.
Classification: LCC HV8078.7 .I56 2023 | DDC 363.250285--dc23/eng/20230405
LC record available at https://lccn.loc.gov/2023006251

British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.

Copyright © 2023 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the publisher.

For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.

For any available supplementary material, please visit
https://www.worldscientific.com/worldscibooks/10.1142/13330#t=suppl

Desk Editors: Nambirajan Karuppiah/Nicole Ong
Typeset by Stallion Press
Email: [email protected]
Printed in Singapore

© 2023 World Scientific Publishing Company
https://doi.org/10.1142/9789811273209_fmatter

Preface

Digital forensics deals with the investigation procedure of cybercrimes. With the growing popularity of new technologies, such as cloud computing, mobile computing, and networking, the nature of digital forensics has been evolving in recent years. This book presents state-of-the-art techniques to address imminent challenges in digital forensics. In particular, the book has three major segments covering cloud forensics, IoT forensics, and network forensics. These segments elaborate on innovative techniques, including algorithms, implementation details, and performance analysis, to demonstrate their practicality and efficacy. This book will be beneficial to both practitioners and researchers who deal with digital forensics. The presented innovations will directly help various stakeholders in obtaining knowledge of state-of-the-art digital forensics techniques as well as applying those techniques in the real world. In particular:

• This book provides a big picture to digital forensic practitioners and researchers alike about recent progress in digital forensics topics. Little systematic effort has so far been made to compile recent works on digital forensics innovations.
• This book discusses traditional forensics approaches for the investigation of cybercrimes to provide a good understanding of the field of digital forensics. Furthermore, this book highlights the big challenges in adopting those traditional approaches for emerging technologies, such as cloud computing and the Internet of Things (IoT).
• This book includes state-of-the-art digital forensic innovations, which are specifically designed to work with newer technologies. In contrast to the traditional approaches, these new approaches can tackle technology-specific challenges and enable digital forensics investigation for emerging technologies.
• This book provides a detailed description of digital forensics techniques with real-life examples and thorough implementation steps, including forensics algorithms, so that forensic practitioners can adopt the proposed approaches with minimum effort.


About the Editors

Suryadipta Majumdar received the PhD degree from Concordia University, Montreal, Canada. He is currently a Gina Cody Research and Innovation Fellow and an Assistant Professor with the Concordia Institute for Information Systems Engineering (CIISE), Concordia University, Montreal, Canada. Previously, he was an Assistant Professor with the Information Security and Digital Forensics Department, University at Albany — SUNY, Albany, New York. His research mainly focuses on cloud security, software-defined network (SDN) security, and Internet of Things (IoT) security.

Paria Shirani is an Assistant Professor at the School of Electrical Engineering and Computer Science at the University of Ottawa and a member of the Security Research Centre (SRC) at Concordia University. Prior to joining the University of Ottawa in 2022, she was an Assistant Professor at the Department of Computer Science at Toronto Metropolitan University. Previously, she was a Natural Sciences and Engineering Research Council (NSERC) postdoctoral fellow at Carnegie Mellon University (CMU), USA. Paria Shirani earned her PhD degree in information systems engineering at Concordia University, during which she was awarded a Fonds de recherche du Québec — Nature et technologies (FRQNT) doctoral scholarship. Her research interests are mainly in the fields of cybersecurity, such as binary code and malware analysis, IoT security, vulnerability detection, threat intelligence generation, and applied machine learning.

Lingyu Wang received a PhD degree in information technology from George Mason University, Fairfax, Virginia, in 2006. He is a Professor with the Concordia Institute for Information Systems Engineering (CIISE), Concordia University, Montreal, Quebec, Canada. He holds the NSERC/Ericsson Industrial Research Chair (IRC) in SDN/NFV Security. His research interests include SDN/NFV security, cloud computing security, network security metrics, software security, and privacy. He has co-authored seven books, two patents, and more than 100 refereed conference and journal articles, including many published at top journals and conferences such as the ACM Transactions on Privacy and Security, IEEE Transactions on Information Forensics and Security, IEEE Transactions on Dependable and Secure Computing, IEEE Transactions on Mobile Computing, Journal of Cell Science, IEEE Symposium on Security and Privacy (SP), ACM Conference on Computer and Communications Security (CCS), Network and Distributed System Security (NDSS) Symposium, European Symposium on Research in Computer Security (ESORICS), Privacy Enhancing Technologies Symposium (PETS), and International Conference on Database Theory (ICDT). He serves as an associate editor of IEEE Transactions on Dependable and Secure Computing (TDSC) and the Annals of Telecommunications (ANTE), and as an assistant editor of Computers & Security. He has served as the program (co-)chair of seven international conferences and as a technical program committee member of more than 150 international conferences.


About the Contributors

Nicole Beebe (Senior Member, IEEE) is the Melvin Lachman Distinguished Professor in Entrepreneurship, Professor of Cybersecurity, Department Chair of Information Systems and Cyber Security, and Director of the Cyber Center for Security and Analytics at The University of Texas at San Antonio (UTSA). UTSA is a National Center of Academic Excellence in Information Assurance and Cyber Defense for both education and research. Dr. Beebe received her PhD in Information Technology from UTSA, an MS in Criminal Justice from Georgia State University, and a BS in Electrical Engineering from Michigan Technological University. She has over twenty years of experience in information security and digital forensics, from both the commercial and government sectors. She is a Certified Information Systems Security Professional (CISSP) and has held three professional certifications in digital forensics. Dr. Beebe has published several journal articles related to information security and digital forensics in Decision Support Systems (DSS), IEEE Transactions on Information Forensics and Security, Digital Investigation, and many other journals. Her research interests include digital forensics, cybersecurity, and data analytics.

Elias Bou-Harb (Senior Member, IEEE) is currently the Director of the Cyber Center for Security and Analytics at UTSA, where he leads, co-directs, and co-organizes university-wide innovative cybersecurity research, development, and training initiatives. He is also a tenured Associate Professor at the Department of Information Systems and Cyber Security, specializing in operational cybersecurity and data science as applicable to national security challenges. Previously, he was a senior research scientist at Carnegie Mellon University (CMU), where he contributed to federally funded projects related to critical infrastructure security and worked closely with the Software Engineering Institute (SEI). He is also a permanent research scientist at the National Cyber Forensic and Training Alliance (NCFTA) of Canada, an international organization which focuses on the investigation of cybercrimes impacting citizens and businesses. Dr. Bou-Harb holds a PhD degree in computer science from Concordia University in Montreal, Canada, which was executed in collaboration with Public Safety Canada, Industry Canada, and NCFTA Canada. His research and development activities and interests focus on operational cybersecurity, attack detection and characterization, malware investigation, cybersecurity for critical infrastructure, and big data analytics.

Amine Boukhtouta focuses on cybersecurity, cloud, 5G, machine learning, and artificial intelligence. He joined Ericsson in 2017 and has been active in the use of machine learning and artificial intelligence to integrate cybersecurity controls in virtual 5G networks. He also serves as a lecturer of an intrusion test course at the École de technologie supérieure, Montréal, and an introduction to scripting languages course at Polytechnique Montréal. Amine holds a PhD in electrical and computer engineering and an MSc in information systems and security from Concordia University, Montréal, Canada.

Masoud Bozorgi holds a Master's degree in Software Engineering from Concordia University, Montreal, Canada. Currently, he is a senior instructor at John Abbott College in Montreal, Canada, teaching courses such as Data Structures and Algorithms, Java Spring Boot, RESTful, AOP, Swift and Kotlin programming, Advanced iOS Application Development, Advanced Android Application Development, SQL, Angular, React, Supervised and Unsupervised Machine Learning, Natural Language Processing, Image Processing, Deep Learning, and Web Application Security. His extensive experience in networking and software engineering has led to training over 200 students, who have been hired by top companies such as Yelp, Morgan Stanley, Ericsson, IBM sister companies, and Bell. Prior to this role, Masoud worked as a Cloud Security Research Assistant at Ericsson in Montreal, Canada. Masoud was an entrepreneur, founder, chairman, and chief technical officer of his IT company in Esfahan, Iran. As the founder of the first Internet Service Provider in Iran's second-largest city, Masoud expanded the company's services to include networking and software engineering, growing the client base to over 10,000 and managing a team of over 100 staff, engineers, and technicians. Masoud has also worked in various computer network administration and software engineering positions, with over 14 years of experience in the field.

Richard Brunner is currently a Strategy Advisor at Log5Data. He joined Ericsson in 1988 and has broad career experience in system management, standardization, and strategic product management. Richard possesses international management experience, with deep knowledge of the wireless telecommunication market. He was actively engaged in setting Ericsson's research and partnership activities both internally and externally with industry and universities.

Aniss Chohra is a backend developer at Wazo, Canada. He holds a Master's degree in information systems engineering from Concordia University, under the supervision of Dr. Mourad Debbabi. Aniss is a member of the Security Research Centre at Concordia University and has been working on different cybersecurity research topics, such as cyberthreat intelligence extraction, network anomaly detection, and risk assessment. Aniss also holds a Master's degree in Architecture of Computer Systems and Networks and a Bachelor's degree in computer systems engineering from Hadj-Lakhdhar University in Algeria, in addition to successfully achieving Cisco's Certified Network Associate certificates (CCNA 1, 2, 3, and 4).


Jorge Crichigno (Member, IEEE) received the PhD degree in computer engineering from The University of New Mexico, Albuquerque, USA, in 2009. He is currently an Associate Professor with the College of Engineering and Computing, University of South Carolina (USC), and the Director of the Cyberinfrastructure Laboratory, USC. His work has been funded by private industry and U.S. agencies such as the National Science Foundation (NSF), the Department of Energy, and the Office of Naval Research (ONR). He has over 15 years of experience in the academic and industry sectors. His research interests include P4 programmable switches, implementation of high-speed networks, network security, TCP optimization, offloading functionality to programmable switches, and IoT devices.

Mourad Debbabi is a Professor at the Concordia Institute for Information Systems Engineering and the Dean of the Gina Cody School of Engineering and Computer Science, Concordia University. He holds the NSERC/Hydro-Québec Thales Senior Industrial Research Chair in Smart Grid Security and the Honorary Concordia Research Chair Tier I in Information Systems Security. He is a founding member and Executive Director of the National Cybersecurity Consortium, which leads the Cyber Security Innovation Network (CSIN) program. He serves on the expert committee of the Ministry of Cybersecurity and Digital Technology of the Quebec Government. He serves/served on the boards of the Canadian Police College, PROMPT Québec, Cybereco, and Calcul Québec. He served as a member of CATAAlliance's Cybercrime Advisory Council. He is the founder and the Director of the Security Research Centre of Concordia University. Dr. Debbabi holds PhD and MSc degrees in computer science from Paris-XI Orsay University, France, and a BEng from Université de Constantine. He has published seven books and more than 300 peer-reviewed research articles in international journals and conferences on cybersecurity, cyber forensics, smart grid security, privacy, cryptographic protocols, cyber threat intelligence, malware analysis, reverse engineering, specification and verification of safety-critical systems, programming languages, and type theory. He has supervised to successful completion 34 PhD students, 76 Master's students, and 15 Postdoctoral Fellows. He served as a Senior Scientist at the Panasonic Information and Network Technologies Laboratory, Princeton, New Jersey, USA; Associate Professor at the Computer Science Department of Laval University, Canada; Senior Scientist at the General Electric Research Center, New York, USA; Research Associate at the Computer Science Department of Stanford University, California, USA; and Permanent Researcher at the Bull Corporate Research Center, Paris, France.

Ronald Ellis is a Senior Security Developer at Hootsuite. He completed his Master's of Engineering in Information Systems Security at Concordia University. He also holds a BSc in Electrical/Electronic Engineering from Kwame Nkrumah University of Science and Technology.

Oluwatosin Falola is a graduate of Computer Science from the University of Regina, where he researched and wrote his thesis on IoT device fingerprinting for anomaly detection. He is a business solutions provider who loves to solve business problems by providing insights from business data. He currently works as a Business Intelligence Analyst (assisting with Microsoft Power BI administration, the Data Science Lab, and ETL) at the Government of Saskatchewan in the Citizen-Centric Program Delivery Unit of SaskBuilds and Procurement. He was a Lead Consultant and a Business Solutions Consultant in his past engagements. He participated in enterprise solution deployments involving Enterprise Microsoft Power BI Data Gateway, Microsoft Dynamics CRM, and Microsoft SharePoint technologies.

Kurt Friday (Graduate Student Member, IEEE) is pursuing his PhD in Information Technology at the University of Texas at San Antonio (UTSA), with added emphasis on cybersecurity, artificial intelligence, and information systems. He has a background in computer science, which he obtained at Florida Atlantic University (FAU). He has extensive experience with programmable networking and SDN, and he has developed several security and forensic mechanisms for such networks. His current research interests are in the areas of operational cybersecurity, attack detection and characterization, Internet measurements for cybersecurity, malware analysis, digital forensics, IoT security, and data science.

Parisa Heidari received the Master's and PhD degrees in computer engineering from the École Polytechnique de Montréal, Canada, in 2007 and 2012, respectively. She is a Cloud Developer with IBM Canada. Before that, she was an IoT Developer and a Security Master with Ericsson. She worked as a Research Associate with Concordia University in collaboration with Ericsson from 2013 to 2014. In 2015, she joined Ericsson Research in Montreal as a Postdoctoral Fellow. She holds to her credit several publications and patents. Her research interests include edge computing, next-generation cloud, container technologies and the serverless approach, smart resource dimensioning, optimal placement, and different aspects of QoS assurance in cloud systems.

Yosr Jarraya is currently a researcher in security at Ericsson, focusing on security and privacy in 5G, container-based environments, cloud, and SDN/NFV. Previously, she was awarded a two-year postdoctoral fellowship at the same company. She received a PhD in Electrical and Computer Engineering from Concordia University, Montreal. She has more than 10 patents granted or pending. She co-authored two books and more than 40 research papers in peer-reviewed scientific journals and conferences on topics including cloud security auditing, data anonymization, network and software security, and formal verification.

ElMouatez Billah Karbab is the founder of MTzLabs Corporation, an R&D and consulting company. He holds a PhD degree in information systems engineering from Concordia University. His research focuses on applied machine learning techniques for malware fingerprinting and mobile and IoT security. He served as a researcher at Concordia University, where he worked on many academic, industrial, and government projects.
He also served as a research scientist at the National Cyber Forensic and Training Alliance (NCFTA) of Canada, an international organization that focuses on the investigation of cybercrimes. He also served as an associate researcher at the Research Centre for Scientific and Technical Information (CERIST), Algeria, where he worked on international projects in collaboration with the University of Cape Town, South Africa, and the Heudiasyc Lab, France. ElMouatez has published a book, a book chapter, and many peer-reviewed research articles in international journals and conferences on malware fingerprinting using machine learning techniques, cybersecurity, and embedded systems.

Ehsan Khodayarseresht is a PhD student at the Concordia Institute for Information Systems Engineering (CIISE), Concordia University, Montreal, Canada. His studies in software engineering included a Master's degree from Shahid Beheshti University and a Bachelor's degree from Islamic Azad University — South Tehran Branch, Tehran, Iran. His research interests are IoT security and forensics, cloud computing, and green computing.

Adel Larabi is a Senior Solutions Architect at Ericsson with over 25 years of leadership experience in designing innovative business solutions for telcos. He helps bridge academic research projects with commercial-grade enterprise solutions. His core qualifications are in CDN, edge computing, big data, IMS, media, and OSS, with an interest in AI applied to these domains. He is an expert in the Ericsson cybersecurity team.

Habib Louafi is an Assistant Professor at the Department of Science and Technology, TELUQ University, Canada. He also serves as an Adjunct Professor at the Department of Computer Science, University of Regina, Canada. Before that, he was an Assistant Professor at the same department and an Assistant Professor at the Computer Science Department of the New York Institute of Technology (Vancouver). His research interests include security, privacy, and forensics.
He is currently active in cloud and SDN security, network security, IoT-related security, and optimization of security deployment. He received an engineering degree in computer science from the University of Oran, Algeria, an MSc degree from the Université du Québec à Montréal, and a PhD degree from the École de technologie supérieure (ÉTS), which is part of the Université du Québec network.

Andrew Meyer is a software developer from Akron, Ohio, USA. He graduated from Bowling Green State University, USA, in 2022 with a degree in computer science. His research interests include computer file systems, computer security, and machine learning. In addition to his contribution to this book, he has published a paper on file system forensic analysis in EAI Endorsed Transactions on Security and Safety.

Daniel Migault is an expert in the Ericsson cybersecurity team and is actively involved in standardizing security protocols at the IETF.

Amaliya Princy Mohan holds a Bachelor's degree in Electronics and Communication Engineering from India and a Master's degree in Computer Science from the University of Regina, Canada. She is currently working as an Automation Analyst with Concentra Bank, Canada. Before that, she worked as a software developer with multiple IT companies.

Abdallah Moubayed received his PhD in Electrical & Computer Engineering from the University of Western Ontario, Canada, in 2018, his MSc degree in Electrical Engineering from King Abdullah University of Science and Technology, Saudi Arabia, in 2014, and his BE degree in Electrical Engineering from the Lebanese American University, Lebanon, in 2012. He was a Postdoctoral Associate in the Optimized Computing and Communications (OC2) Lab at the University of Western Ontario between 2018 and 2022. Currently, he is an Assistant Professor in the Electrical & Computer Engineering Department at the Lebanese American University. He has published more than 40 widely cited scientific papers in top-tier journals and conferences. His research interests include machine learning & data analytics, next-generation network management, resource allocation, network virtualization, performance & optimization modeling, and cloud computing.


Malek Mouhoub is Professor and SaskPower Chair in Artificial Intelligence at the Department of Computer Science, University of Regina, Canada. Dr. Mouhoub was the head of the department from 2016 until 2019. His research interests are in Artificial Intelligence and include Constraint Satisfaction, Combinatorial Optimization, Spatio-Temporal Reasoning, Preference Learning and Reasoning, Constraint Acquisition, Machine Learning, and Natural Language Processing. Dr. Mouhoub's research is supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Mathematics of Information Technology and Complex Systems (MITACS) federal grants, in addition to other international and provincial funds and awards. Dr. Mouhoub obtained his MSc and PhD in Computer Science from the University of Lorraine in France.

Jane Iveatu Obioha holds a Bachelor's degree in Computer Science from the University of Nigeria, Nsukka, Nigeria, and a Master's degree in Computer Science from the University of Regina, Saskatchewan, Canada. Her research interests include cybersecurity, IoT, and computer forensics. She is a member of multiple Information Technology organizations, such as NIPCA, PBIN, and IMI, and of other voluntary groups.

Makan Pourzandi is a research leader at the research department, Ericsson, Canada. He received his PhD degree in Computer Science from the University of Lyon I Claude Bernard, France, and an MSc in parallel computing from the École Normale Supérieure de Lyon, France. He has more than 20 years of experience in the fields of cybersecurity, telecom, and distributed systems. He co-authored two books on cybersecurity published by Springer, on auditing in cloud environments and software security. He is the co-inventor of more than 25 granted US and international patents. He has published more than 85 research papers in peer-reviewed scientific journals and conferences. His current research interests include cybersecurity, cloud computing, and software security engineering.


Stere Preda received his PhD in Computer Science from TELECOM Bretagne, France. A Senior Researcher with expertise in cybersecurity at Ericsson, he is an active contributor to ETSI NFV security standardization.

Sankardas Roy is an Associate Professor in the Computer Science Department at Bowling Green State University (BGSU), USA. Before joining BGSU, he worked as a postdoctoral researcher at multiple universities in the USA. He received a PhD from George Mason University, USA, in 2009. His main research interests are in security vetting of Android apps, computer security, digital forensics, computer networks, and more. In these areas, he has published 2 book chapters and more than 25 papers in peer-reviewed journals and conferences. He has served as a program committee member for several international conferences.

Mark Scanlon (Senior Member, IEEE) received the MSc and PhD degrees in remote digital forensic evidence acquisition. He is currently an Associate Professor with the School of Computer Science, University College Dublin (UCD), and the Founding Director of the Forensics and Security Research Group, UCD. He is a Fulbright Scholar in cybersecurity and cybercrime investigation. His research interests include digital forensics, artificial intelligence, computer vision, data encryption, network forensics, and digital forensics education. He is a Senior Editor of the Forensic Science International: Digital Investigation journal (Elsevier), and is a keen editor, reviewer, and conference organizer in the field of digital forensics, including the Digital Forensics Research Conference (DFRWS).

Abdallah Shami is currently a Professor in the Electrical and Computer Engineering Department and the Acting Associate Dean (Research) of the Faculty of Engineering, Western University, London, ON, Canada, where he is also the Director of the Optimized Computing and Communications Laboratory. Dr. Shami has chaired key symposia for IEEE GLOBECOM, the IEEE International Conference on Communications, and the IEEE International Conference on Computing, Networking, and Communications. He was the elected


Chair for the IEEE Communications Society Technical Committee on Communications Software from 2016 to 2017 and the IEEE London Ontario Section Chair from 2016 to 2018. He is currently an Associate Editor of the IEEE Transactions on Mobile Computing, IEEE Network, and IEEE Communications Surveys and Tutorials.

Yue Xin is an Application Support Analyst at Société Générale. She completed her MASc in Information Systems Security at Concordia University, Canada. Her work contributes toward improving the user interface in security management platforms.

Li Yang received his PhD in Electrical and Computer Engineering from Western University, London, Canada, in August 2022, his MASc degree in Engineering from the University of Guelph, Guelph, Canada, in 2018, and his BE degree in Computer Science from Wuhan University of Science and Technology, Wuhan, China, in 2016. Currently, he is a Postdoctoral Associate and Sessional Instructor in the Optimized Computing and Communications (OC2) Lab at Western University. His research interests include cybersecurity, machine learning, AutoML, deep learning, network data analytics, IoT, anomaly detection, online learning, concept drift, and time series data analytics.


© 2023 World Scientific Publishing Company. https://doi.org/10.1142/9789811273209_fmatter

Contents

Preface v

About the Editors vii

About the Contributors ix

1. Digital Forensics for Emerging Technologies: Present and Future 1
   Ehsan Khodayarseresht and Suryadipta Majumdar

2. Evaluating Deleted File Recovery Tools per NIST Guidelines: Results and Critique 13
   Andrew Meyer and Sankardas Roy

3. Optimized Feature Selection for Network Anomaly Detection 51
   Aniss Chohra, Paria Shirani, ElMouatez Billah Karbab, and Mourad Debbabi

4. Forensic Data Analytics for Anomaly Detection in Evolving Networks 99
   Li Yang, Abdallah Moubayed, Abdallah Shami, Amine Boukhtouta, Parisa Heidari, Stere Preda, Richard Brunner, Daniel Migault, and Adel Larabi

5. Offloading Network Forensic Analytics to Programmable Data Plane Switches 139
   Kurt Friday, Elias Bou-Harb, Jorge Crichigno, Mark Scanlon, and Nicole Beebe

6. An Event-Driven Forensic Auditing Framework for Clouds 191
   Habib Louafi, Yue Xin, Masoud Bozorgi, Ronald Ellis, Yosr Jarraya, Makan Pourzandi, and Lingyu Wang

7. Multi-level Security Investigation for Clouds 229
   Suryadipta Majumdar

8. Digital Evidence Collection in IoT Environment 263
   Jane Iveatu Obioha, Amaliya Princy Mohan, and Habib Louafi

9. Optimizing IoT Device Fingerprinting Using Machine Learning 293
   Oluwatosin Falola, Habib Louafi, and Malek Mouhoub

10. Conclusion 319
    Suryadipta Majumdar and Paria Shirani

© 2023 World Scientific Publishing Company. https://doi.org/10.1142/9789811273209_0001

Chapter 1

Digital Forensics for Emerging Technologies: Present and Future

Ehsan Khodayarseresht∗ and Suryadipta Majumdar†
Concordia University, 1455 de Maisonneuve Blvd. West, Montréal, Quebec, Canada H3G 1M8
∗[email protected][email protected]

The number of interconnected devices has increased significantly in recent years, and despite their useful functionalities, these devices have created many new opportunities for adversaries. While security experts try to prevent attacks, digital forensics experts play a significant role in investigating an attack's source and the extent of the damage. Digital forensics results can also serve as a valuable source of information for security professionals to mitigate future cyberattacks. However, because of the wide variety of emerging technologies, such as the internet of things (IoT) and cloud computing, traditional digital forensics tools are mostly incapable of forensic investigations in these domains. In this chapter, we review existing efforts in digital forensics for emerging technologies, particularly IoT forensics, and highlight the critical challenges in this area.

1. Introduction

The growing paradigm of interconnected devices through the Internet, such as the internet of things (IoT) and cloud computing, has led to an effective mode of interaction between computing devices.


It has been predicted that the number of such devices will reach 500 billion by 2030 [1]. Although these developments offer convenient options for easing daily activities, the security vulnerabilities in these emerging technologies have caused the rapid growth of cybercrimes and cyberattacks [2]. As a result, there is a greater need for digital forensics investigations, which aim to find the attack source and the extent of the damage. Moreover, due to the vast expansion of devices, such as smart home appliances, wearable devices, and industrial control systems, current traditional digital forensics approaches encounter various challenging issues. In this chapter, our primary goal is to identify and examine the current challenges in digital forensics for emerging technologies, especially in IoT forensics.

The following example demonstrates the nature of a security incident and highlights the possible challenges of its investigation in an IoT environment. Figure 1 illustrates a smart home environment equipped with five different IoT sensors, including a thermometer, a motion detector, a smart vacuum cleaner, a smart air conditioner, and a window controller, all of which are connected to a local hub (e.g., a router). This hub acts as a mediator between the server or data center (which stores and manages data) and the IoT devices in the smart home. The window controller opens the home's window only if both of the following conditions are fulfilled: (1) the thermometer indicates a room temperature higher than 80°F, and (2) the motion detector senses movement. However, these conditions might be fulfilled by attacking a series of devices in the smart home, resulting in a break-in. In this attack scenario, an adversary gains access to the smart vacuum cleaner and air conditioner controllers after hacking the hub. As shown by the solid line in Figure 1, the attacker first compromises the local hub (by exploiting an existing vulnerability [3, 4]) and then infects both the smart vacuum cleaner and the air conditioner. Afterward, the attacker forcefully turns on the air conditioner in heating mode to raise the room temperature above 80°F, and programs the vacuum cleaner to clean the house in order to create motion at home.
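The automation rule described above reduces to a simple two-condition check. The following Python sketch (sensor names, the threshold constant, and the sample readings are illustrative assumptions, not taken from any real smart-home platform) shows how the attacker-induced heat and the vacuum cleaner's motion jointly satisfy the rule:

```python
# Sketch of the smart-home window rule from the example above.
# Names and values are illustrative assumptions for this scenario only.

TEMP_THRESHOLD_F = 80.0

def should_open_window(temperature_f: float, motion_detected: bool) -> bool:
    """The window opens only if BOTH conditions hold."""
    return temperature_f > TEMP_THRESHOLD_F and motion_detected

# Normal day: nobody home, room is cool -> window stays shut.
assert should_open_window(72.0, False) is False

# Attack: the AC forced into heating mode raises the temperature, and the
# vacuum cleaner's movement fools the motion detector -> window opens.
assert should_open_window(83.5, True) is True
```

The point of the sketch is that the rule engine has no way to tell legitimate sensor readings from attacker-induced ones; both inputs look identical to the controller.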


Fig. 1. IoT forensics example. The solid lines represent the attack direction from the adversary to the IoT devices. The dotted lines with question marks show the challenges that the investigator encounters during the investigation process in order to detect the compromised IoT devices. The solid line without an arrow indicates the connection between the cloud and the smart hub.

Because the motion detector cannot distinguish human movements from movements by the vacuum cleaner, both conditions are fulfilled, and the window controller will open the window while no one is at home and the temperature has been raised artificially, which may result in severe safety and security threats (e.g., break-ins). As shown with the dotted lines in Figure 1, an investigation of the above-mentioned break-in would most likely start from the incident place, which is the window. To find the root cause, an investigator would need evidence (logs, etc.) to trace back to the hub, the air conditioner, or the vacuum cleaner. However, neither window controllers nor those other IoT devices currently provide enough activity logs, and no current mechanism enables this investigation to overcome their limitations. Ultimately, these IoT forensics problems are only a small part of the existing challenges, which will be described more precisely in Section 3.

In summary, we first review the literature on digital forensics for emerging technologies, and then we consider IoT forensics as a representative case and identify the current critical challenges in this domain. The rest of the chapter is organized as follows. Section 2 discusses related studies. Section 3 details the current challenges in IoT forensics, and Section 4 outlines the rest of this book. Finally, Section 5 presents the conclusions.

2. Background on Digital Forensics

Digital forensics, as a term used in information technology, did not start with the advent of new interconnected emerging technologies such as IoT and cloud computing. A computer processor, by design, always leaves a residue of information about the nature, origin, and pathways of all data it handles. Hence, digital evidence has always been associated with digital information analysis and data retrieval. For example, even though IoT forensics is a variant of digital forensics, the source of evidence can be regarded as the key difference between the two fields [5]. While traditional digital forensics covers a restricted range of devices for investigation, such as personal computers and smartphones, IoT technology has brought a significant shift to criminal investigation, and IoT forensics experts have to deal with a vast range of interconnected and automated devices, from medical implants in the human body to smart cars [5]. However, the common factor between historical digital footprint tracing models and current digital forensics in IoT is the presence of unwanted or suspicious third-party entities whose activities require investigation. In the following, some of the most recent IoT forensics works are reviewed.

An extensive data analysis framework for automatic detection of compromised IoT devices and suspicious activities is presented


in Ref. [6]. The proposed framework employs machine learning and data mining techniques to recognize unusual activities. A Distributed Denial of Service (DDoS) attack is a cyberattack involving packet streams from several sources, whose main goal is to deprive legitimate users of service by consuming critical resources of the available servers [7]. For example, the Mirai malware was used through a botnet for a DDoS attack in October 2016 [8], which compromised various IoT devices, such as IP cameras, home routers, and smart TVs, in order to take down different sites, including Amazon, Netflix, and Twitter. Zhang et al. [9] presented a comprehensive IoT forensics case study on the Mirai botnet servers, proposing a practical Mirai botnet network architecture and a complete investigation of a Mirai botnet server. A conventional forensics model is adopted in Ref. [10] as an IoT evidence collection system that uses common characteristics among IoT devices to create a global guideline. Besides, several IoT forensics models and frameworks based on different approaches and policies are proposed in Refs. [11–13]. Finally, the current challenges, approaches, and open issues in IoT forensics that are explained briefly in this chapter are discussed extensively in Refs. [5, 14, 15].

In conclusion, the overall digital forensics process [16] and its challenges in IoT forensics investigation [5] are shown in Figure 2 in six different phases: Identification, Collection, Preservation, Examination, Analysis, and Presentation. We will briefly review these challenges in Section 3.

Fig. 2. Digital forensics process along with its challenges in IoT forensics investigation.

3. Challenges in IoT Forensics

This section discusses the IoT forensics-related challenges based on five categories: interconnected devices, heterogeneous ecosystem, resource constraints, lack of standardization, and privacy-preserving data sharing.

3.1. Interconnected devices

IoT devices are mostly interconnected either through local hubs or through remote servers or mobile phones. As mentioned earlier, the IoT forensics domain covers a broader range of evidence sources than the traditional digital forensics domain. Accordingly, one of the most critical issues is searching for evidence, because sometimes specialists are not even aware of the physical data source [17, 18]. Stoyanova et al. [5] identify evidence identification as the essential part of any forensics examination, and they mention the following difficulties with regard to it:

(1) Scope of the compromise and crime scene reconstruction refers to the dynamic nature of communication in the IoT, which makes crime scene reconstruction almost impossible.

(2) Device and data proliferation refers to the difficulty forensics professionals face in identifying useful data among the increasing number of IoT devices and the massive amount of generated data.

(3) Data location is another challenging issue, stemming from the mobility of IoT devices as body area networks (BANs) moving across different wide area networks (WANs), and from the data access permissions required even once the device location is detected.

3.2. Heterogeneous ecosystem

A heterogeneous IoT ecosystem consists of an environment with a wide range of devices from diverse manufacturers, such as medical sensors, smart home appliances, and autonomous vehicles, which differ in hardware specifications, dedicated software interfaces, security and privacy preservation practices, network protocols, data formats, and data collection and analysis schemes. As a result, IoT forensics experts in such environments face further challenges in data source identification, data collection, examination, and analysis [5, 12, 14].

3.3. Resource constraints

In general, resource constraints mean lightweight devices with limited computational and storage capacity. IoT devices continuously generate large amounts of data, yet saving and analyzing the recorded data would require virtually unlimited storage capacity and computational power, which raises another challenging issue. To illustrate, when the internal storage is insufficient, data must be overwritten, which can destroy all clues for IoT forensics investigators once an attack happens [19]. Consequently, cloud computing, which provides different services and a practically unlimited resource pool (computational power, memory, storage, virtualization techniques, load balancing, and networks), is employed to overcome this problem. Nevertheless, this approach can cause other problems:

(1) Security issues: The vulnerabilities of cloud resources can cause various problems for both IoT security and forensics specialists [5]. Stoyanova et al. [5] discuss the restricted forensic value of data preserved in the cloud due to the possibility of its modification by an intruder who abuses vulnerable cloud resources. The authors also describe a cloud-specific issue called the cloud forensic problem, which, based on their claims, has not been solved yet: when malicious users gain access to a compromised system, they can alter or altogether remove the accessed data while eliminating the traces of their attack [20, 21].

(2) Lack of transparency: Cloud providers disclose few details about their different layers, such as hardware, software, security approaches, and networks. This opacity can be a plus for cloud users, who can decline responsibility in the cases mentioned above, while it is a drawback for IoT forensics investigators. Cloud providers do not reveal details about their underlying infrastructures and services, both for data security and to maintain their reputation against their rivals [22]. Therefore, IoT forensics experts face another challenge during their examination process [5]. Besides, each cloud provider may operate multiple geo-distributed data centers in different countries, which increases the difficulty of the IoT forensics investigation process due to data location identification and the differing regulations for accessing the target data in different parts of the world [5].

3.4. Lack of standardization

As discussed earlier, the heterogeneity of IoT devices creates various obstacles for IoT forensics experts in collecting the required evidence, and none of the available traditional digital forensics tools and evidence collection methods can handle all of the available IoT devices [17, 23]. This issue can lengthen investigation processing time, including data identification and cleaning tasks [24].

3.5. Privacy-preserving data sharing

In addition to all the challenges mentioned above, privacy is another principal difficulty in IoT forensics investigation, rooted in preserving the privacy of users' sensitive data collected during the forensics examination process. On the one hand, cloud providers do not give IoT forensics analysts access to shared memory because it may contain other users' data, and doing so could violate confidentiality agreements [22, 25]. As a result, few noteworthy methods exist for collecting evidence without violating user privacy rights [19]. Furthermore, according to Ref. [20], the General Data Protection Regulation (GDPR), legislation that compels corporations to secure their users' data and punishes organizations that do not follow the security rules [26], imposes considerable extra costs on companies because they need to hire more human resources, such as data protectors and data protection officers. On the other hand, maintaining the privacy of collected data and delimiting the scope of investigators' right to access data is another challenging barrier [5, 14].

4. Outline of This Book

This book includes several important topics in digital forensics evaluation, given the growing popularity of emerging technologies (e.g., networking, cloud computing, IoT). In particular, the rest of this book is divided into three segments: (i) file systems and networking, (ii) cloud computing, and (iii) IoT. First, Chapters 2–4 present different techniques to conduct digital forensics in modern file systems and networks. Second, Chapters 5–7 illustrate several methods to conduct digital forensics in cloud computing while addressing its unique challenges. Third, Chapters 8 and 9 describe techniques to enable digital forensics in IoT environments. Finally, Chapter 10 concludes this book.

5. Conclusion

This chapter reviewed recent related works in digital forensics and the current critical challenges in this domain. More precisely, we categorized the current challenges in IoT forensics into five categories: interconnected devices, heterogeneous ecosystem, resource constraints, lack of standardization, and privacy-preserving data sharing. In this review, we highlighted some of the most significant challenges that traditional digital forensics tools cannot overcome.

References

1. H. Abie. Cognitive cybersecurity for CPS-IoT enabled healthcare ecosystems. In 2019 13th International Symposium on Medical Information and Communication Technology (ISMICT), pp. 1–6 (2019).
2. S. Alabdulsalam, K. Schaefer, T. Kechadi, and N.-A. Le-Khac. Internet of things forensics: Challenges and a case study. In IFIP International Conference on Digital Forensics, pp. 35–48 (2018).


3. W. Ding, H. Hu, and L. Cheng. IoTSafe: Enforcing safety and security policy with real IoT physical interaction discovery. In The 28th Network and Distributed System Security Symposium (NDSS 2021) (2021).
4. H. Chi, Q. Zeng, X. Du, and L. Luo. PFirewall: Semantics-aware customizable data flow control for home automation systems. arXiv preprint arXiv:1910.07987 (2019).
5. M. Stoyanova, Y. Nikoloudakis, S. Panagiotakis, E. Pallis, and E. K. Markakis. A survey on the internet of things (IoT) forensics: Challenges, approaches, and open issues. IEEE Communications Surveys & Tutorials, 22(2), 1191–1221 (2020).
6. S. Torabi, E. Bou-Harb, C. Assi, and M. Debbabi. A scalable platform for enabling the forensic investigation of exploited IoT devices and their generated unsolicited activities. Forensic Science International: Digital Investigation, 32, 300922 (2020).
7. C. Douligeris and A. Mitrokotsa. DDoS attacks and defense mechanisms: Classification and state-of-the-art. Computer Networks, 44(5), 643–666 (2004).
8. C. Douligeris, O. Raghimi, M. B. Lourenço, L. Marinos, and the ENISA CTI Stakeholders Group. ENISA Threat Landscape 2020 - Botnet. https://www.enisa.europa.eu/publications/enisa-threat-landscape-2020-botnet. Accessed: 2022-12-11.
9. X. Zhang, O. Upton, N. L. Beebe, and K.-K. R. Choo. IoT botnet forensics: A comprehensive digital forensic case study on Mirai botnet servers. Forensic Science International: Digital Investigation, 32, 300926 (2020).
10. J. M. C. Gómez, J. C. Mondéjar, J. R. Gómez, and J. M. Martínez. Developing an IoT forensic methodology: A concept proposal. Forensic Science International: Digital Investigation, 36, 301114 (2021).
11. S. Perumal, N. M. Norwawi, and V. Raman. Internet of things (IoT) digital forensic investigation model: Top-down forensic approach methodology. In 2015 Fifth International Conference on Digital Information Processing and Communications (ICDIPC), pp. 19–23 (2015).
12. T. Zia, P. Liu, and W. Han. Application-specific digital forensics investigative model in internet of things (IoT). In Proceedings of the 12th International Conference on Availability, Reliability and Security, pp. 1–7 (2017).
13. V. R. Kebande and I. Ray. A generic digital forensic investigation framework for internet of things (IoT). In 2016 IEEE 4th International Conference on Future Internet of Things and Cloud (FiCloud), pp. 356–362 (2016).
14. J. Hou, Y. Li, J. Yu, and W. Shi. A survey on digital forensics in internet of things. IEEE Internet of Things Journal, 7(1), 1–15 (2019).
15. A. MacDermott, T. Baker, and Q. Shi. IoT forensics: Challenges for the IoA era. In 2018 9th IFIP International Conference on New Technologies, Mobility and Security (NTMS), pp. 1–5 (2018).
16. A. Shalaginov, A. Iqbal, and J. Olegård. IoT digital forensics readiness in the edge: A roadmap for acquiring digital evidences from intelligent smart applications. In International Conference on Edge Computing, pp. 1–17 (2020).


17. M. Conti, A. Dehghantanha, K. Franke, and S. Watson. Internet of things security and forensics: Challenges and opportunities. Future Generation Computer Systems, 78, 544–546 (2018).
18. R. Hegarty, D. J. Lamb, A. Attwood, et al. Digital evidence challenges in the internet of things. In INC: The Tenth International Network Conference (INC), pp. 163–172 (2014).
19. I. Yaqoob, I. A. T. Hashem, A. Ahmed, S. A. Kazmi, and C. S. Hong. Internet of things forensics: Recent advances, taxonomy, requirements, and open challenges. Future Generation Computer Systems, 92, 265–275 (2019).
20. B. Duncan, A. Happe, and A. Bratterud. Using unikernels to address the cloud forensic problem and help achieve EU GDPR compliance. In Cloud Computing, pp. 71–76 (2018).
21. S. Watson and A. Dehghantanha. Digital forensics: The missing piece of the internet of things promise. Computer Fraud & Security, 2016(6), 5–8 (2016).
22. S. O'Shaughnessy and A. Keane. Impact of cloud computing on digital forensic investigations. In IFIP International Conference on Digital Forensics, pp. 291–303 (2013).
23. S. Zawoad and R. Hasan. Digital forensics in the age of big data: Challenges, approaches, and opportunities. In 2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conference on Embedded Software and Systems, pp. 1320–1325 (2015).
24. D. Quick and K.-K. R. Choo. IoT device forensics and data reduction. IEEE Access, 6, 47566–47574 (2018).
25. A. Nieto, R. Rios, and J. Lopez. IoT-forensics meets privacy: Towards cooperative digital investigations. Sensors, 18(2), 492 (2018).
26. A. Chapman. Intruder detection through pattern matching and provenance driven data recovery. In Proceedings of the Cloud Computing, pp. 58–64 (2018).


© 2023 World Scientific Publishing Company. https://doi.org/10.1142/9789811273209_0002

Chapter 2

Evaluating Deleted File Recovery Tools per NIST Guidelines: Results and Critique

Andrew Meyer∗ and Sankardas Roy†
Computer Science Department, Bowling Green State University, Bowling Green, OH 43403, USA
∗[email protected][email protected]

To carry out post-mortem investigations of cybercrimes, professionals use various digital forensics (DF) tools. To aid in the standardization of DF tools, the National Institute of Standards and Technology (NIST)'s Computer Forensics Tool Testing (CFTT) program has compiled a set of expectations for these tools' behavior. DF tools' meeting these expectations is critical for the integrity of forensic analysis. In this chapter, we focus on the standardization of Deleted File Recovery (DFR) tools, which constitute a specific class of DF tools. We design a set of test images across widely used file systems. We run extensive experiments on these test images, as well as on the test images provided by CFTT, to evaluate the DFR tools, and to our surprise we find that many DFR tools available in the market do not fully meet the CFTT expectations. We report a comparative evaluation of these tools per the CFTT expectations, which can help users choose the right tool. We also identify the factors that may cause DFR tools to fail, and reflect on the applicability of the CFTT expectations. We hope that this report will trigger more research on the standardization of DFR tools from the research community.


1. Introduction

Nowadays, government organizations as well as private enterprises encounter cyberattacks quite frequently. After such an attack, law enforcement typically conducts a digital forensics (DF) investigation [1, 2] with the goal of identifying what caused the attack, how it was executed, and who the potential perpetrators are. In addition to cyberattacks, we encounter many other digital crimes, such as theft of intellectual property, compromise of private data, and more. To conduct post-mortem analysis of cyberattacks and digital crimes, forensics professionals employ multiple DF tools.

The success of an investigation depends on the accuracy and overall quality of DF tools. Furthermore, as an investigation often culminates in a judicial proceeding in a court of law, the integrity of DF tools is critical. On the one hand, a court may throw out a piece of evidence if it was collected by a DF tool that does not follow the standards; on the other hand, inaccurate results from a DF tool can lead to improper prosecution of an innocent defendant. DF tools in the market come from many vendors, and the government needs to keep an eye on their integrity. As a standardization initiative for DF tools, the Computer Forensics Tool Testing (CFTT) program [3] at the National Institute of Standards and Technology (NIST) published a list of expectations for these tools.

In this chapter, we consider the standardization of Deleted File Recovery (DFR) tools [4, 5], which are a special type of DF tool. Given a storage device, such as a hard disk or a USB drive, a DF professional uses a DFR tool to search for and retrieve deleted files. DFR tools find wide use in real-life DF investigations. For instance, after capturing a hard disk from a suspect's computer, a DF professional uses a DFR tool to retrieve the files that the suspect might have deleted to destroy important artifacts. A retrieved file might add critical evidence to the case at hand.
In other words, the success or failure of a DFR tool can sway the outcome of a case. Depending on their working principle, DFR tools fall into two subtypes: metadata-based DFR tools and file carving tools. The first subtype utilizes the file system metadata (if available) to identify (a.k.a. recover) a deleted file. The second subtype (i.e., a file carving tool) does not rely on the file system metadata and instead utilizes the target file's header and footer signatures for the recovery task. In this chapter, we consider both metadata-based DFR tools and file carving tools, and we evaluate them against the NIST CFTT expectations.

Scientific evaluation of a DFR tool is challenging because multiple factors of the given forensics scenario play a role in the tool's success or failure. A few such factors are (a) whether the target file is fragmented, (b) whether the target file is (partially) overwritten by another file, and (c) the presence of other active or deleted files in the storage device (that hosts the files). When we compare DFR tools, we make sure that these factors are the same to ensure a fair and scientific comparison. To evaluate the file carving tools, the NIST CFTT portal provides a set of test images designed with the above factors in mind. We use these test images in our study of file carving tools. For metadata-based DFR tools, in addition to the aforementioned factors, another factor is the type of the file system (e.g., FAT versus NTFS). Unfortunately, NIST CFTT does not provide any test image for the evaluation of metadata-based DFR tools [4]. So, for each file system type we ourselves designed a set of test images per the NIST guidelines [4], considering each of the above factors independently. In particular, we designed 14 test images for FAT and NTFS, and we claim that these cover most real-life scenarios. Via extensive experiments we evaluated multiple frequently used DFR tools,[a] and we find that many tools do not meet one or more NIST expectations. We find that a tool may correctly retrieve the deleted file in some scenarios while failing in others. Furthermore, we observe that two tools can perform differently on the same test image.
From our experience, whenever possible, we strive to provide logical explanations of these behaviors. We find that in many cases the success or failure of a tool depends on its design principle. For instance, a file carving tool may employ a strong match policy to identify a deleted file, which can help prevent false positives (when a tool retrieves garbage data instead of a real file), but this may cause the tool to miss some deleted files. Conversely, employing a weak match policy leads to the opposite outcomes. We compare the performance of the DFR tools and report a comparative analysis. In our opinion, such comparative analysis is beneficial to both DF tool users and vendors: it can help users choose the right DF tool, and it can help vendors identify room for improvement and their niche in the market. NIST CFTT publishes reports on DFR tools' performance, but only a subset of DFR tools is covered [6, 7]. The coverage needs to expand as more and more tools come to the market. Furthermore, some reports on the CFTT portal cover outdated versions of the DFR tools, which warrants updating. For instance, at the time of writing, the reports on Autopsy [8] and FTK [9], two frequently used DF tools, came out in 2014, and both tools have received numerous updates since then. As we have compared DFR tools' behavior in many test scenarios against NIST CFTT expectations, we have also gained firsthand experience of the applicability of the NIST CFTT guidelines in a practical setting. For instance, we observe some situations where a tool may be held to an impossible standard. We provide a critique on the applicability of the NIST CFTT guidelines.

[a] We have chosen a few tools as the subject of our study, but such choosing does not imply the authors' recommendation or endorsement of any particular tool.

2. Background

We present some background material in this section that will prepare us for the latter part of the chapter. In particular, we present an overview of metadata-based recovery and file carving. Then, we present the NIST CFTT expectations for such recovery tools.

2.1. Metadata-based deleted file recovery

This recovery mechanism relies on the residual metadata of deleted files, which is still present in the file system. Note that in a typical file system, a file has some metadata in addition to the actual file content. However, different file systems manage the metadata in different ways. In Sections 2.1.1 and 2.1.2, we present an overview of the FAT and NTFS file systems, respectively, which is often simplified to aid readability. Then, in Section 2.1.3, we discuss how metadata can help in the recovery of a deleted file.

2.1.1. FAT file system

For each file, the FAT file system maintains an important piece of metadata known as a directory entry. This 32-byte entry contains three key elements: (i) the name of the file (say, bar.txt), (ii) the size of the file, and (iii) the index of the starting cluster that holds the file content. The file system tracks the remaining clusters of the file (bar.txt) via a global table known as the FAT table. For each cluster in the file system, there is an entry in the FAT table that holds the status (allocated versus unallocated) of the cluster and, if allocated, tells us what the next cluster is. In particular, if the xth entry is 0, we infer that cluster x is unallocated. Otherwise, if the xth entry is y, then the next cluster after cluster x (as part of the same file) is y. When the xth entry holds a special value called end of file (EOF), we infer that cluster x is the last cluster of a file. For instance, Figure 1 illustrates the directory entry and data clusters of a file bar.txt. The FAT table is also shown. We can infer from the FAT table that the file's (bar.txt) content is stored in contiguous clusters, from cluster 200 to cluster 204.

When a file is deleted in a FAT file system, most of the actual content and metadata may remain intact. Figure 2 illustrates the status of the file system after the file bar.txt is deleted. All fields of the directory entry remain unchanged except that the first byte of the file name is replaced with a special marker (0xE5, often displayed as '_') to flag the deleted status. The main change happens in the FAT table, where all entries that were associated with bar.txt are zeroed. However, the clusters (holding the file content) remain intact until some other file (partially or fully) overwrites them.

So far, we have considered a file whose actual content is stored in contiguous clusters. It is also possible that a file's content is not


Fig. 1. The actual data part and metadata of file bar.txt in a FAT file system. The directory entry of this file and the actual file content clusters (shaded) are shown on the left. The FAT table is shown on the right.

Fig. 2. The actual data part and metadata of bar.txt after the file is deleted; its entries in the FAT table (i.e., the 200th through 204th entries) are zeroed.


Fig. 3. The actual content and metadata of bar.txt is shown. Per the FAT table, the file has two fragments (clusters 200–201 and cluster 204).

stored in contiguous clusters; such a file is called a fragmented file. Figure 3 illustrates an example where file bar.txt is fragmented.

2.1.2. NTFS file system

The NTFS file system does not maintain a global table (like the FAT table) that holds the status of each cluster. Instead, NTFS maintains a global table called the Master File Table (MFT) that holds information for each file in the system. In particular, each file f has a 1024-byte entry in the MFT that holds information about file f. If file f is small, then the whole file content as well as the metadata is stored inside f's MFT entry; otherwise, f's content is non-resident and is stored in other clusters. Figure 4 illustrates an example where the content of file bar.txt is non-resident and has two fragments. When a file f is deleted in NTFS, the corresponding entry in the MFT is flagged as deleted and the


Fig. 4. To illustrate NTFS file system, the MFT entry of bar.txt and the actual content carrying clusters are shown. This file has two fragments (clusters 200–201 and cluster 204).

corresponding clusters (if any) are flagged as deleted, but the metadata and file content generally remain intact. Note that the NTFS file system does not have a specific zone for the data area, in contrast to FAT, which has a specific area for data and a specific area for the FAT table.

2.1.3. Recovering deleted files

From the previous discussion, we note that in many situations the metadata (e.g., the directory entry in FAT or the MFT entry in NTFS) of a deleted file remains unchanged and can be used to identify and recover the deleted file. As an example, the directory entry of bar.txt in Figure 2 tells us that the file's content starts at cluster 200, and using the size field value we can infer that the file's content is hosted in the cluster chain from cluster 200 to cluster 204, assuming that there is no fragmentation. We can recover the file by reading the raw content of these five clusters, e.g., by using the dd command in Linux.
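To make this concrete, the following sketch mimics what a metadata-based tool does for a contiguous file: given the start cluster and file size taken from the directory entry, it reads the corresponding clusters out of a raw disk image. The cluster size, data-area offset, and file layout here are hypothetical values for illustration; a real tool parses them from the boot sector.

```python
# Sketch of metadata-based recovery for a contiguous (unfragmented) file.
# CLUSTER_SIZE and DATA_AREA_OFFSET are hypothetical; a real tool reads
# them from the FAT boot sector.

CLUSTER_SIZE = 2048          # bytes per cluster
DATA_AREA_OFFSET = 0x4000    # byte offset of the first data cluster
FIRST_DATA_CLUSTER = 2       # FAT numbers data clusters starting at 2

def recover_contiguous(image_path, start_cluster, file_size, out_path):
    """Read file_size bytes starting at start_cluster, assuming the file
    is contiguous (the only assumption FAT metadata supports)."""
    n_clusters = -(-file_size // CLUSTER_SIZE)   # ceiling division
    offset = DATA_AREA_OFFSET + (start_cluster - FIRST_DATA_CLUSTER) * CLUSTER_SIZE
    with open(image_path, "rb") as img:
        img.seek(offset)
        data = img.read(n_clusters * CLUSTER_SIZE)[:file_size]
    with open(out_path, "wb") as out:
        out.write(data)
    return data
```

This is effectively what dd does with a computed skip and count; the hard part in practice is obtaining trustworthy metadata, not reading the bytes.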


We encounter one critical challenge while recovering a fragmented file in FAT, because the directory entry of a file does not contain any information about the fragments. However, we do not face this challenge in NTFS file recovery, because the MFT entry (as shown in Figure 4) contains the start and end cluster index of each run (i.e., fragment).

2.2. File carving

File carving does not rely on the file system metadata and hence is independent of the type of file system (e.g., FAT versus NTFS). In file carving, we assume that the target file has known header and footer signatures (i.e., sequences of special bytes). As an example, a JPG file has such header and footer signatures: in particular, the two bytes 0xFF 0xD8 for the header and 0xFF 0xD9 for the footer. The file carving process scans the whole storage space (i.e., the target storage device from which we plan to recover deleted files) byte by byte and identifies each match of a header or footer signature. Then, the content between a header and a following footer is potentially a recovered file. Depending on whether a strong or weak match policy is used, there can be false positives (i.e., bogus files being retrieved) and false negatives (i.e., files that are missed).
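As a sketch of the scanning process, the toy carver below handles only JPG and uses a deliberately weak match policy (pair each header with the next footer), so it exhibits exactly the trade-offs just described. The two-byte signatures are the standard JPEG start-of-image and end-of-image markers.

```python
# Toy file carver for JPG: scan the raw bytes of a disk image for
# header/footer signature pairs and carve out everything in between.
# This weak match policy assumes files are contiguous and unfragmented.

JPG_HEADER = b"\xff\xd8"   # JPEG start-of-image marker
JPG_FOOTER = b"\xff\xd9"   # JPEG end-of-image marker

def carve_jpgs(data):
    """Return candidate JPG byte strings found in data."""
    carved, pos = [], 0
    while True:
        start = data.find(JPG_HEADER, pos)
        if start == -1:
            break
        end = data.find(JPG_FOOTER, start + len(JPG_HEADER))
        if end == -1:
            break
        carved.append(data[start:end + len(JPG_FOOTER)])
        pos = end + len(JPG_FOOTER)
    return carved
```

A fragmented file, or a stray 0xFF 0xD9 inside another file's data, immediately yields a truncated or bogus result, which is why real carvers add validation on top of signature matching.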

2.3. NIST CFTT guidelines

The NIST CFTT program has published guidelines [4, 5] on how to evaluate DFR tools.

2.3.1. For metadata-based DFR

NIST’s guidelines [4] for evaluating metadata-based DFR tools consist of four core features and several optional features. We evaluate based only on the core features and leave the optional features for later work. In this section, we list the core features as they appear in the NIST guidelines document, along with our own interpretation and commentary.


DFR-CR-01: “The tool shall identify all deleted File System-Object entries accessible in residual metadata” [4]. We say a tool fulfills this core feature if it reports something for each file system metadata entry that has been marked as deleted.

DFR-CR-02: “The tool shall construct a Recovered Object for each deleted File System-Object entry accessible in residual metadata” [4]. We say a tool fulfills this core feature if it outputs a file for each of the deleted files identified per DFR-CR-01, even if the output file is empty.

DFR-CR-03: “Each Recovered Object shall include all non-allocated data blocks identified in a residual metadata entry” [4]. Our interpretation of this feature is file system-dependent, as there are differences in what information is available in metadata. In the FAT file system, it is impossible to detect fragmentation purely from metadata, so we say a tool fulfills this core feature if it recovers at least all unallocated clusters that were allocated to the first fragment. In the NTFS file system, the locations of all fragments are left in metadata after file deletion, so to fulfill this core feature, a tool must recover all unallocated clusters that were allocated to the deleted file.

DFR-CR-04: “Each Recovered Object shall consist only of data blocks from the Deleted Block Pool” [4]. We say a tool fulfills this core feature as long as the recovered file contains only data from the original deleted file, or null data to represent parts of the file that have been overwritten or are otherwise inaccessible.
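To illustrate, our interpretation of DFR-CR-04 could be mechanized with a check in the following style: every block of a Recovered Object must be either null filler or data from the original deleted file. The block-wise comparison and the block size are our own simplification, not part of the NIST guidelines.

```python
# Hypothetical DFR-CR-04 check: a recovered object passes if each block
# is either null data or matches the corresponding block of the original
# deleted file. The block size is an illustrative value.

def meets_dfr_cr_04(recovered, original, block_size=2048):
    for i in range(0, len(recovered), block_size):
        block = recovered[i:i + block_size]
        if block.count(0) == len(block):         # null filler: allowed
            continue
        if block != original[i:i + block_size]:  # foreign data: violation
            return False
    return True
```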

2.3.2. For file carving

NIST’s guidelines [5] for evaluating file carving tools consist of five core features. In this section, we list the core features as they appear in the NIST guidelines document, along with our own interpretation and commentary.

FC-CR-01: “The tool shall return one carved file for each supported file header signature from a source file that is present in the search arena” [5]. Each file from the original disk will begin with a header signature specific to its file format. We say a tool fulfills this core feature if it carves a file starting at each of those header signatures. In other words, tools that perform well on this core feature have a high “hit rate.”

FC-CR-02: “A carved file shall only contain data blocks from the search arena” [5]. In other words, the tool should only work within the drive or partition it is given, and should not try to carve from areas that are out of bounds.

FC-CR-03: “All data blocks in a carved file shall originate in a single source file” [5]. We say a tool fulfills this core feature if each recovered file only contains data from one file on the original disk.

FC-CR-04: “The file type of a carved file shall match the file type of its contents” [5]. We interpret this to mean that the file extension given to a recovered file must accurately describe the format of the file data. We exclude false positives from this evaluation because their data are highly unlikely to be of any file format. So, we only consider files that were carved starting from a valid header signature.

FC-CR-05: “The tool shall return carved files in a state that conforms to a valid file of the carved file type” [5]. We say a tool fulfills this core feature if each recovered file can be parsed without error by some application software. We use the ImageMagick tool suite to evaluate this.

3. Objectives

DFR tools, which come in both metadata-based and file carving variants, are software for recovering deleted data from a digital storage device. We design experiments to evaluate a selection of DFR tools according to the guidelines set by NIST CFTT. We specifically investigate the following questions:

• How well do our selected DFR tools meet each of the NIST CFTT guidelines?
• What conditions influence the success and failure of DFR tools?


4. Approach

To properly evaluate DFR tools, we must see how they perform in various file recovery scenarios. We do this by running the tools on various disk images, each containing a file system with some deleted files. The recovered files output by a DFR tool are examined to see how well the tool meets the NIST guidelines for deleted file recovery. By using test images, each designed to emphasize specific recovery tasks, we can get a sense of a tool's strengths and weaknesses, and what tasks are especially difficult for certain types of tools. With this in mind, we designed several disk images to test metadata-based DFR tools. NIST CFTT has already designed a set of images for testing file carving tools, so we use those rather than creating our own. A high-level illustration of our methodology can be seen in Figure 5.

4.1. Metadata-based tools

In this section, we explain our process for testing metadata-based DFR tools.

Fig. 5. A read-only disk image, a file which contains the raw data of a file system, is obtained for each test case. For metadata-based test cases, we create a file system on an external drive and delete files from it, then save it as a disk image. For file carving test cases, we use a set of disk images prepared by NIST CFTT. In either case, the disk image is input to a DFR tool, whose output is checked for compliance with the NIST guidelines for metadata-based DFR or file carving, depending on the tool.

4.1.1. Designing recovery scenarios

We begin by designing a set of test cases to simulate common challenges of metadata-based deleted file recovery. In order to get the most information possible about each tool's capabilities, we aim to simulate a wide variety of recovery scenarios. We first isolate the most basic challenges of file recovery and create test cases for them. Ideally, these test cases should represent the most basic “building blocks” of file recovery scenarios, which can be used to compose more complex scenarios. When appropriate, we also create test cases representing the basic combinations of these simple cases. Our intention with this atomic approach is to create test cases that are generalizable to the majority of recovery scenarios, even the many we do not explicitly cover. If a tool performs well on the “building blocks” and their basic combinations, we can predict that it will perform similarly on more complex combinations.

In addition to this philosophy, we must remain within the scope of the NIST guidelines. NIST requires test images to be “created and deleted in a process similar to how an end-user would create and delete files” [4]. We take this to mean that any interaction with the file system must be through a standard operating system's read and write operations. We are not allowed to edit the file system directly, as “files and file system metadata that are specifically corrupted, modified, or otherwise manipulated to appear deleted” [4] are explicitly out of scope. Within these constraints, we can induce two phenomena which make file recovery more challenging: fragmentation and overwriting. Following the aforementioned goal of making our tests atomic, all our test cases (besides the first trivial case) use fragmented and/or overwritten files as building blocks.
As such, our test cases fall into five general categories: (1) no fragmentation or overwriting, (2) only fragmentation, (3) only overwriting, (4) both fragmentation and overwriting, and (5) fragmentation “out of order.” Each test case is listed in Table 1 along with a brief description. We have given each a distinct name in line with the naming scheme for the CFTT file carving test cases. A selection of test cases are


Table 1. Metadata-based DFR test cases.

Basic deleted file:
  basic: Deleted file is contiguous and the only file on the disk.

Deleted file is fragmented:
  fragments1: Deleted file is fragmented around an active file (illustrated in Figure 6).
  fragments2: Deleted file is fragmented around another deleted file.

Deleted file is overwritten:
  overwrite1: Deleted file is overwritten at the front by an active file.
  overwrite2: Deleted file is overwritten in the middle by an active file (illustrated in Figure 7).
  overwrite3: Deleted file is completely overwritten by an active file.
  overwrite4: Deleted file is overwritten at the front by another deleted file (illustrated in Figure 8).
  overwrite5: Deleted file is overwritten in the middle by another deleted file.
  overwrite6: Deleted file is completely overwritten by another deleted file.

Deleted file is fragmented and overwritten:
  combo1: Deleted file is fragmented around an active file, and the second fragment is overwritten by another active file (illustrated in Figure 9).
  combo2: Deleted file is fragmented around an active file, and the second fragment is overwritten by another deleted file.

Deleted file is fragmented out-of-order:
  disorder1: Deleted file is fragmented out-of-order with empty space in between the fragments (illustrated in Figure 10).
  disorder2: Deleted file is fragmented out-of-order with an active file in between the fragments.
  disorder3: Deleted file is fragmented out-of-order with a deleted file in between the fragments.

illustrated, with each column in an illustration portraying the state of the file system at a point in time. Each file is given a unique letter and shading, and the start and end of a fragmented file are denoted when relevant. Fragmentation is trivial in NTFS because information about the runs of a file persists after deletion. Since this renders the fragments


Fig. 6. fragments1: File A is fragmented and has been deleted.

Fig. 7. overwrite1: File A has been deleted and partially overwritten by file B.

and disorder cases functionally identical, we test only the fragment cases for NTFS. Cases overwrite2 and overwrite5 cannot be created through regular file operations in NTFS due to NTFS's file allocation behavior, so we also exclude them for NTFS. We do not exclude any test cases when using FAT.


Fig. 8. overwrite4: File A has been deleted and partially overwritten by file B, which has since been deleted.

Fig. 9. combo1: File A is fragmented and has been deleted. The second fragment has then been overwritten by file C.

4.1.2. Creating test images

We create the test cases on a 32-GB flash drive, using 4 MiB partitions for FAT cases and 6 MiB partitions for NTFS cases. For each case, it is important to start by writing over the partition with zeroes, ensuring the image is easily reproducible. Next, a file system

Fig. 10. disorder1: File A is fragmented out-of-order, and has been deleted.

should be written to the partition, and files should be written to it and deleted until the file system matches one of the planned test cases. We used text files, each containing a single repeated letter (e.g., “aa1M” is a text file containing 1 MiB of the letter ‘a’). Note that text files cannot be recovered with file carving, so this choice forces tools that combine both DFR methods to rely solely on metadata-based recovery. We write files to the test file system by copying them from another drive, and we append to files when we need to force fragmentation. Once the test file system matches one of the planned test cases, we use the dd utility to create a disk image of that partition. Since the disk image can be made read-only, it is safer and more convenient to run tests on the disk image instead of the physical drive. Note that we create the FAT test cases using Ubuntu 18.04 and the NTFS test cases using Windows 10.
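As an illustration, the repeated-letter test files could be generated with a short script like the one below. The directory argument and the explicit flush/fsync (which anticipates the write-caching pitfall discussed under Challenges) are our own additions.

```python
# Sketch: generate a repeated-letter test file, e.g., "aa1M" holds
# 1 MiB of the letter 'a'. The fsync call forces the data to the disk
# so a subsequent deletion leaves real on-disk residue to recover.
import os

MIB = 1024 * 1024

def make_test_file(directory, letter, mib):
    name = f"{letter}{letter}{mib}M"   # e.g., "aa1M"
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        f.write(letter * (mib * MIB))
        f.flush()
        os.fsync(f.fileno())
    return path
```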

4.1.3. Challenges

Creating test images can be difficult without a solid understanding of low-level file system behavior. For example, a newly written file may not have its data written to the disk right away. Even with fast modern storage, disk I/O involves a considerable amount of overhead. To improve performance, operating systems generally prefer to write


data in one larger batch versus several small batches. So, the operating system will often store write operations in a cache and wait for a more optimal moment to actually write to the disk. Typically, this optimization only causes problems in the event of sudden power loss or improper shutdown, but it can also make the state of the file system less predictable. We found that we were often deleting a file while it was still cached, meaning it would never be written to the disk to begin with. This is obviously a problem, as it leaves nothing on the disk to recover. Thankfully, Linux and Windows both provide a sync system call, which causes the cached writes to be performed immediately. Calling sync before each file deletion resolves the issue; alternatively, unmounting the file system triggers similar behavior.

Since most of the test images involve manipulating file data into certain arrangements, it is also helpful to have some understanding of allocation algorithms. An allocation algorithm refers to the steps the operating system takes to decide where on the disk to write new data. Generally, the operating system wants to limit fragmentation, as contiguous files are more efficient to read and write. Common allocation algorithms like “first available,” “next available,” and “best fit” place files according to that general principle, but use differing strategies to do so. Understanding the allocation algorithm removes a lot of trial and error from the process of making test images. For example, we observed that Linux uses a “next available” algorithm when writing to FAT. After mounting the file system, the first file to be written will be placed at the first available space in the file system. However, unlike the “first available” algorithm, the operating system remembers where it last placed a file, and it will place the next file at the first available space after that saved location, even if space opens up at the beginning of the file system.

Windows, on the other hand, uses a “best fit” algorithm when writing to NTFS. This allocation algorithm takes a more proactive approach to reducing fragmentation by writing each file to the smallest space in which it can fit without being fragmented.
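The two strategies can be contrasted with a toy free-space model. The run-list representation and the wrap-around rule below are our own simplification of real allocator behavior.

```python
# Toy allocators over a free-space map: free_runs is a list of
# (start_cluster, length) runs, sorted by start cluster.

def next_available(free_runs, size, last_pos):
    """Linux/FAT-style: first fitting run at or after the last write
    position, wrapping to the start of the disk if none is found."""
    for start, length in free_runs:
        if start >= last_pos and length >= size:
            return start
    for start, length in free_runs:
        if length >= size:
            return start
    return None

def best_fit(free_runs, size):
    """Windows/NTFS-style: smallest run that still fits the file."""
    candidates = [(length, start) for start, length in free_runs if length >= size]
    return min(candidates)[1] if candidates else None
```

With free runs at clusters 0 (length 5), 10 (length 3), and 20 (length 8), a 3-cluster file written after position 8 lands at cluster 10 under both policies, but a 4-cluster file goes to cluster 20 under “next available” and back to cluster 0 under “best fit.”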

4.1.4. Recovering Files

We chose to test five popular DFR tools: Autopsy [10], FTK Imager [11], Magnet AXIOM [12], Recuva [13], and TestDisk [14]. For Autopsy, we performed a standard recovery with ingest modules disabled. For FTK Imager, we used the free version and performed a standard recovery with default settings. For Magnet AXIOM, we used AXIOM Process to perform a “full scan,” then exported all files accessible from “Filesystem View” in AXIOM Examine. For Recuva, we used the free version to perform a standard recovery with default settings. For TestDisk, we used “file undelete” in “Advanced Filesystem Utils” to recover files.

4.1.5. Results

After running a DFR tool for a test case, we examine the files output by the tool to see if the NIST guidelines have been met. Note that a file does not need to be perfectly recovered for a tool to meet the guidelines, although all the guidelines would be met in that case. Sometimes, as with overwritten files, part of the file’s data literally no longer exist on the disk, so recovering the entire file is impossible. The NIST guidelines are designed with this reality in mind. Most tools recover files perfectly for FAT fragments1, and all tools recover files perfectly for FAT basic, NTFS basic, and NTFS fragments1 and fragments2. For the other cases, we need to examine the recovered files closely to see which core features are being met. We judge each tool as either meeting or failing each core feature individually for each test case. Results are shown in Figures 11 and 12 for FAT and NTFS test cases, respectively. Note that in the one test case where DFR-CR-01 is not met, meaning no deleted file was detected, we cannot make a judgement for the other core features. Recovering fragmented files: We observe that tools typically use one of two strategies in the event of FAT fragmentation. The first is to simply ignore fragmentation and recover the full length of a


Fig. 11. Test results on metadata-based DFR tools using FAT-formatted test images (one panel per tool: Autopsy, FTK, Recuva, Magnet AXIOM, and TestDisk). Each row corresponds to a test image and each column corresponds to a core feature. An “O” indicates success and an “X” indicates failure.

file as though it is contiguous, even if there are active files within that length. Recuva and Magnet AXIOM seem to take this approach. The second strategy is to still recover the full length of the file, but only from unallocated spaces, so any active files are skipped over. Autopsy, FTK, and TestDisk seem to take this approach. This is most easily observed in FAT fragments1: Autopsy, FTK, and TestDisk skip over file B to recover all of file A, while Recuva and Magnet AXIOM recover the first fragment of file A along with the active file B. This is why Recuva and Magnet AXIOM fail DFR-CR-04 for that case. However, when a file is fragmented around another deleted file, as in FAT fragments2, all tools take the first approach and erroneously return file B as part of file A. All tools seem to struggle

Fig. 12. Test results on metadata-based DFR tools using NTFS-formatted test images (one panel per tool: Autopsy, FTK, Recuva, Magnet AXIOM, and TestDisk). Each row corresponds to a test image and each column corresponds to a core feature. An “O” indicates success, an “X” indicates failure, and an “n/a” indicates no judgement could be made.

with out-of-order fragmentation. When the first fragment is at the very end of the file system, Recuva, FTK, and TestDisk only recover the first fragment, Autopsy returns a small amount of null data, and Magnet AXIOM returns an empty file and an error message. In NTFS, more information about a file’s location remains after the file is deleted, so recovering fragmented files is trivial. All tools we tested are able to handle fragmentation in NTFS with no issues. Recovering overwritten files: We observe that when a deleted file is overwritten by an active file, most tools still attempt to recover the overwritten part as though it was part of the original file, failing DFR-CR-04. The only tool to consistently meet DFR-CR-04 in these cases is FTK Imager, which only recovers data from before the overwritten part. TestDisk also does this for FAT overwrite2 only, otherwise behaving like the other tools. In FAT, Autopsy only recovers the first cluster of an overwritten file, regardless of what part was


overwritten. In NTFS, it gives the same results as the other tools. For FAT overwrite1 and overwrite2, Magnet AXIOM recovers overwritten sections up to the end, but no further. This is a bit strange, as it suggests Magnet AXIOM can detect overwriting but still returns the overwritten part. For the other cases, it behaves like the other tools. When a deleted file is overwritten by another deleted file, even FTK acts as though the overwritten part belongs to the original file. This suggests that the tools which can detect overwriting do so based on the allocation status of each data block; other than that, they only consider the metadata of the file being recovered, not the files around it.

Miscellaneous: Several FAT test cases (overwrite2, combo1, disorder1, disorder2, and disorder3) cause Autopsy to return a 1.5-KiB file containing null data. This is particularly odd because a FAT cluster is 2 KiB, making these the only recovered objects not divisible into clusters. Given the odd size and contents, we suspect this is some sort of error state rather than an attempted file recovery. TestDisk does not identify any deleted file in NTFS overwrite3. While the deleted file's data are entirely overwritten in this case, information about the file is still present in metadata. Since this only happens in NTFS and not FAT, where the precise location of the deleted file is still available, it is possible that TestDisk detects that the file has been overwritten and simply ignores it. Even if this is the case, the file information is still available in metadata, so for TestDisk to meet DFR-CR-01, it must identify the file regardless of whether it can be recovered.

4.2. Carving-based tools

In this section, we explain our process for testing file carving-based DFR tools.

4.2.1. CFTT test cases

To evaluate the file carving tools, we used six disk images prepared by CFTT [15]. Each of these images contains between 20 and 40 graphical files of the JPG, PNG, GIF, BMP, and TIFF formats. Importantly, the CFTT images do not contain valid file systems. This ensures that a tool which utilizes both DFR methods can be evaluated solely on its file carving ability. Despite the differences between the methods, metadata-based DFR and file carving are complicated by the same two file-system behaviors: fragmentation and overwriting. Thus, the CFTT test cases are somewhat similar to the ones we created for metadata-based DFR; each is built around a specific variation of fragmentation or overwriting of deleted files. Summaries of each CFTT test image are given in Table 2.

Table 2. File carving test cases.

basic        40 contiguous files, with space in between each file
nofill       40 contiguous files, with no space in between each file
simple-frag  40 fragmented files, with space in between each fragment
braid        10 contiguous files and 10 fragmented files, which are fragmented around each other in an A-B-A-B pattern
disorder     35 fragmented files, 30 of which are fragmented out-of-order such that the fragment containing the footer comes before the fragment containing the header
partials     15 complete files, 5 of which are fragmented, and 25 incomplete files where at least one fragment has been overwritten or manually destroyed

4.2.2. Recovering files

We chose to test four popular file carving tools: PhotoRec [16], Foremost [17], Scalpel [18], and Magnet AXIOM [19]. Note that since tests were run at different times, the version of Magnet AXIOM used for the carving tests is newer than the version used for the metadata-based tests. The settings we used when testing each tool are as follows: For PhotoRec and Foremost, we used the default settings. For Scalpel, we manually enabled recovery of the JPG, PNG, GIF, BMP, and TIFF formats, and otherwise used default settings. For Magnet AXIOM, we ran a “sector-level scan” with default settings in AXIOM Process and


exported all graphical files from “Artifact View” in AXIOM Examine. We also exported an XML carving report from AXIOM Examine since, unlike the other tools, it does not create one automatically.

4.2.3. Evaluating results

The main challenge when testing file carving tools is verifying the results. For metadata-based tools, this is fairly simple; the deleted files in our tests are all simple text files which can be checked at a glance. However, text files lack a standard header and footer and thus cannot be recovered with file carving, so the test images use more complicated formats such as JPG. As a result, it would be impractical to evaluate a tool based on the carved files alone. Fortunately, all four of the carving tools we evaluated produce an itemized report of carving results. CFTT provides detailed information about the contents of the test images, so we can compare the tools’ reports with the true arrangement of files in each image. This is particularly important for FC-CR-01, FC-CR-02, and FC-CR-03, since they are concerned with the specific data blocks the tool carves. PhotoRec’s report gives a start and end address for each file, and multiple start and end addresses if it attempts to carve a fragmented file. The other tools’ reports only list the starting address and size of each file, so we assumed they never attempt to carve more than one fragment per file. Checking FC-CR-04 is slightly complicated by the fact that carving tools cannot recover the filename, but we can use the first sector listed for each file in the carving report to match the carved files with the original files and check the file extensions. For FC-CR-05, we make use of the identify command from the ImageMagick [20] tool suite. Upon input of a graphical file, identify will output the file format and other information. Relevant to FC-CR-05 is that the return code will indicate whether the file is valid or corrupt. We add the -regard-warnings flag so that an error or warning while parsing the file will mean the file is considered corrupt.
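As a concrete illustration, the matching step described above (pairing each carved file with an original file via its starting address, then comparing extensions for FC-CR-04) can be sketched as follows. The report entries, image layout, and function name are hypothetical simplifications for illustration, not the actual CFTT tooling:

```python
# Hypothetical sketch of the FC-CR-04 check: match each carved file to an
# original file by its starting sector, then compare file extensions.
def check_fc_cr_04(report_entries, image_layout):
    """report_entries: list of (carved_filename, start_sector) from a report.
    image_layout: dict mapping start_sector -> original filename.
    Returns the fraction of hits whose extension matches the original's."""
    hits, matches = 0, 0
    for carved_name, start in report_entries:
        original = image_layout.get(start)
        if original is None:
            continue  # false positive: no original file starts at this sector
        hits += 1
        carved_ext = carved_name.rsplit(".", 1)[-1].lower()
        original_ext = original.rsplit(".", 1)[-1].lower()
        if carved_ext == original_ext:
            matches += 1
    return matches / hits if hits else 0.0

layout = {2048: "iris-lavender.bmp", 4096: "smoked-chicken.jpg"}
report = [("f0000001.bmp", 2048), ("f0000002.jpg", 4096), ("f0000003.png", 9999)]
print(check_fc_cr_04(report, layout))  # 1.0: both hits keep the right extension
```

Note that a carved file whose start address matches no original (a false positive) is excluded from the ratio, mirroring our decision to limit FC-CR-04 to “hits.”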

Evaluating Deleted File Recovery Tools per NIST Guidelines

4.2.4. Results

For FC-CR-01, Scalpel and Magnet AXIOM score between 60% and 70% for all test cases, Foremost scores between 37% and 63%, and PhotoRec ranges from a perfect score to as low as 17% depending on the test case (Figure 13). A high score on this core feature means a tool detects most of the original deleted files in a given test case; in other words, it has a high “hit rate.” All tools perfectly fulfill FC-CR-02, meaning they never try to carve data from outside the drive or partition given as input. For FC-CR-03, each tool achieves high scores on some test cases and low scores on others. A high score on this core feature means most of the files a tool recovers contain data from only one source. Note that carving from multiple sources may still result in a valid file, such as in Figure 14. All tools perfectly fulfill FC-CR-04, meaning every carved file that represents a “hit” is given the same file extension as its corresponding original file. We limit this only to “hits,” excluding false positives, to avoid overlapping with FC-CR-05. For FC-CR-05, PhotoRec generally scores very high, Foremost and Magnet AXIOM get high scores for some test cases and low scores for others, and Scalpel scores very low for all test cases. A high score on this core feature means most of the files a tool carves are valid files of some image format and can be properly parsed as that format (we test with the identify tool from ImageMagick). This may imply the tool checks carved files for validity before outputting them. Note that identify throws a warning for six of the TIFF files used to create CFTT’s test images, so we ignore the warning for carved versions of those six files.
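The scoring scheme used for these results (the percentage of files in a test image for which a core feature is met, reported in both fractional and percent form) can be sketched as follows; the per-file pass/fail values below are invented for illustration:

```python
from fractions import Fraction

def core_feature_score(per_file_results):
    """per_file_results: list of booleans, one per file in the test image,
    True if the core feature was met for that file.
    Returns the score as a (fraction, percent) pair."""
    met = sum(per_file_results)
    total = len(per_file_results)
    return Fraction(met, total), 100.0 * met / total

# e.g., a tool meeting a core feature for 27 of 40 files in an image:
frac, pct = core_feature_score([True] * 27 + [False] * 13)
print(f"{frac} = {pct:.1f}%")  # 27/40 = 67.5%
```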

5. Discussion

In this section, we look at the results from multiple angles. We first return to the questions posed in Section 3 to guide our analysis of the chosen DFR tools. Then, we shift our analysis to the NIST guidelines


Fig. 13. Results from testing file carving tools (one panel per tool: Foremost, Magnet AXIOM, PhotoRec, and Scalpel). Each row represents a test case, and each column represents a core feature. Since each test image contains many files, we score a tool’s compliance with the NIST guidelines by determining the percentage of files for which each core feature is met. In each case, scores are given in both fractional and percent form.

themselves, by discussing a few situations in which the guidelines are unclear, ambiguous, or misleading.

5.1. Critique of DFR tools

As stated in Section 3, our objectives are to assess the performance of the DFR tools according to the NIST guidelines, and to determine the factors influencing a tool’s success and failure.


Fig. 14. iris-lavender.bmp from the disorder test case, as carved by PhotoRec (original image vs. carved result). By cross-referencing NIST CFTT’s map of the test image with PhotoRec’s report file, we find that the carved file is actually composed of data from three files: iris-lavender.bmp, iris-yellow.bmp, and smoked-chicken.bmp. In this case, PhotoRec fails to meet FC-CR-03, despite outputting a valid BMP image.

5.1.1. Performance of metadata-based tools

None of the tools consistently meet the NIST CFTT guidelines; however, every tool meets the guidelines in some cases. All tools meet DFR-CR-01 except for TestDisk in one case, where it fails to report


a deleted file in metadata. In that case, the file data are completely overwritten, so the tool may be deliberately choosing to omit the file; however, our interpretation of the guidelines would have it instead return an empty file. All tools meet DFR-CR-02 without exception, as they attempt to return something for every deleted file. In some FAT cases, Magnet AXIOM and Autopsy do not recover all data indicated by metadata, so they fail DFR-CR-03. In many cases, across both file systems and multiple tools, the recovered file incorrectly contains data from other files, causing the tool to fail DFR-CR-04. Almost every time a tool fails to meet the guidelines, it is due to this core feature.

5.1.2. Conditions for success of metadata-based tools

The main factors that cause a metadata-based DFR tool to fail are fragmented files and overwritten files. As evidenced by the basic case, all tools meet the NIST guidelines when neither of those factors is present. Most tools detect simple fragmentation in FAT, but Recuva and Magnet AXIOM do not, failing DFR-CR-04 for the fragments1 case. When a file is fragmented around another deleted file, all tools fail to detect fragmentation, which means they all fail DFR-CR-04. Out-of-order fragmentation (the disorder cases) in FAT causes all tools to fail to recover the full deleted file, since FAT metadata only provide the location of the first fragment, and it is difficult to detect where the other fragments will be if they are out of order. Three of the tools still meet DFR-CR-03 by just recovering the first fragment, but Autopsy and Magnet AXIOM only return null data or an empty file, failing DFR-CR-03.

Most times a tool fails to meet the guidelines, it specifically fails to meet DFR-CR-04. This occurs most often in the overwrite and combo cases, which all feature overwritten deleted files. In these cases, we expect a tool to not recover anything from the parts of a file that were overwritten, since they now contain a different file’s data. Thus, tools which are more conservative are more likely to meet this core feature (at the risk of failing to meet DFR-CR-03). The only tool


to meet DFR-CR-04 for more than half of the test cases is FTK Imager. If a deleted file is overwritten by an active file in FAT, FTK Imager returns only up to the point the file was overwritten, then stops. If a deleted file is overwritten by an active file in NTFS, FTK Imager recovers all the remaining file data, replacing the overwritten part with null data. In either file system, if all accessible file data are overwritten, FTK Imager returns an empty file. The only cases where even FTK Imager returns data from the overwriting file are overwrite4, overwrite5, overwrite6, and combo2, the cases where the overwriting file has also been deleted. These cases are difficult for metadata-based DFR tools in general, as the tools tend to rely on file allocation status to detect overwriting (and fragmentation in FAT). Autopsy fails to meet DFR-CR-03 for the FAT overwrite2 and combo1 cases, in which the start of the deleted file is intact but a later part is overwritten. Instead of recovering all the intact data it can, it simply returns a single cluster.

It may be observed that while DFR-CR-03 and DFR-CR-04 account for almost all the failures, a tool only ever fails to meet one of these core features at a time. This makes sense, as DFR-CR-03 prescribes what a tool should recover at minimum, and DFR-CR-04 prescribes what a tool should not recover. Thus, meeting the guidelines requires striking a balance between a conservative recovery strategy and a more aggressive one.
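The allocation-status heuristic we infer from FTK Imager’s FAT behavior can be sketched as follows. The cluster run and allocation set are invented stand-ins; real tools operate on actual file-system structures rather than these simplified lists:

```python
def conservative_recover(run, allocated):
    """run: ordered list of cluster numbers recorded for the deleted file.
    allocated: set of clusters currently allocated to active files.
    Recover clusters only up to the first cluster reused by an active file,
    mimicking the conservative behavior that satisfies DFR-CR-04."""
    recovered = []
    for cluster in run:
        if cluster in allocated:
            break  # overwritten by an active file: stop, do not return its data
        recovered.append(cluster)
    return recovered

# Deleted file occupied clusters 10-14; cluster 12 was reused by an active file.
print(conservative_recover([10, 11, 12, 13, 14], {12}))  # [10, 11]
```

Note that when the overwriting file has itself been deleted (as in overwrite4 through overwrite6 and combo2), its clusters are no longer allocated, so this heuristic cannot detect the overwrite, which matches the failures we observed.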

5.1.3. Performance for file carving tools

In most cases, the tools do not fully meet the NIST CFTT expectation; however, they often come close. All tools meet FC-CR-02 and FC-CR-04 in all cases. The only case in which a tool fully meets all five core features is PhotoRec on the nofill test case. PhotoRec has by far the best scores for FC-CR-05 and comes close to fully meeting it. However, it is inconsistent on FC-CR-01 and FC-CR-03. Foremost gets inconsistent scores on FC-CR-03, but seems to outperform the other tools on this core feature. Its results on FC-CR-01 mostly mirror PhotoRec’s, but are generally lower. On FC-CR-05, it is inconsistent.


Scalpel consistently recovers at least two-thirds of the deleted files, tying with Magnet AXIOM for the best average performance on FC-CR-01. However, it generally performs very poorly on FC-CR-03 and FC-CR-05, though this may be an unfair consequence of our interpretation of the NIST guidelines; this is discussed in more detail in Section 5.2.3. Magnet AXIOM performs about the same as Scalpel on FC-CR-01, consistently carving over two-thirds of the deleted files. Its performance on FC-CR-03 is inconsistent in a similar way to Scalpel’s, but with higher scores. Its performance on FC-CR-05 is inconsistent in a similar way to Foremost’s, but with mostly higher scores. Based on our evaluation, no one tool consistently outperforms the others. Each tool meets some core features very well and struggles with the rest. Thus, knowledge and understanding of these trade-offs is important when selecting a file carving tool.

5.1.4. Conditions for success of file carving tools

As with metadata-based recovery, the main factors which cause tools to fail are fragmentation and missing data (such as from overwriting). All tools perform reasonably well on the basic and nofill cases. Since these have only complete, contiguous files, they are expected to be easy. The simple-frag, braid, and disorder cases introduce variations on fragmentation. Interestingly, almost all tools perform better on braid than on simple-frag. This makes sense as the files in braid are fragmented around other files, with no space in between, while the files in simple-frag have empty space between the fragments. A file carving tool can in most cases only detect the start and end of a file, so there is no good way to detect empty space. In the braid case, the point of fragmentation is often immediately followed by the start of another file, which the tool can have an easier time detecting. Since disorder puts the ends of files before their starts, a file carving tool must abandon most reasonable assumptions about the layout of the file system or it will be misled. Unsurprisingly, this case is very difficult and most tools perform poorly on it relative


to the other cases. The partials case introduces missing file data; in this case, the missing data are replaced by empty space rather than overwritten by another file. This case also seems to be difficult, for the same reasons as simple-frag. How conservative a tool is about its output has a big impact on its score, particularly for FC-CR-03 and FC-CR-05. Some tools output a smaller number of files but generally only valid files, while other tools may get more “hits” but also output many false positives which do not constitute a valid file. This is the main reason Scalpel performs so poorly on FC-CR-03 and FC-CR-05 despite its fairly high scores on FC-CR-01. We believe this represents a flaw in the NIST guidelines, and discuss it in depth in Section 5.2.3. One other observation is that Foremost and Magnet AXIOM do not recover any TIFF files from any of the test cases, despite both supporting the format.

5.2. Critique of NIST guidelines

In this section, we analyze the NIST guidelines critically in the context of our results, highlighting a few scenarios in which the guidelines can be confusing, ambiguous, or misleading in practice.

5.2.1. FAT fragmentation and metadata-based tools

To fulfill DFR-CR-03, a tool must recover “all non-allocated data blocks identified in a residual metadata entry” [4]. However, in the FAT file system, it’s debatable exactly how much is identified in metadata. Recall that FAT directory entries only store the starting address and the length of a file, and any other information is lost upon file deletion. So, for a fragmented deleted file, how much should a tool be expected to recover? Reading DFR-CR-03 very close to the letter, one could argue that the only data definitively identified in metadata is the first cluster; after all, the file could be fragmented between the first and second clusters, and there would be no way to tell from its metadata. However, this is almost certainly not the intent


of the guidelines, as it sets a very low standard for performance. Since it is possible to reason about the point of fragmentation based on cluster allocation status and other files’ metadata, we interpret DFR-CR-03 to mean tools should at least recover the first fragment of a file. Recovering an entire fragmented file would involve guesswork, so we do not require it. NIST’s James Lyle, who wrote about the challenges of evaluating DFR tools in 2011, claims that trying to anticipate the details of every major file system would make the guidelines overly complex. Instead, Lyle suggests they be written for “an ideal file system that leaves in residual metadata all information required to reconstruct a deleted file” [21], even if this means sometimes holding a tool to an impossible standard. He further justifies this approach by pointing out that whether a feature is impossible or merely absent, the user experience is the same. Since the NIST CFTT guidelines seem to have been written with this philosophy in mind, it may not be reasonable to clarify specific edge cases like the one in this section. However, the context of this “ideal file system” should be explicitly stated in the guidelines document.
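Our interpretation of DFR-CR-03 for FAT (recover at least the first fragment) can be sketched as follows, reasoning from the starting cluster and file size in a residual directory entry plus cluster allocation status. All names and structures here are simplified stand-ins for real FAT metadata:

```python
import math

def fat_first_fragment(start_cluster, file_size, cluster_size, allocated):
    """Infer the first fragment of a deleted FAT file from its residual
    directory entry (start cluster + file size) and the allocation status
    of each cluster: take contiguous unallocated clusters from the start
    until one is allocated or the recorded size is reached."""
    total_clusters = math.ceil(file_size / cluster_size)
    fragment = []
    for cluster in range(start_cluster, start_cluster + total_clusters):
        if cluster in allocated:
            break  # an allocated cluster marks a likely fragmentation point
        fragment.append(cluster)
    return fragment

# 6 KiB file, 2 KiB clusters, starting at cluster 100; cluster 101 was reused.
print(fat_first_fragment(100, 6144, 2048, {101}))  # [100]
```

Recovering anything past the first allocated cluster would require guessing where the remaining fragments landed, which is why we stop there.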

5.2.2. Incompatible core features for metadata-based tools

Under some circumstances, it is impossible for a tool to meet both DFR-CR-03 and DFR-CR-04 at the same time. For example, suppose a file “A” in NTFS is deleted and overwritten, and the file “B” that overwrote it is also deleted. In NTFS, the full run of file A is accessible from metadata after deletion, and since file B was also deleted, all the clusters in file A’s run are unallocated. DFR-CR-03 requires a tool to recover “all non-allocated data blocks identified in a residual metadata entry” [4], so the entire run of file A should be recovered. However, this would include data from file B, which is forbidden by DFR-CR-04. DFR-CR-04 requires that a recovered file contain only data from the “deleted block pool,” which is the set of data blocks from the deleted file which “have not been reallocated or reused” [4]. Meeting DFR-CR-03 requires recovering data that were reused for file B, so in this case DFR-CR-03 and DFR-CR-04 are incompatible. This situation occurs in the cases overwrite4, overwrite5, overwrite6,


and combo2 for both FAT and NTFS, and in these cases each tool meets DFR-CR-03 but not DFR-CR-04. As in the previous section, the user experience is the same regardless of whether or not a task is impossible, so this edge case probably does not justify an update to the guidelines.
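The incompatibility can be made concrete with sets of block numbers (all values invented for illustration):

```python
# File A was deleted, then partially overwritten by file B, which was then
# also deleted, so every block below is now unallocated.
blocks_of_A = {1, 2, 3, 4}   # run recorded in A's residual metadata
reused_by_B = {3, 4}         # blocks B overwrote before its own deletion

must_recover = blocks_of_A                 # DFR-CR-03: all non-allocated
                                           # blocks identified in A's metadata
may_recover = blocks_of_A - reused_by_B    # DFR-CR-04: only A's
                                           # "deleted block pool"

# Any output either omits blocks required by DFR-CR-03 or includes blocks
# forbidden by DFR-CR-04 -- the two core features cannot both be met.
print(sorted(must_recover - may_recover))  # [3, 4]: the irreconcilable blocks
```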

5.2.3. False-positives from file carving

The core features for file carving tools, besides the first feature, set requirements specifically for all “carved files.” The guidelines document defines a carved file as “a file created by a carving tool purported to be one of the source files present in the search arena” [5]. This means even false positives which are not part of one of the original files affect a tool’s evaluation on four out of five of the core features. We suggest that this results in misleading and less informative results, especially when those results are used to compare different tools. Interpreting “each carved file” to include even the ones with no relation to the original files results in dramatically low scores for tools like Scalpel (which carved at least 50 additional “files” in most of our tests), while favoring tools which are more conservative in their recovery. In other terms, this interpretation favors a strong match policy over a weak match policy. However, this obscures a relevant trade-off: a more aggressive tool will have a high false-positive rate, but may recover files that a more conservative tool would miss. An investigator may have this trade-off in mind when selecting a DFR tool, so the NIST guidelines should not make a tool that sits on one side of that trade-off appear objectively worse than others. As the standards are currently written, a tool that carves only one file from an image, but recovers it correctly, would be considered perfect on four out of five core features. Meanwhile, a tool that perfectly carves all 40 files from an image, but also returns 150 false positives, would likely score very poorly for FC-CR-03, FC-CR-04, and FC-CR-05. It would score especially poorly on FC-CR-05, as a false positive will almost never be a valid file. Since the NIST guidelines do not directly account for the false-positive rate, it indirectly and disproportionately affects several core features, diluting their usefulness.


To resolve this issue, we propose the following changes to the NIST guidelines:

(1) Add a new definition: Positive carved file, a carved file which corresponds to a supported file header signature from a source file that is present in the search arena.
(2) In FC-CR-03, FC-CR-04, and FC-CR-05, change “carved file” to “positive carved file.”
(3) Add an additional core feature: The tool shall not return any carved files that do not correspond to a supported file header signature from a source file that is present in the search arena. This could be scored as the ratio of positive carved files to total carved files.

The intent of these changes is to better atomize the guidelines, so each core feature evaluates a tool on a single capability. This should make the trade-offs of certain tools more apparent, enabling investigators to make more informed and nuanced tool choices based on the capabilities that are most important for their use case.
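The proposed additional core feature could be scored as sketched below. The matching here is by start address only, a hypothetical simplification of the header-signature matching the definition calls for:

```python
def positive_carve_ratio(carved_starts, source_starts):
    """Proposed score: ratio of positive carved files to total carved files.
    carved_starts: start addresses of all files the tool carved.
    source_starts: start addresses of source files in the search arena."""
    if not carved_starts:
        return 0.0
    positives = sum(1 for s in carved_starts if s in source_starts)
    return positives / len(carved_starts)

# A tool that carves all 40 source files but also 150 false positives:
sources = set(range(40))
carved = list(range(40)) + list(range(1000, 1150))
print(f"{positive_carve_ratio(carved, sources):.3f}")  # 0.211 (40 of 190)
```

Under this scoring, the aggressive 190-file tool described above receives roughly 0.21 on the new core feature while the conservative one-file tool receives 1.0, making the trade-off visible instead of letting false positives leak into the other core features.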

6. Related Work

Wu et al. [22] presented a survey on recent advances in the field of DF tools. This is a study of the currently available DF tools in general, including file carving tools. Drawing on the experience of designing Scalpel, Richard and Roussev [23] presented a set of requirements a file carving tool must fulfill to perform well. Laurenson [24] reviewed six file carving tools to evaluate their performance quality. Interestingly, the research community has proposed multiple approaches for file carving. For instance, Sencar et al. [25] exploited bit sequence matching to identify fragments of a JPEG file. Furthermore, Gladyshev and James [26] used decision-theoretic analysis to carve JPEG files. Moreover, to recover files from an Ext4 file system, Dewald and Seufert [27] built a tool that is not solely dependent on metadata information; their tool uses file carving as well as metadata analysis to reconstruct the file.


Lyle [21] proposed a strategy to evaluate metadata-based DFR tools, which, in our understanding, NIST CFTT considered while setting the guidelines for the evaluation of such tools [4]. These guidelines are publicly available on the NIST CFTT portal [3], which we have referred to extensively in the current chapter. Recently, Lee et al. [28,29] proposed file recovery techniques based on metadata present in an Ext2 or Ext3 file system. Furthermore, the research community has also proposed techniques [30,31] to carve metadata. The results and analysis of metadata-based DFR tools in this chapter were originally published in EAI Endorsed Transactions on Security and Safety in 2020 [32]. Some aspects, such as the naming and organization of test cases, have been reworked to make them more clear and consistent with the file carving cases. A few redundant test cases from the original have been omitted. The section on file carving tools is original to this chapter.

7. Conclusion

We evaluated DFR tools according to guidelines set by NIST CFTT. We used test cases provided by CFTT to evaluate file carving tools, and designed our own test cases to evaluate metadata-based DFR tools. We analyzed the results of these tests to determine how well each tool meets the guidelines and how they compare on various tasks. We also critiqued the CFTT guidelines based on our experiments and analysis. In addition to the four core features, there are several optional features listed in the NIST CFTT guidelines for metadata-based DFR tools [4]. Future work could extend our methodology to evaluate metadata-based tools on these optional features. Our evaluation of metadata-based tools was limited in scope to just the FAT and NTFS file systems, making it somewhat Windows-centric. As investigators need to be able to recover evidence from Linux or MacOS devices, future work could evaluate metadata-based tools on ext4, HFS, and other common file systems. It is common for files to be embedded


within other files, for example, thumbnails in some graphical formats. Future work could test the ability of file carving tools to recover these files, and interpret and critique the NIST guidelines in the context of embedded files. Future work could also extend our investigation of file carving tools beyond just graphical file formats, such as to video or document file formats, which may be equally valuable to an investigation.

References

1. S. Gaudin. Digital trail led investigators to alleged craigslist murderer. https://www.computerworld.com/article/2523694/digital-trail-led-investigators-to-alleged-craigslist-murderer.html.
2. L. Mathews. Florida water plant hackers exploited old software and poor password habits. https://www.forbes.com/sites/leemathews/2021/02/15/florida-water-plant-hackers-exploited-old-software-and-poor-password-habits/.
3. NIST. Computer forensics tool testing program (CFTT). https://www.nist.gov/itl/ssd/software-quality-group/computer-forensics-tool-testing-program-cftt.
4. NIST CFTT. Active file identification and deleted file recovery DFR tool specification: Draft for comment 1 of version 1.1. https://www.nist.gov/system/files/documents/2017/05/09/dfr-req-1.1-pd-01.pdf (March, 2009).
5. NIST CFTT. Forensic file carving tool specification: Draft version 1.0 for public comment. https://www.nist.gov/system/files/documents/2017/05/09/fc-req-public-draft-01-of-ver-01.pdf (April, 2014).
6. NIST CFTT. Test results for deleted file recovery. https://www.nist.gov/itl/ssd/software-quality-group/computer-forensics-tool-testing-program-cftt/cftt-technical/deleted (March, 2019).
7. NIST CFTT. Test results for forensic file carving. https://www.nist.gov/itl/ssd/software-quality-group/computer-forensics-tool-testing-program-cftt/cftt-technical-0 (March, 2019).
8. US Department of Homeland Security. The Sleuth Kit (TSK) 3.2.2/Autopsy 2.24 test results for deleted file recovery and active file listing tool. https://www.dhs.gov/sites/default/files/publications/508Test%20ReportThe%20Sleuth%20Kit%203%202%202%20-%20Autopsy%202%2024%20Test%20ReportNovember%202015Final.pdf.
9. US Department of Homeland Security. FTK version 3.3.0.33124 test results for deleted file recovery and active file listing tool (revised). https://www.dhs.gov/sites/default/files/publications/508Test%20ReportNISTFTK%20v3%203%200%2033124%20%28Revised%29August%202015Final0.pdf.
10. Autopsy 4.11.0. https://www.autopsy.com/.
11. Forensic Toolkit (FTK) 4.2.0.13. https://accessdata.com/products-services/forensic-toolkit-ftk.
12. Magnet AXIOM 2.7.1.12070. https://www.magnetforensics.com/products/magnet-axiom/.
13. Recuva 1.53.1087. https://www.ccleaner.com/recuva.
14. TestDisk 7.0. https://www.cgsecurity.org/wiki/TestDisk.
15. NIST CFTT. Forensic images used for NIST/CFTT file carving test reports. https://www.cfreds.nist.gov/filecarvingtestreports.html (March, 2017).
16. PhotoRec 7.1. https://www.cgsecurity.org/wiki/PhotoRec.
17. Foremost 1.5.7. http://foremost.sourceforge.net/.
18. Scalpel 1.60. https://salsa.debian.org/pkg-security-team/scalpel.
19. Magnet AXIOM 4.6.0.21968. https://www.magnetforensics.com/products/magnet-axiom/.
20. ImageMagick 7.0.10. https://imagemagick.org/.
21. J. R. Lyle. A strategy for testing metadata based deleted file recovery tools. In International Conference on Digital Forensics and Cyber Crime, pp. 104–114 (2011).
22. T. Wu, F. Breitinger, and S. O’Shaughnessy. Digital forensic tools: Recent advances and enhancing the status quo. Forensic Science International: Digital Investigation, 34 (2020).
23. G. G. Richard III and V. Roussev. Scalpel: A frugal, high performance file carver. In DFRWS (2005).
24. T. Laurenson. Performance analysis of file carving tools. In IFIP International Information Security Conference, pp. 419–433 (2013).
25. H. T. Sencar and N. Memon. Identification and recovery of JPEG files with missing fragments. Digital Investigation, 6, S88–S98 (2009).
26. P. Gladyshev and J. I. James. Decision-theoretic file carving. Digital Investigation, 22, 46–61 (2017).
27. A. Dewald and S. Seufert. AFEIC: Advanced forensic Ext4 inode carving. Digital Investigation, 20, S83–S91 (2017).
28. S. Lee and T. Shon. Improved deleted file recovery technique for Ext2/3 filesystem. The Journal of Supercomputing, 70(1), 20–30 (2014).
29. S. Lee, W. Jo, S. Eo, and T. Shon. ExtSFR: Scalable file recovery framework based on an Ext file system. Multimedia Tools and Applications, 79, 16093–16111 (2020).
30. R. Nordvik, K. Porter, F. Toolan, S. Axelsson, and K. Franke. Generic metadata time carving. Forensic Science International: Digital Investigation, 33, 301005 (2020).
31. T. S. Atwal, M. Scanlon, and N.-A. Le-Khac. Shining a light on Spotlight: Leveraging Apple’s desktop search utility to recover deleted file metadata on macOS. Digital Investigation, 28, S105–S115 (2019).
32. A. Meyer and S. Roy. Do metadata-based deleted-file-recovery (DFR) tools meet NIST guidelines? EAI Endorsed Transactions on Security and Safety, 6(2) (2020). doi: 10.4108/eai.13-7-2018.163091.


© 2023 World Scientific Publishing Company https://doi.org/10.1142/9789811273209 0003

Chapter 3

Optimized Feature Selection for Network Anomaly Detection

Aniss Chohra∗,‡, Paria Shirani†,§, ElMouatez Billah Karbab∗,¶, and Mourad Debbabi∗,‖

∗Security Research Centre, Gina Cody School of Engineering and Computer Science, Concordia University, 1515 St. Catherine St. West, Montréal, QC, H3G 1M8, Canada
†School of Electrical Engineering and Computer Science, University of Ottawa, 800 King Edward Av., Ottawa, ON K1N 6N5, Canada
‡a [email protected]
§[email protected]
¶[email protected]
‖[email protected]

In this chapter, we propose an optimization approach that leverages swarm intelligence and ensemble methods to solve the non-deterministic feature selection problem for network anomaly detection. The proposed approach is validated on two benchmark datasets, namely NSL-KDD and UNSW-NB15, in addition to a third dataset, called the IoT-Zeek dataset, which consists of Zeek network-based intrusion detection connection logs. We build the IoT-Zeek dataset by employing ensemble classification and deep learning models using publicly available malicious and benign threat intelligence on the Zeek connection logs of IoT devices. Moreover, we deploy and validate a deep learning-based anomaly detection model using autoencoders on each of the aforementioned datasets, utilizing the selected features obtained from the proposed optimization approach. The performed experiments confirm the accuracy and effectiveness of the proposed solution.


1. Introduction

Due to emerging technologies, the large connectivity between different devices in different ecosystems, and the increasing rate of cyberattacks (e.g., IoT attacks increased 700% in the last two years^a), security analysis of network data is an absolute need. However, providing accurate and efficient threat detection solutions on large volumes of data becomes more challenging. On the other hand, during the last decade, machine learning and deep learning techniques have attracted tremendous attention in many fields (e.g., anomaly detection, vulnerability assessment, natural language processing, stock markets, and weather forecasting). Therefore, training efficient and scalable machine learning- and deep learning-based threat detection models has become a task of paramount importance. Two common and well-known problems need to be addressed to provide efficient, accurate, and scalable machine learning- and deep learning-based threat detection models. (i) Selecting the appropriate setting of hyper-parameters for the model to be trained: this task generally falls in the class of non-deterministic problems, as it might have several solutions that give the same accuracy results, meaning that this kind of problem accepts at least two possible (optimal) solutions. (ii) Selecting the appropriate set of features that best define the final problem. Most domains have a large number of features, which makes it time-consuming to train and validate the models. Moreover, some of those features are irrelevant due to redundancy, sparsity, or lack of correlation with the problem to be solved. Therefore, methods to filter out irrelevant features have become a widely adopted procedure before any model training and experimentation. A palette of techniques has been proposed to select the most relevant features. The most common approach is the use of ensemble methods (e.g., [1–3]) due to the fact that these methods

a https://www.darkreading.com/endpoint/iot-specific-malware-infections-jumped-700-

amid-pandemic, Accessed April 24, 2022.

Optimized Feature Selection for Network Anomaly Detection


provide easier explanations of the variables than other techniques, although it can still be difficult to know which features a model weighs more heavily than others [4]. In addition, ensemble learning techniques combine multiple models to improve the overall predictive capability and to reduce overfitting as much as possible.

Several state-of-the-art techniques [5–11] have been proposed to deal with the non-deterministic aspect of the feature selection problem. These works generally use optimization algorithms to find optimal solutions and make decisions according to an objective function. For instance, Ref. [5] proposes a feature selection approach using an artificial bee colony (ABC), and Ref. [6] incorporates a hybrid genetic algorithm with granular information. However, these approaches do not explore the usage of their solutions on other types of datasets (e.g., intrusion detection system (IDS) datasets).

Autoencoders [12,13], on the other hand, are a type of neural network that aims to reconstruct a given input as output with the least possible change. Autoencoders are widely used for anomaly detection [14–17] due to their ability to compress the data into a latent representation (bottleneck), a reduced representation of the input, and then reconstruct the input as faithfully as possible. Anomalies can thus be detected by comparing the reconstructed output with the input and flagging inputs that deviate too much from their reconstructions.

Problem statement: Using all the features present in the input data to train autoencoders can be quite troublesome and time-consuming, especially when the input data contain millions of records, making experimentation and model engineering more complicated and difficult. Moreover, autoencoders focus mainly on feature engineering and extraction rather than feature selection. In other words, by transforming the input data into a compressed representation, autoencoders reduce the dimensionality of the data and learn a smaller representation. With a large number of features, explaining and understanding the compressed data is difficult, whereas feature selection identifies the most useful


A. Chohra et al.

and relevant features that best describe and define a given ground truth variable [18].

Generally, choosing the appropriate set of hyper-parameters during the feature selection process is quite challenging. Because the problem is non-deterministic, it can have multiple optimal solutions, all of which give the same performance and accuracy results. Thus, even after exhaustive experimentation, there is no evidence that (i) all possible solutions have been explored, and (ii) a particular solution is the best one.

Key idea: In this context, we propose a novel approach, called Chameleon,^b which combines swarm intelligence and ensemble learning techniques to select the optimal settings for the feature selection task: the hyper-parameters of the ensemble models as well as the most relevant features for each individual dataset. The proposed approach uses ensemble learning classifiers as the fitness and evaluation function for each individual/particle within the population/swarm. The population converges toward the optimal solutions in an iterative process in which, at each iteration, each individual tries to move closer to the optimal solutions. Afterwards, we use the features selected by the optimal ensemble model to construct an anomaly detection autoencoder, which we iteratively improve until it outperforms the state-of-the-art models.

Contributions: The main contributions of this work are as follows:

• Novel feature selection for network datasets: We propose a feature selection approach for network datasets that leverages both swarm intelligence and ensemble methods to select the most relevant features. The ensemble methods are used as a fitness function within the optimization approach in order to leverage their ability to better interpret and select independent features.
• Training time improvement: We employ the selected features obtained from the optimization step and deploy deep learning models for network anomaly detection. The feature selection process considerably improves the training time compared to the case where all features are used.

• Malicious and benign dataset generation: We set up an environment and generate a dataset, called the IoT-Zeek dataset, from PCAPs and connection logs using the Zeek NIDS [20]. We then introduce an ensemble model leveraging classical machine learning and deep learning classifiers in order to learn malicious behavior in the generated network traffic and classify network logs into malicious or benign connections.

• Evaluation: We evaluate the proposed approach on both an IoT dataset (IoT-Zeek) and non-IoT datasets (NSL-KDD and UNSW-NB15), and demonstrate its efficiency and performance. Experiments on the features selected by the optimal solution for each dataset indicate that our proposed model outperforms existing works.

^b Reprinted from Ref. [19], with permission from Elsevier.

2. Background

In the following, we provide an overview of particle swarm optimization (PSO) and ensemble methods.

2.1. Particle swarm optimization

PSO [21] is a stochastic, meta-heuristic optimization algorithm originally inspired by the social behavior of animals such as birds and fish. In the PSO algorithm, the population of individuals is referred to as the swarm, and each individual within the swarm is referred to as a particle. The particles try to find the set of optimal solutions to a given problem by constantly updating their positions according to their own performance (the cognitive aspect) and the current overall performance of the swarm (the social aspect). PSO is thus based on two essential mechanisms, cooperation/collaboration and competition: the former is the ability of one particle to communicate with other particles in order to combine their efforts toward the optimal solutions, while the latter


represents one particle's desire to use its own performance and move toward a possible solution. Moreover, each particle is defined within a search space, which represents the set of hyper-parameters to be optimized. Depending on the swarm's global solution, each particle computes the cognitive and social aspects according to Eqs. (1) and (2), respectively:

    cognitive = c1 × r1 × (pos_best_i − pos_i),    (1)
    social = c2 × r2 × (pos_global − pos_i),    (2)

where c1 and c2, called acceleration constants, define the speed at which a particle moves toward the optimal solutions (c1 controls how fast the particle converges to its local best solution, while c2 controls how fast the whole swarm converges toward the global solution); r1 and r2 are two randomly generated values that control the stochastic influence of the cognitive and social components on the particle's overall velocity; pos_i is the position of the particle at iteration i; pos_best_i is the best local solution found by that particle so far; and pos_global is the position of the global solution found by the entire swarm so far. Each particle then updates its velocity using Eq. (3):

    v_i(t + 1) = w × v_i(t) + social + cognitive,    (3)

where t is the current iteration, v_i(t) is the velocity of particle i at iteration t, and w is the inertia weight (importance) given to that velocity (the smaller the weight w, the stronger the convergence toward the global optimum). Finally, the position of each particle p_i is updated using Eq. (4):

    p_i(t + 1) = p_i(t) + v_i(t + 1).    (4)
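The update rules in Eqs. (1)–(4) can be sketched in a few lines of Python. This is a minimal, illustrative PSO maximizer (function and variable names are our own, not from the chapter); it also clamps positions to the search-space bounds, a repair step the approach described later in this chapter applies as well.

```python
import random

def pso_maximize(fitness, bounds, swarm_size=15, max_iter=30,
                 c1=1.0, c2=2.0, w=0.5, seed=0):
    """Maximize `fitness` over a box-bounded search space with plain PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(swarm_size)]
    vel = [[0.0] * dim for _ in range(swarm_size)]
    best_pos = [p[:] for p in pos]                      # per-particle best positions
    best_fit = [fitness(p) for p in pos]                # per-particle best fitness
    g_idx = max(range(swarm_size), key=lambda k: best_fit[k])
    g_pos, g_fit = best_pos[g_idx][:], best_fit[g_idx]  # global best so far
    for _ in range(max_iter):
        for k in range(swarm_size):
            r1, r2 = rng.random(), rng.random()
            for d in range(dim):
                cognitive = c1 * r1 * (best_pos[k][d] - pos[k][d])  # Eq. (1)
                social = c2 * r2 * (g_pos[d] - pos[k][d])           # Eq. (2)
                vel[k][d] = w * vel[k][d] + social + cognitive      # Eq. (3)
                pos[k][d] = pos[k][d] + vel[k][d]                   # Eq. (4)
                pos[k][d] = min(max(pos[k][d], bounds[d][0]), bounds[d][1])  # clamp
            fit = fitness(pos[k])
            if fit > best_fit[k]:
                best_fit[k], best_pos[k] = fit, pos[k][:]
                if fit > g_fit:
                    g_fit, g_pos = fit, pos[k][:]
    return g_pos, g_fit
```

With 15 particles and 30 iterations (the swarm settings used later in this chapter), the swarm reliably locates the maximum of a simple unimodal objective.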

2.2. Ensemble methods

Fig. 1. Bagging versus boosting ensemble methods.

Ensemble methods are a family of machine learning models that combine multiple classifiers/predictors in order to improve the performance of the overall classification/prediction. In other words, ensemble methods combine the decisions made by multiple models using techniques such as majority voting, averaging, and weighted averaging. Moreover, these techniques provide easier interpretation of features and better predictive performance with less overfitting compared to other machine learning techniques. This family of techniques is generally classified into two major types [22], which are presented in Figure 1, as follows:

(1) Bagging, where all the predictors run in parallel and independently; the models are then combined using an aggregation technique to make the final decision. An example of this type is random forests, where a randomly selected sample, called a bootstrap, is fed to each model. Each model in the forest therefore sees a different observation, leading to no correlation between the predictors and making them less prone to overfitting.


(2) Boosting, which deploys the paradigm of learning from the predecessor's errors/mistakes (called residuals). These ensemble methods therefore execute in a sequential order, which gives them the advantage of shorter training times compared to the first type. An example of this type is the gradient boosting technique.
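As a toy illustration of the bagging idea above (bootstrap samples plus majority-vote aggregation), the following sketch uses a trivial "majority-class" base learner; the helper names are our own, and the example is deliberately minimal rather than a real random forest.

```python
import random

def train_majority(bootstrap):
    """Trivial base learner: always predicts the majority class of its bootstrap sample."""
    labels = [y for _, y in bootstrap]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority

def bagging_predict(data, train_fn, x, n_models=3, seed=0):
    """Bagging: train each model on a bootstrap sample, aggregate by majority vote."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        bootstrap = [rng.choice(data) for _ in data]  # sample with replacement
        votes.append(train_fn(bootstrap)(x))
    return max(set(votes), key=votes.count)
```

Because each base learner sees a different bootstrap, their errors decorrelate, which is the property that makes the aggregated prediction more robust than any single learner.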

3. Approach Overview

To identify anomalous connections, we propose a deep learning-based autoencoder anomaly detection solution. The input to this model is a set of features obtained from the network traffic connection logs. To identify the most relevant set of features for a given dataset, we propose a hybrid model consisting of the PSO algorithm and ensemble methods. During this process, we explore two types of fitness functions (models): the first belongs to the bagging family of ensemble methods (random forests), and the second to the boosting family (gradient boosting).

An overview of our approach is presented in Figure 2. Feature selection can be viewed as five sequential steps: fitness and objective function definition, search-space definition, algorithm initialization, iterative search, and optimal solutions selection. The proposed approach starts by defining the search space (Step 2) for PSO [21,23] depending on the chosen fitness function (Step 1). Afterwards, it takes as input any labeled dataset and initializes a fixed-size population/swarm by generating random particles (Step 3). Given a precise number of iterations, each particle then tries to find the optimal position of the solution by constantly updating its position within the search space according to the performance of the swarm and its own performance (Step 4). The goal of the swarm is to find the optimal model (optimal hyper-parameters) that maximizes certain performance metrics (Step 5). Finally, we keep only the best-fitting settings (e.g., hyper-parameters), which give us the highest accuracy metrics. We then use these settings to build the final model(s) in order to extract the most relevant

Fig. 2. Approach overview.

features. Afterwards (in Step 6), we leverage the selected features discovered during the optimization part and engineer an anomaly detection model using deep learning autoencoders [24]. Our goal is to obtain an anomaly detection model that outperforms existing models in terms of the f1 score metric.

4. Methodology

In this section, we provide more details on the proposed methodology.

4.1. Optimized feature selection

Our feature selection algorithm proceeds in five sequential steps. Its algorithmic description is presented in Algorithms 1 and 2. In what follows, we explain each step in detail.

Step 1 (Fitness and objective functions): First, we define a fitness function used to evaluate the performance of each particle within the swarm. We choose ensemble method classifiers due to their advantages, which include low overfitting and high accuracy. Each particle is fed to the classifier, which in turn automatically adapts to it and is trained on the dataset accordingly.


Algorithm 1 Feature Selection: Algorithmic Description

     1  Input: D  // input dataset
     2  Output: optimal_solutions  // optimal solutions
     3  Global Variables:
     4      c1,  // cognitive acceleration constant
     5      c2,  // social acceleration constant
     6      r1,  // cognitive random factor
     7      r2,  // social random factor
     8      w,  // velocity's weight
     9      swarm_size,  // size of the swarm
    10      maxIterations,  // maximum number of iterations
    11      global_fitness,  // global solution's fitness value
    12      global_position,  // global solution's position
    13      global_solutions_list  // optimal solutions' positions and fitness values
    14  Function Main(D):
    15      fitness_function ← define_Fitness()
    16      bounds ← def_Search_Space(fitness_function)
    17      (swarm, swarm_size, maxIterations, c1, c2, r1, r2, w) ← algorithm_init()
    18      featureSelection()
    19  Function featureSelection():
    20      foreach i ∈ maxIterations do
    21          foreach particle_k ∈ swarm do
    22              particle_k_fitness ← evaluate_fitness(fitness_function, particle_k)
    23              if particle_k_fitness > global_fitness then
    24                  global_fitness ← particle_k_fitness
    25                  global_position ← particle_k_position
    26                  global_solutions_list += global_position
    27                  personal_best_position ← particle_k_position
    28              end
    29              update_velocity(c1, c2, r1, r2, w, global_position, particle_k_position, personal_best_position, particle_k_velocity)
    30              update_position(particle_k_velocity, particle_k_position)
    31          end
    32      end
    33      // optimal solutions selection
    34      optimal_solutions ← max(f1_score, global_solutions_list)
    35      return optimal_solutions

At the end of this process, the fitness function returns the following evaluation metrics: accuracy, recall, precision, and f1 score. Next, we define the objective function to be satisfied by the set of possible optimal solutions (i.e., to evaluate whole solutions). The objective function helps filter a set of results and keep only those that satisfy our needs. Since we integrate ensemble models as evaluation/fitness functions, we should select the metric that best describes the performance of these models at the particle level. From the above-mentioned evaluation metrics, we choose the last one (f1 score): it is the weighted average of precision and recall, taking both false positives and false negatives into account, contrary to accuracy, which only takes true positives and true negatives into account. Moreover, the f1 metric remains meaningful on uneven or unbalanced datasets, where the target classes are not balanced. Our objective is to keep only the model settings that give the highest f1 score values; thus, we define the objective function as the maximization of these values. This also prevents our algorithm from getting stuck in a local optimum instead of converging to a global one.

Step 2 (Search space): Since we use ensemble methods as the particles' evaluation/fitness function, we must choose the appropriate set of hyper-parameters to be fed to the models. Depending on the type of model, i.e., bagging versus boosting [22], we select the most relevant hyper-parameters (e.g., number of trees/estimators, the respective sizes of the training and testing splits, etc.) that matter most to the learning process of the model. This set of hyper-parameters defines the dimensional space in which our algorithm searches for possible optimal solutions.

For bagging techniques, there are two major hyper-parameters to investigate and optimize: test data size and number of estimators (trees). The first defines the size of the testing data on which the model is evaluated; the size of the training data is deduced from it automatically. It is generally recommended to set the testing data size smaller than the training set size (between 10% and 40%), so we set the lower bound to 10% and the upper bound to 40%. This hyper-parameter is the first dimension of each particle; based on it, the fitness function decides how to split the input dataset and train the ensemble model. The second hyper-parameter to optimize is the number of estimators, i.e., the number of decision trees in the ensemble learning model. Normally, the larger the number of trees, the better the overall ensemble model performance.


Algorithm 2 Feature Selection: Functions Definitions

     1  // Fitness and objective function definition
     2  Function define_Fitness():
     3      fitness_function ← ensemble_classifier  // tuned between random forests and gradient boosting
     4      return fitness_function
     5  // Search-space definition
     6  Function def_Search_Space(fitness_function):
     7      if fitness_function == "random_forest" then
     8          return bounds ← [(0.1, 0.4), (50, 1000)]
     9      else if fitness_function == "gradient_boosting" then
    10          return bounds ← [(0.1, 0.4), (50, 1000), (0.1, 0.3)]
    11      end
    12  // Algorithm initialization
    13  Function algorithm_init():
    14      c1 ← [1, 2]  // c1 is tuned using two different values
    15      c2 ← 2
    16      w ← 0.5
    17      r1 ← random()
    18      r2 ← random()
    19      swarm_size ← 15
    20      swarm ← random(particles, swarm_size)
    21      maxIterations ← 30
    22      return (swarm, swarm_size, maxIterations, c1, c2, r1, r2, w)
    23  // Fitness evaluation
    24  Function evaluate_fitness(fitness_function, particle):
    25      fitness_value ← f1_score(fitness_function, particle)
    26      return fitness_value
    27  // Update particle velocity
    28  Function update_velocity(c1, c2, r1, r2, w, global_position, particle_k_position, personal_best_position, particle_k_velocity):
    29      cognitive = c1 * r1 * (personal_best_position − particle_k_position)
    30      social = c2 * r2 * (global_position − particle_k_position)
    31      particle_k_velocity ← w * particle_k_velocity + social + cognitive
    32      return particle_k_velocity
    33  // Update particle position
    34  Function update_position(particle_k_velocity, particle_k_position):
    35      particle_k_position ← particle_k_velocity + particle_k_position
    36      return particle_k_position


However, there is a limit to the number of trees: at some point, the improvement stops, and adding trees starts to degrade predictions and can even cause overfitting. In addition, the larger the number of trees, the higher the computational cost, making experimentation more difficult for large-scale datasets. In general practice, this hyper-parameter is decided through exhaustive experimentation: the number of trees is initialized with a small value and increased slightly at each iteration to improve the model's performance over the previous experimentation results. However, finding the optimal number of estimators this way is very time-consuming, especially for large datasets, where it can lead to days or even months of experimentation. Moreover, this approach does not exhaustively explore all possible optimal values for the hyper-parameter; it is mainly guided by the knowledge and experience of experts. Therefore, we integrate this parameter into the optimization algorithm as a hyper-parameter to be optimized for the global solution. To keep the optimization algorithm scalable, we bound this hyper-parameter between 50 (lower bound) and 1000 (upper bound).

In boosting methods, new trees are added to the model in order to correct the mistakenly predicted samples (residuals) of the previous tree. This process has two effects. The first, which can be considered a benefit, is faster training times compared to bagging techniques. The second, a disadvantage, is that it makes the model more prone to overfitting. To mitigate this, the learning rate acts as a weight (percentage) that controls and reduces the corrections the current tree (predictor/classifier) makes with respect to the previous one. The overall performance of the model improves when the learning rate is small and the number of trees is high. Therefore, in addition to the above-mentioned hyper-parameters, boosting ensemble methods require a third essential hyper-parameter, the learning rate, to be optimized. In general practice, it is recommended to set this parameter between 0.1 (lower bound) and 0.3 (upper bound).
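The resulting per-dimension bounds, mirroring the def_Search_Space function of Algorithm 2, can be written as a small helper (the string keys are our own naming):

```python
def search_space(fitness_function):
    """Search-space bounds per particle dimension, following Algorithm 2."""
    if fitness_function == "random_forest":
        # (test data size, number of estimators)
        return [(0.1, 0.4), (50, 1000)]
    if fitness_function == "gradient_boosting":
        # (test data size, number of estimators, learning rate)
        return [(0.1, 0.4), (50, 1000), (0.1, 0.3)]
    raise ValueError("unknown fitness function: " + fitness_function)
```

Each tuple is one particle dimension, so bagging particles are two-dimensional and boosting particles three-dimensional, exactly as summarized next.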


Thus, for bagging methods, our optimization algorithm's search space is a two-dimensional space, where the first dimension is the test size and the second is the number of decision trees/estimators in the ensemble model. For boosting methods, the search space is three-dimensional, and each particle has three parameters: (test size, number of trees, learning rate).

Step 3 (Algorithm initialization): We start by initializing the settings of our PSO algorithm. First, we define the maximum number of iterations, which can be viewed as the number of chances given to the swarm to find the optimal solutions. This parameter is essential: in optimization problems, we only know that the problem might have multiple optimal solutions, not how many. If the number of chances is too large, the algorithm can take a tremendous amount of time; on the other hand, the more iterations, the better the algorithm's ability to find additional optimal solutions. To limit time consumption, we fix this setting to 30 iterations. Moreover, we need to define the values of the acceleration constants (c1 and c2) and the weight (w) [21]. For c1 and c2, it is recommended that their product lie between 0 and 4 (0 ≤ c1 × c2 ≤ 4) [25]. We run the algorithm twice: the first time, we set the two constants to equal values (both to 2); in the second execution, we give more importance (speed) to the global solution by setting c2 to 2 and c1 to 1. The intuition is to start by giving the same importance to the local and global solutions, then increase the importance of the global solution (c2), and check which setting explores better optimal solutions.
Next, we initialize the swarm (population of particles) by first fixing the number of particles (swarm_size) that constitute the swarm. For each particle, we randomly generate its velocity and position within our pre-defined bounds (search-space definition). We initialize the global fitness value (f1) of the overall swarm to 0.5.
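A minimal initialization sketch (our own helper names; the velocity scaling factor is an illustrative assumption) that places particles and velocities uniformly inside the bounds:

```python
import random

def init_swarm(bounds, swarm_size=15, seed=0):
    """Randomly initialize particle positions and velocities within the search-space bounds."""
    rng = random.Random(seed)
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(swarm_size)]
    # start velocities small relative to the range of each dimension
    velocities = [[rng.uniform(-(hi - lo), hi - lo) * 0.1 for lo, hi in bounds]
                  for _ in range(swarm_size)]
    return positions, velocities
```

With the chapter's bagging bounds, each of the 15 particles starts with a valid (test size, number of trees) pair.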


Step 4 (Iterative search): During the iterative process, each particle of the initial swarm is evaluated by the fitness function (the ensemble model classifier) at its own coordinates. If, after the fitness evaluation, the particle's f1 score is greater than the global (swarm's) f1 score, the particle first updates the global f1 score to its own value and sets the global solution's position to its own position. It then updates its position using the velocity and position update functions presented in Algorithm 2. The time complexity of this process is of the order of O(n · m), where n is the maximum number of iterations (the outer loop of Algorithm 1) and m is the population/swarm size (the inner loop of Algorithm 1). Since in our experiments n and m are small (the maximum number of iterations is 30 and the swarm size is 15), our approach does not suffer from high time complexity. As for the fitness function, the (training) time complexity of the models (i.e., Random Forests or XGBoost) is of the order of O(k · v · n log n), where k is the number of trees, v is the number of features, and n is the number of records/rows. Because evaluating the fitness of each particle (training a Random Forests or XGBoost model) is the bottleneck of our algorithm, we leverage a multiprocessing paradigm, taking advantage of 20 CPU cores of our setup environment. Furthermore, since we use a server with 128 GB of RAM (presented in Section 6.1), space complexity is not a bottleneck. Our approach is therefore sufficiently efficient and scalable on the studied datasets and their respective models; the experiments reported in Section 6.6 confirm its scalability and efficiency.
It is worth noting that a particle can end up with a non-integer value in its second dimension (number of trees). This is problematic because the number of decision trees in the ensemble model must be an integer; in that case, we round the value to the closest integer. Furthermore, if at any iteration a particle's new position falls outside the search-space bounds (e.g., [0.1, 0.4] for the first dimension and [50, 1000] for the second), we correct the out-of-bound value to the closest bound. For instance, if a new particle's position is (0.5, 1200), we correct it to (0.4, 1000).

Step 5 (Optimal solutions selection): Finally, after the maximum number of iterations is reached, we apply a maximization function, which takes as input all possible solutions explored during the iterative search and returns only those with the highest fitness value (f1 score). If more than one optimal solution is found (with the same f1 score), we select the one whose hyper-parameters induce the best efficiency (e.g., execution time and CPU usage). The appropriate model is then trained using the selected optimum's hyper-parameters, and only those features whose importance values are higher than the average of all feature importance values are selected for the next phase (i.e., anomaly detection).
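The two clean-up rules above, repairing out-of-range or non-integer particle coordinates and keeping only above-average-importance features, can be sketched as follows (helper names are ours; the second dimension is assumed to be the integer-valued tree count):

```python
def repair_position(pos, bounds, integer_dims=(1,)):
    """Clamp each coordinate to its bounds; round integer-valued dimensions (e.g., tree count)."""
    fixed = []
    for d, (p, (lo, hi)) in enumerate(zip(pos, bounds)):
        p = min(max(p, lo), hi)   # correct out-of-bound values to the closest bound
        if d in integer_dims:
            p = round(p)          # the number of trees must be an integer
        fixed.append(p)
    return fixed

def select_features(importances):
    """Keep only features whose importance exceeds the average importance."""
    avg = sum(importances.values()) / len(importances)
    return [name for name, imp in importances.items() if imp > avg]
```

For example, repairing the out-of-bound position (0.5, 1200) against the bagging bounds yields (0.4, 1000), matching the correction described above.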

4.2. Deep learning-based anomaly detection

After selecting the optimal feature selection model and using it to pick the most relevant features, we use the dataset filtered to those features to build an efficient anomaly detection model based on autoencoders. To this end, we start from the most accurate existing state-of-the-art model for each dataset (e.g., [26]), which narrows our search for an appropriate architecture by providing an efficient starting point. We then feed the model with only the selected features of the dataset, which reduces the autoencoder's training time: since we do not use all the features, the compression and bottleneck generation (encoder) as well as the reconstruction phase (decoder) run faster than when all features are fed as input. Multiple hyper-parameters must be tuned to find the optimal autoencoder model: batch size, loss function, number of layers, number of neurons, and regularizations. Once we reach a point where our model outperforms the state-of-the-art models, we


stop the search and select that model as the optimal one. We then use the l1 norm to compute the distance between the input and the reconstructed data. The result is compared with the input labels (ground truth), and different threshold values are tuned to select the one that gives the highest accuracy metrics on each dataset.
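The detection step above, an l1 reconstruction error compared against a tuned threshold and scored by f1, can be sketched as follows (pure Python, our own helper names; a real autoencoder would supply the reconstructions):

```python
def l1_error(x, x_hat):
    """l1 distance between an input vector and its reconstruction."""
    return sum(abs(a - b) for a, b in zip(x, x_hat))

def best_threshold(errors, labels, candidates):
    """Pick the threshold maximizing f1 (label 1 = anomaly, flagged when error > threshold)."""
    best_t, best_f1 = None, -1.0
    for t in candidates:
        preds = [1 if e > t else 0 for e in errors]
        tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
        fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
        fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

Sweeping candidate thresholds over held-out reconstruction errors gives the operating point used for the final anomaly decision.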

5. IoT-Zeek Dataset Generation

In the following, we describe our methodology for generating the IoT-Zeek dataset of malicious and benign network traffic. We first deploy a real environment consisting of various Raspberry Pi devices that communicate with each other. We install Zeek sensors to monitor the network traffic and extract the connection logs (conn.log) generated by the Zeek NIDS [20,27]. We then inject different malware samples into these devices and capture malicious network traffic. These connection logs are classified into malicious or benign connections using classical machine learning and deep learning models (as explained later). A portion of the dataset containing 150,000 records (connections) is sampled, comprising 129,441 malicious connections and 20,559 benign connections.

Malware and benign sources: To ensure the freshness of our dataset regarding the malicious/benign IP addresses, we collect PCAPs from both the Concordia SecLab malware feed (in-house source) and the Stratosphere Research Laboratory [28]. We then build the global training dataset from the labeled Zeek logs. The malicious and benign traffic logs are retrieved from the sources presented in Listing 3.1. The numbers of malicious and benign connections in the evaluation dataset are 1,764,604 and 278,998, respectively.

5.1. Maliciousness classification

Listing 3.1. Malicious and Benign Traffic Log Sources.

Fig. 3. Maliciousness detection pipeline.

In this section, we present the ensemble models employed to classify malicious activities in the IoT-Zeek dataset. As depicted in Figure 3, the first set of models belongs to classical machine learning, while the

second one belongs to deep learning. The input to the models consists of all the connection log features presented in Table 1 (some other features exist that Zeek does not extract from PCAP files in its default setting).^c

^c https://docs.zeek.org/en/lts/scripts/base/protocols/conn/main.zeek.html, Accessed April 24, 2022.

Table 1. Zeek's connection log file features description.

Feature         Type          Description
ts              Numerical     Timestamp of the connection's occurrence date.
id.orig_h       Categorical   Originator's IP address.
id.orig_p       Categorical   Originator's TCP/UDP port.
id.resp_h       Categorical   Responder's IP address.
id.resp_p       Categorical   Responder's TCP/UDP port.
proto           Categorical   The transport layer protocol (tcp, udp, or icmp).
service         Categorical   The application layer requested protocol (e.g., ssh, dns, etc.).
orig_ip_bytes   Numerical     Number of bytes sent from the originator to the responder.
resp_ip_bytes   Numerical     Number of bytes sent from the responder to the originator.
orig_pkts       Numerical     Number of packets sent from the originator to the responder.
resp_pkts       Numerical     Number of packets sent from the responder to the originator.
conn_state      Categorical   An overview description of the state of the connection.
history         Categorical   Details about the state of the connection.

For classical machine learning classification, we employ the RandomForest, XGBoost, LightGBM, and CatBoost classifiers. We choose these classifiers for their high performance and reputation in industry; they have been part of many winning solutions in machine learning competitions.^d In addition to the classical machine learning models, we deploy two deep learning models for maliciousness detection: a convolutional neural network (CNN) and a feed forward neural network (NN). More specifically, we customize the architecture of the CNN model to learn the maliciousness of the network traffic, as shown in Figure 4; the details of the model are presented in Table 2. Other parameters, such as the filters, are obtained from experiments and a trade-off between model size and accuracy; the kernel and stride have fairly standard values used in many machine learning papers on CNNs. The feed forward neural network architecture

^d https://www.kaggle.com/competitions, Accessed April 24, 2022.

A. Chohra et al.


Fig. 4. CNN maliciousness detection model's architecture.

Table 2. One-dimensional convolutional neural network model details.

Block    #  Layers           Options
Block1   1  Conv             Filter=64, Kernel=(3,1), Stride=(1,1), Zero-Padding, Activation=ReLU
         2  BNorm            BatchNormalization
         3  MaxPooling       Kernel=(2,2), Stride=(2,2), Zero-Padding
Block2   4  MaxPooling       Global Max Pooling
         5  Fully connected  #Output=512, Activation=ReLU
         6  Fully connected  #Output=1, Activation=Sigmoid
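Under the assumption that each connection's feature vector is reshaped into a single-channel 2-D tensor (the input layout is our assumption; it is not specified in the text), the Table 2 architecture can be sketched in Keras as follows:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn(n_features):
    """Sketch of the Table 2 CNN: Conv -> BatchNorm -> MaxPooling (Block1),
    then GlobalMaxPooling -> Dense(512) -> Dense(1, sigmoid) (Block2)."""
    return keras.Sequential([
        keras.Input(shape=(n_features, 1, 1)),  # assumed input layout
        layers.Conv2D(64, kernel_size=(3, 1), strides=(1, 1),
                      padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding="same"),
        layers.GlobalMaxPooling2D(),
        layers.Dense(512, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # maliciousness probability
    ])
```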

is a typical neural network with fully connected layers; the details of the model are listed in Table 3.

Ensemble models: Training the aforementioned machine learning classifiers on the connection logs' features produces a set of models M = {cM1, cM2, cM3, cM4, dM1, dM2}, where cMi denotes a classical machine learning model/learner (i.e., the RandomForest, XGBoost, LightGBM, and CatBoost classifiers) and dMi denotes a deep learning model/learner (i.e., the CNN and the NN). To perform ensemble learning, we use the ensemble averaging technique presented in Eq. (5), as follows:

Ỹ(X, α) = Σ_{i=1}^{|M|} α_i · y_i(X),    (5)

Optimized Feature Selection for Network Anomaly Detection

Table 3. Feed-forward neural network model details.

#  Layers               Options
1  Fully connected      #Output=128, Activation=ReLU
2  Batch normalization  BatchNormalization
3  Fully connected      #Output=256, Activation=ReLU
4  Batch normalization  BatchNormalization
5  Fully connected      #Output=512, Activation=ReLU
6  Batch normalization  BatchNormalization
7  Fully connected      #Output=512, Activation=ReLU
8  Fully connected      #Output=1, Activation=Sigmoid

where Ỹ is the ensemble probability likelihood, X is the input feature vector, α_i are the weights, and y_i is the probability likelihood of each single model. Each individual model/learner (classical machine learning or deep learning model, as presented in Figure 3) produces a single probability y_i, which represents its maliciousness likelihood. These per-model detection probabilities are the inputs to the weighted-average ensemble, which combines them using the weights α_i to produce the ensemble prediction. In the current setting, we choose α_i = 1 for all models, indicating that all models contribute equally to the final decision.
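The weighted averaging of Eq. (5) reduces to a few lines of code. A minimal sketch follows; the normalization by the weight sum is our addition so that the output remains a probability (with the chapter's choice of α_i = 1, this is simply the mean of the members' scores):

```python
import numpy as np

def ensemble_score(member_probs, alphas=None):
    """Weighted-average ensemble of the per-model maliciousness
    probabilities y_i(X), following Eq. (5)."""
    p = np.asarray(member_probs, dtype=float)
    a = np.ones_like(p) if alphas is None else np.asarray(alphas, dtype=float)
    # Normalizing by the weight sum keeps the combined score in [0, 1].
    return float(np.dot(a, p) / a.sum())
```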

5.2. System adaptation

Adaptation to new network threats and attacks is an important criterion for network traffic maliciousness detection. In our context, we provide this capability by automating the model generation process. As shown in Figure 5, the system leverages a feed of malicious traffic (in the form of PCAP files) to build an updated training dataset at every epoch. The updated training dataset is representative of state-of-the-art network attacks and benign traffic. The system ensures the quality of the produced model by using validation and testing datasets, and only models that achieve high detection performance are deployed in production.


Fig. 5. Machine learning models' development.

6. Evaluation Results and Discussion

In this section, we first describe the experimental setup and the benchmark datasets. Then, we provide details on the validation of our proposed feature selection approach on each of the chosen benchmark datasets. Next, we report the accuracy of our anomaly detection model on the different datasets and compare our results with state-of-the-art approaches. Finally, we provide the results of our efficiency study.

6.1. Experimental setup

All our experiments are conducted on a computation server with an Intel Xeon E5-2630 2.30 GHz CPU (24 cores), 128 GB of RAM, and CentOS Linux 7. Our system prototype is developed in the Python 3.6 programming language, leveraging PyTorch as well as the scikit-learn library for bagging ensemble learning (the random forest classifier), the xgboost library for boosting ensemble learning (the gradient boosting classifier), and other machine learning models. Multiprocessing is deployed for fast model training by taking advantage of 20 CPU cores for both the Random Forest and XGBoost classifiers. We use the pandas API to load and preprocess each dataset, and we build the autoencoder models using the Keras API in conjunction with TensorFlow.

Evaluation metrics: To evaluate the performance of our approach, we use the accuracy, precision, recall, and F1 score metrics, defined as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN),            (6)
Precision = TP / (TP + FP),                            (7)
Recall = TP / (TP + FN),                               (8)
F1 = 2 · (precision · recall) / (precision + recall).  (9)
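For reference, Eqs. (6)-(9) translate directly into code; the sketch below is a minimal illustration with no guards against zero denominators:

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix
    counts, per Eqs. (6)-(9)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1
```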

6.2. Benchmark datasets description

In this section, we introduce the two benchmark datasets.

NSL-KDD dataset: The NSL-KDD network dataset [29] was proposed to fix two main issues (redundant records and the level of difficulty) of its predecessor, the KDD'99 dataset. The updated dataset (NSL-KDD) contains a total of 148,517 network flow records, with 77,054 labeled as normal and 71,463 labeled as attacks. The dataset consists of a total of 41 features, 32 of which are of numerical (integer or float) type and 9 of which are categorical.

UNSW-NB15 dataset: The UNSW-NB15 dataset [30–33] was created by the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS)


using the IXIA PerfectStorm framework, which generates a mix of real normal and attack behaviors. Tcpdump is then used to capture 100 GB of network traffic activity. The dataset covers nine types of cyberattacks: Fuzzers, Analysis, Backdoors, Denial of Service (DoS), Exploits, Generic, Reconnaissance, Shellcode, and Worms. Moreover, the generated dataset contains a total of 49 features, 42 of which are of numerical (integer or float) type and 6 of which are categorical. The dataset contains 2,218,761 normal and 321,283 attack records.

6.3. Feature selection results

We apply our proposed optimized feature selection solution to the three aforementioned datasets. The results for each of the two explored fitness functions, Random Forest (bagging) and XGBoost (boosting), are detailed in Tables 4 and 5, respectively. We observe that the latter fitness function (XGBoost) achieves the highest fitness values (f1 score) on all three datasets. Moreover, when the acceleration constant c2 is higher than c1 (c2 = 2 and c1 = 1), the algorithm finds better optimal solutions on two of the datasets, while for the NSL-KDD dataset both settings yield the same f1 score. Afterwards, for the set of hyper-parameters selected for each dataset, we train the corresponding model (XGBoost) and extract the list of features with their importance values. We then compute the average importance and select only the features whose importance is higher than this average. The results of this process are presented in Table 6.

Effects of imbalanced data: We further examine the effects of imbalanced data on our feature selection approach. As presented in Section 5, the IoT-Zeek dataset has a smaller number of benign connections than malicious connections, which may lead machine learning algorithms to ignore the minority class. According to the literature [34], oversampling and undersampling techniques are recommended to overcome this issue. To this end, we leverage

Optimized Feature Selection for Network Anomaly Detection

75

Table 4. Feature selection results using the random forests classifier as fitness function (acceleration constant C2 is fixed to 2 while C1 is tuned between 1 and 2; f1 score is the fitness function).

Dataset    Test size  #Trees  Accuracy  f1 score  Precision  Recall
NSL-KDD    0.1        71      99.52     99.52     99.52      99.52
           0.1        70      99.52     99.52     99.52      99.52
           0.1        323     99.51     99.51     99.51      99.5
           0.103      295     99.51     99.51     99.51      99.51
           0.12       50      99.5      99.51     99.51      99.5
           0.1        107     99.52     99.51     99.52      99.51
           0.15       258     99.51     99.51     99.51      99.5
           0.2        50      99.52     99.51     99.52      99.51
           0.1        424     99.5      99.5      99.5       99.5
           0.1        50      99.549    99.549    99.549     99.54
           0.1        63      99.51     99.51     99.51      99.51
           0.1        51      99.51     99.51     99.51      99.51
           0.1002     153     99.51     99.51     99.51      99.51
           0.105      50      99.51     99.51     99.51      99.51
UNSW-NB15  0.1        50      99.93     99.49     99.28      99.69
           0.1        84      99.93     99.43     99.15      99.71
           0.103      50      99.93     99.42     99.17      99.68
           0.1        1000    99.92     99.4      99.1       99.70
           0.1005     1000    99.92     99.4      99.1       99.70
           0.106      1000    99.92     99.4      99.09      99.70
IoT-Zeek   0.1        724     99.99     99.99     100        99.98
           0.111      106     99.99     99.99     100        99.98
           0.112      980     99.99     99.99     100        99.98
           0.214      80      99.99     99.99     100        99.98
           0.103      111     99.99     99.99     100        99.98
           0.295      50      99.99     99.99     100        99.98

Note: The results with highest accuracy (per dataset) are in bold.

the SMOTE and RUS Python libraries^e and apply both oversampling and undersampling techniques to the IoT-Zeek data. According to the obtained results, the oversampling technique slightly outperforms the undersampling technique. Consequently, we use the oversampled dataset in our experiments and refer to it as the IoT-Zeek-Oversampled dataset. We apply our optimized feature selection solution to the IoT-Zeek-Oversampled dataset. The results of the two explored fitness functions are presented in Table 7. We observe that the XGBoost

e https://imbalanced-learn.org/stable/index.html,

Accessed April 24, 2022.

Table 5. Feature selection results using the XGBoost classifier as fitness function (acceleration constant C2 is fixed to 2 while C1 is tuned between 1 and 2; f1 score is the fitness function).

Dataset    Test size  #Trees  Learning rate  Accuracy  f1 score  Precision  Recall
NSL-KDD    0.102      376     0.162523       99.75     99.75     99.75      99.75
           0.13       327     0.138372       99.75     99.75     99.75      99.73
           0.104      292     0.16305        99.75     99.75     99.75      99.75
           0.1        233     0.1473         99.739    99.739    99.739     99.73
           0.10558    241     0.17077        99.73     99.73     99.7       99.73
           0.106      680     0.1            99.75     99.75     99.75      99.75
           0.105      681     0.1            99.75     99.75     99.75      99.73
           0.106      686     0.1            99.75     99.75     99.75      99.75
UNSW-NB15  0.1        903     0.1            99.97     99.76     99.71      99.82
           0.1        824     0.1            99.97     99.76     99.70      99.82
           0.1        827     0.1            99.97     99.76     99.71      99.82
           0.16       903     0.102          99.90     99.60     99.70      99.8
           0.1        899     0.10013        99.90     99.60     99.70      99.8
           0.1        1000    0.1            99.90     99.80     99.80      99.87
           0.1        1000    0.137          99.96     99.70     99.65      99.77
           0.158      1000    0.141          99.96     99.71     99.68      99.74
           0.17       816     0.12           99.96     99.69     99.67      99.71
           0.114      425     0.159          99.96     99.67     99.57      99.77
           0.137      481     0.125          99.96     99.67     99.61      99.73
IoT-Zeek   0.4        1000    0.294          99.90     99.90     99.90      99.90
           0.4        1000    0.158          99.90     99.90     100        99.90
           0.325      730     0.3            99.90     99.90     100        99.90
           0.1        411     0.1            99.90     99.90     100        99.98
           0.369      510     0.3            99.99     99.99     100        99.99
           0.360      624     0.3            99.99     99.99     100        99.98
           0.397      382     0.132          99.99     99.99     99.99      99.99
           0.361      664     0.3            99.99     99.99     100        99.97

Note: The results with highest accuracy (per dataset) are in bold.


Table 6. Selected features on each dataset using the optimal solution hyper-parameters (acceleration constant c2 is fixed to 2 and c1 is tuned between 1 and 2).

Dataset & model                    Feature name                 Feature importance
NSL-KDD                            src_bytes                    0.298222400
(Test size: 0.106,                 num_failed_logins            0.131071240
Number of trees: 680,              service                      0.074615410
Learning rate: 0.1)                diff_srv_rate                0.054890107
                                   flag                         0.039971426
                                   hot                          0.037307087
                                   count                        0.027289085
                                   dst_host_srv_diff_host_rate  0.025930267
                                   dst_host_same_srv_rate       0.024637770
UNSW-NB15                          sttl                         0.087299424
(Test size: 0.1,                   ct_state_ttl                 0.059610307
Number of trees: 1000,             dsport                       0.018620330
Learning rate: 0.1)                proto                        0.007498254
IoT-Zeek                           ts                           0.672937750
(Test size: 0.369,                 id_orig_p                    0.121201570
Number of trees: 510,              history                      0.110579970
Learning rate: 0.3)                resp_ip_bytes                0.089164086

Table 7. Feature selection results on the IoT-Zeek-Oversampled dataset.

Fitness function (f1 score)  C1  Test size  #Trees  Accuracy  f1 score  Precision  Recall
Random Forest (C2=2)         2   0.4        478     99.99     99.99     99.99      99.99
                             2   0.4        518     99.99     99.99     99.99      99.99
                             2   0.4        534     99.99     99.99     99.99      99.99
                             1   0.4        50      99.99     99.99     99.99      99.99
                             1   0.4        116     99.99     99.99     100        99.99
                             1   0.4        431     99.99     99.99     99.99      99.99
XGBoost (C2=2)               2   0.4        50      100       100       100        100
                             2   0.3054     50      100       100       100        100
                             2   0.4        50      100       100       100        100
                             1   0.4        492     99.99     99.99     99.99      99.99
                             1   0.3963     679     99.99     99.99     99.99      99.99
                             1   0.4        739     99.99     99.99     99.99      99.99

fitness function achieves the highest fitness values (f1 score). More specifically, when the acceleration constant c2 is equal to c1 (c2 = 2 and c1 = 2), the XGBoost algorithm finds better optimal solutions. Afterwards, for each of the selected sets


Table 8. Selected features on the IoT-Zeek-Oversampled dataset using the optimal solution hyper-parameters (acceleration constant c2 is fixed to 2 and c1 is tuned between 1 and 2).

Dataset & model: Test size: 0.3905; Number of trees: 50; Learning rate: 0.2648

Feature name   Feature importance
ts             0.254218453
resp_ip_bytes  0.161528458
resp_pkts      0.152892584
resp_bytes     0.084122151
id_orig_p      0.083250296

of hyper-parameters, we train the corresponding XGBoost model and extract the list of features with their importance values. We then select only the features whose importance is higher than the average importance, as presented in Table 8.
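The selection step described above, keeping only the features whose importance exceeds the average importance, is a one-liner; the helper and sample values below are illustrative:

```python
import numpy as np

def select_above_average(names, importances):
    """Keep features whose importance is strictly above the mean importance."""
    imp = np.asarray(importances, dtype=float)
    return [n for n, v in zip(names, imp) if v > imp.mean()]
```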

6.4. Anomaly detection results

In this section, we describe the architecture of our autoencoder models for each of the utilized datasets; we call them the NSL-KDD Model, UNSW-NB15 Model, IoT-Zeek Model, and IoT-Zeek-Oversampled Model. Then, we present the results of these models and compare them with the results of the state-of-the-art approaches presented in Ref. [26].

NSL-KDD model: After several iterations of model training, we found that the optimal anomaly detection model for this dataset has five hidden layers: two for the encoder (128 and 64 neurons, respectively), one for the bottleneck (32 neurons), and two for the decoder (64 and 128 neurons, respectively). In addition, we used two activity regularization functions to better handle overfitting, namely, dropout=0.5 and l2-norm kernel regularization at each layer with a value of 0.001 (as shown in the first part of Table 9). Moreover, this autoencoder is a densely connected autoencoder, such that all layers are of the Dense type, each using ReLU as activation function. The optimal batch size is 32, with the testing dataset size equal to the optimal one found by the optimization algorithm (0.106). Additionally, we tuned the model with three different loss functions: categorical crossentropy,


mean squared error, and mean absolute error. The results of validating this model with different thresholds are presented in Table 10. As seen, the model performs best with categorical crossentropy as the loss function and an optimal threshold of 0.512, achieving approximately 92.09% f1 score.

UNSW-NB15 model: Using this dataset and after applying the same model tuning steps used for the NSL-KDD dataset, we found that the optimal model has exactly the same regularization values at each layer (dropout=0.5 and kernel regularizer l2=0.001). However, there are two differences compared to the NSL-KDD model. First, this model has exactly seven hidden layers (encoder=[512,256,128], bottleneck=[64], and decoder=[128,256,512]), as shown in the second part of Table 9, with the batch size set to 64. Second, this model, contrary to the previous one, performs best with the mean squared error loss function, achieving an optimal threshold of 2.239 with an overall f1 score of 92.904 (as shown in Table 10).

IoT-Zeek model: We achieve an overall f1 score of 97.302 on this dataset (as shown in Table 10) using the mean squared error loss function. The autoencoder model for this dataset is the same as the one for NSL-KDD, except that we use a smaller value for the kernel regularization function (0.0001), as shown in the third part of Table 9.

IoT-Zeek-oversampled model: We utilize the same hyper-parameters tuned for the NSL-KDD Model, as shown in Table 9, and train our autoencoder model on the IoT-Zeek-Oversampled dataset using the mean squared error loss function. The Receiver Operating Characteristic (ROC) curve is shown in Figure 6, where the model achieves 99% area under the curve (AUC). Moreover, the results obtained for different threshold values are presented in Table 10.
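The thresholding behind Table 10 can be sketched as follows: score each record by its reconstruction error under the trained autoencoder, and flag it as anomalous when the error exceeds the chosen threshold (a minimal illustration, not the chapter's exact code):

```python
import numpy as np

def reconstruction_errors(x, x_hat):
    """Per-record mean squared reconstruction error."""
    x, x_hat = np.asarray(x, dtype=float), np.asarray(x_hat, dtype=float)
    return np.mean((x - x_hat) ** 2, axis=1)

def flag_anomalies(errors, threshold):
    """A record is anomalous when its reconstruction error exceeds the threshold."""
    return np.asarray(errors) > threshold
```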
As seen, we achieve an f1 score of 94.300 on the oversampled data, while the f1 score obtained on the original data was 97.302. The 3% drop in f1 score can be explained by the selected features and their importance before and after oversampling,

Table 9. Proposed autoencoder architecture by dataset.

Dataset               Encoder                  Bottleneck           Decoder                  Regularizations
NSL-KDD               layer 1: 128 neurons;    layer 3: 32 neurons  layer 4: 64 neurons;     Dropout: 0.5; L2-regularizer: 0.001
                      layer 2: 64 neurons                           layer 5: 128 neurons
UNSW-NB15             layer 1: 512 neurons;    layer 4: 64 neurons  layer 5: 128 neurons;    Dropout: 0.5; L2-regularizer: 0.001
                      layer 2: 256 neurons;                         layer 6: 256 neurons;
                      layer 3: 128 neurons                          layer 7: 512 neurons
IoT-Zeek              layer 1: 128 neurons;    layer 3: 32 neurons  layer 4: 64 neurons;     Dropout: 0.5; L2-regularizer: 0.0001
                      layer 2: 64 neurons                           layer 5: 128 neurons
IoT-Zeek-Oversampled  layer 1: 128 neurons;    layer 3: 32 neurons  layer 4: 64 neurons;     Dropout: 0.5; L2-regularizer: 0.001
                      layer 2: 64 neurons                           layer 5: 128 neurons
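The NSL-KDD row of Table 9 can be sketched in Keras as below; the output activation and optimizer are our assumptions, since the text does not state them:

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

def build_autoencoder(n_features, l2=0.001):
    """Dense autoencoder per Table 9 (NSL-KDD row): encoder 128-64,
    bottleneck 32, decoder 64-128, with dropout 0.5 and L2 kernel
    regularization at every hidden layer."""
    inputs = keras.Input(shape=(n_features,))
    x = inputs
    for units in (128, 64, 32, 64, 128):
        x = layers.Dense(units, activation="relu",
                         kernel_regularizer=regularizers.l2(l2))(x)
        x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(n_features, activation="sigmoid")(x)  # assumed
    model = keras.Model(inputs, outputs)
    # Categorical crossentropy was the best NSL-KDD loss per the text.
    model.compile(optimizer="adam", loss="categorical_crossentropy")
    return model
```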

Table 10. Chameleon deep learning anomaly detection results.

Dataset: NSL-KDD; Loss function: categorical crossentropy; Training time: 6 min, 13 s
Threshold  Accuracy  Precision  Recall   f1 score
0.105      86.191    81.481     98.021   88.989
0.087      86.072    80.618     99.439   89.045
0.177      87.833    84.065     97.016   90.077
2.190      89.092    90.753     90.010   90.380
1.758      89.532    90.640     91.008   90.824
1.018      89.607    89.965     92.005   90.974
0.837      90.073    89.900     93.010   91.429
0.314      90.006    87.624     96.002   91.622
0.742      90.592    89.922     94.008   91.920
0.512      90.711    89.351     95.005   92.092

Dataset: UNSW-NB15; Loss function: mean squared error; Training time: 28 min, 38 s
Threshold  Accuracy  Precision  Recall   f1 score
1.352      84.382    79.346     98.099   87.731
1.342      84.302    78.795     99.088   87.784
2.365      86.058    86.124     90.010   88.024
2.346      86.387    85.913     91.008   88.387
2.325      86.728    85.705     92.036   88.758
2.305      87.083    85.546     93.026   89.129
2.286      87.456    85.406     94.031   89.511
2.265      87.757    85.169     95.044   89.836
2.162      87.629    83.805     97.016   89.927
2.239      89.523    90.00      96.002   92.904

Dataset: IoT-Zeek; Loss function: mean squared error; Training time: 8 min, 32 s
Threshold  Accuracy  Precision  Recall   f1 score
3.101      98.158    96.344     90.004   93.066
3.098      98.288    96.329     91.002   93.590
3.097      98.407    96.235     92.001   94.070
3.094      98.530    96.157     93.012   94.558
3.092      98.651    96.080     94.010   95.034
3.090      98.777    96.043     95.009   95.523
3.088      98.894    95.944     96.007   95.975
3.086      99.019    95.897     97.005   96.448
3.084      99.134    95.789     98.003   96.884
3.082      99.246    95.659     99.002   97.302

Dataset: IoT-Zeek-Oversampled; Loss function: mean squared error
Threshold  Accuracy  Precision  Recall   f1 score
95.921     97.321    90.444     90.004   90.223
90.063     97.458    90.538     91.002   90.770
89.015     97.595    90.631     92.001   91.311
87.177     97.734    90.724     93.012   91.854
85.143     97.871    90.813     94.010   92.384
3.364      97.817    86.922     99.002   92.569
82.043     97.979    90.719     95.009   92.814
78.225     98.099    90.694     96.007   93.275
78.037     98.236    90.781     97.005   93.790
76.116     98.373    90.866     98.003   94.300

Fig. 6. Autoencoder anomaly detection ROC curve on the IoT-Zeek-Oversampled dataset.

as presented in Tables 6 and 8, respectively. Since the feature selection technique applies statistical methods to find the best features, if the population (the dataset) changes (due to over/undersampling), the importance of the selected features will likely differ, which affects the overall performance. We further measure the training and validation loss for each of the aforementioned models (i.e., the NSL-KDD Model, UNSW-NB15 Model, and IoT-Zeek Model), as presented in Figure 7, where we can see that there is no overfitting for any of the three models and each model's loss becomes stable around epoch 6. Moreover, Figure 8 shows the ROC curve for each of the aforementioned models, where all three achieve more than 90% area under the curve (AUC), with the IoT-Zeek Model achieving almost 100% AUC.
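The AUC values quoted for Figures 6 and 8 follow the usual rank-based definition of the area under the ROC curve; a dependency-free sketch (ties counted as one half):

```python
import numpy as np

def roc_auc(scores, labels):
    """Probability that a randomly drawn anomalous record (label 1)
    scores higher than a randomly drawn normal one (label 0); this
    equals the area under the ROC curve."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=int)
    pos, neg = s[y == 1], s[y == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))
```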

6.5. Comparative study

We further compare the two models that we trained on the NSL-KDD and UNSW-NB15 benchmark datasets with the most prominent state-of-the-art anomaly detection models (e.g., the ones presented in Ref. [26]) applied to these two datasets.

Fig. 7. Deep learning anomaly detection: training and validation loss. (a) NSL-KDD: categorical crossentropy; (b) UNSW-NB15: mean squared error; (c) IoT-Zeek: mean squared error.

Fig. 8. Autoencoder anomaly detection ROC curves: NSL-KDD (AUC = 0.954), UNSW-NB15 (AUC = 0.932), IoT-Zeek (AUC = 0.996).

Fig. 9. A comparative study of anomaly detection approaches in terms of f1 score: (a) NSL-KDD dataset; (b) UNSW-NB15 dataset.

According to our experiments, our proposed autoencoders outperform them in terms of the f1 score metric. The results of this comparison are depicted in Figures 9(a) and 9(b) for NSL-KDD and UNSW-NB15, respectively, where for both datasets our proposed models achieve the highest f1 scores. Moreover, we compare the performance of our proposed approach with the aforementioned selected works in terms of the accuracy metric on both the NSL-KDD and UNSW-NB15 datasets. The results of this comparative study are reported in Table 11. More specifically, we compare the accuracy results obtained during the testing (using a hold-out dataset) of our autoencoder models

Table 11. A comparative study of anomaly detection approaches in terms of accuracy.

Approach            NSL-KDD (%)   UNSW-NB15 (%)  Ave. Accuracy (%)
Chameleon           90.71         89.52          90.115
Rashid et al. [46]  99.90         94.00          96.95
MemAE [41]          89.51         85.30          87.405
Roy et al. [47]     98.50         Not reported   —
CNN [45]            Not reported  93.50          —
J48 [44]            Not reported  87.65          —
ICVAE-DNN [26]      85.97         89.08          87.525
GB-RBM [38]         73.23         Not reported   —
RNN-IDS [39]        81.29         Not reported   —
ID-CVAE [40]        80.10         Not reported   —
CASCADE-ANN [42]    Not reported  86.40          —
DNN [37]            75.75         Not reported   —
STL [36]            74.38         Not reported   —
SCDNN [35]          72.64         Not reported   —
DT [43]             Not reported  85.56          —
EM Clustering [43]  Not reported  78.47          —

(NSL-KDD Model and UNSW-NB15 Model) trained on the two benchmark datasets with the reported accuracy results of existing state-of-the-art models tested on the NSL-KDD dataset (e.g., [26, 35–41]) and on the UNSW-NB15 dataset (e.g., [26, 41–45]), as presented in Table 11. In Ref. [44], the authors examine different anomaly detection classifiers on the UNSW-NB15 dataset before and after applying feature selection. The reported results show that the J48 classifier achieves the highest accuracy of 87.65%, slightly outperforming the case where no feature selection is applied (accuracy of 87.44%). However, the authors did not measure other performance evaluation metrics (e.g., f1 score, recall, and precision). The authors of Ref. [45] applied a CNN model only to the UNSW-NB15 dataset and detect anomalies with an accuracy of 93.5%; however, they did not report additional performance metrics (e.g., f1 score, recall, precision) either. In Ref. [41], the authors introduced MemAE, a memory-augmented autoencoder that addresses the problem of abnormal samples being reconstructed to look similar to normal ones. MemAE achieves an accuracy of 89.51% and f1 score of 89.93% on the NSL-KDD dataset, as well as 85.3%


accuracy and 85.26% f1 score on the UNSW-NB15 dataset. However, the authors did not consider applying feature selection prior to their autoencoder anomaly detection model to show the difference between the two scenarios. Reference [47] proposes B-Stacking, a lightweight supervised intrusion detection approach based on a machine learning ensemble that uses K-Nearest Neighbors (KNN), Random Forest, and XGBoost to detect network anomalies. The approach has been evaluated only on the NSL-KDD dataset, with an accuracy of 98.50% and f1 score of 99.00%, and has not been tested on the UNSW-NB15 dataset. The authors of Ref. [46] propose a stacking ensemble technique (called SET) with the SelectKBest feature selection technique and an ensemble of Decision Trees, Random Forest, and XGBoost machine learning models for network anomaly detection. The performed experiments demonstrate that SET obtains an accuracy and f1 score of 94.00% on the UNSW-NB15 dataset, and 99.90% for both accuracy and f1 score on the NSL-KDD dataset. However, the SelectKBest technique is less adaptive to new malicious network traffic over time. In contrast, our proposed solution, Chameleon, employs an autoencoder model, which is more resilient to new threats since it uses unsupervised techniques. Moreover, Chameleon has two sub-detection modules: a supervised one (XGBoost and Random Forest for classification) and an unsupervised one (deep autoencoders for anomaly detection), both of which achieve promising accuracy results. Although our main objective is to perform anomaly detection, the classification results presented in Tables 5 and 7 demonstrate high f1 scores, which outperform the reviewed existing approaches. The results reported in Table 10, obtained from our autoencoder-based approach, are considered high in the context of (unsupervised) anomaly detection.
Among the aforementioned works, Ref. [26] deploys a combination of variational autoencoders and deep neural networks (DNNs) to detect anomalies, achieving an accuracy and f1 score of 85.97% and 86.27% on NSL-KDD, and of 89.08% and 90.61% on UNSW-NB15. Reference [35] combines spectral clustering and DNN,


achieving 72.64% accuracy on NSL-KDD, and Ref. [36] deploys self-taught learning, reporting an accuracy of 74.38% on NSL-KDD. Reference [37] employs a DNN and obtains 75.75% accuracy on NSL-KDD. Reference [38] deploys a Gaussian-Bernoulli restricted Boltzmann machine, achieving 73.23% accuracy on NSL-KDD, while Ref. [39] proposes a novel IDS using recurrent neural networks (RNNs), reporting 81.29% accuracy on NSL-KDD. On the other hand, Ref. [40] proposes an intrusion detection system based on conditional variational autoencoders (CVAE), achieving 80.1% accuracy on the NSL-KDD dataset. Reference [42] introduces an intrusion detection approach using multi-cascading artificial neural networks, achieving an accuracy of 86.4% on the UNSW-NB15 dataset. Reference [43] deploys two approaches: the first uses an expectation-maximization clustering technique to detect anomalies efficiently, achieving 78.47% accuracy on the UNSW-NB15 dataset, while the second deploys decision trees on the same dataset and records an accuracy of 85.56%. Given all this information, our work outperforms the aforementioned state-of-the-art works by achieving 90.711% accuracy and 92.092% f1 score on NSL-KDD, and 89.523% accuracy and 92.904% f1 score on the UNSW-NB15 dataset. The advantages of our approach over the aforementioned existing works are as follows:

• Feature selection: Our work is among the few proposed approaches (e.g., [44]) that select the most important features (through the PSO algorithm) before applying a detection model. This leads to a more accurate model, since feature selection filters out unimportant/irrelevant features (noisy data) from the dataset, which helps classify each class/label more accurately. In addition, feature selection provides better efficiency and scalability compared to existing models that use all the features of the datasets.
• Evaluation on a recent real-world IoT dataset: While existing works evaluated their approaches on the most common benchmark

Fig. 10. Optimized feature selection execution times.

datasets (NSL-KDD and UNSW-NB15), none of them conducted experiments on a real-world IoT dataset. In contrast, we first generate our own real-world IoT dataset, and then apply our models to the generated IoT dataset in addition to the non-IoT datasets. This makes our approach more realistic and applicable to recent security problems.

6.6. Efficiency

We further examine the execution time of the optimized feature selection algorithm depending on the chosen fitness function. The obtained results are presented in Figure 10. The execution time is relatively high for the UNSW-NB15 dataset, due to its huge number of records as well as its large number of features. However, we do not consider this an issue, since the optimized feature selection task is executed only once per dataset.

7. Related Works

In this section, we present the most relevant works that have been proposed for: (i) feature selection using optimization algorithms and (ii) anomaly detection and maliciousness fingerprinting using machine learning and deep learning models.

7.1. Feature selection using optimization

Reference [5] proposes a feature selection approach using the Artificial Bee Colony (ABC) algorithm and integrates a Kalman filterf within the Hadoop ecosystemg for noise removal. The system is validated on 10 datasets and compared with other swarm intelligence approaches. However, the authors did not apply their approach to IDS datasets. Reference [6] proposes a feature selection technique that incorporates a hybrid genetic algorithm with granular information. This technique is tested on 11 benchmark financial datasets and compared with certain state-of-the-art techniques; the obtained results demonstrate that it achieves high classification accuracy. However, the use of the approach on other types of datasets (e.g., network datasets) has not been explored. In Ref. [7], a feature selection approach for classification is proposed, where the feature selection task is considered a non-deterministic problem. The authors investigate two types of multi-objective PSO algorithms: the first leverages the concept of non-dominated sorting in the feature selection problem, while the second introduces additional evolutionary concepts (mutation and crossover) to search for better optimal solutions. The two algorithms are then compared with two standard feature selection techniques and validated on twelve benchmark datasets. However, the authors did not explore the usage of more complex fitness functions. Another approach for feature selection is proposed in Ref. [8], which combines multi-swarm particle swarm optimization (MSPSO) and support vector machines (SVM) as the fitness function, with the f1 score as the fitness value. The goal is to perform both kernel optimization and feature selection simultaneously in order to obtain better generalization. The proposed approach is compared with state-of-the-art feature selection algorithms based on PSO, genetic algorithms (GA), and

f http://web.mit.edu/kirtley/kirtley/binlustuff/literature/control/Kalman%20filter.pdf, Accessed April 24, 2022. g https://hadoop.apache.org/, Accessed April 24, 2022.

90

A. Chohra et al.

grid search, using 10 UCI (University of California Irvine; https://archive.ics.uci.edu/ml/datasets.php, accessed April 24, 2022) machine learning benchmark datasets for validation. The evaluation results show that the technique outperforms the three aforementioned techniques in terms of accuracy. However, the proposed algorithm is specific to the datasets used for validation and has not been tested on network IDS datasets. In Ref. [9], the authors propose a feature selection approach that combines the GA and PSO algorithms, with SVM as the fitness function and accuracy as the fitness value. The proposed approach is validated on the Indian Pines spectral dataset [48], and the results show that it can select the most relevant features, enabling higher classification accuracy. However, the authors present neither an exhaustive study on benchmark datasets nor a comparative study with state-of-the-art techniques; moreover, the proposed solution is limited to the types of datasets utilized. Another feature selection approach, combining a genetic algorithm with neural networks (HGA-NN), is introduced in Ref. [10]. The approach is applied to a real-world credit dataset collected from the Croatian Bank and is furthermore evaluated on a benchmark credit dataset selected from the UCI database. The technique is compared to existing classification works in terms of accuracy and is shown to outperform them. However, this solution focuses on accuracy rather than the F1 score, and has only been applied to UCI datasets.
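A minimal sketch of the wrapper-style, swarm-based feature selection these works describe: a binary PSO searches over feature masks, with an SVM's cross-validated F1 score as the fitness value. The dataset, particle count, and PSO coefficients below are illustrative assumptions, not values taken from the cited papers.

```python
# Hypothetical sketch: binary PSO over feature masks, SVM F1 score as fitness.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(42)
X, y = make_classification(n_samples=200, n_features=15, n_informative=5,
                           n_redundant=10, random_state=42)

def fitness(mask):
    """Cross-validated F1 score of an SVM on the selected feature subset."""
    if not mask.any():
        return 0.0
    return cross_val_score(SVC(), X[:, mask], y, cv=3, scoring="f1").mean()

n_particles, n_features, iters = 8, X.shape[1], 10
pos = rng.random((n_particles, n_features))      # continuous positions in [0,1]
vel = np.zeros_like(pos)
pbest, pbest_fit = pos.copy(), np.full(n_particles, -1.0)
gbest, gbest_fit = pos[0].copy(), -1.0

for _ in range(iters):
    for i, mask in enumerate(pos > 0.5):         # binarize to a feature mask
        f = fitness(mask)
        if f > pbest_fit[i]:
            pbest_fit[i], pbest[i] = f, pos[i].copy()
        if f > gbest_fit:
            gbest_fit, gbest = f, pos[i].copy()
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, 0.0, 1.0)

selected = np.flatnonzero(gbest > 0.5)
print(f"best F1 = {gbest_fit:.3f} with features {selected.tolist()}")
```

The mask chosen by the global best particle is the selected feature subset; a GA-based wrapper differs only in how candidate masks are generated.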

7.2. Deep learning and anomaly detection

In Ref. [49], the authors present an anomaly detection system for web applications, proposing a stacked ensemble that combines several ensemble models (e.g., Random Forests and XGBoost), and validate their approach on four different datasets (i.e., CSIC-2010v2, CICIDS-2017, NSL-KDD, and UNSW-NB15). The obtained results show that the proposed stacked model outperforms existing web attack detection solutions in terms of accuracy and false positive rate (FPR). However, the authors have not performed a scalability and complexity study of their approach, especially for the two large datasets (UNSW-NB15 and CICIDS-2017). Reference [50] also proposes a stacking-based model for anomaly-based intrusion detection systems, where the base learners are deep neural networks (DNNs). The approach is validated on benchmark datasets (NSL-KDD, UNSW-NB15, and CICIDS-2017) and evaluated using several metrics, including accuracy, false positive rate, and the Matthews correlation coefficient. The obtained results show that the model outperforms a simple DNN-based anomaly model as well as some state-of-the-art techniques (achieving 89.97%, 92.83%, and 99.65% on the three aforementioned benchmark datasets, respectively). However, the authors have not performed a scalability study of their model on these datasets. In Ref. [51], the authors present a novel approach for anomaly detection that combines a genetic algorithm with fuzzy logic. More specifically, the genetic algorithm is deployed to better represent fingerprints of network segments using network flow data, which also makes it possible to predict network traffic behaviors for specific, predefined time windows. Fuzzy logic is then used to decide whether anomalous behaviors occur within those time windows. The approach is validated and evaluated on real-world network traffic data and proves effective, achieving 96.53% accuracy and a 0.56% false positive rate. Reference [52] proposes an approach for anomaly detection on network traffic data, called SVM-L, which combines SVM and Linear Discriminant Analysis (LDA). More specifically, URLs from the data are used as input and converted into vector format using natural language processing (NLP) and statistical techniques; these vectors are then fed to the SVM model to be classified as anomalous or normal.
An optimization algorithm is utilized to tune the hyper-parameters of the SVM classifier. The validation results show that the SVM-L model achieves 99% accuracy on the tested datasets.
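The stacking idea used in these detectors — base learners whose out-of-fold predictions feed a meta-learner — can be sketched as below. Scikit-learn's GradientBoostingClassifier stands in for XGBoost to keep the example dependency-light, and the imbalanced synthetic dataset is a hypothetical proxy for intrusion data.

```python
# Minimal stacked-ensemble sketch (illustrative models and data, not the
# cited systems' exact configurations).
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced dataset: ~10% of samples are the "attack" class.
X, y = make_classification(n_samples=600, n_features=20, weights=[0.9, 0.1],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
                ("gb", GradientBoostingClassifier(random_state=0))],
    final_estimator=LogisticRegression(),  # meta-learner over base predictions
    cv=3)                                  # out-of-fold predictions for stacking
stack.fit(X_tr, y_tr)
score = f1_score(y_te, stack.predict(X_te))
print(f"stacked-ensemble F1 on the attack class: {score:.3f}")
```

The `cv=3` setting makes the meta-learner train on out-of-fold base predictions, which is what distinguishes stacking from simple voting.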


A. Chohra et al.

There exist several solutions (e.g., [53,54]) that improve maliciousness segregation results by applying advanced machine learning and NLP techniques to log files. For instance, Ref. [54] proposes attack2vec to detect emerging network attacks by leveraging dynamic word embedding techniques. Similar to NLP word embeddings, the approach produces a dense representation of security events while considering the time factor. Moreover, in Ref. [53], the authors propose Atlas, a framework for attack investigation that leverages NLP and deep learning techniques to segregate attacks from non-attacks using logs as input. Atlas begins by processing the logs and building a causal dependency graph between the events found in them. This graph is augmented using NLP techniques and then used to train a sequence-based model that represents the attack semantics. The produced models help cyber analysts identify key attack steps that share similarities with previous patterns. In contrast, our proposed IoT real-world dataset generation (presented in Section 5) fingerprints malicious logs in IoT network traffic data by leveraging an ensemble model constructed from several classifiers (e.g., Random Forests, XGBoost, CatBoost, NN, and CNN). In Ref. [44], the authors present a study of different anomaly detection classifiers before and after applying feature selection. More specifically, the authors compare different machine learning classifiers by training each model twice. In the first iteration, they use all the existing features from the dataset. In the second iteration, they first tune the classifier with several feature selection algorithms, select the feature selection algorithm that gives the best accuracy, and then use the selected features with the same classifier as in the first training iteration.
The reported results show that the J48 classifier achieves the highest accuracy of 87.65%, slightly outperforming the case where no feature selection is applied (87.44% accuracy). However, the authors have not measured other performance metrics (e.g., F1 score, recall, and precision). The authors of Ref. [45] apply a convolutional neural network (CNN) model to the UNSW-NB15 dataset in order to detect anomalies efficiently. The obtained results show that their proposed CNN model achieves an accuracy of 93.5%. However, the authors have
not compared their work with other state-of-the-art approaches, nor have they reported additional performance metrics (e.g., F1 score, recall, and precision) in their evaluation. In Ref. [41], the authors introduce a network anomaly detection technique called memory-augmented deep autoencoder (MemAE). The memory augmentation addresses the over-generalization problem of autoencoders, in which abnormal samples are reconstructed so well that they look similar to normal ones. The experiments were conducted on both the NSL-KDD and UNSW-NB15 datasets. The obtained results show that the model achieves an accuracy of 89.51% and an F1 score of 89.93% on NSL-KDD, as well as an accuracy of 85.3% and an F1 score of 85.26% on UNSW-NB15. However, the authors have not considered applying feature selection prior to their autoencoder-based anomaly detection model in order to show the difference between the two scenarios. Reference [47] proposes a lightweight intrusion detection system, called B-Stacking, based on supervised machine learning. A series of feature transformation, dimensionality reduction, and feature selection methods is applied to produce the learning features. The authors then build B-Stacking, a machine learning ensemble that uses K-Nearest Neighbors (KNN), Random Forest, and XGBoost to detect network anomalies. However, the detection run-time is not reported. The authors of Ref. [46] propose a stacking ensemble technique (SET) with the SelectKBest feature selection technique for network anomaly detection. First, dimensionality reduction and feature selection are applied to isolate relevant features. Next, an ensemble of Decision Tree, Random Forest, and XGBoost machine learning models is employed to detect anomalies. However, the SelectKBest technique is less adaptive to new malicious network traffic over time.
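A minimal sketch of the filter-style pipeline these works describe — SelectKBest ranking features before an ensemble classifier; the dataset and the value of k are illustrative, not taken from the cited papers.

```python
# Hypothetical sketch: SelectKBest feature filtering feeding an ensemble.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=400, n_features=30, n_informative=6,
                           random_state=1)
pipe = make_pipeline(SelectKBest(f_classif, k=10),   # keep 10 top-ranked features
                     RandomForestClassifier(n_estimators=100, random_state=1))
acc = cross_val_score(pipe, X, y, cv=5, scoring="accuracy").mean()
print(f"5-fold accuracy using the top-10 ranked features: {acc:.3f}")
```

Because SelectKBest scores each feature once against the training labels, it is cheap but static — which is exactly the adaptivity limitation noted above for new malicious traffic.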

8. Concluding Remarks and Limitations

Optimization of non-deterministic tasks in machine learning and deep learning is becoming a new widespread approach to help
developers find optimal hyper-parameter settings and use them to build their classification, regression, or clustering models. This chapter presented a novel approach that focuses on finding the optimal hyper-parameters for ensemble methods in order to select the important features of a given networking dataset. The proposed approach combines ensemble methods with a swarm intelligence optimization algorithm (PSO). Our validation results show that the proposed algorithm finds better optimal solutions when tuned with boosting (XGBoost) ensemble techniques rather than bagging (Random Forest) ones. Moreover, we used the optimal solutions found by the optimization algorithm to select the appropriate set of features for each validation dataset. Using only those features, we built and tuned an anomaly detection autoencoder for each of these datasets. The obtained evaluation results demonstrate that our anomaly detection models outperform the most efficient state-of-the-art techniques applied to these datasets, while achieving reasonable, reduced training times. However, there are some limitations of our work that need to be addressed in the future. (i) We used only two hyper-parameters for the optimization algorithm when using Random Forests (number of trees and test size), and three when using it with XGBoost (number of trees, test size, and learning rate); we are currently exploring the possibility of optimizing more hyper-parameters. (ii) We need to improve the scalability (execution times) of the feature selection (optimization) algorithm, although this does not pose a major issue, since the algorithm needs to be run only once for each dataset rather than on a regular basis.
(iii) We have not explored setting the PSO hyper-parameters (c1, c2, and w) in an adaptive fashion, which could further improve search efficiency; this involves using variations of PSO, such as Adaptive Particle Swarm Optimization (APSO) [55], to find optimal settings for these three hyper-parameters.
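For illustration, the reconstruction-error principle behind the autoencoder detectors built in this chapter can be sketched with a linear PCA encoder/decoder — a NumPy-only stand-in for a deep autoencoder; the data, latent dimension, and threshold percentile below are assumptions for the sketch.

```python
# Sketch: flag samples whose reconstruction error under a model trained on
# normal traffic exceeds a threshold (PCA stand-in for a deep autoencoder).
import numpy as np

rng = np.random.default_rng(7)
# Normal traffic features lie near a 2-D subspace of a 10-D feature space.
W = rng.normal(size=(2, 10))
X_train = rng.normal(size=(500, 2)) @ W + 0.05 * rng.normal(size=(500, 10))
X_normal = rng.normal(size=(50, 2)) @ W + 0.05 * rng.normal(size=(50, 10))
X_anom = rng.normal(size=(10, 10)) * 3.0      # off-manifold "attack" samples

mu = X_train.mean(axis=0)
_, _, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
V = Vt[:2].T                                  # 2-D "latent code" directions

def recon_error(X):
    Z = (X - mu) @ V                          # encode
    X_hat = Z @ V.T + mu                      # decode
    return np.mean((X - X_hat) ** 2, axis=1)  # per-sample reconstruction MSE

threshold = np.percentile(recon_error(X_train), 99)   # fit on normal data only
print("normal flagged:", int((recon_error(X_normal) > threshold).sum()),
      "| anomalies flagged:", int((recon_error(X_anom) > threshold).sum()))
```

A deep autoencoder replaces the linear encode/decode with learned nonlinear maps, but the detection rule — thresholding the reconstruction error — is the same.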

Optimized Feature Selection for Network Anomaly Detection


References

1. O. Sagi and L. Rokach. Ensemble learning: A survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 8(4), e1249 (2018).
2. R. Sheikhpour, M. A. Sarram, S. Gharaghani, and M. A. Z. Chahooki. A survey on semi-supervised feature selection methods. Pattern Recognition, 64, 141–158 (2017).
3. C. Lazar, J. Taminau, S. Meganck, D. Steenhoff, A. Coletta, C. Molter, V. de Schaetzen, R. Duque, H. Bersini, and A. Nowe. A survey on filter techniques for feature selection in gene expression microarray analysis. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 9(4), 1106–1119 (2012).
4. H. M. Gomes, J. P. Barddal, F. Enembreck, and A. Bifet. A survey on ensemble learning for data stream classification. ACM Computing Surveys (CSUR), 50(2), 1–36 (2017).
5. A. Ahmad, M. Khan, A. Paul, S. Din, M. M. Rathore, G. Jeon, and G. S. Choi. Toward modeling and optimization of features selection in big data based social internet of things. Future Generation Computer Systems, 82, 715–726 (2018).
6. H. Dong, T. Li, R. Ding, and J. Sun. A novel hybrid genetic algorithm with granular information for feature selection and optimization. Applied Soft Computing, 65, 33–46 (2018).
7. B. Xue, M. Zhang, and W. N. Browne. Particle swarm optimization for feature selection in classification: A multi-objective approach. IEEE Transactions on Cybernetics, 43(6), 1656–1671 (2012).
8. Y. Liu, G. Wang, H. Chen, H. Dong, X. Zhu, and S. Wang. An improved particle swarm optimization for feature selection. Journal of Bionic Engineering, 8(2), 191–200 (2011).
9. P. Ghamisi and J. A. Benediktsson. Feature selection based on hybridization of genetic algorithm and particle swarm optimization. IEEE Geoscience and Remote Sensing Letters (GRSL), 12(2), 309–313 (2014).
10. S. Oreski and G. Oreski. Genetic algorithm-based heuristic for feature selection in credit risk assessment. Expert Systems with Applications, 41(4), 2052–2064 (2014).
11. A. Chohra, M. Debbabi, and P. Shirani. Daedalus: Network anomaly detection on IDS stream logs. In International Symposium on Foundations and Practice of Security, pp. 95–111 (2018).
12. W. Liu, Z. Wang, X. Liu, N. Zeng, Y. Liu, and F. E. Alsaadi. A survey of deep neural network architectures and their applications. Neurocomputing, 234, 11–26 (2017).
13. W.-J. Jia and Y.-D. Zhang. Survey on theories and methods of autoencoder. Computer Systems & Applications, 5, 1 (2018).
14. R. Chalapathy and S. Chawla. Deep learning for anomaly detection: A survey. arXiv preprint arXiv:1901.03407 (2019).


15. M. Ahmed, A. N. Mahmood, and J. Hu. A survey of network anomaly detection techniques. Journal of Network and Computer Applications, 60, 19–31 (2016).
16. D. Kwon, H. Kim, J. Kim, S. C. Suh, I. Kim, and K. J. Kim. A survey of deep learning-based network anomaly detection. Cluster Computing, 22(1), 949–961 (2019).
17. M. Xie, S. Han, B. Tian, and S. Parvin. Anomaly detection in wireless sensor networks: A survey. Journal of Network and Computer Applications, 34(4), 1302–1325 (2011).
18. W. M. Hartmann. Dimension reduction vs. variable selection. In International Workshop on Applied Parallel Computing (PARA'04), pp. 931–938 (2004).
19. A. Chohra, P. Shirani, E. B. Karbab, and M. Debbabi. Chameleon: Optimized feature selection using particle swarm optimization and ensemble methods for network anomaly detection. Computers & Security, 117, 102684 (2022).
20. Z. Team. Zeek: An open source network security monitoring tool. https://zeek.org/ (2018).
21. J. Kennedy and R. Eberhart. Particle swarm optimization. In Proceedings of ICNN'95 — International Conference on Neural Networks, vol. 4, pp. 1942–1948 (1995).
22. P. Bühlmann. Bagging, boosting and ensemble methods. In Handbook of Computational Statistics, pp. 985–1022. Springer, Heidelberg, Dordrecht, London, New York (2012).
23. W. Ali and S. J. Malebary. Particle swarm optimization-based feature weighting for improving intelligent phishing website detection. IEEE Access, 8, 116766–116780 (2020).
24. F. Q. Lauzon. An introduction to deep learning. In 2012 11th International Conference on Information Science, Signal Processing and Their Applications (ISSPA), pp. 1438–1439 (2012).
25. F. Marini and B. Walczak. Particle swarm optimization (PSO): A tutorial. Chemometrics and Intelligent Laboratory Systems, 149, 153–165 (2015).
26. Y. Yang, K. Zheng, C. Wu, and Y. Yang. Improving the classification effectiveness of intrusion detection by using improved conditional variational autoencoder and deep neural network. Sensors, 19(11), 2528 (2019).
27. V. Paxson. Bro: A system for detecting network intruders in real-time. Computer Networks, 31(23–24), 2435–2463 (1999).
28. S. R. Laboratory. Malware public datasets. https://mcfp.felk.cvut.cz/publicDatasets/ (2018).
29. M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani. A detailed analysis of the KDD CUP 99 data set. In IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA'09), pp. 1–6 (2009).
30. N. Moustafa and J. Slay. UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In 2015 Military Communications and Information Systems Conference (MilCIS), pp. 1–6 (2015).
31. N. Moustafa and J. Slay. The evaluation of network anomaly detection systems: Statistical analysis of the UNSW-NB15 data set and the comparison with the KDD99 data set. Information Security Journal: A Global Perspective, 25(1–3), 18–31 (2016).
32. N. Moustafa, J. Slay, and G. Creech. Novel geometric area analysis technique for anomaly detection using trapezoidal area estimation on large-scale networks. IEEE Transactions on Big Data, 5(4), 481–494 (2019).
33. N. Moustafa, G. Creech, and J. Slay. Big data analytics for intrusion detection system: Statistical decision-making using finite Dirichlet mixture models. In Data Analytics and Decision Support for Cybersecurity: Trends, Methodologies and Applications, pp. 127–156. Springer, Cham, Switzerland (2017).
34. A. Fernández, S. García, M. Galar, R. C. Prati, B. Krawczyk, and F. Herrera. Learning from Imbalanced Data Sets. Vol. 10, Springer, Cham, Switzerland (2018).
35. T. Ma, F. Wang, J. Cheng, Y. Yu, and X. Chen. A hybrid spectral clustering and deep neural network ensemble algorithm for intrusion detection in sensor networks. Sensors, 16(10), 1701 (2016).
36. A. Javaid, Q. Niyaz, W. Sun, and M. Alam. A deep learning approach for network intrusion detection system. In Proceedings of the 9th EAI International Conference on Bio-Inspired Information and Communications Technologies (formerly BIONETICS), pp. 21–26 (2016).
37. T. A. Tang, L. Mhamdi, D. McLernon, S. A. R. Zaidi, and M. Ghogho. Deep learning approach for network intrusion detection in software defined networking. In 2016 International Conference on Wireless Networks and Mobile Communications (WINCOM), pp. 258–263 (2016).
38. Y. Imamverdiyev and F. Abdullayeva. Deep learning method for denial of service attack detection based on restricted Boltzmann machine. Big Data, 6(2), 159–169 (2018).
39. C. Yin, Y. Zhu, J. Fei, and X. He. A deep learning approach for intrusion detection using recurrent neural networks. IEEE Access, 5, 21954–21961 (2017).
40. M. Lopez-Martin, B. Carro, A. Sanchez-Esguevillas, and J. Lloret. Conditional variational autoencoder for prediction and feature recovery applied to intrusion detection in IoT. Sensors, 17(9), 1967 (2017).
41. B. Min, J. Yoo, S. Kim, D. Shin, and D. Shin. Network anomaly detection using memory-augmented deep autoencoder. IEEE Access, 9, 104695–104706 (2021).
42. M. M. Baig, M. M. Awais, and E.-S. M. El-Alfy. A multiclass cascade of artificial neural network for network intrusion detection. Journal of Intelligent & Fuzzy Systems, 32(4), 2875–2883 (2017).
43. N. Moustafa and J. Slay. The evaluation of network anomaly detection systems: Statistical analysis of the UNSW-NB15 data set and the comparison with the KDD99 data set. Information Security Journal: A Global Perspective, 25(1–3), 18–31 (2016).
44. A. Roy and K. J. Singh. Multi-classification of UNSW-NB15 dataset for network anomaly detection system. In Proceedings of International Conference on Communication and Computational Technologies, pp. 429–451 (2021).


45. G. Mahalakshmi, E. Uma, M. Aroosiya, and M. Vinitha. Intrusion detection system using convolutional neural network on UNSW-NB15 dataset. In Advances in Parallel Computing Technologies and Applications, pp. 1–8. IOS Press (2021).
46. M. Rashid, J. Kamruzzaman, T. Imam, S. Wibowo, and S. Gordon. A tree-based stacking ensemble technique with feature selection for network intrusion detection. Applied Intelligence, pp. 1–14 (2022).
47. S. Roy, J. Li, B.-J. Choi, and Y. Bai. A lightweight supervised intrusion detection mechanism for IoT networks. Future Generation Computer Systems, 127, 276–285 (2022).
48. NASA AVIRIS Sensor. Indian Pines dataset. http://www.ehu.eus/ccwintco/index.php/Hyperspectral_Remote_Sensing_Scenes#Indian_Pines (2021).
49. B. A. Tama, L. Nkenyereye, S. R. Islam, and K.-S. Kwak. An enhanced anomaly detection in web traffic using a stack of classifier ensemble. IEEE Access, 8, 24120–24134 (2020).
50. L. Nkenyereye, B. A. Tama, and S. Lim. A stacking-based deep neural network approach for effective network anomaly detection. Computers, Materials & Continua, 66(2), 2217–2227 (2021).
51. A. H. Hamamoto, L. F. Carvalho, L. D. H. Sampaio, T. Abrão, and M. L. Proença Jr. Network anomaly detection system using genetic algorithm and fuzzy logic. Expert Systems with Applications, 92, 390–402 (2018).
52. Q. Ma, C. Sun, B. Cui, and X. Jin. A novel model for anomaly detection in network traffic based on kernel support vector machine. Computers & Security, 104, 102215 (2021).
53. A. Alsaheel, Y. Nan, S. Ma, L. Yu, G. Walkup, Z. B. Celik, X. Zhang, and D. Xu. ATLAS: A sequence-based learning approach for attack investigation. In 30th USENIX Security Symposium (USENIX Security 21) (2021).
54. Y. Shen and G. Stringhini. Attack2vec: Leveraging temporal word embeddings to understand the evolution of cyberattacks. In 28th USENIX Security Symposium (USENIX Security 19), pp. 905–921 (2019).
55. Z.-H. Zhan, J. Zhang, Y. Li, and H. S.-H. Chung. Adaptive particle swarm optimization. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 39(6), 1362–1381 (2009).

© 2023 World Scientific Publishing Company. https://doi.org/10.1142/9789811273209_0004

Chapter 4

Forensic Data Analytics for Anomaly Detection in Evolving Networks

Li Yang∗,¶, Abdallah Moubayed∗, Abdallah Shami∗,∗∗, Amine Boukhtouta†,††, Parisa Heidari‡,‡‡, Stere Preda†,§§, Richard Brunner§,¶¶, Daniel Migault†, and Adel Larabi†,∗∗∗

∗Western University, London, ON, Canada
†Ericsson, Montreal, QC, Canada
‡IBM, Montreal, QC, Canada
§Log5Data, Montreal, QC, Canada
¶[email protected]
[email protected]
∗∗[email protected]
††[email protected]
‡‡[email protected]
§§[email protected]
¶¶[email protected]
[email protected]
∗∗∗[email protected]

In the prevailing convergence of traditional infrastructure-based deployments (i.e., Telco and industry operational networks) toward evolving deployments enabled by 5G and virtualization, there is a keen interest in elaborating effective security controls to protect these deployments in depth. Driven by key enabling technologies like 5G and virtualization, evolving networks are democratized, facilitating the establishment of points of presence that integrate different business models ranging from media and dynamic web content to gaming and a plethora of IoT use cases. Despite the increasing services provided by evolving networks, many cybercrimes and attacks have been launched in evolving networks to perform malicious activities. Due to the limitations of traditional security artifacts (e.g., firewalls and intrusion detection systems), research on digital forensic data analytics has attracted more attention. Digital forensic analytics enables people to derive detailed information and comprehensive conclusions from different perspectives of cybercrimes to assist in convicting criminals and preventing future crimes. This chapter presents a digital analytics framework for network anomaly detection, including multi-perspective feature engineering, unsupervised anomaly detection, and comprehensive result correction procedures. Experiments on real-world evolving network data show the effectiveness of the proposed forensic data analytics solution.

1. Introduction

The evolution of modern networks enables the shift from traditional infrastructure toward key enabling technologies like virtualization, cloud, and 5G [1]. These technologies are meant to enable the integration of fully-fledged business models in different points of presence, including content delivery, gaming, Internet of Things (IoT), etc. The Ericsson June 2020 Mobility Report [2] highlights the rapid growth of 5G networks, infrastructure, applications, and end-user services. Due to the COVID-19 pandemic, 88% of professionals use online video calls in their work and personal life. Communication Service Providers (CSPs) face the challenge of delivering resilient and secure networks, as well as innovative service offerings [2]. Ericsson sees intelligent security management as a business accelerator: CSPs need support in automating, scaling, and adapting their security posture to stay protected against threats in this evolving 5G network landscape. With the development of evolving networks, various types of network devices and entities have generated large volumes of data and traces [3]. During network communications, massive data are collected at the network level. Client-centric forensic data are a major source of network data, including history logs, access logs, chat logs, cookies, system logs, etc. [3]. The extensive digital traces in network environments can potentially give insights into the actions and behaviors of network users and devices. Most research on network data and forensic analytics has focused on static networks, in which single observations or combined
historical datasets that do not change over time are collected and used [1]. However, the majority of modern networks are connected by evolving systems with dynamic activities, named evolving networks [1]. In real-world applications, modern networks evolve abruptly and continuously. For example, real-world network devices and services usually face continuous upgrades due to new user requirements, causing the networks to evolve over time. Cyberattacks will also result in dramatic changes in communication networks. Thus, it is valuable to conduct research on detecting abnormal changes in evolving networks for modern network protection. Accurate anomaly detection is crucial for decision-making and the timely execution of necessary actions. Additionally, sudden changes often occur only in certain nodes of a large evolving network [1]. Hence, it is important to locate the affected subnetworks and nodes, which can help determine the root cause of anomalous events and take proper countermeasures. Network anomalies can be classified into two primary types: legitimate anomalies and network attacks [4]. Legitimate anomalies are network issues relating to operational failures and performance, such as server failures, misconfigurations, crowd events, etc. Network attacks and cybercrimes are anomalies launched by malicious attackers that often cause severe consequences. There has been a growing number of cybercrimes and attacks involving the devices and services of evolving networks. Not only can network devices be targets of cyberattacks, but they can also be exploited to launch attacks [3]. Cybercrimes and attacks often result in harmful consequences, such as service unavailability, network congestion, financial losses, and other severe issues [5]. In this work, we focus on cyberattacks that target network services, including Denial of Service (DoS) attacks and Cache Pollution Attacks (CPAs), so-called service targeting attacks.
Since service targeting attacks are capable of disrupting services and compromising devices of evolving networks, it is essential to develop effective forensics approaches to identify malicious behaviors and cyberattacks [6]. Network forensic analytics techniques enable us to identify the attackers and compromised networks/nodes, so as to take proper
countermeasures to defend against attacks and address the network issues. With the continuously growing volume of digital data and the development of high-technology crimes, the expense of conducting cybercrime investigations has been increasing dramatically [7]. Thus, research on digital forensic analytics has attracted significant attention, especially for dealing with large datasets. Digital forensics plays a vital role in crime reconstruction and evidence generation in court. Thus, it is crucial to develop digital forensic analytics techniques to discover more useful information about cyberattacks, such as the attackers, victims, compromised devices, places, time periods, and means of attack. Network data analytics and forensic analytics have become essential techniques in evolving networks, and are expected to become even more critical in the near future. Digital forensics is the process of discovering and evaluating relevant details about an event of interest to gain a greater comprehension of the event [3]. The event traces on the digital system are used by forensic investigators to expose the truth of the event. According to the National Institute of Standards and Technology (NIST) recommendation [8], digital forensics is also defined as the process of collecting, examining, analyzing, identifying, and presenting (CEAIP) digital data, aiming to transform the original information into digital evidence and provide investigation results using forensic methods [6]. Forensic analytics techniques should be applied to network data to derive conclusions about the cybercrimes that occurred, such as the attacker devices, their motivations or objectives, their attack methods or types, the services they exploited, and the compromised devices. By connecting the forensic process to the data analytics process, Data Mining (DM) and Machine Learning (ML) algorithms can be utilized to construct network security models capable of effectively identifying cybercrimes.
DM and ML techniques are able to discover useful information and trends that are difficult to find through human observation. Thus, ML algorithms can be integrated with forensic analytics methods to handle the ever-increasing volume of data
and information [9]. ML algorithms are able to detect numerically abnormal patterns, while forensic analytics can be used to determine the real attack patterns and identify the relevant crime information, such as affected devices and malicious attacker Internet Protocol (IP) addresses. A major challenge in digital forensics is locating relevant information in large volumes of data [9]. Additionally, many traditional anomaly detection solutions (e.g., ensemble-based Anomaly Detection Systems (ADSs) [10], multi-stage ADS [11], tree-based ADS [12], and Convolutional Neural Network (CNN)-based ADS [13]) focus only on the identification process through ML model learning, which is a small subset of the larger digital forensic analytics process. Thus, this chapter presents a comprehensive unsupervised anomaly detection framework applicable to the forensic analytics of a wide range of information and evidence. On the other hand, for network data analytics, the main difference between static and evolving network analytics is that static network analysis uses single observations of the network without time correlations, while evolving network analysis extracts evolving features of the network by retaining time-series information [14]. Hence, this chapter also proposes a multi-perspective feature engineering method to extract time-based features and retain time-series correlations for effective evolving network data analytics. The proposed multi-perspective feature engineering approach and the comprehensive unsupervised anomaly detection framework aim to fingerprint malicious clients or IoT devices in the IP space, as well as abnormal content; in addition, they allow the identification of targeted service nodes. The research and validation of the unsupervised anomaly detection framework are based on real-world service data logs collected from dynamic evolving networks. The large-sized service data consist of more than 452 million unlabeled lines of logs.
Through experiments, the abnormal network entities (i.e., malicious IPs, abnormal contents, and compromised nodes) and their corresponding attack types are detected effectively using the proposed multiperspective anomaly detection framework.

L. Yang et al.


2. Background

This section provides background on service targeting attacks in evolving networks and presents the digital forensic analytics procedures that can be used for network anomaly detection.

2.1. Service targeting attacks in evolving networks

By service targeting attacks, we mean any type of disturbance performed to impair the functioning of serving nodes in evolving networks. By serving nodes, we mean any core component used to integrate different business models, such as message brokers for IoT, gateways, proxies, caching nodes, etc. [15]. The goal of cyberattackers is usually either taking down a service or altering its inner workings [16]. The former, commonly known as Denial of Service (DoS) attacks [5], is meant to overwhelm serving nodes and reduce their capability to keep services functioning well, resulting in a sustainability impact. The latter is meant to take advantage of misconfigurations or vulnerabilities of serving mechanisms for a mid- or long-term impact on the sustainability of a service without taking it down, resulting in stealthy attacks. For the sake of illustration, in the context of clouds, network threats can take down serving nodes, resulting in a denial of service, or misconfigure internal parameters to impair the serving nodes' production on-premises, resulting in a sustainability impact. In the context of content delivery networks, Cache Pollution Attacks (CPAs) [5] can be launched by polluting the cache space with many unpopular or illegitimate contents; legitimate clients will then get many cache misses for popular files, making the caching mechanism less effective and resulting in a sustainability impact. An insider threat can likewise tweak caching parameters, impairing the effectiveness of the caching mechanism. This chapter focuses on the detection of service targeting attacks and the related network entities to present digital evidence through digital forensic analytics methods.
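To make the cache pollution mechanism concrete, the following toy simulation (our illustration, not code from this chapter; all names, capacities, and rates are invented) floods a small LRU cache with one-off unpopular contents and measures the hit rate experienced by legitimate clients with and without the attack:

```python
from collections import OrderedDict
import random

class LRUCache:
    """Minimal LRU cache of content IDs; evicts the least recently used entry."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def request(self, content_id):
        """Return True on a cache hit, False on a miss (content is then cached)."""
        if content_id in self.store:
            self.store.move_to_end(content_id)
            return True
        self.store[content_id] = True
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used entry
        return False

def legit_hit_rate(cache, n_legit, attacker_rate, rng):
    """Hit rate of legitimate clients (who request a small popular set) when
    each legitimate request is interleaved with `attacker_rate` requests
    for one-off unpopular contents."""
    hits, junk = 0, 0
    for _ in range(n_legit):
        hits += cache.request(f"popular-{rng.randint(0, 19)}")
        for _ in range(attacker_rate):
            junk += 1
            cache.request(f"unpopular-{junk}")
    return hits / n_legit

rng = random.Random(42)
cache = LRUCache(capacity=50)
legit_hit_rate(cache, 500, 0, rng)            # warm the cache
before = legit_hit_rate(cache, 500, 0, rng)   # hit rate without attack
after = legit_hit_rate(cache, 500, 100, rng)  # hit rate under a CPA
print(f"legit hit rate before attack: {before:.2f}, under attack: {after:.2f}")
```

With no attacker traffic, the warmed cache serves the small popular set almost entirely from cache; interleaving bursts of unique junk requests evicts the popular entries faster than legitimate clients can repopulate them, which is exactly the sustainability impact described above.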

2.2. Digital forensic analytics

The standard Digital Forensic Analytics (DFA) process can be divided into five consecutive steps: collection, examination, identification, analysis, and presentation [7]. Their definitions, and the challenges they raise in evolving network analytics, are as follows [8]:

(1) Collection: The collection process aims to collect and integrate data from digital devices for the subsequent procedures of digital forensics. Erroneous and uncertain data should be avoided in the data collection process. In evolving networks, network data are often acquired from a variety of nodes or other data sources with different attributes or formats. Additionally, network data sources often generate data at different speeds, making it challenging to integrate network data. Thus, it is critical to use proper data collection and integration methods in order to obtain high-quality data for further analysis.

(2) Examination: During the examination process, the collected data are examined and prepared for further identification and analysis. As original datasets are often collected from different sources, a preliminary data evaluation and reconstruction should be conducted to extract and interpret important digital evidence. Due to the limitations of raw data, there are many challenges associated with the examination process in evolving network data analytics. First, raw data often include a large number of missing and erroneous samples that should be eliminated or imputed in order to prevent noisy data from impairing the analytics results. Second, skewed class distributions (i.e., unbalanced data) and differing feature ranges have a negative effect on the performance of analytics models; they can be addressed by using appropriate data sampling and normalization approaches. Third, the raw network data usually do not carry the most appropriate attributes/features. Thus, proper feature engineering should be conducted to extract and select the most relevant network features for cyberattack detection.


Many digital forensic technologies, like Automated Machine Learning (AutoML) techniques [17, 18], have been developed to conduct the examination and pre-processing procedures automatically, reducing human effort and potential mistakes.

(3) Identification: The identification process aims to identify the relevant digital objects, including the events, people, items, and methods related to the case, based on the analytics of the examined data. For anomaly detection problems in the context of evolving networks, ML and DM algorithms are widely used to develop classifiers that can identify benign and abnormal events or cyberattacks based on their patterns. As ML-based identification approaches (e.g., Conti et al. [19] and Yang et al. [20]) usually suffer from high false-positive rates, further analysis of the identification results is often required.

(4) Analysis: In the analysis procedure, the digital objects obtained in the identification procedure are analyzed according to their relevance and reliability. The digital objects that meet the requirements are chosen as digital evidence and investigation results. For anomaly detection problems in the context of evolving networks, analysis can be conducted to validate and correct the identification results obtained by ML and DM models. For accurate analysis, human effort and expert knowledge are often required to obtain sufficient and valid evidence.

(5) Presentation: The presentation process aims to summarize the investigation results and present them in a proper way, so that the relevant staff and police officers can understand them clearly and use them for criminal conviction and accountability. For anomaly detection problems in the context of evolving networks, as the investigation results often involve machine-readable data or technical terms, the main difficulty is to make the results user-friendly and human-understandable. Moreover, proper data visualization methods are also helpful for the presentation of network anomaly detection results.

The proposed solution is designed based on the typical DFA process described above and will be presented in Section 4.2.
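As a schematic illustration only (the stage logic below is invented placeholder code, not the chapter's implementation), the five DFA steps can be chained so that each stage consumes the previous stage's output:

```python
# A minimal sketch of the five-step DFA process; each stage is a stub
# that a concrete investigation would replace with real logic.

def collect(sources):
    """Collection: gather and integrate raw records from all data sources."""
    return [record for source in sources for record in source]

def examine(records):
    """Examination: drop malformed records and normalize the rest."""
    return [r.strip().lower() for r in records if r and not r.isspace()]

def identify(records):
    """Identification: flag records matching a suspicious pattern."""
    return [r for r in records if "error" in r or "denied" in r]

def analyze(candidates):
    """Analysis: keep only candidates corroborated by a second indicator."""
    return [c for c in candidates if "admin" in c]

def present(evidence):
    """Presentation: summarize findings in a human-readable form."""
    return f"{len(evidence)} item(s) of digital evidence: {evidence}"

# Two hypothetical log sources feeding the pipeline end to end.
logs = [["GET /index 200", "LOGIN denied admin", "  "],
        ["POST /cfg ERROR admin", "GET /img 200"]]
report = present(analyze(identify(examine(collect(logs)))))
print(report)
```

The point of the sketch is the data flow: each stage narrows the previous stage's output, mirroring how the examined data feed identification, whose candidates are then validated in analysis before presentation.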

3. Literature Review

In this section, a review of the existing literature is conducted, covering general network anomaly detection methods, digital forensic data analytics techniques, service targeting attack detection approaches, and cybercrime-related entity detection methods.

3.1. Network anomaly detection

Network anomaly detection refers to the process of analyzing network packets to identify abnormal network events and behaviors for network security enhancement. This subsection focuses on network anomaly detection systems that aim to identify malicious cyberattacks and threats in modern networks through network traffic data analytics. Yang et al. [13] propose a CNN and transfer learning-based intrusion detection system for the Internet of Vehicles. The proposed system can transform general network traffic data into images to better fingerprint the patterns of cyberattacks. Moubayed et al. [10, 21] propose an ensemble feature engineering and optimized random forest-based intrusion detection model to detect botnet attacks and DNS typo-squatting. Yang et al. [22, 23] propose an adaptive data analytics framework for anomaly detection in evolving and dynamic networks. It analyzes the time-based features in evolving networks and achieves high accuracy of more than 99% on public datasets; however, its high run-time complexity remains an issue. Injadat et al. [11, 24] propose Bayesian optimization-based ML models for network intrusion detection. The proposed models can effectively detect various types of cyberattacks, but they do not analyze the attack details. Although the above methods achieve high performance on benchmark datasets for anomaly detection, they are designed only to detect the occurrence of cyberattacks rather than to fingerprint the detailed information of cyberattacks and victims. It is crucial to obtain the attack details, as they can help restore the compromised devices, hold the attackers accountable, and prevent future attacks. Forensic analytics techniques can be used to fingerprint the details of network anomalies and attacks.

3.2. Forensic data analytics

Digital forensic analytics techniques have been applied to several network anomaly detection and cybersecurity problems. Amato et al. [25] propose a semantic-based method for digital forensic analysis in cybersecurity problems. This method enables the generation and retrieval of useful data for digital evidence collection. Koroniotis et al. [26] propose a Deep Neural Network (DNN) and Particle Swarm Optimization (PSO)-based forensic architecture to detect botnet attacks. The proposed framework is able to trace the behaviors of cyberattack events with better performance than other compared methods. Khan et al. [27] present a multi-agent method to conduct digital forensic analysis in storage networks. The access logs collected on a server can be aggregated by the agents for further verification to detect malicious events and cyberattacks. Although many forensic techniques exist, the majority of them are designed for labeled datasets, which are usually difficult to acquire in real-world network applications. Thus, it is still critical to upgrade existing methods and develop new methodologies to deal with digital forensic analytics problems more effectively [3].

3.3. Service targeting attack detection

As described in Section 2.1, cyberattackers usually aim to take down or disrupt services in evolving networks, which we term service targeting attacks. As this chapter focuses on detecting service targeting attacks (e.g., CPAs and DoS attacks) due to their destructiveness, a review of existing research on service targeting attack detection is conducted. Vasseur et al. [28] propose a network anomaly detector that utilizes unsupervised learning algorithms to generate a set of rules from captured network data and trains a supervised learning classifier to identify anomalies. However, a convincing validation process is required to verify the generated rules. Conti et al. [19] performed studies demonstrating that CPAs constitute a serious danger to network security and suggested a lightweight approach for reliably detecting CPAs. Baradaran et al. [29] propose an anomalous network traffic detection method that compares extracted feature values with predetermined threshold values to determine anomalies and identify DoS attacks. However, determining an appropriate threshold is often difficult. Yao et al. [30] propose a detection scheme that uses grey forecasting to predict the popularity of each cached content and uses the estimated popularity information to identify cache pollution attacks. Kumar et al. [31, 32] propose a security framework named Software-Defined Perimeter (SDP) to protect modern networks from DoS attacks. The SDP framework can prevent and defend against upcoming DoS attacks but cannot investigate past attacks that have already breached the networks. Karami et al. [33] propose an anomaly detection model that uses k-means and PSO algorithms to detect DoS attacks in content-centric networks. However, this method is not evaluated on a representative network dataset.

3.4. Cybercrime-related entity detection

Network entities are the physical pieces and components comprising the network, such as client IPs, service nodes, contents, offerings, etc. Detecting the network entities affected by cyberattacks is critical in the anomaly detection process because they can help analyze the attack details. Detected malicious client IPs are potential attacker IPs, which can help locate the attackers, while detecting compromised nodes enables locating the target devices and repairing network failures. Doctor et al. [34] propose a system for identifying and mitigating malicious threats on a network by collecting and analyzing network traffic data associated with IP addresses. Malicious IP addresses can be detected, but this system does not consider service provider perspectives. Ayyagari et al. [35] propose a context-aware network threat detection method that monitors the nodes associated with users to generate a behavior profile for each user. Anomalies can then be detected by comparing each user's behavior profile with the baseline behavior profile. Qing et al. [36] propose a network anomaly detection method based on the Gradient Boosting Decision Tree (GBDT) that extracts node status and routing information features, such as cache hit rate and caching life rate, to detect cache pollution attacks, but do not consider other network entities, like client IPs. Pandey et al. [37] propose an intrusion detection method that uses agents to identify compromised nodes in Wireless Sensor Networks (WSNs) based on their behavior, which shows good efficiency in small networks. However, root cause analysis, which is significant for future attack prevention, was not conducted in the above techniques.

3.5. Research gaps

In summary, in the context of security detective controls to be provided for evolving networks, we identify the following issues with existing methods:

(1) The complexity of evolving networks makes it difficult to merely identify and combine individual sources of events as reliable indicators of service targeting attacks. For example, Distributed DoS (DDoS) attacks are launched by a large number of malicious clients instead of an individual client, so the behaviors of all these malicious clients should be analyzed together to effectively identify DDoS attacks.

(2) Existing security technologies fail to model evolving networks in a way that properly captures data-driven semantics from multiple perspectives, a concept we consider essential.

(3) In evolving networks, the following requirements are not properly fulfilled by existing solutions: (a) perspectives are defined in a way that allows their interaction; (b) a security baseline can be defined from multiple perspectives, against which deviations are detected as network attacks; and (c) indicators of compromise are effectively validated for a better security posture.


Thus, solutions for more efficient and cost-effective detection of service targeting attacks in such complex ecosystems are left as open problems. The objectives of the proposed solution are to develop the following techniques:

(1) Holistic multi-perspective cache and service-based digital forensic analytics techniques for anomaly detection in evolving delivery networks.
(2) Characterization of network anomalies as attacks to improve security posture management in evolving networks.
(3) A mechanism to detect service targeting attacks based on selective feature engineering involving multiple perspectives: content (i.e., media, configurations, files, gaming), IP space (i.e., clients, IoTs), service nodes, and offerings (i.e., network slices, accounts).
(4) An unsupervised anomaly detection mechanism to fingerprint abnormal contents and malicious IPs associated with attack detection, so as to locate attackers.
(5) Hyper-parameter optimization methods and mechanisms to build optimized machine learning models for service-based attack detection.
(6) A composite methodology to infer anomaly detection models for implicit identification of attacks targeting services (i.e., cache pollution, denial of service).
(7) Implicit root cause analysis to identify and validate targeted delivery nodes associated with attack detection.

4. Multi-perspective as Intelligence for Anomaly Detection

This section provides an overview of the proposed solution, which uses multi-perspective as intelligence for anomaly detection in evolving networks. The potential deployment of the proposed system in evolving networks is also discussed in this section.


4.1. Security posture support in evolving networks

In this chapter, we rely on an approach that digests application-layer logs in evolving networks to identify anomalies for the purpose of hardening their security posture. A security posture refers to the overall security status of the software and hardware assets, networks, services, and information [38]. A perspective in evolving networks means monitoring and analyzing network states based on the behaviors of a single type of network entity, such as a client or a service provider. Analyzing the interactions among multiple perspectives provides a broader view of network behaviors than a single perspective, which is beneficial for accurate cyberattack detection. As depicted in Figure 1, the solution relies on the following interactions:

(1) The service nodes (i.e., proxies, gateways, caches, brokers) located in evolving networks are accessed by clients or IoT devices from the IP space.
(2) The content (media, files, and configurations) is requested by the clients/IoTs and provisioned or accessed through service nodes.

Fig. 1. Security posture support for evolving networks.


(3) The offerings index the content and are configured based on service nodes and caches.
(4) The clients and IoTs are registered in offerings for diverse services.

The solution aims to characterize these perspectives through attributes to infer intelligence that supports Security Information and Event Management (SIEM) and Security Orchestration, Automation, and Response (SOAR). By intelligence, we mean the ability to infer a security baseline to identify anomalies, like attacking IPs and abnormal contents, as well as victims (targeted service nodes). The intelligence is used by SOAR to trigger defense and mitigation techniques like dynamic firewalling, throttling, and blacklisting.

4.2. Digital forensic analytics framework for anomaly detection

The proposed framework intends to characterize abnormal network events and attacks and to distinguish them from benign events by analyzing network features extracted from multiple perspectives, including the client IP, content, node, and offering perspectives. The framework of the proposed anomaly detection model is shown in Figure 2 and consists of five stages: data collection, data pre-processing and feature engineering, unsupervised anomaly detection, result correction, and final results presentation. These five stages are designed based on the five standard procedures of digital forensic analytics described in Section 2.2: collection, examination, identification, analysis, and presentation.

Fig. 2. The proposed forensic analytics framework for anomaly detection.

In the collection process, access log traces are collected. They consist of web requests received and recorded by multiple service providers in a large evolving network. After collecting the data logs, they are pre-processed in the examination procedure to generate a sanitized version of the data. A multi-perspective feature engineering method is also proposed and utilized in the examination process to extract features and generate new datasets from the different perspectives that can accurately depict the behaviors of cyberattacks from each perspective. In the identification process, the datasets extracted from the different perspectives are processed by the proposed unsupervised anomaly detection system, which consists of Gaussian Mixture Model (GMM), Isolation Forest (iForest), and Bayesian Optimization (BO) models, to effectively discriminate between abnormal and benign patterns. In the analysis process, the preliminary anomaly detection results are validated by the proposed comprehensive result correction framework to determine the real cyberattacks and identify the related network entities, such as nodes, contents, and IPs. Lastly, the anomaly detection results, including the detailed information on cyberattack types, victims, attackers, and utilized services, are returned and displayed in the presentation procedure.

4.3. System deployment

Fig. 3. The deployment of the proposed anomaly detection system.

System deployment

The proposed anomaly detection system can be placed in both edge and cloud servers in evolving networks, as shown in Figure 3. The proposed system can continuously monitor network traffic on edge servers for abnormal network entity detection and provide warnings to the central server as soon as an attack is launched; thus, the central server can notify other edge servers and take appropriate countermeasures. When installed on the central server, the proposed system can provide a complete picture of the whole network's functioning and can safeguard the central server when attackers use particular edge servers to compromise it. Specifically, all network traffic that is not blocked by the first layer of security
(i.e., firewalls and authentication) would be recorded and analyzed by the proposed anomaly detection system [20] on each edge or cloud server. Additionally, the vast volume of traffic may be saved in a database for a thorough examination by the proposed anomaly detection system. If an attack is detected on one of the edge or cloud servers, an alert is raised on all of the edge and cloud servers. As a result, network administrators at the central server and edge servers may take appropriate actions to thwart existing and future attacks.
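As a rough sketch of how an edge server might implement this monitoring loop (our example, not the chapter's system, which additionally uses GMM and Bayesian optimization; the feature columns and the synthetic traffic below are invented), an Isolation Forest trained on historical per-client feature vectors can flag a DoS-like burst and raise an alert:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Historical benign traffic, one row per client:
# columns = [requests per minute, request error rate, cache hit rate].
benign = np.column_stack([
    rng.normal(60, 10, 500),
    rng.uniform(0.0, 0.05, 500),
    rng.uniform(0.8, 1.0, 500),
])

detector = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
detector.fit(benign)

def monitor(batch):
    """Return indices of feature vectors flagged as anomalous (label -1)."""
    flags = detector.predict(batch)
    alerts = np.where(flags == -1)[0]
    for idx in alerts:
        # In deployment, this would notify the central server (SIEM/SOAR).
        print(f"ALERT: anomalous client #{idx}: {batch[idx]}")
    return alerts

# New batch: five benign clients plus one DoS-like client
# (request burst, high error rate, low cache hit rate).
batch = np.vstack([benign[:5], [[5000.0, 0.9, 0.05]]])
alerts = monitor(batch)
```

The flagged indices would then feed the result correction and presentation stages rather than being reported directly, matching the analysis step of the DFA process.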

5. Data Pre-processing and Feature Engineering

This section presents two critical procedures of the proposed forensic analytics anomaly detection framework: data pre-processing and feature engineering. Data pre-processing is the process of improving data quality for more accurate analytics, while feature engineering is the process of extracting or selecting appropriate network features that can better reflect cyberattack patterns.

5.1. Data collection and description

Data collection is the initial step of forensic analytics. In our solution, we focus on anomaly detection based on application-layer (e.g., HTTP, CoAP, MQTT, LwM2M, industrial protocols) access/event logs. These logs record the information of contextual requests used to run a business use case. For the sake of illustration, clients request content from web servers. The content includes HTML text files, embedded images, videos, audio, and other associated files provided by a web service. Access/event logs help to identify states and behaviors triggered by contextual applications in a network on a timely basis. A typical sample of general access log data is as follows:

64.0.0.1 - - [11/Dec/2016:05:33:28 -4000] "GET /support.html HTTP/1.1" 200 15340 "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
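For illustration, a line in this common access log layout can be decomposed into its raw fields with a regular expression (a typical sketch, not the chapter's parser):

```python
import re

# Groups for a combined-style access log line: client IP, timestamp,
# request method/path/protocol, status code, response bytes, user agent.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) "(?P<agent>[^"]*)"'
)

line = ('64.0.0.1 - - [11/Dec/2016:05:33:28 -4000] '
        '"GET /support.html HTTP/1.1" 200 15340 '
        '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')

request = LOG_PATTERN.match(line).groupdict()
print(request["ip"], request["method"], request["path"], request["status"])
```

Parsing every line into such a dictionary is what makes the per-perspective aggregations of the examination stage possible.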


Numerous network fields/features can be collected for the purpose of anomaly detection, and specific features should be chosen for the detection of certain attacks. The major raw features that may be helpful for the detection of service targeting attacks (i.e., DoS attacks and CPAs) are depicted in Table 1.

Table 1. Application protocols' raw features.

- IP: Client IP addresses.
- Timestamp: Start timestamp of each request, in the format "[dd/mmm/yyyy:hh:mm:ss -zzzz]".
- Protocol method: HTTP request methods (i.e., GET, POST, etc.) or CoAP request methods (i.e., GET, DELETE, PATCH, etc.).
- Status code: HTTP status codes (2xx: successful response; 3xx: redirection; 4xx: client error; 5xx: server error) or CoAP status codes (2xx: successful response; 4xx: client error; 5xx: server error; 7xx: signaling codes).
- Bytes: Bytes sent across the network in response to a request.
- Delivery time: Duration of a request from start to finish, including delivery of all content, in milliseconds.
- Device/IoT agent type: The type of device/IoT used to make a request, e.g., iPhone, iPad, desktop, etc.
- Service type: The type of service, e.g., live streaming, static, progressive download, sensor, etc.
- Service/cache hit indicator: Service or cache hit/miss indicator, e.g., hit or miss.
- Node: Node name or ID, representing a service provider or device.
- Offering: The network account offering name (e.g., streaming, static web content, dedicated network slice for IoT applications), indicating a service accessed from the IP space.
- Service content canonical path: The content part of a canonical path, indicating unique content (URLs for HTTP, CoAP, MQTT URL IoT Agent, etc.).
- Content type: The type of content in each request, e.g., text, video, image, audio, configuration files, patches, etc.


5.2. Data normalization

Data normalization is the process of converting data features to a similar scale. It is required when features have varying scales, as larger-scale features are often treated as more important than smaller-range features during the training of ML models, resulting in misleading predictions. Min-max normalization is an effective normalization method that can reduce the impact of varying feature scales. In min-max normalization, the normalized value x̃ of each original feature value x is given by [12]

    x̃ = (x − min_f) / (max_f − min_f),    (1)

where max_f and min_f are the maximum and minimum values of the feature f. Among normalization methods, min-max normalization is suitable for anomaly detection problems, as it retains the outlier values (i.e., extremely large or small values) in datasets, which helps detect anomalies [20]. Additionally, it converts all features to the same scale of 0 to 1. Thus, it is selected for the proposed framework.
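Equation (1) can be applied column-wise; the sketch below uses NumPy (scikit-learn's MinMaxScaler implements the same transform), with toy feature values of our own choosing:

```python
import numpy as np

def min_max_normalize(X):
    """Apply Eq. (1) per column: x_norm = (x - min_f) / (max_f - min_f)."""
    mins = X.min(axis=0)
    ranges = X.max(axis=0) - mins
    ranges[ranges == 0] = 1.0  # guard against constant features
    return (X - mins) / ranges

# Two features on very different scales (e.g., bytes and error rate).
X = np.array([[1500.0, 0.01],
              [90000.0, 0.50],
              [4000.0, 0.02]])
X_norm = min_max_normalize(X)
print(X_norm)  # every column now lies in [0, 1]
```

Note that the per-column minimum maps to 0 and the maximum to 1, so extreme values stay at the edges of the scale rather than being compressed away, which is the property the chapter relies on for anomaly detection.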

5.3. Feature engineering

In the raw data, although there are 13 network features for each request, it is difficult to use these data directly to identify network anomalies, as cyberattacks and affected entities usually cannot be reflected in single requests. Thus, to perform effective anomaly detection, new datasets that can reflect abnormal network behaviors should be generated through feature engineering. This process corresponds to the second phase of standard digital forensic analytics: examination. Network feature engineering is the process of extracting or selecting appropriate features from the log traces collected from evolving networks' deployment. To accomplish this, we propose a multi-perspective feature engineering framework to obtain dedicated datasets. It aims to extract network features from four different perspectives: content, service provider (node), IP space, and offerings. These perspectives are meant to index data to characterize a multiple-view baseline for security. The descriptions of the extracted features for each perspective are shown in Tables 2, 3, 4, and 5, respectively.

Table 2. Description of extracted features from the 'content' perspective.

- Number of requests: The total number of requests per content.
- Popularity: The popularity of each content, represented by the normalized number of IPs that sent requests to each content.
- Cache hit rate: The number of cache hits divided by the total number of requests per content.
- Request per IP ratio: The ratio of the total number of requests sent per content to the total number of IPs that sent requests per content.
- Request per node ratio: The ratio of the total number of requests sent per content to the total number of nodes that received requests per content.
- IP dynamicity: The percentage of IP changes per content during a period.

Table 3. Description of extracted features from the 'node' perspective.

- Cache hit rate: The number of cache hits divided by the total number of requests received per node.
- Cache hit rate of legitimate IPs: Average cache hit rate of IPs that only requested popular contents per node.
- Data transfer rate (MB/s): The total bytes returned divided by the total delivery time per node.
- Request error rate: The percentage of requests with errors (4xx or 5xx status code) received by each node.
- Average request popularity: Average content popularity of requests received per node.
- Content dynamicity: The percentage of the changed contents cached in the cache space during a period.
- IP dynamicity: The percentage of IP changes per node during a period.
- Offering request rate: The percentage of requests sent through each offering per node, e.g., "account1: 80%, account2: 20%".

Table 4. Description of extracted features from the 'client IP' perspective.

- Number of requests: The number of requests sent per IP.
- Average request interval: The average time interval between consecutive requests sent per IP.
- Number of nodes: The total number of unique nodes that received requests per IP.
- Number of contents: The total number of unique contents requested per IP.
- Request per content ratio: The ratio of the total number of requests sent per IP to the total number of contents requested per IP.
- Request per node ratio: The ratio of the total number of requests sent per IP to the total number of nodes that received requests per IP.
- Average request popularity: Average content popularity of requests sent per IP.
- Cache hit rate: The number of cache hits divided by the total number of requests sent per IP.
- Request error rate: The percentage of requests with errors (4xx or 5xx status code) sent per IP.
- Mobile rate: The percentage of requests sent through mobile devices per IP.
- Offering request rate: The percentage of requests sent through each offering for each IP, e.g., "account1: 80%, account2: 20%".

Table 5. Description of extracted features from the 'offering' perspective.

- Number of requests: The number of requests per offering.
- Number of nodes: The total number of unique nodes that received requests per offering.
- Service type: The type of service provided per offering, e.g., static, live streaming, progressive download, etc.
- Content type: The type of content provided per offering, e.g., text, image, audio, video, etc.
- Request popularity: Average content popularity of requests sent per offering.
- Cache hit rate: The number of cache hits divided by the total number of requests per offering.
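Assuming the parsed requests are held in a pandas DataFrame (the column names and sample values below are illustrative, not the chapter's schema), several of the client-IP features in Table 4 reduce to a single groupby-aggregate:

```python
import pandas as pd

# Toy request log: one row per parsed access-log line.
logs = pd.DataFrame({
    "ip":        ["1.1.1.1", "1.1.1.1", "1.1.1.1", "2.2.2.2"],
    "node":      ["n1", "n1", "n2", "n1"],
    "content":   ["/a", "/a", "/b", "/a"],
    "status":    [200, 404, 200, 200],
    "cache_hit": [1, 0, 1, 1],
})

# Client-IP perspective: aggregate per-request rows into per-IP features.
per_ip = logs.groupby("ip").agg(
    number_of_requests=("content", "size"),
    number_of_nodes=("node", "nunique"),
    number_of_contents=("content", "nunique"),
    cache_hit_rate=("cache_hit", "mean"),
    request_error_rate=("status", lambda s: (s >= 400).mean()),
)
per_ip["request_per_content_ratio"] = (
    per_ip["number_of_requests"] / per_ip["number_of_contents"]
)
print(per_ip)
```

Repeating the same pattern with `groupby("content")`, `groupby("node")`, and `groupby("offering")` yields the four per-perspective datasets described in Tables 2-5.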

The obtained datasets or aggregates with the features gathered from four perspectives are first processed separately and then analyzed in conjunction to identify abnormal events or malicious cyberattacks. In the proposed solution, the intent is to fingerprint malicious IPs and abnormal content that are exploited by attackers, as well as to identify compromised service nodes. Intelligence inferred from the offering perspective serves as supporting information to validate the abnormality of the feature vectors.

Forensic Data Analytics for Anomaly Detection in Evolving Networks

Table 6. Potential patterns of service alteration/cache pollution attacks.

Attack Type: Alteration of Service (e.g., CPAs)

Perspective | Feature                            | Abnormal patterns (intensity)
Node        | Cache hit rate                     | Low
Node        | Cache hit rate of legitimate IPs   | Low
Node        | Data transfer rate (MB/s)          | Low
Node        | Average request popularity         | Low
Client IP   | Number of requests                 | Large
Client IP   | Average request interval           | Short
Client IP   | Number of nodes                    | Small
Client IP   | Number of contents                 | Large/small
Client IP   | Average request popularity         | Low
Content     | Popularity                         | Low
Content     | Request per IP ratio               | High/low
Content     | Request per node ratio             | High
Offering    | Request popularity                 | Low

After feature engineering, the original access log datasets with 452 million requests have been transformed into four datasets from four different perspectives: the content-based dataset (1.8 million unique contents), the node-based dataset (50 unique nodes), the client IP-based dataset (1.2 million unique IPs), and the offering-based dataset (70 unique offerings).
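The per-entity aggregation described above can be sketched as follows. This is an illustrative sketch for the client IP perspective only; the log field names (`client_ip`, `node`, `cache_hit`, `status`) are our assumptions, not the chapter's actual access log schema.

```python
from collections import defaultdict

def aggregate_by_ip(records):
    """Group raw access-log records by client IP and derive per-IP features:
    request count, number of unique nodes contacted, cache hit rate, and
    request error rate (4xx/5xx responses)."""
    groups = defaultdict(list)
    for r in records:
        groups[r["client_ip"]].append(r)
    features = {}
    for ip, reqs in groups.items():
        n = len(reqs)
        features[ip] = {
            "num_requests": n,
            "num_nodes": len({r["node"] for r in reqs}),
            "cache_hit_rate": sum(r["cache_hit"] for r in reqs) / n,
            "error_rate": sum(400 <= r["status"] < 600 for r in reqs) / n,
        }
    return features

# Toy log: two requests from one IP, one of which errors and misses the cache.
log = [
    {"client_ip": "10.0.0.1", "node": "n1", "cache_hit": True, "status": 200},
    {"client_ip": "10.0.0.1", "node": "n2", "cache_hit": False, "status": 503},
]
feats = aggregate_by_ip(log)
```

The same grouping pattern applies to the content, node, and offering perspectives with their respective keys and feature lists.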

5.4. Attack patterns

To launch service targeting attacks, attackers typically aim either to disrupt services or to alter functionalities. Attacks aiming to disrupt services are known as DoS attacks, while cache pollution is a common type of alteration-of-service attack that aims to pollute the nodes’ cache space with unpopular or illegitimate contents in content-based networks. Based on the potential characteristics of alteration-of-service and DoS attacks, the extracted features that can directly reflect the attack patterns are selected for each type of attack, as shown in Tables 6 and 7.

L. Yang et al.

Table 7. Potential patterns of DoS attacks.

Attack Type: Denial of service

Perspective | Feature                            | Abnormal patterns (intensity)
Node        | Cache hit rate                     | Low
Node        | Cache hit rate of legitimate IPs   | Low
Node        | Data transfer rate (MB/s)          | Low
Node        | Request error rate                 | High
Client IP   | Number of requests                 | Large
Client IP   | Average request interval           | Short
Client IP   | Number of nodes                    | Small
Client IP   | Cache hit rate                     | Low
Client IP   | Request error rate                 | High
Offering    | Cache hit rate                     | Low

DoS attacks are launched mainly to exhaust the network resources of serving nodes by sending a sudden burst of requests. Thus, the cache hit rate of the compromised nodes and of their serviced legitimate IPs will be reduced. The data transfer rate will also decrease due to the reduced resources. Many illegitimate requests may also be sent, resulting in an increased request error rate. DoS attacks may be launched by certain IPs that send a sudden burst of requests to certain target nodes. Similarly, the functionality of compromised nodes will be altered after an alteration-of-service attack. The service provided by the affected nodes will also be degraded, causing a reduced cache hit rate and data transfer rate. Moreover, taking CPAs as an example of alteration-of-service attacks, the compromised nodes’ cache space will be occupied by unpopular contents, since the malicious IPs of potential attackers will send many requests for low-popularity contents to certain target nodes.
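The qualitative client-IP patterns in Tables 6 and 7 can be encoded as simple rules. The sketch below is ours; every threshold value is an illustrative assumption, not a number from the chapter.

```python
def match_attack_pattern(ip_feats,
                         burst_requests=10_000,   # assumed thresholds
                         short_interval_s=0.1,
                         few_nodes=3,
                         low_hit_rate=0.2,
                         high_error_rate=0.5,
                         low_popularity=0.05):
    """Label a per-IP feature vector as a potential DoS or CPA source,
    following the qualitative patterns of Tables 6 and 7: a burst of
    closely spaced requests aimed at few nodes, combined with either a
    low hit rate and high error rate (DoS) or low popularity (CPA)."""
    bursty = (ip_feats["num_requests"] >= burst_requests
              and ip_feats["avg_interval"] <= short_interval_s
              and ip_feats["num_nodes"] <= few_nodes)
    if (bursty and ip_feats["cache_hit_rate"] <= low_hit_rate
            and ip_feats["error_rate"] >= high_error_rate):
        return "DoS"
    if bursty and ip_feats["avg_popularity"] <= low_popularity:
        return "CPA"
    return "benign"

# A feature vector matching the DoS row of Table 7.
dos_ip = {"num_requests": 50_000, "avg_interval": 0.01, "num_nodes": 1,
          "cache_hit_rate": 0.05, "error_rate": 0.8, "avg_popularity": 0.3}
label = match_attack_pattern(dos_ip)
```

In the proposed solution these patterns guide the labeling of clusters rather than being applied as hard per-sample rules.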

6. Unsupervised Anomaly Detection

The proposed unsupervised anomaly detection model is developed based on multiple perspectives, as shown in Figure 4. Unsupervised ML algorithms are first constructed to preliminarily identify numerical anomalies from the identification perspectives, including the service nodes, IP space, and content perspectives. This process corresponds to the third phase of standard digital forensic analytics: identification. At the next stage, three validation perspectives are considered to validate and correct the anomaly detection results: cross-perspective, time-series, and offering analysis. This procedure will be discussed in Section 7.

Fig. 4. The process of multi-perspective identification and validation of attacks.

6.1. Malicious IPs and content fingerprinting

In evolving networks, a node often provides various services or contents for many clients. To separate abnormal client IPs and contents from the numerous legitimate IPs and contents, clustering algorithms are promising solutions. Clustering algorithms are a set of unsupervised machine learning models that are used to group unlabeled data samples into multiple clusters [20]. A general process of abnormal IP and content detection consists of the following two steps:

(1) Use a clustering algorithm to group the client IPs/contents into a sufficient number of clusters.
(2) Analyze the behaviors of each cluster, and label the IPs/contents in this cluster as “normal” or “abnormal” based on the summarized patterns/characteristics of different types of attacks.


GMM is selected as the clustering algorithm for abnormal content and client IP detection in the proposed solution. GMM uses multiple Gaussian distributions as components to model data points. In GMM, each Gaussian component can be described by a multivariate Gaussian distribution [39]:

\[
G(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{D/2}\,|\Sigma|^{1/2}}\, e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{T}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})}, \tag{2}
\]

where x is the data sample, D is the number of features, and μ and Σ are the mean and covariance of the Gaussian distribution. A GMM model with K Gaussian distribution components uses a probability density function to estimate the data, as follows:

\[
p(\mathbf{x} \mid \theta) = \sum_{i=1}^{K} \pi_i\, G(\mathbf{x} \mid \boldsymbol{\mu}_i, \Sigma_i), \tag{3}
\]

where πi is the weight of each Gaussian distribution. The training and run-time complexities of GMM are O(NKD^2) and O(NKD), respectively, for N data samples, K Gaussian distributions, and D dimensions [40]. The main reasons for choosing GMM for abnormal content and IP detection are as follows:

(1) GMM uses multiple Gaussian distributions to construct a model, which is able to model datasets with complex data shapes or distributions. Thus, it has better adaptability to model real-world network data than many other clustering algorithms, like k-means, which is only effective for globular-shaped data.
(2) Unlike many clustering algorithms (e.g., k-means and hierarchical clustering) that can only return a predicted label for each data sample, GMM can also generate a confidence value that helps locate uncertain data samples to reduce errors.
(3) Although GMM has a training time complexity of O(NKD^2) that may be high for high-dimensional datasets, it has a linear run-time complexity of O(NKD), yielding high run-time efficiency. Additionally, the dimensionality of the network datasets can be kept low due to the proposed effective feature engineering.
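Equations (2) and (3) can be written directly in NumPy. The sketch below only evaluates the mixture density; it is not the fitting procedure, which in practice is done with expectation-maximization (e.g., scikit-learn's `GaussianMixture`).

```python
import numpy as np

def gaussian_pdf(x, mu, cov):
    """Multivariate Gaussian density G(x | mu, cov), Eq. (2)."""
    d = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(cov))
    expo = -0.5 * diff @ np.linalg.inv(cov) @ diff
    return np.exp(expo) / norm

def gmm_pdf(x, weights, mus, covs):
    """Mixture density p(x | theta) = sum_i pi_i * G(x | mu_i, cov_i), Eq. (3)."""
    return sum(w * gaussian_pdf(x, mu, cov)
               for w, mu, cov in zip(weights, mus, covs))

# Two equally weighted 2-D components with identity covariance.
weights = [0.5, 0.5]
mus = [np.zeros(2), np.array([5.0, 5.0])]
covs = [np.eye(2), np.eye(2)]
p_near = gmm_pdf(np.zeros(2), weights, mus, covs)            # at a component mean
p_far = gmm_pdf(np.array([20.0, 20.0]), weights, mus, covs)  # an outlying point
```

Points far from every component receive a very low density, which is exactly why low-density samples are good candidates for the "abnormal" label.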


To construct a more effective model that better fits the datasets, the number of clusters/Gaussian components, the major hyper-parameter of GMM, is optimized by Bayesian optimization (BO) [18]. BO is a hyper-parameter optimization method that uses the previous evaluation results to efficiently determine the next hyper-parameter configuration to evaluate. As the two major components of BO, a surrogate model is constructed to fit all the tested hyper-parameter values to the objective function, and an acquisition function is used to locate future hyper-parameter values by exploring both the currently promising regions and new regions of the search space. BO is used to construct the optimized GMMs in the proposed solution since it often exhibits excellent performance in optimizing a small number of continuous or discrete hyper-parameters, to which the number of clusters belongs [18]. Two optimized GMMs are trained for the client IP and content perspectives and used to identify potential malicious client IPs and abnormal contents. For the content-based dataset extracted after feature engineering, 169 contents are preliminarily detected as potential abnormal contents utilized by attackers to launch CPAs, because these contents have very low popularity yet received a large number of requests on several target nodes. For the client IP-based dataset, 310 IPs are preliminarily detected as potential DoS IPs because they have sent a sudden burst of requests to certain target nodes, while 54 IPs are identified as potential CPA IPs because they have sent a large number of requests for unpopular contents while targeting certain nodes. These are the potential IP addresses of cyberattackers.
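The chapter tunes the component count with BO [18]. As a simpler stand-in that illustrates the same model-selection idea, candidate K values can be scanned with the Bayesian information criterion (BIC) using scikit-learn (assumed available); the synthetic data below is ours, not the chapter's.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic stand-in for per-IP feature vectors: two well-separated blobs.
X = np.vstack([rng.normal(0, 1, size=(200, 2)),
               rng.normal(8, 1, size=(200, 2))])

# Fit a GMM for each candidate K and keep the one with the lowest BIC.
scores = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
          for k in range(1, 6)}
best_k = min(scores, key=scores.get)
```

BO replaces this exhaustive scan with a surrogate-guided search, which matters when each fit is expensive or the search space is larger.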

6.2. Compromised service nodes identification

Compared with the number of client IPs and contents in a general evolving network, the number of nodes is often significantly smaller, since each node can provide a large number of IPs with numerous contents. Utilizing a clustering algorithm to detect compromised nodes is often ineffective because the number of nodes may be too small to form clusters with a sufficient number of similar data samples.


Thus, outlier detection algorithms, which do not require a large number of data samples, are better choices for abnormal node detection. Outlier detection algorithms aim to fingerprint normal patterns and distinguish outliers from the analyzed normal patterns. A general abnormal node detection process has two main steps:

(1) Use an outlier detection algorithm to separate numerically abnormal data samples from normal samples.
(2) Analyze the behaviors of each numerically abnormal node, and label the nodes that match the summarized patterns/characteristics of different types of attacks.

Isolation forest (iForest) [41], an outlier detection algorithm that uses tree structures to identify isolated data points, is selected for abnormal node detection. IForest is constructed with multiple binary search trees as isolation trees (iTrees) by splitting the instances according to feature values. The number of splits required to isolate an abnormal sample (i.e., the tree depth) is often smaller than for normal samples, since outliers lie in relatively sparse regions while normal instances lie in relatively dense regions [42]. The main reasons for selecting the iForest model for abnormal node detection are as follows [41, 42]:

(1) Unlike most clustering algorithms, iForest does not require a large data size to build an effective model, since iForest uses the low tree depth of data samples to indicate the outliers. Thus, iForest is suitable for node-based datasets with a small number of unique samples.
(2) Based on the assumption that in network data most data samples are normal, while only a small percentage belong to anomalies, the iForest model has the adaptability to fit most real-world network data since it can effectively detect outliers in sparse areas of the data.
(3) IForest has a low time complexity of O(N) [43].
(4) IForest has good interpretability because it uses tree structures to model data.


Similar to GMM, iForest has an important hyper-parameter, the contamination level, which indicates the percentage of abnormal data samples in the original data. Since it is a continuous hyper-parameter, it is also optimized by BO, which is effective in optimizing both discrete and continuous variables. Thus, an optimized iForest model is trained for effective abnormal node detection. By applying the optimized iForest model to the node-based dataset, 11 potential compromised nodes have been preliminarily detected. These potentially affected nodes have a very low cache hit rate, especially for the legitimate IPs that have sent requests to them. Their data transfer rate and request success rate are also much lower than those of other, normal nodes.
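A minimal sketch of iForest-based node screening with scikit-learn (assumed available); the synthetic node features and the fixed contamination value are illustrative choices, not the BO-tuned configuration from the chapter.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
# 49 "healthy" nodes described by [cache hit rate, transfer rate (MB/s)],
# plus one degraded node mimicking a compromised server.
nodes = rng.normal([0.9, 50.0], [0.03, 2.0], size=(49, 2))
nodes = np.vstack([nodes, [[0.1, 5.0]]])  # the anomalous node

# contamination sets the expected fraction of outliers in the data.
iso = IsolationForest(contamination=0.02, random_state=0).fit(nodes)
labels = iso.predict(nodes)  # -1 marks outliers, 1 marks inliers
```

Because iForest scores each sample by how quickly random splits isolate it, the single degraded node is flagged even though the dataset is far too small for clustering.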

7. Anomaly Detection Result Correction

Although machine learning algorithms, including GMM and iForest, are able to identify numerically abnormal network entities (e.g., contents, nodes, and IPs), many prediction errors might occur because many numerical outliers are not true attacks. Certain benign network events, like crowd events and misconfigurations, exhibit behaviors similar to service targeting attacks and can be misclassified as attacks. Hence, a comprehensive validation process should be conducted to reduce the detection errors and distinguish the real service targeting attacks. In the proposed solution, the detected anomalies are corrected from three validation perspectives: cross-perspective, time-series, and offering analysis. Through this multi-perspective framework, the real anomalies affected by service targeting attacks can be effectively identified. In the standard digital forensic analytics process, this validation step corresponds to the fourth phase: analysis.

7.1. Cross-perspective analysis

After obtaining abnormal detection results from the three single perspectives (i.e., the content, node, and client IP perspectives), cross-perspective analysis, as shown in Figure 5, is conducted to validate whether the anomalies from different perspectives have correlations. To conduct the cross-perspective analysis, the numerical anomalies detected from each perspective are used to validate the detection results from the other perspectives. For instance, the detected affected nodes can be used to validate the potential attacker IPs that have attacked these nodes and the abnormal contents which have been used to pollute these nodes. Additionally, the abnormal contents and IPs detected from their perspectives can also be used to validate the compromised nodes that are targeted by the cyberattackers who exploited these abnormal IPs and contents to launch attacks. After cross-perspective analysis, the missed true abnormal entities that are affected by service targeting attacks and the false alarms that do not harm any nodes can be identified for more accurate anomaly detection.

Fig. 5. The flow chart of cross-perspective analysis.
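The correlation step can be sketched as simple set logic; the structure of the `ip_targets` map is an assumption for illustration, and a full implementation would also correlate the content perspective.

```python
def cross_validate(anomalous_ips, anomalous_nodes, ip_targets):
    """Keep the anomalous IPs whose targets include a flagged node, and keep
    the flagged nodes corroborated by at least one such IP.
    `ip_targets` maps each IP to the set of nodes it sent requests to."""
    confirmed_ips = {ip for ip in anomalous_ips
                     if ip_targets.get(ip, set()) & anomalous_nodes}
    confirmed_nodes = set()
    for ip in confirmed_ips:
        confirmed_nodes |= ip_targets[ip] & anomalous_nodes
    return confirmed_ips, confirmed_nodes

# Toy example: one flagged IP attacked a flagged node, the other did not.
ips = {"10.0.0.1", "10.0.0.2"}
nodes = {"n1"}
targets = {"10.0.0.1": {"n1", "n2"}, "10.0.0.2": {"n3"}}
ok_ips, ok_nodes = cross_validate(ips, nodes, targets)
```

An anomalous IP with no corroborating compromised node (here, 10.0.0.2) is a candidate false alarm.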

7.2. Time-series analysis

As shown in Figure 6, time-series analysis examines the changes in particular features over time (e.g., hourly and daily changes) to identify abnormal events and cyberattacks for result validation. For example, changes in the number of requests over time can be used to identify potential DoS attacks and crowd events when a sudden burst of requests has been sent during certain periods; changes in the cache hit rate over time can be used to identify potential DoS attacks when a sudden burst of error requests occurred to exhaust resources; and changes in the request popularity over time can be used to validate potential CPAs when malicious IPs have sent a large number of requests for low-popularity contents, or compromised nodes have received a large number of requests for low-popularity contents. Time-series analysis can help locate the time periods of potential service targeting attacks and benign events like crowd events by analyzing the network entities affected in these periods. Taking the access log data used in the proposed solution as an example, the changes in the hourly number of requests and the hourly cache hit rate in the original datasets are shown in Figures 7 and 8. From Figure 7, it is noticeable that on Days 1, 3, and 4, there are three sudden bursts of requests. These can be potential crowd events or DoS attacks. Similarly, it can be seen in Figure 8 that the cache hit rate dropped significantly on Day 4, which could also indicate a potential DoS attack or CPA. Further analysis can be conducted to determine the details of the detected abnormal events.
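The hourly burst check can be sketched as a rolling z-score over the request-count series; the window size and threshold below are illustrative assumptions, not values from the chapter.

```python
import statistics

def detect_bursts(series, window=24, z_thresh=3.0):
    """Flag indices whose value exceeds the mean of the preceding `window`
    points by more than `z_thresh` standard deviations."""
    bursts = []
    for i in range(window, len(series)):
        hist = series[i - window:i]
        mu = statistics.mean(hist)
        sigma = statistics.pstdev(hist) or 1e-9  # guard against zero variance
        if (series[i] - mu) / sigma > z_thresh:
            bursts.append(i)
    return bursts

# Flat hourly request counts with one sudden spike
# (a potential DoS attack or crowd event).
hourly = [100] * 48
hourly[30] = 5000
spikes = detect_bursts(hourly)
```

The same routine applied to the cache hit rate (with the sign of the deviation flipped) would flag the sudden drop seen on Day 4.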

Fig. 6. The flow chart of time-series analysis.

Fig. 7. The changes in the number of requests in the original dataset used in this work.

Fig. 8. The changes in the cache hit rate in the original dataset used in this work.

7.3. Offering analysis

Offerings refer to the streaming sources that deliver certain services, like static and live streaming services. Their configurations determine the behaviors of their served IPs. Offering analysis is the process of validating anomaly detection results according to the configurations and behaviors of offerings to separate true cyberattacks from benign outliers. The general process of offering analysis is shown in Figure 9. Normally, an offering’s potential behaviors can be estimated from its configurations. If service targeting attacks occur, the affected offerings may exhibit abnormal behaviors, which can assist in validating the network anomalies serviced by these offerings. For instance, if an offering that is configured as a static content stream has received a large number of requests for non-existent live streaming content, it might be used by DoS attackers to overwhelm certain target nodes. On the other hand, if an offering is configured for benign tests of old progressive download videos by network administrators, it may receive numerous requests for unpopular content, which resembles the behavior of cache pollution attacks; thus, its affected network entities may be flagged by machine learning models, producing false alarms. Offering analysis requires offering configuration information and expert knowledge to compare offerings’ configurations with their actual behaviors. Through offering analysis, the false alarms arising from benign events can be reduced for more accurate anomaly detection.

Fig. 9. The flow chart of offering analysis.
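The offering check amounts to a config-versus-behavior comparison; the configuration fields, popularity threshold, and both example rules below are illustrative assumptions mirroring the two scenarios described above.

```python
def audit_offering(config, observed):
    """Compare an offering's configured service type with its observed request
    mix and return notes that can confirm anomalies or explain them away."""
    notes = []
    # Requests for a service type the offering was never configured to serve
    # (e.g., live-streaming requests to a static offering) support a DoS verdict.
    foreign = observed["service_types"] - {config["service_type"]}
    if foreign:
        notes.append("suspicious: requests for unconfigured service types "
                     + ", ".join(sorted(foreign)))
    # A test offering legitimately pulls unpopular content, so CPA-like
    # behavior from it should be downgraded to a false alarm.
    if config.get("test_offering") and observed["avg_popularity"] < 0.05:
        notes.append("benign: low popularity expected for a test offering")
    return notes

static_cfg = {"service_type": "static", "test_offering": False}
obs = {"service_types": {"static", "live"}, "avg_popularity": 0.4}
notes = audit_offering(static_cfg, obs)
```

In practice these rules come from expert knowledge of each offering's configuration rather than from a fixed rule base.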

7.4. Results summary

Finally, the real cyberattacks and their affected network entities have been identified through the proposed forensic analytics process. Specifically, the proposed solution has detected 14 compromised nodes, including 12 nodes attacked by DoS attacks and 2 nodes attacked by CPAs; 155 client IPs have been identified as malicious IPs of potential cyberattackers, including 122 DoS IPs and 33 CPA IPs; and 56 contents have been exploited by attackers to launch CPAs. Moreover, among the detected anomalies, 384 false positives and 65 false negatives have been identified and removed using the proposed comprehensive result correction method. All the anomaly detection results have been validated by multiple cybersecurity experts and the industrial partner’s security network engineers. The detected real anomalies, including the malicious IP addresses, compromised nodes, and abnormal contents, as well as their corresponding attack types, are then presented to trigger corresponding countermeasures, like blacklisting the attacker IPs and abnormal contents, and isolating or recovering the compromised nodes. This phase corresponds to the last step of the standard digital forensic analytics process: presentation.

8. Summary

Most modern networks are evolving networks that are dynamically changed and updated to provide continuous services and functionalities. Due to the increasing number of cyber-threats and crimes, it is crucial to enhance the security of modern networks through digital forensic analytics techniques. Cybersecurity and anomaly detection are important applications of digital forensic analytics, which aim to identify data patterns and user behaviors in order to recognize network anomalies and cyberattacks. A detailed and comprehensive view of network anomalies makes it possible to locate attackers, victim devices, and exploited services, so as to convict and stop cybercrimes. In this work, we propose a multi-perspective anomaly detection framework by integrating the five major phases of standard digital forensic analytics: collection, examination, identification, analysis, and presentation. Through the proposed multi-perspective feature engineering, unsupervised anomaly detection, and comprehensive result correction approaches, real-world access log data collected in evolving networks can be effectively processed to fingerprint the service targeting attacks and the affected nodes, malicious attacker IPs, and abnormal contents. The information on these detected anomalies can be used as important digital evidence to solve cybercrimes.

Acknowledgment

This chapter is partially supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) [NSERC Strategic Partnership Grant STPGP – 521537] and Ericsson Canada.

References

1. L. Kodali, S. Sengupta, L. House, and W. Woodall. The value of summary statistics for anomaly detection in temporally evolving networks: A performance evaluation study. Applied Stochastic Models in Business and Industry, 36, 980–1013 (2020).
2. Ericsson. Ericsson mobility report June 2020. Ericsson, p. 36 (2020).
3. J. Hou, Y. Li, J. Yu, and W. Shi. A survey on digital forensics in internet of things. IEEE Internet of Things Journal, 7, 1–15 (2020).
4. M. Thottan and C. Ji. Anomaly detection in IP networks. IEEE Transactions on Signal Processing, 51, 2191–2204 (2003).
5. L. Deng, Y. Gao, Y. Chen, and A. Kuzmanovic. Pollution attacks and defenses for Internet caching systems. Computer Networks, 52, 935–956 (2008).
6. N. Koroniotis, N. Moustafa, E. Sitnikova, and B. Turnbull. Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Generation Computer Systems, 100, 779–796 (2019).
7. D.-Y. Kao, Y.-T. Chao, F. Tsai, and C.-Y. Huang. Digital evidence analytics applied in cybercrime investigations. In 2018 IEEE Conference on Application, Information and Network Security (AINS), pp. 111–116 (2018).
8. K. Kent, S. Chevalier, T. Grance, and H. Dang. Guide to integrating forensic techniques into incident response. The National Institute of Standards and Technology (2006).
9. D. Quick and K.-K. Choo. Impacts of increasing volume of digital forensic data: A survey and future research challenges. Digital Investigation, 11, 273–294 (2014).
10. A. Moubayed, E. Aqeeli, and A. Shami. Detecting DNS typo-squatting using ensemble-based feature selection. IEEE Canadian Journal of Electrical and Computer Engineering, pp. 1–11 (2021).
11. M. Injadat, A. Moubayed, A. Nassif, and A. Shami. Multi-stage optimized machine learning framework for network intrusion detection. IEEE Transactions on Network and Service Management, 4537, 1–14 (2020).
12. L. Yang, A. Moubayed, I. Hamieh, and A. Shami. Tree-based intelligent intrusion detection system in internet of vehicles. In 2019 IEEE Global Communications Conference (GLOBECOM), pp. 1–6 (2019).
13. L. Yang and A. Shami. A transfer learning and optimized CNN based intrusion detection system for internet of vehicles. In 2022 IEEE International Conference on Communications (ICC), pp. 1–6 (2022).
14. B. Wu. A general framework for evolving network analysis. Master’s thesis, Texas A&M University, US (2018).
15. A. Moubayed, A. Shami, P. Heidari, A. Larabi, and R. Brunner. Method for service placement in multi-access/mobile edge computing (MEC) system. US Patent 11,102,630 B2 (2021).
16. E. Abdallah, H. Hassanein, and M. Zulkernine. A survey of security attacks in information-centric networking. IEEE Communications Surveys and Tutorials, 17, 1441–1454 (2015).
17. X. He, K. Zhao, and X. Chu. AutoML: A survey of the state-of-the-art. Knowledge-Based Systems, 212, 106622 (2021).


18. L. Yang and A. Shami. On hyperparameter optimization of machine learning algorithms: Theory and practice. Neurocomputing, 415, 295–316 (2020).
19. M. Conti, P. Gasti, and M. Teoli. A lightweight mechanism for detection of cache pollution attacks in Named Data Networking. Computer Networks, 57, 3178–3191 (2013).
20. L. Yang, A. Moubayed, and A. Shami. MTH-IDS: A multitiered hybrid intrusion detection system for internet of vehicles. IEEE Internet of Things Journal, 9, 616–632 (2022).
21. A. Moubayed, M. Injadat, and A. Shami. Optimized random forest model for botnet detection based on DNS queries. In 2020 32nd International Conference on Microelectronics (ICM), pp. 1–4 (2020).
22. L. Yang, D. Manias, and A. Shami. PWPAE: An ensemble framework for concept drift adaptation in IoT data streams. In 2021 IEEE Global Communications Conference (GLOBECOM), pp. 1–6 (2021).
23. L. Yang and A. Shami. A lightweight concept drift detection and adaptation framework for IoT data streams. IEEE Internet of Things Magazine, 4, 96–101 (2021).
24. M. Injadat, A. Moubayed, and A. Shami. Detecting botnet attacks in IoT environments: An optimized machine learning approach. In 2020 32nd International Conference on Microelectronics (ICM), pp. 1–4 (2020).
25. F. Amato, A. Castiglione, G. Cozzolino, and F. Narducci. A semantic-based methodology for digital forensics analysis. Journal of Parallel and Distributed Computing, 138, 172–177 (2020).
26. N. Koroniotis, N. Moustafa, and E. Sitnikova. A new network forensic framework based on deep learning for Internet of Things networks: A particle deep framework. Future Generation Computer Systems, 110, 91–106 (2020).
27. M. Khan. Multi-agent based forensic analysis framework for infrastructures involving storage networks. Proceedings of the National Academy of Sciences, India Section A: Physical Sciences, 89, 291–309 (2019).
28. J.-P. Vasseur, G. Mermoud, and V. Kolar. Training a classifier used to detect network anomalies with supervised learning. US Patent 2019/0138938 A1 (2019).
29. N. Baradaran, A. Reddy, and R. S. Thakur. Feature engineering for web-based anomaly detection. US Patent 10476893 B2 (2019).
30. L. Yao, Y. Zeng, X. Wang, A. Chen, and G. Wu. Detection and defense of cache pollution based on popularity prediction in named data networking. IEEE Transactions on Dependable and Secure Computing, 18(6), 2848–2860 (2021).
31. P. Kumar, A. Moubayed, A. Refaey, A. Shami, and J. Koilpillai. Performance analysis of SDP for secure internal enterprises. In 2019 IEEE Wireless Communications and Networking Conference (WCNC) (2019).
32. A. Moubayed, A. Refaey, and A. Shami. Software-defined perimeter (SDP): State of the art secure solution for modern networks. IEEE Network, 33, 226–233 (2019).
33. A. Karami and M. Guerrero-Zapata. A fuzzy anomaly detection system based on hybrid PSO-Kmeans algorithm in content-centric networks. Neurocomputing, 149, 1253–1269 (2015).
34. B. Doctor, S. Bingham, K. Berg, I. Reynolds, and J. Mohr. Apparatus, system and method for identifying and mitigating malicious network threats. US Patent 2019/0104136 A1 (2019).
35. A. Ayyagari, T. Aldrich, D. Corman, G. Gutt, and D. A. Whelan. Context aware network security monitoring for threat detection. US Patent 9215244 B2 (2015).
36. D. Qing, W. Yang, W. Wang, S. Xuan, J. Lv, and Y. Mu. A kind of information centre’s network-caching contamination detection method based on GBDT. China Patent 110049039A (2019).
37. S. Pandey, P. Kumar, J. Singh, and M. Singh. Intrusion detection system using anomaly technique in wireless sensor network. In 2016 International Conference on Computing, Communication and Automation (ICCCA), pp. 611–615 (2016).
38. M. Pour, D. Watson, and E. Bou-Harb. Sanitizing the IoT cyber security posture: An operational CTI feed backed up by internet measurements. In 2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pp. 497–506 (2021).
39. M. Bitaab and S. Hashemi. Hybrid intrusion detection: Combining decision tree and Gaussian mixture model. In 2017 14th International ISC (Iranian Society of Cryptology) Conference on Information Security and Cryptology (ISCISC), pp. 8–12 (2017).
40. R. Pinto and P. Engel. Scalable and incremental learning of Gaussian mixture models. arXiv (2017).
41. L. Sun, S. Versteeg, S. Boztas, and A. Rao. Detecting anomalous user behavior using an extended isolation forest algorithm: An enterprise case study. arXiv (2016).
42. G. A. Susto, A. Beghi, and S. McLoone. Anomaly detection through online isolation forest: An application to plasma etching. In 2017 28th Annual SEMI Advanced Semiconductor Manufacturing Conference (ASMC), pp. 89–94 (2017).
43. F. Liu, K. Ting, and Z. Zhou. Isolation forest. In 2008 Eighth IEEE International Conference on Data Mining (ICDM), pp. 413–422 (2008).


© 2023 World Scientific Publishing Company. https://doi.org/10.1142/9789811273209_0005

Chapter 5

Offloading Network Forensic Analytics to Programmable Data Plane Switches

Kurt Friday∗,§, Elias Bou-Harb∗,¶, Jorge Crichigno†,‖, Mark Scanlon‡,∗∗, and Nicole Beebe∗,††

∗The Cyber Center for Security and Analytics, University of Texas at San Antonio, USA
†Integrated Information Technology, University of South Carolina, USA
‡Computer Science and Informatics, University College Dublin, Ireland

§[email protected]
¶[email protected]
‖[email protected]
∗∗[email protected]
††[email protected]

The extent to which cybercrimes are taking place has reached a frequency that has never been observed before. Moreover, increasing network traffic rates have made storing, and subsequently analyzing, the resultant stockpile of traffic data in order to attribute such crimes increasingly challenging and time-consuming. As a result, inadequate artifact extraction latency and poor incident response often become issues for the forensics community. To address these dilemmas, we propose a novel means of transforming network forensics into a procedure that is conducted within nanoseconds of traffic traversing the network by harnessing the newfound programmable switch technology. In particular, we implement two switch-based use cases for conducting the relevant network forensics associated with the prevailing cybercrime themes of Distributed Denial of Service (DDoS) activities and the misuse of IoT devices. The first use case endeavors to detect multiple types of DDoS attacks and subsequently perform incident response. The empirical results confirm that it achieves both aims for UDP amplification at wire speed and for SYN flooding attacks within a fraction of a second. Moreover, it reduces the time to fully remediate slow DDoS from nearly ten seconds down to two seconds. The second use case instruments the switch with a rule-based Projective Adaptive Resonance Theory (PART) classifier to accurately fingerprint the origin IoT device of network traffic from a single TCP packet that traverses the switch’s pipeline. We also provide a methodology for automating the translation of such rule-based Machine Learning (ML) output to P4 programs, thereby enabling its deployment without the need for additional background expertise. The proposed fingerprinting engine is evaluated against a dataset consisting of both IoT and non-IoT devices and achieves 99% accuracy.

1. Introduction

A network forensic practitioner’s essential tasks of monitoring, inspecting, and attributing network traffic to cybercrimes have become increasingly challenging due to the rate at which these misdemeanors are currently taking place. Further, these challenges are often compounded by more sophisticated attacks (e.g., anti-forensic strategies) launched by adversaries. Several factors have contributed to this increase in cybercrime and attack sophistication, such as society’s growing dependence upon the Internet [1], the enhanced interconnectivity amid modern technology [2–4], an assortment of open source exploitation code and tools, widely available attack services (e.g., DDoS-for-hire), and even the COVID-19 pandemic [5]. Moreover, the immense rates at which information is transferred between contemporary machines have led to an exponential increase in the data that must be analyzed. As a result, the amount of time and resources necessary to conduct effective investigations has also risen substantially [6].

Unfortunately, it is also commonplace for attackers to leverage the insecurity of the vast IoT domain as a means of conducting cybercrime. The excessive heterogeneity of these devices combined with their expedited deployment by vendors (i.e., leading to subpar security mechanisms and patching) has left adversaries with a plethora of vulnerabilities to exploit [7]. Consequently, such vulnerabilities have been used to conduct a broad range of malicious endeavors, ranging from gaining entry to critical infrastructure (e.g., power grids [8]) to adversaries overtaking upwards of a million devices at one juncture to form botnets in order to launch various campaigns (e.g., spam, cryptomining, etc.) [9, 10]. Indeed, botnets have been the primary culprit behind the substantial uptick observed in the frequency and scale of DDoS attacks, including some of the largest recorded to date (e.g., the record-breaking attacks on OVH in 2016 [11, 12], GitHub in 2018 [13], and Amazon in 2020 [14], to name a few). Moreover, this trend in DDoS activity is expected to continue to climb [15]. Ultimately, this extensive attack surface has left the forensic community with the challenge of investigating and attributing such crimes with largely offline analysis procedures. To put matters into perspective, a 10 Gbps flow of traffic using only a two-hour sliding window necessitates 10 TB of storage, and 20 Gbps utilizing a 12-hour sliding window requires 1 PB. Furthermore, these numbers pale in comparison to the large traffic rates today’s networks often encounter. For example, it has been projected that backbone networks may experience up to 170 Tbps in 2021 [16]. Moreover, while these time-consuming, offline investigation procedures can eventually lead to attribution of a cybercrime, the resultant delays in identifying attacks naturally create challenges for mitigating them while they are in progress. In addition, these delays give adversaries more time to launch ensuing attacks, evade prosecution after an attack transpires (e.g., via anti-forensic attempts, fleeing, etc.), and so forth. In turn, investigators have the monumental task of monitoring and safeguarding the capture of the offending traffic amid the overwhelming rates of traffic modern networks observe.
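The sliding-window storage figures follow from straightforward arithmetic; the helper below (our own naming) reproduces the first example, where 10 Gbps over two hours works out to roughly 9 TB, consistent with the chapter's approximate 10 TB figure.

```python
def capture_storage_tb(rate_gbps, window_hours):
    """Bytes needed to retain a traffic flow of `rate_gbps` over a sliding
    window of `window_hours`, expressed in terabytes (10^12 bytes)."""
    bits = rate_gbps * 1e9 * window_hours * 3600  # total bits in the window
    return bits / 8 / 1e12                        # bits -> bytes -> TB

two_hour_10g = capture_storage_tb(10, 2)  # the 10 Gbps, two-hour example
```
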
Once this objective has been completed, investigators must subsequently analyze the resultant stockpile of capture data for viable artifacts in a very small time window to ensure a successful investigation, circumvent anti-forensic attempts, and mitigate damages. Indeed, streamlining aspects of this arduous process would dramatically enhance its effectiveness. Generally, data plane devices strive for line-rate processing (i.e., at wire-speed), which means a given device can forward traffic from


K. Friday et al.

all its ports concurrently at full port capacity. Until very recently, performing the aforesaid forensic tasks at line rate, as the malicious traffic is traversing the wire, was largely an impossibility given the excessive limitations of traditional network implementations. This pitfall is rooted in the fact that the devices handling such traffic are either software-based or static in nature. In particular, the software-based solutions generally consist of middleboxes, which cannot conduct complex traffic analysis without substantial degradation to the network’s throughput. Alternatively, the devices that offer better processing capabilities (e.g., switches and routers) have customarily had their behavior encoded in firmware by vendors and offer extremely limited support for network forensic endeavors. Elaborating upon this notion, network functionality can generally be abstracted as residing within either the control plane or the data plane [17]. The data plane is responsible for delivering traffic from one device to another, whereas the control plane is essentially the brains of the network and is concerned with establishing links between routers and exchanging protocol information. In the case of the aforementioned traditional networks, both of these planes are integrated into the firmware of routers and switches, and therefore these implementations have relatively fixed behavior. To offer more flexibility, Software Defined Networking (SDN) was proposed to explicitly decouple the two planes. The SDN control plane is realized via a controller that has a centralized view of the network and is implemented in software to offer programmatic access to the data plane and flexibility to administrators. That being said, SDN is still bound to a small set of forwarding protocols (e.g., IP, Ethernet) entertained by the data plane, which severely restricts the number of applications that can utilize the enhanced processing capabilities of the data plane’s forwarding devices.
Moreover, it takes around four years to add a new protocol to these devices, given that they are characteristically proprietary with closed source code [18]. As a result, SDN has struggled to keep pace with the excessively dynamic nature of cybercrime [19]. The P4 language [20] has since surfaced as the de facto standard for defining the forwarding behavior of the data plane. In turn, the
programming logic that dictates the behavior of how packets are processed can now be developed, tested, deployed, and amended in a much shorter time span. Moreover, such behavior can be governed by the given network’s operators, resulting in fully customizable implementations for network forensic practitioners. In harnessing this newfound technology, the research conducted herein utilizes programmable data planes to transform the manner in which network forensics has traditionally been conducted. In particular, programmable switches can identify and extract forensic artifacts at line rate in order to bypass storing a wealth of capture data to subsequently analyze offline. Custom switch-based programs can also use these extracted artifacts in order to fingerprint malicious events in real time amid Tbps traffic rates. This is in stark contrast to the software-based intermediary nodes (i.e., middleboxes) employing Intrusion Detection Systems (IDS), firewalls, etc., which crumble under the tremendous load of modern networks. To this end, the notion of leveraging programmable data planes for network forensics applications is introduced by proposing two such approaches corresponding to the ever-increasing presence of staggering DDoS attacks and the harnessing of vulnerable IoT devices for malicious endeavors, respectively. In terms of DDoS, note that it can present itself in many forms and one detection strategy might only be able to detect a specific type it was designed for [21]. For example, while an entropy-related approach might be effective when a network is experiencing a flooding attack, it likely will struggle to identify the presence of a stealthier attack. To address this issue, this work combines multiple novel DDoS fingerprinting techniques into one unified detection strategy within a P4-programmed switch. 
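The "unified detection" idea above can be illustrated with a toy single-pass detector (a hypothetical Python sketch, not the chapter's actual P4 code) that runs several lightweight per-packet checks in one pipeline, the way a single match-action pass on a switch would:

```python
from collections import defaultdict

class UnifiedDDoSDetector:
    """Toy single-pass detector combining two fingerprinting techniques:
    a per-source SYN-flood counter and an interarrival-time check for
    stealthy, slow DDoS traffic (thresholds are illustrative)."""
    def __init__(self, syn_limit=100, slow_gap=5.0):
        self.syn_counts = defaultdict(int)  # per-source SYN counter
        self.last_seen = {}                 # per-source last packet timestamp
        self.syn_limit = syn_limit
        self.slow_gap = slow_gap            # seconds between packets on a held-open connection

    def process(self, src, flags, now):
        alerts = []
        if "SYN" in flags and "ACK" not in flags:
            self.syn_counts[src] += 1
            if self.syn_counts[src] > self.syn_limit:
                alerts.append("syn_flood")
        gap = now - self.last_seen.get(src, now)
        if gap > self.slow_gap:             # trickling traffic keeping a connection alive
            alerts.append("slow_ddos")
        self.last_seen[src] = now
        return alerts

detector = UnifiedDDoSDetector(syn_limit=3, slow_gap=5.0)
for t in range(4):
    alerts = detector.process("10.0.0.1", {"SYN"}, float(t))
print(alerts)  # → ['syn_flood']
```

On a real programmable switch the counters and timestamps would live in registers and the checks would run in match-action stages, but the unification principle — one pass, multiple fingerprints — is the same.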
Another problematic aspect of several DDoS detection strategies is their use of static, arbitrary thresholds to mark the boundary between normal and anomalous traffic. Since such thresholds remain fixed, they must be calibrated to a particular network’s topology, its expected traffic, the time of day or week, etc., and they can also be more easily exploited by savvy adversaries. The proposed mechanism tackles this dilemma
by employing dynamic thresholds that automatically adapt to varying network conditions, thereby negating the need for frequent calibration. To evaluate this strategy, three attack scenarios, namely, SYN flooding, UDP amplification, and a stealthy variant, slow DDoS, were launched against the proposed approach deployed on a Behavioral Model version 2 (BMv2) [22] software switch. The results confirm that UDP amplification attacks could be constrained, at line rate, to a dynamically allocated bandwidth coinciding with the given UDP protocol being leveraged by the attacker (e.g., NTP, DNS, etc.). Additionally, SYN flooding was shown to be rendered ineffective, with benign requests experiencing no latency, and all other TCP-based DDoS types employing various mixtures of set flags (e.g., SYN-ACK flood, FIN flood, etc.) were negated entirely. Finally, the approach was able to restore service to clients endeavoring to connect to the server within one second of the slow DDoS attack consuming the server’s available connections, and to fully remediate the attack by the following second; this is a substantial improvement over past approaches that wait for the malicious connections with the target server to time out, which ultimately takes around 10 seconds or more. The second approach aims to offer practitioners a means of promptly fingerprinting IoT device traffic on the network. With the increasing instances of cybercrime committed by way of these devices [23–28], such artifacts can be invaluable to conducting effective investigations. To achieve this objective on the switch, a rule-based PART learning algorithm was first trained on the noteworthy dataset proposed by Sivanathan et al. [29], encompassing a thorough mixture of both IoT and non-IoT devices. From the generated rules, a unique methodology for translating the ML algorithm’s output to a compact and practical P4 program was proposed.
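The rules-to-P4 translation can be pictured with a simplified sketch (hypothetical rule format and labels; the chapter's actual PART-to-P4 pipeline is more involved). Each learned rule tests a few header fields, and a switch table matches fields in parallel, so each rule maps naturally to one table entry with "don't care" wildcards for the fields the rule does not mention:

```python
# Each PART-style rule is (conditions, label); conditions map a header
# field to a required value. Field names, ports, and device labels here
# are illustrative, not taken from the chapter's dataset.
FIELDS = ["ip_proto", "tcp_dport", "tcp_window"]

def rules_to_table(rules):
    """Compile rules into ordered (match-tuple, action) table entries,
    where None stands in for a ternary 'don't care' mask."""
    table = []
    for conditions, label in rules:
        match = tuple(conditions.get(f) for f in FIELDS)  # None = wildcard
        table.append((match, f"set_class_{label}"))
    return table

rules = [
    ({"ip_proto": 6, "tcp_dport": 80, "tcp_window": 8192}, "camera"),
    ({"ip_proto": 6, "tcp_dport": 1883}, "sensor"),  # e.g., MQTT traffic
]
for entry in rules_to_table(rules):
    print(entry)
```

Because the entries only match and tag packets, the control plane can install or replace them at runtime without recompiling the data plane program — the property the chapter exploits for on-the-fly updates.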
As a result, the classification algorithm is deployed entirely on the switch to enable line-rate fingerprinting of IoT devices. The corresponding evaluation of this classification program on BMv2 demonstrates that the exact device type from which given traffic originates can be identified with 99% accuracy from a single TCP packet.
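Returning to the first approach's adaptive thresholds: the idea of a baseline that tracks varying network conditions can be sketched with an exponentially weighted moving average (an illustrative model only — the chapter's actual scheme runs in switch registers and may use a different update rule):

```python
class AdaptiveThreshold:
    """Flag a per-interval metric (e.g., SYN count) that exceeds a moving
    baseline by k standard deviations. The baseline tracks normal traffic
    variation, so no manual recalibration is needed."""
    def __init__(self, alpha=0.1, k=3.0):
        self.alpha, self.k = alpha, k  # smoothing factor, sensitivity
        self.mean, self.var = 0.0, 0.0

    def update(self, x):
        anomalous = self.var > 0 and x > self.mean + self.k * self.var ** 0.5
        if not anomalous:  # only adapt the baseline on benign-looking traffic
            d = x - self.mean
            self.mean += self.alpha * d
            self.var = (1 - self.alpha) * (self.var + self.alpha * d * d)
        return anomalous

monitor = AdaptiveThreshold()
for interval_count in [100.0] * 200:  # steady baseline of ~100 SYNs/interval
    monitor.update(interval_count)
print(monitor.update(10000.0))  # a sudden flood is flagged → True
```

Freezing the baseline while traffic is judged anomalous is one common design choice; it prevents an attacker from slowly "training" the threshold upward during an attack.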


The core contributions made by this chapter toward advancing the state of the art are as follows:

• Advancing the efficiency of network forensic investigations by leveraging programmable data planes to promptly fingerprint both DDoS attacks and IoT devices on forwarding devices. The end result is a dramatic transformation of the traditional time-consuming offline procedure of filtering a large amount of traffic captures into one that can be conducted at line rate. Further, the filtering out of attack traffic effectively circumvents the need to store a wealth of such irrelevant data.

• Improving upon current DDoS protection mechanisms by providing a unified network forensic approach for identifying the broad spectrum of contemporary DDoS attacks within the switch. An adaptive threshold-based approach is used to trigger both artifact extraction and subsequent detection in order to mitigate attacks immediately following their inception. The evaluation of three attack scenarios prevalent in the wild confirms that the proposed strategy remediates UDP amplification and SYN flooding attacks in fractions of a second, and reduces the complete mitigation time of slow DDoS from upwards of 10 seconds down to two. Further, the approach negates all other TCP flooding attacks that fictitiously set flags.

• Presenting an IoT fingerprinting scheme that accurately identifies IoT devices from the traffic they transmit. When evaluated, the approach was able to identify IoT traffic with 99% accuracy. Moreover, the fingerprinting scheme’s evaluation demonstrates that it is effective for fingerprinting not only IoT devices from a single TCP packet, but devices of a non-IoT nature as well. In addition, the results suggest that this procedure can be applied to fingerprint the exact device type on the switch by merely incorporating more training samples per device.
• Providing a novel automated methodology for converting ML rule-based output to practical P4 applications on the switch. Further,
the proposed methodology has been specifically designed for compact, parallel processing and is thereby extremely practical given its small resource footprint; therefore, P4 programs utilizing it can be employed by network operators alongside a multitude of other network-specific P4 algorithms, and without the need for additional training.

2. Related Literature

In this section, we divide the works related to this chapter into two classes. The first encompasses the advancement of P4-enabled traffic analysis mechanisms since P4’s recent introduction. The second focuses on efforts representative of the state of the art in network forensics. Additionally, we provide a taxonomy which maps subcategories of the first class to those of the second.

2.1. P4-enabled analytics

In recent years, the benefits of programmable data planes have garnered the attention of the research community. Though the ability to program these forwarding devices is a relatively new technology that has yet to be leveraged for tasks specific to network forensics, a number of recent research efforts have sought to enhance network analysis procedures in the contexts of IoT-based measurements, disproportional network flows, and machine learning implementations, as subsequently detailed and depicted in the taxonomy in Figure 1. Machine learning advancements: With the advantages of ML techniques becoming apparent over past decades, current research efforts have been studying how to synergize them with programmable data planes. Given that training ML models is a time-consuming process that can last for weeks, traditional research avenues often endeavor to accelerate the computation process. With programmable switches, such accelerations can now be conducted throughout the

Fig. 1. Taxonomy of related literature. (Relevant research is divided into P4-enabled analytics — ML advancements, DDoS/HH detection, IoT traffic management — and traditional network forensics — ML integration, DDoS forensics, IoT fingerprinting.)

network for distributed learning. To this end, Sapio et al. [30] offered a rudimentary MapReduce application for performing data aggregation via P4 in an effort to reduce the communication overhead of exchanging model updates. In a similar context, the in-network aggregation system proposed by Yang et al. [31] was able to reduce the job completion time of a MapReduce-like framework by as much as 50%. Applying a different technique to in-network aggregation, Sapio et al. [32] used workers to perform gradient vector computations, after which point the workers send their individual update vector to the P4 switch and receive back the aggregated model update. As a result, the authors were able to speed up the model’s training by as much as 300% compared to existing distributed learning approaches. Providing an alternative for reducing processing overhead, Sanvito et al. [33] worked on analyzing options for partitioning
subsets of layers of Neural Networks (NN) to offload to programmable switches and Network Interface Cards (NICs) for processing. Another area of P4 research is harnessing programmable switches to perform classification tasks. This scope of study is currently still largely theoretical, though noteworthy advancements have been made. For example, Siracusano et al. [34] took a noteworthy first step toward implementing more complex NNs in P4 by presenting a simplified NN utilizing only the bitwise logic functions that programmable switches can entertain. Additionally, Xiong and Zilberman [35] proposed some possible avenues for programming various classification algorithms in P4, namely, decision trees, k-means clustering, Support Vector Machines (SVM), and naïve Bayes. The authors attempted to strike a balance between the limited resources the switch can devote to such tasks and classification accuracy. In contrast to the bitwise-logic means of model simplification leveraged by [34], the algorithms presented by Xiong and Zilberman [35] were more complex, and the authors stated that it is uncertain whether these algorithms will compile on an actual hardware switch target. The approach proposed herein falls in line with the goal of the two aforementioned works of switch-based classification; however, the proposed strategy for automating the integration of rule-based classifiers entirely on the switch can be updated on the fly as new intelligence arrives without any downtime, and sacrifices neither accuracy nor the switch’s resources. Disproportional network flows: The generalized approach to identifying disproportional flows within a network is broadly referred to as Heavy Hitter (HH) detection. Specifically, HHs are associated with a low number of flows within a given network that consume a large amount of its bandwidth.
Their swift detection has long been shown to promote effective network management practices [36–38], and has been utilized in accounting [37, 39] and traffic engineering [40, 41], as well as worm and probing detection [42, 43]. Following this aim, the works of Liu et al. [44], Sivaraman et al. [45], Xing et al. [46], and Kučera et al. [47] extended HH detection efforts to programmable switches, which allows the traditional approach
of employing software collectors residing outside the data plane to be bypassed, enhancing both detection speed and accuracy. To this end, the aforementioned data plane advances have all enriched the state of the art in HH identification. Ultimately, soft-computing-like approaches such as HH detection, which tolerate a level of uncertainty and partial truth [48] due to their generic nature, might not provide suitable evidence to implicate wrongdoing in court [49]. In turn, the approach presented in this work reduces the scope of HHs to strictly DDoS detection, with the primary motivation of prompt fingerprinting for evidence extraction in order to facilitate network forensic investigations. A number of other P4 research endeavors have also focused on DDoS detection. In particular, Zhang et al. [50] proposed a range of security policies for volumetric attack mitigation. In a different approach to addressing volumetric varieties, Lapolli et al. [51] utilized entropy to fingerprint such traffic anomalies. In addition, Mi et al. [52] presented a deep learning technique premised upon the Pushback method [53] for tackling volumetric DDoS. To detect a particular type of volumetric DDoS, Febro et al. [54] proposed a means of fingerprinting attacks that exploit SIP. Alternatively, Scholz et al. [55] proposed a SYN flooding defense strategy premised upon SYN authentication and SYN cookie techniques. While research efforts specifically tailored to DDoS fingerprinting are viable candidates for forensic procedures, if a defense mechanism is to be integrated into the network’s switches, it should address all relevant attacks.
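Many of the data-plane detection schemes above, HH detection in particular, rest on compact, fixed-memory counting structures that fit in switch register arrays. The count-min sketch is a representative building block; the Python model below is illustrative only (hardware versions use one register array per pipeline stage and hardware hash units rather than SHA-256):

```python
import hashlib

class CountMinSketch:
    """Approximate per-flow packet counts in fixed memory. Estimates never
    undercount, so any flow whose estimate exceeds a threshold set includes
    all true heavy hitters (with possible false positives from collisions)."""
    def __init__(self, rows=4, cols=1024):
        self.rows, self.cols = rows, cols
        self.counters = [[0] * cols for _ in range(rows)]

    def _index(self, row, key):
        # One independent hash per row (SHA-256 stands in for a hardware hash).
        digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
        return int.from_bytes(digest[:4], "big") % self.cols

    def add(self, key):
        for r in range(self.rows):
            self.counters[r][self._index(r, key)] += 1

    def estimate(self, key):
        return min(self.counters[r][self._index(r, key)] for r in range(self.rows))

cms = CountMinSketch()
for _ in range(5000):
    cms.add("10.0.0.9->443")  # one disproportionately heavy flow
cms.add("10.0.0.7->80")
print(cms.estimate("10.0.0.9->443") >= 5000)  # → True (never undercounts)
```

The fixed memory footprint is exactly why such structures suit the resource-constrained switch pipelines discussed next.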
This notion poses a problem for the aforementioned DDoS detection schemes as one of the caveats of the programmable switch technology is the limited resources of each switch; thus, implementing a number of different DDoS protection programs into the switch’s pipeline in conjunction with fundamental programs pertaining to packet forwarding, load balancing, etc., is likely not feasible [56]. There is also a need to address the prevalence of more advanced DDoS techniques such as slow DDoS, which can circumvent the detection methods proposed in the aforementioned research efforts [57]. To this end, the work herein proposes a DDoS detection,
artifact extraction, and mitigation scheme that unifies a number of techniques to function in harmony with one another in order to address an assortment of relevant DDoS attacks. Additionally, the proposed approach introduces a novel means of providing useful forensic intelligence amid attacks employing spoofing, by way of clustering configuration artifacts on the switch. This is in contrast to previous approaches that attempt to achieve this aim through source authentication techniques such as SYN cookies, which can litter the Internet with the corresponding validation traffic and result in detection latency. IoT traffic management: The IoT paradigm has unmistakably become pervasive and entrenched in contemporary society in recent years. With such overwhelming utilization of these devices, P4 research has focused on promoting their integration into state-of-the-art networks. One particular area of emphasis has been the significant percentage of network bandwidth that is lost while transmitting IoT packet headers. Given that these devices generally have limited processing capabilities, they typically transmit packets encapsulating small payloads (e.g., sensor readings), which leads to large quantities of packets largely comprising redundant headers that occupy throughput and need to be processed by the network. To this extent, Wang et al. [58, 59] and Lin et al. [60] proposed a promising solution of aggregating such packets on programmable switches. This is in contrast to conducting aggregation on server CPUs, which can increase end-to-end latency and result in the loss of real-time functionality. Another area of IoT research undertaken by the P4 community is service automation. Essentially, low-power, low-range IoT communication technologies characteristically utilize a Peer-to-Peer (P2P) model.
While P2P offers distinct advantages such as low end-to-end latency and reduced power consumption, it is also tightly coupled with the drawbacks of subpar scalability, short reachability, and inherently inflexible policy enforcement. To overcome these pitfalls, Uddin et al. [61] proposed a programmable switch that automates IoT services by encoding their transactions in the data plane and utilizing the controller for address assignment, device and service discovery,
subscription management, and policy enforcement. Additionally, the authors subsequently presented an extension [62] that supports multiple non-IP protocols. There is still a need to accurately fingerprint IoT devices, both for the purposes of the aforementioned approaches and for network forensic procedures, and this is thereby the motivation for the IoT device fingerprinting mechanism proposed herein.

2.2. Traditional network forensics

Network forensics has safeguarded our networks for many years. Moreover, the research community has kept practitioners equipped with state-of-the-art measures for conducting effective investigations in order to hold adversaries accountable for their crimes. Among the primary areas of study in this context are ML integration, DDoS forensics, and IoT analysis, which are elaborated upon next and shown in the taxonomy in Figure 1. ML integration: Capturing network activity lies at the root of network forensics; however, a large amount of the information captured or recorded will not be useful for investigations. Moreover, with the increasing rates of traffic modern-day networks exhibit, this equates to a large amount of wasted time, storage, and computational resources. In an effort to address this, Mukkamala and Sung [63] employed NNs and SVMs for offline intrusion analysis in order to fingerprint key features that reveal information deemed worthy of further intelligent analysis. With a similar goal, Sindhu and Meshram [64] applied the Apriori algorithm to perform association rule learning on the data their system collects in order to uncover patterns pertaining to malicious activities. Another area of concern for practitioners has been the increased proliferation of botnets, which has been causing serious security risks and financial damage. To aid the investigations of such misdemeanors, Koroniotis et al. [65] employed association rule mining, an NN, naïve Bayes, and a decision tree to detect botnets and track their activities, with the decision tree giving the best accuracy of 93.23%. In a subsequent work, Koroniotis et al. [66] facilitated the
training and validation of network forensic systems by offering a noteworthy botnet dataset. This dataset later enabled the work of Oreški and Andročec [67], which reduced the time needed for optimal feature selection by employing a genetic algorithm to optimize the parameters to be fed into an NN. In another botnet forensic undertaking, Bijalwan [68] explored the use of eight different ensembles of classifiers, showing the resultant improvement in accuracy over a single classifier. Overall, the aforesaid ML approaches brought forth advancements reducing the amount of time necessary to analyze large traffic captures for relevant artifacts. Conversely, the proposed approach conducts classifications as packets traverse the switch, which allows events to be flagged and customized actions to be taken, such as the storing of evidence in the midst of an attack, in real time. Further, the presented ML-based method is automated, and thereby circumvents the need for additional expertise. DDoS forensics: With DDoS attacks not only being a concern for decades but ever increasing in intensity and frequency of occurrence, a number of network forensic research endeavors have been devoted to targeting this mounting issue. Following suit with the previously articulated benefits of ML, it has also been leveraged for DDoS forensic tasks. One such effort is that conducted by Hoon et al. [69], which aimed to identify the best machine learning model for offline DDoS forensics, finding that naïve Bayes, gradient boosting, and distributed random forests were the most optimal. The approach taken by Kachavimath et al. [70] affirmed the effectiveness of naïve Bayes and additionally showed that k-nearest neighbors also outperforms conventional learning models. Similarly, Fadil et al. [71] utilized naïve Bayes to perform DDoS forensics on network traffic extracted from a core router via packet captures.
Conversely, the proposed approach herein performs such detection as the traffic is traversing the switch. Yudhana et al. [72] also implemented a naïve Bayes classifier; however, they additionally integrated an NN for conducting DDoS forensics. Taking a more traditional approach to DDoS forensics, Zulkifli et al. [73] exercised live forensic log file analysis to identify a Denial of Service (DoS) attack via Wireshark [74]. This live approach is
in contrast to typical forensic procedures, which are executed while the system is down [75]. Another challenge for DDoS investigations has been the rise in both attack and benign traffic that networks typically observe [69]. Moreover, such a steep rise has proportionally led to sharp growth in attack log file sizes. In an attempt to reduce the time required to perform the corresponding analysis and attribute the sources and victims of DDoS attacks, Khattak et al. [76] proposed using Hadoop’s MapReduce. Similarly, Khattak and Anwar [77] leveraged MapReduce to parallelize the entropy-based clustering and forensic analysis of attack traffic, safeguarding nodes in a cloud environment while decreasing log file analysis time. In building upon this aim, the proposed approach presents a technique for performing this traditionally offline procedure in a live fashion via programmable switches, which allows evidence to be obtained at line rate while the attack is simultaneously mitigated. Additionally, Aydeger et al. [78] also worked on mitigating DDoS attacks such as Crossfire by utilizing SDN in conjunction with Network Function Virtualization (NFV) to provide a Moving Target Defense (MTD) framework for ISP networks to conceal network topologies. The authors also permitted the storing of information pertaining to potential attackers for investigations. Alternatively, the methodology introduced herein pushes relevant attack evidence to a collector for subsequent analysis immediately upon detection, preventing benign traffic data from excessively consuming storage. P4-programmed hardware switches process packets in nanoseconds, and allow practitioners to easily add customized code for evidence extraction once such maliciousness has been fingerprinted. Other network-specific DDoS forensic works include machine-to-machine networks presented by Wang et al. [79], mobile ad hoc networks by Timcenko and Stojanovic [80], and networks encompassing cellular devices by Cusack et al. [81].
Further, with slow DDoS via mobile devices being a growing concern [82], Cusack et al. [81] endeavored to identify the presence of such attacks based upon the Euclidean distance similarity between the protocol (e.g., HTTP, HTTPS, etc.) counts of a past and present log file. Since this technique can lead to both false
positives and negatives given the randomness of traffic patterns, the proposed approach employs a novel interarrival time analysis scheme that facilitates investigations by fingerprinting, attributing, and mitigating slow DDoS attacks in real time. IoT fingerprinting: With the IoT paradigm being tied to a number of inherent vulnerabilities and responsible for a large number of botnet-facilitated DDoS attacks, investigating crimes conducted by way of these devices is now fundamental to network forensics. In turn, fingerprinting traffic originating from them has recently attracted significant attention from both the research community and industry in order to identify events of interest and extract relevant artifacts. With ML’s ability to recognize patterns in network traffic, it is generally leveraged for IoT fingerprinting tasks. Among these efforts, Meidan et al. [83] used supervised learning trained upon deep packet features in order to distinguish between IoT and non-IoT devices, and to associate each IoT device with a specific class. Yang et al. [84] utilized both deep packet and header features to train an NN in order to generate IoT fingerprints. Taking a different approach, Feng et al. [85] used IoT application-layer response data coupled with product descriptions from relevant websites to generate an Acquisitional Rule-based Engine (ARE) to classify devices. Conversely, Perdisci et al. [86] use only DNS fingerprints for IoT device classification. Lastly, Pinheiro et al. [87] utilize five different classifiers trained with packet length statistics to identify IoT devices, among which the Random Forest algorithm achieved the highest accuracy of 96%. Alternatively, some research efforts paired IoT device identification with SDN functionality. Sivanathan et al. [88] leveraged SDN-based, flow-level telemetry combined with machine learning for IoT classification. In another approach based upon SDN, Thangavelu et al.
[89] assigned classifier maintenance to the controller and the actual task of classifying IoT devices to the gateways. The gateway devices utilize software for classification (i.e., unscalable to high traffic rates [88]), in contrast to the hardware-based classification approach proposed herein, which ensures line-rate processing amid heavy traffic loads. Further, the proposed approach classifies devices
from the headers of a single TCP packet, which necessitates only nanoseconds in hardware, versus the session- or flow-level analysis utilized in [88] and [89], respectively, which yields a classification time upper-bounded by the time to analyze the encompassed successive packets. While all the aforesaid works advanced the state of the art in IoT device fingerprinting for network forensic intelligence, all but [88] and [89] performed offline procedures, which lead to delays in the detection and attribution of criminal behavior. Such offline procedures are in stark contrast to the proposed P4 switch-based approach, which can fingerprint devices and execute corresponding customized actions as a single packet originating from these devices traverses the switch’s pipeline.

3. Background

In this section, we highlight the intuition behind instrumenting programmable switches for network forensic tasks and set the stage for subsequent sections of this chapter. We begin by offering a primer on these devices. Next, we elaborate on some of the advantages that the characteristics and functionalities of programmable switches can offer practitioners.

3.1. A primer on programmable switches

As previously mentioned, SDN provided an effective means of separating the control plane from the forwarding devices of the data plane. The data plane table entries are populated by way of protocols such as OpenFlow [90], and the control plane exposes interfaces for third-party applications where programmers can apply customized logic for the population of table entries. Despite the data plane customization this allows for, the latest OpenFlow specification (OpenFlow 1.5.1 [91]) is constrained to 45 headers, which dramatically limits the range of applications that can be used. Further, attempts to modify or add headers generally translate to about four years of waiting [18].


Alternatively, recent efforts have been devoted to developing switches that allow for full data plane ASIC programmability via domain-specific languages such as P4 [20]. Along with allowing for customized network implementations, programmable switches do not incur performance penalties and run on ASICs at line rate with terabit speeds. For example, the Tofino2 ASIC processes packets at 12.8 Tbps [92]. At the root of this advance’s inception is the Protocol Independent Switch Architecture (PISA), which is depicted in Figure 2. As shown, an incoming packet enters the programmable parser, where it is parsed into individual headers (the parsed representation) and where states and transitions are defined. Subsequently, the packet flows sequentially into each stage of the Programmable Match-Action Pipeline, where match-action unit tables are applied. It is in these tables that various header and metadata fields are typically matched on in order to provide customized behavior. Additionally, programmable data planes possess the distinct ability to perform stateful packet processing by storing data across packet traversals of the switch via counters and registers. As a result, network owners can leverage these storage mechanisms to implement their own complex processing logic that operates at line rate. Once the P4 program is written, it is transformed into binaries for the target architecture by the compiler provided by the particular target

Fig. 2. Programmable switch architecture.

Offloading Network Forensic Analytics to Programmable Data Plane Switches


switch's vendor. In addition, the compilation produces interactive APIs that the control plane uses to interact with the data plane.

3.2. Motivating line-rate network forensics

Typically, network forensics entails storing all observed traffic on the network via packet captures, saving sampled traffic information, or logging network events of interest. In all cases, investigations generally necessitate the later inspection of this information. Naturally, capturing each and every packet traversing the network has an increased potential of encapsulating forensic artifacts when they are present; however, this comes at the clear cost of storage and the time complexity associated with processing the captures. An alternative approach is to conduct such analysis in an online fashion. To employ an online analytics strategy, the assistance of software-based middlebox techniques (e.g., Intrusion Detection Systems (IDS), firewalls, etc.) is generally warranted. These approaches are deployed in-line, meaning that network traffic is processed by them before it reaches its destination. While middleboxes are supported by well-crafted methods and algorithms for inspecting and filtering malicious traffic, software-based solutions suffer from serious concerns in terms of performance, cost, and agility [88]. For example, DDoS attacks are now synonymous with leveraging terabits of attack traffic, a rate that is impossible to handle with current software solutions [21]. The end result is a significant degradation in the network's throughput, which in turn affects resource utilization. Moreover, packets inspected by software incur a considerable increase in latency and jitter, which impacts the Quality of Service (QoS) of latency-sensitive services and the user experience. Furthermore, this phenomenon is not limited to DDoS attacks, as the increasing utilization of the Internet has resulted in a variety of networks experiencing exorbitant traffic loads [15]. In addition to network performance, software-based approaches necessitate additional costs to keep up with such traffic rates. While


increasing the number of hardware resources employed may solve the problem, it ultimately brings a steep rise in operational costs and management complexity. Note that adding resources is a temporary patch given the aforementioned trend of growing traffic rates. Lastly, proprietary middleboxes are closed source; thus, practitioners cannot readily modify algorithms or develop custom solutions that implement the latest forensic intelligence. The nature of cybercrime is dynamic, and adversaries constantly utilize new attack vectors and surfaces. Attempting to mitigate such maliciousness with middlebox-based techniques is a daunting task given the challenge of keeping them current without vendor support. Conversely, programmable data planes address each of the aforementioned shortcomings. They are not only cheap to deploy but also allow practitioners to customize processing logic that, once compiled, functions at line rate amid substantial traffic loads. Therefore, these forwarding devices offer the cost-effectiveness, agility, and performance necessary to meet the demands of contemporary network forensic tasks.

4. In-network Forensic Use Cases

To effectively demonstrate the abilities of programmable switches to assist in the network forensic process, two use cases are detailed in this section, providing an in-network means of fingerprinting an assortment of DDoS attacks and IoT devices. The particulars of each use case are highlighted, along with how the intricacies of each approach are implemented on the switch.

4.1. Assessing DDoS

The first of the two approaches entails aggregating a number of unique DDoS detection strategies into one uniform network forensic methodology. To perform such aggregation, note that while a variety of DDoS attacks are currently exercised by adversaries, they can essentially be encapsulated by the binary classification of volumetric or stealthy (i.e., slow DDoS); thus, this proposed technique utilizes


two schemes for detecting the assortment of relevant DDoS attacks, which are elaborated upon next.

4.1.1. Slow DDoS

What has been termed slow DDoS takes a stealthy approach to denying service to a targeted network by endeavoring to tie up the server's available connections in order to deny authentic clients access. These attacks utilize legitimate TCP behavior and send malicious packets at a frequency similar in intensity to that of benign traffic, which makes such malicious traffic incredibly hard to detect by way of traditional anomaly- and signature-based techniques [21, 93]. To effectively fingerprint these stealthy attacks, the stateful processing of programmable switches is leveraged in order to track the active sessions on the server being targeted by the attacker. In particular, a record of each authentic session an outside entity retains with the target server is stored within the switch's registers. The utilization of the switch's registers (versus pushing data to the controller for storage) enables line-rate functionality, as this is performed entirely on the switch hardware. Note that by maintaining such records, all assortments of TCP flooding attacks are effectively eradicated because they are not associated with any current valid connection with the server, as implied in Figure 3. This is because TCP flooding variations either set a variety of erroneous flags without first establishing a connection (aside from SYN flooding, which the approach addresses with a different technique), are designed to exhaust the target server's resources, or some combination of both. In order to generate statistics with respect to each active session held with the server for real-time detection purposes, the switch first associates all such sessions with their corresponding source IP addresses. Note that source IPs are relevant artifacts for slow DDoS, as it does not leverage spoofing: a legitimate interchange of packets between the source and destination IP addresses is fundamental to executing the attack.
That being said, storing all the IP addresses that could potentially be observed naturally induces


Fig. 3. DDoS detection approach overview.

resource consumption issues. To address this issue, a 65,536-cell Bloom filter is held on the switch, instantiated as a register array. Bloom filters offer the distinct advantages of storage efficiency and O(1) access times. In this implementation, the index of the register to be accessed in the Bloom filter is determined from the result of a 16-bit Cyclic Redundancy Check (CRC16) hash of the source IP address. An overview of the Bloom filter's behavior is shown in Figure 4. In this instance, the registers of the Bloom filter are responsible for holding the timestamp of the last packet received for the given index. This proposed fingerprinting strategy leverages timestamps as the foundation of its detection mechanism given that interarrival times are a distinguishing factor of a slow DDoS occurrence. Specifically, slow DDoS keeps interarrival times long enough to conserve the attacker's resources and not stand out amid the flow of legitimate traffic, but not so long as to be timed out by the target server. This behavior can be observed in the source code of implementations of this attack, such as R U Dead Yet? (R.U.D.Y.) [94] and Slowloris [95]. When a packet arrives at the switch and is found to belong to an active session with the server, the interarrival time (timestamp_current − timestamp_previous) of this session is extracted. Once this occurs, this value is subsequently matched against ranges

Fig. 4. In-network stateful artifact tracking by way of a Bloom filter.

of interarrival times in one of the switch's match-action tables. Each of these table entries of interarrival time ranges is associated with a direct P4 counter, meaning a corresponding counter is incremented for each match. At the end of each designated time window W, these counters are retrieved by the controller as a counter array and subsequently analyzed. By way of a Python script, the controller uses these counts to formulate a distribution of the interarrival rates observed on the network during W. Note that proceeding in this fashion accounts for traffic patterns that vary between busy and slow times. It is from this distribution that the detection strategy identifies anomalies associated with slow DDoS, i.e., abnormally lengthy interarrival times. Further, given that this distribution of current network traffic is updated in real time, the switch can use its line-rate processing abilities to immediately identify such stealthy attacks. Note that while merely using a dynamic anomaly threshold addresses pinpointing slow DDoS amid the varying legitimate traffic rates the network may observe, there is a chance that benign users with very slow connections could falsely be identified as malicious. In order to minimize any such impact on these users, it is paramount to impose the anomaly threshold on an as-needed basis, namely, only when nearly all of the connections the server has to offer are consumed. In addition, because an interarrival time calculation inherently necessitates the analysis of two subsequent packets from the same source IP, this poses a challenge if the detection


mechanism waits until all of the server's available connections are consumed before it acts, i.e., resulting in ten seconds of DDoS for the network from an attack employing ten-second interarrival times. As a result, it is necessary to act preemptively. To this effect, the proposed approach aims to identify the number of session establishments that can be expected to occur during W_i. The switch then strives to preemptively keep this expected number of session establishments E open at all times during W_{i+1}. We argue that this minimizes the aforementioned false positives considering that slow DDoS interarrival times are generally much higher than those of benign users with sluggish connections. This is because if an attacker instrumented enough source IPs to consume all of the target server's available connections using relatively normal interarrival times, the attack as a whole would lose its stealthiness, as it would appear rather volumetric in nature. To incorporate this preemptive measure, the switch maintains a register holding the maximum number of session establishments that occur during time window W without any of the currently established sessions closing. Upon the expiry of W, the controller computes the ten-second moving average of this register, est_av. In turn, the proposed approach can effectively identify a threshold thresh for dropping slow DDoS connections when fewer than est_av established sessions exist during W, as given by

    ∫_{thresh}^{∞} f(x) dx = est_av / max(sessions),    (1)

where f(x) denotes the interarrival-time distribution formulated by the controller.
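The interplay between the register-array Bloom filter, the per-window interarrival distribution, and the threshold of Eq. (1) can be sketched as a simplified software model. The 65,536-cell size matches the text, but the bin edges, the window handling, and the use of CRC32 in place of the switch's CRC16 are illustrative assumptions, not the authors' exact parameters:

```python
import zlib  # crc32 stands in for the CRC16 hash used on the switch

CELLS = 65536                  # 65,536-cell register array (Bloom filter)
last_seen = [0.0] * CELLS      # per-cell timestamp of the last packet
bins = [0, 1, 5, 10, 30, 60]   # interarrival ranges in seconds (illustrative)
counts = [0] * len(bins)       # one direct counter per range entry

def index(src_ip: str) -> int:
    # On the switch, a CRC16 of the source IP selects the register;
    # crc32 mod 2^16 approximates this in software.
    return zlib.crc32(src_ip.encode()) % CELLS

def observe(src_ip: str, now: float) -> None:
    """Record a packet's arrival and bin its interarrival time."""
    i = index(src_ip)
    prev, last_seen[i] = last_seen[i], now
    if prev > 0:
        iat = now - prev
        for b in range(len(bins) - 1, -1, -1):  # widest bound first
            if iat >= bins[b]:
                counts[b] += 1
                break

def threshold(est_av: float, max_sessions: int) -> float:
    """Eq. (1): pick thresh so the distribution tail beyond it carries
    a mass of est_av / max(sessions)."""
    total = sum(counts)
    target = est_av / max_sessions
    tail = 0
    for b in range(len(bins) - 1, -1, -1):
        tail += counts[b]
        if total and tail / total >= target:
            return bins[b]
    return bins[-1]
```

In this model, `observe()` plays the role of the data plane (register update plus range-table counter increment), while `threshold()` plays the role of the controller's per-window Python analysis.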

4.1.2. Volumetric analysis

Middlebox or server-based software volumetric DDoS defenses often result in degradation of a network's throughput. This is because they simply cannot keep pace with processing the large amounts of traffic that these attacks now generate. Conversely, the proposed volumetric detection scheme utilizes the switch's stateful storage in order to circumvent the need for such CPU-based implementations. In particular, the bandwidth artifacts utilized by TCP, UDP, and ICMP are stored within the switch's registers in Bps. This is relevant


considering the direct proportionality between the bandwidth being consumed and the resource depletion of the targeted server. In turn, by determining the bandwidth consumed over regular time windows W while the network is not experiencing an attack, volumetric DDoS will produce an anomaly (i.e., a deviation from the network's normal link saturation) if it transpires. These overarching bounds (T_allocated_i) are determined by assessing the expected throughput for each of the aforementioned protocols via the following equation:

    T_allocated_i = B_total × T_measured_i / Σ_{s∈S} T_measured_s,    (2)

where B_total is the total amount of bandwidth allowed, T_allocated is the throughput allocated per transport protocol, T_measured is the current benign throughput measured by the switch during time window W, and i ∈ S where S = {TCP, UDP, ICMP}. By bounding the protocols by T_allocated_i, the remaining protocols S − {i} will function unimpeded if protocol i is being used to deliver a volumetric attack. Aside from the majority of TCP flooding that was previously addressed, namely, the setting of fictitious flags by adversaries, note that this mechanism can impact service to any benign entities also using i. In turn, further action must be taken to minimize such collateral damage. For ICMP traffic, we allow its legitimate use by electing not to adopt the approach taken by many modern-day networks of simply blocking all ICMP traffic at the edge for security purposes. The motivation for doing so is that this protocol is often essential for circumventing issues with diagnostics and performance [96]. Secondly, such traffic is dropped when the header field type equates to zero, three, four, five, eight, or several others that have been deprecated [97]. As a result, various sub-classes of ICMP-related attacks and vulnerabilities are eradicated [98]. The Bps of the remaining ICMP traffic is then recorded by the switch and bounded by T_allocated_ICMP. Minimizing the impact on benign entities using UDP is especially relevant. UDP encompasses a large assortment of underlying services, and attempting to only bound the throughput of all


UDP traffic by T_allocated_UDP allows an attack using a single UDP-based service to DDoS all other UDP-encapsulated services. It is also important to note that UDP notoriously coincides with amplification attacks because some UDP-based services' responses are much larger than the initial request. In turn, adversaries can easily send requests resulting in response traffic that reaches magnitudes far exceeding that which they needed to transmit. Consequently, this type of attack can either consume the network's server or bombard another target with responses from the network's server via the adversaries spoofing the source IP of the initial requests (reflection attacks [99]). In fact, amplification has resulted in the largest Tbps DDoS attacks to date [100, 101]. To address both amplification and the impact on benign entities, the proposed strategy conducts real-time distribution calculations by recording the Bps per application-layer amplification protocol (identified on the switch via the destination port) over a moving average of time window W. For the sake of simplicity, the remaining application-layer protocols that are associated with amplification are grouped into one distribution calculation. In the event more fine-grained considerations for such remaining protocols are needed, this technique can easily be amended with no added latency and very little additional consumption of the switch's resources, namely, one extra register, counter, and port-match entry per added service. The strategy for achieving this objective is similar to that elaborated upon in Section 4.1.1. Specifically, the Bps counts are extracted by the controller upon the completion of W. The controller then calculates the new threshold for each of these services and pushes them back to the switch. The switch subsequently stores these values in its registers and begins enforcing them on applicable traffic.
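The proportional split of Eq. (2) can be illustrated with a short sketch; the Bps figures below are illustrative placeholders rather than measurements from the chapter:

```python
def allocate_bandwidth(b_total: float, measured: dict) -> dict:
    """Eq. (2): divide the total permitted bandwidth B_total among the
    transport protocols in proportion to the benign throughput
    T_measured observed for each during window W."""
    total = sum(measured.values())
    return {proto: b_total * t / total for proto, t in measured.items()}

# Illustrative benign throughput (Bps) measured by the switch during W
measured = {"TCP": 6_000, "UDP": 3_000, "ICMP": 1_000}
limits = allocate_bandwidth(10_000, measured)
# TCP receives 6,000 of the 10,000 Bps budget, UDP 3,000, ICMP 1,000
```

Because the allocations sum to B_total, an attack saturating one protocol's allocation leaves the remaining protocols' shares intact, mirroring the text's claim that S − {i} functions unimpeded.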
Note that by analyzing both the source and destination ports of incoming UDP traffic, the proposed strategy thwarts attempts by attackers to either target the network's server or leverage it for a reflection attack. In terms of volumetric TCP attacks, as shown in Figure 3, SYN flooding remains to be addressed. Given that SYN flooding generally entails spoofing a large number of SYN requests in order to


saturate the target with empty transactions, it becomes difficult to segregate malicious request traffic from that of a benign nature. A common technique employed in the past is to merely block all SYN traffic amid such an attack, i.e., effectively denying service to all new end users as a means of mitigation. Alternative approaches have since been proposed, such as SYN cookie techniques; however, they incur latency and often litter neighboring networks with response traffic [102]. To address these gaps in the literature, the proposed approach implements a signature-matching scheme via hashing those headers of ingress SYN packets that tend to reflect different TCP/IP implementations, such as TTL, Window Size, etc. This strategy is motivated by the fact that an adversary will generally target specific vulnerabilities (e.g., from a certain Operating System (OS) version [103, 104]); therefore, there exists a strong likelihood that the machines exploited by this adversary to transmit the attack will possess the same signature. The signature artifacts are maintained on the switch by a counting Bloom filter. This Bloom filter functions similarly to that previously discussed in Section 4.1.1; however, in this instance the configuration headers are hashed to obtain the index of the register array, and the value stored in the given register is merely a count of how many times that register has been hashed to. The highest counts within this array are stored in additional registers on the switch to be compared against. The reason for storing multiple counts is that the passive signature-matching procedure may not fully identify all the malicious sources; thus, blacklisting only the sources with the highest count might not be sufficient to mitigate the attack. In turn, the sources with the highest counts are incrementally blacklisted until the SYN request rate falls below a desirable threshold in a given W.
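A software sketch of the counting Bloom filter over SYN signatures might look as follows. Only TTL and window size are used as signature fields here, and CRC32 stands in for the switch's hash; both are simplifying assumptions for illustration:

```python
import zlib

CELLS = 65536
syn_counts = [0] * CELLS  # counting Bloom filter over SYN signatures

def signature_index(ttl: int, window_size: int) -> int:
    # Hash the header fields that tend to reflect a TCP/IP stack
    # implementation (a subset here: TTL and window size) into a cell.
    key = f"{ttl}:{window_size}".encode()
    return zlib.crc32(key) % CELLS

def record_syn(ttl: int, window_size: int) -> int:
    """Increment the cell for this SYN packet's signature."""
    i = signature_index(ttl, window_size)
    syn_counts[i] += 1
    return i

def top_signatures(k: int) -> list:
    """Indices of the k most frequent signatures: the candidates for
    incremental blacklisting until the SYN rate drops below threshold."""
    ranked = sorted(range(CELLS), key=lambda i: syn_counts[i], reverse=True)
    return [i for i in ranked[:k] if syn_counts[i] > 0]
```

A flood sourced from a fleet of similarly configured machines concentrates its counts in a few cells, which `top_signatures()` then surfaces for blacklisting.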
As a result, there is less likelihood that legitimate end users will inadvertently be blacklisted by the SYN flood mitigation strategy. To calculate the aforesaid threshold dynamically, the switch first counts the SYN packets it observes during W. Upon the expiry of W, the switch's data are transmitted to the controller. The controller then calculates the ten-second moving average of SYN requests


and returns the resultant value (plus two standard deviations) to the switch to be used as a dynamic threshold.

4.2. Fingerprinting IoT devices

With the plethora of vulnerabilities surrounding IoT coupled with the increasing utilization of these devices, the value in extracting IoT-specific artifacts for investigations is evident. To date, the most effective means of fingerprinting IoT devices is by way of ML. To this end, state-of-the-art research in P4 has been endeavoring to uncover a practical means of integrating ML functionality into the switch's pipeline [30–32, 34, 35]. The primary reasons for doing so are either (1) to leverage the boost in speed that the switch's hardware can offer (e.g., for distributed learning applications) or (2) to harness the classification abilities of ML within a network context. Though a few noteworthy works have been proposed addressing some nuances pertaining to (1), a viable and practical solution is still ultimately lacking in terms of (2). One concern with (2) is whether switches can execute quantized versions of complex classification algorithms with acceptable loss of accuracy. Another debate that has arisen with (2) is that such algorithms can consume a large amount of the switches' limited resources and, therefore, whether it is realistic from an economic standpoint to have a switch strictly dedicated to classification tasks. We address these issues with (2) in this use case by identifying an ML algorithm that accurately fingerprints IoT devices without the need for any quantization and mapping it to the switch's pipeline in a highly efficient manner, as subsequently detailed.

4.2.1. Switch-based constraints

One of the trade-offs with leveraging the efficiency of a programmable switch is operating within its strict resource constraints. One means of meeting these tight resource bounds is by offloading tasks to the controller. That being said, such approaches can be susceptible to additional latency due to communication and calculation delays.


Whether or not this latency is acceptable generally depends on the application. Additionally, if a strict data plane application is preferred, only specific computations (e.g., simple comparisons, bitwise operations, addition, and subtraction) and a small predefined number of algorithmic operations (limited by the number of stages utilized) can be performed [47]. While the set of computations that can be performed within the switch is clear, a notion that cannot be overstated is that of stages. Though internal switch configurations are vendor-specific and generally not disclosed to the public, it is common to employ a little over ten stages in programmable switches [105]. A stage is allocated its own dedicated resources, such as match-action tables and register arrays. Operations within a stage function independently of each other (i.e., in parallel). Though stages can pass information to subsequent stages via modifications made to a given packet's header fields and metadata, the operations from one stage to another execute sequentially at runtime. As a result, the number of sequential operations that a programmable switch can entertain is bounded by the number of stages the hardware switch possesses. While the choice of operation placement is typically made by the compiler, it is based on whether the aforementioned operations possess dependencies (i.e., whether they need to be executed sequentially). For example, if meta_variable1 = value1 and meta_variable2 = meta_variable1 + 1, these operations will necessitate two separate stages. Moreover, if intermediate stages are being filled by other P4 programs, a dependency-ridden implementation might not compile on an actual hardware switch. In order to offer line-rate IoT artifact extraction to network forensic practitioners, the proposed IoT fingerprinting ML mechanism is converted to a resource-friendly implementation that operates entirely within the data plane.
As a result, its processing is performed at a relatively constant rate as traffic traverses the switch (i.e., within nanoseconds). In particular, the Projective Adaptive Resonance Theory (PART) learning algorithm [106] is harnessed for the fingerprinting procedure. PART is a partial decision tree algorithm


for rule-based classification; each rule corresponds to one traversal down the tree to a given class. In contrast to the comparable C4.5 [107] and RIPPER [108] algorithms, it can generate the appropriate rules without needing to perform global optimization and is hence a more efficient alternative. Further, the proposed approach's mapping of classifier output to P4 applications can be expeditiously applied to any such rule-based approach while being extremely conservative with the aforementioned limited resources of the switch, as subsequently elaborated upon.
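As an illustration of such a mapping, the sketch below flattens a handful of hypothetical PART-style rules into one range table per feature plus a fingerprint table keyed on the tuple of per-feature match results. The feature names, ranges, and classes are invented for the example and are not the trained model from the chapter:

```python
# Hypothetical PART-style rules: each is a conjunction of header-field
# range comparisons mapped to a device class (values are illustrative).
rules = [
    ({"ip_ttl": (0, 64),   "tcp_win": (0, 8192)},     "camera"),
    ({"ip_ttl": (0, 64),   "tcp_win": (8193, 65535)}, "thermostat"),
    ({"ip_ttl": (65, 255), "tcp_win": (0, 8192)},     "non_iot"),
    ({"ip_ttl": (65, 255), "tcp_win": (8193, 65535)}, "non_iot"),
]
features = ["ip_ttl", "tcp_win"]

def build_tables(rules, features):
    """Flatten the rules into one range table per feature (evaluated in
    parallel, stage 1) and a fingerprint table keyed on the tuple of
    per-feature match results (stage 2)."""
    feature_tables = {f: [] for f in features}
    fingerprint = {}
    for conds, cls in rules:
        key = []
        for f in features:
            lo, hi = conds[f]
            entries = feature_tables[f]
            if (lo, hi) not in [(a, b) for a, b, _ in entries]:
                entries.append((lo, hi, len(entries)))
            result = next(r for a, b, r in entries if (a, b) == (lo, hi))
            key.append(result)
        fingerprint[tuple(key)] = cls
    return feature_tables, fingerprint

def classify(pkt, feature_tables, fingerprint):
    """Emulate the two match-action stages for one packet's headers."""
    key = []
    for f in features:
        entries = feature_tables[f]
        result = next((r for lo, hi, r in entries if lo <= pkt[f] <= hi), None)
        if result is None:
            return None
        key.append(result)
    return fingerprint.get(tuple(key))
```

Note that this toy version assumes non-overlapping ranges per feature; a real TCAM deployment would instead resolve overlaps via entry priorities.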

4.2.2. Meeting hardware restrictions

With the proposed fingerprinting approach residing strictly within the data plane, it is paramount that it meets the aforementioned tight bounds of such implementations. To this end, we convert the rules generated by PART to match-action tables in the switch's pipeline. Essentially, each of these rules encompasses a group of comparison statements, with the number of statements within each rule i falling within the set Si = {x | x ∈ N ∧ x . As can be observed, f contains ten features in total.

4.2.4. Parallel processing

Building upon the strict assertion of using header-based features for classification, it should be noted that the variables in which the programmable parser stores the header values are entirely independent of one another. In turn, all features can be evaluated in parallel because there are no dependencies between them, as previously explained in Section 4.2.1. Further, evaluating whether each of these features matches an explicit range or value requires no additional operations other than hard-coding the values to be matched against as keys in the switch's match-action pipeline. As a result, proper implementation of the PART feature-evaluation component of the P4 program facilitates parallel processing and, in turn, uses a minimal number of stages.

4.2.5. Match table mapping

To facilitate the generation of a program that can be updated by network operators upon the arrival of any new fingerprinting


intelligence, whether during initialization or at runtime, the program must be constructed in such a manner that this needed flexibility exists strictly within the entries of the match-action tables. This is because while the entries in the match-action tables can be added or removed effortlessly at any point during the program's execution, the allocation of the tables takes place during the program's compilation and is therefore static. To this end, the proposed fingerprinting approach employs a skeleton made up entirely of tables, i.e., the actual P4 code that is visible to the forensic practitioner and will not be modified. This skeleton is only dependent upon the features utilized (i.e., the header values trained on). The shell encompasses one table per feature, followed by a single fingerprinting hash table to perform the classification. Each of these tables is instantiated via a simple apply statement, as shown in Algorithm 1.

Algorithm 1 P4 implementation algorithm.
Input: network and transport layer headers
Output: IoT device classification
1:  Control Ingress {
2:    apply(ip_len_tbl);
3:    apply(ip_id_tbl);
4:    apply(ip_off_tbl);
5:    apply(ip_ttl_tbl);
6:    apply(ip_sum_tbl);
7:    apply(tcp_sport_tbl);
8:    apply(tcp_dport_tbl);
9:    apply(tcp_off_tbl);
10:   apply(tcp_flags_tbl);
11:   apply(tcp_win_tbl);
12:   apply(hash_fingerprinting_tbl);
13: }

In turn, the number of tables implemented is always equal to |f| + 1. Further, because the feature analysis can be conducted in parallel, the IoT fingerprinting approach only necessitates two


stages in the programmable switch pipeline, namely, the feature tables followed by the device classification. Each of the feature tables has a declared action() (i.e., it performs processing based on the key that was matched) that merely assigns a result to a single metadata variable holding that table's match result (i.e., the result of that feature's evaluation). Each result falls within the set Tresult = {j | j ∈ N ∧ j

decision. We also maintain a MySQL database to store the intermediary pre-computed verification results. The interface to the different security applications is built to customize part of their design to interact with the policy enforcement module.

5. Adapting to Other Cloud Platforms

We design our framework in a platform-agnostic manner so that we can adapt it to other major cloud platforms (e.g., Amazon EC2 [34], Google GCP [35], and Microsoft Azure [36]). The main adaptation effort includes developing platform-specific interfaces to interact with the cloud platform (e.g., while collecting logs and intercepting runtime events) through two modules: the log processor and the interceptor. The remaining modules of our framework (in Figure 7) deal with the

c https://github.com/qpwo/python-simple-cycles

S. Majumdar

Table 1. Interception supports to adopt our framework in major cloud platforms.

Cloud platform    | Interception support
OpenStack         | WSGI Middleware [37]
Amazon EC2-VPC    | AWS Lambda Function [34]
Google GCP        | GCP Metrics [35]
Microsoft Azure   | Azure Event Grid [36]

platform-independent data, and hence, the subsequent steps in our framework are platform-agnostic. In the following, we elaborate on each of these efforts.

Building interceptors: The responsibility of the interceptor is to intercept runtime event requests to a cloud platform. The interception mechanism may need to be implemented separately for each cloud platform. In OpenStack, we leverage the Web Server Gateway Interface (WSGI) middleware [37] to intercept and enforce the proactive results so that compliance can be preserved. Through our preliminary study, we identify that almost all major platforms provide an option to intercept cloud events. In Amazon, using AWS Lambda functions [34], developers can write their own code to intercept and monitor events. Google GCP introduces GCP Metrics [35] to configure charting or alerting for different critical situations. Our understanding is that our framework can be integrated into GCP as one of these metrics, similar to the dos intercept count metric, which is intended to prevent DoS attacks. Azure Event Grid [36] is an event-management service from Azure to monitor and control event routing, which is quite similar to our interception mechanism. Therefore, we believe that our framework can be an extension of Azure Event Grid to proactively audit cloud events. Table 1 summarizes the interception support in these cloud platforms.

Developing log processors: The responsibility of the log processor is to interpret platform-specific event instances, and hence, it needs to be implemented for each platform. First, to interpret platform-specific event instances as generic event types, we currently

maintain a mapping of the APIs from different platforms. Table 2 lists some examples of such mappings.

Multi-level Security Investigation for Clouds

Table 2. Mapping event APIs from different cloud platforms to generic event types.

Generic event type    | OpenStack [21]                                   | Amazon EC2-VPC [34]                                 | Google GCP [35]                   | Microsoft Azure [36]
create VM             | POST /servers                                    | aws opsworks --region create-instance               | gcloud compute instances create   | az vm create
delete VM             | DELETE /servers                                  | aws opsworks --region delete-instance --instance-id | gcloud compute instances delete   | az vm delete
update VM             | PUT /servers                                     | aws opsworks --region update-instance --instance-id | gcloud compute instances add-tags | az vm update
create security group | POST /v2.0/security-groups                       | aws ec2 create-security-group                       | N/A                               | az network nsg create
delete security group | DELETE /v2.0/security-groups/{security group id} | aws ec2 delete-security-group --group-name          | N/A                               | az network nsg delete
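A log processor's mapping can be as simple as a lookup table mirroring Table 2. The sketch below is partial (only a few entries are shown, all taken from the table; the "unknown" fallback label is an assumption for illustration):

```python
# Partial API-to-generic-event mapping, mirroring Table 2;
# entries are easily extended per platform at runtime.
EVENT_MAP = {
    "openstack": {
        "POST /servers": "create_VM",
        "DELETE /servers": "delete_VM",
        "PUT /servers": "update_VM",
    },
    "gcp": {
        "gcloud compute instances create": "create_VM",
        "gcloud compute instances delete": "delete_VM",
    },
    "azure": {
        "az vm create": "create_VM",
        "az vm delete": "delete_VM",
        "az network nsg create": "create_security_group",
    },
}

def to_generic(platform: str, api_call: str) -> str:
    """Translate a platform-specific log entry to a generic event type."""
    return EVENT_MAP.get(platform, {}).get(api_call, "unknown")
```

Because the rest of the framework consumes only the generic event types, supporting a new platform reduces to extending this table plus the corresponding interceptor.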

6. Experiments

This section presents our experimental results.

6.1. Experimental settings

Our testbed cloud is based on OpenStack version Mitaka. There is one controller node and up to 80 compute nodes, each with an Intel i7 dual-core CPU and 2 GB of memory running the Ubuntu 16.04 server. Based on a recent survey [38] on OpenStack, we simulate an environment with a maximum of 1,000 tenants and 100,000 VMs. The synthetic dataset includes logs (e.g., nova-api.log and neutron-server.log) from various services (e.g., Nova for computing and Neutron for networking) of OpenStack [39] with over 4.5 million records. We further utilize data collected from a real community cloud hosted at one of the largest telecommunications vendors.


To this end, we analyze the management logs (more than 1.6 GB of text-based logs) and extract 128,264 relevant log entries covering a period of more than 500 days. We repeat each experiment at least 100 times.

6.2. Experimental results

We present our obtained experimental results as follows. Efficiency of our solution: The objective of our first set of experiments is to demonstrate the efficiency of our proposed solution. Figure 8 illustrates the response time (in milliseconds) at both the user and virtual levels, using the modified implementations of Patron and Congress, respectively. In Figure 8(a), for different cache sizes, we observe a quasi-constant response time (i.e., less than one millisecond) by leveraging our cache implementation. This figure also shows that the pre-computation effort is around four milliseconds. Figure 8(b) shows the results of a similar experiment on virtual infrastructure verification (i.e., the modified Congress implementation). The response time remains within six milliseconds 85.5% of the time on average, and a prediction error may cost a pre-computation effort of up to 137 milliseconds (and the verification


Fig. 8. Time required for both runtime verification and additional pre-computation for (a) user-level (using re-designed Patron) and (b) virtual infrastructure (using re-designed Congress) while varying the size of the cache and number of VMs, respectively.


time of Congress in the proactive mode). Overall, the results show response times of several milliseconds in the best cases and several hundred milliseconds in the worst cases. Accuracy of our predictive model: The second set of experiments shows the accuracy improvements achieved by our proposed predictive model. To that end, we measure the prediction match rate and the prediction error rate. The prediction match rate refers to the percentage of time our prediction is correct. Therefore, a higher prediction match rate ensures a better response time. On the other hand, the prediction error rate refers to the situation where our prediction causes an inaccurate pre-computation. Therefore, a lower error rate ensures minimal wasted computation. For measuring the accuracy of our predictive model, we consider the events from Figure 4, where A: create VM, B: create security group, D: add security group rule, and E: delete security group rule. The prediction is represented as a function of conditional probability. For instance, P(D|A) is the conditional probability of the future occurrence of the add security group rule event given that the current event is create VM. A quantitative comparison (in terms of the prediction match and error rates) between our ARMAX-based predictive model (in solid lines) and the dependency model proposed in LeaPS [16] (in dashed lines) is shown in Figure 9. More specifically, Figure 9(a) shows the percentages of prediction match for three distinct pairs of events in Figure 4, respectively. The obtained results show that our predictive model achieves the following match rates: 97.2% at the best case, 85.8% at the average case, and 77.2% at the worst case. On the other hand, the LeaPS model demonstrates the following match rates: 81.2% at the best case, 55.1% at the average case, and 30% at the worst case. Figure 9(b) shows the percentages of prediction error for three distinct pairs of events in Figure 4, respectively.
The obtained results show that our predictive model achieves the following error rates: 1.2% at the best case, 15.4% at the average case, and 27.2% at the worst case. On the other hand, the LeaPS model demonstrates the following error rates: 21.2% at the best case, 50.4% at the



Fig. 9. Comparison between our predictive model using ARMAX and dependency model proposed in LeaPS [16] in terms of (a) percentage of prediction match for each pair of event, (b) percentage of prediction error for each pair of event, and (c) overall percentage of prediction match/error.

average case, and 78.1% at the worst case. Figure 9(c) depicts the overall percentages of prediction match and prediction error. In all cases, our predictive model shows significant improvements. Specifically, the prediction match rate is increased by up to two times and the prediction error rate is decreased by up to 12 times. Figure 9(c) reports that our prediction results in 10.1% false pre-computations on average. However, our system ensures the best response time (when results are in the cache) 85.5% of the time on average. A selective choice of the threshold may provide up to 93% prediction match. The price of a false prediction is measured in our previous work, Proactivizer [20]. In addition, we conduct similar experiments on real data. Although the number of observations is relatively small (only around 400 records), the ARMAX model still demonstrates its superiority over the LeaPS model (e.g., up to 65% improvement in prediction match rate, as shown in Table 3).
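To make the match and error metrics above concrete, the following sketch estimates a count-based conditional probability P(D|A) from a toy event trace and applies a probability threshold to decide when to pre-compute. The trace, the simple frequency estimator, and the threshold value are invented for illustration; the chapter's actual model is ARMAX-based and also uses temporal dependencies.

```python
from collections import Counter, defaultdict

# Toy event trace (A: create VM, B: create security group,
# D: add security group rule). Invented for illustration.
trace = list("ADADABDAD")

# Count-based estimate of the conditional probability P(next | current).
pair_counts = defaultdict(Counter)
for cur, nxt in zip(trace, trace[1:]):
    pair_counts[cur][nxt] += 1

def prob(nxt, cur):
    total = sum(pair_counts[cur].values())
    return pair_counts[cur][nxt] / total if total else 0.0

threshold = 0.5
matches = errors = 0
for cur, nxt in zip(trace, trace[1:]):
    predicted = prob("D", cur) >= threshold  # schedule pre-computation for D?
    if predicted and nxt == "D":
        matches += 1  # prediction match: the cached verification result is used
    elif predicted and nxt != "D":
        errors += 1   # prediction error: the pre-computation was wasted

print(f"P(D|A) = {prob('D', 'A'):.2f}")
print(f"match rate = {matches / (len(trace) - 1):.2%}")
```

On this toy trace the estimator yields P(D|A) = 0.75, so a create VM event triggers pre-computation for the add security group rule check.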

7. Discussion

In this section, we discuss different aspects of our solution. Effects of a wrong prediction: Like other dependency-model-based proactive techniques (e.g., [16, 20]), this work also relies on the accuracy of the model. In the case of a wrong prediction, all those solutions (including ours) fall back to an intercept-and-check approach,


Table 3. Effectiveness of ARMAX versus LeaPS [16] for real data with the size of 400 records.

| Probability threshold | 0.4 | 0.45 | 0.5 | 0.55 | 0.6 | 0.65 | 0.7 | 0.75 | 0.8 |
|---|---|---|---|---|---|---|---|---|---|
| ARMAX prediction match (%) | 97.36 | 96.5 | 96.5 | 88.6 | 82.45 | 82.4 | 73.7 | 68.4 | 66.6 |
| LeaPS prediction match (%) | 65 | 65.8 | 65.8 | 65.8 | 65.78 | 65.8 | 65.8 | 65.8 | 65.8 |
| Improvement ratio (%) | 48 | 46.6 | 46.6 | 65.8 | 34.6 | 25.3 | 12 | 4 | 1.3 |
| ARMAX prediction error (%) | 80.6 | 77.41 | 64.5 | 54.8 | 42 | 42 | 32.2 | 25.8 | 25.8 |
| LeaPS prediction error (%) | 74.2 | 74.2 | 74.2 | 74.2 | 74.2 | 74.2 | 74.2 | 74.2 | 74.2 |
| Improvement ratio (%) | −8.7 | −4.3 | 13 | 26 | 43.5 | 56.5 | 65.2 | 65.2 | 78.2 |

where the verification is performed at runtime and the response time will be in seconds instead of milliseconds (similar to what is shown in [14]). However, in this work, we achieve better prediction accuracy than other existing approaches (as shown in Figures 9(a) and 9(b) in Section 6). We also demonstrate in Figure 9(c) that our model outperforms them in the ratio of prediction match to prediction error rates. Correctness of our approach: As the main objective of this work is to improve the efficiency of existing security verification solutions for different levels of a cloud, our experiments in Section 6 have focused on performance. As for accuracy, since the verification component of our solution depends on the underlying formal verification methods (e.g., a constraint satisfaction problem (CSP) solver [40] for the user level, and Datalog [12] and graph theory [13] for the virtual infrastructure and network levels), the theoretical accuracy of our approach is the same as that of those formal solvers, which are well-established formal verification techniques whose correctness has been extensively discussed in the literature [13, 40].
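As a toy illustration of how a user-level compliance check reduces to constraint satisfaction, the sketch below brute-forces a made-up separation-of-duty policy over a tiny search space; the users, roles, and constraint are invented for illustration, and a real deployment would delegate this to a dedicated CSP solver such as Sugar [40].

```python
from itertools import combinations, product

# Made-up example: which role sets may each user hold so that no user
# combines the "admin" and "auditor" duties (separation of duty)?
users = ["alice", "bob"]
roles = ["admin", "auditor", "member"]

# All possible role sets a single user could hold (the variable domains).
role_sets = [set(c) for r in range(len(roles) + 1)
             for c in combinations(roles, r)]

def compliant(assignment):
    """Constraint: no user holds both admin and auditor."""
    return all(not {"admin", "auditor"} <= rs for rs in assignment.values())

# Brute-force enumeration of the (tiny) search space; a CSP solver
# would prune this search instead of enumerating it exhaustively.
solutions = [dict(zip(users, combo))
             for combo in product(role_sets, repeat=len(users))
             if compliant(dict(zip(users, combo)))]

print(len(role_sets), "role sets per user;", len(solutions), "compliant assignments")
```

The same reduction pattern (variables, domains, constraints) is what lets the verification back end stay agnostic of the cloud platform that produced the events.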


Other applications of our solution: Our solution is designed to support security solutions that cover different levels of the cloud. More specifically, we intend to support more solutions that offer security enforcement at other cloud levels (e.g., SDN). To this end, there exist several works (e.g., TopoGuard [41] and TopoGuard+ [42]) for SDN security verification. Choosing the value of the threshold: As described in Section 3.2 and evaluated in Section 6, our solution schedules the computation based on a threshold probability. As shown in Figure 9, lower threshold values result in a better prediction match. However, the prediction error also increases in such cases. Therefore, an optimal threshold has to be chosen based on the tenant's needs. Supporting security policies: Our solution currently supports all the security policies that are covered by the candidate security solutions (i.e., [12, 13, 16, 17]). To further extend the coverage, we may need to add new solutions and/or define a more expressive policy language.
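The threshold trade-off can be sketched as a small optimization: given candidate thresholds with their match and error rates (the values below are made up for illustration, loosely echoing the ranges reported in Section 6), pick the threshold maximizing a tenant-weighted score. The weight `lam` is our own illustrative device for encoding how costly a wasted pre-computation is to a given tenant.

```python
# Sketch of threshold selection. Candidate (match %, error %) pairs are
# invented for illustration; in practice they would be measured as in Fig. 9.
candidates = {  # threshold: (prediction match %, prediction error %)
    0.1: (93.0, 27.2),
    0.3: (89.0, 18.0),
    0.5: (85.5, 10.1),
    0.7: (77.2, 1.2),
}

def best_threshold(lam=0.5):
    """Maximize match - lam * error; a larger lam penalizes errors more."""
    return max(candidates, key=lambda t: candidates[t][0] - lam * candidates[t][1])

print(best_threshold(lam=0.2))  # tenant tolerant of wasted pre-computation
print(best_threshold(lam=2.0))  # tenant averse to wasted pre-computation
```

An error-tolerant tenant ends up with a low threshold (more pre-computation, better response time), while an error-averse tenant ends up with a high one.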

8. Related Work

In this section, we first present the outcome of our comparative study and then discuss different categories of related works.

8.1. Comparative study

The comparison between existing works and this work is summarized in Table 4. The first and second columns list the existing works and their verification methods. The next four columns indicate the different cloud levels: user level, virtual infrastructure at the tenant (T) level, virtual network at the cloud service provider (CSP) level, and physical level. The following three columns compare these works based on the adopted approaches. The next columns

Table 4. Comparing existing works with this work.

| Proposals | Methods |
|---|---|
| Patron [17] | Access control |
| Majumdar et al. [14] | SAT solver |
| ISOTOP [18] | SAT solver |
| Madi et al. [10] | SAT solver |
| Majumdar et al. [11] | SAT solver |
| Weatherman (V1) [13] | Graph-theoretic |
| Weatherman (V2) [13] | Graph-theoretic |
| Congress (V1) [12] | Datalog |
| Congress (V2) [12] | Datalog |
| NoD [43] | Datalog |
| LeaPS [16] | Custom + Bayesian |
| This Work | - |

[The per-work marks for the remaining columns (cloud levels: User-level, Virtual Infr. (T), Virtual Net. (CSP), Physical; approaches: Retroactive, Intercept-and-Check, Proactive; features: Run. enforcement, Multi-level, Platform-Agnostic, Expressive, Auto Dep. model) are not recoverable from the extracted text.]

Note that, for both Weatherman [13] and Congress [12], V1 and V2 refer to their proactive and intercept-and-check variants, respectively. The symbols (•), (-), and N/A mean supported, not supported, and not applicable, respectively. The symbol (◦) is used for solutions that support runtime enforcement with significant delay.

compare them according to different features, i.e., runtime enforcement capability, offering a multi-level security framework, adaptability to major cloud platforms, supporting expressive policy languages, and offering automated dependency model building. Note that a (◦) symbol in the Run. enforcement column indicates that the corresponding work offers runtime enforcement with significant delay, and an (N/A) in the Auto Dep. model column means that the corresponding solution does not utilize any so-called dependency model. In summary, this work differs from the existing works as follows. First, it is the first framework to facilitate security solutions for multiple levels of a cloud. Second, our work currently covers four different cloud levels: user, virtual infrastructure, virtual network, and physical. Third, we provide a platform-agnostic design of the framework to support major cloud platforms. Fourth, this work can potentially support a wide range of security policies due to its inherited expressiveness from the integrated tools. Finally, it automatically builds


the structure of the dependency model, which greatly improves the practicality of the proactive approach.

8.2. Existing investigation approaches

The existing investigation approaches can mainly be divided into three kinds: retroactive, intercept-and-check, and proactive. The retroactive approach is conducted after the fact by verifying the logs and configurations of the system. The intercept-and-check approach is conducted at runtime by verifying the impacts of the current event requests. The proactive approach is conducted in advance by verifying the impacts of future events. In the following, we elaborate on each of these approaches. The retroactive approach (e.g., [10, 11, 44, 45]) in the cloud is a traditional way of conducting investigations. Unlike our proposal, those approaches can detect violations only after they occur, which may expose the system to high risks. Existing intercept-and-check approaches (e.g., [12, 13]) perform major verification tasks while holding the event instances blocked. This approach usually causes a significant delay (e.g., four minutes to verify a mid-sized cloud [13]) to a user request. In contrast, our solution applies a proactive approach to overcome this limitation. Proactive approaches perform a part of the verification in advance. To this end, there exist several works (e.g., [12, 13, 15, 16, 46]). Weatherman [13] and Congress [12] verify security policies on a future change plan in a virtualized infrastructure. PVSC [15] proactively verifies security compliance by utilizing static patterns in dependency models. In both Weatherman and PVSC, models are captured manually using expert knowledge. LeaPS [16] partially addresses this limitation by automating the parameter-learning process of the model. However, LeaPS still relies on manual identification of the structure of the model and does not include temporal dependencies in the model. More importantly, unlike our work, none of those works provides a comprehensive solution for multiple levels of clouds.


Additionally, many studies have focused on mining dependency relations from the ordering of log messages. Various definitions of temporal dependency have been proposed, such as forwarding conditional probabilities [23], transition invariants [47], and transition significance [48]. These studies mainly focus on mining reliable pattern relations from the data and do not consider the overall structure of the events. Unlike ours, none of these works considers the dependencies among cloud events and the time intervals between event transitions.

9. Conclusions

In this chapter, we proposed a multi-level proactive security solution for clouds. More specifically, first, we proposed an automated approach to learn the structure of the dependencies in the cloud. Second, we derived a predictive model, which utilizes structural, probabilistic, and temporal dependencies to predict future events. Third, we re-designed and integrated four security solutions (i.e., Congress [12], Weatherman [13], Patron [17], and LeaPS [16]) into our system to offer multi-level proactive security. Finally, using both synthetic and real data, we conducted experiments to show the efficiency (e.g., responding in a few milliseconds) of our proposed solution. However, our solution has the following limitations, which we identify as potential future work. First, our system currently does not integrate any solution for virtual network layer 2 or SDN. Second, we currently rely on a specific time series model for prediction.

References

1. T. Ristenpart, E. Tromer, H. Shacham, and S. Savage. Hey, you, get off of my cloud: Exploring information leakage in third-party compute clouds. In CCS (2009).
2. Y. Zhang, A. Juels, M. K. Reiter, and T. Ristenpart. Cross-VM side channels and their use to extract private keys. In CCS (2012).
3. Y. Zhang, A. Juels, M. K. Reiter, and T. Ristenpart. Cross-tenant side-channel attacks in PaaS clouds. In CCS (2014).


4. Z. Xu, H. Wang, and Z. Wu. A measurement study on co-residence threat inside the cloud. In USENIX Security (2015).
5. OpenStack. Nova network security group changes are not applied to running instances (2015). https://security.openstack.org/ossa/OSSA-2015-021.html. Last accessed on February 14, 2018.
6. OpenStack. Neutron security groups bypass through invalid CIDR (2015). https://security.openstack.org/ossa/OSSA-2014-014.html. Last accessed on February 14, 2018.
7. V. Varadarajan, T. Kooburat, B. Farley, T. Ristenpart, and M. M. Swift. Resource-freeing attacks: Improve your cloud performance (at your neighbor's expense). In CCS (2012).
8. F. Doelitzscher, C. Fischer, D. Moskal, C. Reich, M. Knahl, and N. Clarke. Validating cloud infrastructure changes by cloud audits. In SERVICES (2012).
9. K. W. Ullah, A. S. Ahmed, and J. Ylitalo. Towards building an automated security compliance tool for the cloud. In TrustCom, pp. 1587–1593 (2013).
10. T. Madi, S. Majumdar, Y. Wang, Y. Jarraya, M. Pourzandi, and L. Wang. Auditing security compliance of the virtualized infrastructure in the cloud: Application to OpenStack. In CODASPY, pp. 195–206 (2016).
11. S. Majumdar, T. Madi, Y. Wang, Y. Jarraya, M. Pourzandi, L. Wang, and M. Debbabi. Security compliance auditing of identity and access management in the cloud: Application to OpenStack. In CloudCom, pp. 58–65 (2015).
12. OpenStack. OpenStack Congress (2015). https://wiki.openstack.org/wiki/Congress. Last accessed on February 14, 2018.
13. S. Bleikertz, C. Vogel, T. Groß, and S. Mödersheim. Proactive security analysis of changes in virtualized infrastructures. In ACSAC (2015).
14. S. Majumdar, T. Madi, Y. Wang, Y. Jarraya, M. Pourzandi, L. Wang, and M. Debbabi. User-level runtime security auditing for the cloud. IEEE Transactions on Information Forensics and Security, 13(5), 1185–1199 (2018).
15. S. Majumdar, Y. Jarraya, T. Madi, A. Alimohammadifar, M. Pourzandi, L. Wang, and M. Debbabi. Proactive verification of security compliance for clouds through pre-computation: Application to OpenStack. In ESORICS (2016).
16. S. Majumdar, Y. Jarraya, M. Oqaily, A. Alimohammadifar, M. Pourzandi, L. Wang, and M. Debbabi. LeaPS: Learning-based proactive security auditing for clouds. In ESORICS (2017).
17. Y. Luo, W. Luo, T. Puyang, Q. Shen, A. Ruan, and Z. Wu. OpenStack security modules: A least-invasive access control framework for the cloud. In CLOUD (2016).
18. T. Madi, Y. Jarraya, A. Alimohammadifar, S. Majumdar, Y. Wang, M. Pourzandi, L. Wang, and M. Debbabi. ISOTOP: Auditing virtual networks isolation across cloud layers in OpenStack. ACM Transactions on Privacy and Security (TOPS), 22(1), 1 (2018).
19. S. Majumdar. A multi-level proactive security auditing framework for clouds through automated dependency building. CCF Transactions on Networking, 3(2), 112–127 (2020).


20. S. Majumdar, A. Tabiban, M. Mohammady, A. Oqaily, Y. Jarraya, M. Pourzandi, L. Wang, and M. Debbabi. Proactivizer: Transforming existing verification tools into efficient solutions for runtime security enforcement. In ESORICS (2019).
21. OpenStack. OpenStack open source cloud computing software (2015). http://www.openstack.org. Last accessed on February 14, 2018.
22. VMware. VMware vCloud Director. https://www.vmware.com. Last accessed on February 14, 2018.
23. W. Peng, C. Perng, T. Li, and H. Wang. Event summarization for system management. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2007).
24. W. Hämäläinen and M. Nykänen. Efficient discovery of statistically significant association rules. In ICDM, pp. 203–212 (2008).
25. S. Majumdar, A. Tabiban, Y. Jarraya, M. Oqaily, A. Alimohammadifar, M. Pourzandi, L. Wang, and M. Debbabi. Learning probabilistic dependencies among events for proactive security auditing in clouds. Journal of Computer Security, 27(2), 165–202 (2019).
26. K. Murphy. A brief introduction to graphical models and Bayesian networks (1998). http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html.
27. M. Li, W. Zang, K. Bai, M. Yu, and P. Liu. MyCloud: Supporting user-configured privacy protection in cloud computing. In ACSAC, pp. 59–68 (2013).
28. M. Bellare and B. Yee. Forward integrity for secure audit logs. Technical report, Citeseer (1997).
29. N. Schear, P. T. Cable II, T. M. Moyer, B. Richard, and R. Rudd. Bootstrapping and maintaining trust in the cloud. In ACSAC (2016).
30. J. Aikat, A. Akella, J. S. Chase, A. Juels, M. Reiter, T. Ristenpart, V. Sekar, and M. Swift. Rethinking security in the era of cloud computing. IEEE Security & Privacy, 15(3) (2017).
31. Elasticsearch. Logstash. https://www.elastic.co/products/logstash. Last accessed on February 14, 2018.
32. M. B. Priestley. Spectral Analysis and Time Series (1981).
33. A. Tabiban, S. Majumdar, L. Wang, and M. Debbabi. PERMON: An OpenStack middleware for runtime security policy enforcement in clouds. In SPC (June 2018).
34. Amazon. Amazon Virtual Private Cloud. https://aws.amazon.com/vpc. Last accessed on February 14, 2018.
35. Google. Google Cloud Platform. https://cloud.google.com. Last accessed on February 14, 2018.
36. Microsoft. Microsoft Azure virtual network. https://azure.microsoft.com. Last accessed on February 14, 2018.
37. WSGI. Middleware and libraries for WSGI (2016). http://wsgi.readthedocs.io/en/latest/libraries.html. Last accessed on February 15, 2018.
38. OpenStack. OpenStack user survey (2016). https://www.openstack.org/assets/survey/October2016SurveyReport.pdf. Last accessed on February 14, 2018.


39. OpenStack. OpenStack logging (2018). https://docs.openstack.org/operations-guide/ops-logging.html. Last accessed on April 07, 2020.
40. N. Tamura and M. Banbara. Sugar: A CSP to SAT translator based on order encoding. In Proceedings of the Second International CSP Solver Competition, pp. 65–69 (2008).
41. S. Hong, L. Xu, H. Wang, and G. Gu. Poisoning network visibility in software-defined networks: New attacks and countermeasures. In Proceedings of the 2015 Annual Network and Distributed System Security Symposium (NDSS'15) (February 2015).
42. R. Skowyra, L. Xu, G. Gu, T. Hobson, V. Dedhia, J. Landry, and H. Okhravi. Effective topology tampering attacks and defenses in software-defined networks. In Proceedings of the 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'18) (June 2018).
43. N. P. Lopes, N. Bjørner, P. Godefroid, K. Jayaraman, and G. Varghese. Checking beliefs in dynamic networks. In 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI'15), pp. 499–512 (2015).
44. C. Wang, S. S. Chow, Q. Wang, K. Ren, and W. Lou. Privacy-preserving public auditing for secure cloud storage. IEEE Transactions on Computers, 62(2), 362–375 (2013).
45. Y. Wang, Q. Wu, B. Qin, W. Shi, R. H. Deng, and J. Hu. Identity-based data outsourcing with comprehensive auditing in clouds. IEEE Transactions on Information Forensics and Security, 12(4), 940–952 (2017).
46. S. S. Yau, A. B. Buduru, and V. Nagaraja. Protecting critical cloud infrastructures with predictive capability. In CLOUD, pp. 1119–1124 (2015).
47. I. Beschastnikh, Y. Brun, S. Schneider, M. Sloan, and M. D. Ernst. Leveraging existing instrumentation to automatically infer invariant-constrained models. In Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering (2011).
48. J. Kwon and K. M. Lee. A unified framework for event summarization and rare event detection from multiple views. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9), 1737–1750 (2015).

© 2023 World Scientific Publishing Company https://doi.org/10.1142/9789811273209_0008

Chapter 8

Digital Evidence Collection in IoT Environment

Jane Iveatu Obioha∗,‡, Amaliya Princy Mohan∗,§ and Habib Louafi∗,†,¶,‖

∗Department of Computer Science, University of Regina, 3737 Wascana Parkway, Regina, SK, Canada S4S 0A2
†New York Institute of Technology, 2955 Virtual Way, Vancouver, BC, Canada V5M 4X3
‡[email protected]
§[email protected]
¶habib.louafi@uregina
‖[email protected]

Digital forensics can be defined broadly as the process of identifying, collecting, examining, and analyzing digital evidence, and then reporting the results. The evidence is expected to be presented in court, and therefore should be kept authentic during the entire investigation. Thus, the evidence collection process is of great importance. In traditional forensics, digital evidence is collected from computers, servers, firewalls, etc. In IoT environments, it should also be collected from sensors, tags, home appliances, wearable smart devices, etc. However, these devices are very diverse, produced by different vendors, and use various data formats. Moreover, they are scattered over wide areas, e.g., smart cities and smart farming. They are connected through wired and wireless networks, and thus the network boundaries become unseizable, which makes digital evidence collection very challenging. In this chapter, we focus on the process of digital evidence collection in IoT, which is a key in

J. I. Obioha et al.


any criminal investigation. We study and categorize the different IoT device types and their features. Then, we present the different data formats used by IoT devices and how their data can be collected in compliance with the forensic investigation protocol. Lastly, we present existing IoT forensic tools, along with their features and limitations.

1. Introduction

There is a variety of electronic devices produced by different manufacturers and made available for a variety of purposes. They can be used for personal, home, and commercial purposes. They are mostly connected to the Internet, through wired or wireless means. Collectively, they are known as connected devices, among which many are resource-constrained [1]. The constraints lie in memory size, storage capacity, and processing speed, as well as in the fact that they operate on low-energy power. Though these devices have unique identifiers (UIDs), they have different architectures, use a variety of communication protocols, and can interrelate [1]. They can be standalone, wearable, implanted in humans, or attached to plants, animals, or things. Many are programmed to transfer data over the Internet automatically. The systems comprising these interconnected and interrelated computing devices are categorized as the Internet of Things (IoT). In simple terms, "IoT is a system of interrelated computing devices, mechanical and digital machines, objects, animals, or people that are provided with unique identifiers and the ability to transfer data over a network, without requiring human-to-human or human-to-computer interaction" [1]. The term IoT was coined over two decades ago, and there has since been a proliferation of devices connecting to the Internet, as shown in Figure 1. It is predicted that 35 billion IoT devices will be connected in the world by 2021, and 125 billion devices by 2030 [2]. IoT devices are used in our daily life activities, hence all these activities are expected to be connected over the Internet. This directly means that enormous amounts of digital data are generated, stored, and exchanged over the Internet, which makes our lives more and more digitalized. Since almost everything is connected to the

Fig. 1. IoT growth forecast (in billion) [5].

Internet, there are several inherent forms of vulnerabilities, which can be exploited to perpetrate various Internet attacks. This creates obvious opportunities for criminal snooping. Digital evidence needs to be collected from these IoT systems for the investigation and resolution of criminal activities [3]. This falls under the umbrella of IoT forensics, which is a branch of digital forensics that deals with IoT-related cybercrimes [4]. However, the collection of this evidence is challenging because of the nature of IoT devices [6]. Indeed, the devices are diversified and lack standards (e.g., different vendors, architectures, operating systems, and communication protocols). The challenges include, but are not limited to, IoT device detection, localization, and identification, which lead to inevitable oversight of core data [3, 6]. There must be procedures for identification and classification to enable criminal investigators to collect digital evidence in a proper way, so that it can be admissible in a court of law. In other words, the collected evidence should adhere to the so-called chain of custody. In this chapter, we investigate the different facets, challenges, and existing tools of digital evidence collection in IoT systems. Before diving into the core of IoT digital evidence collection, let us review some related definitions and keywords.


2. Definitions

Before delving into the details of digital evidence collection, we define some key concepts that are needed to pave the way for the reader to get a big picture of the topic. First, we define the term "evidence" and its variants. Then, we explain what "evidence collection" means in the general context and in the IoT context in particular.

2.1. Evidence

In the event of any incident, the only way to ascertain the details of the occurrence is with evidence. This suggests a definition of evidence as the availability of facts to ascertain the validity of claims. Evidence can take many forms, including people, paper, physical items, or recording media. Let us define these forms of evidence:

• People evidence: It refers to the individual(s) who were present (physically or virtually) at the crime scene [3, 7]. Every investigation proceeds from the known to the unknown.
• Paper evidence: It refers to any written material (handwritten or printed) found at the crime scene or obtained during the course of the investigation [3, 7]. This could be in the form of printed balance sheets and receipts in the case of a financial crime investigation. It can also be written maps and guidelines on a set of activities or a specific course of action that eventually turns criminal. Assessing these materials can be significant to a case under investigation. The importance of this in digital evidence collection may normally be disputable: paper evidence is not digital evidence in itself, but it can form part of the investigation if it links up with and leads to the offending instance of the digital crime.
• Physical evidence: It refers to any tangible item available at the crime scene or retrieved during the course of the investigation [7]. It includes hardware, clothing, debris, and biological items. Examples include devices, equipment, bloodstained clothes, soil, and hair [3, 7].
• Recording evidence: It refers to audio, video, and photography captured during or early enough after the incident. It mostly provides a documented visual view of the crime [3, 7]. Examples


are video footage, photos, and computer-related data, such as those captured from swipe cards, wireless communication, etc.

2.2. Evidence collection

Evidence collection is the process of gathering or collating related and unrelated evidence pointing toward the definite cause under investigation [3]. It is the fundamental aspect of any investigation, which is used to determine the final resolution of a claim. Evidence collection is referred to as "the process that is used to identify the digital evidence in its most original form" [8]. Failure to collect the right evidence may lead to unresolved investigations. Most essentially, the evidence must be original (unedited and undamaged), documented, packaged, stored, and maintained, so that it can be admissible in a court of law. Different types of evidence require different methods of collection. For instance, the method of collecting debris is different from the way of collecting video footage [3]. This broadly means that evidence collection can be manual, computer digitalized (using specific computer tools), or IoT digitalized (using IoT-oriented forensic tools, as IoT data is usually transmitted automatically).
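One widely used way to keep collected digital evidence demonstrably original (unedited and undamaged) is to fingerprint it with a cryptographic hash at collection time and re-check the fingerprint whenever the evidence is handled. The record layout below is our own minimal illustration, not a standard chain-of-custody format.

```python
import hashlib
from datetime import datetime, timezone

def collect_evidence(item_id: str, data: bytes, collector: str) -> dict:
    """Record an evidence item with a SHA-256 fingerprint for later integrity checks."""
    return {
        "item_id": item_id,
        "collector": collector,
        "collected_at": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }

def verify_integrity(record: dict, data: bytes) -> bool:
    """True iff the evidence bytes still match the fingerprint taken at collection."""
    return hashlib.sha256(data).hexdigest() == record["sha256"]

# Hypothetical evidence item: raw bytes pulled from an IoT camera.
footage = b"\x00\x01camera-frame-bytes"
record = collect_evidence("cam-17", footage, collector="J. Doe")
print(verify_integrity(record, footage))         # unchanged evidence -> True
print(verify_integrity(record, footage + b"!"))  # tampered evidence -> False
```

Any later modification of the bytes, accidental or malicious, changes the hash and therefore breaks the chain of custody visibly.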

2.3. IoT digital evidence collection

Having defined the different types of evidence and evidence collection, let us define IoT digital evidence collection. This refers to the process of gathering or collating digitalized evidence from an IoT system to support an investigation; such evidence can be found in the IoT devices themselves, in the underlying wired/wireless networks, and in storage media (local or cloud). In the rest of this chapter, an in-depth analysis of IoT-related evidence collection will be carried out.

3. Digital Forensics

Digital forensics, as a term used in information technology, did not start with the advent of IoT. The computer processor by design always leaves a residue of information about the nature, origin, and pathways of all data it handles. Hence, terms like digital footprints [6] and digital evidence have long been associated with digital information analysis and data retrieval. The common factor between historical digital footprint tracing models and current digital forensics in IoT is the presence of unwanted or suspicious third-party entities whose activities require investigation. Therefore, while our focus here is on IoT forensics, which is still evolving and relatively novel in the digital sphere, we will review some existing approaches and solutions from traditional digital forensics. Every forensic analysis in any environment, be it the cloud or the most archaic hard disk, assumes the presence of crime, criminal intent, or suspicious activities [9]. Thus, the methodology for our subject matter does not differ substantially, though the nature and scope of threats may differ. The challenge lies in the variety of IoT devices (i.e., different brands, characteristics, and features) and their locations, as they are usually scattered over wide areas and not always visible to the investigator. It is also worth noting that the sheer size and ephemeral nature of the cloud presents its own challenges. IoT technology has brought a significant shift to the criminal investigation field, especially in how IoT systems interact with data. In this section, two important and interrelated areas of digital forensics, namely, traditional forensics and IoT forensics, will be presented.

3.1. Traditional digital forensics

Traditional digital forensics refers to early digital forensics, roughly from the late 1960s to the early 2000s, when the digital revolution was still in its infancy [9]. This is the period when evidence collection in criminal investigations began to include computers. The process evolved following the usual methodology of traditional criminal investigations, while considering computers and related digital devices as evidence. In traditional digital forensics, we trace the earliest attempts at making the computer system a viable source of evidence [9]. Traditional information gathering mostly focused on the central processing unit (CPU) and the hard disk [6, 9]. The cloud did not yet exist in the way we know it today, although the Internet was already connecting computer systems to one another, so gathering digital evidence still focused on systems with storage capacities. Later, when the term digital footprint was coined, the scope of digital forensics expanded to include actual digital pathways and networks. In general, traditional forensics follows a tripartite process: (i) evidence seizure, (ii) evidence deconstruction and analysis, and (iii) forensic judgment and reporting.

3.1.1. Evidence seizure

This is the first step in the investigative process, when investigators begin to isolate pieces of information and gather materials related to the crime scene. In the early periods of traditional digital forensics, hard disks, floppy disks, memory, and CPUs were often seized and gleaned for information. Later, the scope was expanded to include other digital devices that characteristically store information [9]. This step focuses on pieces of material evidence that could lead to information related to the criminal activities under investigation. Note that the activity under investigation need only have utilized digital materials in order to qualify for digital forensics. Pieces of evidence unrelated to computers and other digital devices are not admitted to the sphere of digital forensics, even if they form an isolated part of the crime itself. For instance, the collection of digital evidence in an office burglary case would focus only on the computer systems and not on the door or the gate of the building. Non-digital evidence can be admitted to the investigation only if the digital materials under scrutiny capture it as part of the crime scene, for example, if a digital camera has footage of the door or the gate. The latter can then be used in the investigation, but as digital material captured by a digital device and not in its physical, independent state. This part of forensic evidence could be the most important because it sets the tone for the rest of the investigation.

3.1.2. Evidence deconstruction and analysis

This is the process of scrutinizing the evidence gathered. Here, the forensic investigation evaluates the relationship between the materials gathered and the crime committed [9]. It is difficult to talk about digital forensics without being drawn into jurisprudence; after all, the entire system of forensic investigation is built on the premise that some laws have been violated. Therefore, evidence analysis in traditional forensics seeks to establish the following [9]: (1) the evidence collected is related to the crime under investigation; (2) the crime under investigation can be solved by looking at the digital footprints left by the criminal; and (3) the specific activity of the criminal contravenes specific sets of laws. These considerations guide digital forensic analyses and support the legal relevance of the evidence gathered with regard to the activity being investigated [9]. In a typical analysis, the evidence gathered is reconstructed into its original digital form, so that conclusions can be reached by observing the evidence as it would have been before tampering. For instance, in a case where an individual is accused of embezzling public funds and destroying records of fund inflows, an investigator would, as part of the forensic process, recover files from the trash folder, a dump of deleted emails, or even an external hard drive. Analyzing these data would then involve reconstructing the cash inflow cycle against outflow based on the recovered items. This process of reconstructing the initial scenario represents one of the recurrent steps in the forensic analysis methodology.
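The reconstruction step in the embezzlement example can be illustrated with a toy reconciliation: given inflow records recovered from deleted files and the outflows recorded in the official ledger, the analyst computes the unaccounted difference. All figures, dates, and variable names below are hypothetical.

```python
# Illustrative sketch (hypothetical data): reconciling recovered inflow
# records against reported outflows to flag a discrepancy during analysis.

recovered_inflows = [  # e.g., parsed from files restored from the trash folder
    ("2021-03-01", 50_000),
    ("2021-03-15", 75_000),
    ("2021-04-02", 60_000),
]
reported_outflows = [  # e.g., taken from the official ledger
    ("2021-03-05", 48_000),
    ("2021-03-20", 70_000),
]

total_in = sum(amount for _, amount in recovered_inflows)
total_out = sum(amount for _, amount in reported_outflows)
unaccounted = total_in - total_out  # funds with no matching outflow record

print(f"Recovered inflows:  {total_in}")
print(f"Reported outflows:  {total_out}")
print(f"Unaccounted amount: {unaccounted}")
```

A non-zero unaccounted amount does not prove wrongdoing by itself; it is one reconstructed observation that the analyst then ties back to the other evidence.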

3.1.3. Forensic judgment and reporting

In this step, the investigator tries either to tie the collected digital forensic evidence to the crime or activity under investigation, or to disprove any correlation between the initial hypothesis and the evidence. The results of digital forensic judgment in the traditional sense are always subject to rigorous scrutiny. This is due in part to the virtual nature of some of the body of evidence. The paradox of digital evidence resides in the fact that the evidence only makes sense within a digital continuum: much of the evidence, including the so-called "footprints," cannot be assessed outside the bounds of the digital ecosystem. The evidence itself would probably mean nothing if sampled against non-digital test tools. For instance, a broken computer hard drive could look like any broken carcass, but in digital evidence collection and forensics, those broken pieces could contain vital tell-tale digital markings. Yet the admissibility of those digital markings as evidence in the investigated case can prove painstaking: the markings are virtual, present only as intangible ripples of what has already happened. Digital evidence collection thus involves a great deal of abstraction and points to virtual realities that demand high-level forensic rigor.

3.2. IoT digital forensics

In IoT digital forensics, evidence can be gathered from IoT devices, but also from cloud servers and any other devices connected directly or indirectly to those smart devices. Thus, the spectrum of digital devices from which evidence might be collected is very wide, ranging from smart devices to cloud and traditional servers. However, digital forensics in IoT systems faces a systemic problem characterized by disorder: hacks and data criss-cross are rampant. Evidence in the IoT context is often subject to adulteration by secondary players. Indeed, IoT systems connect various smart devices together in no specific or uniform order; their connectivity is determined purely by need and convenience. Hence, interaction among the connected IoT devices is not assessed at the time of deployment, which makes their forensics a nightmare for investigators. Digital forensics in IoT systems follows the same methodology and steps as traditional digital forensics, but with significantly more problems related to the identification and localization of the IoT devices and the relevance of the evidence collected. It also has a higher likelihood of error, because non-relevant evidence, and sometimes a digital footprint [6], might lead to no conclusion or to unintended third parties. Therefore, the evidence sources need to be analyzed at different layers, depending on where they are located [4].

3.2.1. Sources of digital evidence in IoT digital forensics

To mitigate the effect of these problems on the forensic investigation, digital forensic experts suggest that the evidence sources in IoT forensics be stratified into different layers [4]. These layers are mainly determined by the location of the evidence and its accessibility:

• Smart device and sensor layer: All gadgets present at the crime scene are considered pristine and therefore make up the first layer. These include, but are not limited to, smart watches, home automation appliances, weather control devices, etc. [4].

• Hardware and software layer: In this layer, all traceable communication links between smart devices and the external world are considered. These include computers, mobile devices, intrusion prevention systems (IPS), and firewalls [4].

• External resource layer: This represents all areas outside the network under investigation; the bulk of data adulteration is considered to happen at this layer. It includes the cloud, social networks, network providers, international gateways, etc. [4].

3.2.2. Challenges of IoT forensics

Some of the well-known problems associated with IoT digital forensics are as follows [4]:

• Variance of the IoT devices: IoT can be seen as the vast interconnection of an indeterminate number of devices, gadgets, and computers via the Internet. There is no specific determinant for classifying devices under the IoT umbrella beyond their inherent smartness and their ability to connect to the Internet, see other devices, and process information on the go. It does not matter whether the device is a pen, a book, a phone, a kitchen utensil, or even a bottle of wine. This poses a unique challenge to the forensic analyst, who must sift through a maze of smart debris to determine specific digital patterns in each device and relate them to the line of evidence being explored. If the digital patterning of smart devices followed a uniform outline, this challenge would not exist, but such uniformity is almost utopian. It is unlikely that all smart devices introduced to the market will converge in the near future, or that a uniform framework will be developed such that all devices "speak" the same language and leave the same digital footprint. There are thousands of devices in the IoT space, and this multiplicity and variable digital patterning is always challenging for the forensic investigator.

• Criss-cross and layered data: As previously mentioned, IoT has no strict demarcations as the traditional digital space does. For instance, a personal computer that is not connected to the Internet or a LAN is de facto on its own; its digital space is bounded by its hard disk and internal storage. In this case, data cannot be lost or digitally pilfered, because the system has no interface to any digital milieu: the digital space itself is localized and contained within a specific, determined space. IoT is directly opposite to this arrangement. Data movement in IoT is essentially random; the storage systems are smart devices, and they are indeterminate. Data can be accessed from similar smart devices, captured as they are processed and moved around the digital pipelines, or extracted from the cloud. This deprives the forensic investigator of the ability to judge, beyond any reasonable doubt, that a set of information relates to a specific event or crime, because the dataset could have been adulterated, disconnected from the crime, or, in some cases, cloned (some viruses are known to create copies of host data or files in the system [10]).

• Data modification, loss, or pilfering by third parties: The porosity of the IoT environment is a challenge. Data can easily be stolen or modified, and questionable actors can adulterate data to confuse or derail the forensic investigation. Such adulteration could create a parallel narrative around the forensic content that undermines the credibility of the investigation.

• Proprietary jurisdictions: Data generated by an IoT system can be stored on cloud servers hosted outside the jurisdiction of the crime scene. This raises issues of international relations and policy, which have an obvious impact on forensic evidence collection. Some cloud servers are located in geographical areas subject to specific laws, and IoT systems clearly have no boundaries. Moreover, some information laws could affect how smart devices access data. For the forensic investigator, seeking information for use in litigation therefore raises legal issues.

• Constrained resources of IoT devices: IoT devices have limited computing and storage capabilities. Collecting relevant information after the crime has happened is only as reliable as the ability of the device to perform that action and retain the information. Relying on devices with no verified and proven capabilities can be problematic in all phases of an investigation.

These represent the main challenges, though many other circumstantial limitations are also cited by researchers, such as cloud service requirements and data format [8].

4. Digital Evidence Collection in IoT Systems

Digital evidence can be defined as the data or footprint within a system or the cloud that points to suspicious activity; in other words, evidence that an unauthorized actor has gained access to a collection of data. Precisely, digital evidence is about the tell-tale signs and/or material pointers to specific digital activities in a medium. These activities do not occur in the air; they happen in specific media, such as the devices and gadgets used to access the Internet, process information over a local area network (LAN), or retrieve information from the cloud. Searching a digital platform, device, or any medium for evidence is referred to as forensic search, or simply digital forensics. Basically, there are three routes for digital evidence collection: computer digital evidence collection, IoT digital evidence collection, and cloud digital evidence collection.

4.1. Computer evidence collection

This refers to the traditional sources of digital evidence, such as computers, mobile devices, servers, and gateways. These localized digital media form the traditional group of devices that have been part of the early digital revolution. The computer processor is designed to preserve on the hard drive digital footprints or traces of all activities it has ever processed. Because of this, the computer provides very fertile ground for evidence collection.

4.2. IoT digital evidence collection

The scope of this route is wide. Indeed, IoT has expanded to almost every kind of device or appliance. In IoT evidence collection, cars, cookers, medical implants, washing machines, etc. have become part of a large assemblage of non-traditional players in digital evidence collection. Evidence gathering at this level is much more difficult and diverse, and there is a greater likelihood of evidence tampering and adulteration because this field of evidence is highly exposed. Here, the targeted objects are the actual devices linked by IoT systems to the specific data trail under scrutiny, and the evidence is deduced by examining those devices. Imagine a situation where a particular patient has a medical implant. If certain unusual activities are noticed and deserve investigation, the implant would be one of the core sources of evidence. Other sources could be the servers of the healthcare facility to which the implant is connected through the Internet; there would probably also be sub-connections of other patients to the same server, and the server might in turn be connected to wearable alarm devices carried by the doctors and nurses. In the IoT context, the field of evidence is wide, diverse, and typically exposed to tampering. Determining the authenticity of the evidence collected and creating the forensic link to the initial suspicion of foul play is always challenging at this level.

4.3. Cloud digital evidence collection

Some authors consider the cloud part of the IoT data collection set. However, the cloud is a distinct environment, separate from the IoT devices that use it as a data store or as a digital funnel for interaction with other devices. Thus, in evidence collection, the cloud should be treated as a separate source of evidence. The cloud, as a virtual environment, typically has porous and insecure APIs (Application Programming Interfaces). Besides, the boundaries of cloud systems are blurred, which makes data vulnerable to theft or tampering. The cloud is therefore considered a viable source of data for evidence collection.

5. IoT Forensic Tools and Frameworks

In an IoT environment, a large number of distributed and resource-constrained devices are involved, producing an enormous amount of data called big IoT data. This large volume of information prevents the forensic investigator from smoothly gathering and extracting digital evidence. The diversity of data formats and the lack of real-time log analysis solutions are the key problems that big IoT data raises for investigators. As explained earlier, IoT digital evidence can be collected from a combination of several technology zones: the IoT zone, the network zone, and the cloud zone. In terms of digital evidence, the current challenges that IoT devices pose to traditional computer forensics solutions are the limited visibility of IoT devices and the short survival time of digital evidence [11]. In IoT systems, data are mainly stored and processed in the cloud. In most cases, because of service level agreement (SLA) restrictions, it is impossible for investigators to gain access to data in the cloud for investigative purposes. Furthermore, in IoT environments, data are distributed across various networks, such as edge devices and data centers. Computation takes place predominantly at the edge of users' networks, and metadata are transmitted to the cloud. In such a situation, data are stored in two hierarchical locations (the user's network and the cloud), which creates problems for forensic researchers with regard to data collection and log analysis. Two other factors affecting IoT forensics are complex computing architectures (different hardware architectures and heterogeneous operating systems) and proprietary hardware and software (individual vendors and multiple standards). Figure 2 illustrates the various new factors of IoT systems affecting traditional computer forensics.

5.1. Attributes of the IoT forensics tools

Fig. 2. IoT factors affecting traditional forensics [12].

IoT forensic solutions, like their traditional counterparts, involve various elements that need to be considered during the investigation process. These elements, known as forensics attributes, describe the different phases of the investigation, the sources of evidence, the investigation modes, etc. [13]. In the following sections, these attributes are detailed.

5.1.1. Forensics phases

A basic investigation in IoT forensics begins with background establishment. Many security measures, such as software and security tools, are applied by the investigation team to the large volume of data obtained from various locations. Before the actual investigation, the laws related to the investigation, such as privacy, copyright, and information technology law, are fully reaffirmed and agreed upon by the investigator [12]. Evidence is then gathered from various sources and, in the next step, further examined and analyzed. The final conclusion is then documented, based on the facts, and provided to the appropriate parties. At the final stage, the collected data and the final reports are archived in digital format for future use.

5.1.2. Enablers

Various technologies are involved in IoT systems, such as sensor nodes, mobile devices, virtualization, cloud computing, radio frequency identification (RFID), network equipment, and artificial intelligence (AI). During the forensic analysis process, these technologies play individual roles. Sensor nodes and mobile devices, which are core IoT devices, are used to gather evidence from the crime scene after the attack. Throughout the forensic process, cloud and virtualization technologies provide on-demand, flexible, elastic, compute-as-a-service support [12]. For object recognition, RFID is used extensively in sensor devices. Network devices, such as routers, switches, and software-defined networking (SDN) switches, allow packet traces to be monitored. AI techniques are used extensively in analyzing the collected data.

5.1.3. Networks

Network attributes refer to the type of network linked to IoT devices at the crime scene. During the investigative process, the topology of the network plays a significant role in ensuring that the region's area is protected and that law enforcement procedures are followed. For interconnecting IoT devices within a restricted range, local area networks (LAN), metropolitan area networks (MAN), and personal area networks (PAN) are commonly used. Home appliances, such as washing machines and refrigerators, are linked to the home area network (HAN). In terms of data storage and processing, cloud computing offers excellent resources for IoT applications. To integrate with cloud applications through APIs, IoT appliances are linked to the wide area network (WAN).

5.1.4. Sources of evidence

Crime-related IoT data may be gathered from numerous crime scenes related to the main source of evidence. The data mainly reside in IoT devices, such as home appliances, sensor nodes, medical implants, embedded systems, and automobiles. Although the memory space of IoT devices is limited, useful information is sent over the network to a central computer for processing and storage. Data such as system logs and temporary cache memory are usually used as sources of digital evidence. Such data can be obtained by monitoring various network devices, including routers and switches, but also their virtualized counterparts, e.g., vSwitch and vRouter.
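As an illustration of turning such log data into evidence, the sketch below parses syslog-style lines from network devices and filters for a host of interest. The log lines, host names, and field layout are hypothetical; real formats vary by vendor.

```python
import re

# Hypothetical router/switch log lines; real formats vary by vendor.
LOG_LINES = [
    "Mar 14 09:12:01 router1 %SEC-6-IPACCESSLOGP: list 102 denied tcp 10.0.0.5 -> 192.168.1.9",
    "Mar 14 09:12:07 router1 %SEC-6-IPACCESSLOGP: list 102 denied tcp 10.0.0.5 -> 192.168.1.22",
    "Mar 14 09:13:44 switch2 %LINK-3-UPDOWN: Interface Gi0/3, changed state to down",
]

# Syslog-like prefix: "<month> <day> <time> <host> <message>"
PATTERN = re.compile(r"^(\w{3}\s+\d+\s[\d:]+)\s(\S+)\s(.*)$")

def parse(lines):
    """Extract {time, host, msg} records from syslog-style lines."""
    records = []
    for line in lines:
        m = PATTERN.match(line)
        if m:
            records.append({"time": m.group(1), "host": m.group(2), "msg": m.group(3)})
    return records

# Filter for events from a device of interest, as an investigator might.
events = [r for r in parse(LOG_LINES) if r["host"] == "router1"]
```

In practice, the parsed records would be timestamp-normalized and correlated with evidence from the other layers rather than inspected in isolation.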

5.1.5. Investigation modes

Based on the investigation timeline, the investigation can be performed in two modes: static and dynamic. The static mode is the standard mode, in which the investigation is conducted on the IoT system after the attack has been identified; that is, the IoT data may already have been compromised or removed as a result of the attack. The static mode recovers data through means such as universal serial bus (USB) acquisition and cache memory scanning. To retrieve valuable evidence sources, IoT forensic analysis often allows the system to remain live during the investigative process in order to discover new data, such as open network connections, memory dumps, and running processes [14]. This type of investigation is known as the dynamic mode.
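A minimal sketch of dynamic-mode collection, assuming a Linux host where process information is exposed under /proc; live-response suites collect far more (open sockets, memory dumps) and carefully preserve the order of volatile acquisition.

```python
import os

def list_processes():
    """Return {pid: command name} for all processes visible in /proc (Linux)."""
    procs = {}
    for entry in os.listdir("/proc"):
        if entry.isdigit():
            try:
                with open(f"/proc/{entry}/comm") as f:
                    procs[int(entry)] = f.read().strip()
            except OSError:
                pass  # the process may have exited mid-scan
    return procs

# Volatile evidence: the snapshot is valid only at capture time.
snapshot = list_processes() if os.path.isdir("/proc") else {}
```

Because such data vanish on power-off, dynamic-mode captures are typically hashed and timestamped immediately after acquisition.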

5.1.6. Digital forensics models

Forensic investigations for IoT applications are performed according to standard models, so that the relevant evidence can be obtained and, more importantly, admitted to a court of law. All existing traditional models follow the basic phases of a forensic investigation: background establishment, data collection, investigation, analysis, and reporting, among others. Several forensic models based on these phases exist [11], as shown in Table 1.

5.1.7. IoT forensics data processing

Table 1. Digital forensics models.

• DFIM, digital forensics investigation model [15]: acquisition of evidence, authentication of the evidence, and analysis of the evidence.
• DFRW, digital forensics research workshop model [16]: identification, preservation, collection, examination, analysis, presentation, and decision.
• ADFM, abstract digital forensic model [17]: identification, preservation, collection, examination, analysis, presentation, decision, planning, approach, strategy, and return of evidence.
• IDIP, integrated digital investigation model [18]: readiness, deployment, investigation of the physical crime scene, investigation of the digital crime scene, and review.
• EDIPM, enhanced digital investigation process model [19]: readiness, deployment, investigation of the physical crime scene, investigation of the digital crime scene, review, traceback, and dynamite.
• EMCI, extended cybercrime investigation model [20]: understanding, authorization, preparation, notification, evidence detection, evidence collection, evidence transmission, evidence storage, evidence analysis, hypothesis, hypothesis presentation, hypothesis proof, and archive storage.
• DFMDFI, digital forensic model for digital forensic investigation [21]: four tiers. Tier 1: planning, identification, authorization, and communication; Tier 2: compilation, preservation, and documentation of related laws; Tier 3: inspection, exploratory research, and study; Tier 4: results, feedback, and reports.
• SDFIM, systematic digital forensic investigation model [22]: preparation, securing the scene, survey and recognition, documenting the scene, communication shielding, evidence collection, preservation, examination, analysis, presentation, result, and review.
• ESDFIM, enhanced systematic digital forensic investigation model [23]: planning, acquisition and preservation, evaluation and analysis, knowledge exchange, presentation, and review.

In general, there are two modes of forensic data processing: centralized and distributed. In the centralized mode, forensic data processing is performed at the device site, and the forensic data are stored in highly secured central servers that designated investigators at various locations can access. This kind of centralized processing is less costly and highly reliable, and it gives administrators great power [14].

In the distributed processing mode, the forensic data are collected and sent to different servers, located in different places, for processing. This mode offers low latency and low delay, but it also has low requirements in terms of data protection and communication bandwidth.

5.1.8. Forensics layers

Fig. 3. IoT forensics layers [24]: device forensics, network forensics, and cloud forensics over IoT applications and services.

At a high level of abstraction, IoT forensics comprises three layers: device, network, and cloud [24], as shown in Figure 3. In the device-level forensics layer, the investigator collects evidence predominantly from IoT devices, which have poor processing and storage capabilities. To judge or convict a suspect, network-level forensics gathers data from network devices, such as switches and routers. IoT devices usually communicate with each other through network topologies (e.g., LAN, WAN, HAN, and MAN). The networks contain useful data, such as network logs and cache memory information, which can act as trustworthy evidence. The cloud layer is an expansive repository of data. IoT forensics at the cloud level is confronted by an ambivalence: the cloud hosts almost inexhaustible data, an inestimable source for the investigation, yet it leaves to the investigator the task of finding the evidence relevant to the crime investigated. The questions of relevance and adulteration of evidence are most acute at this level, because data are not stored in bounded assortments; they represent a forest of information, and choosing what is pristine and relevant to the case under study is always difficult.

5.2. IoT forensics tools

After an IoT attack, forensic investigations are usually conducted by well-trained professionals with good IT and law enforcement experience. While IoT forensics research entails multiple challenges, e.g., the large amount of data collected and real-time data analysis, the various forensic tools should compensate for these challenges. Most of the known IoT forensic tools, along with their strengths and weaknesses, are listed and summarized in Table 2. CAINE (Computer Aided Investigative Environment) is an open-source forensics tool that supports multiple forensic phases in an interactive way [25]. EnCase is used to perform forensic analysis of images, documents, and files [26]. For network forensics research, WireShark is mostly used [27]; its primary drawback is that it does not handle large network data efficiently. The Bulk Extractor tool helps to search for and extract information from disk images and directory files, such as card numbers, email addresses, web addresses, and telephone numbers [28]. NUIX is used to search large volumes of data and retrieve valuable information for evidence purposes [29].
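The kind of scanning performed by Bulk Extractor can be illustrated with a much-simplified sketch that searches a raw byte buffer for email addresses and URLs. This is not Bulk Extractor's actual implementation, and the data blob below is hypothetical; real carvers handle encodings, compression, and many more feature types.

```python
import re

# Regexes over raw bytes, as a disk image is scanned without decoding first.
EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(rb"https?://[^\s\x00\"']+")

def extract_features(image_bytes: bytes):
    """Return sorted (offset, kind, value) tuples for every feature found."""
    hits = []
    for kind, pattern in (("email", EMAIL_RE), ("url", URL_RE)):
        for m in pattern.finditer(image_bytes):
            hits.append((m.start(), kind, m.group().decode("ascii", "replace")))
    return sorted(hits)

# Hypothetical slice of unallocated disk space:
blob = b"\x00\x00contact: alice@example.com\x00visit http://evidence.example/logs\x00"
for offset, kind, value in extract_features(blob):
    print(f"{offset:08d}  {kind:5s}  {value}")
```

Recording the byte offset of each hit, as above, is what lets an investigator tie an extracted feature back to its location on the original medium.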

Table 2. IoT forensic tools.

CAINE [25]
  Features: Open-source tool; supports multiple forensic phases; works in an interactive way.
  Strengths: Supports the investigator through all the investigation phases; features can be customized; easy to use.
  Weaknesses: Newer versions may be more difficult to navigate.

EnCase [26]
  Features: Performs analysis of images, documents, and files for forensics.
  Strengths: Very adaptable and easy to use; some versions support evidence acquisition; acquires data from multiple sources; provides good search capabilities; can unlock encrypted evidence; provides effective reporting features.
  Weaknesses: Certain versions may generate reports that are too abstract; compatibility issues with other forensic tools.

WireShark [27]
  Features: Supports network forensics research.
  Strengths: Free and easy to use; allows the user to see timestamps in a dataset.
  Weaknesses: Does not handle large network data efficiently.

Bulk Extractor [28]
  Features: Helps to search for and extract information from disk images and directory files, e.g., card/phone numbers, email addresses, and web addresses.
  Strengths: Able to quickly isolate relevant data from a data pool; provides concise, snapshot views of relevant information.
  Weaknesses: Relevant information may be lost if a filename does not suggest relevance to the investigation.

NUIX [29]
  Features: Performs search in large volumes of data to help retrieve valuable information that can be used for evidence search purposes.
  Strengths: Enables better and easier referencing.

RegRipper [30]
  Features: Performs scanning of Windows registry data.
  Strengths: Provides a simple and very useful file localization feature; offers an easy-to-use interface.
  Weaknesses: Very limited capabilities; very poor aggregation of data collected from the Internet or distributed systems.

NetAnalysis [31]
  Features: Aims at searching Internet history-based images and data.
  Strengths: Offers an easy-to-use interface; provides concise reports.
  Weaknesses: Not suitable for handling complex and variable datasets.

Pajek64 [32]
  Features: Helps to analyze a vast volume of data related to the network.
  Strengths: Provides strong analytics capabilities.
  Weaknesses: Weak in visualization.

IEF [33]
  Features: Performs forensic analysis of images and a broad variety of data
  Strengths: Can recover deleted files.
  Weaknesses: May not be adequate for local

system files.

collected from Internet history, chat history, and operating systems.

Digital Evidence Collection in IoT Environment

registry files.

• Does not have an easy-to• Provides good search features. • Offers a helpful highlighting feature. understand report formats. • Organizes searches into folders for

285

J. I. Obioha et al.

286

a large volume of data and procedures, contributing to the retrieval of valuable information later used for research [29]. RegRipper is used mostly for the scanning of windows registry files [30]. NetAnalysis aims at searching Internet history-based images and data [31]. Pajek64 helps to analyze a vast volume of data related to the network [32]. Forensic images and a broad variety of data collected from Internet history, chat history, and operating systems are searched, using the IEF tool [33]. 5.3.

IoT forensic frameworks

Although various mechanisms have been proposed to deal with the specific characteristics of IoT forensics, a multitude of problems still have to be addressed. Existing tools are unable to cope with the heterogeneous infrastructure of IoT systems [12]. For example, even though collecting data from these heterogeneous IoT infrastructures is challenging, producing evidence that is solid and admissible in a court of law is even more challenging. Numerous problems arise from these difficulties, e.g., confusion about the origin of the data, their location, and their format. Hence, the traditional digital forensics procedure is not always applicable in the IoT world while still ensuring that the chain of custody is preserved. This is why IoT forensics is still maturing, on account of the many open problems and the limited research activity conducted in this particular field. Most researchers have concentrated on the challenging task of carrying out forensics in the IoT infrastructure. Table 3 lists various frameworks proposed by the research community to address the existing problems in IoT forensics, along with their features and characteristics.

5.4.

Discussion

IoT is gradually being integrated into all aspects of our lives, from managing homes to handling smart cities. Although this growth indeed makes people's lives simpler, it also raises various security issues

Table 3. IoT forensic frameworks.

DFIF-IoT: Digital Forensic Investigation Framework for IoT [34].
• Adheres to ISO/IEC 27043:2015, an internationally recognized standard on information technology, security techniques, and incident investigation principles and processes.
• Strengthens the capabilities of the investigation and offers a high level of certainty.

CFIBD-IoT: Cloud-based Framework [35].
• Comprises three layers: digital forensic investigation layer, cloud/IoT infrastructure layer, and forensic evidence isolation layer.
• Uses standardized mechanisms for the extraction and isolation of evidence (e.g., ISO/IEC 27043).

FSAIoT: Forensic State Acquisition from the Internet of Things [36].
• Allows IoT state data to be collected and logged in real time, using a Forensic State Acquisition Controller (FSAC).
• Provides a practical approach and a general framework via IoT device state acquisition.
• Shows practical results of pulling state data.

FIF-IoT: Forensic Investigation Framework for Internet of Things [37].
• Employs a public digital ledger to pinpoint facts in criminal attacks.
• Stores evidence in the form of interactions, such as device-to-cloud, device-to-device, and device-to-user, in a public digital ledger resembling that used for Bitcoin.
• Guarantees the anonymity, confidentiality, and non-repudiation of the publicly available evidence.
• Offers tamper-proof storage against potential collusion scenarios.

FoBI: Fog-Based IoT Forensic Framework [38].
• Can identify and mitigate cyberattacks that target IoT at their initial stage.
• Does not provide an experimental evaluation.

IDFIF-IoT: Integrated Digital Forensic Investigation Framework [39].
• Extends the initially proposed generic DFIF-IoT.
• Proposes an integrated framework with accepted digital forensic techniques capable of analyzing potential digital evidence generated by IoT devices.

Probe-IoT Framework [40].
• Helps to find criminal facts in IoT-based systems using a digital ledger.
• Maintains, via a digital ledger, a record of all transactions taking place between IoT devices, users, and cloud services [12].
• Ensures, theoretically, the integrity, confidentiality, and non-repudiation of the evidence.
• Does not provide experimental performance evaluations and analysis.

PRoFIT Framework [41].
• Ensures privacy according to the ISO/IEC 29100:2011 standard [42].
• Provides an experimental evaluation with actual malware propagation in an IoT-enabled coffee shop.

and consequently a variety of digital forensics challenges. It is not possible to use a single kind of digital forensic procedure for all cyberattacks happening over the IoT infrastructure [43]. The primary problems faced in IoT digital forensics in general, and in evidence collection in particular, are briefly reviewed below.

• Data location: As most IoT data are spread across different locations, identifying the location of evidence is considered one of the biggest challenges for investigators. Various countries' regulations must be followed, as the IoT data may reside in different countries and be mixed with other users' data.

• Lifespan limitation of digital media: IoT devices have a short lifespan, and data can easily be overwritten. To avoid overwriting, the data can be transferred to local hubs or to a cloud server. However, this requires proof that the evidence has not been changed or modified during transmission.

• Cloud service requirement: Most cloud accounts are effectively anonymous, because providers do not require users to sign up with verified credentials. This makes it difficult to identify a suspect.

• Lack of security: Due to a lack of protection, data in IoT devices may be altered or removed, which may render these data inadmissible in a court of law. Some businesses and end-users update their devices irregularly or not at all, allowing hackers to discover existing vulnerabilities and exploit them to attack entire IoT systems, as demonstrated by the MIRAI attack [44].

• Device type: The digital investigator has to locate and collect evidence from a digital crime scene during the identification phase of the forensics protocol. In IoT, the source of evidence could be any smart object or sensor, and investigators face real challenges in identifying and finding these IoT devices at the crime scene.
This is because these devices are small and resource-constrained, and they use different operating systems, hardware, and software.


• Data format: The formats of the data that IoT devices produce do not conform to those used to store data in the cloud. Besides, users do not have direct access to their data, and the data are viewed in a format different from the one in which they are stored. Moreover, before being stored in the cloud, data may be processed using analytic functions at various locations. Therefore, before conducting a forensic investigation, the data should be restored to their original format in order to be admissible in a court of law.

6.

Conclusion

In a digital forensics investigation, a protocol comprising a set of phases is applied carefully. These phases can be broadly defined as identifying, collecting, examining, and analyzing digital evidence, and then reporting the results. Since the evidence is ultimately presented in a court of law, it must remain authentic throughout the entire investigation protocol. The evidence collection phase is particularly critical, because if the evidence collected does not comply with the chain of custody, it is simply rejected by the court. In traditional forensics, digital evidence is collected from computers, servers, firewalls, etc. In IoT systems, in addition to the traditional devices, evidence can also be collected from small devices, which can be found in smart homes, smart factories, smart farms, and wearables. However, these devices are very diverse and not yet standardized. Indeed, they are produced by different vendors, use various data formats, and run different operating systems. Moreover, they are scattered over wide areas and can be connected through wired and wireless networks, which makes the scope of digital evidence collection very large and challenging. In this chapter, we analyzed the digital evidence collection protocol with regard to IoT systems. We presented the IoT device categories, their particularities, and their impact on the digital evidence collection process. Lastly, we presented existing IoT forensic tools and frameworks, their features, and their limitations.


References

1. C. McClelland. What is IoT? A simple explanation of the internet of things. https://www.iotforall.com/what-is-iot-simple-explanation (2020). Accessed on November 15, 2020.
2. N. Galov. How many IoT devices are there in 2020? [All you need to know]. https://techjury.net/blog/how-many-iot-devices-are-there/ (2020). Accessed on November 15, 2020.
3. D. A. Kleypas and A. Badiye. Evidence Collection. StatPearls Publishing, Treasure Island (FL) (2020). https://www.ncbi.nlm.nih.gov/books/NBK441852/.
4. V. Boricha. IoT Forensics: Security in an always connected world where things talk. https://hub.packtpub.com/iot-forensics-security-connected-world/ (2018). Accessed on November 15, 2020.
5. Ericsson Corp. IoT connections outlook. https://www.ericsson.com/en/mobility-report/reports/june-2020/iot-connections-outlook. Accessed on November 15, 2020.
6. F. Bouchaud, G. Grimaud, T. Vantroys, and P. Buret. Digital Investigation of IoT Devices in the Criminal Scene. Journal of Universal Computer Science, 25(9), 1199–1218 (2019). https://hal.archives-ouvertes.fr/hal-02432740.
7. Enablon. 4 types of evidence during a root cause analysis investigation. https://enablon.com/blog/4-types-of-evidence-during-a-root-cause-analysis-investigation/ (2018). Accessed on November 15, 2020.
8. S. Alabdulsalam, K. Schaefer, T. Kechadi, and N.-A. Le-Khac. Internet of things forensics: Challenges and a case study. In eds. G. Peterson and S. Shenoi, Advances in Digital Forensics XIV, pp. 35–48, Springer International Publishing, Cham (2018).
9. InfoSec, Inc. Computer Forensics: Forensic Techniques, Part 1. https://resources.infosecinstitute.com/topic/computer-forensics-forensic-techniques-part-1/ (2019). Accessed on November 15, 2020.
10. Cisco Security. What is the difference: Viruses, worms, trojans, and bots? https://tools.cisco.com/security/center/resources/virus_differences#3. Accessed on January 1, 2022.
11. C. Meffert, D. Clark, I. Baggili, and F. Breitinger. Forensic state acquisition from internet of things (FSAIoT): A general framework and practical approach for IoT forensics through IoT device state acquisition. In Proceedings of the 12th International Conference on Availability, Reliability and Security, ARES '17, Association for Computing Machinery, New York, NY, USA (2017). doi: 10.1145/3098954.3104053.
12. I. Yaqoob, I. A. T. Hashem, A. Ahmed, S. A. Kazmi, and C. S. Hong. Internet of things forensics: Recent advances, taxonomy, requirements, and open challenges. Future Generation Computer Systems, 92, 265–275 (2019).
13. S. Sathwara, N. Dutta, and E. Pricop. IoT forensic: A digital investigation framework for IoT systems. In 2018 10th International Conference on Electronics, Computers and Artificial Intelligence (ECAI), pp. 1–4 (2018).
14. C. Agostino Ardagna, R. Asal, E. Damiani, N. El Ioini, and C. Pahl. Trustworthy IoT: An evidence collection approach based on smart contracts. In 2019 IEEE International Conference on Services Computing (SCC), pp. 46–50 (2019).
15. M. Qatawneh, W. Almobaideen, M. Khanafseh, and I. Al Qatawneh. DFIM: A new digital forensics investigation model for internet of things. Journal of Theoretical and Applied Information Technology, 97(24), 3850–3867 (2019).
16. G. Palmer. A road map for digital forensic research. Technical Report DTR-T001-01, Utica, New York (2001).
17. M. Reith, C. Carr, and G. Gunsch. An examination of digital forensic models. International Journal of Digital Evidence, 1(3), 1–12 (2002).
18. B. Carrier, E. H. Spafford, et al. Getting physical with the digital investigation process. International Journal of Digital Evidence, 2(2), 1–20 (2003).
19. V. Baryamureeba and F. Tushabe. The enhanced digital investigation process model. Journal of Digital Investigation (2004).
20. S. Ó Ciardhuáin. An Extended Model of Cybercrime Investigations. International Journal of Digital Evidence, 3(1), 1–22 (2004).
21. I. O. Ademu, C. O. Imafidon, and D. S. Preston. A new approach of digital forensic model for digital forensic investigation. International Journal of Advanced Computer Science and Applications (IJACSA), 2(12), 175–178 (2011).
22. A. Agarwal, M. Gupta, S. Gupta, and S. C. Gupta. Systematic digital forensic investigation model. International Journal of Computer Science and Security (IJCSS), 5(1), 118–131 (2011).
23. K. Kyei, P. Zavarsky, D. Lindskog, and R. Ruhl. A review and comparative study of digital forensic investigation models. In International Conference on Digital Forensics and Cyber Crime, pp. 314–327 (2012).
24. S. Zawoad and R. Hasan. FAIoT: Towards building a forensics aware ecosystem for the internet of things. In 2015 IEEE International Conference on Services Computing, pp. 279–284 (2015). doi: 10.1109/SCC.2015.46.
25. Nanni Bassetti. CAINE: Computer Aided Investigative Environment. https://www.caine-live.net/. Accessed on January 15, 2021.
26. OpenText Corp. EnCase: A unified approach to cyber resiliency. https://security.opentext.com/. Accessed on January 15, 2021.
27. G. Combs. Wireshark. https://www.wireshark.org/. Accessed on January 15, 2021.
28. S. L. Garfinkel. Bulk extractor. https://github.com/simsong/bulk_extractor/wiki. Accessed on January 15, 2021.
29. NUIX Corp. NUIX. https://www.nuix.com/solutions/fraud-investigations. Accessed on January 15, 2021.
30. Harlan Carvey. RegRipper. https://github.com/keydet89/RegRipper3.0. Accessed on January 15, 2021.
31. Digital Detective Group. NetAnalysis. https://www.digital-detective.net/digital-forensic-software/netanalysis/. Accessed on January 15, 2021.
32. A. Mrvar. Pajek: Analysis and visualization of very large networks. http://mrvar.fdv.uni-lj.si/pajek/. Accessed on January 15, 2021.
33. Magnet Forensics. Internet Evidence Finder (IEF). https://www.magnetforensics.com/products/magnet-ief/. Accessed on January 15, 2021.
34. V. Kebande and I. Ray. A generic digital forensic investigation framework for internet of things (IoT). In 2016 IEEE 4th International Conference on Future Internet of Things and Cloud (FiCloud), pp. 356–362 (2016).
35. V. Kebande, N. Karie, and H. Venter. Cloud-centric framework for isolating big data as forensic evidence from IoT infrastructures. In 2017 1st International Conference on Next Generation Computing Applications (NextComp), pp. 54–60 (2017).
36. C. Meffert, D. Clark, I. Baggili, and F. Breitinger. Forensic state acquisition from internet of things (FSAIoT): A general framework and practical approach for IoT forensics through IoT device state acquisition. In Proceedings of the 12th International Conference on Availability, Reliability and Security, ARES '17, Association for Computing Machinery, New York (2017).
37. M. Hossain, Y. Karim, and R. Hasan. FIF-IoT: A Forensic Investigation Framework for IoT Using a Public Digital Ledger. In 2018 IEEE International Congress on Internet of Things (ICIOT), pp. 33–40 (2018).
38. E. Al-Masri, Y. Bai, and J. Li. A Fog-Based Digital Forensics Investigation Framework for IoT Systems. In 2018 IEEE International Conference on Smart Cloud (SmartCloud), pp. 196–201 (2018).
39. V. Kebande, N. Karie, A. Michael, S. Malapane, I. Kigwana, H. Venter, and R. Wario. Towards an integrated digital forensic investigation framework for an IoT-based ecosystem. In 2018 IEEE International Conference on Smart Internet of Things (SmartIoT), pp. 93–98 (2018).
40. M. Hossain, R. Hasan, and S. Zawoad. Probe-IoT: A public digital ledger based forensic investigation framework for IoT. In IEEE INFOCOM 2018, IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 1–2 (2018).
41. A. Nieto, R. Rios, and J. Lopez. A methodology for privacy-aware IoT-forensics. In 2017 IEEE Trustcom/BigDataSE/ICESS, pp. 626–633 (2017).
42. TCS Forensics. Internet of things (IoT) forensic services. https://www.tcsforensics.com/iot-forensics/ (2020). Accessed on November 15, 2020.
43. A. Nieto, R. Rios, and J. Lopez. IoT-Forensics Meets Privacy: Towards Cooperative Digital Investigations. Sensors, 18(2) (2018).
44. The Verge. How an army of vulnerable gadgets took down the web today. https://www.theverge.com/2016/10/21/13362354/dyn-dns-ddos-attack-cause-outage-status-explained. Accessed on November 15, 2020.

© 2023 World Scientific Publishing Company
https://doi.org/10.1142/9789811273209_0009

Chapter 9

Optimizing IoT Device Fingerprinting Using Machine Learning

Oluwatosin Falola∗,‡, Habib Louafi∗,†,§, and Malek Mouhoub∗,¶

∗Department of Computer Science, University of Regina,
3737 Wascana Parkway, Regina, SK S4S 0A2, Canada
†New York Institute of Technology (Vancouver Campus),
2955 Virtual Way, Vancouver, BC V5M 4X3, Canada
‡[email protected]
§habib.louafi@uregina.ca, hlouafi@nyit.edu
¶[email protected]

Diverse IoT devices are continuously introduced in a variety of environments, e.g., home, industry, and military. Unfortunately, these devices can be exploited to perpetrate severe attacks. Mobile network operators need to protect their assets and customers, but they do not have full control over these devices, as they are deployed on the customer premises. Data generated by IoT devices and collected at the operator end can be analyzed to learn their behaviors and infer their identities (fingerprints), which helps detect malicious and faulty devices. Such a process is very important in a digital forensic investigation, as it helps the investigator identify compromised IoT devices and collect digital evidence that ties the crime to the attacker. In this chapter, we propose an efficient solution to fingerprint IoT devices using machine learning algorithms. Precisely, we identify the minimal subset of IoT device features that captures the essence of the device fingerprints. That is, without using the entire data collected from IoT devices, a small subset can be used to predict IoT device fingerprints. We validate our solution with a known dataset of IoT network


traffic, from which different partitions are created based on different feature subsets. On each partition, selected machine learning algorithms are trained, and the prediction quality of the obtained models is calculated. The results show that IoT device fingerprints can be predicted using only 4 features instead of 17. The prediction quality is measured using precision (99.10%), recall (99.00%), and harmonic mean (99.00%).

1.

Introduction

New and diverse Internet of Things (IoT) devices are increasingly introduced into the marketplace [1]. Unfortunately, these devices are disrupting traditional security measures, as has been shown with the MIRAI attack [2]. Mobile Network Operators (MNOs) have limited control over customers' IoT devices, as they are deployed on the customer premises. On the other hand, customers are not necessarily aware of IoT-related security issues. Therefore, MNOs need to deploy effective security controls at their end to protect their assets. Huge amounts of data are generated by IoT devices, which can be exploited to understand device behaviors and ultimately detect compromised and faulty ones. When an attack is detected, the collected data and the behavior of these devices represent an important asset for the forensic investigation, helping identify the attacker and collect incriminating evidence.

When abnormal network traffic is detected, two solutions can be adopted: blocking and dropping the traffic, or sending it for deeper analysis. The first solution may disconnect legitimate IoT devices, as certain behavior deviations are quite normal, e.g., bandwidth fluctuation. The second solution attempts to learn more about IoT devices and refine the learned IoT device behavior. This should be done in real time, to continuously adapt the learned behavior to a changing environment, e.g., new device types.

More importantly, the data generated by IoT devices and collected by MNOs can be exploited to detect the identities of the connected IoT devices. This process is called IoT device fingerprinting. It helps draw the big picture of the connected devices, without the need to install plugins on them to probe their identities. Indeed, installing plugins is


invasive and is not always feasible, as IoT devices are normally scattered over wide areas on the customer side. It is clear, however, that deducing the fingerprints of IoT devices is very helpful for detecting malicious and faulty devices. For instance, the traffic collected from a smart thermostat differs from that collected from a smart TV or a webcam. If a new malicious IoT device is connected, MNOs should be able to detect it on their end by simply analyzing the network traffic (data collected from IoT devices). More interestingly, they should be able to identify its fingerprint (e.g., webcam, smart TV, etc.). The detected fingerprints of the IoT devices, combined with their global behavior, help decide the nature of a device (e.g., normal, malicious, or faulty). In a digital forensic investigation, fingerprinting the IoT devices can be used to identify the malicious device and ultimately track its origin, i.e., the server or the attacker that compromised it.

In this chapter, we propose an optimized solution to the problem of IoT device fingerprinting, which we are planning to integrate, in future work, into a larger framework for IoT device anomaly detection. We exploit machine learning (ML) techniques [3–6] to identify the minimal set of features that captures IoT devices' behaviors and consequently their identities (fingerprints). Since the traffic generated by IoT devices is usually huge, we are interested in finding the minimal subset of data that can be used to fingerprint the devices. Such a solution will definitely speed up the forensic investigation, as only a subset of the traffic will be used. We validate our solution with a known dataset of IoT network traffic [7]. From this dataset, different feature subsets are created and used to create several partitions, which are then trained with selected ML algorithms.
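The overall procedure, training candidate ML algorithms on candidate feature subsets and keeping the best-scoring pair, can be sketched as follows. This is an illustrative sketch, not the authors' code: the model names and the toy scoring function are placeholders, and in practice each call to `quality` would train and test a real classifier on the corresponding partition.

```python
from itertools import combinations

def best_model_and_features(models, features, quality):
    """Exhaustive search for the (model, feature subset) pair that
    maximizes prediction quality. `quality(model, subset)` is assumed
    to train `model` on the columns in `subset` and return a score."""
    best = (None, None, float("-inf"))
    for m in models:
        for k in range(1, len(features) + 1):
            for f in combinations(features, k):
                q = quality(m, f)
                if q > best[2]:
                    best = (m, f, q)
    return best

# Toy stand-in: pretend quality peaks for one small, specific subset.
def toy_quality(model, subset):
    target = {"pkt_size", "iat_mean"}
    base = 0.9 if model == "random_forest" else 0.8
    return base + 0.05 * len(target & set(subset)) - 0.01 * len(subset)

m, f, q = best_model_and_features(
    ["decision_tree", "random_forest"],
    ["pkt_size", "iat_mean", "ttl", "dst_port"],
    toy_quality,
)
```

The exhaustive loop is exponential in the number of features; the chapter instead restricts the search to a limited number of selected feature subsets, which is the pragmatic choice when training is expensive.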
The obtained models are then tested, and different metrics are calculated to assess the prediction quality and the performance of our proposed solution. The experiments show that a limited number of features (i.e., 4 instead of 17) is able to predict the IoT device fingerprints with high prediction quality. The prediction quality results using this feature subset


are 99.10%, 99.00%, and 99.00%, as measured using the precision, recall, and harmonic mean, respectively. These results are better than those obtained with all the features. Indeed, adding more features is not always beneficial for prediction; in fact, it can have a negative impact on it [8]. This is very appealing, as it drastically reduces the amount of data and the time needed to fingerprint IoT devices. Our contributions can be summarized as follows:

• Proposing and validating an ML-based solution to predict the fingerprints of IoT devices, based solely on their network traffic.
• Processing an existing dataset of collected IoT device traffic, and training different ML models on different feature subsets.
• Identifying the optimal (minimal) subset of features and the ML algorithm that yield the optimal prediction quality, as measured with known prediction quality metrics, such as precision, recall, and harmonic mean.

The rest of the chapter is organized as follows: Section 2 reviews research work related to IoT device fingerprinting and anomaly detection. In Section 3, the problem at hand is modeled mathematically. In Section 4, our proposed solution is presented. Section 5 presents the experimental setup and results, as well as the temporal complexity of the proposed solution. Finally, Section 6 concludes the chapter.

2.

Related Work

Many wireless devices have similar characteristics, such as capacity, scalability, access control, coverage options (indoor or outdoor), shape, and size. Sometimes, it is quite impossible to differentiate them. The same is true of IoT device configurations, which makes it challenging to identify IoT devices by properties such as their MAC and IP addresses [7]. Therefore, researchers resort to the data generated by IoT devices, which are not necessarily physical, such as the interval time between IP packets. Some fingerprinting solutions are hardware- or driver-centric, as they


focus on identifying the device hardware type or the driver installed on the device. These approaches do not provide accurate fingerprints, as the same hardware or driver may be integrated into different IoT device types. Solutions that fingerprint the device type are designed for devices that generate large amounts of traffic or discontinuous traffic; however, IoT device traffic may also be small and continuous. Technically, the problem of IoT device fingerprinting is addressed at three layers: signaling, hardware, and network.

At the signaling layer, some solutions have been proposed to fingerprint IoT devices by identifying their hardware types and the characteristics of installed drivers [9, 10], exploiting the IoT device radio-frequency signature. In [11], the authors propose to fingerprint the WiFi drivers of IoT devices by probing their packet inter-arrival times. Robyns et al. [12] propose to analyze the IoT device signal at the physical layer and use supervised and zero-shot learning [13, 14] to detect the identity of IoT devices. For devices from the same vendor, the fingerprinting shows lower accuracy, while for devices from different vendors, it shows higher accuracy. Indeed, devices from different vendors differ much in terms of features and their values, and can therefore be differentiated easily, i.e., classified with higher accuracy. Conversely, devices from the same vendor share more features and values, so they are harder to differentiate. That is, when different devices share more feature values, ML algorithms tend to put them in the same class, and the prediction accuracy is low.

At the hardware layer, some solutions are proposed to fingerprint the network interface card (NIC), exploiting the IoT device clock skew [7, 15, 16] or radio-frequency signature [9, 10].
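The clock-skew idea can be made concrete with a short sketch: a device's clock drifts at a small, stable, device-specific rate relative to the observer's clock, and a least-squares fit of the observed offset over time recovers that rate. This is a toy illustration of the general technique, not code from the cited works; the 50 ppm drift value is invented for the demo.

```python
def clock_skew(local_times, remote_times):
    """Estimate clock skew (drift rate) between a device's clock and
    the observer's clock: the least-squares slope of the offset
    (remote - local) as a function of local time. A stable nonzero
    slope acts as a device-specific fingerprint."""
    offsets = [r - l for l, r in zip(local_times, remote_times)]
    n = len(local_times)
    mx = sum(local_times) / n
    my = sum(offsets) / n
    num = sum((x - mx) * (y - my) for x, y in zip(local_times, offsets))
    den = sum((x - mx) ** 2 for x in local_times)
    return num / den

# Synthetic device clock running 50 ppm fast, with a constant offset
local = [float(t) for t in range(0, 1000, 100)]
remote = [t * 1.00005 + 0.2 for t in local]
skew = clock_skew(local, remote)
```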
At the network layer, Johnny Cache [11] proposes a solution to fingerprint IoT devices based on network features, such as packet size and MAC address. Shahid et al. [17] apply wavelet analysis to incoming and outgoing traffic packets to classify IoT devices into different types. Gisdakis et al. [18] exploit packet arrival times for specific types of applications and apply ANN algorithms to classify IoT devices by device type. Franklin et al. [19] use Bayesian classification to fingerprint IoT devices' WiFi drivers.
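As an illustration of the kind of per-device network-layer features such classifiers consume, the sketch below aggregates mean packet size and mean inter-arrival time per device. The tuple layout and feature names are assumptions made for the example, not taken from the cited works.

```python
from collections import defaultdict
from statistics import mean

def device_features(packets):
    """Compute simple network-layer fingerprinting features per device.
    `packets` is an iterable of (device_id, timestamp, size) tuples."""
    by_dev = defaultdict(list)
    for dev, ts, size in packets:
        by_dev[dev].append((ts, size))
    feats = {}
    for dev, obs in by_dev.items():
        obs.sort()                       # order observations by timestamp
        times = [t for t, _ in obs]
        sizes = [s for _, s in obs]
        iats = [b - a for a, b in zip(times, times[1:])]
        feats[dev] = {
            "mean_pkt_size": mean(sizes),
            "mean_iat": mean(iats) if iats else 0.0,
        }
    return feats

# Toy traffic: a chatty camera versus a slow thermostat
traffic = [
    ("cam", 0.0, 900), ("cam", 0.1, 1100), ("cam", 0.2, 1000),
    ("thermo", 0.0, 60), ("thermo", 30.0, 60),
]
feats = device_features(traffic)
```

Feature vectors like these, one row per device or per traffic window, are what the classification algorithms mentioned above are trained on.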


Though several solutions have been proposed to fingerprint IoT devices using network traffic, to the best of our knowledge, this is the first research that investigates minimizing the number of features needed to fingerprint IoT devices while keeping high prediction quality.

3.

Problem Statement

Given an IoT system comprising a set of IoT devices that generate network traffic, we aim to predict the identity (fingerprints) of the connected IoT devices by performing network traffic analysis. Since IoT traffic is generally huge, our objective is to identify the most relevant features, as not all features are necessarily needed to fingerprint the IoT devices. Our goal here is to identify the optimal (i.e., minimal) subset of features that can fingerprint the IoT devices with high prediction quality. To do so, we use various ML algorithms and different feature subsets. Formally, we model the problem at hand as follows. Let M be a set of known ML algorithms, and F a set of features extracted from a given dataset D, from which different subsets with different cardinalities are created. We denote by fj the jth subset of features created from F. Our objective is to train different known ML algorithms, denoted mi ∈ M, on different subsets of features, fj ∈ P(F), to predict the fingerprints of IoT devices, where P(F) stands for the power set of F. Then, we select the minimal subset of features, denoted f∗, that yields the optimal prediction quality. The latter can be measured using one of the known prediction metrics, such as accuracy, precision, recall, etc. Formally, the problem at hand can be formulated as follows:

    (m∗, f∗) = argmax_{mi ∈ M, fj ∈ P(F)} Qp(mi, fj),          (1)

where (m∗, f∗) is the optimal pair of ML algorithm and feature subset, i.e., the pair that provides the optimal prediction quality, and Qp(mi, fj) is the prediction quality obtained by training the ML algorithm mi on the dataset D using only the subset of features fj.
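The exhaustive search implied by Eq. (1) can be sketched as follows. This is an illustrative Python sketch (the chapter's actual experiments use Weka), and `evaluate` is a hypothetical stand-in for training mi on D restricted to fj and computing a quality metric:

```python
# Hedged sketch of Eq. (1): try every candidate (algorithm, feature-subset)
# pair and keep the best-scoring one. `evaluate(name, subset)` is a
# placeholder, not part of the chapter's toolchain.
from itertools import combinations

def best_pair(algorithms, features, evaluate):
    best = (None, None, float("-inf"))
    for r in range(1, len(features) + 1):          # ascending size, so ties
        for subset in combinations(features, r):   # favor smaller subsets
            for name in algorithms:
                q = evaluate(name, subset)
                if q > best[2]:
                    best = (name, subset, q)
    return best  # (m*, f*, optimal prediction quality)
```

Since all 2^|F| subsets are enumerated, this brute force is only feasible for small feature sets; the chapter instead restricts the search to a hand-picked family of subsets (Table 3).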

Proposed Methodology

In this section, we present our methodology, which comprises three main steps: data pre-processing, data training, and data analysis and prediction.

4.1. Overview

To solve (1), we train different ML algorithms on different partitions of a dataset, each of which contains a subset of features. In ML terms, the problem at hand is a multi-class classification problem, the number of classes being the number of distinct IoT devices; each class is identified by the IoT device name. A variety of ML algorithms have been proposed in the literature, and we have selected those we believe are appropriate for our problem. We therefore train a limited number of selected ML algorithms. Similarly, we select a limited number of feature subsets, with which several partitions are created from the entire dataset. These partitions are then used for training and testing the different models (i.e., for each partition, 80% is used for training and 20% for testing). Our proposed system (shown in Figure 1) comprises three modules: data pre-processing, data training, and data analysis and prediction.
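The 80/20 partitioning described above can be sketched as follows; a minimal Python sketch, assuming Weka's random "Percentage split" can be approximated by shuffling with a fixed seed:

```python
# Illustrative 80% train / 20% test split of one partition's traces.
# The seed is an assumption for reproducibility, not Weka's behavior.
import random

def split_80_20(rows, seed=42):
    rows = list(rows)
    rng = random.Random(seed)
    rng.shuffle(rows)          # emulate random trace selection
    cut = int(0.8 * len(rows))
    return rows[:cut], rows[cut:]
```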

4.1.1. Data Pre-processing

In this step, our objective is to extract the network features from the dataset of network traces. To do so, we use an IoT network trace dataset that was used in [7]. The dataset is provided as a set of folders, each of which comprises a set of pcap files of a given IoT device. We merge all these pcap files in WireShark [20] for better analysis and readability. In fact, any tool that can read .pcap traces can be used, such as tcpdump, cloudshark, or sysdig [21].


Fig. 1. Proposed system flow for IoT device fingerprinting.

Table 1. List of IoT devices of the dataset used [7].

 1. Aria                17. HomeMaticPlug
 2. D-LinkCam           18. HueBridge
 3. D-LinkDayCam        19. HueSwitch
 4. D-LinkDoorSensor    20. iKettle2
 5. D-LinkHomeHub       21. Lightify
 6. D-LinkSensor        22. MAXGateway
 7. D-LinkSiren         23. SmarterCoffee
 8. D-LinkSwitch        24. TP-LinkPlugHS100
 9. D-LinkWaterSensor   25. TP-LinkPlugHS110
10. EdimaxCam1          26. WeMoInsightSwitch
11. EdimaxCam2          27. WeMoInsightSwitch2
12. EdimaxPlug1101W     28. WeMoLink
13. EdimaxPlug2101W     29. WeMoSwitch
14. EdnetCam1           30. WeMoSwitch2
15. EdnetCam2           31. Withings

The dataset we use, with a data size of 64.10 MB, comprises 16,366 traces (rows) and 31 different IoT devices. It is a labeled dataset, the labels being the IoT device types, such as D-LinkCam, HueSwitch, etc. Tables 1 and 2 show the list of IoT devices and the list of features of the dataset, respectively. Note that not all the features are found in every IoT device; this depends on the nature of the device and the communication protocol it uses.

Table 2. List of features of the dataset used [7] and their possible values.

Time: Timestamp, represented by sequential numbers.
Source: IP source or MAC address.
Destination: IP destination, MAC address, or broadcast.
Protocol: Communication protocol used, such as TCP, TLS, ARP, DHCP, DNS, etc.
Length: Packet size.
Encapsulation type: Physical-layer encapsulation type. One value exists in the dataset: "Ethernet".
HTTP: Time since request in the HTTP protocol (measured in seconds).
EAPOL: Key descriptor type in the EAPOL protocol. One key type: "EAPOL RSN Key".
UDP: Port number (e.g., 42, 50, 119, 308).
TCP: Port number (e.g., 8688, 9055, 29200).
DHCP: DHCP parameter request list, also known as "option code 55" [22].
Type: IPv4, IPv6, ARP, and 802.1X Authentication.
Expert Info-Group: Groups into which the expert information items are categorized: Assumptions, Checksum, Comment, Debug, Decryption, Deprecated, Malformed, Protocol, Reassemble, Request code, Response code, Security, Sequence, Uncoded.
Expert Info-Message: Short description of the expert information item, for example: "HTTP/1.1 200 OK"; "DNS query retransmission. Original request in frame 3035"; "Connection Termination (FIN)".
Expert Info-Severity: Severity level of the expert information item: Chat (information about usual workflow, e.g., TCP packet with the SYN flag set); Note (returned common error code, such as HTTP 404); Warn (warnings, e.g., an unusual error code like a connection problem); Error (serious problems, such as malformed packets).
Info: Information about the connection encoded as raw text, for example: "Standard query response 0x0000 A, cache flush 10.10.10.37 PTR, cache flush DCS-935L-B0C554255B0E.local SRV, cache flush 0 0 80 DCS-935L-B0C554255B0E.local TXT, cache flush PTR dhnap. tcp.local PTR D-Link HNAP Service. dhnap. tcp.local".
Device: The name of the IoT device (e.g., DLinkSensor, DlinkCam, SmarterCoffee).

All the features gathered from all the devices in the dataset, along with their different values, are listed in Table 2. Next, we clean the data by removing the empty columns that Wireshark usually adds, and export the remaining traces into a csv file that is used later in the training phase.
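The cleaning step can be sketched as below; an illustrative Python sketch (the chapter performs this step manually on Wireshark's CSV export), where a column is dropped only if it is empty in every row:

```python
# Hedged sketch: drop columns that are empty in every row, keeping the
# rest for training. Header and row layout are assumptions.
def drop_empty_columns(header, rows):
    keep = [j for j in range(len(header))
            if any(row[j] not in ("", None) for row in rows)]
    return ([header[j] for j in keep],
            [[row[j] for j in keep] for row in rows])
```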

4.1.2. Data training

We use Weka [23], a platform for conducting experiments with various ML algorithms, designed and developed at the University of Waikato [24]. First, the csv file obtained from Wireshark is converted into the arff format [25], Weka's native file format, which explicitly declares attribute names and types.
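For illustration, a minimal ARFF file for this dataset might look as follows; the attribute names are taken from Table 2, while the nominal value lists and the data row are hypothetical abbreviations:

```text
% Hypothetical excerpt: real files list all 17 features and 31 devices.
@relation iot_fingerprint

@attribute Time      numeric
@attribute Source    string
@attribute Protocol  {TCP,UDP,ARP,DHCP,DNS}
@attribute Length    numeric
@attribute Device    {D-LinkCam,HueSwitch,SmarterCoffee}

@data
1,'192.168.0.10',DHCP,342,D-LinkCam
```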

Table 3. Different subsets of features used in the experiments.

S1:  Protocol, Length, Encapsulation Type, HTTP, EAPOL, UDP, TCP, DHCP, Type
S2:  Encapsulation Type, HTTP, EAPOL, UDP, TCP, DHCP, Type
S3:  HTTP, EAPOL, UDP, TCP, DHCP, Type
S4:  EAPOL, UDP, TCP, DHCP, Type
S5:  UDP, TCP, DHCP, Type
S6:  TCP, DHCP, Type
S7:  DHCP, Type, Info
S8:  Type
S9:  Expert Info-Group, Expert Info-Message, Expert Info-Severity, Info, Device
S10: Expert Info-Group, Expert Info-Message, Expert Info-Severity
S11: Info
S12: Source, Destination
S13: Time
S14: All network features
S15: Time, Source, Destination, Info

Then, the dataset, stored in arff format, is split into different partitions, each of which contains only a subset of features. The selected feature subsets are shown in Table 3. These subsets were created after ranking all the features and identifying the highly ranked ones, which together provide better prediction quality. The best-ranked feature subset was S15 = {Time, Source, Destination, Info}. Although the other features are low-ranked and cannot provide good quality on their own, they cannot simply be ignored or discarded: combined with other low- or high-ranked features, they may improve the prediction quality. The remaining feature subsets were created for that purpose. Since it is not feasible to create and test all 2^17 possible feature subsets, we created those we believe can perform competitively with S15, including the best-ranked features. Then, each partition is split into two subpartitions: 80% is used for training the model and 20% for testing the obtained model. Note that the traces are selected randomly by Weka when creating the two subpartitions.
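The ranking step can be illustrated with information gain, one common attribute-ranking criterion (the chapter does not specify which ranker was used, so this choice is an assumption): a feature scores highly when knowing its value sharply reduces uncertainty about the device label.

```python
# Hedged sketch of information-gain feature ranking on toy data.
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(values, labels):
    # entropy of labels minus expected entropy after splitting on value
    n = len(labels)
    by_value = {}
    for v, y in zip(values, labels):
        by_value.setdefault(v, []).append(y)
    rem = sum(len(ys) / n * entropy(ys) for ys in by_value.values())
    return entropy(labels) - rem

def rank_features(columns, labels):
    # columns: dict feature name -> list of values (one per trace)
    gains = {f: info_gain(v, labels) for f, v in columns.items()}
    return sorted(gains, key=gains.get, reverse=True)
```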


As explained earlier, in this chapter we select four ML algorithms, namely, J48, Random Forest, BayesNet, and KStar. These algorithms are described as follows:

• J48 Algorithm: An open-source Java implementation of the C4.5 algorithm, developed by the Weka team. It is a decision tree-based algorithm that implements an iterative dichotomiser [26] and is an extension of the ID3 algorithm [3]. It creates univariate decision trees using a divide-and-conquer approach.

• KStar Algorithm: An instance-based classifier, which classifies data based on pre-classified examples, under the assumption that similar instances yield similar classifications. Instance similarity is calculated using an entropy-based distance function [4], based on the probability of transforming one instance into another by randomly choosing among all possible transformations [27].

• Random Forest Algorithm: In this algorithm [5], many decision trees are created from a random seed while training on the data. The class prediction combines the prediction models of all the trees. The number of trees produced affects the accuracy of the result: the more trees, the better the training and, eventually, the prediction results.

• BayesNet Algorithm: A probabilistic classifier based on Bayesian decision theory [6]. Data are represented as a directed acyclic graph (DAG), in which each node is a random variable taking a discrete or continuous value, and the edges represent conditional dependencies between nodes. For instance, when two nodes a and b are directly linked (a → b), node a has a direct influence on node b. The probabilistic dependencies between the features are used to train the model.
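To make the BayesNet idea concrete, the sketch below implements naive Bayes, the simplest Bayesian-network structure, in which every feature node depends only on the class node. This is an illustrative simplification, not Weka's BayesNet; data and names are hypothetical.

```python
# Hedged sketch: categorical naive Bayes with Laplace smoothing.
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Count class priors and per-class feature-value frequencies."""
    priors = Counter(labels)
    counts = defaultdict(Counter)  # (feature index, class) -> value counts
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            counts[(i, y)][v] += 1
    return priors, counts

def predict_nb(model, row):
    """Pick the class maximizing P(class) * prod_i P(value_i | class)."""
    priors, counts = model
    total = sum(priors.values())
    best, best_p = None, -1.0
    for y, n in priors.items():
        p = n / total
        for i, v in enumerate(row):
            c = counts[(i, y)]
            p *= (c[v] + 1) / (sum(c.values()) + len(c) + 1)  # smoothing
        if p > best_p:
            best, best_p = y, p
    return best
```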

4.1.3. Data analysis and prediction

To assess the quality of the models obtained in the training phase, each model is tested against the 20% subpartition, and different metrics are calculated, including precision and recall. This process is repeated for each partition, which is defined by a different subset of features. Recall that our objective is to identify the minimal subset of features that provides the optimal prediction quality. The obtained experimental results are presented in Section 5.

5. Experimentation

To run our system and validate the obtained models, we use a PC equipped with a dual-core Intel i7-3540M processor clocked at 3.00 GHz and 8 GB of memory. The schema of the dataset used, which can be found in [7], is presented in Table 4. We used the Weka platform, which can be seen as an ML toolbox, offering different ML algorithms and different parameters to tune each algorithm. Basically, in Weka, the following four options are available for training models:

• Training dataset: when this option is selected, the entire dataset is used for both training and testing. That is, the entire dataset is used to train the model, which is in turn tested against the entire dataset.

• Supplied test set: with this option, one can select one dataset for training the model and another dataset for testing it. That is, two datasets are used, one to train the model and the other to test the obtained model.

Table 4. Schema of the dataset used in the experiments [7].

Format:                    pcap
Total number of devices:   31
Total number of features:  17
Total data instances:      54,593
Data size:                 13.9 MB
Start date:                April 15, 2016
End date:                  September 13, 2016


• Percentage split: a single dataset is split into two partitions; one partition is used for training the model, while the other is used to test the obtained model.

• Cross validation: the entire dataset is split into n folds. To train the model, n − 1 folds are used for training, while the remaining fold is used to test the obtained model. This process is repeated n times; each time, a different fold is selected for testing, and the other n − 1 folds are used to train a new model. Furthermore, Weka adds a last model, trained and tested on the entire dataset. Finally, all the obtained models and results are averaged.

In our experiments, we split the dataset manually into various partitions, using the feature subsets already set up (see Table 3). These partitions are then stored in different arff files. Later, these files are used as different datasets and submitted separately for training using the Percentage split option, with 80% for training and 20% for testing.

As mentioned earlier, four ML algorithms are trained on different partitions extracted from a dataset that can be found in [7]. These partitions are created based on different subsets of features. The algorithms are J48, KStar, BayesNet, and Random Forest, and the feature subsets are shown in Table 3. Then, different prediction quality metrics are calculated to identify the optimal pair of ML algorithm and feature subset. The obtained results are presented in the next sections.
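The n-fold cross-validation option can be sketched as follows; an illustrative Python sketch of the standard procedure, not Weka's implementation (the round-robin fold assignment is an arbitrary choice):

```python
# Hedged sketch: yield (train, test) splits, each fold testing once.
def k_folds(rows, n):
    folds = [rows[i::n] for i in range(n)]  # round-robin assignment
    for i in range(n):
        test = folds[i]
        train = [r for j, f in enumerate(folds) if j != i for r in f]
        yield train, test
```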

5.1. Precision and recall

First, we calculate the precision and recall obtained with the four algorithms and the different subsets of features. Precision is based on true positives (TP) and false positives (FP), while recall is based on true positives and false negatives (FN). The first measure gives an idea of the number of devices that are predicted correctly out of all the devices predicted, correctly and wrongly. The second measure indicates the number of devices that are predicted correctly out of the total number of devices that should be predicted (i.e., those predicted correctly and those missed). Both measures are presented in Eqs. (2) and (3), respectively:

    Precision = TP / (TP + FP),     (2)
    Recall = TP / (TP + FN).        (3)
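Eqs. (2) and (3) translate directly to code; a minimal sketch with guards for empty denominators (the per-class TP/FP/FN tallies would come from the 20% test subpartition):

```python
# Direct transcription of Eqs. (2) and (3).
def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0
```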

The results for these two measures are presented in Figures 2–5, and a summary is given in Table 5, in which the optimal results are shaded. In terms of precision, from these figures, one can see that the first best subsets are S14, S15, S14, and S11, as obtained with J48, BayesNet, KStar, and Random Forest, respectively. The second best subsets are S15, S14, S15, and S12, obtained with the same algorithms, respectively. The ranking of these best subsets is summarized in Table 6, in which the optimal precision results are shaded. We clearly observe that BayesNet provides better results for the three best feature subsets. Furthermore, the optimal feature subset, based

Fig. 2. Precision and recall of the prediction, as obtained by J48 algorithm and the different subsets of features.


Fig. 3. Precision and recall of the prediction, as obtained by KStar algorithm and the different subsets of features.

Fig. 4. Precision and recall of the prediction, as obtained by BayesNet algorithm and the different subsets of features.


Fig. 5. Precision and recall of the prediction, as obtained by RandomForest algorithm and the different subsets of features.

Table 5. Summary of prediction results, as measured using precision and recall metrics, for the different algorithms and feature subsets.

Subset   J48 Prec./Rec.   BayesNet Prec./Rec.   KStar Prec./Rec.   Random Forest Prec./Rec.
S1       95.77 / 63.40    95.31 / 58.90         94.11 / 57.50      95.36 / 61.60
S2       84.90 / 46.10    93.53 / 57.80         91.18 / 49.60      86.36 / 47.50
S3       84.90 / 46.10    93.53 / 57.80         91.61 / 50.20      84.53 / 44.80
S4       84.90 / 46.10    93.34 / 57.50         86.39 / 43.80      86.80 / 49.30
S5       84.90 / 46.10    93.34 / 57.50         86.39 / 43.80      87.43 / 49.40
S6       85.08 / 46.20    85.61 / 46.40         80.08 / 37.80      85.25 / 46.80
S7       60.60 / 41.20    60.10 / 39.70         59.10 / 37.10      85.40 / 46.20
S8       51.41 / 20.00    51.41 / 20.00         50.00 / 19.60      51.55 / 20.00
S9       41.20 / 41.20    77.41 / 39.40         61.10 / 41.60      61.26 / 27.20
S10      61.26 / 27.20    61.92 / 26.50         57.76 / 24.20      61.26 / 27.20
S11      60.60 / 41.20    76.12 / 39.20         66.08 / 30.00      99.09 / 87.20
S12      93.30 / 92.50    93.50 / 92.40         93.10 / 91.60      98.39 / 91.70
S13      85.20 / 84.00    95.17 / 78.90         50.00 / 19.60      76.04 / 42.20
S14      95.90 / 95.00    97.80 / 96.80         98.10 / 98.00      92.95 / 71.20
S15      88.00 / 86.90    99.10 / 99.00         94.60 / 93.30      71.81 / 35.40

Note: The shaded cells show the optimal measures.


Table 6. Optimal subsets, as obtained with the different algorithms and measured with the precision metric.

Algorithm       1st best subset   2nd best subset   3rd best subset
J48             S14 (95.90%)      S15 (95.77%)      S12 (93.30%)
BayesNet        S15 (99.10%)      S14 (97.80%)      S1 (95.31%)
KStar           S14 (98.10%)      S15 (94.60%)      S1 (94.11%)
Random Forest   S11 (99.09%)      S12 (98.39%)      S1 (95.36%)

Note: The shaded cells show the optimal precision values.

Table 7. Optimal subsets, as obtained with the different algorithms and measured with the recall metric.

Algorithm       1st best subset   2nd best subset   3rd best subset
J48             S14 (95.00%)      S12 (92.50%)      S15 (86.90%)
BayesNet        S15 (99.00%)      S14 (96.80%)      S12 (92.40%)
KStar           S14 (98.00%)      S15 (93.30%)      S12 (91.60%)
Random Forest   S12 (91.70%)      S11 (87.20%)      S14 (71.20%)

Note: The shaded cells show the optimal recall values.

on precision only, is S15, which comprises only four features. It outperforms S14, which contains all the features. Regarding the recall metric, we observe that the first best subsets are S14, S15, S14, and S12, as obtained with J48, BayesNet, KStar, and Random Forest, respectively. Similarly, the rankings of these subsets, based on the recall metric, are summarized in Table 7, in which the optimal recall results are shaded. Again, we observe that the optimal recall results are provided by BayesNet for the best feature subsets. Moreover, the optimal subset, yielding the highest recall, is S15. This shows once more that we can use only the four features defined in S15, instead of all the features defined in S14. By analyzing the results obtained with the precision and recall metrics, we observe that the optimal feature subset is S15, obtained with the BayesNet algorithm. Moreover, based on precision and recall, the second best feature subset obtained with BayesNet is better than the first best feature subsets obtained with the other algorithms. Therefore, based on precision and recall, the pair (S15, BayesNet) is far better than all the other feature subset/algorithm combinations.

5.2. F1-score (harmonic mean)

To measure the balance between precision and recall, we use the F1-score metric (also called the harmonic mean), which is also used to assess the accuracy of the experiments. Without loss of generality, we assume that the precision and recall metrics are balanced, i.e., there is no preference between them. Thus, the F1-score is given by Eq. (4):

    F1-score = 2 · (precision · recall) / (precision + recall).     (4)
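Eq. (4) as a one-line sketch, checked against one reported pair from Table 5 (BayesNet on S15 reaches precision 99.10% and recall 99.00%, whose harmonic mean is about 99.05%, matching the 99% reported in Table 8):

```python
# Direct transcription of Eq. (4), with a guard for the degenerate case.
def f1_score(precision, recall):
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```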

The obtained results are presented in Figures 6, 7, 8, and 9. According to these figures, the best feature subsets are S14, S15, S14, and S12, obtained using J48, BayesNet, KStar, and Random Forest, respectively. Similarly, the optimal subsets obtained with the different algorithms are ranked based on the F1-score results and presented in Table 8. Again, the highest harmonic-mean result obtained, which is 99%,

Fig. 6. Prediction results, as measured using the harmonic metric for the J48 algorithm and the different feature subsets.


Fig. 7. Prediction results, as measured using the harmonic metric for the BayesNet algorithm and the different feature subsets.

Fig. 8. Prediction results, as measured using the harmonic metric for the KStar algorithm and the different feature subsets.

is obtained with BayesNet and the S15 feature subset. This confirms once again that, using BayesNet and the subset S15, we can fingerprint IoT devices with high prediction quality. This is a remarkable result, as only four features (time, source, destination, and info) are needed to fingerprint IoT devices with high accuracy,


Fig. 9. Prediction results, as measured using the harmonic metric for the Random Forest algorithm and the different feature subsets.

Table 8. Optimal subsets, as obtained with the different algorithms and measured with the F1-score metric.

Algorithm       1st best subset   2nd best subset   3rd best subset
J48             S14 (95%)         S12 (93%)         S15 (87%)
BayesNet        S15 (99%)         S14 (97%)         S12 (93%)
KStar           S14 (98%)         S15 (94%)         S12 (92%)
Random Forest   S12 (95%)         S11 (93%)         S14 (81%)

Note: The shaded cells show the optimal harmonic values.

instead of 17 features. Even if Random Forest is used, only two features (i.e., S12: source and destination) suffice to fingerprint IoT devices, instead of 17, though with slightly lower accuracy. On a final note, the obtained results are appealing but should be used with caution, as more experiments need to be conducted with other ML algorithms and other datasets.

5.3. Complexity

In this section, the time spent in training the different models is recorded and presented in Table 9. As one can see, the optimal times required to train the different models are 0.02s, 0.02s, 0.00s, and

Table 9. Temporal complexity (in seconds) of training models using the different algorithms and feature subsets.

Subset   J48     BayesNet   KStar   Random Forest
S1       2.15    0.13       0.00    33.10
S2       25.56   0.25       0.05    105.33
S3       18.07   0.51       0.01    36.49
S4       14.96   0.09       0.01    3465.36
S5       13.25   0.07       0.01    60.63
S6       8.15    0.07       0.00    42.30
S7       1.92    0.15       0.00    450.73
S8       0.02    0.05       0.00    0.44
S9       2.21    0.09       0.01    8424.91
S10      18.27   0.02       0.00    5435.15
S11      2.99    0.14       0.00    3087.59
S12      0.04    0.02       0.00    1.03
S13      1.02    0.04       0.01    605.83
S14      3.41    1.18       0.00    9.30
S15      1.53    0.05       0.01    2.55

Note: The shaded cells show the optimal time complexity of subsets in seconds.

0.44s, obtained using S8 with J48, S10 (and S12) with BayesNet, several subsets with KStar, and S8 with Random Forest, respectively. The times recorded by Weka for KStar are below measurement resolution for several feature subsets and are reported as zero seconds; this is expected, as KStar is a lazy, instance-based learner that defers most of its computation to prediction time. Overall, the best training times are comparable, except those of the Random Forest algorithm, which are substantially higher. The time obtained with BayesNet and S15 is 0.05s, which is very close to the optimal times obtained with S10 and S12.
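The wall-clock training times in Table 9 were recorded by Weka; outside Weka, a comparable measurement could be sketched as follows (`fn` is a placeholder for a training call, not part of the chapter's toolchain):

```python
# Hedged sketch: wall-clock timing around an arbitrary training call.
import time

def timed(fn, *args):
    t0 = time.perf_counter()
    out = fn(*args)
    return out, time.perf_counter() - t0
```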

6. Conclusion

IoT devices are nowadays scattered over wide areas, e.g., smart cities, industry, farming, etc. They can be exploited to perpetrate damaging attacks on customers and network infrastructures alike. To protect their assets and customers, network operators collect data generated by IoT devices to understand and monitor their behavior. When an IoT device is down, is compromised, or a non-authorized one connects to the network, it must be detected and appropriate measures taken. For instance, a digital forensic investigation is conducted to identify the compromised IoT devices and, ultimately, track the origin of the attack. This cannot be done remotely, as the devices cannot be trusted in such environments; therefore, the IoT-generated traffic is exploited instead. In this chapter, we proposed an efficient solution to fingerprint IoT devices using collected IoT network traffic and ML algorithms. We identified the optimal (minimal) set of features that predicts the identities of the IoT devices connected to the network. Thus, instead of the entire IoT traffic, a small subset of that traffic can be used, which drastically reduces the digital forensic investigation time. The obtained results showed that using only four features, instead of 17, IoT device fingerprints can be predicted with high accuracy. Indeed, the prediction measures obtained are 99.10%, 99.00%, and 99.00% for precision, recall, and harmonic mean, respectively. These results outperformed those obtained using the entire traffic.

References

1. IoT Analytics. State of the IoT 2018: Number of IoT devices now at 7B-Market accelerating. https://iot-analytics.com/state-of-the-iot-update-q1-q2-2018-number-of-iot-devices-now-7b/. Accessed on March 15, 2020.
2. The Verge. How an army of vulnerable gadgets took down the web today. https://www.theverge.com/2016/10/21/13362354/dyn-dns-ddos-attack-cause-outage-status-explained. Accessed on March 15, 2020.
3. N. Bhargava, G. Sharma, R. Bhargava, and M. Mathuria. Decision tree analysis on J48 algorithm for data mining. International Journal of Advanced Research in Computer Science and Software Engineering (2013).
4. J. G. Cleary and L. E. Trigg. K*: An instance-based learner using an entropic distance measure. In 12th International Conference on Machine Learning, pp. 108–114 (1995).
5. L. Breiman. Random forests. Machine Learning, 45(1), 5–32 (2001).
6. H. Guo and W. Hsu. A survey of algorithms for real-time Bayesian network inference. In Joint Workshop on Real Time Decision Support and Diagnosis Systems (2002).


7. M. Miettinen, S. Marchal, I. Hafeez, N. Asokan, A. Sadeghi, and S. Tarkoma. IoT SENTINEL: Automated device-type identification for security enforcement in IoT. In 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), pp. 2177–2184 (June 2017).
8. StackExchange. Increasing number of features results in accuracy drop but prec/recall increase (2012). https://stats.stackexchange.com/questions/18815/increasing-number-of-features-results-in-accuracy-drop-but-prec-recall-increase
9. V. Mora-Afonso, P. Caballero-Gil, and J. Molina-Gil. Strong authentication on smart wireless devices. In Second International Conference on Future Generation Communication Technologies (FGCT 2013), pp. 137–142 (2013).
10. C.-J. M. Liang, B. F. Karlsson, N. D. Lane, F. Zhao, J. Zhang, Z. Pan, Z. Li, and Y. Yu. SIFT: Building an Internet of safe things. In Proceedings of the 14th International Conference on Information Processing in Sensor Networks, IPSN '15, pp. 298–309, Association for Computing Machinery, New York (2015).
11. J. Cache. Fingerprinting 802.11 implementations via statistical analysis of the duration field. Uninformed, vol. 5 (2006).
12. P. Robyns, E. Marin, W. Lamotte, P. Quax, D. Singelée, and B. Preneel. Physical-layer fingerprinting of LoRa devices using supervised and zero-shot learning. In Proceedings of the 10th ACM Conference on Security and Privacy in Wireless and Mobile Networks, WiSec '17, pp. 58–63, ACM, New York (2017).
13. H. Larochelle, D. Erhan, and Y. Bengio. Zero-data learning of new tasks. In AAAI, vol. 1, p. 3 (2008).
14. Y. Xian, B. Schiele, and Z. Akata. Zero-shot learning - the good, the bad and the ugly. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4582–4591 (July 2017).
15. J. L. Hernández-Ramos, M. P. Pawlowski, A. J. Jara, A. F. Skarmeta, and L. Ladid. Toward a lightweight authentication and authorization framework for smart objects. IEEE Journal on Selected Areas in Communications, 33(4), 690–702 (June 2015). 10.1109/JSAC.2015.2393436.
16. C. T. Zenger, M. Pietersz, J. Zimmer, J.-F. Posielek, T. Lenze, and C. Paar. Authenticated key establishment for low-resource devices exploiting correlated random channels. Computer Networks, 109, 105–123 (2016). Special issue on Recent Advances in Physical-Layer Security.
17. S. Raza, L. Wallgren, and T. Voigt. SVELTE: Real-time intrusion detection in the Internet of Things. Ad Hoc Networks, 11(8), 2661–2674 (2013). ISSN 1570-8705.
18. S. Gisdakis, T. Giannetsos, and P. Papadimitratos. SHIELD: A data verification framework for participatory sensing systems. In Proceedings of the 8th ACM Conference on Security & Privacy in Wireless and Mobile Networks, WiSec '15, pp. 1–12, ACM, New York (2015). 10.1145/2766498.2766503.

19. J. Franklin, D. McCoy, P. Tabriz, V. Neagoe, J. Van Randwyk, and D. Sicker. Passive data link layer 802.11 wireless device driver fingerprinting. In Proceedings of the 15th Conference on USENIX Security Symposium, USENIX-SS'06, USENIX Association, Berkeley (2006).
20. Wireshark. https://www.wireshark.org/. Accessed on January 30, 2020.
21. TechWiser. 6 best Wireshark alternatives for Windows and macOS. https://techwiser.com/wireshark-alternatives-for-windows-and-macos/. Accessed on January 30, 2020.
22. Network Working Group. DHCP options and BOOTP vendor extensions, RFC 2132. https://tools.ietf.org/html/rfc2132. Accessed on February 12, 2021.
23. E. Frank, M. A. Hall, and I. H. Witten. The WEKA Workbench. Online appendix for "Data Mining: Practical Machine Learning Tools and Techniques", 4th edn. Morgan Kaufmann, Burlington (2016).
24. M. Hall, E. Frank, G. Holmes, B. Pfahringer, and I. H. Witten. The WEKA data mining software: An update. SIGKDD Explorations, 11, 10–18 (2009).
25. The University of Waikato. Attribute-relation file format (ARFF). https://www.cs.waikato.ac.nz/ml/weka/arff.html. Accessed on February 7, 2021.
26. Universitat Oberta de Catalunya. J48 decision tree. http://data-mining.business-intelligence.uoc.edu/home/j48-decision-tree. Accessed on September 30, 2019.
27. A. Bahri, V. Sugumaran, and S. B. Devasenapati. Misfire detection in IC engine using Kstar algorithm. CoRR abs/1310.3717 (2013). http://arxiv.org/abs/1310.3717.


© 2023 World Scientific Publishing Company. https://doi.org/10.1142/9789811273209_0010

Chapter 10

Conclusion

Suryadipta Majumdar∗,‡ and Paria Shirani†,§

∗Concordia University, 1455 de Maisonneuve Blvd. West, S-EV 007.640, Montréal, QC H3G 1M8, Canada
†School of Electrical Engineering and Computer Science, University of Ottawa, 800 King Edward Av., Ottawa, ON K1N 6N5, Canada
‡[email protected]
§[email protected]

There has been a significant increase in the adoption of newer technologies such as cloud, IoT, virtual networks, etc. over the last few years. Security, however, has lagged behind, as evidenced by the increasing number of attacks that exploit the very nature of these technologies (large-scale, dynamic, multi-layered, heterogeneous). As a result, the accountability and transparency of these devices and their operations often become questionable. Therefore, it is essential to enable digital investigation capabilities in these technologies. Nevertheless, existing digital forensics methods often fall short of addressing the specific challenges of these technologies, as noted in the following:

• First, existing security standards (e.g., NIST 8228, OWASP IoT security guidance) are intended more as high-level recommendations for programmers and practitioners than for conducting automated digital forensic investigations. As a result, it is infeasible


to simply apply those recommendations as the means to collect evidence for investigation.

• Second, even after identifying evidence sources and collecting evidence, many low-end devices (such as IoT devices) are not capable of hosting the investigation process themselves, as they have limited computation and storage capacity to store forensic data and to execute existing forensic tools.

• Third, forensics becomes more challenging due to the limited logging support in many of today's technologies (e.g., IoT applications). In this context, conducting only traditional log-based investigation becomes insufficient.

In summary, there is a need for an automated digital forensics process for each of these new technologies, one that can overcome all these technology-specific challenges. In this book, we presented various mechanisms to enable digital forensic procedures in different emerging technologies (e.g., modern networks, cloud computing, IoT). To this end, we first reviewed the literature and enumerated the existing challenges in digital forensics for emerging technologies. Then, we conducted a thorough analysis of file recovery tools against NIST guidelines to evaluate their efficacy. Afterwards, we presented several mechanisms to perform large-scale investigation of network anomalies using machine-learning algorithms, to automate and accelerate the entire forensic procedure. Moreover, we extended those methods to support digital forensics in the context of cloud computing and modern networking (e.g., software-defined networks). Additionally, we provided an investigation framework for IoT devices, which can collect the necessary evidence and build device fingerprints to assist the forensic procedure. Thus, this book addresses several basic limitations that digital forensic researchers and practitioners currently face in various emerging technologies.