
SECURITY AND PRIVACY IN THE INTERNET OF THINGS: CHALLENGES AND SOLUTIONS

Ambient Intelligence and Smart Environments The Ambient Intelligence and Smart Environments (AISE) book series presents the latest research results in the theory and practice, analysis and design, implementation, application and experience of Ambient Intelligence (AmI) and Smart Environments (SmE). Coordinating Series Editor: Juan Carlos Augusto Series Editors: Emile Aarts, Hamid Aghajan, Michael Berger, Marc Bohlen, Vic Callaghan, Diane Cook, Sajal Das, Anind Dey, Sylvain Giroux, Pertti Huuskonen, Jadwiga Indulska, Achilles Kameas, Peter Mikulecký, Andrés Muñoz Ortega, Albert Ali Salah, Daniel Shapiro, Vincent Tam, Toshiyo Tamura, Michael Weber

Volume 27

Recently published in this series:
Vol. 26. A. Muñoz, S. Ouhbi, W. Minker, L. Echabbi and M. Navarro-Cía (Eds.), Intelligent Environments 2019 – Workshop Proceedings of the 15th International Conference on Intelligent Environments
Vol. 25. M. Vega-Barbas and F. Seoane (Eds.), Transforming Ergonomics with Personalized Health and Intelligent Workplaces
Vol. 24. A. Muñoz and J. Park (Eds.), Agriculture and Environment Perspectives in Intelligent Systems
Vol. 23. I. Chatzigiannakis, Y. Tobe, P. Novais and O. Amft (Eds.), Intelligent Environments 2018 – Workshop Proceedings of the 14th International Conference on Intelligent Environments
Vol. 22. C. Analide and P. Kim (Eds.), Intelligent Environments 2017 – Workshop Proceedings of the 13th International Conference on Intelligent Environments
Vol. 21. P. Novais and S. Konomi (Eds.), Intelligent Environments 2016 – Workshop Proceedings of the 12th International Conference on Intelligent Environments
Vol. 20. W. Chen et al. (Eds.), Recent Advances in Ambient Assisted Living – Bridging Assistive Technologies, e-Health and Personalized Health Care
Vol. 19. D. Preuveneers (Ed.), Workshop Proceedings of the 11th International Conference on Intelligent Environments
Vol. 18. J.C. Augusto and T. Zhang (Eds.), Workshop Proceedings of the 10th International Conference on Intelligent Environments
Vol. 17. J.A. Botía and D. Charitos (Eds.), Workshop Proceedings of the 9th International Conference on Intelligent Environments

ISSN 1875-4163 (print)
ISSN 1875-4171 (online)

Security and Privacy in the Internet of Things: Challenges and Solutions

Edited by

José Luis Hernández Ramos European Commission, Joint Research Centre

and

Antonio Skarmeta University of Murcia, Department of Information and Communications Engineering

Amsterdam • Berlin • Washington, DC

© 2020 The authors and IOS Press. All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without prior written permission from the publisher. ISBN 978-1-64368-052-1 (print) ISBN 978-1-64368-053-8 (online) Library of Congress Control Number: 2020930532 doi: 10.3233/AISE27 Publisher IOS Press BV Nieuwe Hemweg 6B 1013 BG Amsterdam Netherlands fax: +31 20 687 0019 e-mail: [email protected] For book sales in the USA and Canada: IOS Press, Inc. 6751 Tepper Drive Clifton, VA 20124 USA Tel.: +1 703 830 6300 Fax: +1 703 830 2300 [email protected]

LEGAL NOTICE The publisher is not responsible for the use which might be made of the following information. PRINTED IN THE NETHERLANDS


Foreword

Christian Wilk
Research Executive Agency

The roots of the idea of the Internet of Things – even though first references to such an idea already appeared in the 1960s – can be traced back to a vision described by Mark Weiser in the early 1990s in his seminal article 'The Computer for the 21st Century'. In it he described a scenario, which he called 'ubiquitous computing', where computers would vanish into the background, becoming so pervasive and unobtrusive that they would basically become invisible and ubiquitous. Such a network of sensors and processors would be permanently aware of the actors in its vicinity, and would react in a fully context-aware manner to each need expressed. Moving from this scenario, which still had the human user and their needs at the center of attention, to a scenario where devices would communicate independently of human intervention, led to the term machine-to-machine communication. Going even a step further and taking all these devices, independent of their focus on human or machine communication, and connecting them to the internet led to the term as we know it now, the Internet of Things (IoT). The IoT could be defined as any networked thing equipped with the ability to generate, store, and exchange data, and in some cases also to act on data, thus being able to sense, to interact with and to change its environment actively. This could be anything from tiny sensors embedded in moving vehicles, voice-activated loudspeakers, wearables, actuators and operational technology in industrial settings, to medical devices and implants. This new form of seamless connectivity has many applications across various industries, ranging from smart cities, smart grids for energy management, intelligent transportation, environmental monitoring, infrastructure management, and medical and healthcare systems, to building and home automation. Within a business context where competition is mainly driven by lowering costs, and in combination with several constraints that IoT devices face, such as limited computing power and battery lifetime, security considerations are not the most important design feature for connected devices. And particularly in the industrial sector, legacy devices which may date back to the days when connectivity was very limited can, when integrated into larger computer networks, create risks that the original developers never anticipated. These potentially severe implications for the security and safety of people make the security and safety of IoT building blocks a paramount issue. Furthermore, a major barrier to the uptake of IoT on a larger scale is the lack of trust.


Building trust into the IoT based on robust cybersecurity features is a precondition for exploiting its numerous potential benefits and for the realization of Europe's Digital Single Market. To ensure a minimum level of interoperability, security and assurance, the European Commission issued a Common Cybersecurity Strategy for the European Union in 2013 (JOIN(2013) 1), in which the term machine-to-machine communication was used for the first time, in the context of automated water sprinklers, to refer to the nascent field of IoT. The Common Cybersecurity Strategy for the EU kicked off the preparatory work on several EU cybersecurity policies which over the following years became legal acts directly relevant and applicable to the IoT domain:
1. In August 2016 the NIS Directive (2016/1148) entered into force, with member states having to transpose it into national law by May 2018. The NIS Directive has three parts:
   a. Capacity building: EU member states must possess minimum capabilities, adopt a cybersecurity strategy, and establish a single point of contact for cybersecurity issues (CSIRT).
   b. Critical infrastructure: operators of essential services (critical sectors such as energy, transportation, water, healthcare, and finance) have to adopt a culture of risk management and have to comply with security and notification requirements.
   c. Cooperation: in order to build trust and confidence, member states shall collaborate across borders, and a mechanism shall be put in place for the exchange of security-related information, the sharing of incident information and best practices (CSIRTs network).
Even though neither the term IoT itself nor its related terms are directly mentioned in the Directive, the Directive directly affects the sectors with the highest potential use for (industrial) IoT.
2. The General Data Protection Regulation (GDPR) (2016/679) was released in April 2016 and entered into force on 25 May 2018. The GDPR aims to increase the control of individuals over their personal data and to unify data protection laws across the EU. It introduces limitations to the purpose and scope of personal data collection and processing. It also governs the transfer of personal data outside the EU, and introduces notification requirements in the case of a security breach affecting personal data.
3. The latest addition to the portfolio of EU legislation in the area of cybersecurity was the Cybersecurity Act (2019/881), which entered into force on 27 June 2019 and complements the NIS Directive. It prominently mentions the Internet of Things on several occasions. It consists of two main parts:
   a. Reinforcing the European Union Agency for Cybersecurity (ENISA) by giving it a permanent mandate and strengthening its role.
   b. Establishing a European cybersecurity certification framework for ICT products, services and processes.
It is against this evolving legal background that research and development projects funded within the EU's Horizon 2020 programme are trying to explore available options and possible approaches to address the security and privacy issues of the Internet of Things.


It is a great pleasure to see such a wide cross-section of projects presented in this book. The spectrum ranges from the secure management of personal data and the specific challenges of IoT with respect to the GDPR, through access control within a highly dynamic IoT environment and increasing trust with distributed ledger technologies, to new cryptographic approaches as a countermeasure for side-channel attacks, and the vulnerabilities of IoT-based ambient assisted living systems. Security and safety of the Internet of Things will remain high on the agenda of policymakers for the foreseeable future. Even more so when moving towards the internet of nano-things, when things will become literally invisible to the human eye and will be able to penetrate living things unnoticed. Together with the convergence of the physical and biological realms through nanotechnology and synthetic biology, this will create an internet of living things which will blur the boundary between biological and cyber risks. The need for proactive, forward-looking policymaking, moving away from the current reactive approach, will therefore become even more important, as policy development cycles and technology development cycles will presumably remain as decoupled and out of sync as they have been in the past.


Introduction

Enrico DEL RE
University of Florence and CNIT, Italy

In Information Technology (IT) systems and in the future Internet of Things (IoT) systems, security and privacy, also generally referred to as cybersecurity, play a key role and have to address these six main requirements:
• Authentication: the process of determining whether someone or something is, in fact, who or what it declares to be
• Access control: the procedure to allow an authorized utilization of a resource
• Data integrity: the certification to ensure that the received data are identical to the sent data
• Nonrepudiation: the protection against the possibility that the sender (or the recipient) could deny sending (or receiving) the data
• Availability: the certainty that the desired service must be available when required
• Confidentiality: all the procedures to guarantee that user data must be protected from any unauthorized access and usage
In present IT systems, all six cybersecurity requirements are implemented by third parties (service providers and/or network operators) by means of suitable procedures (protocols) presented to the user when requesting a service. Different levels of effectiveness, efficiency and performance have been achieved to fulfill these requirements: the first five have reached an acceptable or sufficient degree of performance (even if not always satisfactory), while the recent unauthorized disclosure of user data, in particular by pervasive social networks (but not only by them), has clearly highlighted that the sixth requirement (i.e. confidentiality) needs more stringent procedures and, probably, a completely different approach. The European Union (EU) has tackled the cybersecurity issues since 2012, stating that "Building trust in the online environment is key to economic development. Lack of trust makes consumers hesitate to buy online and adopt new services, including public e-government services. If not addressed, this lack of confidence will continue to slow down the development of innovative uses of new technologies, to act as an obstacle to economic growth and to block the public sector from reaping the potential benefits of digitisation of its services, e.g. in more efficient and less resource-intensive provisions of services. This is why data protection plays a central role in the Digital Agenda for Europe, and more generally in the Europe 2020 Strategy"1, and that "by design new systems must include as initial requirements:
• The right of deletion
• The right to be forgotten
• Data portability
• Privacy and data protection principles

1 European Commission, 25.01.2012, SEC(2012)72 final, page 4.


taking into account two general principles:
• The IoT shall not violate human identity, human integrity, human rights, privacy or individual or public liberties
• Individuals shall remain in control of their personal data generated or processed within the IoT, except where this would conflict with the previous principle."2
Following these general and challenging statements, in spite of heavy attempts to defeat any rule, the EU issued the so-called GDPR (General Data Protection Regulation), which entered into force in all Member States on 25 May 2018. This complex regulatory document deals with all cybersecurity requirements related to personal data and, particularly, with the confidentiality and privacy of any data referring to the user (defined as the data subject in GDPR terminology). The basic principles and guidelines of the GDPR, when someone or something is collecting, processing and storing personal data, are lawfulness, fairness, transparency, minimization, purpose limitation, security, accuracy and integrity. Another key and distinguishing feature is that service providers must ask the data subject for consent, defined as "any freely given, specific, informed and unambiguous indication of the data subject's wishes by which he or she, by a statement or by a clear affirmative action, signifies agreement to the processing of personal data relating to him or her". Consent is not given once and forever, but must be renewed whenever personal data are used for purposes other than those initially authorized. Heavy penalties are imposed on service providers who do not comply with the GDPR rules. It is a significant step forward for user security and privacy protection, as demonstrated by the worldwide acceptance of its principles, which have gained consensus outside Europe (California, Japan, Brazil, Singapore, New Zealand, and others). Indeed, services currently offered, while slowly trying to comply with the GDPR rules, almost completely miss fulfilling the security and privacy requirements 'by initial design', as stated by the EU principles. Moreover, in spite of its fundamental milestone on user security and privacy, does the GDPR fully comply with the stated EU principle that "Individuals shall remain in control of their personal data generated or processed within the IoT"? The situation will become even more critical in the scenarios foreseen for the future, where the high-speed, ultra-reliable, massive and always available connectivity at the global scale provided by the 5G mobile networks, the billions of (more or less) smart objects and sensors always connected in the IoT, and the innovative and powerful processing capabilities of Artificial Intelligence (AI) will make it possible to obtain, store, process and deliver diversified and high-volume data (Big Data). Most of these data will refer to sensitive human information and could be acquired even without the awareness of the interested subjects. For example, this is particularly realistic when automatic profiling (i.e. profiling without any human intervention) of the data subject's personal data and automatic facial recognition are put in place. This scenario looks like an ever-present, distributed and global computer dealing with personal data without the awareness of their owner.

2 European Commission, 2013, IoT Privacy, Data Protection, Information Security. http://ec.europa.eu/information_society/newsroom/cf/dae/document.cfm?doc_id=1753


This suggests a much worse scenario than the famous Big Brother described in Orwell's 1984, with the concrete risk of violation of fundamental human rights and of people becoming the future digital slaves of a few big players. Of course, 5G, AI and IoT can provide breakthroughs and enormous benefits to society and individuals (e.g. for e-health applications and services for disabled and elderly people, environment control and security, smart energy production and utilization, smart mobility management, industry efficiency, smart cities, smart buildings, media and entertainment, e-government, ...) and it is in the vital interest of the entire human society to preserve these benefits while reducing to a minimum the associated risks to personal security and privacy. While the GDPR is a fundamental step towards tackling these contrasting issues, some realistic breaches remain evident, in addition to non-compliance by service providers, e.g. through the mentioned automatic profiling and facial recognition. The implementation of the six main cybersecurity requirements, including the last one of confidentiality and privacy, is, even after the GDPR, the responsibility of service providers, who should guarantee their fulfillment. The heavy penalties in case of non-compliance should convince service providers to conform and to implement all the necessary tools and actions, but we all know that this is not always the case. Actually, for the first five requirements, the solution provided by third parties is an inevitable and proper approach. However, the data subject's control over the authorized or unauthorized use of her/his personal data can at most be verified a posteriori, e.g. by accessing a database of all data transactions certified by a Distributed Ledger Technology. To avoid, perhaps definitively, the violation of our fundamental rights, we need the new paradigm of "a priori data usage control", meaning that "except in cases of force majeure or emergency, the use in any form and for any purpose of personal data must be authorized in advance and explicitly by its owner, correctly informed of the purpose of use". To meet this highly challenging objective, we need to combine the innovative and revolutionary GDPR directives with new, efficient technological tools dealing specifically with direct control by the data subject of her/his data. Indeed, in future IoT scenarios, new technological solutions are needed for all six security and privacy requirements, as present tools and procedures may no longer be adequate. The EU has played a proactive role in the search for technological solutions for direct data subject control and, more generally, for all aspects of security and privacy in the IoT, by funding specific research in the framework of the HORIZON 2020 and CHIST-ERA programmes. In doing so, the EU put itself at the international forefront of advanced research on these challenging technological issues. Of course, these research activities are ongoing, but some already achieved results are encouraging towards possible practical solutions to the problems of security and privacy in the IoT. The ten chapters of this book give an overview of some relevant preliminary results obtained by projects funded by the EU in recent years and generally still ongoing. The readers are presented with the worldwide forefront of the most advanced


research on security and privacy techniques for future IoT scenarios, with many examples of specific use case applications. The chapter "USEIT Project: Empowering the Users to Protect Their Data" proposes a solution for security and privacy in a smart building use case, involving direct user action through interaction with intermediate functional entities that analyse the data sent from different sensors in the building in order to implement the security measures. The chapter "Privacy Awareness for IoT Platforms: BRAIN-IoT Approach" addresses the challenges of privacy control and impact assessment for IoT platforms by leveraging GDPR core principles and ISO/IEC standards. The chapter "UPRISE-IoT: User-Centric Privacy & Security in the IoT" manages the user's awareness and control of the privacy risks of a mobile app and the informed consent/denial of the service. The chapter "Making the Internet of Things More Reliable Thanks to Dynamic Access Control" proposes new approaches to context-aware and distributed dynamic access control mechanisms. The chapter "The SOFIE Approach to Address the Security and Privacy of the IoT Using Interledger Technologies" investigates the application of distributed multiple interledger technologies for implementing authentication, access, nonrepudiation and privacy, applied to the four cases of food supply chain, electricity grid load balancing, context-aware mobile gaming, and smart meter data exchange. The chapter "Assessing Vulnerabilities in IoT-Based Ambient Assisted Living Systems" proposes a framework to model, understand and analyse the security risks of possible attacks related to authorization and access by unauthorized entities to data referring to humans with special needs. The two related chapters "Construction of Efficient Codes for High-Order Direct Sum Masking" and "Direct Sum Masking as a Countermeasure to Side-Channel and Fault Injection Attacks" describe new approaches to direct sum masking in data cryptography, providing countermeasures to the combination of side-channel and fault injection attacks. The chapter "A Framework for Security and Privacy for the Internet of Things (SPIRIT)" addresses authentication, access, integrity and privacy by extraction and classification of document content and by encryption tools. The chapter "IoTCrawler. Managing Security and Privacy for IoT" proposes a search engine for IoT information addressing authentication, authorization and confidentiality of exchanged data by means of encryption techniques and distributed ledger technologies for IoT inter-domain relations. Table 1 gives a concise synopsis of the distribution of the main contents of each chapter versus the six cybersecurity requirements. For each project, the table points out the approach used to address the corresponding requirement or, alternatively, when specific use cases are considered, which requirement they try to address. Hopefully, it will help the reader to navigate the book.

Table 1. Synopsis of the contents of the book chapters versus the six cybersecurity requirements (Authentication, Access control, Integrity, Nonrepudiation, Availability, Confidentiality).

• USEIT Project: Empowering the Users to Protect Their Data – authentication and identity management; authorization; cryptographic key exchange; trust and reputation.
• Privacy Awareness for IoT Platforms: BRAIN-IoT Approach – privacy control based on GDPR and ISO/IEC standards.
• UPRISE-IoT: User-Centric Privacy & Security in the IoT – privacy management in mobile apps; informed consent/denial.
• Making the Internet of Things More Reliable Thanks to Dynamic Access Control – context-aware and distributed dynamic access control.
• The SOFIE Approach to Address the Security and Privacy of the IoT Using Interledger Technologies – four use cases (food chain, electricity load balancing, mobile gaming, smart meter) covering authentication, access control, nonrepudiation and confidentiality.
• Assessing Vulnerabilities in IoT-Based Ambient Assisted Living Systems – access control and confidentiality for humans with special needs.
• Construction of Efficient Codes for High-Order Direct Sum Masking / Direct Sum Masking as a Countermeasure to Side-Channel and Fault Injection Attacks – cryptography providing countermeasures to the combination of side-channel and fault injection attacks.
• A Framework for Security and Privacy for the Internet of Things (SPIRIT) – extraction and classification of document content, combined with encryption, addressing authentication, access control, integrity and confidentiality.
• IoTCrawler. Managing Security and Privacy for IoT – encryption techniques and distributed ledger technologies for authentication, access control and confidentiality.

Acknowledgments

This work has been partially sponsored by the USEIT project (CHIST-ERA PCIN-2016-010) as well as by the EU H2020 projects OLYMPUS (Grant agreement ID: 786725) and SerIoT (Grant agreement ID: 780139).


Contents Foreword Christian Wilk

v

Introduction Enrico del Re

viii

Acknowledgments

xiv

USEIT Project: Empowering the Users to Protect Their Data Dan Garcia-Carrillo, Alejandro Molina-Zarca, Nouha Oualha and Antonio Skarmeta

1

Privacy Awareness for IoT Platforms: BRAIN-IoT Approach Mohammad Rifat Ahmmad Rashid, Davide Conzon, Xu Tao and Enrico Ferrera

24

UPRISE-IoT: User-Centric Privacy & Security in the IoT Silvia Giordano, Victor Morel, Melek Önen, Mirco Musolesi, Davide Andreoletti, Felipe Cardoso, Alan Ferrari, Luca Luceri, Claude Castelluccia, Daniel le Métayer, Cédric Van Rompay and Benjamin Baron

44

Making the Internet of Things More Reliable Thanks to Dynamic Access Control Anne Gallon, Erkuden Rios, Eider Iturbe, Hui Song and Nicolas Ferry

61

The SOFIE Approach to Address the Security and Privacy of the IoT Using Interledger Technologies Dmitrij Lagutin, Priit Anton, Francesco Bellesini, Tommaso Bragatto, Alessio Cavadenti, Vincenzo Croce, Nikos Fotiou, Margus Haavala, Yki Kortesniemi, Helen C. Leligou, Ahsan Manzoor, Yannis Oikonomidis, George C. Polyzos, Giuseppe Raveduto, Francesca Santori, Vasilios Siris, Panagiotis Trakadas and Matteo Verber Assessing Vulnerabilities in IoT-Based Ambient Assisted Living Systems Ioana-Domnina Cristescu, José Ginés Giménez Manuel and Juan Carlos Augusto

76

94

Construction of Efficient Codes for High-Order Direct Sum Masking Claude Carlet, Sylvain Guilley, Cem Güneri, Sihem Mesnager and Ferruh Özbudak

108

A Framework for Security and Privacy for the Internet of Things (SPIRIT) Julian Murphy, Gareth Howells, Klaus Mcdonald-Maier, Sami Ghadfi, Giles Falquet, Kais Rouis, Sabrine Aroua, Nouredine Tamani, Mickael Coustaty, Petra Gomez-Krmer and Yacine Ghamri-Doudane

129


Direct Sum Masking as a Countermeasure to Side-Channel and Fault Injection Attacks Claude Carlet, Sylvain Guilley and Sihem Mesnager

148

IoTCrawler. Managing Security and Privacy for IoT Pedro Gonzalez-Gil, Juan Antonio Martinez, Hien Thi Thu Truong, Alessandro Sforzin and Antonio F. Skarmeta

167

Subject Index

183

Author Index

185

Security and Privacy in the Internet of Things: Challenges and Solutions J.L.H. Ramos and A. Skarmeta (Eds.) IOS Press, 2020 © 2020 The authors and IOS Press. All rights reserved. doi:10.3233/AISE200002


USEIT Project: Empowering the Users to Protect Their Data

Dan GARCIA-CARRILLO a,1, Alejandro MOLINA-ZARCA a, Nouha OUALHA b and Antonio SKARMETA a
a University of Murcia, Faculty of Computer Science, 30100, Murcia, Spain
b CEA, LIST, Communicating Systems Laboratory, 91191 Gif-sur-Yvette CEDEX, France

Abstract. USEIT is a project that is developing and integrating technologies towards the empowerment of the day-to-day user of Internet of Things technology through the secure management of their personal data. This chapter gives an overview of the project, its objectives, its architecture and the actual platform that is being developed for the secure management of user data. To show this secure data management, we present a use case of a Smart Building, and how the data sent from different sensors in the building is analysed and the security measures are taken under the orchestration of a building manager who puts the necessary policies in place, so that end users can securely receive information.

Keywords. IoT, Security, CP-ABE, Policy-based, Orchestration

1. Introduction

This book chapter presents the USEIT project, its objectives, and the work done towards the completion of these objectives through the development and integration of technologies for the empowerment of the day-to-day user of Internet of Things technology through the management of their personal data. USEIT is a CHIST-ERA project about privacy and security aimed at the Internet of Things (IoT). The project implements and develops security tools, policy languages and new cryptographic algorithms, enabling users of IoT devices to have the strongest privacy and security protection and control over their information. The project tests the mechanisms that are developed throughout its duration to prove their security, with the objective of providing open-source implementations of these mechanisms. To validate and test them, the project considers three distinct use cases. The first one deals with vehicular networks. The remaining two use cases are about security and the management of data shared amongst smart objects in changing and dynamic environments. In this book chapter we delve into the use cases on smart objects, showing how a user can access the information in a Building Management System (BMS), and the measures that are taken to secure the information and provide fine-grained access to the available data according to the permissions of the users.

1 Corresponding Author: Dan Garcia-Carrillo, Faculty of Computer Science, University of Murcia, Espinardo, Murcia, 30100, E-mail: [email protected]


The rest of the chapter is organised as follows: Section 2 gives an overview of the USEIT project. Section 3 gives the necessary background to understand the proposed framework to manage users' access to the information. Section 6 elaborates the BMS use case.

2. USEIT project

USEIT stands for the "User empowerment for SEcurity and privacy in Internet of Things" CHIST-ERA project. The USEIT project is focused on privacy and security for the Internet of Things. For a system to be considered trustworthy, it needs to protect the data that is communicated within the system and its processes. Making such protection a reality is a challenging task, particularly for the Internet of Things: on the one hand, because of the limitations of the devices in terms of resources (CPU, RAM, etc.), their limited or non-existent user interfaces, and their limited connectivity and bandwidth; on the other hand, because the nature of IoT networks is dynamic. Sadly, employing existing technologies in the area of security can often produce compromised systems that cannot support requirements important to human users, such as data privacy and manageability. USEIT implements and develops policy languages, security tools and new cryptographic algorithms, enabling the strongest security, privacy protection and control for the users of IoT devices. USEIT also aims to test the developed mechanisms, proving their security, and to provide open-source implementations of them. To validate and test the mechanisms, the project considers three relevant use cases: one that is related to vehicular networks and two that are related to the management and protection of the data shared among IoT devices.

3. USEIT Framework In this section we give the necessary background regarding the tools that are part of the USEIT framework to provide the end user with control over its data. 3.1. MyData Model and citizen control: Technologies for empowering the users Personal data is nowadays a world-wide activity and mainstream business. The World Economic Forum, says ‘Personal data is becoming a new economic asset class, a valuable resource for the 21st century that will touch all aspect of society” [5]. On the one hand, for the digital life people ought to have legitimate rights and specialized instruments to oversee the information gathered about them. On the other hand, security guidelines and regulations avert companies to make innovative services using personal information, so they resort to approaches to sidestep them. So as to address this issues, MyData has been proposed [8], incorporating a system, standards and a model for a human-centric way to enable people to manage their information and its management. The MyData initiative works as follows:


• Human-centric privacy and control: People are not passive targets but empowered actors in the administration of their own lives, both on the web and offline.
• Usable data: It is fundamental that individual information is in fact simple to access and utilize – it is available in machine-readable and open formats by means of secure and standardized APIs.
• Open business environment: The MyData framework empowers decentralized administration of individual information, improving interoperability and making it simpler for organizations to conform to stricter data protection regulations, enabling people to choose service providers without proprietary data lock-ins.
MyData is a dynamic and progressive way to deal with personal information management, combining digital human rights and industry's need to access the information. This approach is beneficial to both sides: individuals and companies. For individuals, it gives simple-to-use and comprehensive instruments for personal information management, and transparency systems that show how organizations use their information. For organizations, it opens the door to new types of data-based business opportunities by encouraging lawful and technical access to pre-existing personal datasets when the individual is willing to give his/her consent. Furthermore, for society, it creates the essential structures, procedures and policies for ensuring the protection of individual rights and encouraging the use of personal information in the development of new and innovative services. The MyData architecture is based on standardized and interoperable MyData accounts. The proposed model allows individuals to control the access to their personal data from a centralized place in an easy way. Such accounts will be provided by organizations acting as MyData operators, which also give individuals the possibility to host their own accounts. The data goes from a source of data to an application or a service that uses such data. The primary function of a MyData account is to enable consent management – the data itself is not necessarily streamed through the servers where the MyData account is hosted.

3.2. eXtensible Access Control Markup Language (XACML)

XACML is an OASIS standard. It entails a declarative and XML-based language that is used to express access control policies. This allows the specification of a set of subjects that can perform certain actions upon a set of resources, according to their attributes. The policies are based on three elements: PolicySet, Policy and Rule. A PolicySet can contain other PolicySets and Policies. A Policy encompasses a set of Rules, each of which specifies an Effect (Permit or Deny) as a result of matching said Rule against a particular request. XACML also allows the specification of a set of Environment values that impose additional conditions on access to a resource, such as a specific time range in which the resource is accessible. The XACML architecture consists mainly of four elements:
• PEP (Policy Enforcement Point): This entity is responsible for access control, making requests for access and enforcing authorisation decisions.
• PDP (Policy Decision Point): This entity is in charge of evaluating requests against a set of policies and making authorization decisions.
• PAP (Policy Administration Point): This entity creates policies or sets of policies.
• PIP (Policy Information Point): This entity is a source of attribute values.
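To make the PEP/PDP interaction more concrete, the following minimal Python sketch (not part of the USEIT deliverables and not the XACML wire format; the attribute names, the policy and the resource are purely illustrative) mimics how a PDP evaluates a request carrying subject, action, resource and environment attributes against a rule, and how a PEP enforces the returned decision.

```python
# Illustrative sketch of the XACML PEP/PDP interaction (not the OASIS XML format):
# a "policy" maps attribute conditions to an Effect (Permit or Deny).

from dataclasses import dataclass, field

@dataclass
class Rule:
    effect: str                      # "Permit" or "Deny"
    subject_role: str                # required subject attribute
    action: str                      # e.g. "read"
    resource: str                    # e.g. "building/energy"
    hours: range = range(0, 24)      # Environment condition (allowed time range)

@dataclass
class Policy:
    rules: list = field(default_factory=list)

def pdp_evaluate(policy: Policy, request: dict) -> str:
    """Policy Decision Point: match the request against each rule in turn."""
    for rule in policy.rules:
        if (request["role"] == rule.subject_role
                and request["action"] == rule.action
                and request["resource"] == rule.resource
                and request["hour"] in rule.hours):
            return rule.effect
    return "Deny"                    # deny by default if no rule applies

def pep_enforce(policy: Policy, request: dict) -> None:
    """Policy Enforcement Point: ask the PDP and enforce its decision."""
    decision = pdp_evaluate(policy, request)
    if decision == "Permit":
        print("access granted to", request["resource"])
    else:
        print("access denied to", request["resource"])

# A building manager (acting as PAP) defines one rule: employees may read
# energy data during working hours only.
policy = Policy(rules=[Rule("Permit", "employee", "read", "building/energy", range(8, 20))])
pep_enforce(policy, {"role": "employee", "action": "read",
                     "resource": "building/energy", "hour": 10})
```

In a real XACML deployment the request and the policies would be expressed in the standard XML vocabulary, and missing attribute values would be fetched from the PIP rather than supplied directly in the request.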


Figure 1. MyData Architecture

3.3. USEIT cryptographic approaches with CP-ABE

Attribute-Based Encryption (ABE) [9] is a generalization of Identity-Based Encryption (IBE), where the identity of the participants is represented by a set of attributes related to their identity. ABE is attracting attention because of its high expressiveness and flexibility compared to previous schemes. With ABE, the information can be made accessible to a set of entities whose identities are based on a set of attributes. In a CP-ABE scheme [3], a ciphertext is encrypted using a policy over attributes, whilst the keys of the participants are associated with their sets of attributes. With CP-ABE, a producer is able to exert control over how the information is disseminated to other entities.
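As a rough illustration of the CP-ABE workflow just described (setup at a key authority, key generation over identity attributes, encryption under an attribute policy, and decryption by entities whose attributes satisfy it), the following Python sketch is a non-cryptographic mock: the function names, attributes and policy are invented for the example and there is no real encryption here, only the flow of keys, policies and attributes that a concrete CP-ABE implementation such as [3] would provide.

```python
# Non-cryptographic mock of the CP-ABE workflow: it only demonstrates who holds
# which material and when decryption succeeds; no actual cryptography is done.

def cpabe_setup():
    # Real schemes output pairing-group public parameters and a master secret.
    return {"scheme": "mock"}, {"master": "mock"}

def cpabe_keygen(params, master_key, attributes):
    # A real private key embeds the attributes cryptographically.
    return {"attributes": set(attributes)}

def cpabe_encrypt(params, plaintext, policy):
    # policy: here simply a set of attributes that must ALL be held by the
    # decryptor (real CP-ABE supports richer AND/OR/threshold policies).
    return {"policy": set(policy), "payload": plaintext}

def cpabe_decrypt(params, private_key, ciphertext):
    if ciphertext["policy"] <= private_key["attributes"]:
        return ciphertext["payload"]
    raise PermissionError("attributes do not satisfy the ciphertext policy")

params, master = cpabe_setup()                                   # key authority
alice = cpabe_keygen(params, master, ["role:employee", "org:UMU"])
ct = cpabe_encrypt(params, b"temperature=21.5", ["role:employee", "org:UMU"])
print(cpabe_decrypt(params, alice, ct))   # Alice's attributes satisfy the policy
```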


4. USEIT Architectural approach

In this section we explain the basis upon which the USEIT architecture is built, which will enable the instantiation of the USEIT platform in the next section.

4.1. Architecting the IoT

Over the last years, the evolution of IoT has led to a series of protocols and technologies resulting in fragmented solutions. For these reasons, there is a need to provide high-level architectures that enable disengagement from the technical details, providing a common understanding of the privacy and security needs of IoT scenarios. To this end, in 2015 the AIOTI WG03 started the development of a High Level Architecture (HLA) for IoT following a layered functional model (Network, IoT, Application). This working group collaborates closely with the AIOTI WG04, which addresses policy issues related to privacy and security. Besides AIOTI, there are currently other initiatives mainly focused on defining a high-level architecture for IoT. In particular, the purpose of the IEEE "Standard for an Architectural Framework for the Internet of Things (IoT)" (IEEE P2413)2 is to define an architectural framework that addresses the descriptions, the definitions and the common aspects of the different domains of IoT, so increasing the compatibility, transparency and interoperability of IoT systems. The proposed architecture follows a three-tier approach (Sensing, Networking and Data Communications, and Applications). Additionally, the oneM2M initiative is a joint effort of 14 partners (the European Telecommunications Standards Institute (ETSI) and others) to ensure efficient M2M deployments in IoT. oneM2M proposes a layered model (Network Services, Common Services and Application) mapped to a functional architecture. Additionally, the ITU Telecommunication Standardization Sector (ITU-T), following recommendation Y.2060 "Overview of the Internet of Things", designed a four-level reference model (Device, Network, Service Support and Application Support, and Application) with two cross-layer levels (Security capabilities and Management capabilities), grouping distinct functional aspects in every layer. Furthermore, the objective of the ITU-T Study Group 20 (SG20) "Internet of Things (IoT) and its applications including smart cities and communities (SC&C)"3 is to generate standards that enable coordinated IoT technology developments, using mechanisms for interoperable IoT applications and datasets to be used by several vertically oriented sectors of the industry. Within European research projects, the wide range of IoT scenarios has resulted in the specification of several architectures, usually tailored to the domain where they are being deployed and meeting specific requirements. This is pinpointed as a barrier for the adoption of IoT at a bigger scale. One of the proposals to address this with a common architecture is IoT-i4, a European research project which analysed different architectures in order to create a joint and aligned vision of the IoT in Europe. This was a step forward for the creation of holistic environments, which encouraged a broader adoption of IoT. IoT-A5 was a large-scale project focused on the design of an Architecture Reference Model (ARM) to be instantiated by other IoT architectures; it provided a set of tools and guidelines for this purpose. The focus of the IoT6 architecture [14] was the use of previous results from different projects to design an IPv6-based and service-oriented architecture, so as to achieve a high degree of interoperability amongst different communication and application technologies. Additionally, other architectures were proposed, such as FI-WARE6, which was based on the specific requirements from particular application domains.

2 https://standards.ieee.org/develop/project/2413.html

3 http://www.itu.int/en/ITU-T/about/groups/Pages/sg20.aspx
4 http://postscapes.com/iot-i-iot-initiative
5 http://www.iot-a.eu/public
6 https://www.fiware.org/


SENSEI, on the other hand, was focused on designing the service layer in wireless sensor and actuator networks. On a different note, FI-WARE, under the FI-PPP programme, designed an open platform based on an architecture formed by components referred to as Generic Enablers (GEs). These EU projects addressed the definition of an IoT architecture that considered different levels of abstraction, so as to fit specific scenarios. Furthermore, this set of initiatives does not address privacy and security concerns using a holistic approach. In contrast, USEIT's effort to develop its architecture is driven by the instantiation of the architecture proposed by IoT-A. The reason for selecting the IoT-A ARM as a starting point is that it provides a comprehensive IoT ecosystem definition, proposing different models and architectures. Additionally, the IoT-A results are supported by emerging initiatives such as IEEE P2413 or the initial definition of the HLA by AIOTI WG03, which specifies a reference architecture for IoT.

4.2. IoT-A and the Architecture Reference Model

IoT-A is a European project focused on designing an Architectural Reference Model (ARM) [1] to enhance the interoperability amongst isolated IoT domains, as a crucial step to move from an Intranet of Things to a real Internet of Things [15]. The collection of results obtained from IoT-A encompasses: a Reference Model (RM), which promotes a common understanding at a high abstraction level; a Reference Architecture (RA), which describes the essential building blocks for building compliant IoT architectures; and, additionally, a collection of Guidelines and Best Practices to aid in the development of an architecture based on the aforementioned RA. In particular, the RA contributes several perspectives and views that are focused on different architectural aspects. Amongst these views, the Functional View (see Figure 2) describes a collection of Functional Components (FC) that are organized into nine Functional Groups (FG), their interfaces and responsibilities [IoT-A. D1.5].
• Application FG. Represents services and users, which interact with IoT systems.
• Device FG. Represents sensors, actuators or tags in an IoT domain.
• IoT Process Management FG. Its objective is to provide the functional interfaces and components necessary to augment traditional (business) processes with the idiosyncrasies of the IoT world.
• Service Organisation FG. This is intended to orchestrate and compose Services of different levels of abstraction.
• Virtual Entity FG. This provides functions to interact with the IoT System on the basis of Virtual Entities (VEs) (i.e. the digital representation of a physical entity), as well as functionalities for discovering and looking up services that can provide information about VEs, or which allow interaction with VEs.
• IoT Service FG. This contains IoT services as well as functionalities for discovery, look-up, and name resolution of IoT Services.
• Communication FG. This is an abstraction which models the variety of interaction schemes derived from the many technologies that belong to IoT systems and provides a common interface to the IoT Service FG.
• Security FG. This is responsible for ensuring the security and privacy of IoT-A-compliant systems.


Figure 2. IoT-A Functional View

Additionally, the Security FG is formed by five functional components: Authentication, Authorization, Identity Management (IdM), Key Exchange and Management (KEM), and Trust and Reputation (T&R).
• Authentication FC. This involves user and service authentication by checking the users' credentials and verifying assertions.
• Identity Management FC. This addresses the privacy aspects, managing pseudonyms that enable anonymous interactions.
• Authorization FC. This is responsible for granting or denying access to resources based on access control policies, as well as for the management of said policies.
• Key Exchange and Management FC. This enables secure communications amongst different entities, distributing cryptographic material and registering security capabilities.
• Trust and Reputation FC. This collects reputation scores and calculates trust levels by requesting and providing reputation information.

4.3. USEIT Use case Information View

Subscription:
• (1) The user sends a request to subscribe to the IoT Service Subscriptions Handler. The user can then be notified when a new energy measurement is generated.
• (1.1) The IoT Service Subscriptions Handler authenticates the user. To perform the authentication, the user can use mechanisms such as login/password or X.509 certificates, or
• (1.2) The user can employ a cryptographic proof from an anonymous credential, avoiding being unequivocally identified when subscribing.


Figure 3. Information View for the USEIT Smart Building use case

• (1.3) In any case, once the user is authenticated, the demonstrated attributes are used to launch an authorization process.

Publish Data:
• (1) A device (e.g. a smart meter) can generate a measurement, and this data is sent through the controller or gateway over the network, reaching the IoT Service. This communication should be secured by using a security association protocol such as DTLS. Furthermore, in case the device is powerful enough, it can encrypt and sign the data before sending it.
• (2) The raw data is sent to the Push Data Handler IoT Service. This service is in charge of authenticating the device and checking whether it is authorized to publish data to the IoT Broker. In particular:
• (2.1) For the purposes of authentication, the IoT Service can use Public Key Cryptography (PKC), or MAC/HMAC in the case of Symmetric Key Cryptography (SKC).
• (2.2) For the purposes of authorization, the device can use an authorization token that can be validated by the IoT Service.
• (2.3) In case the data from the IoT device is not secured, the IoT Service can use its capabilities to cipher and potentially sign the data on behalf of the IoT device before sending it to the IoT Broker. If the IoT device is authenticated and authorized, the IoT Service can convert the raw data into formatted data (e.g. by using NGSI).
• (3) The IoT Service can submit the updated data to the IoT Broker, and the IoT Broker sends the information to the subscribers.
• (4) When the new measurement is received, the user can use the corresponding key to decipher the data and validate the signature.
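As an illustration of step 2.1 in the symmetric-key case, the following minimal Python sketch (standard library only; the shared key, device identifier and payload are made up for the example and are not part of the USEIT specification) shows how a constrained device could attach an HMAC to a reading and how the IoT Service could verify it before accepting the publication.

```python
import hmac
import hashlib

# Symmetric key shared in advance between the smart meter and the IoT Service
# (in a real deployment it would be derived from a bootstrapping/KEM procedure).
SHARED_KEY = b"demo-key-smartmeter-0042"

def device_publish(payload: bytes) -> tuple[bytes, str]:
    """Device side: compute an HMAC-SHA256 tag over the raw measurement."""
    tag = hmac.new(SHARED_KEY, payload, hashlib.sha256).hexdigest()
    return payload, tag

def service_verify(payload: bytes, tag: str) -> bool:
    """IoT Service side: recompute the tag and compare it in constant time."""
    expected = hmac.new(SHARED_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)

payload, tag = device_publish(b'{"meter":"sm-0042","energy_kwh":3.7}')
print("accepted" if service_verify(payload, tag) else "rejected")
```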


5. USEIT Platform

In this section we elaborate on the different components that make up the USEIT platform and the rationale for their consideration in USEIT.

5.1. FI-WARE Introduction

FIWARE7 is a European middleware platform which promotes the development and deployment of Future Internet applications. FIWARE provides a reference architecture, a specification and an implementation of different open interfaces that are called Generic Enablers (GEs). The FIWARE Catalogue has a rich library of Generic Enablers. The reference implementations permit developers to use functionalities such as the connection to Big Data analysis or the Internet of Things.

5.1.1. Main FI-WARE Components for Security and Privacy

Some of the FI-WARE components will be used by USEIT to provide privacy and security.
• The Orion Context Broker. An implementation of the Publish/Subscribe Context Broker GE, which provides the NGSI9 and NGSI10 interfaces. With these interfaces, users can perform a set of operations:
  ∗ Register producer applications, e.g. the temperature sensor of a room
  ∗ Update information, e.g. send updates on the temperature
  ∗ Be notified when changes take place (e.g. the temperature has changed) or with a given frequency (e.g. get the temperature each minute)
  ∗ Query information. The Context Broker stores information from different applications, and the queries are resolved based on that information.
• Keyrock IdM. An open-source implementation of the IdM system defined in FIWARE. The Keyrock IdM relies on standard protocols, for example OAuth and SAML, providing authentication and authorization features that allow the management of the users' access to resources such as networks, services and applications. The IdM GE is also responsible for the management of user profiles, SSO and identity federation across service domains. Keyrock relies on the Keystone implementation of the OpenStack IdM, providing extensions and an implementation of the SCIM standard. SCIM is used to reduce the complexity and cost of user management operations using a common schema, an extension model and a REST API with a rich and simple set of operations.
• Orion PEP Proxy. The Orion Policy Enforcement Point (PEP) is a proxy that is meant to secure independent FI-WARE components, intercepting every request and validating it with the Access Control component.

7 https://catalogue.fiware.org
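For illustration, the sketch below publishes and queries a context entity on an Orion Context Broker. It uses Orion's NGSIv2 REST interface rather than the NGSI9/NGSI10 interfaces mentioned above, and it assumes a locally running broker on the default port; the entity identifier and attribute are made up for the example.

```python
# Minimal sketch of publishing and querying context data on an Orion Context
# Broker via its NGSIv2 REST interface (broker URL and entity are illustrative).
import json
import urllib.request

ORION = "http://localhost:1026"   # assumed local Orion instance

def create_entity() -> None:
    entity = {
        "id": "Room1", "type": "Room",
        "temperature": {"value": 21.5, "type": "Number"},
    }
    req = urllib.request.Request(
        ORION + "/v2/entities",
        data=json.dumps(entity).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req)

def query_entity() -> dict:
    with urllib.request.urlopen(ORION + "/v2/entities/Room1") as resp:
        return json.loads(resp.read())

create_entity()
print(query_entity()["temperature"]["value"])
```

In the USEIT platform such requests would not reach the broker directly: they would pass through the Orion PEP Proxy, which validates the caller's credentials and capabilities before forwarding the request.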


Figure 4. Access Control Integration on FI-WARE

5.2. Access Control and CP-ABE Integration

Figure 4 shows an example of the integration related to the access control functionality. We use XACML for the implementation of the Policy Administration Point (PAP) and the Policy Decision Point (PDP). These components have been deployed as web services. Additionally, we have added the Capability Manager as the component that generates DCapBAC tokens. These DCapBAC tokens are based on the proposal presented in [6]. We consider two main external actors:
• The Owner: This actor is the user or service in charge of defining access control policies, guaranteeing that only authorized users will be able to provide information to the platform.
• The User: This actor represents a data consumer that aims to access data from the platform.
Additionally, the USEIT platform integrates other components, such as the FI-WARE GEs mentioned previously. This way, the DCapBAC functionality is enabled by defining different components within the platform, thus automating the generation of DCapBAC tokens. When a user intends to access data from the Context Broker (CB), it first requests a DCapBAC token for this action by querying the Capability Manager (step 1). Then, the Capability Manager asks the KeyRock IdM for the user's identity attributes (steps 2-3). After that, the Capability Manager asks the PDP (step 4) to evaluate whether the credential can be generated. The PDP queries the policies defined in the PAP (step 5) and evaluates them against the user's request (step 6). In the case of an affirmative decision (step 7), the Capability Manager generates a token for the user (step 8), which is finally delivered to the user (step 9). This token includes information such as the user's public key and, as an access right, a specific action (e.g. the NGSI queryContext method) over data hosted by the Orion Context Broker. It also includes time restrictions, limiting the validity period of this credential. Finally, the user can use this token to gain access to the data being queried (step 10). The token is evaluated by the PEP Proxy (step 11) and, in case the evaluation is successful, the request is forwarded to the Orion Context Broker (CB) (step 12).
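The exact token format is defined in [6]; purely as an illustration of what the PEP Proxy checks in steps 10-11, the following Python sketch models a capability token as a JSON object carrying the holder's public key, the granted action and resource, and a validity window, and verifies those fields. The field names and values are invented for the example; a real deployment would additionally verify the issuer's signature over the token and the requester's proof of possession of the corresponding private key.

```python
import json
import time

# Illustrative capability token, loosely modelled on the DCapBAC idea in [6];
# the Capability Manager would also sign this structure with its private key.
token = json.dumps({
    "subject_public_key": "base64-encoded-key...",   # placeholder value
    "action": "queryContext",
    "resource": "Room1",
    "not_before": 1700000000,
    "not_after": 1700003600,
})

def pep_check(token_json: str, action: str, resource: str, now: float) -> bool:
    """PEP Proxy side: accept the request only if the token grants it right now.
    Signature and proof-of-possession checks are omitted in this sketch."""
    cap = json.loads(token_json)
    return (cap["action"] == action
            and cap["resource"] == resource
            and cap["not_before"] <= now <= cap["not_after"])

print(pep_check(token, "queryContext", "Room1", now=1700000100))    # True
print(pep_check(token, "updateContext", "Room1", now=time.time()))  # False
```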


Figure 5. CP-ABE integration on FI-WARE

5.3. CP-ABE integration

The CP-ABE functionality is instantiated by three main components: the CP-ABE Key Generation Center (KGC), which is responsible for generating the CP-ABE keys that are employed by users (acting as consumers of information) to decipher data gathered from the platform; and the KeyRock IdM and the Orion Context Broker, which both provide the same functionality described in the previous section. Different users (data producers and consumers) obtain CP-ABE key material from the KGC (step 0). Note that the KGC, when receiving a request, gets the identity attributes associated with the user from the KeyRock IdM (step 1). This way, the CP-ABE private keys are associated with the set of identity attributes of every user. Additionally, the users receive the public parameters, which are used for the ciphering and deciphering processes. A data producer can update his information in this central entity (steps 2-3) using CP-ABE encryption under a specific policy or combination of identity attributes (e.g. people working at UMU). This way, end-to-end confidentiality is ensured, because the Orion Context Broker cannot access the shared data. Furthermore, the data can be shared so as to be accessible by a set of users; particularly, those users that satisfy the CP-ABE policy under which the data was ciphered. When the encrypted data is received, the users apply the CP-ABE deciphering process using their private keys to get access to the shared data (step 4).

6. A USEIT setup

This section introduces the setup of a smart objects use case scenario aimed at demonstrating the USEIT platform.

6.1. Smart Objects Scenario

The purpose of the Smart Objects scenario is to emphasise some of the major privacy/security requirements, abstracting smart city use cases where information is shared using a central data management platform. We borrow the definition of a smart object from [7], where a smart object is any "autonomous physical/digital objects augmented with sensing, processing, and network capabilities". Additionally, we use the producer/consumer approach, where a smart object can act as a producer or consumer of data.


Figure 6. Smart Objects scenario overview

6.2. Smart Objects Communication Setting

Smart objects' data can be shared across a central data platform, which processes and shares the data with other devices, users or applications. These devices are typically characterised by having resource constraints and by operating over low-power and lossy networks (LLNs), e.g. based on 6LoWPAN. This limits the use of traditional security mechanisms or advanced cryptographic mechanisms. These devices are usually connected to the Internet through gateways, which are not as constrained.

6.2.1. Participants

Figure 6 provides an overview of the scenario. The main entities are:
• Smart Object. Physical IoT devices that act as data producers and consumers. Typically battery-powered and resource-constrained devices, as well as legacy devices (i.e. without IP connectivity).
• Gateway. Brings the ability to interconnect smart objects by abstracting their physical details, providing a common model to represent the data. The gateway can also assist smart objects in performing some taxing tasks, such as resource-demanding cryptographic operations.
• Platform. This entity manages the data from the smart objects (through gateways) and disseminates the data to different high-level services. The platform serves as a storage of information and allows entities to remain decoupled. The platform can also perform tasks such as data analysis to provide high-level services based on knowledge extraction.
• Services. These are usually different applications based on the information provided by the platform, providing different utilities to citizens and to other services.
USEIT addresses the privacy and security requirements following two approaches: first, defining a concrete platform of components for the empowerment of users, enabling them to define their preferences in terms of access to their information and privacy; and second, analysing the integration of advanced cryptographic mechanisms into IoT devices with strong resource constraints, ensuring end-to-end privacy and security properties.

Web Protocol Stack          IoT Protocol Stack
HTTP                        CoAP
TLS                         DTLS
TCP                         UDP
IPv4/IPv6                   6LoWPAN
IEEE 802.3 / IEEE 802.11    IEEE 802.15.4

Figure 7. Web and IoT protocol stack

6.2.2. Communication Channels The heterogeneity of use cases and IoT deployments brings a variety of communication protocols. In fact, protocols that are usually considered for the Web are still used in IoT scenarios in domains where bandwidth and network performance are not a problem (i.e. gateway-platform communications). However, where resource-constrained smart objects and networks are involved, these protocols have been adapted or redesigned, or new protocols have been proposed to cope with these limitations. Figure 7 summarizes the mapping of Web protocols to the stack commonly used in the IoT domain. According to Figure 6, gateway-platform communications are normally done using the Web protocol stack. Hence, the challenge is in defining scalable mechanisms that allow the security management and abstraction of large numbers of smart objects and the associated data. For the smart object-gateway communication, the IoT protocol stack is typically used. The Constrained Application Protocol (CoAP) [11] is used as a lightweight alternative to HTTP. CoAP also defines the use of DTLS to secure the communications using pre-shared keys, raw public keys or certificates. Other devices can communicate with the gateway through proprietary protocols. For this reason, there is a real need for cryptographic security mechanisms that can be applied at upper communication layers, independently of the underlying protocols. For this, emerging approaches, such as COSE [10], represent a significant effort to apply advanced and optimized encryption and signature mechanisms. 6.2.3. Types of Devices and Messages Following Figure 6, different components and entities typically compose an IoT scenario. Firstly, the gateways and the platform are typically considered as entities that are not resource constrained, and for which the use of conventional cryptographic mechanisms is not problematic. The challenge in this case is related to the definition of mechanisms to manage the privacy and security aspects of large numbers of smart objects, particularly the definition of approaches to bind privacy and security properties to semantic approaches, such as OMA NGSI [2], so that applications and users are abstracted from the details of the physical devices. This is particularly important for the empowerment of users, enabling them to define their preferences regarding security and access control.
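As an illustration of the IoT protocol stack of Figure 7, the snippet below performs a plain CoAP GET with the aiocoap Python library; the resource URI is a made-up example and, in a USEIT deployment, the exchange would additionally be protected with DTLS or with object security such as COSE:

import asyncio
from aiocoap import Context, Message, GET

async def read_temperature():
    # CoAP runs over UDP/6LoWPAN, replacing the HTTP/TCP pair of the Web stack
    ctx = await Context.create_client_context()
    request = Message(code=GET, uri='coap://[2001:db8::1]/sensors/temperature')
    response = await ctx.request(request).response
    print(response.code, response.payload)

asyncio.run(read_temperature())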



Figure 8. Classes of Constrained Devices [4]

Secondly, smart objects are the collection of IoT devices deployed physically in the environment, which are responsible for capturing data from their environment and transmitting that data to other objects or services. These devices are typically of limited capacity, usually fulfilling a single simple task (e.g. measuring the temperature or humidity of a room). Additionally, they are usually heavily constrained in terms of memory and processing power. Concretely, Figure 8 depicts the terminology that is currently being considered during the project. Regardless of these limitations, the use of cryptographic mechanisms in these devices is required, to ensure that typical security properties, such as end-to-end integrity and confidentiality, can be maintained. For this, the application of public-key cryptography is challenging, due to the underlying mathematical operations. Furthermore, it should be noted that IoT devices may be required to produce data periodically (e.g., every 5 seconds), so there is a need to define cryptographic approaches that constrained devices can use without exhausting their resources prematurely.

7. Smart Building Use Case Smart buildings are an instantiation of the previous general setting, being a suitable scenario to showcase the applicability of the policy-based approaches and advanced cryptographic techniques to ensure the users' empowerment for privacy and security. In these environments, a large amount of data coming from heterogeneous devices is shared to enable different services. Due to the amount and sensitive nature of such data, users' privacy could be compromised if proper data protection mechanisms are not implemented. Specifically, the use case that we consider is based on a real scenario that is part of an ongoing initiative at UMU premises, which is derived from the SMARTIE project [13]. 7.1. Smart Building with IDS The smart building use case can be further fortified by deploying mechanisms for advanced intrusion detection systems (IDSs) in said environment. The smart objects in the smart building are vulnerable to multiple attacks, which can aim to disrupt the network. The use of certain cryptographic primitives, such as attribute-based encryption, is not enough to secure the network against some types of attacks where a smart object is cloned or compromised. A malicious smart object which owns legitimate cryptographic keys can easily launch an internal attack on the network. At the network level, it can drop packets, delay the transmission, etc. At the application level, it can send false information to authorised recipients or break the access policy associated with encrypted data by adding attributes of malicious entities. This kind of internal attack can only be detected using mechanisms of behavioural analysis that track unusual smart object behaviour, the activity of the network and the interactions among smart objects, thus detecting threat attempts or occurrences.

Figure 9. Instantiation of the BMS with IDS use case

Once an anomaly is detected, a reaction mechanism is launched to take the necessary measures to repair the situation. In the next section we show an instantiation of said scenario, in which we achieve a more secure management of the data being transmitted by the different IoT devices when the IDS detects an anomaly in their behaviour. 7.2. Instantiation of the Smart Building with IDS To begin, we showcase the different entities involved in this scenario. Figure 9 shows the scenario we describe in the following. The 6LoWPAN network is populated by different IoT devices of varied nature (e.g., sensors, actuators, multisensors, etc.). This 6LoWPAN network has an entity called Gateway that enables the communication with the outside (Internet), provided the devices have been granted the permissions to do so. This entity is able to perform CP-ABE ciphering with more complexity than the IoT devices, since it is considered a resource-rich device. Within the IoT network, an IDS system is deployed, which entails a number of Raspberry Pi devices distributed in the 6LoWPAN network. These devices, controlled by the IDS management system, are natively equipped with the necessary hardware and software tools capable of sniffing and understanding a large number of protocol technologies. The devices also have the necessary information to search for abnormal behaviour in the network. On the server's side, we have the BMS that implements the Orion Context Broker, where all the information from the sensors is sent, protected and ciphered with CP-ABE, and where all the users subscribe to receive information.



The server's side also comprises an orchestrator. This entity allows adopting a scalable, softwarized approach, which enables the management and enforcement of the policies that are set up by the System Admin. A Private Key Generator (PKG) is also used, which is a trusted authority responsible for setting up the CP-ABE public parameters and generating the CP-ABE private keys for users. After user authentication, the PKG generates the private keys based on the user's attributes. Since the PKG knows the CP-ABE master key, it is capable of calculating the corresponding users' private keys. The System Admin is the person in charge of configuring the system and making sure that the policies to act in case abnormal behaviour is detected are in place. In this sense, the System Admin configures the orchestrator, which has direct communication with the other entities: the PKG and the IoT manager. There are more entities in the architecture, as shown before, such as the Resource Directory, but for the sake of simplicity we limit the use case to the entities that are the prime actors. Now that we have presented the main actors in this scenario, we proceed to describe the interaction process.
1. The System Admin puts in place a policy stating that all motes will cipher with CP-ABE. The policy in place will be chosen taking into account the limited capabilities of these devices.
(a) The orchestrator then transforms the policy into practical actions, ordering the PKG to generate the necessary key material and distribute it to the motes.
(b) The gateway, which has two APIs in place, one to send key material to the motes and one to perform a stronger re-encryption, receives the information from the PKG and pushes the key material into the IoT devices.
2. The IoT devices perform their normal operation and, through the evaluation of the IDS system, some misbehaviour is recognised (e.g., information sent to unintended recipients).
(a) The IDS system sends this information to the orchestrator.
3. The orchestrator, based on a configuration set up by the System Admin, puts in motion a more restrictive policy, which states that the Gateway has to gather all the information and cipher it with a more specific CP-ABE policy that will be more difficult for unintended recipients to decipher.
(a) This policy is translated into the action of sending the key material and policy to the Gateway in a specific message.
(b) The needed key material is also sent to the intended users, if they were not already in possession of a key necessary to decipher the information.
Throughout the normal operation of the IoT devices, they constantly send data to the Context Broker of the BMS system, so that the users that are subscribed to said information can obtain it and decipher it with the CP-ABE keys they have received. Next we show how specifically the aforementioned steps are taken, by explaining how the policies are generated and how the APIs of the Gateway work to enable pushing the key material and policies to the IoT devices. This process can be proactive or reactive, depending on whether the procedure is started proactively by the administrator or as a reaction to an attack in order to deploy a countermeasure.
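As a rough illustration of the two gateway APIs mentioned in step 1(b) above (and detailed later in Listings 2 and 3), the Python sketch below exposes the setup and re-encrypt endpoints with Flask; only the endpoint shape is taken from the listings, and the helper functions are placeholders:

from flask import Flask, jsonify, request

app = Flask(__name__)

def push_to_mote(sensor, material):
    # placeholder: deliver ABE public parameters and access policy over 6LoWPAN/CoAP
    print('pushing key material to', sensor)

def reencrypt_for(sensor, material):
    # placeholder: re-encrypt the device's data under the stricter access policy
    print('re-encrypting data of', sensor)

@app.route('/admin/setup', methods=['POST'])
def abe_setup():
    push_to_mote(request.args.get('d'), request.get_json())
    return jsonify(result='ok')

@app.route('/admin/reencrypt', methods=['POST'])
def abe_reencrypt():
    reencrypt_for(request.args.get('d'), request.get_json())
    return jsonify(result='ok')

if __name__ == '__main__':
    app.run(port=8000)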



7.3. Proactive user data protection policy-based setup In this approach, the administrator decides to proactively apply user data protection as a security measure for a specific IoT device and user. Figure 10 shows the workflow to perform this user data protection policy-based setup. In this case, the administrator defines a user data protection security policy by using an XML-based security policy language (Fig. 10-1). Listing 1 shows an extract of the security policy. The security policy model is composed of capabilities, configuration rules and configuration conditions.

Listing 1: Extract from the security policy

<!-- element names below are indicative only of the capability / rule / condition structure -->
<capability>Privacy</capability>
...
<configurationRule>
    <action>ENCRYPT USER DATA</action>
    <parameters>
        <parameter name="attr1-key" value="attr1-value"/>
        ...
    </parameters>
</configurationRule>
<configurationCondition>
    <source>SOURCE IP</source>
    <destination>DEST IP</destination>
    ...
</configurationCondition>

In this case, it specifies that, as part of the Privacy capability of our system, it is required to ENCRYPT USER DATA by using attribute-based encryption. The kind of attributes can be specified as a set of key-value pairs, and the security policy will be applied for a specific source and destination. Once the security policy has been properly defined, the administrator requests the policy enforcement to the orchestrator. The orchestrator analyses the security policy capability and decides, by using a capability mapping, that the security policy will be enforced through the IoT gateway in order to perform the Attribute-Based Encryption setup. Then it requests a policy translation to transform the security policy into specific configurations (the access policy associated with the data of a certain sensor device) (Fig. 10-2). At this point, the orchestrator retrieves the required keys for the user data encryption from the PKG entity (the ABE public parameters) (Fig. 10-3), so the orchestrator gathers the configuration for the gateway and the required crypto material. Finally, it requests the policy enforcement to the gateway through an ABE setup API (Fig. 10-4a), which will forward to the IoT device the ABE public parameters and the access policy associated with its data. At the same time, the orchestrator will distribute the crypto material to the user (Fig. 10-4b).
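A possible realisation of steps 3 and 4a, in which the orchestrator fetches the ABE public parameters from the PKG and then invokes the gateway's ABE setup API of Listing 2, could look as follows; the PKG endpoint, the host names and the JSON key layout are assumptions made for this example only:

import requests

PKG_PARAMS_URL = 'http://pkg.example:8080/abe/public-parameters'   # hypothetical PKG endpoint
GATEWAY_SETUP_URL = 'http://gateway.example:8000/admin/setup'       # endpoint as in Listing 2

def enforce_user_data_encryption(sensor_id: str, access_policy: str) -> None:
    public_params = requests.get(PKG_PARAMS_URL).text                # Fig. 10-3
    body = {'abe setup': f'{len(public_params):05d}\n({access_policy})\n{public_params}'}
    # Fig. 10-4a: forward the public parameters and the access policy to the gateway
    requests.post(GATEWAY_SETUP_URL, params={'d': sensor_id}, json=body).raise_for_status()

enforce_user_data_encryption('@sensor', 'A and B')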



Listing 2: ABE Setup API example

HTTP message:
POST http://IP@gateway:Port/admin/setup?d=@sensor
HTTP body (as a JSON object):
'abe setup': message size (for ex. 01048 bytes)\n || (policy)\n || b'1:g1\n || b'1:g1^a\n || b'2:g2\n || b'2:g2^a\n || b'3:e(gg)^alpha\n

(Example of Body)
{'abe setup': "01048\n(A and B)\nb'1:143814F6531240893756048A41B694C9ED9D6A2A0008B3A0BAF52B2858C5EE597572DB9FBC2BBFA2E800'\nb'1:1E19783375BAE1D04C286A41EA84EDAFB84BFA9A001CDB894418F9E0BC656DDDE7FF213026BF1B737500'\nb'2:0A6B756B3BF7E5E51AACA471DEE08CD42D5D5B93000C896BC509D9F414057458D6CB152BACC76859D900027E3D4B47D502E08290E01E69B99C07C90701510001953D086EE6464483EA340FF70E26497283D51200'\nb'2:07FFC93352B24A3CBF3B05215217A3434AC5D7910009DED9A984E8C9FA7D9C2308237DD98B51AB8B360003CC01FE5E9DC77EFA943D5BCABC15DA89D73DE000082DFD95E7C89DA20663FF6F7499EBE0C1D717CB00'\nb'3:1AF071A3A352651A7E56EF73C9EAD722402152830018E25041BDF48DF8C5026AEA9FCB2BF4416AA998000C793CCFAD4DB4E48530C18E95E11EF6E3D450AC001CB94F88FC90409F1173B9A7ACF9B3ABF2AA716200128F2E0674F705F7CDD97A0D854BF7A0CD9EBA9800090C03E1646ED7D4251AFD273A77C3D1A88298400009D38B8962DDF49F95432270FA2FEC71FBB40637001AFADB9B126F5E837AE7DC761C8254604D90F7C30015B40E91AACB1B741BDD78A77D4AB411E326D5F40021E1F95C499AFD3EC37410CDDAD847BDEFB3D5F3001E6DAD97A4E3FBB40BABF27A77B3C650B5B3EEAF000368FDC5F612A304B239DBC96E926C7D017FE40300'\n"}

Listing 2 shows an example of the ABE setup API provided by the IoT gateway. The ABE setup message is an HTTP POST message. The first part shows the HTTP message, whose body codifies the ABE public parameters and the associated access policy. The second part shows a JSON codification example for the aforementioned message.

Figure 10. Proactive user data protection policy-based setup



7.4. Reactive user data protection policy-based setup Unlike the previous approach, Fig. 11 shows the process when it is triggered automatically as a reaction to the events received by the IDS Manager. Specifically, the IDS system detects an unusual behaviour and notifies it to the IDS Manager (Fig. 11-1). The IDS Manager analyses the events and sends an alert to the Reaction Manager, also indicating the involved devices as well as the kind of event (Fig. 11-2). The Reaction Manager then retrieves a security policy template from the Policy Manager, according to the kind of countermeasure needed to deal with the received alert. For instance, it will require a stronger access policy through the re-encryption of the user data by adding new attributes to the security policy (Fig. 11-3). The reaction module fills the policy template with the affected source and destination as well as the new attributes, and it requests the policy enforcement to the security orchestrator (Fig. 11-4). Once the orchestrator receives the security policy, the process is quite similar to the proactive approach. That is, the orchestrator obtains the required configurations of the system by using the Policy Manager (Fig. 11-5) and the new required cryptographic material from the PKG (Fig. 11-6). Then, it enforces the new configuration at the gateway, which will re-encrypt the ciphertext of the specified IoT device (Fig. 11-7a). In this case, the orchestrator also distributes the required cryptographic material to the user (Fig. 11-7b).
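The following fragment sketches, purely for illustration, how the Reaction Manager could fill the policy template and request enforcement from the orchestrator (Fig. 11-3 and 11-4); the field names and the orchestrator endpoint are not defined by USEIT and are assumed here:

import requests

ORCHESTRATOR_ENFORCE_URL = 'http://orchestrator.example:9000/enforce'   # hypothetical endpoint

def on_ids_alert(alert: dict) -> None:
    policy = {
        'capability': 'Privacy',
        'action': 'ENCRYPT USER DATA',
        'attributes': alert.get('new_attributes', ['C', 'D']),  # stricter access policy
        'source': alert['device'],
        'destination': alert['unintended_recipient'],
    }
    requests.post(ORCHESTRATOR_ENFORCE_URL, json=policy).raise_for_status()

on_ids_alert({'device': '@sensor', 'unintended_recipient': '10.0.0.99',
              'new_attributes': ['C', 'D']})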

Figure 11. Reactive user data protection policy-based setup



Listing 3: ABE re-encrypt API example

HTTP message:
POST http://IP@gateway:Port/admin/reencrypt?d=@sensor
HTTP body (as a JSON object):
'abe re-encrypt': replaced attribute\n || added access policy\n || current access policy\n || b'1:g1\n || b'1:g1^a\n || b'2:g2\n || b'2:g2^a\n || b'3:e(gg)^alpha\n

(Example of Body)
{'abe re-encrypt': "B\n(C and D)\n(A and B)\nb'1:143814F6531240893756048A41B694C9ED9D6A2A0008B3A0BAF52B2858C5EE597572DB9FBC2BBFA2E800'\nb'1:1E19783375BAE1D04C286A41EA84EDAFB84BFA9A001CDB894418F9E0BC656DDDE7FF213026BF1B737500'\nb'2:0A6B756B3BF7E5E51AACA471DEE08CD42D5D5B93000C896BC509D9F414057458D6CB152BACC76859D900027E3D4B47D502E08290E01E69B99C07C90701510001953D086EE6464483EA340FF70E26497283D51200'\nb'2:07FFC93352B24A3CBF3B05215217A3434AC5D7910009DED9A984E8C9FA7D9C2308237DD98B51AB8B360003CC01FE5E9DC77EFA943D5BCABC15DA89D73DE000082DFD95E7C89DA20663FF6F7499EBE0C1D717CB00'\nb'3:1AF071A3A352651A7E56EF73C9EAD722402152830018E25041BDF48DF8C5026AEA9FCB2BF4416AA998000C793CCFAD4DB4E48530C18E95E11EF6E3D450AC001CB94F88FC90409F1173B9A7ACF9B3ABF2AA716200128F2E0674F705F7CDD97A0D854BF7A0CD9EBA9800090C03E1646ED7D4251AFD273A77C3D1A88298400009D38B8962DDF49F95432270FA2FEC71FBB40637001AFADB9B126F5E837AE7DC761C8254604D90F7C30015B40E91AACB1B741BDD78A77D4AB411E326D5F40021E1F95C499AFD3EC37410CDDAD847BDEFB3D5F3001E6DAD97A4E3FBB40BABF27A77B3C650B5B3EEAF000368FDC5F612A304B239DBC96E926C7D017FE40300'\n"}

Listing 3 shows an example of the ABE re-encrypt API provided by the IoT gateway. The ABE re-encrypt message is an HTTP POST message. The first part shows the HTTP message, whose body codifies the new ABE public parameters and the new associated access policy. The second part shows a JSON codification example for the aforementioned message. 7.5. Context Broker Publication/Subscription The IoT devices engage in the periodic publication of data according to their normal operation. The process of publishing information onto the Context Broker as part of the BMS entails the use of the NGSI API [2] to communicate with it and publish the information. Since this entails the use of HTTP, and we assume here that the devices use a constrained protocol such as CoAP [12], the additional processing is done by a proxy entity that is co-located with the Context Broker. On the other hand, users that are interested in having the latest information on the updates of the IoT devices can subscribe to it, so that every time new information is sent to the Context Broker they receive it, following the publish/subscribe model. In the context of USEIT, the information that is sent to the BMS is ciphered with CP-ABE and therefore obscured to unauthorised users. Next we show the flow of operation of the IoT device or Gateway, which ciphers the information with CP-ABE accordingly and sends this data to the BMS; we then show how the user gathers the message and, with the key material that it possesses, deciphers it. Next we show examples of how the interaction with the BMS would go, with the scripts that can be used to publish information into the Context Broker. 7.5.1. Creating the NGSI published data Listing 4 shows an example of the code that can be used to send information to the BMS. In this specific case, we show that the entity with ID "USEIT-TempSensor-1" is sending the current temperature it measured. For the sake of simplicity, the encrypted value is not shown here.



Listing 4: Script to create entity in the BMS

POST USEIT_ContextBroker_IP:1026/v1/updateContext
Content-Type: application/json
fiware-service: USEIT
fiware-servicepath: /integration
Accept: application/json

(Example of Body)
{
  "contextElements": [
    {
      "type": "USEIT-TempSensor",
      "isPattern": "false",
      "id": "USEIT-TempSensor-1",
      "attributes": [
        {
          "name": "Temperature",
          "type": "float",
          "value": "25.0"
        }
      ]
    }
  ],
  "updateAction": "APPEND"
}
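For reference, the updateContext request of Listing 4 could be issued from a gateway with a few lines of Python; the broker host is the same placeholder used in the listing and, in USEIT, the attribute value would carry the CP-ABE ciphertext rather than the clear temperature:

import requests

ORION = 'http://USEIT_ContextBroker_IP:1026'      # placeholder host, as in Listing 4
HEADERS = {'Content-Type': 'application/json', 'Accept': 'application/json',
           'fiware-service': 'USEIT', 'fiware-servicepath': '/integration'}

payload = {
    'contextElements': [{
        'type': 'USEIT-TempSensor', 'isPattern': 'false', 'id': 'USEIT-TempSensor-1',
        'attributes': [{'name': 'Temperature', 'type': 'float',
                        'value': 'CPABE(25.0)'}],   # ciphered value in the USEIT case
    }],
    'updateAction': 'APPEND',
}

requests.post(f'{ORION}/v1/updateContext', json=payload, headers=HEADERS).raise_for_status()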

7.5.2. Subscribing to the BMS data Listing 5 shows an example of the code that can be used to subscribe to the BMS. In this specific case, we show that the user sending the request to the BMS subscribes to the entity with ID "USEIT-TempSensor-1", and requests that the information about that device be sent whenever the Temperature value changes.

Listing 5: Script to subscribe to the BMS

POST USEIT_ContextBroker_IP:1026/v1/subscribeContext
Content-Type: application/json
fiware-service: USEIT
fiware-servicepath: /integration
Accept: application/json

(Example of Body)
{
  "entities": [
    {
      "type": "USEIT-TempSensor",
      "isPattern": "false",
      "id": "USEIT-TempSensor-1"
    }
  ],
  "attributes": ["*"],
  "reference": "http://<IP of the subscribed Client>/accumulate",
  "duration": "P1M",
  "notifyConditions": [
    {
      "type": "ONCHANGE",
      "condValues": ["Temperature"]
    }
  ],
  "throttling": "PT5S"
}
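On the consumer side, the "reference" URL given in Listing 5 must point to an HTTP endpoint able to receive the broker's notifications; a minimal sketch of such an endpoint is shown below, where the CP-ABE decryption of the received value is left as a placeholder:

from flask import Flask, request

app = Flask(__name__)

@app.route('/accumulate', methods=['POST'])
def accumulate():
    notification = request.get_json()
    for ctx in notification.get('contextResponses', []):
        element = ctx['contextElement']
        for attr in element.get('attributes', []):
            # cpabe_decrypt(user_key, attr['value']) would recover the clear reading
            print(element['id'], attr['name'], attr['value'])
    return '', 200

if __name__ == '__main__':
    app.run(port=1028)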



8. Conclusions In this book chapter, we have presented the USEIT project as an initiative to empower the end users, who are the ones that ultimately use this technology, by providing them with the means to protect their data and their privacy. We explain the foundations of the USEIT platform, the tools and the framework that it uses, as well as the specific use case that is the focus of this book chapter, which shows how the USEIT platform can be leveraged to improve the security and privacy of the data being sent in a Building Management System (BMS) to the different users of said system. We show the integration of the different actors of the USEIT platform and how, through advanced policy-based orchestration, we are able to facilitate the work of a System Administrator, who can rest assured that, when some misbehaviour is detected by the IDS, the information that comes from the Smart Building will be properly protected, avoiding its misuse by unintended third parties. We conclude that, with proper policies and systems in place, we can provide the Internet of Things environment with the latest security means and push the IoT towards the next level of development.

Acknowledgements
This work was supported by the USEIT project under Grant CHIST-ERA PCIN-2016010, by the H2020 project OLYMPUS under the EU grant 786725 and, in part, by the Spanish Ministry of Economy and Competitiveness Doctorado Industrial Grant DI16-08432.

References

[1] A. Bassi, M. Bauer, M. Fiedler, T. Kramp, R. van Kranenburg, S. Lange, and S. Meissner. Enabling Things to Talk. Springer Science+Business Media, 2013.
[2] M. Bauer, E. Kovacs, A. Schulke, N. Ito, C. Criminisi, L. Goix, and M. Valla. The Context API in the OMA Next Generation Service Interface. In 2010 14th International Conference on Intelligence in Next Generation Networks. IEEE, 2010.
[3] J. Bethencourt, A. Sahai, and B. Waters. Ciphertext-policy attribute-based encryption. In 2007 IEEE Symposium on Security and Privacy, pages 321-334. IEEE Computer Society Press, May 2007.
[4] C. Bormann, M. Ersue, and A. Keranen. Terminology for Constrained-Node Networks. RFC 7228 (Informational), May 2014.
[5] World Economic Forum. Personal data: The emergence of a new asset class. https://www.weforum.org/reports/personal-data-emergence-new-asset-class, 2011.
[6] J. L. Hernandez-Ramos, M. P. Pawlowski, A. J. Jara, A. F. Skarmeta, and L. Ladid. Toward a lightweight authentication and authorization framework for smart objects. IEEE Journal on Selected Areas in Communications, 33(4):690-702, 2015.
[7] G. Kortuem, F. Kawsar, V. Sundramoorthy, and D. Fitton. Smart objects as building blocks for the Internet of Things. IEEE Internet Computing, 14(1):44-51, 2010.
[8] A. Poikola, K. Kuikkaniemi, and H. Honko. MyData: A Nordic Model for Human-Centered Personal Data Management and Processing. Finnish Ministry of Transport and Communications, 2015.
[9] A. Sahai and B. R. Waters. Fuzzy identity-based encryption. In R. Cramer, editor, Advances in Cryptology - EUROCRYPT 2005, volume 3494 of LNCS, pages 457-473. Springer, May 2005.
[10] J. Schaad. CBOR Object Signing and Encryption (COSE). RFC 8152, July 2017.
[11] Z. Shelby, K. Hartke, and C. Bormann. The Constrained Application Protocol (CoAP). RFC 7252, 2014.
[12] Z. Shelby, K. Hartke, and C. Bormann. The Constrained Application Protocol (CoAP). RFC 7252 (Proposed Standard), June 2014. Updated by RFC 7959.
[13] SMARTIE. Deliverable 2.2: SMARTIE requirements, 2015.
[14] S. Ziegler, C. Crettaz, L. Ladid, S. Krco, B. Pokric, A. F. Skarmeta, A. Jara, W. Kastner, and M. Jung. IoT6 - Moving to an IPv6-Based Future IoT. In The Future Internet, pages 161-172. Springer Science+Business Media, 2013.
[15] M. Zorzi, A. Gluhak, S. Lange, and A. Bassi. From today's INTRAnet of things to a future INTERnet of things: a wireless- and mobility-related view. IEEE Wireless Communications, 17(6):44-51, 2010.


Security and Privacy in the Internet of Things: Challenges and Solutions J.L.H. Ramos and A. Skarmeta (Eds.) IOS Press, 2020 © 2020 The authors and IOS Press. All rights reserved. doi:10.3233/AISE200003

Privacy Awareness for IoT Platforms: BRAIN-IoT Approach Mohammad Rifat Ahmmad RASHID a,1, Davide CONZON a, Xu TAO a, and Enrico FERRERA a, a LINKS Foundation, Italy Abstract. With the increasing adoption of Internet of Things (IoT) platforms in decentralized cloud environments, more focus is being given to facilitating privacy awareness, building upon the goals set by the current European Union (EU) General Data Protection Regulation (GDPR). Therefore, it is necessary to empower the end users (both private and corporate) of IoT platforms with the capability of deciding which combination of self-hosted or cloud-oriented IoT systems is most suitable to handle the personal data they generate and own, as well as with the ability to change the existing (or pre-set) configurations at any time. Furthermore, adopting the GDPR in IoT platforms is challenging, as significant efforts are needed to integrate privacy policies in a programmatic way to: (i) increase the awareness of users about which data is collected, where it is transmitted, by whom, etc.; (ii) provide controls that enable users to act on such aspects, being at the same time aware of how such a decision affects the quality of the IoT services provided in that IoT platform. The BRAIN-IoT project focuses on complex scenarios where actuation and control are cooperatively supported by populations of IoT systems. The breakthrough targeted by BRAIN-IoT is to provide solutions to embed privacy-awareness and privacy control features in IoT solutions. In this work, the authors explore the following key areas: (a) privacy awareness in IoT systems using the GDPR regulations and the BRAIN-IoT platform, and (b) a conceptual framework for Privacy Impact Assessment (PIA) using the privacy principles presented in the GDPR. The proposed privacy awareness framework is cross-platform, so it is suitable to support a wide range of heterogeneous IoT systems deployed by corporate and private users. Keywords. Privacy, IoT Platform, GDPR

1. Introduction In recent years, an increasing number of IoT products and services have been widely deployed in professional and mass-market usage scenarios [3,4]. The optimistic forecasts released in recent years about the revenues [3,4] of deployed devices demonstrate the value of IoT solutions in real-scale operational conditions [5]. Moreover, commercial and pilot deployments are progressively demonstrating the importance of IoT platforms in complex usage scenarios [5].
1 Corresponding Author: Mohammad Rifat Ahmmad Rashid, LINKS foundation, Italy; E-mail: [email protected]



This exponential growth of diverse IoT platforms is leading to a multitude of market opportunities associated with IoT products, but also to the rise of associated challenges and concerns. Furthermore, the IoT technology and market landscape will become increasingly complex in the longer term, i.e., ten or more years from now [7], especially once IoT technologies have proven their full potential in business-critical and privacy-sensitive scenarios. Following this market trend, various organizations have started studying how to employ IoT systems to support tasks involving privacy awareness and control [16], resulting in a demand for more dependable and smart IoT platforms, able also to provide privacy-aware behaviours in isolation from the Internet. Privacy can be conceptualised as "the right to be left alone" [12]. It refers to the process of disclosing and mobilizing personal data under certain conditions and safeguarding measures. From a technical perspective, IoT systems require context-based technological advancement regarding privacy awareness, keeping consumers' convenience as the primary concern [13]. Solutions suitable to tackle such challenges are still missing. For instance, in smart city scenarios, many initiatives fall into the temptation of developing new IoT platforms, protocols, models or tools aiming to deliver the ultimate solution that will solve all the IoT challenges and become the reference IoT platform or standard. Instead, they usually result in the creation of yet another IoT solution or standard. In this paper, the authors explore the challenges of privacy awareness in futuristic IoT environments, considering actuation in a dependable fashion, by introducing a privacy awareness and control approach in the BRAIN-IoT platform, which is a novel solution for building decentralized IoT platforms based on inter-networking across heterogeneous existing IoT systems. BRAIN-IoT has been designed to support dynamicity and heterogeneity in terms of new systems, protocols and technologies, by allowing dynamic edge/cloud reconfiguration. Moreover, BRAIN-IoT also features a dedicated framework for smart dynamic behaviour, embedding Artificial Intelligence (AI)/Machine Learning (ML) enablers to provide connectivity and intelligence, actuation and control features. Another advanced approach proposed by BRAIN-IoT is its security and privacy protection approach, which supports a set of end-to-end security, privacy and trust enablers suitable for fully decentralized environments. The BRAIN-IoT platform addresses the challenges of privacy awareness for IoT platforms using the GDPR core principles. The authors divide this research goal into two research questions: RQ1: Which features can be used to assess privacy awareness in IoT platforms? RQ2: Which privacy impact assessment approach can be defined on top of the GDPR-based privacy principles? In response to RQ1, the authors explore the key privacy principles introduced in the GDPR and in the International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) 29100 standard [19]. Using these privacy principles, the authors propose privacy assessment measures that can be used to detect privacy impact and address control criteria. To address RQ2, the authors propose an iterative PIA approach that profiles different privacy principles from the GDPR and performs a privacy compliance evaluation for the IoT platform.
The main contributions of this work are: (a) a privacy compliance evaluation approach using the GDPR key privacy principles and the ISO/IEC 29100 standard; and (b) a conceptual framework for analyzing privacy risks using the measures from the privacy compliance evaluation.



This paper is organized as follows: Section 2 outlines the key concepts regarding privacy awareness for IoT platforms. Section 3 presents the related work, focusing on GDPR requirements related to IoT platforms and on current standards and tools for PIA. Section 4 presents the privacy compliance evaluation approach that relies on the GDPR privacy principles and the ISO/IEC 29100 standard. Section 5 outlines the privacy awareness and control integration approach in the BRAIN-IoT platform. Finally, the paper concludes with Section 6, summarizing the main findings and outlining future research activities.

2. Background and Motivation A paradigm shift is expected to happen as technology evolution will allow to safely deploy IoT systems in scenarios involving actuation and strict requirements in terms of dependability, security, privacy and safety constraints, resulting in a convergence between IoT and Cyber-Physical Systems (CPS) [7]. In this section, the authors present an overview of three key research areas: (i) the EU GDPR, (ii) PIA, and (iii) security, safety and privacy concerns in IoT platforms. 2.1. EU General Data Protection Regulation (GDPR) The General Data Protection Regulation (GDPR) is the new regulation in EU law on data protection and privacy for all individuals within the EU and the European Economic Area (EEA) [10]. It came into force on May 25, 2018. The goal is to help align existing protocols while increasing the levels of protection for individuals. For an organization, being compliant with the GDPR is not simple, because it is very strict, but once it is compliant, it can confidently do business across the EU. Understanding an individual's role in relation to the personal data is crucial in ensuring compliance with the GDPR and the fair treatment of individuals. Article 4 defines the roles of data controllers and data processors as follows. Controller means the natural or legal person, public authority, agency or other body which, alone or jointly with others, determines the purposes and means of the processing of personal data. According to Article 25, controllers of personal data must put in place appropriate technical and organizational measures to implement the personal data protection principles. Processor means a natural or legal person, public authority, agency or other body which processes personal data on behalf of the controller. A processor of personal data must clearly disclose any data collection, declare the lawful basis and purpose for data processing, and state how long data is being retained and whether it is being shared with any third parties or outside of the European Economic Area (EEA). For collecting, processing and storing individuals' personal data, the GDPR presents six key data protection principles. These six principles are: Lawfulness, fairness and transparency. Considering lawfulness, organisations need to have a thorough comprehension of the GDPR and its regulations for data collection. Taking into account fairness and transparency, an organisation should state in its privacy policy the type of data it is collecting and the reason why it is collecting it. Purpose limitation. For personal data collection, an organisation should explicitly specify the legitimate purposes, and only collect the data necessary to achieve that purpose.



Data minimisation. Considering personal data for processing purposes, an organisation must only process the data necessary to achieve its purposes. This approach has two main benefits: firstly, in the event of unauthorized data access, the unauthorised person will only have access to a small amount of data; secondly, it is easier to keep data accurate and up to date. Accuracy. The accuracy of personal data is essential to any data protection approach. Taking into account the accuracy of personal data, the GDPR clearly points out that every reasonable step must be taken to erase or rectify data that is inaccurate or incomplete without delay [10]. Storage limitation. Organisations need to delete personal data when it is no longer necessary. Integrity and confidentiality (security). The GDPR points out that personal data must be "processed in a manner that ensures appropriate security of the personal data, including protection against unauthorised or unlawful processing and against accidental loss, destruction or damage, using appropriate technical or organisational measures" [10].

2.2. Privacy Impact Assessment (PIA) A Privacy Impact Assessment (PIA) is a systematic process conducted by an organization for evaluating a processing operation in terms of its impact upon privacy. This impact assessment approach is intended for data controllers, who shall demonstrate their PIA as well as the control criteria used for the evaluation. Also, IoT platform owners shall show that their solutions do not violate privacy, thanks to a design that keeps privacy aspects in consideration (the concept of Privacy by Design, see Art. 25 of the GDPR [14]). In [16], the authors identified the following stakeholders for a PIA approach: (i) decision-making authorities, who commission and validate the creation of new processings of personal data or products; (ii) project owners, who must assess risks to their system and define the security objectives; (iii) prime contractors, who must propose solutions to address risks pursuant to the objectives identified by project owners; (iv) Data Protection Officers (DPO), who support project owners and decision-making authorities in the area of personal data protection; (v) Chief Information Security Officers (CISO), who must support project owners in the area of Information Security (IS).

2.3. Security, safety and privacy concerns in IoT platforms Important constraints to the diffusion of IoT platforms are security, safety and privacy concerns. The distributed nature of the IoT environment makes the enforcement of proper security and privacy practices intrinsically challenging. For example, the market demands IoT solutions suitable to safely support business-critical tasks, which can be deployed rapidly and with low costs; but many of today's IoT-based products are implemented with low awareness of security and privacy risks. As a result, many IoT products lack even basic security mechanisms, with critical effects when such flaws are deployed to mass-market scenarios [8]. The BRAIN-IoT approach introduces a holistic end-to-end trust framework by adopting standard security protocols, joint with distributed security approaches derived from peer-to-peer systems (e.g., block-chain). Furthermore, the BRAIN-IoT privacy-awareness and control approach facilitates the adoption of privacy control policies in decentralized environments, building upon the goals set by the current EU GDPR [10].

3. Literature Review This section provides an overview of the State of the Art (SoTA) in the context of (i) the GDPR implications in the IoT domain, and (ii) the current standards and tools for PIA. 3.1. GDPR Requirements related to the IoT domain IoT applications are emerging across myriad sectors, for example in health-care, energy consumption and utility monitoring, transportation and traffic control, logistics, production and supply chain management, agriculture, public space and environmental monitoring, social interactions, personalised shopping and commerce, and domestic automation [2]. In these applications, in order to function properly as well as to optimise and customise their services, the IoT devices constantly collect vast amounts of personal data, such as smart health applications that collect location data and health data (e.g. Fitbit) [9]. In IoT applications, objects and services are connected to each other and may share data about a specific user to provide services. However, a stakeholder (controller or processor) can utilize this information for privacy-sensitive applications. This process of interaction may lead users to perceive an IoT platform as invasive, unexpected and unwelcome. Vulnerabilities with respect to cybersecurity standards, often owing to the limited computational power of identifying technologies such as Wireless Fidelity (WiFi) or Radio Frequency IDentification (RFID), can further intensify privacy risks [2]. Together, these risks make free and well-informed consent challenging in IoT applications. Considering the risk of personal data processing and the linkage of user records, privacy policies often fail to communicate clearly the privacy risks [15]. The GDPR creates governing principles for personal data processing (Article 5) and guidelines for individual rights (Articles 25 and 33) that can help to perform a PIA relevant for IoT devices [16]. In particular, the GDPR [10] helps with transparency (Article 5), data storage, access, rectification and deletion (Articles 5, 15-17), informed consent (Article 7), notification duties (Articles 13-14 and 33), automated decision-making and profiling (Articles 21-22), privacy by design and privacy by default (Article 25), cybersecurity (Articles 33-34), and data protection impact assessment (Articles 35-36). Tools created based on the GDPR for PIA [16] can help to counter the tendency of IoT devices and services to collect, share, and store large and varied types of personal data, to operate seamlessly and covertly, and to personalize functions based on prior behaviour. For example, the EU H2020 project [6] explores the benefit of the GDPR privacy principles in the domain of smart cities. In the BRAIN-IoT context, the term privacy denotes "the right of an entity to be protected from unauthorized disclosure of personal information that is contained in the BRAIN-IoT platform". In this paper, the authors explore the key privacy principles from the GDPR to perform a privacy compliance evaluation for any IoT system.



3.2. Current standards and tools for PIA Personal information, in different forms, is continuously gathered and processed by modern applications [10]. This includes data that is required for accessing a certain service (e.g., email address, credit card number); additional personal data that the user may provide for an enriched user experience (e.g., pictures, connections to social networks); and data that can be automatically gathered by the service provider (e.g., usage patterns, approximate location) [16]. This section outlines the common standards and tools for PIA. 3.2.1. Standards for PIA The elements required to provide specifications about how applications should handle privacy are mainly based on two sources: privacy principles and reference models. Regarding the privacy principles, the fair information practices developed by the Organization for Economic Cooperation and Development (OECD) [17] and the Global Privacy Standard [18] are commonly used in various applications. However, less focus has been given to privacy awareness and control in a decentralized environment. The most common reference models are the ISO/IEC 29100 [19], the ISO/IEC 29101 [20], the Organization for the Advancement of Structured Information Standards (OASIS) Privacy Management Reference Model [13] and the reference architecture proposed by Basso et al. [21]. The ISO/IEC 29100 describes a high-level framework for the protection of Personally Identifiable Information (PII), sets a common privacy terminology, defines privacy principles and categorizes privacy features. Also, ISO/IEC 29100 is intended to be used by persons and organizations involved in designing, developing, procuring, testing, maintaining, and operating information and communication technology systems where privacy controls are required for the processing of PII. This privacy framework was developed with the purpose of assisting organizations in defining their privacy safeguarding requirements related to all the information involved, through the following attributes: (i) by specifying a common privacy terminology; (ii) by defining the actors and their roles in processing PII; (iii) by describing privacy safeguarding considerations; and (iv) by providing references to known privacy principles for Information and Communication Technology (ICT). The continually increasing complexity of ICT systems has made it difficult for organizations to ensure that privacy is protected and, with the high commercial use of PII, achieving compliance with the various applicable laws has become harder. Therefore, the ISO/IEC 29100 standard [19] has eleven substantive privacy principles that are developed to take account of applicable legal and regulatory, contractual, commercial and other relevant factors. These principles are: (1) Consent and choice, (2) Purpose legitimacy and specification, (3) Collection limitation, (4) Data minimization, (5) Use, retention and disclosure limitation, (6) Accuracy and quality, (7) Openness, transparency and notice, (8) Individual participation and access, (9) Accountability, (10) Information security, and (11) Privacy compliance. The ISO/IEC 29101 standard [20] describes a reference architecture with best practices for a technical implementation of privacy requirements.
It covers the various stages in data life cycle management and the required privacy functionalities for PII in each data life cycle stage, as well as positioning the roles and responsibilities of all involved parties in information and communication systems development and management. The OASIS [13] Privacy Management Reference Model is a conceptual model which helps understanding and implementing appropriate operational privacy management functionalities and supporting mechanisms capable of executing privacy controls in line with privacy policies. In [21] the authors proposed a privacy reference architecture, through a privacy logical layer, and a set of features that should be considered during the development of an application in order to protect the privacy of personal information. Also, in [21] the authors consider that users state their privacy preferences about their PII, agreeing or not with the policies or parts of them. Since privacy enforcement is considered as part of the privacy awareness requirements, these standards could help in the specification of a PIA. These standards can be used to guide, design, develop, and implement privacy policies and controls; they can also be used as a reference point in the monitoring and measurement of performance benchmarking and auditing aspects of privacy management programs in an organization. In the BRAIN-IoT context, apart from the GDPR regulations, the Consortium explores the ISO/IEC 29100 standard to identify privacy control criteria.

3.2.2. Tools for PIA The Unified Modeling Language (UML) provides system architects and software engineers the means for the analysis, design, and implementation of software-based systems, as well as for modelling business and similar processes. With UML 2.0, the UML profile mechanism has been defined. By defining a UML profile, the language engineer can specify stereotypes, with new attributes, that extend the base UML metaclasses. UML in itself is not a methodology but only a language that can be extended for domain-specific modelling. By applying a profile on a model, the stereotypes can be applied on base UML elements, giving them domain-specific syntax and semantics. At the Object Management Group (OMG), several UML semantics specifications have been standardized under the executable UML umbrella [1]. Considering the profiling of non-functional properties, there are other approaches similar to OMG, e.g., the Quality of Service and Fault Tolerance (QoS-FT) [23], Modeling and Analysis of Real Time and Embedded systems (MARTE) [24], and Systems Modelling Language (SysML) [25]. In [15] the authors propose a UML profile for privacy-aware applications. This UML-based profiling approach helps building UML models that specify and structure concepts of privacy. Although UML-based profiling can help to model privacy criteria, less focus has been given to generalizing a privacy control approach for IoT platforms. BRAIN-IoT will extend the IoT Modeling Language (IoT-ML), adding specific modelling profiles that can be used to describe privacy awareness policies. IoT-ML is a UML-based modeling language built upon the Internet of Things Architecture (IoT-A) framework [11]. It adds, to the key concepts provided by IoT-A to describe IoT systems, the ability to integrate new IoT domain models [28].

4. A Conceptual Privacy Awareness Framework A PIA approach can assist the controller of an IoT system in identifying any specific risks of privacy breaches involved in an envisaged operation. More specifically, a PIA is an essential approach to identify privacy risks and to perform risk analysis and risk evaluation, and to examine personal data flows and storage. In the BRAIN-IoT scope, the PIA can be performed in the context of two components, IoT devices and IoT systems, for various use cases.



Figure 1. PIA approach for BRAIN-IoT.

Following the ISO/IEC 29100 standard, it is important to identify the actors and their roles in processing PII for information assets (according to ISO/IEC 27000). In this work, the term "information assets" is used as shorthand to refer to all tangible or intangible things or characteristics that have value to an IoT platform. PII or information assets can be defined as any information that can be used to identify the PII principal to whom such information relates, or that might be directly or indirectly linked to that person. Further, an organization must consider a set of requirements when processing information assets with respect to the privacy policies. Overall, a PIA is a continuous improvement process. Sometimes, it requires several iterations to achieve an acceptable privacy protection system. It also requires monitoring of changes over time (in context, controls, risks, etc.), for example every year, and updates whenever a significant change occurs. Figure 1 illustrates the proposed PIA approach for BRAIN-IoT. The proposed approach is based on the following four components: (i) Context: define and describe the context of the IoT system under consideration. (ii) Privacy principles: analyse the controls guaranteeing compliance with the GDPR fundamental principles, considering the proportionality and necessity of processing and the protection of data subjects' rights. (iii) Privacy risks: identify the threats associated with data privacy and ensure they are properly treated. (iv) Privacy compliance evaluation: analyse the overall privacy safeguarding requirements related to the processing of information assets in a particular setting. 4.1. Context At the beginning of a PIA, it is important to outline the use case under consideration, its nature, scope, context, and purposes. It is also important to identify the data controller and any processors of information assets in a particular setting. In the context of BRAIN-IoT, for the PIA an information asset could be: (a) software (e.g., an operating system), (b) hardware (e.g., a sensor, Central Processing Unit (CPU), memory, etc.), (c) processed data (e.g., personal data stored in the BRAIN-IoT platform, sensor status transmitted over a network, robot location in memory, etc.).



In certain instances, the identifiability of the processed data might vary. To determine whether or not information about a person is considered processed data in an IoT platform, several factors need to be taken into account. Personal data can be considered to be information assets in at least the following instances [19]: (i) if it contains or is associated with an identifier which refers to a natural person, (ii) if it contains or is associated with an identifier which can be related to a natural person, (iii) if it contains or is associated with an identifier which can be used to establish a communication with an identified natural person, and (iv) if it contains references which link the data to any of the identifiers above. Further, the authors categorize the processed data into three categories: (a) information about the user (first name, date of birth, email, etc.), (b) recorded data (sounds, images, movements, temperature, etc.), and (c) calculated data (data from the robot operating system, etc.). Table 1 presents a template to provide a brief description of the use cases for the PIA.

Table 1. Template for the use case description.
Use case description
Processing purposes
Processing stakes
Information assets
Processed data: (a) Information about the user (b) Recorded data (c) Calculated data
Controller
Processor(s)

4.2. Privacy principles In the GDPR [14], the six key principles (Sec. 2.1) are set out right at the start of the regulation and inform everything that follows. Compliance with these key principles is, therefore, a fundamental building block for a PIA. The key element in a PIA is the evaluation of how an IoT platform provides services to collect, store and distribute personal data [21]. In this PIA, apart from the six key principles, which lie at the heart of the GDPR, four additional privacy principles are considered: (a) the right to be informed, (b) the rights of access and to data portability, (c) the rights to rectification and erasure, and (d) the rights to the restriction of processing and to object. These four principles are based on the regulation of individuals' rights presented in the GDPR (Articles 15-17, 20, 25 and 33). These four additional principles are considered based on the premise that individuals' rights to access, rectification, erasure, portability and notifications for personal data are important aspects for an IoT platform [16,21]. This section outlines the privacy control criteria based on the six key principles and the four additional principles. 4.2.1. (P1) Lawfulness, fairness and transparency The lawful bases for processing information assets are set out in Article 6 of the GDPR. Table 2 outlines the privacy control criteria for lawfulness, fairness and transparency. At least one of these criteria must apply whenever personal data is processed.



Table 2. Evaluation criteria for the lawfulness, fairness and transparency.
ID      Lawfulness, fairness and transparency
P1-01   The individual has given consent to the processing of his or her personal data for one or more specific purposes.
P1-02   Processing is necessary for the performance of a contract to which the individual is party, or in order to take steps at the request of the individual prior to entering into a contract.
P1-03   Processing is necessary for compliance with a legal obligation to which the controller is subject.
P1-04   Processing is necessary to protect the vital interests of the individual or of another natural person.
P1-05   Processing is necessary for the performance of a task carried out in the public interest or in the exercise of official authority vested in the controller.
P1-06   Processing is necessary for the purposes of the legitimate interests pursued by the controller or by a third party, except where such interests are overridden by the interests or fundamental rights and freedoms of the data subject which require protection of personal data.

4.2.2. (P2) Purpose limitation

In the BRAIN-IoT context, the purpose limitation principle helps to ensure that the controller's plans to use or disclose processed data stay within the purposes originally specified. If the controller intends to process data for a purpose that is additional to or different from the originally specified purpose, the new purpose must be specified and must itself be fair, lawful and transparent [19]. Table 3 shows the privacy control criteria for the purpose limitation principle in the BRAIN-IoT system.

Table 3. Evaluation criteria for purpose limitation.

ID     Purpose limitation
P2-01  Clearly identified purposes for data processing.
P2-02  Purposes must be documented.
P2-03  Regularly review the processing and, where necessary, update the documentation and the privacy information for individuals.

4.2.3. (P3) Data minimization

Data minimization is closely linked to the principle of "collection limitation", but goes further. Whereas "collection limitation" refers to limiting the data collected in relation to the specified purpose, "data minimization" strictly minimizes the processing of information assets. Table 4 presents the privacy control checklist for the data minimization principle.

4.2.4. (P4) Data quality

Table 5 presents the data accuracy control criteria for the BRAIN-IoT system requirements, based on Article 5(1)(d) of the GDPR [14].


Table 4. Evaluation criteria for data minimization.

ID     Data minimization
P3-01  Ensure adoption of a need-to-know principle, i.e., one should be given access only to the information assets that are necessary for the conduct of his/her official duties.
P3-02  Delete and dispose of information assets whenever the purpose for personal data processing has expired, there are no legal requirements to keep the information assets, or whenever it is practical to do so.

Table 5. Evaluation criteria for data quality.

ID     Data quality
P4-01  Ensure the processed data present in the IoT platform is not incorrect or misleading.
P4-02  Ensure the reliability of processed data collected from a source.
P4-03  Keep a note of any challenges to the accuracy of the IoT platform processed data.

4.2.5. (P5) Storage limitation

Article 5(1)(e) of the GDPR explicitly states that a storage duration must be defined for each type of processed data, and that it must be justified by the legal requirements and processing needs [14]. Thus, the storage of personal data in an IoT platform must be justified using privacy control criteria. Table 6 presents the evaluation criteria for storage limitation in the context of the BRAIN-IoT system.

Table 6. Controls for data storage limitation.

ID     Storage limitation
P5-01  Ensure a mechanism is implemented to archive common data or purge archived data at the end of their storage duration.
P5-02  Functional traces will also have to be purged, as will technical logs, which may not be stored indefinitely.

4.2.6. (P6) Integrity and confidentiality

Article 5(1)(f) of the GDPR concerns the "integrity and confidentiality" of personal data. It can be referred to as the GDPR's "security principle". This principle covers the broad concept of information security. Poor information security leaves systems and services at risk and may cause real harm and distress to individuals; lives may even be endangered in some extreme cases. Table 7 outlines the common control criteria for integrity and confidentiality.

4.2.7. (P7) The right to be informed

In the BRAIN-IoT platform, various components of IoT systems are integrated for modelling tasks. An IoT platform should be able to demonstrate that an individual has consented to the processing of his/her data. The individual must be able to withdraw his/her consent easily at any time (Articles 7 and 8 of the GDPR). In Table 8, the authors present a list of controls intended to ensure that users of BRAIN-IoT have been informed and have given consent, that there has been a reminder and confirmation of their consent, and that the settings associated with that consent have been maintained.


Table 7. Evaluation criteria for integrity and confidentiality.

ID     Integrity and confidentiality
P6-01  Presentation, when the device is activated, of the terms and conditions for use/confidentiality.
P6-02  Possibility of accessing the terms & conditions for use/confidentiality after activation.
P6-03  Legible and easy-to-understand terms.
P6-04  Existence of clauses specific to the device.
P6-05  Detailed presentation of the data processing purposes (specified objectives, data matching where applicable, etc.).
P6-06  Detailed presentation of the personal data collected.
P6-07  Presentation of any access to the identifiers of the device, the smartphone/tablet or computer, specifying whether these identifiers are communicated to third parties.
P6-08  Information for the user if the app is likely to run in the background.
P6-09  Information on the secure data storage method, particularly in the event of sourcing.
P6-10  Information on protection of access to the device.
P6-11  Arrangements for contacting the company (identity and contact details) about confidentiality issues.
P6-12  Where applicable, information for the user on any change concerning the data collected, the purposes and confidentiality clauses.

Table 8. Evaluation criteria for the right to be informed.

ID     The right to be informed
P7-01  Express consent during activation.
P7-02  Consent segmented per data category or processing type.
P7-03  Express consent prior to sharing data with other users.
P7-04  Consent presented in an intelligible and easily accessible form, using clear and plain language adapted to the target user.
P7-05  For each new user, consent must once again be obtained.
P7-06  Where the user has consented to the processing of special data (e.g. his/her location), the interface clearly indicates that said processing takes place (icon, light).

4.2.8. (P8) The rights of access and to data portability

Where the data processing benefits from an exemption from the right of access, as presented in Article 15 of the GDPR, it is necessary to describe its implementation on the IoT platform, as well as a justification of the arrangements. In Table 9, the authors present a list of controls intended to ensure users' rights of access to all data, covering restrictions on access and portability.

Table 9. Evaluation criteria for the rights of access and to data portability.

ID     The rights of access and to data portability
P8-01  Possibility of accessing all data via the common interfaces.
P8-02  Possibility of securely consulting the traces of use associated with the user.
P8-03  Possibility of downloading an archive of all the processed data associated with the user.
P8-04  Possibility of retrieving, in an easily reusable format, personal data provided by the user, in order to transfer them to another service (Article 20 of the GDPR).


4.2.9. (P9) The rights to rectification and erasure

As set out in Article 17 of the GDPR, the arrangements for responding to individuals' requests concerning the rights to rectification and erasure must be justified. Table 10 outlines a list of controls intended to ensure the rights of rectification and erasure of the processed data.

Table 10. Evaluation criteria for the rights to rectification and erasure.

ID     The rights to rectification and erasure
P9-01  Indication from the IoT platform that the processed data will nevertheless be stored (technical requirements, legal obligation, etc.).
P9-02  Clear indications and simple steps for erasing data before scrapping the device.
P9-03  Possibility of erasing the data in the event the device is stolen.

4.2.10. (P10) The rights to restriction of processing and to object

Table 11 presents a list of control criteria for the rights to restriction of processing and to object to the processing of personal data in an IoT platform.

Table 11. Evaluation criteria for the rights to restriction and to object.

ID      The rights to restriction and to object
P10-01  Existence of privacy settings.
P10-02  Invitation to change the default settings.
P10-03  Privacy settings accessible when activating the device.
P10-04  Privacy settings accessible after activating the device.
P10-05  Existence of technical means for the data controller to lock access to and use of the data subject to restriction.
P10-06  Possibility of deactivating some of the device's features (microphone, Web browser, etc.).
P10-07  Existence of alternative apps for accessing the device.
P10-08  Compliance in terms of tracking.
P10-09  Effective exclusion of processing the user's data in the event consent is withdrawn.

4.3. Privacy risks

Adjusting the privacy policy to ensure compliance in alerting individuals about their privacy risks is a prelude to the more complex actions needed to enforce the GDPR. Although the GDPR is a complex regulation, at its core it is a data protection law. This means a processor must explore privacy threats and vulnerabilities affecting personal data protection in order to assess and mitigate privacy risks under the GDPR. According to the ISO/IEC 29100 standard, a privacy threat is the effect of uncertainty on privacy. Based on the threats and vulnerabilities explored in various studies [2,14,16], Table 12 presents a list of privacy threats in the context of the BRAIN-IoT platform.


Table 12. List of privacy threats in the context of the BRAIN-IoT platform.

R-01  Misleading processed data (personal, stored and calculated data): Integrity and confidentiality: Article 5(d) of the GDPR emphasises the protection of the processed data collected in the IoT platform. For example, interfering signals from an electromagnetic source emitted by the equipment (by conduction on the electrical power supply cables or earth wires, or by radiation in free space) can create misleading processed data.
R-02  Remote spying: Personnel actions observable from a distance, used to access unauthorized personal data.
R-03  Eavesdropping: Someone connected to communication equipment or media, or located inside the transmission coverage boundaries of a communication, can use equipment (which may be very expensive) to listen to, save and analyse the information transmitted (voice or data).
R-04  Vulnerable media or documents: Someone inside or outside the organization accessing digital media or paper documents with the intention of stealing personal data and using the information on them.
R-06  Unauthorized erasure: Retrieval of electronic media (hard discs, floppy discs, back-up cartridges, Universal Serial Bus (USB) keys, removable hard discs, etc.) or paper copies (lists, incomplete print-outs, messages, etc.) intended for recycling and containing retrievable information.
R-07  Disclosure: Someone inside the organization who, through negligence or knowingly, passes information to others in the organization who have no need to know, or to the outside (the latter case usually having greater consequences).
R-08  Data from untrustworthy sources: Receiving false data or unsuitable equipment from outside sources and using them in the organization; someone transmitting false information for integration in the information system, with the intention of misinforming the recipient and attacking the reliability of the system or the validity of its information.
R-09  Unauthorized access and portability: Article 15 of the GDPR sets out the rights of data access and portability for personal data. For example, a privacy risk may arise if someone with access to a communication medium or equipment installs an interception or destruction device in it.
R-10  Unauthorized access to services: Unintentional action involving software, carried out from inside or outside the organization and resulting in corruption or destruction of programs or data, impaired operation of the resource, or even execution of commands in a user's name without his/her knowledge; the attacker introduces a program or commands to modify the behaviour of a program or to add an unauthorized service to an operating system.
R-11  Access to individual position: Someone with access to equipment used to detect the position of an information system user.
R-12  Illegal access: A person inside or outside the organisation accesses the information system and uses one of its services to penetrate it, run operations or steal information.
R-13  Transparency: Someone inside or outside the organisation makes fraudulent copies (also called pirated copies) of packaged software or in-house software.
R-14  Counterfeit licencing: Loss or destruction of documents proving the purchase of licences, or negligence committed by installing software without paying for the licence; someone inside the organisation makes illegal use of copied software.
R-15  Corruption of data: Someone gains access to the communication equipment of the information system and corrupts the transmission of information (by intercepting, inserting, destroying, etc.), or repeatedly attempts access until successful.


R-16  Illegal processing of data: A person carries out information processing that is forbidden by the law or a regulation.
R-17  Error in use of personal data: A person commits an operating error, input error or utilisation error on hardware or software.
R-18  Abuse of rights: Someone with special rights (network administration, computer specialists, etc.) modifies the operating characteristics of the resources without informing the users; someone accesses the system to modify, delete or add operating characteristics, or to carry out any other unauthorised operation possible to holders of these rights.
R-19  Forging of rights: A person assumes the identity of a different person in order to use his/her access rights to the information system, misinform the recipient, commit a fraud, etc.
R-20  Denial of actions: A person or entity denies being involved in an exchange with a third party or in carrying out an operation.
R-21  Breach of personnel availability: Absence of qualified or authorised personnel held up for reasons beyond their control, or deliberate absence of qualified or authorised personnel.

4.4. Privacy compliance evaluation

In this approach, the privacy compliance evaluation is performed using the fundamental privacy principles presented in the GDPR and the privacy threats outlined in Section 4.3. As presented in Figure 2, the privacy compliance evaluation is based on two pillars: (1) fundamental rights and principles, which are "non-negotiable", established by law and which must be respected regardless of the nature, severity and likelihood of risks; (2) privacy threats, which help identify the appropriate technical and organizational measures to protect individuals' data.

Figure 2. Privacy compliance evaluation.

In any IoT application, the controller of a use case should identify and implement privacy controls to meet the privacy safeguarding requirements identified by the privacy risk assessment and treatment process [16]. Effort should be made by IoT application controllers to develop their privacy controls as part of a general "privacy by design" approach, i.e., privacy compliance evaluation should be considered at the design phase of a system's data processing rather than being integrated at a subsequent stage [10]. Therefore, it is important to verify and demonstrate that the processing meets data protection and privacy safeguarding requirements by periodically conducting evaluations by stakeholders or trusted third-party individuals [16]. Also, the identified and implemented privacy controls should be documented as part of the PIA for IoT applications.


Figure 3. Privacy and security risk level [16]

Certain types of data processing can warrant specific controls, the need for which only becomes apparent once an envisaged operation has been carefully analysed. A risk is a hypothetical scenario that describes a feared event and all the threats that would allow it to occur (Article 32 of the GDPR). Based on the privacy risk levels outlined in Figure 3, in [16] the authors performed the assessment of the privacy control criteria using the following scale to estimate the severity and likelihood of privacy risks:

Negligible (scale 0): it does not seem possible for the selected risk sources to materialize the threat by exploiting the properties of supporting assets (e.g., theft of paper documents stored in a room protected by a badge reader and access code). Considering severity, data subjects either will not be affected or may encounter a few inconveniences, which they will overcome without any problem.

Limited (scale 1): it seems difficult for the selected risk sources to materialize the threat by exploiting the properties of supporting assets (e.g., theft of paper documents stored in a room protected by a badge reader). Privacy stakeholders may encounter significant inconveniences, which they will be able to overcome despite a few difficulties.

Significant (scale 2): it seems possible for the selected risk sources to materialize the threat by exploiting the properties of supporting assets (e.g., theft of paper documents stored in offices that cannot be accessed without first checking in at the reception). Privacy stakeholders may encounter significant consequences, which they should be able to overcome, albeit with real and serious difficulties.

Maximum (scale 3): it seems extremely easy for the selected risk sources to materialize the threat by exploiting the properties of supporting assets (e.g., theft of paper documents stored in the public lobby). Privacy stakeholders may encounter significant, or even irreversible, consequences, which they may not overcome.

Considering the likelihood and severity scale, it is possible to evaluate a privacy principle (P) using each privacy control criterion (C_i). The risk level for a privacy principle (P) can be defined as:

P = \frac{1}{n} \sum_{i=1}^{n} C_i \qquad (1)


Here, n is the number of privacy control criteria. Based on the risk value of a privacy principle, it can be assumed that if the value is greater than 2, the risk level for a use case is significant and needs further attention from controllers (a minimal sketch of this computation follows Table 13). The severity and likelihood evaluation can be performed by monitoring how many information assets are affected by a specific privacy threat. Table 13 presents a template for privacy compliance evaluation. By monitoring the number of information assets affected by each privacy threat, a stakeholder (the processor of the IoT platform) can quantify risk levels on the severity and likelihood scales. In Table 13, a stakeholder identifies and reports the severity and likelihood scale values in the risk level columns by monitoring the privacy threats. Further, each privacy control criterion is aligned with multiple privacy threats.

Table 13. Template for privacy compliance evaluation.

Privacy control criteria   Privacy threats   Risk level (severity scale / likelihood scale)

(P1) Lawfulness, fairness and transparency
P1-01   R-07
P1-02   R-19
(other rows are omitted for brevity)
P1-06   R-18
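The following minimal sketch only illustrates Equation (1) and the threshold discussed above; the function and variable names, and the example scores, are illustrative assumptions rather than part of the BRAIN-IoT implementation.

```python
# Illustrative sketch of Equation (1): the risk level of a privacy principle
# is the mean of the scores of its control criteria. Names are assumptions.

# Scale values from Section 4.4 (severity/likelihood scale [16]).
NEGLIGIBLE, LIMITED, SIGNIFICANT, MAXIMUM = 0, 1, 2, 3


def principle_risk_level(criterion_scores):
    """Equation (1): P = (1/n) * sum(C_i) over the n control criteria."""
    if not criterion_scores:
        raise ValueError("at least one control criterion score is required")
    return sum(criterion_scores) / len(criterion_scores)


def needs_attention(criterion_scores, threshold=2):
    """A principle whose risk value exceeds 2 is treated as significant."""
    return principle_risk_level(criterion_scores) > threshold


# Hypothetical scores for the criteria P1-01..P1-06 of principle P1,
# obtained by monitoring the threats they are aligned with (e.g. R-07, R-18, R-19).
p1_scores = [SIGNIFICANT, LIMITED, MAXIMUM, SIGNIFICANT, SIGNIFICANT, MAXIMUM]
print(principle_risk_level(p1_scores))  # approx. 2.17
print(needs_attention(p1_scores))       # True -> requires controller attention
```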

5. Integration of PIA in BRAIN-IoT

The privacy principles considered in the BRAIN-IoT PIA approach are derived from the GDPR. More specifically, the proposed PIA approach focuses on the integration of the privacy principles and the development of privacy management systems to be implemented within the BRAIN-IoT platform. These privacy principles are used to guide the design, development, and implementation of privacy control policies in the BRAIN-IoT platform. Figure 4 presents an overview of the proposed PIA integration approach in the BRAIN-IoT platform. The main phase in this approach is the building of a privacy control schema using the results of the PIA. BRAIN-IoT provides a model-based framework for decentralized IoT systems. The BRAIN-IoT modeling language uses a UML profile, which can be used for modeling views of the system, including privacy policies as well as data flows and storage (such as encryption, pseudo-anonymisation, etc.) for personal data. Figure 5 illustrates the structure of the privacy control schema based on the privacy principles and the corresponding privacy evaluation criteria. The benefits of this approach are the following: (i) modeling privacy views of a system application through a structured model can help to better describe privacy principles and control criteria; (ii) model-based privacy analysis can be performed with the generated schema, which can help BRAIN-IoT platform providers fulfill the requirements of privacy awareness in IoT systems; further, it is easier to integrate privacy information protection within application development and to provide a privacy-aware application development environment for BRAIN-IoT application developers with the profile-based privacy models; (iii) the BRAIN-IoT modeling language profile is flexible enough to be extended and designed for the privacy specifications of domain-specific applications; (iv) modeling can help identify potential breaches in IoT systems and find the right solutions to ensure privacy awareness by design.
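As a rough illustration of what such a privacy control schema might look like once instantiated, the sketch below encodes principles, their evaluation criteria and the threats they are aligned with as plain data. It is only an assumed concrete representation for illustration; the actual BRAIN-IoT schema is expressed as a UML profile, and the criterion-to-threat mapping shown for P5 is hypothetical.

```python
# Hypothetical, simplified representation of the privacy control schema of
# Figure 5: principles -> evaluation criteria -> aligned privacy threats.
# This is not the BRAIN-IoT UML profile, just an illustrative data structure.

privacy_control_schema = {
    "P1": {
        "principle": "Lawfulness, fairness and transparency",
        "criteria": {
            "P1-01": {"description": "Consent given for specific purposes",
                      "threats": ["R-07"]},
            "P1-02": {"description": "Processing necessary for a contract",
                      "threats": ["R-19"]},
        },
    },
    "P5": {
        "principle": "Storage limitation",
        "criteria": {
            # Threat alignment below is an assumption for illustration.
            "P5-01": {"description": "Archive or purge data at end of storage duration",
                      "threats": ["R-06", "R-15"]},
        },
    },
}


def criteria_for_threat(schema, threat_id):
    """Return the control criteria aligned with a given privacy threat."""
    return [
        criterion_id
        for principle in schema.values()
        for criterion_id, criterion in principle["criteria"].items()
        if threat_id in criterion["threats"]
    ]


print(criteria_for_threat(privacy_control_schema, "R-07"))  # ['P1-01']
```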


Figure 4. Overview of integration of privacy policies in BRAIN-IoT systems.

Figure 5. Structure of privacy control schema.

6. Conclusion and Future Work

Complex IoT application scenarios present a set of challenges that were not considered in more traditional IoT applications, such as heterogeneity and lack of interoperability, and security and privacy concerns. Further, as more IoT systems get connected to the Internet, there is a need for a highly scalable and extensible PIA approach. The authors consider that adopting the GDPR key privacy principles to remodel a PIA approach will enhance the reliability of IoT platforms. This paper has explored the lingering problems in the context of privacy awareness in complex IoT applications by integrating the GDPR into IoT platforms. More specifically, this paper has presented the way in which the BRAIN-IoT project intends to address the challenges of privacy awareness in IoT platforms by leveraging the GDPR and ISO/IEC standards.


This paper has further proposed a PIA approach for IoT platforms, which will also be advantageous from the developers' point of view. The proposed framework is yet to be fully implemented and tested, but it presents a direction of investigation for the IoT community on ways in which the problems of scalability, ease of development, ease of deployment, and lightweight implementation can be resolved. The next step is to integrate the privacy control criteria using the modeling schema in the BRAIN-IoT paradigm and to study how it can be applied to various use cases. Some of the components presented have already been implemented, while others have only been designed. By the end of the project (December 2020), the whole conceptual framework will be implemented and tested in complex use case scenarios, namely service robotics in a smart warehouse and IoT systems in critical infrastructure management. More specifically, a public prototype will be implemented based on the final enablers for privacy control. Also, BRAIN-IoT could likely address more complex IoT systems requiring greater emphasis on privacy awareness, where more focus has been given to handling personal data.

References

[1] M.F. Wendland, Towards executable UML interactions based on fUML, in: 4th International Conference on Model-Driven Engineering and Software Development (MODELSWARD), IEEE, 2016, pp. 405-411.
[2] S. Wachter, Normative challenges of identification in the Internet of Things: Privacy, profiling, discrimination, and the GDPR, Computer Law & Security Review 34(3) (2018), 436-449.
[3] J. Manyika, M. Chui, P. Bisson, J. Woetzel, R. Dobbs, J. Bughin and D. Aharon, Unlocking the Potential of the Internet of Things, McKinsey Global Institute, 2015.
[4] Gartner, M2M Global Forecast & Analysis Report, Machine Research, June 2015.
[5] European Commission, Horizon 2020 Work Programme 2016-2017: Internet of Things Large Scale Pilots.
[6] Horizon 2020 Project SynchroniCity: Delivering an IoT enabled Digital Single Market for Europe and Beyond, https://synchronicity-iot.eu/.
[7] J.H. Holland, Signals and Boundaries: Building Blocks for Complex Adaptive Systems, MIT Press, 2012.
[8] I. Cohen, D. Corman, J. Davis, H. Khurana, P.J. Mosterman and S.L. Prasad Venkatesh, Strategic Opportunities for 21st Century Cyber-Physical Systems, Steering Committee for Foundations in Innovation for Cyber-Physical Systems, 2012.
[9] ETSI, CYBER; Cyber Security for Consumer Internet of Things, ETSI, 2019.
[10] P. Voigt and A. von dem Bussche, The EU General Data Protection Regulation (GDPR): A Practical Guide, 1st edn, Springer International Publishing, Cham, 2017.
[11] L. Atzori, A. Iera and G. Morabito, Understanding the Internet of Things: definition, potentials, and societal role of a fast evolving paradigm, Ad Hoc Networks 56 (2017), 122-140.
[12] P. Panagiotou, N. Sklavos and I.D. Zaharakis, Design and Implementation of a Privacy Framework for the Internet of Things (IoT), in: 21st EUROMICRO Conference on Digital System Design, Architectures, Methods, Tools (DSD'18), Prague, Czech Republic, 2018.
[13] J. Sabo, M. Willett, P. Brown and D.N. Jutla, Privacy Management Reference Model and Methodology, OASIS PMRM TC Standards Track Committee Specification, 2013.
[14] Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46, Official Journal of the European Union (OJ) 59 (2016), 1-88.
[15] T. Basso, L. Montecchi, R. Moraes, M. Jino and A. Bondavalli, Towards a UML profile for privacy-aware applications, in: IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing, 2015.
[16] Commission Nationale de l'Informatique et des Libertés (CNIL), Privacy Impact Assessment (PIA), 2017. [Online]. Available: https://www.cnil.fr/en/privacy-impact-assessment-pia.
[17] Organisation for Economic Co-operation and Development, OECD Guidelines on the Protection of Privacy and Transborder Flows of Personal Data, 2013. [Online]. Available: http://www.oecd.org/sti/ieconomy/2013-oecd-privacy-guidelines.pdf.
[18] A. Cavoukian, Creation of a Global Privacy Standard, 2006. [Online]. Available: http://www.ehcca.com/presentations/privacysymposium1/cavoukian 2b h5.pdf.
[19] International Organization for Standardization, ISO/IEC 29100: Information technology - Security techniques - Privacy framework, First Edition, 2011.
[20] International Organization for Standardization, ISO/IEC 29101: Information technology - Security techniques - Privacy architecture framework, First Edition, 2013.
[21] T. Basso, R. Moraes, M. Jino and M. Vieira, Requirements, design and evaluation of a privacy reference architecture for web applications and services, in: Proceedings of the 30th Annual ACM Symposium on Applied Computing, 2015.
[22] Object Management Group, OMG Unified Modeling Language (OMG UML), Superstructure, Version 2.4.1, 2011.
[23] Object Management Group, UML Profile for Modeling Quality of Service and Fault Tolerance Characteristics and Mechanisms (OMG QoS&FT), Version 1.1, OMG Document formal/2008-04-05.
[24] Object Management Group, A UML Profile for MARTE: Modeling and Analysis of Real-Time Embedded Systems, Version 1.1, OMG Document formal/2011-06-02.
[25] Object Management Group, OMG Systems Modeling Language (OMG SysML), Version 1.3, OMG Document formal/2012-06-01.
[26] A. Cavoukian and T.J. Hamilton, The Privacy Payoff: How Successful Businesses Build Customer Trust, McGraw-Hill Ryerson, 2002.
[27] A. McEwen and H. Cassimally, Designing the Internet of Things, 1st edn, John Wiley and Sons, West Sussex, United Kingdom, 2013.
[28] S. Gérard, C. Dumoulin, P. Tessier and B. Selic, Papyrus: A UML2 Tool for Domain-Specific Language Modeling, 11-2017, pp. 361-368.


Security and Privacy in the Internet of Things: Challenges and Solutions
J.L.H. Ramos and A. Skarmeta (Eds.)
IOS Press, 2020
© 2020 The authors and IOS Press. All rights reserved.
doi:10.3233/AISE200004

UPRISE-IoT: User-Centric Privacy & Security in the IoT

Silvia GIORDANO a,1,2, Victor MOREL b,2, Melek ÖNEN c,2, Mirco MUSOLESI d,2, Davide ANDREOLETTI a, Felipe CARDOSO a, Alan FERRARI a, Luca LUCERI a, Daniel LE MÉTAYER b, Cédric VAN ROMPAY c, Claude CASTELLUCCIA b, and Benjamin BARON d

a SUPSI, Switzerland
b Inria, France
c EURECOM, France
d UCL, U.K.

Abstract. Data privacy concerns the claim of individuals, groups, or institutions to determine for themselves when, how, and to what extent information about them is communicated to others [1]. However, most people are not aware of privacy and security risks. This is particularly relevant for the Internet of Things (IoT), which is increasingly ubiquitous and thus can become very privacy intrusive. On one side, this problem can slow down IoT development; on the other side, it is essential to make users gain control over the data generated and collected by the IoT devices surrounding them, and to adopt a privacy-by-design approach for the IoT. UPRISE-IoT takes a fresh look at the IoT privacy space by considering a user-centric approach. It builds upon user behaviours and contexts to improve security and privacy, and follows the privacy-by-design approach. UPRISE-IoT improves data transparency and control, as users are informed about the data that are being collected in a user-friendly manner and have the option to oppose it. We raise user awareness, to ensure that their behaviour does not compromise their privacy, and we provide new tools to control data collection in the IoT.

Keywords. privacy awareness, user-centric, privacy-by-design

1. Introduction

Privacy is an open issue, still obscure to most people, despite the huge attention it has recently received in relation to the stories of Cambridge Analytica and Facebook [2] or Google [3], the new GDPR regulation [4], as well as the discussions in the European Parliament on the regulation of the digital market [5]. Even if new rules have been developed, people are still not aware of the value of their data, of how they are handled and used, and of their rights. UPRISE-IoT's goal is to make users gain control over the data generated and collected by the IoT devices surrounding them by adopting a user-centric, privacy-by-design approach for the IoT.

1 Corresponding Author: Silvia Giordano, Institute of Informatics and Networking Systems (ISIN) - DTI, Univ. of Applied Science and Arts - SUPSI, Switzerland; E-mail: [email protected].
2 Main Author.


The UPRISE-IoT methodology (http://uprise-iot.supsi.ch/) consists in the design and development of algorithms and solutions to build such control, in order to lay the basis for a new understanding of data privacy [6], [7], and to improve transparency and control. Users should be aware of the data collected by the IoT. Transparency helps users to make informed choices about the use of IoT services. Furthermore, within the project it has been discovered that even when a user has blocked access to her personal data, it is still possible to derive some sensitive information about her from the data of her friends and acquaintances [8]; a more global awareness is therefore needed for real privacy. To this aim, UPRISE-IoT considers user behaviours and the user context to increase security and privacy, and realizes privacy-preserving data collection and processing to that end. Such a user-centric security and privacy space, tailored for the IoT and for IoT users, is based on the following multi-disciplinary elements:

• behavioural models extracted from the large amount of data generated by IoT devices for security and privacy applications;
• advanced functionalities and algorithms for increasing security and privacy by design;
• tools for empowering users in terms of transparency and awareness;
• open libraries for supporting user control (preferences, filters, accounting and analytics) and transparency;
• strategies to secure IoT devices, for M2M authentication systems and encryption libraries.

The result is a new secure space centred around the user, where security solutions are either integrated within IoT devices directly (creating smart secure objects) or interfaced to the user by a powerful user-friendly app for: (i) smartifying the IoT devices which are not intrinsically secure (creating smartified secure objects); (ii) fine-tuning the level of privacy; (iii) getting awareness of her behaviour in order to be protected from security and privacy threats; (iv) getting awareness of the value of her information. The UPRISE-IoT project has obtained very promising technical and scientific results, and has also applied them to very relevant scenarios such as mobile phones and smart cities. As discussed in the next sections, we provide experimental work involving users that validates this approach.

2. Privacy Awareness and Control by Design

Privacy-by-design principles are fundamental in the security landscape. As UPRISE-IoT mainly considers user awareness and control over personal data, we focus on the following well-known privacy-by-design principles: data minimization (i.e., only data that are necessary for a given purpose can be collected), purpose limitation (i.e., the purposes for which data are collected must be specific, legitimate and unambiguously known to the final user) and data de-identification (i.e., data must not reveal the identity of their owner). All these principles require the user to be aware and in control of her data, which is a very challenging problem in IoT ecosystems, and in particular in the mobile phone and smart city scenarios considered in the project, since human interaction is not as easy as with traditional IT systems.


Thus, to make them applicable for user awareness and control in the IoT, we have developed several experiments to assess the degree of privacy required by the data that are collected and to map the degree of privacy that the user wants for the collected data.

2.1. A registry for the smart city

The smart city is a paradigm which consists in fielding numerous devices in urban areas to provide new services, or to improve existing ones. Some of these devices collect personal data, and their ubiquity combined with growing data analysis techniques can lead to privacy issues [9]. Raising awareness in this context is difficult, as these devices 1) do not necessarily have appropriate user interfaces to display intelligible information and 2) may lack the computational capacities or the required network interface to declare themselves. For instance, stores use Bluetooth beacons to track customers and provide targeted advertising (see www.nytimes.com/interactive/2019/06/14/opinion/bluetooth-wireless-tracking-privacy.html), and municipalities often provide Wi-Fi hotspots to denizens, and these Wi-Fi access points collect personal data. The General Data Protection Regulation (GDPR) requires data controllers to intelligibly inform data subjects of data collection and processing. Therefore, stickers or wall signs can be considered as non-compliant to the extent that they remain unnoticed by most data subjects. To tackle this transparency issue, we propose a registry for the smart city: Map of Things.

Figure 1. A registry for the smart city: (a) explanatory diagram (the data subject's smartphone or laptop requests data collection information from the registry, which displays the devices and the related data collection information); (b) screenshot of the interface.

2.1.1. Map of Things

The registry Map of Things raises privacy awareness by providing information about personal data collection and processing in a user-friendly manner. It consists in a website (available at https://mapofthings.inrialpes.fr/) on which data controllers can declare any device collecting personal data, regardless of its capacities or interfaces.


The registry can be accessed from any device owned by data subjects, such as a smartphone or a laptop (see Figure 1a). It provides a summary of the privacy policies as well as a link to the full version of each privacy policy (see Figure 1b). It also provides further information which is of prime importance in ubiquitous environments, such as the geolocation of the devices, their range and their frequency of collection. This information is available in a graphical format, and through an API for interoperability purposes (https://mapofthings.inrialpes.fr/api/devices). The registry Map of Things has been tested in Grenoble and La Rochelle; as of July 2019, it hosts the Wi-Fi hotspot privacy policies communicated by the municipalities.

2.2. Mobile Apps and User's Data Collection

In the past few years, smartphone applications have greatly contributed to the quality of users' experience on their smart devices, allowing them to reach an even larger set of functionalities when the user's personal information is used. However, many applications fail to guarantee user privacy; this is especially true for Android smartphones, where the application market is self-regulated. The mechanism that Android offers to increase user privacy is the use of system permissions: every time an app is downloaded from the market and the installation process is started, Android checks the app's permissions and then asks the user to allow them. If the user does not agree, the installation process is interrupted. This system permits users to know what information is used and which entity processes it, but it does not indicate when and how the information is used inside an application. For instance, if an application proposes a game that uses sensor data and secretly records it, the sensor data could be used for malicious intents without the user's awareness. Novel regulations (such as the GDPR [4]) require informing the user about the data accessed and collected by the app. However, users are still not informed enough to decide whether a data access in an app is a potential threat or not. To show this point, we made some measurements of users' awareness about the potential risks of the apps installed on their devices.

2.2.1. The UPRISE-IoT app

We developed an Android application, the UPRISE-IoT app, that monitors the behaviour of the other apps installed on the user's device. The UPRISE-IoT app also provides two distinct interactions with the device owner:

• It informs the owner about a potential risk a given app has, considering a specific permission.
• It asks the user if she is aware of those risks and (in case she is not) if her perception of the app has changed.

The UPRISE-IoT app collects the following data:

• The privacy level of the apps installed on the user's device: we collect the permissions granted to each app and classify them according to their level of risk (a simplified sketch of such a classification is given below).
• The level of user awareness: we periodically show the user a potential risk connected to the data accessed by apps, and ask if she is aware of it and if her perception of the app has changed with this information (in terms of privacy).
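The following sketch illustrates one simple way a permission-based risk classification could be computed from the permissions requested by an app. The set of "dangerous" permissions and the thresholds are illustrative assumptions; they are not the actual classification rules used by the UPRISE-IoT app.

```python
# Illustrative sketch only: a toy permission-based risk classification.
# The dangerous-permission list and thresholds below are assumptions,
# not the rules implemented in the UPRISE-IoT app.

# A few Android permissions commonly treated as privacy-sensitive.
DANGEROUS_PERMISSIONS = {
    "android.permission.ACCESS_FINE_LOCATION",
    "android.permission.READ_CONTACTS",
    "android.permission.RECORD_AUDIO",
    "android.permission.CAMERA",
    "android.permission.READ_SMS",
}


def risk_level(requested_permissions):
    """Classify an app by how many dangerous permissions it requests."""
    dangerous = DANGEROUS_PERMISSIONS.intersection(requested_permissions)
    if not dangerous:
        return "low"
    if len(dangerous) <= 2:
        return "medium"
    return "high"


# Hypothetical example: a game requesting location and microphone access.
game_permissions = [
    "android.permission.INTERNET",
    "android.permission.ACCESS_FINE_LOCATION",
    "android.permission.RECORD_AUDIO",
]
print(risk_level(game_permissions))  # "medium"
```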


We performed two rounds of experiments: the first one with a selected group of 17 users from our university, and a second one with 50 users recruited via the Internet. The two groups are both very heterogeneous in terms of gender, technological skills and age, as indicated in Table 1.

Table 1. Population distribution in the two experiments.

                    Experiment 1                  Experiment 2
Total population    17                            50
Gender              M: 6, F: 11                   M: 26, F: 24
IT knowledge        high: 5, medium: 3, low: 9    high: 11, medium: 27, low: 24
Age                 18-30: 5, 31-40: 8, >41: 4    18-30: 20, 31-40: 16, >41: 14

2.2.2. Experimental Results

The distribution of permissions across the observed apps in the two experiments is shown in Figure 2. We can see that the majority of the apps require between 0 and 7 permissions, with a mean of around 2. However, there are some outliers that require an astonishing number of dangerous permissions, i.e., permissions that could potentially affect the user's privacy or the device's normal operation: up to 16. To determine whether a user is aware of such potential risks, we provide a notification of the potential risk of a given application whenever we notice that the application is being executed. The notification contains the app name, the permissions it accesses and a list of potential risks connected to this access. We then ask the user if she was aware of the app's dangerous accesses and, in case she was not, if her perception of the app has changed after being informed. We notice that in the majority of the cases (more than 60% in both experiments) the user has no idea of the risks related to the app. We further ask the users if their perception of the app and of its dangerousness changed after the notification, and we received a positive answer in more than 80% of the cases in Experiment 1, and in about 50% of the cases in Experiment 2. Our experiments demonstrate that there is not much user awareness and control of the data accessed by mobile apps, even if it has improved during the last years. Users tend to accept all the access requests from an app during the installation, and then they do not control them anymore. We demonstrated that, if we provide awareness to the users, they take it into account. Therefore, in the next release, we want to provide them with some guidance on how to control such permissions, and a direct link to the Android app permission settings for performing some modifications, if they are interested.

Figure 2. Distribution of the permissions required by the users' installed apps (591 distinct apps).

3. Privacy policy and Consent management

This section presents a generic framework for information and consent in the IoT. The framework is generic in the sense that the proposition relies on few technical requirements — we do not consider the protocols used, the types of devices, nor the fielding configurations — and it can therefore be adapted to numerous contexts.



The framework uses privacy policies to provide information and manage consent. The framework consists in: PPNP, a protocol to communicate these privacy policies (Section 3.2); an agent acting on behalf of users, which we denote the Personal Data Custodian (PDC); and a set of requirements to demonstrate that consent has been obtained. The framework builds on previous work [10], and the interested reader will find descriptions of the PDC and of the proof of consent in [15]. We start by presenting the hypotheses of the framework in Section 3.1.

3.1. Hypotheses

The framework proposed in this section relies on certain hypotheses: it requires devices possessing certain features, which communicate by means of specific messages.

3.1.1. Devices

Data controllers need a Data Controller Gateway (DCG), a device able to communicate information (the content of which will be described thereafter) and to retrieve consent; one or many Data Controller Devices (DCD), devices owned by a data controller and collecting personal data; and a Consent Storage Server (CSS), a machine on which consents are stored. Data subjects (DS) need a Data Subject Gateway (DSG), a device able to communicate information and to communicate consent, and possibly Data Subject Devices (DSD), devices owned by a DS and potentially subject to data collection. Figure 3 provides an explanatory diagram of the different entities and of their possible interactions.

3.1.2. Features

A DSG must possess a Data Subject Policy (DSP), and a DCG must possess a Data Controller Policy (DCP). The privacy policies must be machine-readable, i.e., in a structured format interpretable by a machine, and it should be possible to perform the operations described thereafter.


Figure 3. Explanatory diagram of the framework. Entities and their interactions are denoted with normal font, components of the framework are denoted in italic, and the two sides of the framework are denoted in bold.

A DSG must be able to hash files, to store cryptographic keys, and to sign messages with these keys. Similarly, a CSS must be able to hash files, to store cryptographic keys, and to sign messages with these keys. A DSG must have an appropriate interface. By appropriate interface we mean a touchscreen, or the combination of a screen and a keyboard, allowing for the human-computer interactions described in [15] (consultation of privacy policies, management of DSPs, and notifications to the DS).

3.1.3. Messages

Privacy policies. As described in Section 3.1, DSGs and DCGs must respectively possess a DSP and a DCP. A DSP can be seen as a set of a priori consents. Privacy policies must be machine-readable, and therefore be expressed in a privacy language. A suitable privacy language for this facet should meet requirements over the syntax (the content expressible) and over the operations permitted. Any language meeting these requirements could be used in the framework.

Required content. Privacy policies can be represented as a set of rules:

policy ::= (rule_1, rule_2, ..., rule_i)

The mandatory parts of a rule in this framework are the type of data, the purpose of collection, the retention time (in days), and the DC concerned.


Operations. Privacy policies are required to allow for certain operations. A DSP is required to be comparable to a DCP, and the result of the comparison is required to be deterministic: given a DSP and a DCP, either DSP ≥ DCP or it does not (which we write DSP < DCP). The DSP is said to match the DCP if and only if DSP ≥ DCP, i.e., the DSP is less restrictive than or equal to the DCP. Two privacy policies are also required to be intersectable. We denote the intersection operation ∩. In other words, given two policies DSP and DCP, the result of DSP ∩ DCP is a policy DCP_2 such that (DSP ≥ DCP_2) ∧ (DCP ≥ DCP_2). The result is a DCP. If DSP ≥ DCP, then the intersection DCP_2 = DCP (an illustrative sketch of these operations is given at the end of Section 3.2.1).

Consent. A consent is the authorization from a DS to a data controller to collect data and use it according to a DCP. A consent consists in: a hash of a DCP, the set of identifiers concerned by the data collection, and a cryptographic signature for authentication. A consent should be communicated both in plain text and signed, i.e., encrypted with the DS's cryptographic private key.

Dissent. A DS can also withdraw a consent by communicating a dissent. A dissent is similar to a consent, except that it is related to a nil privacy policy.

Other messages. The protocol described in the next section uses other messages, which we present here. Devices can also communicate the following messages:

Refusal: a message sent by a DSG to a DCG to inform it that no negotiation is possible.
Deny: a message sent by a DC to a DCG to inform it that the intersection is denied.
Accept: a message sent by a DC to a DCG to inform it that the intersection is accepted.

3.2. Privacy Policies Negotiation Protocol

The protocol consists in two phases: information and consent. What we refer to below as negotiation is part of the communication of consent. PPNP ensures that a consent can be communicated if and only if 1) the DSG has received the DCP, and 2) the DSP matches the DCP.

3.2.1. Informal description

We provide an informal description of the protocol to intuitively illustrate its functioning.

Information. The DCG initiates the communication by providing the DCP. The DCP is received by the DSG.

Consent and negotiation. Upon reception, the DSG compares the DCP and the DSP and issues a consent message if they match. Otherwise, the DSG prompts the DS, sets a timer, and awaits a new DSP from the DS after having requested it. After the input of the new DSP from the DS (or the expiration of the timer), the DSG checks whether the DSP matches the DCP:

• The policies match (in other words, the DSP is less restrictive than the DCP): in that case a consent for the DCP is issued.
• The policies do not match and their intersection is null (i.e., the DC and the DS cannot agree on any data collection according to their current privacy policies): the DSG issues a message of refusal.


• The policies do not match but their intersection is not null (i.e., they can agree on terms of data collection and processing): in that case the DSG sends the DSP.

The DCG, after having sent the DCP, awaits answers from the DSG:

• If the DCG receives a consent, this consent is forwarded to the CSS.
• If the DCG receives a refusal, it continues providing the DCP.
• If the DCG receives a DSP, it requests permission from its DC to use the intersection policy between the DCP and the DSP, and sets a timer. The DC can either accept or deny this intersection policy. If it is accepted, the intersection is sent to the DSG, and the DCG awaits a consent, a refusal, or a new DSP. Otherwise (a denial or the expiration of the timer), the DCG goes back to providing the DCP.

The DS can withdraw a consent at any moment by sending a dissent message to the DCG. The CSS accepts only two messages: consent and dissent. A DCD should query the CSS regularly — or it can be updated by the CSS — to maintain an up-to-date list of consents.
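To make the matching and intersection operations of Section 3.1.3 more concrete, the sketch below models a privacy policy as a set of rules and implements one possible "less restrictive than or equal to" comparison and intersection. The rule fields follow the required content listed above (data type, purpose, retention time, data controller), but the concrete semantics chosen here, such as comparing retention times, are an illustrative assumption rather than the framework's normative definition.

```python
# Illustrative sketch of policy matching and intersection, assuming a simple
# rule semantics (not the normative definition of the framework).
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Rule:
    data_type: str        # e.g. "location"
    purpose: str          # e.g. "targeted advertising"
    retention_days: int   # retention time in days
    controller: str       # the DC concerned


# A DSP rule authorizes a DCP rule if it covers the same data type, purpose
# and controller, with at least as long a retention time.
def rule_allows(dsp_rule: Rule, dcp_rule: Rule) -> bool:
    return (dsp_rule.data_type == dcp_rule.data_type
            and dsp_rule.purpose == dcp_rule.purpose
            and dsp_rule.controller == dcp_rule.controller
            and dsp_rule.retention_days >= dcp_rule.retention_days)


def matches(dsp: FrozenSet[Rule], dcp: FrozenSet[Rule]) -> bool:
    """DSP >= DCP: every rule requested by the DCP is allowed by the DSP."""
    return all(any(rule_allows(r_dsp, r_dcp) for r_dsp in dsp) for r_dcp in dcp)


def intersect(dsp: FrozenSet[Rule], dcp: FrozenSet[Rule]) -> FrozenSet[Rule]:
    """DSP ∩ DCP: the DCP rules the DSP already allows (a new, weaker DCP)."""
    return frozenset(r for r in dcp if any(rule_allows(d, r) for d in dsp))


# Hypothetical example: the DS allows temperature collection but not location.
dsp = frozenset({Rule("temperature", "heating optimisation", 30, "CityOfGrenoble")})
dcp = frozenset({Rule("temperature", "heating optimisation", 30, "CityOfGrenoble"),
                 Rule("location", "targeted advertising", 90, "CityOfGrenoble")})

print(matches(dsp, dcp))         # False: the location rule is not allowed
print(len(intersect(dsp, dcp)))  # 1: only the temperature rule survives
```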

3.2.2. State diagrams

This section presents the state diagrams of the different entities. Entities are denoted in lower case, variables in UPPER CASE, and other messages in regular Title Case. We first describe the states and transitions used by the state diagrams, and then give the state diagram of the DSG (Figure 4) and of the DCG (Figure 5) as directed graphs.

States and transitions. States are configurations holding information. They can hold the following information: DCP, DSP, CONSENT, and DISSENT. Transitions can be triggered by events, or by meeting conditions. Transitions are represented by arrows going from one state to another.

Figure 4. DSG state diagram.


Figure 5. DCG state diagram.

Conditions. Conditions are boolean tests over the information held by the state. An action is performed if the test evaluates to true. Our systems are deterministic in the sense that, given a state, there is always only one transition whose condition evaluates to true. Conditions are denoted before the / character. If no condition is stated, then the system only considers actions. A timer will eventually expire in the case where no action may be performed.

Events. Our state diagrams consider two types of events: internal and external. Internal events trigger external events. Some events are internally triggered by the entity:

• Send(A, b), where A is information communicated to another entity b. Send() triggers a Receive() in the other entity.
• Set timer. A timer that has been set will eventually expire, triggering a passive function.

Some events are triggered by external entities, or by the expiration of a timer:

• Receive(A, b), where A is the information received from another entity b. It implicitly changes the configuration of the entity according to the information.
• Timer expires.
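To complement the informal description in Section 3.2.1, the sketch below shows how the DSG's consent-and-negotiation decision could be expressed as a single function, using a deliberately simplified policy model (a policy as a set of rule tuples, where matching is set inclusion). Both this simplified semantics and the message names are assumptions made for illustration; the sketch does not reproduce the exact transitions of Figure 4.

```python
# Illustrative sketch of the DSG reaction to a received DCP (Section 3.2.1).
# A policy is modelled here as a frozenset of rule tuples
# (data type, purpose, retention days, controller); the DSP matches the DCP
# when it contains every rule of the DCP. This simplified semantics and the
# message names are assumptions, not the framework's normative definition.

def dsg_step(dsp: frozenset, dcp: frozenset):
    """Return the message the DSG sends after comparing its DSP with a DCP."""
    if dsp >= dcp:                    # DSP matches DCP: consent to the DCP.
        return ("CONSENT", dcp)
    intersection = dsp & dcp          # DSP ∩ DCP
    if not intersection:              # Empty intersection: no agreement possible.
        return ("Refusal", None)
    return ("DSP", dsp)               # Counter-proposal: send the DSP to the DCG.


temperature = ("temperature", "heating optimisation", 30, "CityOfGrenoble")
location = ("location", "targeted advertising", 90, "CityOfGrenoble")

dsp = frozenset({temperature})
dcp = frozenset({temperature, location})

print(dsg_step(dcp, dcp)[0])          # "CONSENT": the policies match
print(dsg_step(dsp, dcp)[0])          # "DSP": non-empty intersection, negotiate
print(dsg_step(frozenset(), dcp)[0])  # "Refusal": empty intersection
```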

4. Privacy preserving data management

With the interconnection of large numbers of IoT devices and the explosion of the amount of (often personal) data exchanged between these devices and other systems, the need for data privacy solutions becomes crucial. Such data can originate from camera feeds or from smart meters, and can therefore be considered highly sensitive. During these data exchanges, control by individuals can easily be lost, because users usually cannot trace how data flow among devices, or are not even aware of the processing of the data. Hence, IoT technology might be slowed down if privacy is not examined as an essential criterion in the designed protocols. UPRISE-IoT therefore adopts a privacy-preserving data management approach for the data collected from IoT devices and processed afterwards.


afterwards. The key challenge raised by this new environment is the fact that the massive amount of data usually needs to be processed in real time. When analysed it provides valuable information for stakeholders who could further improve their services. Hence, one of the key functionalities of an IoT platform is the incorporation of data analytics solutions. Unfortunately, as previously mentioned critical privacy issues are raised when analyzing and extracting some information about this sensitive data. Existing privacypreserving data analytics solutions either are not scalable or unfortunately leak some information to centralized entities. Therefore, there is a strong need for novel privacy mechanisms that would help end-users gain back the control over their data. Another challenge that has to be taken into consideration is the multiplicity of data sources and data queriers. Indeed, in the IoT environment, all devices become data producers and these data can be queried by multiple stakeholders that we could name data queriers. In Uprise-IoT, we focus on one of the most used analytics operations that is the word search operation and we describe privacy preserving variants of search operations that allow third parties to process search queries without having access to the content of the data. This section presents the problem of multi-user searchable encryption that is more suitable to the IoT environment and then overviews the state-of-the-art and two newly developed solutions. 4.1. Multi-User Searchable encryption UPRISE-IoT aims to develop new privacy primitives that allow the outsourcing and the search of data while being encrypted. Searchable Encryption (SE) [16] enables searching on data hosted on a third-party server (such as a cloud server), while protecting the privacy of the data and the queries that are executed against the potentially malicious server. More precisely, a SE protocol involves a user (the data owner) who encrypts her data with a key that is only known to her, then uploads the encrypted data to a potentially malicious cloud server. The user can later request the cloud server to perform some search operations by sending privacy preserving search queries. The main privacy guarantees that a SE scheme should provide are mainly the privacy of the data (and keywords) and the privacy of the queries. Ideally, a SE should not leak any information about the search pattern (i.e. the information about whether any two queries are generated from the same keyword or not) and access pattern (i.e. the information about which data segments contain the queried keyword for each of the user queries). In an IoT environment, we are in the context of a multi-user setting [17] whereby each querier (data consumer) may have access to a set of encrypted data segments stored by a number of different owners (IoT devices). The quantity of collected data have been persistently increasing while IoT devices’ storage and computational capacities are very limited, making storing and processing data difficult on devices’ side. Outsourcing storage and computation of data to a third party (i.e. a Cloud Service Provider (CSP)) with much greater storage, computational and network capacities is made possible with cloud computing. In order to ensure data privacy while delegating the storage and search operations to the CSP, Multi-User Searchable Encryption (MUSE) appears to be the solution. 
A multi-user searchable encryption scheme enables a user to search keywords in multiple data segments based on authorization rights granted by the owners of those data segments. MUSE can involve a large number of participants playing the two following roles:


• Writers (e.g., IoT devices) who upload data segments to the CSP and authorize some parties to search over their data;
• Readers who send search queries and are authorized by writers to do so.

4.2. Existing solutions and their privacy guarantees

MUSE is a recent but active research topic, and some solutions already exist ([17,18], etc.). However, it seems very difficult to reconcile security and efficiency in MUSE. Prior to the paper of Popa and Zeldovich [18], MUSE solutions gave priority to performance optimization and assumed that the adversary only had control over the server, implicitly assuming that all users were fully trusted. Given the increasing number of readers and writers, considering the presence of some corrupted users colluding with the server in the threat model of MUSE is a necessary step. Unfortunately, the large majority of existing MUSE protocols cannot protect the privacy of data and queries in the face of even a small number of corrupted users. On the other hand, we have discovered a design pattern that is shared among all these MUSE protocols and which is the origin of their weaknesses against user corruption, and hence against user-server collusions. This design pattern, called the iterative testing structure, is studied in [19], which describes its consequences on the privacy properties of these protocols in the face of collusions. In iterative testing, the server applies a query to an encrypted record by testing encrypted keywords one by one. The server is thus able to see when two queries originating from different readers correspond to the same keyword if these two queries match the same record. The server does not see the plaintext immediately, but rather relations between various queries and records. Then, after the server has processed sufficiently many queries, the information from even a single corrupted user is sufficient, thanks to these relations known to the server, to recover large amounts of plaintext across the whole database, as well as a large number of queries. The amount of information leaked by these MUSE protocols, and the consequences of such leakage on privacy, are therefore substantial. Recently, several attacks [20] were proposed against Single-user Searchable Encryption (SSE) systems; they combine the information leaked during the execution of the protocol with extra information the adversary may have obtained through other means. A first step towards a systematic study of these attacks, named leakage-abuse attacks, is proposed by Cash et al. [20]. The authors suggest classifying SSE schemes using a notion called leakage profile that captures the amount of information they leak to the cloud service provider (CSP). They define four leakage profiles, from L1 to L4, where L1 corresponds to the schemes leaking the least information and L4 to the schemes leaking the most. Additionally, in [20] Cash et al. present new leakage-abuse attacks against some of the defined leakage profiles that outperform the existing ones: they present an attack named the count attack which targets L1 profiles and, given the probability of co-occurrence of keywords and the number of indices containing each keyword, is able to recover the content of some queries. They also present a simple attack against the L3 profile where the CSP, knowing the content of some indexes, is able to recover part of the content of other indexes.
In [19] we present a very simple leakage-abuse attack that affects almost all existing MUSE solutions and has very serious consequences on privacy. This attack is made possible by the iterative testing structure.


This structure consists in encrypting each keyword in an index separately, and in testing the presence of a keyword in an encrypted index by applying the trapdoor against each encrypted keyword separately. While very intuitive, this approach lets the CSP see a lot of information, and this leads to the leakage-abuse attack we present. Indeed, not only does the iterative testing structure let the CSP see which index matches the query (information commonly referred to as the access pattern), but it also lets the CSP see which encrypted keyword in the index was the one that matched. The CSP is then able to tell that some encrypted keywords from different indexes owned by different writers actually represent the same keyword. This information alone is not sufficient to reveal the content of indexes or queries to the CSP; but as soon as some users collude with the CSP, these known relations between encrypted keywords let the CSP deduce the content of indexes and queries that the colluding users did not have access to. In a system where a large number of queries have been made, the CSP has a lot of information about which encrypted keywords are similar to each other, and we show through statistical simulations that even a small amount of data revealed through collusion may lead to a very large privacy breach. We then suggest ways to avoid this kind of privacy issue in future MUSE schemes. The first two suggestions we make are, of course, to add user-CSP collusions to the threat model of MUSE and to find an alternative to the iterative testing structure; but we also extend the classification of Cash et al. [20] in order to capture the newly identified threat and to provide a more systematic way to prevent this kind of issue. We define a new leakage profile named keyword access pattern, or LKWAP. Let {Wᵢ} be the set of uploaded indices and w a given keyword in a query. The LKWAP leakage profile can be formalized as follows:

{ (i, l) : i ∈ Auth(r) ∧ Wᵢ[σᵢ(l)] = w }

Each σᵢ is some permutation and "i ∈ Auth(r)" means that reader r is authorized to search the index of writer i. This profile captures the leakage of iterative-testing-based schemes in the sense that any MUSE scheme based on iterative testing leaks at least the information described by this profile. While this leakage profile was commonly considered equivalent to the L1 profile of Cash et al., we show that, at least in the multi-user setting, this is not the case and that the LKWAP profile is in fact closer to a query-revealed version of the L3 profile of Cash et al. We support these claims with the following statistical experiments: we simulate an attack against a system with an L1 profile and an equivalent attack, on the same data, against a system with an LKWAP profile, and we measure a loss of privacy that is much greater for the LKWAP profile than for the L1 one. We then perform a similar experiment with the LKWAP and L3 profiles and find similar privacy losses. To summarize, almost all existing MUSE schemes provide close to no privacy as soon as even a few users collude with the CSP, which greatly weakens their privacy guarantees. Given the scalability issues of existing MUSE schemes, there is a need for new solutions that are both secure and scalable. The new guidelines and tools we provide should help avoid similar privacy issues in the design of future MUSE schemes.
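To make the profile concrete, the following sketch (illustrative only, not taken from [19]) shows the information an iterative-testing server accumulates for a single query; the types and helper names are assumptions made for the example.

```typescript
// A hedged sketch of the LKWAP leakage: the (writer, slot) pairs whose
// encrypted keyword matches the queried keyword, for every index the
// reader is authorized to search.
type EncryptedIndex = string[];                 // W_i: one encrypted keyword per slot

function keywordAccessPattern(
  indexes: Map<number, EncryptedIndex>,         // writer id i -> encrypted index W_i
  authorizedWriters: Set<number>,               // Auth(r) for the querying reader r
  trapdoorMatches: (encKeyword: string) => boolean // test derived from the reader's trapdoor
): Array<[writer: number, slot: number]> {
  const leaked: Array<[number, number]> = [];
  for (const [writer, index] of indexes) {
    if (!authorizedWriters.has(writer)) continue;
    index.forEach((encKeyword, slot) => {
      // Iterative testing: the trapdoor is applied to every encrypted keyword,
      // so the server sees exactly which slot matched in which writer's index.
      if (trapdoorMatches(encKeyword)) leaked.push([writer, slot]);
    });
  }
  return leaked;
}
```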
4.3. New designs of Multi-User Searchable Encryption (MUSE)

In UPRISE-IoT we have designed two MUSE protocols, each offering a different trade-off between privacy (leakage) and efficiency and hence suiting different situations.


The main idea common to these two solutions is the use of two servers that are assumed not to collude. This idea is illustrated in Figure 6. The additional server, also called the proxy, mainly helps MUSE protocols protect against user-server collusions. We also introduce the notion of data preparation: for each reader authorized to search a piece of data, a copy of that data is created by the server and used solely for queries coming from this reader. This copy is sent to the second server (the proxy), which is in charge of query application.

Figure 6. An illustration of our use of two non-colluding servers.

In this figure, two trapdoors for the same keyword are sent by different readers. The proxy sees which prepared keywords these trapdoors match but does not see that these prepared keywords originate from the same encrypted keyword. The server, on the other hand, sees this common origin but does not see the trapdoors or where they match. The first protocol, named DH-AP-MUSE [21], is the most efficient one. It is also the one that leaks the greatest amount of information, although it still leaks much less than any existing solution. In this solution, a writer encrypts her data with a secret key which she sends to the proxy. In a symmetric fashion, a reader uses a secret key for encrypting her queries and sends this key to the server. The server uses the reader keys it receives to create "transformed" copies of the encrypted data (data preparation), which it sends to the proxy. Symmetrically, the proxy transforms the encrypted queries it receives using the data encryption keys it received. As a result, the proxy obtains transformed queries that it can search for in the prepared data sent by the server. This protocol is similar to a Diffie-Hellman key exchange [23], and the fact that two queries from different readers are never applied to the same (prepared) data ensures protection against user-server collusions. The second protocol, described in [22], leaks much less information than DH-AP-MUSE but is also more costly. More precisely, while the users' workload is almost the same as in DH-AP-MUSE, the server workload is higher as it relies on Private Information Retrieval (PIR) protocols. Intuitively, in this solution, prepared data are kept by the server. To search these prepared data, the proxy uses a newly developed privacy-preserving protocol: this protocol ensures that the proxy gets no information beyond the number of records that matched the query, while the server gets no information at all. The solution uses two existing primitives, namely Garbled Bloom Filters (GBFs) and Oblivious Transfer (OT) based on Additively-Homomorphic Encryption (AHE).
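As a rough illustration of the Diffie-Hellman-style matching underlying the first protocol, the sketch below reproduces the idea with toy parameters; it is not the DH-AP-MUSE implementation, and a real scheme would operate in a prime-order or elliptic-curve group with a proper hash-to-group function.

```typescript
// Toy illustration of two-key keyword matching: the writer blinds a keyword
// with k_w, the server prepares it for a reader with k_r, the proxy blinds
// the reader's trapdoor with k_w, and the two results coincide.
const P = 2n ** 127n - 1n;                        // toy prime modulus (illustrative)

function modPow(base: bigint, exp: bigint, mod: bigint): bigint {
  let result = 1n;
  base %= mod;
  while (exp > 0n) {
    if (exp & 1n) result = (result * base) % mod;
    base = (base * base) % mod;
    exp >>= 1n;
  }
  return result;
}

// Stand-in for a cryptographic hash-to-group function H(keyword).
function hashToGroup(keyword: string): bigint {
  let h = 0n;
  for (const c of keyword) h = (h * 131n + BigInt(c.charCodeAt(0))) % P;
  return h + 2n;                                   // avoid degenerate values 0 and 1
}

const writerKey = 0x1234567n;                      // k_w, shared by the writer with the proxy
const readerKey = 0x7654321n;                      // k_r, shared by the reader with the server

// Writer uploads H(w)^k_w; the server "prepares" it for this reader as H(w)^(k_w*k_r).
const encryptedKeyword = modPow(hashToGroup("blood-pressure"), writerKey, P);
const preparedKeyword  = modPow(encryptedKeyword, readerKey, P);

// Reader sends the trapdoor H(w)^k_r; the proxy transforms it with k_w and matches.
const trapdoor            = modPow(hashToGroup("blood-pressure"), readerKey, P);
const transformedTrapdoor = modPow(trapdoor, writerKey, P);

console.log(preparedKeyword === transformedTrapdoor); // true: the keyword matches
```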


GBFs are a variant of Bloom filters where the buckets corresponding to an element reveal nothing beyond the presence or absence of the element in the encoded set. OT allows a receiver to retrieve an item from a database hosted by a sender without the sender learning which item was retrieved, and without the receiver learning anything about the rest of the database. Intuitively, the protocol consists in the server encoding the prepared records as zero GBFs and the proxy looking up these zero GBFs using the OT protocol. As a result, the only information the proxy can learn from a response is whether the response is positive or negative.

4.4. Summary

In this section we have studied the problem of data privacy for IoT environments in which stakeholders may perform search queries on data produced by various IoT devices. We have shown that the threat model used by many early MUSE solutions did not include the users as part of the threat, which is not realistic. We have identified the common design pattern that makes almost all existing MUSE protocols vulnerable to collusions; consequently, a new definition of the security model is given. Two solutions were further proposed, each offering a different combination of privacy and performance levels. Depending on the actual application and functional requirements, one of these two solutions can be used.

5. Data protection of your location: privacy and location tracking

Another very relevant element to be considered is the problem of location tracking. In the past it has been shown that it is very easy to identify users from their location data, both in the case of discrete information, such as cellular logs [24] and location-based social network data [27,29], and continuous information, such as GPS data [30]. Several solutions for dealing with the problem of privacy with respect to location tracking have been proposed, in particular as far as data obfuscation and coarse-graining are concerned. However, in general, there is a clear trade-off between the level of obfuscation applied to the data and the actual utility that can be extracted from them. For example, if noise is added to a dataset or the location is coarse-grained [28], it might not be possible to use it for location prediction or for the analysis of behavioural patterns of individuals. In other words, there is an inescapable trade-off between the actual utility of a location dataset and the level of privacy for individual users. One of the most interesting aspects that has attracted the attention of researchers is the problem of interpretability [25]. In particular, in [31] a framework for interpretable privacy-preserving pervasive systems is presented. The key idea of this work is that, by means of instrumentation of the underlying identity inference algorithms, it is possible to devise solutions for suggesting to users possible countermeasures for dealing with privacy risks. In particular, the authors discuss the general architecture of the framework and a potential implementation, together with a detailed analysis of a case study. Finally, given the fact that we are considering Internet of Things systems, which are inherently multi-tiered, a solution to the problem of location privacy might reside in the distribution of computation across different computing devices. More specifically, most of the computation might take place on the devices themselves or on "edge" hosts.


By doing so, identifiable personal information might not be sent to back-end servers but analyzed locally. This solution is not possible in the presence of computations that involve different users (e.g., the analysis of co-location patterns), but it can be used for several applications, such as location prediction and inference based only on data from the owner of a given mobile or wearable device. There are, however, several situations in which population data have to be collected in order to be used as input to machine learning algorithms. A promising solution for dealing with these situations is federated learning [26].

6. Conclusion and Future Work

We presented some of the works produced by the UPRISE-IoT CHIST-ERA project to give users awareness and control over their privacy. Such activity, which is fundamental given the increasing rise of the IoT, demonstrates that it is possible to achieve a good level of privacy without decreasing the quality of services. Further work is required to test and improve the interface of the PDC proposed in Section 3. User studies must be conducted to determine to what extent the presentation of information can influence DS choices regarding consent. It is also worth noting that the consent management framework is very relevant to the ongoing discussions about the future ePrivacy Regulation [13], the current draft of which, according to the WP29 [14], "gives the impression that organisations may collect information emitted by terminal equipment to track the physical movements of individuals (such as Wi-Fi-tracking or Bluetooth-tracking) without the consent of the individual concerned." Regarding the processing of the data, in this chapter we mainly focused on search operations. As future work, it would be interesting to study more advanced analytics operations, such as machine learning techniques, and to develop privacy-preserving variants of these. Further, this project was seminal to many other activities related to creating awareness around user data, especially activities for the broad public and teenagers. Our goal is to continue to investigate the development of behavioural models, also by fusing different sources of information and sensor modalities. We also plan to explore the privacy aspects related to the data collection itself. Other ongoing activities include more advanced algorithms and procedures for securing the data, as well as strategies for obfuscating the data so as to render their leakage useless.

References

[1] A. Westin. Privacy and Freedom. Atheneum, New York, 1967, pg. 7.
[2] Wikipedia. Facebook–Cambridge Analytica data scandal. https://en.wikipedia.org/wiki/Facebook-Cambridge_Analytica_data_scandal
[3] European Commission press release. Antitrust: Commission fines Google €4.34 billion for illegal practices regarding Android mobile devices to strengthen dominance of Google's search engine. http://europa.eu/rapid/press-release_IP-18-4581_en.htm
[4] European Commission. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation). Brussels, 2016.
[5] European Commission. Directive of the European Parliament and of the Council on copyright in the Digital Single Market. COM(2016) 593 final, 2016/0280(COD). Brussels, 2016.
[6] A. Ferrari and S. Giordano. A study on users' privacy perception with smart devices. arXiv preprint, https://arxiv.org/abs/1809.00392, 2018.
[7] L. Luceri, D. Andreoletti, M. Tornatore, T. Braun and S. Giordano. Awareness and Control of Geo-Location Privacy on Twitter. Submitted to IEEE Transactions on Computational Social Systems, 2019.
[8] D. Andreoletti, L. Luceri, F. Cardoso and S. Giordano. Infringement of Tweets Geo-Location Privacy: an approach based on Graph Convolutional Neural Networks. arXiv preprint, https://arxiv.org/abs/1903.11206, 2018.
[9] C. Castelluccia, M. Cunche, D. Le Métayer and V. Morel. Enhancing Transparency and Consent in the IoT. In: International Workshop on Privacy Engineering (IWPE 2018).
[10] M. Cunche, D. Le Métayer and V. Morel. A Generic Information and Consent Framework for the IoT. In: TrustCom 2019.
[11] Working Party 29. Guidelines on consent under Regulation 2016/679, 2017.
[12] Working Party 29. Opinion 8/2014 on Recent Developments on the Internet of Things, 2014.
[13] European Commission. Proposal for a Regulation of the European Parliament and of the Council concerning the respect for private life and the protection of personal data in electronic communications and repealing Directive 2002/58/EC (Regulation on Privacy and Electronic Communications), January 2017.
[14] Working Party 29. Opinion 01/2017 on the Proposed Regulation for the ePrivacy Regulation (2002/58/EC), December 2017.
[15] V. Morel. Enhancing Transparency and Consent in the Internet of Things. PhD thesis, Privatics, Inria. To be published.
[16] C. Bösch, P. Hartel, W. Jonker and A. Peter. A Survey of Provably Secure Searchable Encryption. ACM Computing Surveys, 2017.
[17] F. Bao, R. H. Deng, X. Ding and Y. Yang. Private Query on Encrypted Data in Multi-user Settings. In: Information Security Practice and Experience, 4th International Conference (ISPEC 2008), Sydney, Australia, 2008.
[18] R. A. Popa and N. Zeldovich. Multi-Key Searchable Encryption. IACR Cryptology ePrint Archive, 2013.
[19] C. Van Rompay, R. Molva and M. Önen. A leakage abuse attack against multi-user searchable encryption. PETS 2017, Minneapolis, USA, 2017.
[20] D. Cash, P. Grubbs, J. Perry and T. Ristenpart. Leakage-Abuse Attacks Against Searchable Encryption. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, 2015.
[21] C. Van Rompay, R. Molva and M. Önen. Fast Two-Servers Multi-User Searchable Encryption with Strict Access Pattern Leakage. ICICS 2018, Lille, France, 2018.
[22] C. Van Rompay, R. Molva and M. Önen. Secure and scalable multi-user searchable encryption. SCC 2018, Songdo, Incheon, Korea, 2018.
[23] W. Diffie and M. Hellman. New Directions in Cryptography. IEEE Transactions on Information Theory, 22(6):644–654, Nov. 1976.
[24] Y.-A. de Montjoye, C. A. Hidalgo, M. Verleysen and V. D. Blondel. Unique in the crowd: The privacy bounds of human mobility. Scientific Reports 3, 1376, 2013.
[25] F. Doshi-Velez and B. Kim. Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608, 2017.
[26] J. Konečný, H. B. McMahan, F. X. Yu, P. Richtárik, A. T. Suresh and D. Bacon. Federated learning: Strategies for improving communication efficiency. In: NIPS Workshop on Private Multi-Party Machine Learning, 2016.
[27] L. Rossi and M. Musolesi. It's the Way You Check-in: Identifying Users in Location-based Social Networks. In: Proceedings of COSN'14, New York, pp. 215–226, ACM, 2014.
[28] R. Shokri, G. Theodorakopoulos, C. Troncoso, J.-P. Hubaux and J.-Y. Le Boudec. Protecting location privacy: optimal strategy against localization attacks. In: Proceedings of CCS'12, pp. 617–627, 2012.
[29] L. Rossi, M. J. Williams, C. Stich and M. Musolesi. Privacy and the City: User Identification and Location Semantics in Location-Based Social Networks. In: Proceedings of AAAI ICWSM'15, 2015.
[30] L. Rossi and M. Musolesi. Spatio-temporal Techniques for User Identification by means of GPS Mobility Data. EPJ Data Science 4(11), 2015.
[31] B. Baron and M. Musolesi. Interpretable machine learning for privacy-preserving pervasive systems. IEEE Pervasive Computing. To appear.

Security and Privacy in the Internet of Things: Challenges and Solutions
J.L.H. Ramos and A. Skarmeta (Eds.)
IOS Press, 2020
© 2020 The authors and IOS Press. All rights reserved.
doi:10.3233/AISE200005


Making the Internet of Things More Reliable Thanks to Dynamic Access Control

Anne GALLON a, Erkuden RIOS b, Eider ITURBE b, Hui SONG c, Nicolas FERRY c

a EVIDIAN, Les Clayes-sous-Bois, France
b FUNDACIÓN TECNALIA RESEARCH & INNOVATION, Derio, Spain
c SINTEF Digital, Oslo, Norway

Abstract. While the Internet-of-Things (IoT) infrastructure is rapidly growing, the performance and correctness of such systems become more and more critical. Together with flexibility and interoperability, trustworthiness-related aspects, including security, privacy, resilience and robustness, are challenging goals faced by the next generation of IoT systems. In this chapter, we propose approaches for IoT-tailored access control mechanisms that ensure data and service protection against unauthorized use, with the aim of improving IoT system trustworthiness and lowering the risks of massive-scale IoT-driven cyber-attacks or incidents.

Keywords. Internet-of-Things, Trustworthiness, Access Control, Context, Dynamicity, Security, Privacy.

1. Introduction

By 2021, Gartner envisions that 25 billion Internet-of-Things (IoT) endpoints will be in use (https://www.gartner.com/en/newsroom/press-releases/2018-11-07-gartner-identifies-top-10-strategic-iot-technologies-and-trends), representing great business opportunities. However, complex challenges remain to be solved to efficiently exploit the full potential of the rapidly evolving IoT technologies. The performance and correctness of such systems will be critical, ranging from business critical to safety critical. Thus, aspects related to trustworthiness, such as security, privacy, resilience and robustness, are still unsolved challenges of paramount importance for the next generation of IoT systems [1]. Access control and identity governance mechanisms are cornerstones of security and privacy; today they focus on people accessing IT applications. In the context of IoT, access control needs to be extended to address not only people accessing the Internet of Things, but also the relationships between connected things. This requires designing and building new access control mechanisms for authorizing access to and from connected things, with ad hoc protocols, while still being able to address traditional access to IT applications. The key challenge for access control in IoT is dynamicity. IoT systems are changing all the time: devices keep entering and exiting the system; the same devices may be used in different contexts; new connections emerge among the devices; etc.


For such highly dynamic IoT systems, access rights from people to devices, and from devices to devices, are not immutable. On the one hand, access rights may vary according to context changes. Take an eHealth scenario as an example, where senior adults use IoT devices to monitor physiological data such as blood pressure. In the normal, day-to-day context, only the user himself should have the right to access the data, due to privacy concerns. However, in a special context, such as during an emergency rescue, medical staff should be granted access to the journal of historical physiological data. Therefore, access decisions in IoT systems must be made with awareness of the context. On the other hand, the distribution of IoT systems and the instability of connections between devices require that those decisions be made in a distributed way. To answer this challenge, the objective we have set ourselves is to develop an authorization server with a dynamicity that is tailored to the context and architecture changes of IoT systems. Today, no protocol can deliver dynamic authorization based on context for both IT and OT (operational technology) domains. Our work described in this chapter proposes to deal with these considerations by providing dynamic access control mechanisms for IoT systems based on context awareness and risk identification, in order to ensure data protection by controlling access to (personal) data and resources, which are usually distributed in the IoT environment. Our solution controls the access of all the actors (end-users, services, devices, administrators) to the data managed by smart IoT systems (SIS), which contributes to system trust by ensuring integrity and confidentiality of the operated data and resources, and by providing data security and privacy in the operation phase of the IoT system engineering process. We present two complementary dynamic access control mechanisms tailored to IoT, which can be adopted individually or combined depending on the needs of the IoT system resource access policies: i) a context-aware access control to be used when the security policy requires reasoning over changing context conditions that impact permissions to access resources, and ii) a distributed access control mechanism based on distributed agents able to evaluate independent access policies close to the target resources. Both approaches are based on industrial standards, i.e., XACML [13] and OAuth 2.0 [16]. We implement the approaches in proof-of-concept tools, as part of the ENACT toolset. ENACT [18] is a research project with the overall goal of enabling DevOps in the realm of trustworthy smart IoT systems. ENACT will provide an integrated DevOps framework composed of a set of loosely coupled enablers that can be easily integrated with existing IoT platforms via a plug-in mechanism. As shown in Figure 1, the ENACT enablers are categorized into three groups: (i) the toolkit for the continuous delivery of smart IoT systems, (ii) the toolkit for the agile operation of smart IoT systems, and (iii) the ENACT facilities for trustworthiness. The dynamic access control tools are part of the third group and provide reusable facilities for trustworthiness solutions, which can be integrated into IoT applications under development. The dynamicity achieved by the access control facilities also provides the other tools with the capability of continuously evolving the security and privacy policies of the IoT applications.
These advanced, IoT-tailored context-aware access control and authorization mechanisms and tools for trustworthy smart IoT systems will advance state-of-the-art techniques for managing the accesses of both devices and users. The success of this work has the potential to accelerate the adoption of the IoT by improving smart IoT system trustworthiness and lowering the risks of massive-scale IoT-driven cyber-attacks and infringements.


This will enable the full potential of IoT systems in the future digitalized society.

Figure 1. The dynamic access control tool in the ENACT framework

The rest of the chapter is organized as follows. Section 2 introduces the industry standards of access control protocols which we use as the basis of our approach. Sections 3 and 4 present our approaches to context-aware and distributed access control tailored for IoT, together with the proof-of-concept implementations. Section 5 compares the approach with related work. Section 6 concludes the chapter with our future plans.

2. Background: Industry standards of Access Control protocols

2.1. The traditional dynamic access control chain based on the XACML model

Although in 2013 a Forrester analyst proclaimed in a blog post (https://go.forrester.com/blogs/13-05-07-xacml_is_dead/) that XACML (eXtensible Access Control Markup Language) was dead, some years later this is not so obvious; therefore, a first approach is to study how the traditional dynamic access control chain based on the XACML model could help answer the challenge of securing the Internet of Things. XACML is a policy-based management system that defines a declarative access control policy language implemented in XML, and a processing model describing how to evaluate authorization requests according to the rules defined in policies. As a published standard specification, one of the goals of XACML is to promote common terminology and interoperability between authorization implementations by multiple vendors. XACML is primarily an Attribute-Based Access Control (ABAC) system, where attributes associated with an entity are inputs to the decision of whether a given entity may access a given resource and perform a specific action. The XACML model supports and encourages the separation of the authorization decision from the point of use. When authorization decisions are baked into client applications, it is very difficult to update the decision criteria when the governing policy changes. When the client is decoupled from the authorization decision, authorization policies can be updated on the fly and affect all clients immediately.
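The following toy sketch (not XACML syntax) illustrates the ABAC principle that the decision is a pure function of subject, resource, action and environment attributes, so that policy changes never require modifying the client; the attribute names and rules are invented for the example and echo the eHealth scenario from the introduction.

```typescript
// Minimal illustration of attribute-based access control decisions.
interface AccessRequest {
  subject: { id: string; role: string };
  resource: { ownerId: string; type: string };
  action: "read" | "write";
  environment: { emergency: boolean };
}

function decide(req: AccessRequest): "Permit" | "Deny" {
  // Rule 1: owners may always read their own data.
  if (req.action === "read" && req.subject.id === req.resource.ownerId) return "Permit";
  // Rule 2: medical staff may read health records during an emergency.
  if (req.action === "read" && req.resource.type === "health-record" &&
      req.subject.role === "medical-staff" && req.environment.emergency)
    return "Permit";
  // Default: deny everything else.
  return "Deny";
}
```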



The access control chain based on the XACML model is depicted in Figure 2.

Figure 2. The dynamic access control chain based on the XACML model

In this chain:
• The Policy Decision Point (PDP) evaluates access requests against authorization policies before issuing access decisions.
• The Policy Enforcement Point (PEP) intercepts the user's access request to a resource, makes a decision request to the Policy Decision Point to obtain the access decision (i.e. access to the resource is approved or rejected), and acts on the received decision.

In fact, this approach is dynamic by essence, since access control decisions are made based on attributes associated with the relevant entities. In addition, it offers a powerful access control language with which to express a wide range of access control policies. But the following points make this approach prohibitive:
• An approach based on rules is difficult to administer. Defining policies is effort-consuming: you need to invest in identifying the attributes that are relevant to make authorization decisions and mint policies from them. In addition, the ABAC system introduces issues, most notably the 'attribute explosion' issue and, perhaps more importantly, the lack of auditability.
• Although Service-Oriented Architecture and Web Services offer advanced flexibility and interoperability capabilities, they are quite heavy infrastructures that imply significant performance overheads.
• Since XACML has been designed to meet the authorization needs of the monolithic enterprise where all users are managed centrally, this central access control chain is not suitable for cloud computing and distributed system deployment, and it does not scale to the Internet.


2.2. The new approach based on OAuth 2.0

Another approach has been studied, based on the OAuth 2.0 industry-standard protocol for authorization. This new approach is depicted in Figure 3.

Figure 3. A new approach based on OAuth 2.0

In this approach, a client can access a resource on behalf of a user through an authorization delegation mechanism. This assumes that the user has given his consent for the requested scopes. As a major advantage, this protocol can be implemented in a lightweight way, by leveraging HTTP and REST-based APIs. In fact, OAuth 2.0 supports the mobile device application endpoint in a lightweight manner. Its simplicity makes it the de facto choice for mobile and non-mobile applications alike. Due to the growing importance of cloud technologies and APIs, the REST architecture is now heavily favored. In addition, OAuth 2.0 allows a fluid integration with role management: OAuth 2.0 scopes can be used to provide role-based authorization. However, this protocol does not have the granularity of XACML in terms of rules. And another point is still an obstacle to meeting the need for IoT context-aware access control: dynamicity, for situation awareness, is not delivered by design.
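For illustration, the decoded payload of such an access token could carry claims and scopes along the following lines; the field values are invented examples and are not tied to any particular authorization server.

```typescript
// Invented example of a decoded OAuth 2.0 / OIDC access token payload:
// claims identify the subject, scopes express what the owner consented to,
// and a role claim can back role-based authorization.
const accessTokenPayload = {
  iss: "https://auth.example.org",            // issuer (assumed URL)
  sub: "device-42",                           // the connected object or user
  aud: "backend-api",
  exp: 1735689600,                            // expiry (Unix time)
  scope: "telemetry:read telemetry:write",    // consented scopes
  email: "owner@example.org",                 // user claim
  role: "patient"                             // claim usable for role-based rules
};
```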

3. A Context-aware access control approach for IoT

3.1. Objectives

Even if identification and authentication of both users and nodes (things, edge, etc.) are fundamental for trustworthy IoT systems, the focus of this work is on authorization. Thus, the objective is to offer an innovative access control mechanism that meets the security needs of the Internet of Things. Our aim is to provide mechanisms for controlling the security, privacy and trustworthiness behavior of smart IoT systems. This includes reaction models and mechanisms that address the adaptation and recovery of the IoT application operation on the basis of the application context, in order to deliver dynamic authorization based on context for both IT and OT (operational technology) domains.


The Internet of Things links many devices such as sensors, cameras or smartphones to the Internet. These devices have the capacity to act as sensors or actuators in their environment, while the context can continuously change and evolve. The environmental data are considered dynamic and give crucial information about a context (state of devices, user's behavior and location, etc.). Traditional access control mechanisms do not use these contextual data when making authorization decisions. The context-aware access control mechanisms described in this chapter provide context-aware, risk- and trust-based dynamic authorization mechanisms, through an IAM (Identity and Access Management) gateway for IoT that includes next-generation authorization mechanisms. The aim is to ensure (i) that an authenticated IoT node accesses only what it is authorized to and (ii) that an IoT node can only be accessed by authorized software components. Access authorizations will be adapted according to contextual information. Context may be, for instance, the date and time an access authorization is requested, or the geolocation of this request; it may also be composed of a set of information about the status of the underlying infrastructure, the physical system status, or SIEM alerts, for example to make certain information more widely available when an alarm has been triggered.

3.2. The solution

Due to the disadvantages observed in the traditional dynamic access control chain based on the XACML model, we turned to a solution based on OAuth 2.0. However, to achieve the goal we set ourselves, which is to provide an IoT context-aware access control mechanism, we must fill the gap of delivering dynamic authorizations based on context with the OAuth 2.0 protocol. Starting from security features for identity management and access control based on the protocols OAuth 2.0 and OpenID Connect (OIDC), our approach is to develop an evolution of these authentication and authorization mechanisms intended for the Internet of Things. Due to the dynamicity of the data concerning the environment of a person, this contextual information must be used to manage and adjust the security mechanisms, i.e. to consider contextual information in the identification of the entity requesting access and in the evaluation of the conditions to grant access. By assessing the applicability of OAuth 2.0, our IoT context-aware access control will leverage it as a key protocol for interoperability. Our work addresses the problem of adding dynamicity to the authorization decisions produced by OAuth 2.0, even if it is not meant for that. This dynamic capability will be in charge of evaluating contextual information and inserting it into authorization decisions.

3.3. Proof of concept: the ENACT Context-aware Access Control tool

The context-aware access control tool provides an authorization mechanism that issues access tokens to the connected objects after successfully authenticating their owner and obtaining authorization. This authorization mechanism uses the OAuth 2.0 protocol, which provides an authorization delegation mechanism. Following this protocol, an object can access a backend API by using an access token containing the list of claims (i.e. user attributes, e.g., user name, email address, etc.) and scopes (read-only, read/write) that an authenticated user has consented for this object to access.


This mechanism may be coupled with contextual information to adapt the access authorizations accordingly (for example to make certain information more widely available in some urgent case). The context-aware access control tool provides access tokens that allow a reverse proxy working as an API gateway to control access to applications and APIs. The scopes and claims contained in the access tokens are used to restrict access to the backend server APIs to a consented set of resources. The authorization mechanism can be coupled with a multi-level, multi-factor authentication server that provides strong authentication mechanisms to the users. This mechanism adapts the level of authentication required depending on the user's environment context and an external context. The risk is a value computed either statically, depending on a defined configuration, or dynamically by using a REST API to dialog with an external decision engine. The input used to compute the risk is the user's session context, which contains the browser DNA (i.e. the user's browser fingerprint, which uniquely identifies a web browser through its configuration: time zone, screen resolution, character font, user-agent), the service the user wants to access and the configured trust level of this service, the access time and the trust planning associated with the service, and the IP address and its geolocation. Depending on the evaluated risk of the user's session, the level of the required authentication will be raised or, if the risk is too high, the connection will be refused. Figure 4 shows a use case example of the context-aware access control tool integrated in Evidian Web Access Manager (WAM, https://www.evidian.com/products/web-sso/). The numbers associated with arrows in this schema are mentioned in parentheses in the next paragraph.

Figure 4. Use case example of the Context-aware Access Control

During the enrolment phase, a connected object is associated with a user (1); then the connected object can push data to a backend server in a controlled way (3) by using the access token it received (2) from the access control tool.


The backend server stores this data and can display it within an application. Authorized users and applications can retrieve the data from the backend server (4). WAM plays a pivotal role in all these exchanges by making authorization decisions depending on the context (5). The context-aware access control tool includes a scoping system. The scopes are used by an application to authorize access to user information. In the OAuth 2.0 protocol, the scope system only returns a set of static user attributes. The context-aware access control tool extends the protocol to also consider dynamic attributes, which provide contextual data on the user and his devices. That way, the access control can be adapted depending on the context, which can continuously evolve, in order to make the access rights more secure and efficient as a function of the current environment. The access control tool directly communicates with a Context Server (6) to make dynamic access control decisions based on the context information during the authorization phase. For example, it can reject the authorization if the access token is valid but other context information does not respect the authorization policy. The authorization policy is a set of rules that define whether a user or device must be permitted or denied access to a resource. An administrator can control this adjustment and create special authorization rules based on risk levels computed from the context data provided (7). In this architecture, two components provide the context-aware access control mechanisms:

• The Context Server. The Context Server exposes a REST API that provides contextual data on the user and his devices. These data are dynamic attributes and come from external sources (sensors, other applications, etc.).
• The access control tool. The access control tool is composed of an authorization server associated with a post-authorization plugin that adds more controls during the authorization phase. Its purpose is to check that the request is authenticated and authorized to access the backend server.

Indeed, each time a device sends a request to a backend server, the access control tool can check the dynamic scopes of the user associated with the device that performs the request and take special actions according to this information, such as blocking the request or limiting the accessible scopes. The post-authorization plugin extends the basic authorization phase and is entirely customizable. Any operation can be executed during the authorization phase, including calls to external programs, and in particular to the context server. From the provided contextual information, the idea is to compute a risk value that will be used to apply context-aware dynamic scopes, as sketched below. The post-authorization plugin can create injection variables that can be reused and injected into the initial request sent to the backend server. The administrator can configure the access control tool to adjust the authorization security rules according to the dynamic attributes.
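The sketch below illustrates the kind of risk computation and scope adjustment such a post-authorization step could perform; the context attributes, weights and thresholds are invented for the example and do not reflect the actual Evidian implementation.

```typescript
// Illustrative risk-based dynamic scoping for a post-authorization step.
interface SessionContext {
  knownBrowserFingerprint: boolean;   // "browser DNA" matches a previous session
  withinTrustedHours: boolean;        // access time within the service trust planning
  geolocatedInAllowedRegion: boolean; // derived from the request IP address
  serviceTrustLevel: 1 | 2 | 3;       // configured sensitivity of the target service
}

function computeRisk(ctx: SessionContext): number {
  let risk = 0;
  if (!ctx.knownBrowserFingerprint) risk += 0.4;
  if (!ctx.withinTrustedHours) risk += 0.2;
  if (!ctx.geolocatedInAllowedRegion) risk += 0.3;
  return risk * ctx.serviceTrustLevel;          // sensitive services amplify the risk
}

// Reduce the granted scopes as risk grows; refuse the connection above a threshold.
function adjustScopes(requested: string[], ctx: SessionContext): string[] | "reject" {
  const risk = computeRisk(ctx);
  if (risk > 1.0) return "reject";
  if (risk > 0.5) return requested.filter(scope => scope === "read-only");
  return requested;
}
```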

4. A distributed access control approach for IoT

Considering the large number and great diversity of interconnected resources in IoT systems, it is likely that one access control solution does not fit all possible scenarios or needs.


Therefore, for some IoT systems it is necessary to adopt complementary mechanisms that together ensure secure access to resources. In this chapter we describe a distributed access control solution conceived as a mechanism that can be easily integrated into the IoT system components and activated at runtime when needed. This control mechanism responds to the need to secure access to resources and services in the distributed IoT system components, while keeping the enforcement of security policies under continuous control. The solution is based on enforcement agents developed in ENACT as preventive security mechanisms or controls that are managed by the ENACT Framework. These agents are an IoT-tailored evolution of the MUSA enforcement agents [11], which in turn were built on top of existing open source solutions. The major innovation resides in having a tool in the ENACT Framework as the single point of management for orchestrating multiple agents and mechanisms that address diverse security properties of the IoT system. In this chapter we only describe those enforcement agents related to Access Control (AC). The AC agents developed rely on the XACML policy specification standard by OASIS, explained in the previous section. The AC agents check whether the policy rules evaluate to true or false, and the enforcement of the access (granting or denying it) is done by an external entity (e.g. the IoT platform) according to the result. Note that the power of the access control performed depends on the granularity of the attributes taken into account in the XACML rules: the finer the granularity, the richer the possibilities. To this end, an actionable AC agent-based distributed enforcement has been developed so that the AC agents can be deployed by the GeneSIS orchestration tool (within the ENACT Framework) together with the components of the IoT system. These AC agents are external to the SIS components and managed externally by the operators. The agents are able to send events at the IoT application level that serve a double purpose: i) they allow the continuous control of the good performance of the agent, and ii) they allow detecting security anomalies by processing and correlating agent events with other data from the system and network layers. In the following subsections we elaborate the architectural details of the solution.

4.1. Proof of concept: the ENACT Agent-based Access Control

4.1.1. Architecture

We prototyped the solution as a distributed access control tool, part of the ENACT framework for the DevOps of smart IoT systems. The overall architecture of the tool is depicted in Figure 5. The AC agents are deployed to work together with the IoT application system components, so they can enforce the authorisation policies on the resources in these components. Every AC agent is controlled through control actions (e.g. access policy updates) that can be sent either by the Control Manager or by the SIS operator through the GUI displayed in the Dashboard. The Control Manager can be set with predefined rules based on learned patterns of cyber threats. In addition, the distributed AC agents send events to the Streaming bus, which are later stored and displayed in the Dashboard. In the rest of this section, we elaborate the main components of this architecture.


Figure 5. The agent-based distributed Access Control tool in the ENACT framework

4.1.2. The Access Control agents

The AC mechanisms are deployed as agents distributed in the IoT system that provide authorisation services to IoT system resources. They can be considered as security controls that can be activated, deactivated or configured when needed at runtime. These agents are deployed to work together with IoT application components; they can be installed on the same infrastructure (smart device, IoT platform, etc.) where the application component is deployed or on another one, keeping in mind that the closer to the resource, the faster the access. AC agents of two types are designed: a) agents that can be used when HTTP communications are used between the access requester and the resource (service or data), and b) agents that are tailored to non-HTTP types of communication. It is interesting to note that both types of agents rely on XACML technology, though the non-HTTP AC agent can be easily adapted to interpret policies in other languages. More technical details on both classes of AC agents follow.

a) AC agent for HTTP communications: The AC agent, developed in Node.js [12], is intended to be integrated into the application as a reverse proxy that intercepts all requests to a target and ensures that the requester has the rights to reach/consume the target. The agent includes a rule engine based on XACML policies [13]. The agents use the JSON Web Token open standard RFC 7519 [14] as a self-contained security resource for securely transmitting information between communication parties in the form of a JSON object.


The mechanism includes the creation of access tokens that are used to assert a number of claims in the service requests. When a user needs to consume a service protected by the AC agent, she needs to include the token as an HTTP header in the service request. The request will be intercepted by the AC agent and, according to the user attributes in the token, the agent will evaluate the XACML policy and grant or reject access to the service. The XACML model supports and encourages the separation of the access decision, the point of use, and the management of the policies. In this implementation the access decision (XACML PDP) and the point of use (XACML Policy Enforcement Point, PEP) reside on the same instance to improve performance. The AC agent includes the following features:

• A XACML PEP (Policy Enforcement Point) that intercepts access requests and enforces the XACML PDP decision.
• A XACML PDP (Policy Decision Point) that evaluates XACML policies.
• A XACML PIP (Policy Information Point) that provides external information to the PDP, such as LDAP attributes. It is implemented as data retrievers that get information from the request (origin, date, IP address, user email, user name, ...).
• A JWT client to verify JWT tokens.

The AC agent does not offer an external REST API; rather, it works as a reverse proxy that intercepts all requests to the protected service, i.e. it offers an access control mechanism for REST services based on XACML rules. In order for the agent to apply the XACML rules, each request must include a JWT token with the requester information. To protect an IoT application component service, for each request sent by clients to consume this service, the AC agent executes the following steps (a minimal sketch follows the list):

1. Intercept any request to the backend service.
2. Extract user information from the JWT token located in the request header.
3. Get context data from the request.
4. Make an access control decision based on policies, user information and context data.
5. If the decision is permit, redirect the request to the backend service.
6. If the decision is not permit, reject the request.
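The following minimal sketch shows how these steps could be arranged as a Node.js (Express) middleware; it is not the ENACT AC agent, and the policy evaluation and token handling are simplified stand-ins for the XACML PDP and JWT verification.

```typescript
// Illustrative reverse-proxy-style interception of requests (steps 1-6).
import express, { Request, Response, NextFunction } from "express";

interface Attributes { [name: string]: string }

// Stand-in for the XACML PDP: a single rule over the request attributes.
function evaluatePolicy(attrs: Attributes): "Permit" | "Deny" {
  return attrs.role === "nurse" && attrs.action === "read" ? "Permit" : "Deny";
}

// Stand-in for JWT handling: decodes the payload only; a real agent would
// verify the token signature before trusting any claim.
function extractClaims(authHeader?: string): Attributes | null {
  if (!authHeader?.startsWith("Bearer ")) return null;
  try {
    const payload = authHeader.slice(7).split(".")[1];
    return JSON.parse(Buffer.from(payload, "base64url").toString());
  } catch { return null; }
}

const app = express();

app.use((req: Request, res: Response, next: NextFunction) => {
  const claims = extractClaims(req.header("authorization"));      // steps 1-2
  if (!claims) return res.status(401).send("Missing or invalid token");
  const attrs: Attributes = {
    ...claims,
    action: req.method === "GET" ? "read" : "write",               // step 3: context data
    sourceIp: req.ip ?? ""
  };
  if (evaluatePolicy(attrs) !== "Permit")                          // step 4
    return res.status(403).send("Access denied");                  // step 6
  next();                                                          // step 5: forward to the backend
});
```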

The AC agent can evaluate all and only the attributes included in the JWT. Therefore, the power of the attribute-based access control performed by the AC agent depends on the granularity of the attributes taken into account in the XACML rules. Note that the management of the policies (XACML Policy Administration Point, PAP) is not supported inside the agent; the XACML access policies are defined in the Policy Administration Point (PAP). As explained below, the Dashboard within the ENACT Framework provides PAP functionality to view and edit the XACML rules (in the form of JSON files) as well as to automatically communicate them to the AC agent.


b) AC agent for non-HTTP communications: When the communications are not HTTP, it is not possible to adopt the reverse HTTP proxy paradigm. Therefore, the AC agent includes only the PDP part, which evaluates pre-defined security policy rules over access requests; the attributes of the request context (e.g. requesting IP, user role, etc.) are no longer taken from the HTTP protocol but rather need to be sent to the agent together with the rule evaluation order. This is exactly the case of the IoT systems of the ENACT project use cases. Therefore, the PDP agent does not perform the interception of the access request but requires an explicit order to evaluate the access policy. The interface of the PDP agent being developed in ENACT offers at least the following services (see the sketch below):
• updatePolicy(policyID): updates the indicated XACML policy stored in the agent.
• evaluatePolicy(policyID, attributes): boolean; evaluates the rules in the XACML policy for a specific set of attributes sent as parameters.
• start: starts the agent.
• stop: stops the agent.
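As an illustration only, the listed services could be captured by a TypeScript interface along the following lines; the parameter and result types are assumptions for the example and not the actual ENACT API.

```typescript
// Hedged sketch of the non-HTTP PDP agent interface described above.
interface XacmlAttributes {
  [attributeId: string]: string | number | boolean;
}

interface PdpAgent {
  /** Updates the indicated XACML policy stored in the agent. */
  updatePolicy(policyID: string): void;
  /** Evaluates the policy rules for the supplied attributes; true means Permit. */
  evaluatePolicy(policyID: string, attributes: XacmlAttributes): boolean;
  /** Starts the agent so that it accepts evaluation orders. */
  start(): void;
  /** Stops the agent. */
  stop(): void;
}
```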


4.1.3. The Control Manager

The Control Manager acts as a centralized management hub for the distributed enforcement agents. This component is in charge of analyzing the events sent by the distributed agents and controlling whether the policies are being correctly applied. When an enforcement agent is launched, the first thing it does is to inform the Control Manager that a new agent is alive, and the Control Manager registers the ID of the new agent, which will be used to identify the agent in the Dashboard.

4.1.4. The streaming bus

This component is the typical message broker or bus where agents subscribe and unsubscribe to topics, so they can communicate their events to the Control Manager within the ENACT Framework. The event bus is implemented with Apache Kafka [15]. The events emitted by the internal services of the agents and exchanged through this bus are:
• Agent events: on every transition, a public event is emitted by the agent to notify of internal state transitions.
• Proxy events: events fired by the internal proxy service of the agent, e.g. when a request to a resource is proxied; these are actually access control events.
• Log events: events fired by the internal logger service of the agent, e.g. logging the value of a token.

4.1.5. The Dashboard

This module refers to the main enforcement GUI that allows the management of the different preventive enforcement agents deployed for the IoT system, in order to work with the enforcement agents and apply at runtime diverse security mechanisms, such as identity management and access control, to resource usage. The Dashboard allows for registering and configuring the agents (endpoints to protect, etc.) as well as setting up the XACML policies that each agent will enforce. The policies need to be in JSON file format, and the Dashboard includes a JSON editor that supports editing the file. Manual editing by the operators is optional, and automatic updates of the JSON files are also supported.
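For illustration, a JSON-style policy document of the kind an operator might edit in the Dashboard could look as follows; the schema is invented for the example and does not reflect the actual ENACT policy format.

```typescript
// Invented example of a JSON-encoded access rule managed through a dashboard.
const policyDocument = {
  policyId: "temperature-readings-policy",
  target: { resource: "/sensors/temperature", action: "read" },
  rules: [
    { effect: "Permit", condition: { attribute: "role", equals: "operator" } },
    { effect: "Deny",   condition: { attribute: "sourceNetwork", equals: "public" } }
  ],
  defaultEffect: "Deny"
};
```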

5. Related work

The academic and industrial state-of-the-art solutions for IoT access control lack dynamic and adaptive capabilities driven by IoT system context-awareness, with risk and trust as potential sources of context information.

Dynamic access control. Mahmud et al. [2] have stated that several IoT-centric security issues may go unnoticed or be poorly addressed by security researchers, as this paradigm is not yet full-fledged. A key requirement they identified is access control: the act of ensuring that an authenticated IoT node accesses only what it is authorized to access. Cvitic et al. [3] analyzed the security aspects of each layer of the IoT architecture: the biggest security risk lies at the perception layer, due to the specific limitations of the devices and the transmission technology used at this layer, followed by the middleware layer, which is based on cloud computing and inherits the vulnerabilities of that concept. Fall et al. [4] found that cloud computing infrastructures do not use dynamic access control but static, traditional mechanisms, despite the highly dynamic nature of cloud computing capabilities. Farooq et al. [5] confirmed that, in the future, more security techniques (such as risk assessment) must be explored at each architectural layer. More approaches can be found in the survey by Ravidas et al. [20].

Context awareness. Ramos et al. [21] summarized different types of context for IoT devices and their impact on security and privacy. Jagadamba et al. [6] studied adaptive security schemes based on context. Context-awareness enhances the effectiveness of security mechanisms by incorporating contextual data into the decision-making process. This capability of taking grey decisions instead of black-or-white ones is particularly important in environments where perimeter security is no longer enough, especially cloud and IoT infrastructures. Habib et al. [7] identified three types of context (physical, computing, user-related) and four approaches (category, context-awareness, context learning, context modelling). Interestingly, they distinguished active from passive context awareness (contextual changes are automatically discovered or statically presented), as well as sensed context (taken from the process's environment) from derived context (computed on the go). Our approach follows the same direction and provides a context-awareness solution for access control in IoT.

Risk-based access control. In the solution of Dankar et al. [8], different risk classes are identified ahead of time and each class is matched with a protection level; an access request to a resource undergoes automated risk assessment, is classified into one of the predefined classes, and the appropriate protection level is then applied to the requested data. While analyzing competing smart home frameworks, Fernandes et al. [9] refined this by observing that device operations are inherently asymmetric risk-wise and that a capability model needs to split such operations into equivalence classes: an on/off operation pair for a light bulb is less risky than the same operation pair for an alarm. They proposed splitting or grouping objects' capabilities based on risk, making it possible to select the granularity; of the range of granularities they observed, none was risk-based. Atlam et al. [19] use user context, resource sensitivity and risk history to analyse the security risk of each access request, and adjust access control accordingly. Fall et al. [4] found that many researchers define a risk formula for a given user or object, but on an insufficient set of parameters (e.g., focusing on the requestor but not on the resource accessed). They also found that the main issue with risk-aware access control is the cost of computation: risk is evaluated for each access request, which is beneficial but computationally expensive, and their proposition does not solve this issue. In our approach, we introduce risk analysis into context-aware access control as a post-authorization step that adjusts the access control scope.

Privacy concerns. Privacy is an essential part of planning cyber-secure systems and is becoming more and more important for IoT systems as they move closer to personal users. The authors of [17] give hands-on examples of the "most severe, yet easy to abuse" IoT threats, namely leakage of personally identifiable information (PII), leakage of sensitive user information, and unauthorised execution of functions. Hiller et al. [10] focus on involving privacy in risk management while analysing the NIST Privacy Framework, and confirm that adaptive capability is a cornerstone of privacy resilience. Our approach focuses on the first safeguard towards IoT privacy protection, preventing unauthorized and risky data access, and achieves adaptive access control that follows context and system changes.

Contribution of trust. Jagadamba et al. [6] note that, conceptually, trust is a parameter used to exchange information regarding an entity's actions through belief and faith: positive behaviors increase the trust placed in an entity, while negative behaviors decrease it. Trust is classified into proofs (certified information, such as identity, property and authorization, issued by a certification authority or other centrally controlled systems) and indicators (possible factors collected from various sources). Dynamic access control is an attempt to combine proofs and indicators, considering not only the identity of the data requestor, but also the context and system status at runtime.
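The post-authorization adjustment mentioned above can be pictured with a small sketch. The following Python fragment is purely illustrative and is not the ENACT implementation: the scope names, contextual indicators and thresholds are hypothetical, and a real deployment would derive the risk score from the context sources discussed in this chapter.

```python
# Hypothetical sketch: narrowing an already-granted OAuth-style scope set after
# authorization, based on a simple contextual risk score. All names and
# thresholds are invented for illustration.

def risk_score(context: dict) -> float:
    """Combine a few contextual indicators into a score in [0, 1]."""
    score = 0.0
    if context.get("network") == "untrusted":
        score += 0.4
    if context.get("failed_logins", 0) > 3:
        score += 0.3
    if not context.get("device_attested", False):
        score += 0.3
    return min(score, 1.0)

def adjust_scopes(granted: set, context: dict) -> set:
    """Post-authorization step: the higher the risk, the narrower the scope."""
    score = risk_score(context)
    if score >= 0.7:   # high risk: read-only access, no sensitive data
        return {s for s in granted if s.startswith("read:")} - {"read:sensitive"}
    if score >= 0.4:   # medium risk: drop actuation scopes
        return {s for s in granted if not s.startswith("actuate:")}
    return granted     # low risk: keep the full scope set

granted = {"read:temperature", "read:sensitive", "actuate:valve"}
context = {"network": "untrusted", "failed_logins": 1, "device_attested": False}
print(adjust_scopes(granted, context))  # -> {'read:temperature'}
```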

6. Conclusion

We presented the IoT access control mechanisms, comprising both the Context-aware Access Control and the distributed access control agents, developed as part of the ENACT H2020 project DevOps Framework, which offers novel solutions to address challenges related to the development, operation, and quality assurance of trustworthy smart IoT systems. The presented access control mechanisms are under development: the Context-aware Access Control solution will be integrated into the Evidian standard offer as "an IAM gateway for IoT", while the access control agents are part of Tecnalia's open source solution portfolio. The lesson learned so far is that we are committed to a powerful and promising approach that offers authorisation dynamicity for different types of scenarios. We are comforted in the idea that the OAuth 2.0 framework remains the foundational industry approach for providing authentication and authorization to REST-based APIs, but it is a mistake to treat OAuth 2.0 as a simple protocol and underestimate its complexity. Furthermore, building a generic solution for multiple situations requires a more sophisticated design than deploying OAuth 2.0 for a single use case. The management of scopes and claims is key: too many scopes make administration difficult, whereas too few scopes degrade security with over-entitlements. At this stage, the key topics of further research in context-aware access control are as follows:
• A robust and scalable approach for scope management must be defined, by injecting scopes into the OAuth 2.0 "Device flow", applicable to IoT use cases.
• Contextual information inside scopes must be exploited, so that dynamic scopes can be applied via notions of risk/trust, enabling situation-aware dynamic access control behaviors.
The ENACT use cases (eHealth, Smart Buildings and Intelligent Transportation Systems) will give us the means to validate these concepts.

References
[1] IEC, "IoT 2020: Smart and secure IoT platform", IEC white paper (2016).
[2] H. Mahmud, F. Maziar and H. Ragib (2015), "Towards an Analysis of Security Issues, Challenges, and Open Problems in the Internet of Things".
[3] I. Cvitic, M. Vujic and S. Husnjak (2015), "Classification of Security Risks in the IoT Environment", 26th DAAAM International Symposium on Intelligent Manufacturing and Automation, p731-740.
[4] D. Fall, T. Okuda, Y. Kadobayashi and S. Yamaguchi (2016), "Risk Adaptive Authorization Mechanism (RAdAM) for Cloud Computing", Journal of Information Processing, Vol 24, No. 2, p371-380.
[5] M. U. Farooq, M. Waseem, A. Khairi and S. Mazhar (2015), "A Critical Analysis on the Security Concerns of Internet of Things (IoT)", International Journal of Computer Applications, Volume 111, No. 7.
[6] G. Jagadamba and B. Sathish Babu (2016), "Adaptive Security Schemes based on Context and Trust for Ubiquitous Computing Environment: A Comprehensive Survey", Indian Journal of Science & Technology, Vol 9 (48).
[7] K. Habib and W. Leister (2015), "Context-Aware Authentication for the Internet of Things", ICAS 2015: The Eleventh International Conference on Autonomic and Autonomous Systems, p134-139.
[8] F. Dankar and R. Badji (2017), "A risk-based framework for biomedical data sharing", Journal of Biomedical Informatics, Vol 66, p231-240.
[9] E. Fernandes, J. Jung and A. Prakash (2016), "Security Analysis of Emerging Smart Home Applications".
[10] J. Hiller and R. Russel (2017), "Privacy in Crises: The NIST Privacy Framework", Journal of Contingencies and Crisis Management, Volume 25, Number 1, p31-38.
[11] E. Rios, E. Iturbe, W. Mallouli and M. Rak (2017), "Dynamic security assurance in multi-cloud DevOps", in 2017 IEEE Conference on Communications and Network Security (CNS), pp. 467-475, IEEE.
[12] The Node.js project. Available at: https://nodejs.org/en (Retrieved July 2019).
[13] eXtensible Access Control Markup Language (XACML) Version 3.0. Available at: http://docs.oasis-open.org/xacml/3.0/xacml-3.0-core-spec-os-en.html (Retrieved July 2019).
[14] JSON Web Token (JWT) standard by the Internet Engineering Task Force (IETF). Available at: https://tools.ietf.org/html/rfc7519 (Retrieved July 2019).
[15] Apache Kafka® distributed streaming platform by the Apache Software Foundation. Available at: https://kafka.apache.org/ (Retrieved July 2019).
[16] D. Hardt (2012), "The OAuth 2.0 Authorization Framework".
[17] C. Kolias, A. Stavrou, J. M. Voas, I. V. Bojanova and D. R. Kuhn (2016), "Learning Internet of Things Security 'Hands-on'".
[18] ENACT: Development, Operation, and Quality Assurance of Trustworthy Smart IoT Systems. https://www.enact-project.eu/
[19] H. F. Atlam, A. Alenezi, R. J. Walters, G. B. Wills and J. Daniel (2017), "Developing an adaptive Risk-based access control model for the Internet of Things", in 2017 IEEE International Conference on Internet of Things (iThings), IEEE Green Computing and Communications (GreenCom), IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), pp. 655-661, IEEE.
[20] S. Ravidas, A. Lekidis, F. Paci and N. Zannone (2019), "Access control in Internet-of-Things: A survey", Journal of Network and Computer Applications, 144, 79-101.
[21] J. L. H. Ramos, J. B. Bernabe and A. F. Skarmeta (2015), "Managing context information for adaptive security in IoT environments", in 2015 IEEE 29th International Conference on Advanced Information Networking and Applications Workshops, pp. 676-681, IEEE.


Security and Privacy in the Internet of Things: Challenges and Solutions J.L.H. Ramos and A. Skarmeta (Eds.) IOS Press, 2020 © 2020 The authors and IOS Press. All rights reserved. doi:10.3233/AISE200006

The SOFIE Approach to Address the Security and Privacy of the IoT Using Interledger Technologies

Dmitrij LAGUTIN a, Priit ANTON b, Francesco BELLESINI c, Tommaso BRAGATTO d, Alessio CAVADENTI d, Vincenzo CROCE e, Nikos FOTIOU f, Margus HAAVALA b, Yki KORTESNIEMI g, Helen C. LELIGOU h, Ahsan MANZOOR i, Yannis OIKONOMIDIS h, George C. POLYZOS f, Giuseppe RAVEDUTO e, Francesca SANTORI d, Vasilios SIRIS f, Panagiotis TRAKADAS h, Matteo VERBER e

a Department of Communications and Networking, Aalto University, Espoo, Finland
b Guardtime, Tallinn, Estonia
c Emotion s.r.l., Bastia, Italy
d ASM Terni, Terni, Italy
e Engineering Ingegneria Informatica S.p.A., Rome, Italy
f Mobile Multimedia Laboratory, Athens University of Economics and Business, Athens, Greece
g Department of Computer Science, Aalto University, Espoo, Finland
h Synelixis Solutions, Athens, Greece
i Rovio Entertainment, Espoo, Finland

Abstract. The Internet of Things (IoT) suffers from a lack of interoperability, as data, devices, and whole sub-systems are locked in 'silos' for technical, but mostly business, reasons. Many new applications would be enabled, and existing ones could be implemented in a more cost-efficient way, if the 'silos' could be bridged in a secure and privacy-preserving manner. The SOFIE approach provides an effective way of accomplishing this by using interledger technologies that leverage the distributed trust enabled by distributed ledgers. The federated approach of SOFIE facilitates the creation of cross-organisational applications. This chapter presents the SOFIE approach and details the benefits it provides in four real-world pilots.

Keywords. Internet of Things, Distributed Ledger Technologies (DLT), blockchains, privacy, security, smart contracts

1. Introduction

Fragmentation and lack of interoperability among different platforms is a major issue for the Internet of Things (IoT). Currently, IoT platforms and systems are vertically oriented silos unable (or unwilling) to exchange data with, or perform actions across, each other. This leads to multiple problems: reduced competition and vendor lock-in, as it is difficult for customers to switch IoT providers or combine IoT devices and data from multiple vendors in a single system; worse security, as vendors often use proprietary security solutions that have not been properly audited; worse privacy, as vendors usually force their customers to move at least some of their data or metadata to the vendor's


cloud; and reduced functionality compared to what better interoperability between platforms would afford. As IoT systems are becoming prevalent in everyday life, the lack of interoperability and the resultant reduced use of relevant data is growing into a significant problem for the whole society.

IoT systems face many important security and privacy challenges. Since IoT systems interact with the real world, security is extremely important, as breaches can cause significant physical damage and even loss of life. Similarly, as using the IoT is becoming a compulsory part of everyday life and IoT devices are able to collect increasing amounts of personal data, people should be able to carry out their lives without compromising their privacy. IoT systems usually contain large numbers of devices, so manually configuring and managing every IoT device is not feasible. Hence, security and privacy solutions for the IoT must support high degrees of automation.

Authorisation mechanisms are an integral part of IoT security. The device owner should be able to authorise other parties to access the device or its data in a secure, flexible, and decentralised manner. Decentralised authorisation is important as it allows authorisation without a central control point, which may become a bottleneck, an increased failure risk or attack target, or require manual work. As there are numerous IoT devices interacting with each other, people, and the rest of the world, strong auditability is also a very important security feature for the IoT. This is necessary for the normal operation of the system (e.g., goods have been delivered, therefore the payment should be made), for troubleshooting in case of a problem, and for dispute resolution between the parties involved.

In addition to the above-mentioned challenges, there are security challenges which will not be covered in this chapter. These include IoT device-level security, including secure firmware updates, and verifying that IoT data is authentic and correct. The latter problem is very difficult to resolve in practice: it is not enough that the device is properly designed, implemented, calibrated, installed and certified, as it is easy to, e.g., manipulate a thermometer by installing a heat source next to it.

From the privacy point of view, it is important to minimise both data collection and storage. In particular, long-term storage is dangerous, since encrypted data will be revealed once the encryption algorithm used is broken, which will eventually happen. Protection against correlation attacks should also be provided, as in many situations the service should not be able to identify the user or even be aware that the user has used the service previously.

The EU H2020 project SOFIE 1 enables applications to link heterogeneous IoT platforms and autonomous things across technological, organisational, and administrative borders in an open and secure manner, thus simplifying the reuse of existing infrastructure and data, and allowing the creation of open business platforms, which in turn enable new kinds of services. This goal is accomplished by using Distributed Ledger Technologies (DLTs) [1] and interledger techniques – without requiring any modifications to the existing IoT platforms. Decentralised identifiers (DIDs) [2] can be used to manage users' identifiers in a privacy-preserving way.
In the long term this will also enable open data markets, where participants can buy and sell IoT data and access to IoT actuation (or more generally: dictate rules for access to data and actuation) in a decentralised and automated manner.

1 Secure Open Federation for Internet Everywhere (SOFIE), funded by EU’s Horizon 2020 Programme under Grant 779984, 1.1.2018 – 31.12.2020, https://www.sofie-iot.eu.


The contributions of this chapter include descriptions of: 1) how DLTs, interledger techniques, and DIDs can be utilized to resolve problems with IoT security and privacy; 2) how to realise a secure and open federation among heterogeneous IoT platforms; and 3) examples of how these techniques can be leveraged in complex real-world systems, namely: (a) food supply chain provenance and transport conditions tracing, (b) electricity distribution grid balancing through electrical vehicle (EV) charging, (c) mixed reality mobile gaming with interactions between real and virtual worlds through IoT devices, and (d) secure sharing of electricity smart meter data.

This book chapter is organized as follows: Section 2 provides background information about DLTs and DIDs. The SOFIE pilots are presented in Section 3, while Section 4 describes the SOFIE IoT federation approach. Section 5 highlights the benefits of SOFIE from the pilot use cases' point of view. Related work is discussed in Section 6, while Section 7 provides a discussion of relevant issues and Section 8 concludes the chapter.

2. Background

This section describes key related technologies such as distributed ledgers, interledger technologies, and decentralised identifiers.

2.1. Distributed Ledger and Interledger Technologies

Distributed Ledger Technologies (DLTs), such as blockchains, offer decentralised solutions for collaboration and interoperability. One of the main features of DLTs is the immutability of data: ledgers are append-only databases where existing data cannot be modified and only new data can be added. Another major feature of DLTs is a distributed consensus mechanism [3], which controls what and how data is added to the ledger. Finally, DLTs also replicate data to participating nodes, thus improving availability. Because of these three properties, DLTs avoid a single point of failure and offer resilience against many attacks. It is relatively easy to determine if any of the participating nodes in the DLT are misbehaving, and even in an extreme case where an attacker manages to control the majority of the DLT's resources, the attacker can only control the addition of new data and, in some extreme cases, modify the very latest previously added data (but not the older data).

DLTs can be implemented with different levels of openness. They can be fully open (permissionless), which means that anyone can join the DLT and propose transactions; most well-known DLTs such as Bitcoin (https://bitcoin.org/en/) and Ethereum (https://www.ethereum.org/) are based on this principle. However, DLTs can also be permissioned, either semi-open, in which case read access is open to everyone but write access is restricted, or closed, in which case both read and write access are restricted.

Overall, the main practical innovation of DLTs is the enablement of distributed trust. While there have been multiple proposals for distributed databases in the past, they have mostly concentrated on the distributed implementation, while the trust model has remained firmly centralized. In contrast, DLTs allow various entities, such as individuals, organizations, and companies, which may not fully trust each other, to collaborate in a


safe and transparent manner, with only a low risk of being cheated by others. This makes DLTs a natural approach for solving the (business rather than technical) interoperability problem among IoT platforms. Smart contracts [4] are another important feature provided by several DLTs: they are distributed applications that are executed on the ledger. Whenever an entity interacts with a smart contract, all operations are executed by all (full) nodes in the DLT network in a deterministic and reliable way; one of these nodes is selected to store the contract's execution outcome (if any) in the ledger. Smart contracts can verify DLT identities and digital signatures, perform general purpose computations, and invoke other smart contracts. The code of the smart contract is immutable and cannot be modified even by its owner. Moreover, since all transactions sent to a contract are recorded in the DLT, it is possible to obtain all historical values of the contract. Smart contracts typically refer to code running on Ethereum (in which case they are Turing-complete), but similar functionality is available in other DLTs. In particular, in the permissioned Hyperledger Fabric [5], similar functionality is named chaincode, while simpler, more constrained scripts can be run on Bitcoin. Smart contracts or similar functionality is critical for automating processes and will be exploited in the techniques described later. There exists a large number of DLTs, each offering different trade-offs in terms of latency, throughput, consensus algorithm, functionality, etc., thus rendering them suitable for different types of applications. For example, a DLT can focus on cryptocurrency payments, recording of IoT events, or access authorisation. In complex systems it is therefore often not feasible to use only a single DLT for everything, hence the interledger approach that allows different DLTs to exchange data with each other is required in many situations. Using multiple ledgers is also beneficial for privacy reasons: participants within a DLT need to be able to access all data stored in that DLT to independently verify its integrity, which encourages the participants to use private ledgers, and store only a subset of the data to the main ledger used for collaboration with others. Multiple ledgers are also necessary for crypto-agility, as cryptographic algorithms used by DLTs (such as SHA-256) will not stay safe forever, thus it is necessary to have a mechanism to transfer data from one ledger to another. Siris et al. [6] present a review of interledger approaches, which differ in their support for transferring and/or trading value between ledgers, whether they support the transfer of information in addition to payments across ledgers, the balance between decentralised trust and cost (which can include both transaction cost and delay), the level of privacy, and their overall scalability and functionality that can facilitate the innovation of the DLT ecosystem. A concrete example of the use of interledger is the following: Some parties decide to use the Hyperledger Fabric blockchain, which provides low-cost transactions and chaincode for transaction automation, for recording IoT and authorisation-related events. Parties also decide to use the Ethereum blockchain in order to make payments and fully automate the whole process with smart contracts. 
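As a toy illustration of this two-ledger split (and only that: the lists, field names and functions below are simple Python stand-ins invented for this sketch, not SOFIE code, Fabric chaincode or Ethereum contracts), events and payments can be kept on separate append-only logs, with each payment referencing the hash of the event it pays for:

```python
# Illustrative sketch: one ledger records IoT/authorisation events, another records
# payments, and a payment is only accepted if it references a recorded event hash.
import hashlib, json

event_ledger = []    # stands in for the permissioned ledger (e.g., recording events)
payment_ledger = []  # stands in for the public ledger (e.g., recording payments)

def record_event(event: dict) -> str:
    """Append an IoT/authorisation event and return its hash."""
    digest = hashlib.sha256(json.dumps(event, sort_keys=True).encode()).hexdigest()
    event_ledger.append({"event": event, "hash": digest})
    return digest

def record_payment(amount: float, event_hash: str) -> bool:
    """Only record the payment if the event it pays for is on the event ledger."""
    if not any(entry["hash"] == event_hash for entry in event_ledger):
        return False
    payment_ledger.append({"amount": amount, "pays_for": event_hash})
    return True

h = record_event({"device": "charging-station-7", "action": "session completed"})
assert record_payment(12.50, h)            # payment linked to the recorded event
assert not record_payment(12.50, "bogus")  # no matching event, payment rejected
```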
An interledger mechanism can be used to interconnect these two ledgers in a way that ensures atomic transactions, i.e., either both the authorisation- and payment-related transactions succeed, or both fail.

2.2. Decentralised Identifiers (DIDs)

Currently, an identity technology receiving much attention is decentralised identifiers (DIDs). A key aspect of DIDs is that they are designed not to be dependent on a central issuing party (Identity Provider or IdP) that creates and controls the identity. Instead,


DIDs are managed by the identity owner (or a guardian on the owner's behalf), an approach known as self-sovereign identity [7]. There are several different DID technologies in development [8], some of the most prominent being Sovrin (https://sovrin.org/), uPort (https://www.uport.me), and Veres One (https://veres.one/). These technologies started with similar but distinct goals in mind, but lately many of them have adopted the approach and format of the W3C DID specification [2], thus rendering them more and more interoperable. The specification defines a DID as a random string, often derived from the public key used with the identity. If a new DID is allocated for every party one operates or communicates with, correlating one's activities with different parties becomes significantly harder. This property can be further enhanced by replacing existing DIDs with new ones at suitable intervals, e.g., even after just a single use.

Yet DIDs alone do not suffice, as some means of distributing the related public keys, any later changes to the keys, or other identity-related information is required. To this end, many of the DID solutions rely on a DLT for public DIDs (used by parties that want to be known publicly), whereas for private DIDs (e.g., used by individuals) application-specific channels are used to distribute the information. Some DID technologies, e.g., Sovrin and Veres One, are launching their own permissioned DLTs, while others rely on existing blockchains (e.g., uPort is built on top of Ethereum). All three example technologies originally intended to use DLTs/blockchains for distributing information about DIDs belonging to individuals and IoT devices in addition to organisations, but the emergence of the General Data Protection Regulation (GDPR) [9] in the EU and other similar requirements have made storing personally identifiable information on a non-mutable platform such as a DLT/blockchain problematic. For this reason, Sovrin and Veres One have already excluded individuals' DIDs from the ledger – and the DIDs of IoT devices may face similar treatment if they reveal personal information.

In many cases, there is also a need to associate machine-verifiable properties with the identifier of an entity. This is accomplished with Verifiable Credentials (VCs) [10], which are analogous to traditional authorisation certificates. In a VC, the party issuing the credential (i.e., the issuer) states that, according to them, the party about which the credential is made, known as the prover, has the stated properties. These could be, e.g., the person's name, date of birth, etc. in the case of a driver's license issued by the police. To rely on a credential to prove something, the prover also has to demonstrate that the credential was issued to them. This can be done, e.g., by proving possession of the private key corresponding to the public key used in the credential (if the credential format supports such information), or with a separate proof built onto the credential. With a suitably created credential, a proof can also be used to reveal only some of the attributes of the credential (known as selective disclosure) or even to prove that, e.g., one is over a certain age without revealing the actual age attribute (a property known as zero-knowledge proof).

3. SOFIE Pilots

This section describes the security and privacy challenges of four SOFIE real-life pilots that rely heavily on IoT: 1) agricultural/food supply chain, where produce growth and


transportation conditions are tracked from field to fork, 2) power balancing of the electrical grid by offering incentives to EV owners to charge their cars at certain times and locations, 3) mixed reality mobile gaming, where gamers can interact with the real world through IoT devices, and 4) utilizing data from electricity smart meters to develop various applications, e.g., to suggest the best electricity provider for a given user profile.

3.1. Tracking Food from Farm to Fork

The farm-to-fork pilot demonstrates a community-supported heterogeneous end-to-end agricultural food chain scenario. The main goal is to provide accountability, immutability, and auditability for both IoT data and transactions between different parties. As a result, consumers can reliably verify the provenance of a specific product from farm to fork. This gives consumers the ability to make decisions about their food based on e.g., health and ethical concerns, including environmental sustainability, fair labour practices, the use of fertilizers and pesticides, and other similar issues. The producers will be able to launch new products with a description, pricing, quantity and photos, while customers may interact with the marketplace, looking for products that fulfil certain requirements or preferences. Enabling immutable transactions also helps with dispute resolution between all the parties involved, reducing the chances of fraud and cutting out corresponding mediation expenses and transaction costs.


Figure 1. An overview of the SOFIE food-chain pilot, describing how agricultural produce moves from the farm to the supermarket through transporters and distributors.

The path from farm to fork is split into five segments as depicted in Figure 1, and between segments the produce is handed over to the party responsible for the next segment. Each IoT platform uses its own data management and storage infrastructure (which can be either a database or a DLT).
Smart Farm (SF): In the farm, there are multiple sensor nodes capable of measuring, e.g., temperature, humidity, wind speed/direction, rainfall, and soil moisture.
Transportation Routes A (TRA) and B (TRB): These segments cover the paths from the SF to the Storage & Distribution Centre (SDC), and from the SDC to the Supermarket (SM). The vehicles are equipped with GPS and temperature sensors.
Storage & Distribution Centre (SDC): The SDC is where the smart boxes with farm crops are stored until they are transported to the Supermarket. In the SDC, a number of sensors monitor, among other parameters, the temperature and the presence of the boxes.
Supermarket (SM): The SM contains the storage area, where the boxes are kept until they are placed in the customer area, and the customer area, where the products are available to the customers. Before the products are removed from the smart boxes to be placed in the customer area, QR labels are created and applied to the crop packages, enabling the retrieval of the relevant information by customers.
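To make the data flow concrete, the following sketch shows one possible shape of this recording and checking step. It is purely illustrative: the field names, the in-memory stand-ins for the private database and the consortium ledger, and the verification function are invented for this example and are not the pilot's implementation.

```python
# Illustrative sketch: raw sensor readings stay in a segment's private database,
# while only their hash and the handover events are recorded on a shared ledger,
# so a check triggered from a product's QR label can confirm the data is unaltered.
import hashlib, json

private_db = {}          # (batch, segment) -> raw sensor readings, kept off-ledger
consortium_ledger = []   # shared, append-only record of hashes and handovers

def record_segment(batch_id: str, segment: str, readings: list) -> None:
    private_db[(batch_id, segment)] = readings
    digest = hashlib.sha256(json.dumps(readings, sort_keys=True).encode()).hexdigest()
    consortium_ledger.append({"batch": batch_id, "segment": segment, "hash": digest})

def record_handover(batch_id: str, from_party: str, to_party: str) -> None:
    consortium_ledger.append({"batch": batch_id, "handover": (from_party, to_party)})

def verify_segment(batch_id: str, segment: str) -> bool:
    """Does the privately stored data still match the hash recorded on the ledger?"""
    readings = private_db[(batch_id, segment)]
    digest = hashlib.sha256(json.dumps(readings, sort_keys=True).encode()).hexdigest()
    return any(e.get("batch") == batch_id and e.get("segment") == segment
               and e.get("hash") == digest for e in consortium_ledger)

record_segment("batch-42", "TRA", [{"t": "2020-05-01T10:00", "temp_c": 6.1}])
record_handover("batch-42", "SmartFarm", "TransporterA")
assert verify_segment("batch-42", "TRA")
```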


The security and privacy challenges of this pilot include: how to accurately record the IoT data and the handover events of the produce between different parties, how to provide a sufficient audit trail for dispute resolution, and how to minimize the leakage of private data (e.g., the real identity of the workers should not be revealed to the other party during the handovers).

3.2. Grid Balancing with Scheduled Electrical Vehicle Charging

In the second pilot, the goal is to balance the load on a real electricity network, namely the distribution grid of the city of Terni, located in central Italy. There, a notable amount of energy is produced locally by distributed photovoltaic plants [11], which on occasion can cause Reverse Power Flow when imbalances between locally produced and consumed electricity occur. To avoid this abnormal operation [12][13], electrical vehicles (EVs) will be offered incentives to match their charging needs with the distribution network's requirements.


Figure 2. An overview of the SOFIE energy pilot, describing how the DSO, the EV fleet manager, and EV users utilise a decentralised marketplace to optimize the load on the electrical grid.

The actors in the pilot, as depicted in Figure 2, are the Distribution System Operator (DSO), who is responsible for grid management, the charging station (CS) Owner, who owns and manages the EV charging stations, the Fleet Manager, who represents EVs in energy price negotiations, and the EV users, who receive information and requests about the optimal scheduling of the charging of their vehicle. The main part of the pilot is a decentralised marketplace enabling DSO and fleet manager to negotiate on scheduled electricity consumption (using EV charging) and associated incentives, thus forming an end-to-end scenario from production via distribution to storage and consumption. Both the DSO and the fleet manager interact with the system through their dedicated dashboards that show near real-time data collected from the two IoT subsystems (i.e. smart meters for the DSO and EV sensors for the fleet manager). The actors create requests and offers accordingly on the decentralised marketplace. From the security point of view, it is important that agreements made on the marketplace cannot be tampered with, there is a secure way to verify that the terms of the agreement have been carried out, and parties will be compensated accordingly after


the agreement has been fulfilled. From the privacy point of view, it is important to protect the privacy of the electric vehicle users; therefore, the DSO or the CS Owner should not be able to determine EV users' real identities or correlate their charging activities.

3.3. Context-aware Mobile Gaming

In-game assets are a large market, with rare assets costing thousands of euros. However, the current in-game asset market poses significant risks to the players, since the gaming company can create unlimited instances of the assets, which in turn devalues them, or the assets can disappear completely if the gaming company closes down. The goals of this pilot are to 1) provide a mechanism for recording asset ownership and trades in a secure and transparent manner, and 2) allow interactions between the mobile games and the physical world through IoT devices.

Figure 3. Overview of the SOFIE context-aware gaming pilot

An overview of the pilot is shown in Figure 3. The main actors of the pilot include the Game player, who can play any challenge, manage their profile and assets, and claim reward data through a mobile application; the Game company, which is responsible for developing and maintaining the game servers; the Challenge designer, who can create new challenges, assets, tasks, and puzzles using the existing game infrastructure; and the Asset designer, who can also list their creations for trade on the SOFIE platform.

Multiple use cases will be studied throughout the pilot. In the first use case, players can collect and trade in-game content (e.g., characters, weapons, equipment, parts, etc.). The second use case will utilize a scavenger hunt location-based game using IoT beacons. The player needs to solve riddles using the received clues to reveal the location of the IoT beacon, which then needs to be visited by the player. The player must perform some tasks (such as viewing an advertisement) to collect points, which can later be redeemed for rewards. The third use case allows generic trading between IoT resources and gaming assets. For example, as an extension of the IoT beacon use case, a gamer who performs a certain real-world activity (physical exercise, solving puzzles, etc.) with IoT devices could receive a gaming asset as a reward. The possession or sale of gaming assets could


in turn enable the gamer to, for example, receive a discount from a vending machine, temporary control of a robot in a mall, or some other IoT resource access.

There are several security and privacy issues in this pilot. Securing access control (for both IoT data and actuation) is very important, as gamers are interacting with third-party IoT devices. The system should also offer an audit trail to help with dispute resolution in case something goes wrong. Furthermore, the owner of IoT beacons or other IoT devices should not be able to track players or determine their real identity. Finally, when IoT resources are exchanged for gaming assets, the parties managing the above-mentioned resources should not be able to determine the other side of the trade. For example, if a player uses a gaming asset to gain access to an IoT resource, the owner of the IoT resource should only see that the player receives access to the resource, without being able to determine whether access has been granted with the help of gaming assets or by other means (e.g., a monetary payment). In a similar way, if a player receives a gaming asset as a reward for solving a physical challenge, other parties should not be able to determine how the gaming asset was received.

3.4. Decentralised Energy Data Exchange

The core idea of this pilot is to provide secure exchange of smart meter data between end users, infrastructure owners, and energy service providers (intermediaries, distributors, and brokers). This in turn enables novel services such as fine-grained energy trading and an energy flexibility marketplace. An overview of the pilot is shown in Figure 4, where the participants, the SOFIE approach and the added value are presented.

Figure 4. An overview of the SOFIE decentralised energy data exchange pilot

The key input for the pilot is the Estfeed open software platform (connecting 700 000 smart meters in Estonia). In order to demonstrate cross-border data exchange and the transfer of trust between network grid participants, the Danish Datahub (Energinet) will be the secondary input for the pilot. Besides the integration of the national hubs, the pilot will also develop adapters and connections to two other instances: a local IoT network (a wind farm) and a household metering point. The main objective of the pilot is to enable trust between parties who exchange energy meter readings, which in turn creates several security and privacy challenges. From the data owner's side, it is critical to guarantee control of the data (including the ability to grant and revoke access to/from third parties), as well as to have access to audit


logs providing a transparent overview of to whom data access rights have been given and how private data are handled. From the smart meter system operator's side (transmission or distribution system operator), there is a need for mechanisms to agree on and prove the responsibility for the smart meter data after the data exchange. Data consumers (brokers, aggregators, and energy traders) need to access authentic smart meter data and be able to reliably verify the whole data provenance chain. The auditors require access to audit logs and tamper-proof evidence of the activities that have taken place in the data exchange process. The pilot can be divided into the following two scenarios:
1. Data exchange – covering the full chain from identification and authorisation to requesting and granting access and exchanging the smart meter data.
2. Data exchange verification – including audit logs, tamper-proof evidence in case of disputes, and verification of the integrity of smart meter data.

4. The SOFIE Federation Approach

The main goal of SOFIE is to federate existing IoT platforms in an open and secure manner, in order to enable interoperability without making any internal changes to the platforms themselves. Here, openness refers both to technical aspects (interfaces, implementation, etc.) and to flexible and (at least partially) open business models. A key benefit of SOFIE is that it allows the creation of solutions that connect many individual systems into a whole that provides significant new functionality. The approach also preserves users' privacy and is compliant with the EU General Data Protection Regulation (GDPR), which requires the minimisation of personal data collection.


Figure 5. SOFIE framework architecture

The SOFIE framework architecture is depicted in Figure 5. The lowest level of the architecture contains IoT assets (or resources), which include, e.g., IoT sensors for sensing the physical environment, actuators for acting on the physical environment, and boxes with RFID tags that are used to transport products. The IoT assets can be either connected to or integrated in actual devices. IoT platforms include platforms with data stores, where the measurements from sensors are collected and made available to third parties, in addition to servers providing IoT services. The federation adapters are used to interface


the IoT platforms with the SOFIE framework. This allows the IoT platforms to interact with the SOFIE framework without requiring any changes to the IoT platforms themselves. Moreover, different scenarios and pilots can utilize different types of federation adapters, which implement only the required parts of the SOFIE functionality.

The architecture emphasises the interledger functionality responsible for interconnecting different types of DLTs, which can have quite diverse features and functionality. The architecture also illustrates the separation of data transfer and control message exchanges. Some IoT data can be transferred directly between the IoT platforms and IoT clients. Control messages related to authorisation logs, events, payments, etc. go through the SOFIE framework.

The other SOFIE framework components [14] are: Identity, Authentication, and Authorization (IAA), which provides identity management and supports multiple authentication and authorisation techniques; Privacy and data sovereignty, which provides mechanisms that enable data sharing in a controlled and privacy-preserving way; Semantic representation, which provides tools for describing services, devices, and data in an interoperable way; Marketplace, which allows participants to trade resources by placing bids and offers in a secure, auditable, and decentralised way; and Discovery and provisioning, which provides functionality for the discovery and bootstrapping of services. Finally, in the upper part of the figure are the application APIs, which provide the interfaces for IoT clients and applications to interact with the SOFIE framework.

The rest of this section describes the most important components from the privacy and security perspective in more detail: Interledger, IAA, and Privacy and data sovereignty. SOFIE results are open source and the source code for the SOFIE framework is available from https://github.com/SOFIE-project.

4.1. Interledger

The main purpose of the SOFIE interledger component is to enable transactions between actors and devices belonging to different (isolated) IoT platforms or silos. Each IoT silo either utilizes or is connected to one or more DLTs. SOFIE's pilots and evaluation scenarios will utilize the following ledgers: Ethereum (both private deployments of the Ethereum client code and public test networks such as Rinkeby and Ropsten), Hyperledger Fabric, Guardtime's KSI blockchain, and Hyperledger Indy. If the federated IoT silo relies upon a ledger, such a ledger can be connected via SOFIE to the degree allowed by both the silo owner and the connected ledger's governance or owner, provided that the silo has been enabled to support SOFIE federation.

Cross-chain transactions can take different forms depending on the specific scenario and its requirements. For example, interactions between a public and a permissioned ledger can use hashed time-lock contracts (HTLCs) to cryptographically link transactions and events on the two ledgers. In that scenario, the public ledger can record payments while the permissioned ledger can record authorisation message exchanges and IoT events. Alternatively, hashes of records stored on the permissioned ledger can be periodically recorded on the public ledger in order to provide a timestamped anchoring point. This approach exploits the wide-scale decentralised trust provided by the public ledger, while keeping the actual records accessible only to a permissioned set of nodes.
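The anchoring pattern can be sketched roughly as follows. This is only an illustration of the idea: plain Python lists stand in for the two ledgers, the function names are invented, and the sketch is not the SOFIE interledger component.

```python
# Illustrative sketch: full records stay on a permissioned ledger, and only a
# periodic digest over them is written to a public ledger as a timestamped
# commitment that can later be used for auditing.
import hashlib, json, time

permissioned_records = []   # full records, visible only to consortium members
public_anchors = []         # small commitments, visible to everyone

def add_record(record: dict) -> None:
    permissioned_records.append(record)

def anchor_to_public_ledger() -> dict:
    """Periodically commit a digest of all permissioned records so far."""
    digest = hashlib.sha256(
        json.dumps(permissioned_records, sort_keys=True).encode()
    ).hexdigest()
    anchor = {"digest": digest, "count": len(permissioned_records), "ts": time.time()}
    public_anchors.append(anchor)
    return anchor

def audit(records_disclosed: list) -> bool:
    """In a dispute, disclosed records can be checked against a public anchor."""
    digest = hashlib.sha256(
        json.dumps(records_disclosed, sort_keys=True).encode()
    ).hexdigest()
    return any(a["digest"] == digest for a in public_anchors)

add_record({"event": "authorisation granted", "resource": "meter-17"})
add_record({"event": "IoT reading stored", "resource": "meter-17"})
anchor_to_public_ledger()
assert audit(permissioned_records)   # anchored state verifies against the public ledger
```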
Finally, interactions between a public or permissioned ledger and a ledger storing DID documents can focus on the resolution of DIDs to DID documents. This allows the interoperability between different DID implementations and different trust, privacy, and


cost tradeoffs with different selections of the ledgers for storing the transactions and the DID documents. The interledger functionality can be implemented in different entities, which include the entities that are interacting, a third party, or multiple third parties. In the case of third parties offering interledger services, such services can be provided for a fee. Moreover, in the case where multiple third parties offer interledger services, some coordination between the different parties is necessary.

4.2. Identity, authentication, authorization (IAA)

The goal of the Identity, Authentication, Authorization (IAA) component is to provide mechanisms that can be used for the identification and authentication of entities and services, and for the authorisation of consumers. To this end, it supports the following identification and authentication mechanisms: URIs (e.g., Web of Things URIs) for identification coupled with digital certificates for authentication, usernames for identification bound to secret passwords used for authentication, and decentralised identifiers (DIDs) associated with DID documents used for authentication. A popular DID implementation, also considered by our component, is Hyperledger Indy.

Consumers' authorisation is primarily implemented with the widely used OAuth 2.0 protocol. The IAA component supports plain OAuth 2.0, OAuth 2.0 tailored for constrained devices as defined by the IETF's Authentication and Authorization for Constrained Environments (ACE) working group, and OAuth 2.0 combined with DIDs. Furthermore, it supports various token types and encodings. In addition to OAuth 2.0, the IAA component supports the UMA (User-Managed Access) protocol. An example of utilising DIDs together with OAuth 2.0 is presented in [15], while general authorisation solutions for IoT utilising DLTs and smart contracts are presented in [16] and [17]. The IAA component can use smart contracts in order to link authorisation decisions with payments, as well as for logging transaction-specific information that can later be used for auditing and dispute resolution. Moreover, authorisation decisions can be linked to IoT events that are recorded on the blockchain.

4.3. Privacy and Data sovereignty

The goal of the Privacy and Data sovereignty component is to enable data sharing in a controlled and privacy-preserving way. This component considers privacy preservation as a two-dimensional problem: the first dimension concerns the privacy of the data provider, whereas the second dimension concerns the privacy of the data consumer.

Data provider privacy is related to the amount and the accuracy of information a third party (including the consumer) can deduce about the provider from all the available data. This can be achieved by reducing or obfuscating the data stored in ledgers. A mechanism to reduce the data is to store only hashes on the more accessible platform: depending on the use case, this could mean storing data in private databases and storing hashes on a permissioned ledger, or storing data in a permissioned ledger and storing hashes on a public ledger. Mechanisms to obfuscate data include differential privacy mechanisms. In particular, data obfuscation can be provided by selecting a special-purpose node that acts as a data accumulator and also adds noise to the (encrypted) collected data. An alternative can be adding noise directly at the sources; however, in order to achieve the required degree of privacy and accuracy of the results, this approach requires a large number of sources.
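The accumulator-with-noise idea can be illustrated with a short sketch. This is not project code: the epsilon value, the sensitivity bound and the reading values below are invented for illustration, and a real deployment would also handle the encryption of the collected readings and the coordination discussed next.

```python
# Illustrative sketch of a differential-privacy accumulator: the aggregation node
# sums the providers' readings and adds Laplace noise calibrated to the query's
# sensitivity before releasing the result.
import random

def laplace_noise(sensitivity: float, epsilon: float) -> float:
    """Sample Laplace(0, b) noise with scale b = sensitivity / epsilon,
    computed as the difference of two exponential variables with mean b."""
    b = sensitivity / epsilon
    return random.expovariate(1.0 / b) - random.expovariate(1.0 / b)

def noisy_sum(readings, epsilon: float = 0.5, max_reading: float = 10.0) -> float:
    """Release a noisy sum: one provider's reading changes the true sum by at
    most max_reading, which bounds the sensitivity of the query."""
    return sum(readings) + laplace_noise(sensitivity=max_reading, epsilon=epsilon)

hourly_kwh = [1.2, 0.8, 3.4, 2.1, 0.0, 5.6]   # per-provider readings (illustrative)
print(noisy_sum(hourly_kwh))   # the aggregate is released; individual readings are not
```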
The coordination among the entities, namely the data provider, data consumer, and data accumulator, can be achieved through a smart contract. Consumer privacy is


related to the amount and the accuracy of information a third party (including the provider) can deduce about the consumer during the authentication, authorisation, and payment processes. To this end, this component supports attribute-based access control, where consumers can attest some of their attributes using verifiable credentials and zero-knowledge proofs. The underlying mechanisms support the minimum disclosure of information necessary to obtain a service. Additionally, multiple identifiers can be used to further improve privacy.

Data sovereignty is achieved by supporting two access control mechanisms: access control through delegation to an authorisation server, and cryptotoken-based access control imposed by smart contracts. The first scheme enables data owners to define an authorisation server (AS), i.e., a special type of mediator that vouches for the eligibility of, and/or handles payments made by, a consumer to access an IoT resource. The second scheme leverages blockchain-backed cryptotokens and enables owners to define access control policies based on these tokens. Cryptotokens can be granted only through a blockchain transaction, and blockchain-specific functions, such as transfer, aggregation, etc., can be applied to these tokens.

5. SOFIE Benefits

This section describes how the SOFIE approach provides benefits for the real-world pilot use cases in terms of interoperability, security, and privacy.

From the interoperability and security point of view, smart contracts, immutability, decentralisation, and other properties of DLTs allow a high level of automation, a low risk of fraud, and efficient dispute resolution between participants. This enables interoperability between multiple parties in a secure and transparent manner, without requiring changes to the underlying IoT platforms. DLTs also allow maintaining non-repudiation and transparency without compromising privacy and business secrets, by keeping the critical data in private data stores while storing hashes of that data in DLTs. Only in the case of a dispute will the actual data be revealed, and the hashes stored in the DLT guarantee that the data has not been tampered with in the meantime.

DIDs are used to enhance privacy, since they allow the user to be in charge of their digital identifiers and solely be in possession of the associated private key (in contrast to some schemes that rely on centralised key management and distribution). DIDs also allow identifiers to be changed frequently, which offers protection against correlation attacks. In most use cases, it is not necessary for third parties to know the real identity of the user, or even be aware that the user is the same one who used the system previously; it is enough to determine that the user has a right to perform some action (such as to deliver a package on behalf of the company or to charge an electrical vehicle).

5.1. Tracking Food from Farm to Fork

This pilot utilises a Consortium Ledger (CL) (private Ethereum) with smart contracts to record all the relevant data and metadata related to the whole provenance chain from the farm to the supermarket. The members of the Consortium Ledger are the participants of the provenance chain (thus, if, for example, some of the produce is transported by other companies, another CL is formed) and a Legal Entity on a national or European level (an association or public authority). Most of the measurements (temperature, soil conditions, humidity, etc.) are stored in private databases of IoT


platforms, with hashes being frequently stored on the CL. Aggregated data, such as the average, maximum, and minimum temperature during the stage, is also stored on the CL, along with all handover events between the participants (e.g., when a package is delivered by the transportation company to the warehouse). Finally, the hashes of CL transactions and data are periodically stored in public DLTs through interledger operations for extra accountability and transparency. DIDs are used to protect the identities of the involved employees as, for example, the warehouse accepting a package does not need to know the real identity of the truck driver, or even whether the driver is the same as yesterday; it is enough to know that the truck driver is authorised by the transportation company to deliver that package.

5.2. Grid Balancing with Electrical Vehicles

In this pilot, the marketplace matching energy flexibility bids and offers operates on a private Ethereum blockchain, ensuring privacy (i.e., data cannot be read by external parties) and reducing transaction costs and times (i.e., mining is not required). Using SOFIE's interledger capabilities, this "first layer" will be paired with a public DLT acting as a "second layer", where the status of the private blockchain will be periodically synchronized, granting security and auditability and thus protecting the data stored in the first-layer DLT from any alterations. The business logic for the collection of requests and offers, and for the winning offer selection algorithm, is coded in smart contracts, ensuring transparency and auditability of the whole process. In the current version of the marketplace, a smart contract implements an auction mechanism in which the best offer is selected following the "lowest bidder" rule. In upcoming versions of the marketplace, an upgraded version of the smart contract will consider a different matchmaking algorithm, based on the clearing price algorithm used in commodity trades.

In addition, the smart meter readings are stored on the blockchain to ensure transparency, and the blockchain will also contain data about electric vehicles, charging stations, and charging events. Such data will be used for payments by the DSO to the fleet manager and for rewarding the users (through tokens or discounts) in an automated manner. The pilot can easily be extended to include a retailer actor in charge of accounting, providing benefits to the two main actors involved: the DSO benefits from the grid stability provided, and the fleet manager can reduce the overall charging costs to be paid to the retailer thanks to the incentives awarded by the DSO. The privacy of the electrical vehicle users can be further enhanced with DIDs to protect their privacy against the DSO or the CS Owner, who do not need to know the real identity of the user charging the vehicle [18][19].

5.3. Context-aware Mobile Gaming

The mobile gaming pilot utilises a permissioned DLT (Hyperledger Fabric) to store ownership information and the 'DNA' of in-game assets, enabling transparency and consistency of asset attributes and ownership changes. This also enables verification of an asset's rarity, since new assets cannot be created in secret. For the actual asset trading, this ledger would be interconnected with either cryptocurrencies (such as public Ethereum) or other payment methods. The pilot will also use other DLTs to store information and relevant transactions (such as authorisations for accessing IoT


resources) related to advertisement views, IoT beacons, and other IoT devices that interact with the player.

DIDs are used to protect players' privacy. For example, when the player needs to perform some task near an IoT beacon to collect the reward, the player registers with the entity running the challenge (which can be a gaming company or another party) with pseudonymous or anonymous identifiers X and Y. After the user completes the task, this event is recorded to the "IoT beacon ledger" using the player's identifier X. An interledger function written by the challenge designer monitors this ledger and, when such an event occurs, it triggers (perhaps after some random delay to prevent correlation attacks) the ownership change in the "Gaming asset ledger", granting asset ownership to the identifier Y. In this way, parties monitoring the first ledger will not be able to know what kind of reward the player has received for the completion of the task, while parties monitoring the asset ledger will not know which event triggered the ownership change of the asset.

5.4. Decentralised Energy Data Exchange

As with the previous pilots, SOFIE enables strong auditability while preserving users' privacy through the use of DLTs and DIDs. All authorisation-related messages concerning smart meter data are signed with the KSI blockchain. No smart meter data is handled by the SOFIE framework; the data is stored by the data owner or in the data hub. For the actual data exchange, a secure communication channel is created between the participants. SOFIE's semantic representation functionality is used to describe the available datasets.

6. Related work

Some existing approaches for solving the IoT interoperability problem rely on creating a new interoperability layer, which is not feasible in most cases, since it requires making changes to the existing IoT platforms. Other approaches, including BIG IoT [20], aim to allow interoperability between IoT systems through an API and a marketplace; however, the proposed marketplace is designed to be centralized, limiting its applicability and flexibility. WAVE [21] provides a decentralised authorisation solution for IoT devices using a private Ethereum blockchain and smart contracts; however, it assumes that all IoT devices are able to interact with the blockchain, which is not a feasible assumption for many constrained devices.

There are also application-specific approaches utilizing DLTs for, e.g., energy trading [22][23][24]. They often utilise tokens issued by a single party as currency, which can lead to speculation and harm the actual users of the system. While cryptocurrency was the original use case of blockchains, it is important to use separate DLTs for payments and for other use cases, such as asset tracking, logging, etc. In this way, price fluctuations of the cryptocurrency will not affect the cost of, e.g., recording asset ownership changes. Furthermore, the performance limitations and transaction cost issues associated with the public, permissionless DLTs typically used for cryptocurrencies will not limit other uses of DLTs in IoT systems, which need to be responsive and highly scalable.

Therefore, the existing work does not fully address the need for an open, secure, and decentralised solution to the IoT interoperability problem that supports existing IoT platforms and enables new open business models, while taking security and privacy into account.


7. Discussion In order to enable real, efficient, and secure IoT interoperability, the SOFIE approach relies on multiple, appropriate, distributed ledgers glued together using interledger technologies, to provide openness, decentralization, trust, security, privacy, automation, and auditability. Openness may be undesirable for (myopic) individuals or businesses, but when it comes to the whole society, openness becomes beneficial (or even critical), especially from an economic and business perspective. Openness enables inclusion, preventing powerful players from excluding new entrants (either directly or by creating entrance barriers that are hard to overcome). Two key ingredients of openness are decentralization and trust. DLTs are a successful example of a decentralized system that is robust and does not have a single (or few) point(s) of failure. Some DLTs even rely on distributed governance, thus allowing the evolution of the rules from within the system. Similarly, DIDs facilitate decentralized trust management, rendering overlay applications more secure and also more usable. Security is of paramount importance for SOFIE and the IoT. For this reason, SOFIE does not focus only on security issues at the level of each individual IoT system (which by itself is critical, since the IoT is bridging the cyber with the physical world, therefore security breaches can lead to major safety issues), but it also considers end-to-end security at the level of the whole system, including the interfacing mechanisms and components, which may be even more susceptible to attacks. SOFIE defence mechanisms consider both internal and external threats, as well as threats originating from interconnected systems, which need to be provided controlled access. A federation system that includes various actors and even expands across the borders of a single country creates challenging privacy issues, since the privacy policies that govern user data depend on many entities and possibly many jurisdictions, legal systems, and rules. Privacy preservation becomes even more challenging when public DLTs are involved, since not only do all parties have access to all information, but replication makes information access easier, and immutability facilitates correlation. Moreover, since data stored in public DLTs never disappears, future advances in de-anonymisation techniques could compromise currently anonymised data. On other hand, open systems facilitate verifiability and auditability, hence there is a critical trade-off. SOFIE tries to get the best of the two worlds by leveraging pseudonymous, self-sovereign identifiers, such as DIDs, which can be frequently changed. Automation in SOFIE is provided through smart contracts, which enable automation in a reliable, available, secure, and decentralised manner. For instance, in order to support openness and privacy in access to data and actuation, an automatic process is required to control the access, perhaps complemented by an associated payment mechanism (which can be provided by cryptocurrencies). Using SOFIE’s Federation Adapters, diverse IoT platforms can be integrated into the SOFIE framework, without any modifications to the platforms. All integrated systems can then benefit from the intriguing features of the federation approach, including increased functionality and privacy. SOFIE’s federation approach enables interesting extensions to its four real-life pilots. 
For example, in-game assets could be provided as a reward for providing energy consumption measurements and could be used for paying for electrical vehicle charging. Similarly, “ethical” producers could be rewarded with cheaper and cleaner energy. All pilots are currently being implemented and tested and the results will be presented in future publications.


8. Conclusions
This chapter described how SOFIE utilises interledger and Distributed Ledger Technologies (DLTs) to provide interoperability between IoT platforms together with strong security, auditability, and privacy. This work has shown that using DLTs and interledger approaches allows more flexible co-operation among various parties in multiple use cases, such as a food supply chain, electricity grid load balancing, context-aware mobile gaming, and secure smart meter data exchange. The SOFIE solution is tested in four real-life pilots, which also raise interesting cross-pilot interactions. In the longer term, this approach will also enable open data markets and allow the creation of new business models around IoT data.

References

[1] M. Rauchs. Distributed Ledger Technology Systems - A Conceptual Framework. University of Cambridge Report, 2018, available at: https://www.jbs.cam.ac.uk/fileadmin/user_upload/research/centres/alternativefinance/downloads/2018-10-26-conceptualising-dlt-systems.pdf (accessed 4.2.2019).
[2] D. Reed et al. Decentralized Identifiers (DIDs) v0.13 - Data Model and Syntaxes. Draft Community Group Report, July 2019, available at: https://w3c-ccg.github.io/did-spec/ (accessed 31.7.2019).
[3] N. Stifter, A. Judmayer, P. Schindler, A. Zamyatin, and E. Weippl. Agreement with Satoshi - On the Formalization of Nakamoto Consensus. Cryptology ePrint Archive, Report 2018/400, 2018.
[4] N. Fotiou and G.C. Polyzos. Smart Contracts for the Internet of Things: Opportunities and Challenges. European Conference on Networks and Communications (EuCNC), Ljubljana, Slovenia, June 2018.
[5] E. Androulaki et al. Hyperledger Fabric: A Distributed Operating System for Permissioned Blockchains. Eurosys 2018, Porto, Portugal, April 2018.
[6] V.A. Siris, P. Nikander, S. Voulgaris, N. Fotiou, D. Lagutin, and G.C. Polyzos. Interledger Approaches. IEEE Access, Vol. 7, pp. 89948-89966, July 2019.
[7] C. Allen. The Path to Self-Sovereign Identity. April 2016. Available at: http://www.lifewithalacrity.com/2016/04/the-path-to-self-soverereignidentity.html (Accessed 18.12.2018).
[8] Blockchain and Identity: Projects/companies working on blockchain and identity. Available at: https://github.com/peacekeeper/blockchainidentity (Accessed 7.11.2018).
[9] Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation).
[10] M. Sporny, D.C. Burnett, D. Longley, and G. Kellogg. Verifiable Credentials Data Model 1.0 - Expressing verifiable information on the Web. W3C Proposed Recommendation, August 2019. Available at: https://w3c.github.io/vc-data-model (Accessed 31.7.2019).
[11] T. Bragatto et al. Statistical Analysis of Prosumer Behaviour in a Real Distribution Network Over Two Years. IEEE EEEIC/I&CPS 2018.
[12] T.O. Olowu, A. Sundararajan, M. Moghaddami, and A.I. Sarwat. Future challenges and mitigation methods for high photovoltaic penetration: A survey. Energies, vol. 11, no. 7, 2018.
[13] S. Rahman, H. Aburub, M. Moghaddami, and A.I. Sarwat. Reverse Power Flow Protection in Grid Connected PV Systems. IEEE SoutheastCon, 2018.
[14] Y. Kortesniemi et al. SOFIE Deliverable D2.5 - Federation Framework, 2nd version. August 2019, available at: https://media.voog.com/0000/0042/0957/files/SOFIE_D2.5Federation_Framework%2C_2nd_version.pdf
[15] D. Lagutin, Y. Kortesniemi, N. Fotiou, and V.A. Siris. Enabling Decentralised Identifiers and Verifiable Credentials for Constrained Internet-of-Things Devices using OAuth-based Delegation. Workshop on Decentralized IoT Systems and Security (DISS 2019), in conjunction with the NDSS Symposium 2019, San Diego, USA, February 2019.
[16] N. Fotiou, V.A. Siris, S. Voulgaris, and G.C. Polyzos. Interacting with the Internet of Things using Smart Contracts and Blockchain Technologies. 11th SpaCCS, Melbourne, Australia, December 2018.


[17] N. Fotiou, V.A. Siris, G.C. Polyzos, and D. Lagutin. Bridging the Cyber and Physical Worlds using Blockchains and Smart Contracts. Workshop on Decentralized IoT Systems and Security (DISS 2019), in conjunction with the NDSS Symposium 2019, San Diego, USA, February 2019. [18] Y. Kortesniemi, D. Lagutin, T. Elo, and N. Fotiou. Improving the Privacy of Internet of Things with Decentralised Identifiers (DIDs). Journal of Computer Networks and Communications, 2019. [19] A. Antonino. A Privacy-preserving approach to grid balancing via scheduled electric vehicle charging. Master’s thesis, Aalto University, September 2019. [20] BIG IoT - Bridging the Interoperability Gap of the Internet of Things, available at: http://big-iot.eu/ (accessed 6.2.2019). [21] M.P. Andersen et al. WAVE: A Decentralized Authorization System for IoT via Blockchain Smart Contracts. University of California at Berkeley Technical Report UCB/EECS-2017-234, 2017, available at: http://www2.eecs.berkeley.edu/Pubs/TechRpts/2017/EECS-2017-234.pdf (accessed 4.2.2019). [22] LO3 Energy, available at: https://lo3energy.com/ (accessed 5.2.2019). [23] Grid Singularity, available at: http://gridsingularity.com/ (accessed 5.2.2019). [24] SolarCoin, available at: https://solarcoin.org/ (accessed 5.2.2019).


Security and Privacy in the Internet of Things: Challenges and Solutions J.L.H. Ramos and A. Skarmeta (Eds.) IOS Press, 2020 © 2020 The authors and IOS Press. All rights reserved. doi:10.3233/AISE200007

Assessing Vulnerabilities in IoT-Based Ambient Assisted Living Systems
Ioana-Domnina CRISTESCU a,1, José Ginés GIMÉNEZ MANUEL b,2 and Juan Carlos AUGUSTO b,3
a TAMIS Team, INRIA, Rennes, France
b Research Group on Development of Intelligent Environments, Department of Computer Science, Middlesex University, London, UK
Abstract. Ambient Assisted Living systems aim at providing automated support to humans with special needs. Smart Homes equipped with Internet of Things infrastructure, supporting the development of Ambient Intelligence that can look after humans, are being widely investigated worldwide. Like any IT-based system, these have strengths and also weaknesses. One dimension of these systems that developers want to strengthen is security, eliminating or at least reducing as much as possible potential threats. The motivation is clear: as these systems gather sensitive information about the health of an individual, there is potential for harm if that information is accessed and used by the wrong person. This chapter starts by providing an analysis of stakeholders in this area. It then explains the IoT infrastructure used as a testbed for the main security analysis methods and tools. Finally, it explains a process to assess the likelihood of certain vulnerabilities in the system. This process is mainly focused on the design stage of a system. It can be iteratively combined with development to inform a developing team which system architectures may be safer and worth being given development priority.
Keywords. Ambient Assisted Living, Attack Tree, IoT Model Checking

1. Introduction
Ambient Assisted Living systems [1] have been developed for several years already. However, being such a complex combination of technologies, with such a potential impact on human lives, they require extra care in design and development. One important advantage of such systems is that they provide an extra layer of care to people, especially when, for circumstantial reasons, better care is not available. So, for example, older people prefer to live in their own independent space for as long as possible; however, as they age, more special care and precautions are needed, and maybe there are no other humans who can provide appropriate support at times. Systems which can raise alerts during emergencies are then useful. Also other, subtler, assistance is equally important.
1 Ioana-Domnina Cristescu. E-mail: [email protected]
2 José Ginés Giménez. E-mail: [email protected]
3 Juan Carlos Augusto. E-mail: [email protected]


For example, users starting to experience cognitive decline gradually start living out of sync with healthy life rhythms, and some phenomena such as day-night misalignments and sundown syndrome can be observed in some cases [2,3]. Collecting fundamental lifestyle patterns is useful to predict, advise, anticipate, and, in some cases, react to emergencies, saving time and reducing the negative effects of acute ill-health situations. Often the best person to assess the lifestyle information is outside the place where the information is gathered. For example, a person may be looked after by a smart home, while those who need access to the system diagnosis on whether a change in medication led to better quality of life may be placed at a healthcare organization. Being able to securely transfer such sensitive information is an important part of the system. Current systems include sophisticated mechanisms to transport information securely from A to B, for example by using sophisticated security protocols including complex encryption mechanisms. The weakest links at this point in history are the participation of humans in the process (e.g., how do we know the person reading the information at a hospital is the one the data is intended for) and also the weak security mechanisms in various satellite technologies (e.g., Internet of Things gadgets). This chapter explores the perception of users about the security of healthcare information being collected in domestic environments and transported to another environment for processing. First we show the results of questionnaires we ran with various stakeholders. Then we explain the infrastructure which has been used to test a system prototype to aid the design of safer systems. Lastly, a modelling system is illustrated showing how the likelihood of vulnerabilities can be assessed in a given system, in this case illustrated with the infrastructure previously described.

2. Stakeholders' Perceptions
As part of our research project we have routinely gathered stakeholders' perceptions, focusing on three main groups: system developers (S), healthcare professionals (H), and technology end users (U). We gathered opinions through an online questionnaire at various events and stakeholder workshops, totalling 48 respondents (S:14, H:14, U:18, Others:2). Some questions were aimed at all respondents whilst other questions were aimed at specific stakeholder categories. There was a mix of multiple choice and open questions. The following questions were addressed to end users only; the answers of the 18 respondents in this category were distributed as follows. "How much do you know of the internal working of the security mechanisms applied to your data?": 61% chose 'not enough', 17% chose 'enough' and 22% chose 'a considerable amount'. "How much would you like to know of the internal working of the security mechanisms applied to your data?": 5% chose 'not much or nothing', 40% chose 'just enough' and 55% chose 'a considerable amount'. "How much are you prepared to let the system know about yourself if that translates into greater security for your information?": 64% chose 'very little', 36% chose 'a lot'. "What part of your private information would you be prepared to disclose if that would guarantee a better monitoring for your specific condition?": 33% chose 'Personal information: name, DOB, Phone number', 0% chose 'Current location and activity (use


of electronic equipment, e.g. phone)’ and 47% chose ‘Anonymized medical data but including age, weight, height, etc.’, and 20% chose ‘None of the above unless I can see and understand how and where it is used’. Amongst the questions aimed at all respondents we collected the following: Question 12: “In a scale of 0-10 how high do you rate the security of the information you transfer through the tools which are most important to your specific work?” Question 13: “In a scale of 0-10 how high do you rate the level of security provided by the tools which are most important to your specific work?” Question 14: “In a scale of 0-10 how high do you rate the user-centred flexibility of the security mechanisms that these tools offer which are most important to your specific work?”

(a) Answers to Question 12.

(b) Answers to Question 13.

(c) Answers to Question 14.

Figure 1. Statistics extracted from the survey. The x-axis represents the percentage of participants who chose a value of the proposed scale from 1 to 10 (y-axis).

Some take-away messages from the results above are the reluctance of users to share much personal information, their distrust of the systems and tools handling their personal data, and the unfriendliness and lack of transparency of the tools they rely on.
3. Pilot infrastructure
A pilot prototype was developed to create a real system wherein security concerns can be tested by different tools. The pilot deployment was carried out within the Smart Spaces lab at Middlesex University. Part of the Smart Spaces lab is set up as a smart room for experiments within IoT and for use of the Research GrOup On Development of Intelligent EnvironmentS. Figure 2 shows a map of the lab with the hardware elements installed inside: the server, the sensors used, the smart hub and the processing unit.


Figure 2. Lab map and hardware distribution.

3.1. System architecture
The approach to designing the pilot architecture is based on creating a smart home which manages sensitive user information. This simulates a technological healthcare indoor environment wherein the security concerns can be audited. The pilot background is based on indoor user activity recognition focused on dementia, as [4] shows. In addition, the user can provide personal health input, such as heart rate measured with a smart watch, through a mobile app [5]. Figure 3 shows the pilot architecture; the next sections describe each element in more detail.

Figure 3. Initial Pilot 1 architecture


The architecture comprises the following elements (labelled in Figure 3):
1 - The house sensing environment: A - Z-Wave sensing network; B - Vera hub.
2 - Processing module: C - MReasoner tool; D - MReasoner's database.
3 - Main server: E - MySQL database; F - Web server and RESTful PHP API.
4 - Mobile environment: G - Android mobile app.
3.1.1. Sensing environment
The sensing environment is the component which collects information from the house related to the user's actions (e.g., opening doors, switching on lights) or to the environment (e.g., temperature in a room, humidity or quantity of light) by using sensor devices. This element consists of a set of sensors distributed around the house and a smart hub which manages them. The smart hub installed is a Vera Plus model, which uses its own wireless Z-Wave network to manage the sensors installed in the lab. The Z-Wave security implementation features 128-bit encryption. Vera does not use a database: it stores device configuration and properties in JSON files and also writes the information from sensors (e.g., changes of device state) to a non-persistent log. The JSON files and the log can be queried by external elements, such as the processor module (reasoner in Figure 3), through port 88 using the HTTP protocol (a polling sketch illustrating this is given after Section 3.1.2). The installed devices range from motion sensors, which detect movement in a place; switches, which report whether the light is on or off; energy sensors, which give information about the appliance plugged in; pressure sensors placed on the bed or chairs, which detect whether someone is sitting on them; and reed sensors installed on doors, windows, cupboards, the wardrobe and the fridge door, reporting whether a door is open or closed. Figure 2 shows a precise picture of the sensors' location in the lab.
3.1.2. Processor module
The Processing Module (PM) requests the sensing information collected by Vera. The PM consists of a temporal reasoning tool (MReasoner [6]) which infers sensor states and extracts logical conclusions about the user's context. To illustrate what the user's context represents, an example used in this project is Activities of Daily Living (ADL) recognition. These activities, such as eating, sleeping, bathing, cooking or dressing, are carried out in the house by the user. The PM can determine the activity being performed with some degree of verisimilitude. This activity recognition task is valuable in health environments, such as the houses of people with cognitive decline or dementia, wherein the primary user (PU), the person living with dementia, is monitored. The gathered information about ADLs relates to when the activities occur and how long the user spends doing them. This information evidences behavioural patterns and deviations which can indicate an impairment in the user's cognitive capacities. Besides professionals using this information for the user's evaluation, these systems provide efficient real-time monitoring which supports secondary users (SU), such as caregivers, in their supervision tasks over patients [4], allowing them to take action in critical situations such as falls.
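The following is a minimal sketch, in Python, of how the processing module could periodically poll the hub's HTTP interface for device states, as described in Section 3.1.1. The host address, query path and JSON layout are placeholders based on the description above (port 88 on the local hub); the real Vera endpoint may differ, so this illustrates the polling pattern rather than the pilot's actual code.

```python
import json
import time
import urllib.request

HUB_URL = "http://192.168.1.10:88/data"   # placeholder: hub address and query path

def read_device_states(url: str = HUB_URL) -> dict:
    """Fetch the hub's JSON status and return {device_name: state}."""
    with urllib.request.urlopen(url, timeout=5) as response:
        status = json.load(response)
    # Assumed JSON layout: a list of devices, each with a name and a state field.
    return {d["name"]: d.get("state") for d in status.get("devices", [])}

def poll(interval_s: float = 10.0) -> None:
    """Poll the hub and report state changes (e.g. a door opening)."""
    previous: dict = {}
    while True:
        current = read_device_states()
        for name, state in current.items():
            if previous.get(name) != state:
                print(f"{name}: {previous.get(name)} -> {state}")
        previous = current
        time.sleep(interval_s)

if __name__ == "__main__":
    poll()
```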


3.1.3. Main server
The lab server (main server) plays the role of a normal server hosted in the cloud. It is in charge of storing the user's health information so that it is accessible by doctors or other persons involved, as well as by the user, from any place and device. The basic server configuration is similar to that of standard cloud servers. Thus, all connections with the server from external devices go through the HTTPS protocol, and connections using other protocols are refused. The web server manages a CRUD RESTful API developed in PHP. It provides the layer to retrieve information from the MySQL database. The API implements registration and login procedures using SHA1 password hashing; therefore, a user needs to authenticate in the system to reach sensitive data. The API also manages sessions, cookies and other security-related mechanisms, such as blocking a user account, if it exists, after three failed login attempts, which mitigates brute-force attacks (a sketch of this logic is given after Figure 4).
3.1.4. Mobile APP
This component represents a direct input from users. They can connect to the cloud server to send and receive data. The actions available are registration, login, and requesting and sending information. The mobile interface displays different Graphical User Interfaces (GUIs) according to the user's role, varying the available actions. Figure 4 shows the main menu of both roles after the login process.

(a) Primary user main menu.

(b) Secondary user main menu.

Figure 4. Mobile APP interface.
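As an illustration of the registration/login hardening described in Section 3.1.3, the following is a minimal Python sketch of a login check with per-account lockout after three failed attempts. The pilot's actual API is written in PHP and uses SHA1; here the logic is reproduced with Python's standard hashlib, and the in-memory user store and function names are assumptions made for illustration only.

```python
import hashlib
import os

MAX_ATTEMPTS = 3

# In-memory stand-in for the user table described in Section 3.1.3.
users = {}   # email -> {"salt": bytes, "hash": bytes, "failed": int, "blocked": bool}

def _hash(password: str, salt: bytes) -> bytes:
    # The pilot's API uses SHA1; a dedicated password KDF (bcrypt, scrypt, ...)
    # would be preferable in practice.
    return hashlib.sha1(salt + password.encode()).digest()

def register(email: str, password: str) -> None:
    salt = os.urandom(16)
    users[email] = {"salt": salt, "hash": _hash(password, salt),
                    "failed": 0, "blocked": False}

def login(email: str, password: str) -> bool:
    record = users.get(email)
    if record is None or record["blocked"]:
        return False
    if _hash(password, record["salt"]) == record["hash"]:
        record["failed"] = 0
        return True
    record["failed"] += 1
    if record["failed"] >= MAX_ATTEMPTS:
        record["blocked"] = True          # account blocked after three failures
    return False
```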


4. A case study
Common processes in client-server applications are registration and log-in. These mechanisms play a crucial role in security, since they grant access to users' information. The absence or deficiency of these procedures can lead to undesirable situations: in this sort of environment, an unauthorized person could access a user's personal information and misuse it, without the owner's approval and with unknown aims. We think it is important not just to analyse this process from the software point of view, but also to show users the risk associated with their behaviours and personal actions in the system.
The current pilot provides a registration system in the server allowing users to create an account based on their unique email and a password chosen by them. A registered user is associated with a role, which can be primary user (PU) or secondary user (SU). Depending on the assigned role, a user will be able to access different information after logging in. In this pilot the PU represents the person who is sending personal data to the server, either using the mobile app or through the house (sensing environment). Thus, a logged-in PU can send data directly to the server and save them, but can also delete the account, delete data, and give or withdraw grants to an SU for accessing PU data. An SU represents doctors, caregivers or relatives interested in accessing the PU's information. Initially, an SU does not have access to any data. Thus, the first action available for a logged-in SU is to request authorization for access to the PU's information. Once the PU permits SU access, the SU can visualize the gathered user health information for as long as the PU does not revoke the access. The next section describes the transformation process from a probabilistic pilot IoT model to an SBIP system which evaluates the probability of an attack within the proposed context.
5. Modelling and evaluating a Pilot attack
Figure 5 is a graphical representation of the pilot formal model described in the following sections. From this model we develop the security analysis based on the framework extensively described in [7]. The notation used is based on the BIP language [8,9] to model the IoT paradigm in a simple way.
5.1. Normal System, without an Attacker
The Primary User (PU) has two threads: the first one, called sensorData, is the user's connection with the sensing environment or Equipment (Eq); thus sensorData is the set of all atomic relations between PU and Eq. The second thread, called giveAuth, is used when a Secondary User (SU) asks permission to access the PU's data on the cloud.


Figure 5. The SmartHome model. The green lines represent normal communications, the blue ones communications with the Attacker and the red dotted lines are leaks occurring in the system.

In the equations below, X --p(m)--> Y denotes X sending message m to Y over protocol p, X <--p-- Y the corresponding reception, and τ an internal action.

enterRoom = PU --sensors(on)--> Eq
sensing = PU --sensors(data)--> Eq
exitRoom = PU --sensors(off)--> Eq
sensorData = enterRoom . sensing . exitRoom . sensorData        (1)

giveAuth = PU <--speaking-- SU . (PU --https(authPU)--> SU . giveAuth + τ . giveAuth)
PrimaryUser = sensorData + giveAuth        (2)

A Secondary User has to first get an authorization from the Primary User to access its data. It can then access the data stored on the cloud. Note that the protocol used for accessing the Primary User's data requires an authorisation. Afterwards, either the user logs out, or it is logged out by the system after a timeout.

getAuthorisation = SU --speaking(getAuth)--> PU
timeout = τ        (3)

queryCloud = SU --mobile-https(credentials)--> Cloud . SU <--mobile-https-- Cloud .
    SU --mobile-https(info request)--> Cloud . SU <--mobile-https-- Cloud .
    (SU --mobile-https(logout)--> Cloud . queryCloud + timeout . queryCloud)
SecondaryUser = getAuthorisation . queryCloud        (4)

The Equipment, consisting of the sensors in the house, forwards all captured data to the data storage unit.

Equipment = Eq <--sensors(data)-- PU . Eq --ssh--> DS . Equipment        (5)

The Data Storage works as a server in the house. It receives and stores data, modeled by the action receiveRawData, and does some analysis on them to compile behaviour logs, which are then sent to the cloud, using the action sendBehaviourLog.

receiveRawData = DS <--ssh-- Eq
sendBehaviourLog = DS --https(BehaviourLog)--> Cloud
DataStorage = receiveRawData . DataStorage + sendBehaviourLog . DataStorage        (6)

The Cloud receives behaviour logs from the data storage, shown in receiveBL, and provides an API for querying the stored data, modeled by queryAPI.

receiveBL = Cloud <--https-- DS        (7)

queryAPI = Cloud <--mobile-https-- PU . Cloud --mobile-https(cookies)--> PU .
    Cloud <--mobile-https-- PU . Cloud --mobile-https(info)--> PU .
    (Cloud <--mobile-https-- PU . queryAPI + timeout . queryAPI)        (8)

Cloud = receiveBL . Cloud + queryAPI        (9)


5.2. System with an Attacker
The Attacker has several lines of attack, modeled by the thread AChoice and shown in Figure 6. The attacker also collects leaks from the Primary and Secondary Users and from the Cloud; leaks are written below as dashed arrows, X ~~(m)~~> Y, matching the red dotted lines of Figure 5.

Figure 6. Attack Tree

attackCloud = A --https(getEmail)--> Cloud . AChoice
attackSecondaryUser = A --speaking(getEmail)--> SU . AChoice
phishing = A --mail(getCredential)--> SU . AChoice
attackPrimaryUser = A --speaking(getAuth)--> PU . AChoice
getSensitiveData = A --mobile-https(login)--> Cloud . A <--mobile-https-- Cloud .
    A --mobile-https(get sensitive info)--> Cloud . A <--mobile-https-- Cloud . AChoice
AChoice = [a1] attackCloud + [a2] attackSecondaryUser + [a3] phishing + [a4] attackPrimaryUser + [a5] getSensitiveData
collectPrimaryUser = A <~~ PU . collectPrimaryUser
collectSecondaryUser = A <~~ SU . collectSecondaryUser
collectCloud = A <~~ Cloud . collectCloud
Attacker = AChoice | collectPrimaryUser | collectSecondaryUser | collectCloud

The Primary User in Eq. 2 has a new thread, leakAuth, where it gives its authorisation to an attacker through a social attack.

leakAuth = PU <--speaking-- A . ([p1] PU ~~(authPU)~~> A . giveAuth + [p2] τ . giveAuth)
PrimaryUser = sensorData (Eq. 1) + leakAuth

The Secondary User in Eq. 4 has two unsafe communications with the Attacker. It can choose whether to give away its email address, in the thread leakEmail, and whether to give away its credentials, in the phishing attack modeled by leakCredential. For simplicity, we do not model the normal behaviour of the Secondary User when considering the attacks.

leakEmail = SU <--speaking-- A . ([s1] SU ~~(emailSU)~~> A . leakEmail + [s2] τ . leakEmail)
leakCredential = SU <--mail-- A . ([s3] SU ~~(credentialSU)~~> A . leakCredential + [s4] τ . leakCredential)
SecondaryUser = leakEmail + leakCredential

The Equipment in Eq. 5 and the Data Storage in Eq. 6 have the same behaviours in the system under attack. The Cloud in Eq. 9 has the same behaviour as before, except that it can also leak (or not) the email of the Secondary User. Note that the thread queryAPIAttacker is similar to the one in Eq. 8, but an extra protection step is added, to make the attack more difficult to succeed.


queryAPIAttacker = Cloud <--mobile-https-- A . Cloud --mobile-https(cookies)--> A .
    Cloud <--mobile-https-- A . extraProtection
extraProtection = [c3] Cloud --mobile-https(sensitive info)--> A . Cloud + [c4] τ . Cloud
leakEmail = Cloud <--https-- A . ([c1] Cloud ~~(emailSU)~~> A . leakEmail + [c2] τ . leakEmail)
Cloud = receiveBL (Eq. 7) + queryAPIAttacker + leakEmail

5.3. Experiments
In this section we use executions of an IoT system to evaluate the probability of an attack. In the process we transform the IoT model and the attack tree into two SBIP [9] files, which are compiled into a BIP executable. The BIP simulation engine (see http://www-verimag.imag.fr/BIP-Tools-93.html) runs these executables and interacts with the Plasma library [10], the statistical model checker we used. We use the corresponding SBIP system and employ two statistical model checking (SMC) techniques [9]. First we use the Monte Carlo method, which consists of sampling executions and then estimating the probability of an attack based on the number of executions for which the attack was successful. However, the Monte Carlo method requires a large number of simulations for a correct estimate of an event which occurs with probability 10⁻⁵. The experimental framework we used does not scale well for a large number of simulations. We therefore employed importance splitting [11]. This technique is tailor-made for rare events, that is, precisely those events that occur rarely in a simulation and for which Monte Carlo does not scale.

Nb of simulations   Monte Carlo result (×10⁻⁴)   Time (s)   Importance splitting result (×10⁻⁴)   Time (s)
100                 0                            3.49       1.20                                  4.69
1000                0                            5.12       1.43                                  8.17
10000               2.3                          20.30      1.50                                  48.62
100000              1.2                          399.82     out of memory                         out of memory

Figure 7. Experiments.

Figure 7 shows the results from running statistical model checking to estimate the probability of a successful attack. In the model we use for the experiments, a leak is five times less probable than an internal action. From Figure 7 we can infer the probability to be around 10⁻⁴. We can see that Monte Carlo requires a larger number of simulations to estimate the probability of an attack, whereas importance splitting can estimate the probability using fewer simulations, and in less time.
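To illustrate why plain Monte Carlo struggles with rare events such as a successful attack of probability around 10⁻⁴, the following is a small, self-contained Python sketch (not part of the pilot's BIP/Plasma tool-chain): it estimates a rare-event probability by sampling and shows how the estimate behaves for different sample sizes.

```python
import random

def simulate_attack(p_success: float) -> bool:
    """Stand-in for one execution of the system model: the attack
    succeeds with (unknown, to-be-estimated) probability p_success."""
    return random.random() < p_success

def monte_carlo_estimate(p_success: float, n_runs: int) -> float:
    hits = sum(simulate_attack(p_success) for _ in range(n_runs))
    return hits / n_runs

if __name__ == "__main__":
    random.seed(0)
    true_p = 1.5e-4   # same order of magnitude as in Figure 7
    for n in (100, 1_000, 10_000, 100_000, 1_000_000):
        est = monte_carlo_estimate(true_p, n)
        print(f"n = {n:>9}: estimate = {est:.2e}")
    # With n = 100 or 1000 the estimate is almost always 0, mirroring the
    # first two Monte Carlo rows of Figure 7; only much larger n gives a
    # stable estimate, which is what motivates importance splitting.
```

Importance splitting instead decomposes the rare event into a sequence of less rare intermediate levels, which is why it already produces non-zero estimates at 100 simulations in Figure 7.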


5.4. Other technical details
The protocols used are:
• mobile-https verifies users' authorisation; used in communication with the Cloud.
• speaking assumes physical proximity.
• sensors assumes physical proximity.
• https verifies that the right URL is used to access the Cloud.
• ssh verifies that the data storage knows the equipment it receives data from.
• mail verifies that the attacker knows the email address of the user it tries to attack.

6. Conclusions
This chapter presents a practical example of using a framework to model, understand and analyse the security risks in a real IoT solution. Models aimed at the analysis of real attacks are useful as an immediate tool for identifying the attacks on a system as part of its security audit. However, we also consider these models a good way of making security and privacy risks transparent to users, covering stakeholders' concerns as expressed in the survey. Attack trees provide a bigger and clearer picture of situations that can jeopardize users' personal information. This understandable information allows developing strategies that address stakeholders' concerns by improving their knowledge about the system and the security measures taken. Also, the final calculated probability of a successful attack on a part of the system where the human component is included offers a quantitative measure which is understandable by the general public (high risk/low risk). We are aware that the proposed pilot is not the most secure solution and that it probably has security weaknesses still to be addressed. This is because its design has been constrained, just like in a real scenario, by the available resources, such as limited funds and time. Nevertheless, using the proposed security analysis in early design stages can be beneficial to reach an effective solution, as in the case study explained here. Although the chapter covers only one attack scenario, the results show that there is a low attack probability in the proposed process. Thus, designers and developers can focus on analysing other procedures where risks could be higher. Hence, this outcome allows them to allocate resources to other system modules where sensitive information is more exposed to potential harm. Thereby, the methods proposed in this chapter provide an understandable representation of the system risks that is useful for users and a quantitative analysis that is valuable for developers.

References
[1] J. C. Augusto, M. Huch, A. Kameas, J. Maitland, P. J. McCullagh, J. Roberts, A. Sixsmith, and R. Wichert, eds., Handbook of Ambient Assisted Living - Technology for Healthcare, Rehabilitation and Well-being, vol. 11 of Ambient Intelligence and Smart Environments. IOS Press, 2012.
[2] P. McCullagh, W. Carswell, J. Augusto, S. Martin, M. Mulvenna, H. Zheng, H. Wang, J. Wallace, K. McSorley, B. Taylor, and W. Jeffers, "State of the art on night-time care of people with dementia," in Proc. of the Conf. on Assisted Living 2009. IET, London.
[3] N. Wolkove, O. Elkholy, M. Baltzam, and M. Palayew, "Sleep and aging: Sleep disorders commonly found in older people," vol. 176, no. 9, pp. 1299-1304, 2007.
[4] I. Lazarou, A. Karakostas, T. G. Stavropoulos, T. Theodorosa, G. Meditskos, I. Kompatsiaris, and M. Tsolaki, "A novel and intelligent home monitoring system for care support of elders with cognitive impairment," vol. 54, no. 4, pp. 1561-1591, 2016.
[5] B. Reeder and A. David, "Health at hand: A systematic review of smart watch uses for health and wellness," Journal of Biomedical Informatics, vol. 63, pp. 269-276, 2016.
[6] U. A. Ibarra, J. C. Augusto, and A. A. Goenaga, "Temporal reasoning for intuitive specification of context-awareness," in 2014 International Conference on Intelligent Environments, pp. 234-241, June 2014.
[7] D. Beaulaton, N. B. Said, I. Cristescu, and S. Sadou, "Security analysis of IoT systems using attack trees," in Graphical Models for Security (M. Albanese, R. Horne, and C. P. Howar, eds.), Springer International Publishing, 2019.
[8] D. Beaulaton, N. B. Said, I. Cristescu, R. Fleurquin, A. Legay, J. Quilbeuf, and S. Sadou, "A language for analyzing security of IoT systems," in 2018 13th Annual Conference on System of Systems Engineering (SoSE), pp. 37-44, IEEE, 2018.
[9] S. Bensalem, M. Bozga, B. Delahaye, C. Jegourel, A. Legay, and A. Nouri, "Statistical model checking QoS properties of systems with SBIP," in International Symposium On Leveraging Applications of Formal Methods, Verification and Validation, pp. 327-341, Springer, 2012.
[10] B. Boyer, K. Corre, A. Legay, and S. Sedwards, "Plasma-lab: A flexible, distributable statistical model checking library," in International Conference on Quantitative Evaluation of Systems, pp. 160-164, Springer, 2013.
[11] C. Jegourel, A. Legay, and S. Sedwards, "Importance splitting for statistical model checking rare properties," in International Conference on Computer Aided Verification, pp. 576-591, Springer, 2013.


Security and Privacy in the Internet of Things: Challenges and Solutions J.L.H. Ramos and A. Skarmeta (Eds.) IOS Press, 2020 © 2020 The authors and IOS Press. All rights reserved. doi:10.3233/AISE200008

Construction of Efficient Codes for High-Order Direct Sum Masking
Claude CARLET a,e,2, Sylvain GUILLEY b,c, Cem GÜNERİ d,1, Sihem MESNAGER e,c and Ferruh ÖZBUDAK f,1
a University of Bergen, PB 7803, 5020 Bergen, Norway
b Secure-IC S.A.S., 35510 Cesson-Sévigné, France
c Telecom ParisTech, Institut Polytechnique de Paris, 75013 Paris, France
d Faculty of Engineering and Natural Sciences, Sabancı University, Istanbul, Turkey
e Department of Mathematics, University of Paris VIII, 93526 Saint-Denis, France and University of Paris XIII, CNRS, LAGA UMR 7539, Sorbonne Paris Cité, 93430 Villetaneuse, France
f Department of Mathematics and The Institute of Applied Mathematics, Middle East Technical University, Ankara, Turkey
Abstract. Linear complementary dual (LCD) codes and linear complementary pairs (LCP) of codes have been proposed as counter-measures against side-channel attacks (SCA) and fault injection attacks (FIA) in the context of direct sum masking (DSM). Although LCD codes were introduced by Massey long ago for other reasons, there has been a renewed interest in the coding theory community for these kinds of codes due to these new applications. It has later been observed that the counter-measure against FIA may possibly lead to a vulnerability for SCA when the whole algorithm needs to be masked (in environments like smart cards). This led to a variant of the LCD and LCP problems, where some partial results have been very recently obtained by the authors. This chapter reviews the coding theoretic problems and solutions related to the security problems mentioned.
Keywords. Linear complementary dual codes, linear complementary pair of codes, direct sum masking, side channel attacks, fault injection attacks, security of Internet of Things (IoT).

1. Introduction
Due to the proliferation of embedded devices such as mobile phones, smart cards, smartwatches and small hardware units, there is an increasing demand for the secure transmission of confidential information in situations where cyber and physical attacks on the im-
1 Güneri and Özbudak are supported by the TÜBİTAK project 215E200, which is associated with the SECODE project in the scope of the CHIST-ERA Program. Carlet, Guilley and Mesnager are also supported by the ANR CHIST-ERA project (https://secode.enst.fr/) SECODE (Secure Codes to thwart Cyber-physical Attacks).
2 Corresponding Author: Claude Carlet. University of Bergen, PB 7803, 5020 Bergen, Norway and Department of Mathematics, University of Paris VIII, 93526 Saint-Denis, France, and University of Paris XIII, CNRS, LAGA UMR 7539, Sorbonne Paris Cité, 93430 Villetaneuse, France. E-mail: [email protected]


plementation of cryptographic algorithms are possible. This raises a challenge for these so-called Internet-of-Thing (IoT) devices, as they shall both be secure and efficient (in terms of power and size). Despite the fact that current standard cryptographic algorithms are proved to withstand so-called logical attacks (i.e. classical cryptanalyses), their hardware and software implementations have exhibited vulnerabilities to side-channel attacks (SCA) and fault injection attacks (FIA). Countermeasures to these two types of attacks exist but they are costly and hardly implementable on such small devices. Recently, direct sum masking (DSM) has been proposed in [4] as a countermeasure to protect the sensitive data stored in registers against both SCA and FIA, in a way which considerably reduces the cost (a whole Chapter of the present book is devoted to this countermeasure). This method uses linear codes, which operate in general on any alphabet of q symbols (where q is a prime number), and in particular on bits (case q = 2). A pair of linear codes (C, D) in Fnq is said to be a linear complementary pair (LCP [38]) of codes if C ⊕ D = Fnq . In other words, the codes intersect trivially and they have complementary dimensions. For an LCP of codes used in direct sum masking (DSM), the security parameter is defined as the pair (d(C), d(D⊥ )). The system can detect up to d(C) − 1 injected faults and can resist a side-channel attack up to order d(D⊥ ) − 1. The security parameter of an LCP (C, D) of codes has also been defined as min{d(C), d(D⊥ )} in some articles, to assure the minimal level of guaranteed security against both attacks. Obviously, one can also revise the security parameter as the pair (d(C) − 1, d(D⊥ ) − 1) to directly refer to the FIA and SCA counter-measures that it yields. If D = C⊥ (i.e. C ⊕ C⊥ = Fnq ), then we call C a linear complementary dual (LCD) code. Note that in this case, there is only one security parameter, which is d(C). (Euclidean) LCD codes were introduced by Massey in [31] long before the recent cryptographic interest introduced by Carlet and Guilley [5]. Massey characterized LCD codes in terms of their generator matrices and also showed that the family is asymptotically good. It was later shown by Sendrier in [35] that LCD codes meet the Gilbert-Varshamov bound. Yang and Massey characterized LCD cyclic codes in terms of the generator polynomial of the code ([37]). Until the recent boost of interest, these were the main results on LCD codes. By the cryptographic motivation explained above, some of the main coding problems on LCD and LCP of codes have been: • Characterization, classification and generalization, • Study of LCD codes and LCP of codes in special code families, • Special construction methods of LCD codes and LCP of codes, especially over small fields. In addition to these general problems, it has recently been observed that the countermeasure against FIA in the DSM scheme could lead to a weakness for security against SCA in some specific environments. This led to a variant of the LCD and LCP problem on the codes side, where one has to lengthen the two codes used in DSM but at the same time preserve the parameters of the original pair as much as possible. In this chapter, we present recent findings on each problem described above. We start with the fundamental results due to Massey, Sendrier and Carlet-Guilley in Section 3. 
This is followed by an analysis of LCD and LCP subclasses in cyclic codes and some of their generalizations, such as nD cyclic (Abelian) codes and quasi-cyclic codes in Section 4.3. Section 4.4 presents a construction for obtaining LCD/LCP examples via a


descend from extension fields to small fields. Concatenation is the main idea used in this technique. Finally, Section 5.1 presents the most recent work where the coding problem is slightly modified to address the potential weakness of codes for SCA. The reader is assumed to have familiarity with the basic notions of coding theory. Moreover, Fq will always denote the finite field with q elements, where q is a prime power. If a specific finite field is in consideration, the number of elements will be clear from the subscript.

2. Preliminaries
2.1. Dual of a linear code
Given a linear code C of length n over F_q (resp. F_{q^2}), its Euclidean dual code (resp. Hermitian dual code) is denoted by C^⊥ (resp. C^⊥H). The codes C^⊥ and C^⊥H are defined as

C^⊥ = { (b_0, b_1, ..., b_{n-1}) ∈ F_q^n : Σ_{i=0}^{n-1} b_i c_i = 0, for all (c_0, c_1, ..., c_{n-1}) ∈ C },
C^⊥H = { (b_0, b_1, ..., b_{n-1}) ∈ F_{q^2}^n : Σ_{i=0}^{n-1} b_i c_i^q = 0, for all (c_0, c_1, ..., c_{n-1}) ∈ C },

respectively. A code C is said to be self-dual if C = C^⊥. Self-dual codes are among the most well-studied classes in coding theory; they have beautiful relations to other mathematical theories, such as invariant theory and lattices, as well as applications to, for instance, quantum codes (see [33]).
2.2. MDS codes
Let C be a linear code of length n and dimension k over a finite field F_q, where q = p^m and p is a prime. The code C is called an [n, k] linear code over F_q. The minimum distance d of a linear code C is bounded by the Singleton bound d ≤ n + 1 - k. If d = n + 1 - k, then the code C is called maximum distance separable (MDS). For given length n and dimension k, the MDS codes are those with the greatest error-correcting capability. A class of these MDS codes is that of the so-called Reed-Solomon codes, which is of great importance in modern industrial applications.
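As a small worked example of these definitions (added here for illustration, not part of the original text), consider the binary even-weight code of length 3 and its Euclidean dual:

```latex
% Worked example: the binary even-weight code of length 3 and its dual.
\[
C = \{000,\,110,\,101,\,011\} \subseteq \mathbb{F}_2^3 ,\qquad
C^{\perp} = \{ b \in \mathbb{F}_2^3 : b\cdot c = 0 \ \forall c \in C \} = \{000,\,111\}.
\]
\[
C \text{ is a } [3,2,2] \text{ code and } C^{\perp} \text{ a } [3,1,3] \text{ code; both attain the Singleton bound }
d \le n+1-k, \text{ so both are MDS.}
\]
```

Note also that C ∩ C^⊥ = {000}, so C is an LCD code in the sense recalled in the Introduction.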


2.3. Cyclic codes
We start by briefly recalling cyclic codes. A linear code C over F_q of length n is said to be cyclic if it is closed under cyclic shift of coordinates:

(c_0, c_1, ..., c_{n-1}) ∈ C  ⟹  (c_{n-1}, c_0, ..., c_{n-2}) ∈ C.

One can identify the space F_q^n with the quotient ring R := F_q[x]/⟨x^n - 1⟩ as follows:

φ : F_q^n → R,  (a_0, a_1, ..., a_{n-1}) ↦ Σ_{j=0}^{n-1} a_j x^j.        (1)

The map φ is an isomorphism of F_q vector spaces and it takes a cyclic code in F_q^n to an ideal in R. Hence, the "algebraic realization" of a cyclic code of length n over F_q is an ideal in R, which has a unique monic generator polynomial g(x) dividing x^n - 1. The dimension of C is n - deg g. It is easy to observe that the dual of a cyclic code is also cyclic. If deg g = k, then for h(x) = (x^n - 1)/g(x), the dual cyclic code C^⊥ has the generator polynomial h*(x) = h(0)^{-1} x^{n-k} h(x^{-1}). The polynomial h*(x) is called the reciprocal polynomial of h(x).
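As a quick worked example of these notions (added for illustration, not part of the original text), take the binary cyclic code of length 7 generated by g(x) = x^3 + x + 1:

```latex
% Worked example over F_2 with n = 7 and g(x) = x^3 + x + 1 (a [7,4] cyclic code).
\[
h(x) = \frac{x^{7}-1}{g(x)} = x^{4}+x^{2}+x+1 ,\qquad
h^{*}(x) = h(0)^{-1}\,x^{4}\,h(x^{-1}) = x^{4}+x^{3}+x^{2}+1 ,
\]
\[
\text{so } C^{\perp} = \langle h^{*}(x)\rangle \text{ is the } [7,3] \text{ dual cyclic code.}
\]
```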

3. Fundamental Results
LCD codes are the opposite extreme to self-dual codes: the code and its dual intersect only trivially. Hence, from a mathematical point of view, it is very natural to study LCD codes. They were first introduced by Massey, the motivation being their application to the so-called two-user binary adder channel. Massey showed that the unique decodability problem for this scenario is overcome if one uses an LCD code. Moreover, he showed that the nearest-codeword decoding problem for an LCD code reduces to a simpler problem (see [31, pages 340-342]). In the same paper, Massey proved the following.
Theorem 1. ([31, Propositions 1 and 3])
i. Let G be a generator matrix for an [n, k] linear code C over F_q. Then C is LCD if and only if the k × k matrix GG^T is nonsingular.
ii. The family of LCD codes is asymptotically good.
Let us note that an analogous characterization for an LCP of codes, in terms of the generator matrices of the two codes, is provided in [38, Proposition 1]. Later, Sendrier showed that LCD codes meet the asymptotic Gilbert-Varshamov bound ([35, Corollary 8]). Moreover, Sendrier also provided a formula for the proportion of [n, k] LCD codes among all [n, k] linear codes over F_q. This number is roughly 1 - 1/q when q grows to infinity ([35, p. 346]). So, one expects to come across LCD codes more often when the alphabet size is large. The following result of Carlet et al. sheds further light on the alphabet size for the frequency of LCD codes.


Theorem 2. ([13, Corollary 14]) For q > 3, an [n, k, d] LCD code over Fq exists if an [n, k, d] linear code over Fq exists. In order to prove this, Carlet et al. start with an [n, k, d] linear code over Fq and describe a monomially equivalent LCD code with the same parameters, when q > 3. Note that Theorem 2 also implies the asymptotic conclusions of Massey and Sendrier immediately. Let us remark that this result does not hold for binary and ternary codes (binary ones being the most interesting for applications). We refer to [1,2], where the best minimum distances for some moderate length binary and ternary LCD codes are determined. There are examples where the best minimum distance for the given length and dimension is less than the optimal linear code’s minimum distance for the given length and dimension. A conceptual explanation to why the result of Carlet et al. does not hold for q = 2 and 3 can be found in [19], where it is proved that monomial equivalence preserves the dimension of C ∩ C⊥ (the so-called hull dimension). Hence, one cannot expect to start with a non-LCD linear code and reduce the dimension of C ∩ C⊥ via monomial equivalence to reach an LCD code in some cases. Theorem 2 makes the determination of optimal binary and ternary LCD codes particularly interesting. In [5], Carlet and Guilley revisited the best know constructions of linear codes and adapted them to build LCD codes. They also showed how an LCD code over a finite field of characteristic 2 can be transformed into an LCD binary code. A construction which yields LCD and LCP of codes over small base fields will be presented in Subsection 4.4. Recently, interesting results have been found in [16] for binary and ternary LCD codes. More specifically, the authors have constructed binary and ternary LCD code as subfield subcodes of affine variety codes and deduced some new and good LCD codes.
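Massey's criterion (Theorem 1.i) is easy to check computationally. The following is a small Python sketch (an illustration added here, not from the original chapter) that tests whether a binary generator matrix defines an LCD code by checking that GG^T is nonsingular over F_2.

```python
def gf2_nonsingular(matrix: list[list[int]]) -> bool:
    """Gaussian elimination over GF(2); True iff the square matrix is invertible."""
    m = [row[:] for row in matrix]
    k = len(m)
    for col in range(k):
        pivot = next((r for r in range(col, k) if m[r][col] & 1), None)
        if pivot is None:
            return False
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col and (m[r][col] & 1):
                m[r] = [(a ^ b) & 1 for a, b in zip(m[r], m[col])]
    return True

def is_lcd_binary(G: list[list[int]]) -> bool:
    """Massey's criterion: a binary code with generator matrix G is LCD
    iff the k x k matrix G * G^T is nonsingular over GF(2)."""
    k, n = len(G), len(G[0])
    GGt = [[sum(G[i][t] * G[j][t] for t in range(n)) % 2 for j in range(k)]
           for i in range(k)]
    return gf2_nonsingular(GGt)

# The [3,2] binary even-weight code is LCD:
print(is_lcd_binary([[1, 1, 0], [0, 1, 1]]))   # True
# The [2,1] binary repetition code is self-orthogonal, hence not LCD:
print(is_lcd_binary([[1, 1]]))                 # False
```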

4. Main Results on LCD codes and LCP of Codes
The literature on LCD codes has grown considerably since 2015. In this section we present the main results on those codes by highlighting firstly their (further) characterization and parametrization and secondly the especially important cases of LCD MDS codes and cyclic LCD codes. The obtained results increase our knowledge of LCD codes. We also emphasize LCP of codes, for which the literature is thin. We shall consider LCP of codes in special code families.
4.1. New characterization and parametrization of LCD Codes
In [10], Carlet et al. have presented a new characterization of binary LCD codes in terms of their symplectic basis. A vector x = (x_1, x_2, ..., x_n) in F_2^n is even-like if Σ_{i=1}^{n} x_i = 0, and is odd-like otherwise. A code is said to be even-like if it has only even-like codewords, and is said to be odd-like if it is not even-like. For any v_1, ..., v_k ∈ F_2^n, Span{v_1, ..., v_k} denotes the linear subspace of F_2^n spanned by v_1, ..., v_k. We have the following main result.


Theorem 3. ([10, Theorem 3.1]) Let C be an odd-like binary code with parameters [n, k]. Then C is LCD if and only if there exists a basis c_1, c_2, ..., c_k of C such that for any i, j ∈ {1, 2, ..., k}, c_i · c_j equals 1 if i = j and equals 0 if i ≠ j.
Theorem 3 shows that a binary odd-like code C is LCD if and only if C has an orthonormal basis. Furthermore, we have the second main result stated below.
Theorem 4. ([10, Theorem 3.2]) Let C be an even-like binary code with parameters [n, k]. Then C is LCD if and only if k is even and there exists a basis c_1, c_1', c_2, c_2', ..., c_{k/2}, c_{k/2}' of C such that for any i, j ∈ {1, 2, ..., k/2}, the following conditions hold: (i) c_i · c_i = c_i' · c_i' = 0; (ii) c_i · c_j' = 0 if i ≠ j; (iii) c_i · c_i' = 1.

A basis of C satisfying the conditions (i), (ii), and (iii) in Theorem 4 is called a symplectic basis. Theorem 4 shows that an even-like code C is LCD if and only if C has a symplectic basis. Using such a characterization, Carlet et al. [10] have proved a conjecture proposed by Galvez et al. ([17]) on the minimum distance of binary LCD codes. Moreover, all the possible orbits under the action of the orthogonal group over the set of all LCD codes have been determined. Closed formulas for their size have been given and asymptotic results have been obtained, showing in particular that almost all binary LCD codes are odd-like, as well as their duals. The same reference also studied the case of q-ary LCD codes, where q is a power of an odd prime, and showed that they all have an orthonormal basis.
4.2. LCD MDS Codes
MacWilliams and Sloane described in their book [30] MDS codes as "one of the most fascinating chapters in all of coding theory". MDS codes are at the heart of combinatorics and finite geometries, since MDS codes are equivalent to geometric objects called n-arcs and are particular cases of combinatorial objects called orthogonal arrays [30]. It is natural to consider codes which are at the same time LCD and MDS. One of the most important problems in this topic is to determine the existence of q-ary LCD MDS codes for various lengths and dimensions. This problem was solved in the Euclidean case for q even [26]. Before [11] appeared in 2018, only a few constructions of LCD MDS codes for odd characteristic were known, see e.g. [14,26,34]. Below we summarize known results on sufficient conditions for the existence of q-ary Euclidean LCD MDS codes of length n and dimension k when q is odd:
(i) n = q + 1 and k even with 4 ≤ k ≤ n - 4 [26];
(ii) q is an odd square, n ≤ √q + 1 and 0 ≤ k ≤ n [26];
(iii) q ≡ 1 (mod 4), 4 · 16^n · n^2 < q and 0 ≤ k ≤ n [26];
(iv) n | (q - 1)/2 or n | (q + 1)/2, and 0 ≤ k ≤ n [34];
(v) n even with n | (q - 1) and k even with 0 ≤ k ≤ n [14].
Still, few results are known for Hermitian LCD MDS codes in odd characteristic. In [34] is given a class of q^2-ary Hermitian LCD MDS codes with length q - 1 and dimension (q - 1)/2.


mension q−1 2 . In [11] is studied the construction of Euclidean and Hermitian LCD MDS codes (classes of new Euclidean and Hermitian LCD MDS codes are obtained) and presented some secondary constructions of LCD codes, using linear codes with small dimension and codimension, self-orthogonal codes and generalized Reed-Solomon codes. It is additionally proved that, for any q > 3 and 0 ≤ k ≤ n ≤ q + 1, there exists a q-ary [n, k] Euclidean LCD MDS code. In [3] are used tools from algebraic function fields and is presented an explicit construction of several classes of LCD MDS codes, in particular for the odd characteristic case. Though the problem of classifying LCD MDS codes was completely settled by Carlet et al. in [11] and [13], the algebraic geometry framework in [3] provides tools that could be used to push further the analysis of when a generalized Reed-Solomon code is an LCD code. 4.3. LCD and LCP of Cyclic Codes and Generalizations of Cyclic Codes 4.3.1. Cyclic Codes In [7], the following characterization of LCP of cyclic codes has been given. Theorem 5. ([7, Theorem 2.1]) Let C and D be q-ary cyclic codes of length n with the generator polynomials g(x) and u(x), respectively. Then (C, D) is LCP if and only if u(x) = (xn − 1)/g(x) and gcd(u(x), g(x)) = 1. Remark 6. Characterization of LCD cyclic codes was first given in [37]. Yang and Massey proved that a cyclic code C with the generator polynomial g(x) in R is LCD if and only if g(x) is self-reciprocal (i.e. g∗ (x) = g(x)) and all the monic irreducible factors of g(x) have the same multiplicity in g(x) and in xn − 1. It is an easy exercise to show that Theorem 5 yields Yang-Massey result as a special case. In [28], C. Li, C. Ding and S. Li have constructed several families of LCD cyclic codes over finite fields and analyzed their parameters (with a well rounded treatment of LCD cyclic codes, in particular, LCD BCH cyclic codes), including the family of LCD cyclic codes of length n = q + 1 over Fq . The LCD cyclic codes derived in their paper have very good parameters in general, and contain many optimal codes. In [29], Li et al. have explored two special families of LCD cyclic codes, which are both BCH codes. The dimensions and the minimum distances of these LCD BCH codes have been investigated. Regarding the security parameter of LCP of cyclic codes, an observation has been given in [7] . If g(x) = g0 + g1 x + · · · + xk is the generator polynomial of C then, by Theorem 5, the dual D⊥ of the complementary cyclic code D is generated by k −1 g∗ (x) = g−1 0 x g(x ).

Hence, generator matrices of C and D⊥ are as follows:

G_C = \begin{pmatrix}
g_0 & g_1 & \cdots & 1 & 0 & \cdots & 0 \\
0 & g_0 & g_1 & \cdots & 1 & \cdots & 0 \\
\vdots & & \ddots & & & \ddots & \vdots \\
0 & \cdots & 0 & g_0 & g_1 & \cdots & 1
\end{pmatrix}

G_{D^\perp} = g_0^{-1} \begin{pmatrix}
1 & g_{k-1} & \cdots & g_1 & g_0 & 0 & \cdots & 0 \\
0 & 1 & g_{k-1} & \cdots & g_1 & g_0 & \cdots & 0 \\
\vdots & & \ddots & & & & \ddots & \vdots \\
0 & \cdots & 0 & 1 & g_{k-1} & \cdots & g_1 & g_0
\end{pmatrix}

Codes generated by these matrices are equivalent (up to a nonzero scalar multiplication in each coordinate) under the coordinate permutation that sends the i-th coordinate to the (n − 1 − i)-th coordinate (for 0 ≤ i ≤ n − 1). Therefore, we have the following:

Theorem 7. If (C, D) is a q-ary cyclic LCP of codes, then C and D⊥ are equivalent codes. Hence, d(C) = d(D⊥).

In other words, when cyclic LCP of codes are used for DSM, the security against FIA and the security against SCA are the same. Hence, finding the best cyclic code and finding the most secure cyclic LCP of codes for DSM are equivalent problems.

In [25] are constructed some cyclic Hermitian LCD codes over finite fields and their parameters are analyzed. Most Hermitian LCD codes presented in that paper are not BCH codes.

4.3.2. Generalizations of Cyclic Codes

Cyclic codes have been generalized in various ways and these generalizations have been justified by providing good code examples or relaxed parameters. In the context of LCD codes and LCP of codes, quasi-cyclic and 2D cyclic codes (two particular generalizations of cyclic codes) were studied in [7]. Later, nD cyclic codes (Abelian codes) were addressed from the same perspective in [23]. These are some of the well-established generalizations of cyclic codes. We refer the reader to these two papers and the references therein for a detailed description of these code families. We choose to be very brief here and present the algebraic descriptions of these code families.

Let m_1, . . . , m_n be positive integers all of which are relatively prime to q. Consider the quotient ring R_n = F_q[x_1, . . . , x_n]/⟨x_1^{m_1} − 1, . . . , x_n^{m_n} − 1⟩. An ideal in R_n is called an nD cyclic code. It is an F_q-linear code of length m_1 · · · m_n with various symmetries (coming from the ideal structure). Note that n = 1 amounts to a cyclic code, which is a length m_1 linear code over F_q with the cyclic shift automorphism. Moreover, an R_1-module in R_2 is called a quasi-cyclic (QC) code of length m_1 m_2 and index m_2. Let us emphasize that an equivalent description of such a QC code is that it is an R_1-module in R_1^{m_2}. This is due to the identification one can write between R_1^{m_2} and R_2.

LCD QC codes are thoroughly studied in [24], where it is in particular shown that LCD QC codes are asymptotically good ([24, Corollary 3.8]). LCP of QC codes are characterized in [7, Theorem 3.1] in terms of their so-called constituent codes. Importantly, it is also observed that the analogue of Theorem 7 does not hold for an LCP of QC codes ([7, Example 3.3]), by presenting an example where the minimum distances d(C) and d(D⊥) are not equal for an LCP of QC codes (C, D). The validity of Theorem 7 is also proved for 2D cyclic codes ([7, Theorem 3.4]) via the constituents and the trace representation of 2D cyclic codes. The proof of this result is rather technical and challenging to extend to nD cyclic codes for n > 2. The same result for general nD cyclic codes is


settled in [23]. Namely, if (C, D) is an LCP of nD cyclic codes in Rn , then C and D⊥ are equivalent codes. The proof of this generalization of Theorem 7 uses the correspondence between the ideal description of nD cyclic codes and their zero sets. Remark 8. Out of these results, one sees an algebraic interpretation: Theorem 7 is valid for “ideal” codes (such as cyclic and nD cyclic) but it is not true for “module” codes (e.g. QC codes). Clearly, being an ideal is a stronger assumption than being a module in algebraic terms. Remark 9. Results on LCD and LCP classes in some other generalizations of cyclic codes have been investigated in [20] (additive cyclic codes) and in [21] (generalized QC codes). As mentioned in Section 1, such studies are also carried out in other code families, which do not necessarily have relations to cyclic codes. We refer to [27] and [32] for the case of algebraic geometry codes. Algebraic geometry codes have been useful for the variant of the LCD/LCP problem and they will be briefly introduced in Section 5.1 for this purpose. 4.4. Descending the LCD and LCP Property: A Concatenated Construction There are various techniques for construction of LCD and LCP of codes, especially in recent literature. Here we focus on a particular technique and explain how it evolved. 4.4.1. The First Descend Idea As pointed out in Section 3, Sendrier proved that LCD codes are more frequent over large finite fields. Moreover, Carlet et al.’s result (Theorem 2) demands particularly LCD codes over F2 and F3 . Of course, binary codes are of particular interest from a cryptographic point of view, too. So, one is led to look for ways to construct LCD (or LCP of) codes over small fields by utilizing similar codes over extension fields. One such construction was given independently in [5] and [24], and it was called respectively expansion and subfield constructions. Let B = {β1 , . . . , βk } be a self-dual basis of Fqk over Fq . This means,  Tr(βi β j ) = δi j =

1 if i = j and 0 if i ≠ j, where Tr denotes the trace map from F_{q^k} to F_q, defined by

Tr(a) := a + a^q + a^{q^2} + · · · + a^{q^{k−1}}, for a ∈ F_{q^k}.
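As a concrete illustration of these notions (our own toy example, not taken from the chapter), the following sketch implements F_4 = F_2(α) with α^2 = α + 1, computes the trace Tr(a) = a + a^2, and checks that B = {α, α^2} is a self-dual basis, i.e. Tr(β_i β_j) = δ_{ij}.

```python
# F_4 = {0, 1, alpha, alpha+1} encoded as integers 0..3 (bit 1 = coefficient of alpha).
# Addition is XOR; multiplication uses the relation alpha^2 = alpha + 1.
def gf4_mul(x, y):
    result = 0
    for _ in range(2):
        if y & 1:
            result ^= x
        y >>= 1
        carry = x & 0b10
        x = (x << 1) & 0b11
        if carry:
            x ^= 0b11          # reduce by alpha^2 + alpha + 1
    return result

def trace(a):
    """Tr: F_4 -> F_2, Tr(a) = a + a^2 (value lies in {0, 1})."""
    return a ^ gf4_mul(a, a)

ALPHA, ALPHA_SQ = 0b10, 0b11   # alpha and alpha^2 = alpha + 1
basis = [ALPHA, ALPHA_SQ]

for i, bi in enumerate(basis):
    for j, bj in enumerate(basis):
        assert trace(gf4_mul(bi, bj)) == (1 if i == j else 0)
print("{alpha, alpha^2} is a self-dual basis of F_4 over F_2")
```

Here q = 2 is even, so a self-dual basis indeed exists, consistent with the existence criterion recalled next.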

It is well-known that a self-dual basis for Fqk over Fq exists if and only if q is even or both q and k are odd. For x ∈ Fqk , denote the coordinates relative to B as xB = (x1 , . . . , xk ) ∈ Fkq (i.e. x = ∑i xi βi ). Note that this can be viewed as π : Fqk → Fkq x −→ (Tr(β1 x), . . . , Tr(βk x)) =xB .

(2)


Then we have (by the self-duality of the basis), π(β_i) · π(β_j) = (β_i)_B · (β_j)_B = δ_{ij},

(3)

and π(x) · π(y) = x_B · y_B = Tr(xy), for any x, y ∈ F_{q^k}. Note that · denotes the Euclidean inner product on the relevant space. The map π can be extended in a natural way to F_{q^k}^N as follows:

π : F_{q^k}^N −→ F_q^{kN}, (c_1, . . . , c_N) ↦ (π(c_1), . . . , π(c_N)).

The map preserves the orthogonality of the basis B (cf. Eq. 3), which is essential for the following result.

Theorem 10. ([5, Proposition 3] and [24, Theorem 5.2]) If C ⊂ F_{q^k}^N is an LCD code, then π(C) ⊂ F_q^{kN} is also an LCD code.

Let us note that if C is a q^k-ary LCD code with parameters [N, K, d(C)], then π(C) is a q-ary LCD code with parameters [kN, kK, ≥ d(C)]. So, the goal of descending the LCD property to a base field is fulfilled. However, the guaranteed distance of the resulting code is not so satisfactory. Moreover, the theorem requires a self-dual basis, which does not exist for all q and k.

4.4.2. Concatenation

We introduce concatenation, which is a well-known technique in coding theory ([15]). Let π : F_{q^k} → F_q^n be an F_q-linear injection (so, k ≤ n). Hence, A := Im(π) is an [n, k, d(A)] linear code over F_q. For an [N, K, d(C)] linear code C over F_{q^k},

π(C) := {(π(c_1), . . . , π(c_N)) : (c_1, . . . , c_N) ∈ C}

is called a concatenated code, which is also denoted by A□C. Here, A is called the inner code and C is called the outer code. The concatenated code π(C) = A□C has parameters [nN, kK, ≥ d(C)d(A)]. In fact, the map π in Eq. 2 introduced in Section 4.4.1 defines a particular concatenation. In particular, A = Im(π) = F_q^k in Section 4.4.1 (i.e. the trivial, full-space code). Therefore d(A) = 1 and hence the resulting concatenated code has poor designed distance. One can also define more general (so-called multi-level) concatenations, but the above definition suffices for our purposes. We refer the reader to [15] for further information on concatenated codes.
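The following minimal sketch illustrates the concatenation just described on a toy instance of our own choosing (it is not an example from the chapter): the outer code C is the [3, 2, 2] parity code over GF(4), the inner map π sends a GF(4) symbol to its two bits plus their parity, so the inner code A is a binary [3, 2, 2] code, and the concatenated code has parameters [9, 4, ≥ d(A)d(C) = 4].

```python
from itertools import product

def pi(symbol):
    """Inner encoding GF(4) -> F_2^3: the symbol's two bits followed by their parity bit."""
    b1, b0 = (symbol >> 1) & 1, symbol & 1
    return (b1, b0, b1 ^ b0)

def outer_codewords():
    """Outer code C = {(a, b, a + b) : a, b in GF(4)}; addition in GF(4) is bitwise XOR."""
    return [(a, b, a ^ b) for a, b in product(range(4), repeat=2)]

def concatenate(codeword):
    """Apply pi coordinate-wise and flatten into a binary vector of length 9."""
    return tuple(bit for sym in codeword for bit in pi(sym))

concatenated = [concatenate(c) for c in outer_codewords()]
weights = sorted(sum(v) for v in concatenated if any(v))

print("number of codewords:", len(concatenated))   # 4^2 = 16
print("minimum distance   :", weights[0])          # 4 = d(A) * d(C)
```

The brute-forced minimum distance matches the designed distance d(C)d(A) in this small case.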


4.4.3. Concatenated Construction of LCD Codes and LCP of Codes

Theorem 10 is improved by a better concatenation in [8]. Moreover, it is extended to a construction of LCP of codes in [22]. Here, we only present the result in [8]. Let B = {β_1, . . . , β_k} ⊂ F_{q^k} be a basis of F_{q^k} over F_q. We know that there is a uniquely determined dual basis B′ = {β′_1, . . . , β′_k} of B: Tr(β_i β′_j) = δ_{ij}, for all i, j. An F_q-linear map π : F_{q^k} → F_q^n is called an isometry with respect to B, if

π(β_i) · π(β′_j) = δ_{ij},

(4)

for all 1 ≤ i, j ≤ k. It is easy to see that such a map has to be injective. Therefore the image Im(π) ⊂ Fnq is an [n, k] linear code over Fq , which is called an isometry code. Note that Eq. 4 is the analogue of Eq. 3. However, we freed ourselves from a selfdual basis of the construction in Section 4.4.1, and used a dual basis, which always exists. The condition in Eq. 4 is also somewhat restrictive, and for a given q, k, n, and a given basis B of Fqk over Fq , an isometry may or may not exist. Still, the map in Eq. 2 is an isometry by Eq. 3. Moreover, there are isometries for some q, k values which cannot be covered by the concatenation in Eq. 2 (due to nonexistence of self-dual basis). For instance, for k = 2, n = 3 and q = 3, there are 24 distinct isometries π : Fq2 → F3q for each basis of Fq2 over Fq . Observe that there is no self-dual basis for Fq2 over Fq for q = 3. See [8, Example 2.2] for further examples of number of isometries for certain extensions. Other than relaxing the existence of concatenation map, this idea also allows inner codes A = Im(π) with d(A) > 1 (unlike Theorem 10). If there exists an isometry π : Fqk → Fnq with respect to at least one basis B of Fqk over Fq , let d(q; [n, k]) be the largest minimum distance of all isometry codes π(Fqk ) ⊆ Fnq among all basis of Fqk over Fq . See Examples 2.3 and 2.4 in [8] for instances where d(q; [n, k]) > 1. The following is the main result of [8]. Note that we extend an isometry map π from Fqk to Fnq naturally to FNqk below. Theorem 11. ([8, Theorem 3.1]) Assume that an isometry from Fqk to Fq exists with respect to at least one basis of Fqk over Fq . Let π denote the isometry yielding the maximum minimum distance for all possible isometry codes in Fnq . If C is an LCD code over Fqk with parameters [N, K, d(C)], then π(C) is an LCD code over Fq with parameters [nN, kK, ≥ d(C)d(q; [n, k])]. We refer to [8, Section 4] for examples of LCD codes, obtained via Theorem 11, which have good parameters. 5. A generalization of LCD Codes: σ -LCD Codes In [12], Carlet et al. have introduced the concept of linear codes with σ complementary dual (σ -LCD), among which are known Euclidean LCD codes, Hermitian LCD codes, and Galois LCD codes. This concept can be used to construct LCP of codes more easily and with more flexibility.


A mapping σ : F_q^n −→ F_q^n is called an isometry if d_H(σ(c), σ(w)) = d_H(c, w) for any c, w ∈ F_q^n. All isometries on F_q^n form a group, which is denoted by Aut(F_q^n). Now let σ ∈ Aut(F_q^n) and let C be a linear code over F_q with length n. We call the σ dual code of C the code

C^{⊥σ} = {w ∈ F_q^n : ⟨w, c⟩_σ = 0 for any c ∈ C},

where ⟨w, c⟩_σ is defined as follows:

⟨w, c⟩_σ = ⟨w, σ(c)⟩ = \sum_{i=0}^{n−1} w_i d_i,

where w = (w_0, w_1, . . . , w_{n−1}) and σ(c) = (d_0, d_1, . . . , d_{n−1}). From the definition of C^{⊥σ}, one gets immediately the following relationship

C^{⊥σ} = (σ(C))^⊥,

(5)

where σ(C) = {σ(c) : c ∈ C} and (σ(C))^⊥ is the dual code of σ(C). It is easily seen that the σ inner product is nondegenerate, that C^{⊥σ} is also an F_q-linear subspace of F_q^n, and that dim_{F_q}(C) + dim_{F_q}(C^{⊥σ}) = n.

Definition 12. A linear code C over F_q is said to be σ complementary dual (σ-LCD) if C ∩ C^{⊥σ} = {0}.

If C is an [n, k, d] σ-LCD code, then (C, C^{⊥σ}) is an LCP of codes with parameters [n, k, d, d]. A linear code C over F_q shall be called a σ self-orthogonal code (resp. σ self-dual code) if C ⊆ C^{⊥σ} (resp. if C = C^{⊥σ}). It is easy to see that any σ self-dual code has dimension n/2. From the definition of σ-LCD codes, if σ is the identity map on F_q^n, then σ-LCD codes are exactly Euclidean LCD codes. If σ = (1_{F_q^n}, π), where π is a nontrivial automorphism of F_q, then σ-LCD codes are exactly Galois LCD codes. In particular, if q is a prime, then there are no nontrivial Galois LCD codes; there are many more σ-LCD codes. It is shown in [12] that, for q > 2, all q-ary linear codes are σ-LCD and that, for every binary linear code C, the code {0} × C is σ-LCD. In this same reference is developed the theory of σ-LCD generalized quasi-cyclic (GQC) codes. Characterizations and constructions of asymptotically good σ-LCD GQC codes are provided. Moreover, σ-LCD Abelian codes are considered; it is proved that all Abelian codes in a semi-simple group algebra are σ-LCD. Fixing appropriate mappings σ allows deducing many results on the classical LCD codes and LCD GQC codes.

5.1. Variant of the Problem and New Type of Codes

We recalled in the introduction that, by using two linear codes C and D whose sum is direct and equals F_q^n, direct sum masking (DSM) protects against both SCA and FIA the sensitive data stored in registers. The vector handled by the algorithm is then the sum of a codeword of C, which is the encoded version of the sensitive data, and of a codeword of D, which is encoded random data and plays the role of the mask. Thanks to the


fact that the two codes have sum equal to F_q^n, the vector handled by the algorithm can be any vector of F_q^n. And thanks to the fact that the two codes have trivial intersection, any such vector is equal to such a sum in a unique way. As proved in [4], the resulting security parameter is the pair (d(C) − 1, d(D⊥) − 1). To be able to protect against SCA and FIA not only the sensitive input data stored in registers but the whole algorithm (which is required at least in some software applications), it is necessary to have access to the source vector whose multiplication by the generator matrix of the second code gives the second vector. A way of doing so is to lengthen C and D into C′ and D′, by adding 0's at the end of each codeword of C (the minimum distance of the resulting code C′ equals that of C) and appending the k × k identity matrix at the end of the generator matrix of D; the resulting code D′ is an [n + k, k] linear code. It is not difficult to see that in this case, d(D′⊥) ≤ d(D⊥). It is then highly desired to construct linear codes D such that d(D′⊥) is very close to d(D⊥). In such a case, we say that D is almost optimally extendable (and is optimally extendable if d(D′⊥) = d(D⊥)). More precisely, we have the following definition.

Definition 13. Let D be an [n, k] linear code over a finite field F_q and H a generator matrix of D. Let D′ be the [n + k, k] linear code with generator matrix [H : I_k], where I_k is the k × k identity matrix. We say that D is optimally extendable if D′ has the same dual distance d⊥ = d(D⊥) as D. Let t be a positive integer such that t ≤ d⊥. We say that D is t-extendable if the dual distance of D′ is at least t. If t is close to d⊥, we then say that D is almost optimally extendable.

Note that the ability of being (almost) optimally extendable depends on the choice of H and, properly speaking, we should speak of the pair (D, H) being optimally extendable, but to simplify statements, we shall make the abuse of terminology of Definition 13. Recall from [30] that d⊥ equals the minimal strictly positive number of F_q-linearly dependent columns of H. It is then easily seen that, given t ≤ d⊥, D is t-extendable if and only if, for every i < t, any i columns of H generate (F_q-linearly) a code of minimum distance strictly larger than t − i. This condition seems rather difficult to achieve for t = d⊥ (and even for t close to d⊥). Note that, according to a well-known result (see [30]), a parity check matrix of D′ equals [I_n : −H^t], where H^t is the transpose of H. This may have little relationship with the parity check matrix of D. This confirms that (almost) optimally extendable codes are difficult to construct. In general, it is notoriously difficult to determine the minimum distances of the codes D⊥ and D′⊥ simultaneously. Only a few constructions of (almost) optimally extendable linear codes have been proposed in the literature. In 2018, Carlet, Güneri, Mesnager, and Özbudak [6] constructed an optimally extendable linear code by employing algebraic geometry codes. Very recently, Carlet, Li and Mesnager have investigated in [9] constructions of (almost) optimally extendable linear codes from irreducible cyclic codes and from the first-order Reed-Muller codes. We present the findings of these articles in this subsection.
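To make Definition 13 concrete, the following minimal sketch (our own illustration, not code from [6] or [9]) takes a generator matrix H of D over a prime field F_p, forms the generator matrix [H : I_k] of D′, and computes both dual distances as the minimal number of linearly dependent columns, as recalled above from [30]. It is only practical for small parameters.

```python
from itertools import combinations

def rank_mod_p(rows, p):
    """Rank of a list of vectors over F_p by Gaussian elimination."""
    rows = [list(r) for r in rows]
    rank, ncols = 0, len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col] % p), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], p - 2, p)          # inverse mod p (p prime)
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col] % p:
                f = rows[i][col]
                rows[i] = [(a - f * b) % p for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def dual_distance(H, p):
    """d(D^perp) = minimal number of F_p-linearly dependent columns of H."""
    k, n = len(H), len(H[0])
    cols = [[H[r][c] for r in range(k)] for c in range(n)]
    for t in range(1, n + 1):
        if any(rank_mod_p(list(subset), p) < t for subset in combinations(cols, t)):
            return t
    return n + 1  # unreachable for k < n

def extended_generator(H):
    """Generator matrix [H : I_k] of the lengthened code D'."""
    k = len(H)
    return [row + [1 if i == j else 0 for j in range(k)] for i, row in enumerate(H)]

# Tiny example over F_2 (an arbitrary [6, 3] code, for illustration only).
H = [[1, 0, 0, 1, 1, 0],
     [0, 1, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1]]
d_dual, d_dual_ext = dual_distance(H, 2), dual_distance(extended_generator(H), 2)
print("d(D^perp) =", d_dual, " d(D'^perp) =", d_dual_ext,
      " optimally extendable:", d_dual == d_dual_ext)
```

On this arbitrary example the extension drops the dual distance from 3 to 2, illustrating the inequality d(D′⊥) ≤ d(D⊥) and why optimally extendable codes require careful constructions.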


5.1.1. Optimally Extendable Algebraic Geometry Codes For an introduction to algebraic function fields and the codes defined from them, we refer to [36]. Here, we just recall very briefly the definition of algebraic geometry codes and some of their basic properties. Let F be an algebraic function field of one variable, whose full constant field is Fq . For an element f ∈ F and a rational (degree 1) place P of F, f (P) denotes the evaluation of f at P and f (P) ∈ Fq . Moreover, g denotes the genus of F. The number of rational places of F is important for the study of codes from function fields, and it is denoted by N(F). An important result in the theory is the Hasse-Weil bound, which says √ |N(F) − (q + 1)| ≤ 2g q. For a divisor A of F, the Riemann-Roch space associated to A is defined as L(A) := { f ∈ F : ( f ) ≥ −A} ∪ {0}, where ( f ) denotes the principal divisior of f , which contains information about the zeros and poles of f together with their orders. It is well-known that L(A) is a finite dimensional vector space over Fq . The degree and the dimension of L(A) are denoted, respectively, by deg(A) and dim(A). It is in general difficult to compute dim(A). However, if deg(A) < 0, then dim(A) = 0 and if deg(A) > 2g − 2, then dim(A) = deg(A) + 1 − g. Definition 14. Let P1 , . . . , Pn be pairwise distinct rational places of F and set the divisor D = P1 + · · · + Pn . Let G be another divisor of F whose support is disjoint from that of D. The algebraic geometry (AG) code associated to D, G is defined as CL (D, G) := {( f (P1 ), . . . , f (Pn )) : f ∈ L(G)} ⊂ Fnq . It is clear that if we know a basis for L(G), then a generator matrix for CL (D, G) can be obtained by writing in each row the evaluation of these basis elements at the places P1 , . . . , Pn . We collect some results on AG codes, which will be used in this section. Theorem 15. ([36]) i. The dimension and the minimum distance of CL (D, G) satisfy dim(CL (D, G)) = dim(G) − dim(G − D) d(CL (D, G)) ≥ n − deg(G). ii. CL (D, G)⊥ = CL (D, H), where H = D − G − (η) for a certain Weil differential η. We now describe the construction of AG codes. Let F be an algebraic function field of one variable whose full constant field is Fq . Let k, n be positive integers such that g ≤ k ≤ n and n + k ≤ N(F). Let T = {U1 ,U2 , . . . ,Uk+n } be a set of k + n distinct rational places of F and let G be a divisor of F such that deg(G) = k + g − 1 and supp(G) ∩ T = 0. / Since deg(G) ≥ 2g − 1, we have dim(G) = deg(G) + 1 − g = k. Let { f0 , f1 , . . . , fk−1 } be an Fq −basis of L(G). Consider the Fq −linear map


ψ : L(G) → F_q^{k+n}, f ↦ (f(U_1), f(U_2), . . . , f(U_{k+n})).

As deg(G − U_1 − U_2 − · · · − U_{k+n}) = k + g − 1 − k − n < 0, ψ is injective and the image is an F_q-linear code of dimension k. The image has a generator matrix

\begin{pmatrix}
f_0(U_1) & f_0(U_2) & \cdots & f_0(U_{k+n}) \\
f_1(U_1) & f_1(U_2) & \cdots & f_1(U_{k+n}) \\
\vdots & \vdots & \ddots & \vdots \\
f_{k-1}(U_1) & f_{k-1}(U_2) & \cdots & f_{k-1}(U_{k+n})
\end{pmatrix}.

This matrix has rank k and hence there exist k linearly independent columns of this matrix. Let Q_1, Q_2, . . . , Q_k denote a configuration of k linearly independent columns. Let {P_1, . . . , P_n} = T \ {Q_1, . . . , Q_k}. With these analyses, we have that the k × k matrix H_2 given by

H_2 = \begin{pmatrix}
f_0(Q_1) & f_0(Q_2) & \cdots & f_0(Q_k) \\
f_1(Q_1) & f_1(Q_2) & \cdots & f_1(Q_k) \\
\vdots & \vdots & \ddots & \vdots \\
f_{k-1}(Q_1) & f_{k-1}(Q_2) & \cdots & f_{k-1}(Q_k)
\end{pmatrix}

is invertible. Let H_1 be the k × n matrix given by

H_1 = \begin{pmatrix}
f_0(P_1) & f_0(P_2) & \cdots & f_0(P_n) \\
f_1(P_1) & f_1(P_2) & \cdots & f_1(P_n) \\
\vdots & \vdots & \ddots & \vdots \\
f_{k-1}(P_1) & f_{k-1}(P_2) & \cdots & f_{k-1}(P_n)
\end{pmatrix}.

Finally, let H be the k × n matrix given by H = H2−1 H1 . For the divisors D = P1 + · · · + Pn and E = Q1 + · · · + Qk , consider the AG codes C1 = CL (D, G) ⊂ Fnq and C2 = CL (D + E, G) ⊂ Fn+k q . It is clear that both C1 and C2 have dimension k. Moreover, a generator matrix for C1 and C2 are, respectively, H1 and [H1 : H2 ]. Since H2 is invertible, we can also consider H and [H : Ik ] as generator matrices of these codes. By Theorem 15, we have C1⊥ = CL (D, D − G + (η)) and C2⊥ = CL (D + E, D + E − G + (η)) for a suitably chosen Weil differential η. The degree of (η) is known to be 2g − 2. Hence, d(C1⊥ ) ≥ n − deg(D − G + (η)) = k − g + 1,

(6)

d(C_2^⊥) ≥ (n + k) − deg(D + E − G + (η)) = k − g + 1.

(7)

In summary, we showed that under the assumptions made on the function field F of genus g, we can construct an AG code C1 ⊂ Fnq of dimension k with a generator


matrix H and its lengthening C_2 ⊂ F_q^{n+k} of dimension k with a generator matrix [H : I_k] such that both have the same designed minimal distances for the dual codes (Eqs. 6 and 7). However, since we do not know explicitly the dual distances, we cannot determine exactly whether C_1 is optimally or almost optimally extendable in the sense of Definition 13. However, for the rational function field F = F_q(x), which has genus g = 0, we obtain not only an optimally extendable code but also construct it to reach the Singleton bound (i.e. maximum distance separable (MDS)).

Theorem 16. ([6, Theorem 2]) Let n and k be positive integers such that k ≤ n and n + k ≤ q + 1. Then there exists an AG code C_1 over F_q of length n and dimension k, constructed from the rational function field F_q(x), which is optimally extendable to an AG code C_2. Moreover, both C_1 and C_2 are MDS codes.

Proof. The rational function field over F_q has genus 0 and q + 1 rational places. The preceding discussion then yields the codes C_1 and C_2 as before with d(C_1^⊥) ≥ k + 1 and d(C_2^⊥) ≥ k + 1. Note that the parameters of C_1^⊥ and C_2^⊥ are [n, n − k] and [n + k, n], respectively. Hence the Singleton bound yields k + 1 as an upper bound for the minimum distance of both C_1^⊥ and C_2^⊥. Hence, we obtain d(C_1^⊥) = d(C_2^⊥) = k + 1. In particular, both codes are MDS. Since the dual of an MDS code is also MDS, we have the result.

Remark 17. Concatenation techniques as in Section 4.4.1 are applied to AG codes obtained in [6] to construct codes over some small fields. We refer to the article for details.

5.1.2. Optimally Extendable Primitive Irreducible Cyclic Codes

Another idea to construct optimally extendable codes is to consider primitive irreducible cyclic codes. Cyclic codes were introduced in Section 4.3. A cyclic code of length n generated by the polynomial g(x) is said to be an irreducible cyclic code if its check polynomial (x^n − 1)/g(x) is irreducible. Let D be the irreducible cyclic code of length q^m − 1 with check polynomial f(x), where f(x) is the minimal polynomial of α^{−1} over F_q, α is a generator of F_{q^m}^* and m ≥ 3 is an integer. Clearly, the dimension of D is equal to m. Define D′ to be the linear code over F_q with generator matrix [H : I_m], where H is the generator matrix in standard form of the code D. It is well-known that the weight enumerator of D is

1 + (q^m − 1) z^{(q−1)q^{m−1}}.

As proved in [9], the weight distribution of the dual code D⊥ is given by

A_t = \frac{1}{q^m} \sum_{\substack{0 ≤ i ≤ q^{m−1} − 1 \\ 0 ≤ j ≤ (q−1)q^{m−1} \\ i + j = t}} \binom{q^{m−1} − 1}{i} \binom{(q−1)q^{m−1}}{j} \left[ (q − 1)^t + (−1)^j (q − 1)^i (q^m − 1) \right]


for 0 ≤ t ≤ q^m − 1. When m ≥ 3, the code D⊥ has parameters [2^m − 1, 2^m − m − 1, 3] if q = 2, and [q^m − 1, q^m − m − 1, 2] if q ≥ 3. The parameters and the weight enumerator of the corresponding linear code D′ as defined in Definition 13 can be determined.

Theorem 18. ([9, Theorem 1]) Let D be the irreducible cyclic code of length q^m − 1 with check polynomial f(x) defined above and generator matrix in standard form. The corresponding linear code D′ as defined in Definition 13 has parameters [q^m + m − 1, m, q^m − q^{m−1} + 1] and its weight enumerator is given by

1 + \sum_{i=1}^{m} \binom{m}{i} (q − 1)^i z^{q^m − q^{m−1} + i}.
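Theorem 18 is easy to check numerically for small parameters. The sketch below is our own illustration (not code from [9]) for q = 2 and m = 3; in that case D is equivalent to the binary simplex code, for which we take a generator matrix H whose columns are all nonzero vectors of F_2^m (an equivalent choice; the weight count of (aH, a) does not depend on it). The script brute-forces the weight enumerator of D′ and compares it with the formula of Theorem 18.

```python
from itertools import product
from math import comb

q, m = 2, 3                      # toy parameters for the check
n = q**m - 1                     # length of D (binary simplex code for q = 2)

# Columns of H: all nonzero vectors of F_2^m.
cols = [tuple((v >> i) & 1 for i in range(m)) for v in range(1, 2**m)]

def weight_of_extended(a):
    """Hamming weight of the D' codeword (aH, a) for a message a in F_2^m."""
    w_aH = sum((sum(a[i] * c[i] for i in range(m)) % 2) for c in cols)
    return w_aH + sum(a)

# Brute-force weight enumerator of D' = {(aH, a)}.
enumerator = {}
for a in product(range(2), repeat=m):
    w = weight_of_extended(a)
    enumerator[w] = enumerator.get(w, 0) + 1

# Theorem 18 predicts weights q^m - q^(m-1) + i with multiplicity C(m, i)(q-1)^i.
predicted = {0: 1}
for i in range(1, m + 1):
    predicted[q**m - q**(m - 1) + i] = comb(m, i) * (q - 1)**i

print("computed :", dict(sorted(enumerator.items())))
print("predicted:", dict(sorted(predicted.items())))
assert enumerator == predicted
```

For m = 3 both dictionaries read {0: 1, 5: 3, 6: 3, 7: 1}, matching the stated enumerator.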

The following theorem presents an optimally extendable linear code D.

Theorem 19. ([9, Theorem 3]) Let D be the primitive irreducible cyclic code over F_q and D′ be defined as in Theorem 18. When q ≥ 3 and m ≥ 3, we have d(D⊥) = d(D′⊥) = 2, i.e., D is an optimally extendable linear code.

5.1.3. Optimally Extendable Codes by Puncturing Primitive Irreducible Cyclic Codes

Another way to construct optimally extendable codes is to choose D′ from cyclic codes and D the code D′ punctured on a coordinate set T, which is obtained by deleting the components indexed by the set T in all codewords of D′. To do this, we consider an [n, k, d] cyclic code D′ over F_q and let h(x) = h_0 + h_1 x + · · · + h_{n−k} x^{n−k} (h_{n−k} = 1) be the monic generator polynomial of the code D′. Without loss of generality, we assume that n ≥ 2k. Otherwise, we take the dual code D′⊥ as D′. Denote

H_1 = \begin{pmatrix}
h_0 & h_1 & h_2 & \cdots & h_{k-1} & \cdots & h_{\ell-2} & h_{\ell-1} \\
0 & h_0 & h_1 & \cdots & h_{k-2} & \cdots & h_{\ell-3} & h_{\ell-2} \\
\vdots & & \ddots & & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & h_0 & \cdots & h_{\ell-k-1} & h_{\ell-k}
\end{pmatrix}_{k \times \ell}

and

H_2 = \begin{pmatrix}
h_{\ell} & 0 & \cdots & 0 \\
h_{\ell-1} & h_{\ell} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
h_{\ell-k+1} & h_{\ell-k+2} & \cdots & h_{\ell}
\end{pmatrix}_{k \times k},


where ℓ = n − k. Set H = H_2^{−1} H_1. Then [H : I_k] is a generator matrix of the code D′. Let D be the linear code over F_q with generator matrix H. The code D is in fact the code D′ punctured on the coordinate set T = {ℓ + 1, ℓ + 2, . . . , n}, where ℓ = q^m − 1 − m; that is, D is obtained by deleting the components indexed by the set T = {ℓ + 1, ℓ + 2, . . . , n} in all codewords of D′. Then we have the following theorem:

Theorem 20. ([9, Theorem 5]) Let α be a generator of F_{q^m}^* and let f(x) be the minimal polynomial of α^{−1} over F_q, where m ≥ 3 is an integer. Let D′ be the irreducible cyclic code of length q^m − 1 with check polynomial f(x). The linear code D defined as above has parameters [q^m − m − 1, m, q^m − q^{m−1} − m] and its weight enumerator is given by

1 + \sum_{i=1}^{m} \binom{m}{i} (q − 1)^i x^{q^m − q^{m−1} − i}.

The following theorem presents another optimally extendable linear code D.

Theorem 21. ([9, Theorem 7]) Let D be the primitive irreducible cyclic code D′ punctured on the coordinate set T = {ℓ + 1, ℓ + 2, . . . , n} as described above. When m ≥ 3, we have d(D⊥) = d(D′⊥), i.e., D is an optimally extendable linear code.

5.1.4. Almost Optimally Extendable Linear Codes from the First-Order Reed-Muller Codes

The first-order Reed-Muller codes form a family of low-rate linear codes with the maximal possible minimum distance for this rate and length. Thanks to their optimal parameters and simple decoding algorithms, these codes have a lot of applications. One can employ first-order Reed-Muller codes to derive almost optimally extendable linear codes. Reed-Muller codes can be defined in terms of Boolean functions. Let f(x) be a Boolean function on F_2^m, where x = (x_1, x_2, . . . , x_m) ∈ F_2^m. Assume that n = 2^m and denote by v_0, v_1, . . . , v_{n−1} the vectors of F_2^m in some fixed order. Write

c_f = (f(v_0), f(v_1), . . . , f(v_{n−1})) ∈ F_2^n.

The r-th order binary Reed-Muller code RM(r, m) of length n = 2^m, for 0 ≤ r ≤ m, is the set of all vectors c_f, where f(v) is a Boolean function which is a polynomial of degree at most r, i.e., RM(r, m) = {c_f ∈ F_2^n : deg(f) ≤ r}. It is well-known that for 0 ≤ r ≤ m, the binary Reed-Muller code RM(r, m) has parameters

[2^m, \sum_{i=0}^{r} \binom{m}{i}, 2^{m−r}].


To construct an almost optimally extendable linear code, we choose D as the first-order binary Reed-Muller code RM(1, m). A generator matrix H of D can be obtained as follows. Denote u = 1 + x_1 + x_2 + · · · + x_m. It is clear that the set {c_u, c_{x_1}, c_{x_2}, . . . , c_{x_m}} forms an F_2-basis of the code D. Denote by H the (m + 1) × n matrix

H = \begin{pmatrix} c_u \\ c_{x_1} \\ c_{x_2} \\ \vdots \\ c_{x_m} \end{pmatrix}.    (8)

Let D′ be the binary code with the generator matrix [H : I_{m+1}]. The minimum distances and the parameters of the codes D, D′, D⊥ and D′⊥ can be determined. More specifically, the codes D, D′, D⊥ and D′⊥ are respectively [2^m, m + 1, 2^{m−1}], [2^m + m + 1, m + 1, 2^{m−1} + 1], [2^m, 2^m − m − 1, 4] and [2^m + m + 1, 2^m, 2].

5.1.5. Almost Optimally Extendable Linear Codes from Cyclic Codes of Dimension 2

Finally, one can construct almost optimally extendable linear codes by employing cyclic codes of dimension 2. Assume that n = q + 1 and α is a generator of F_{q^2}^*. Let β = α^{q−1}. Then β is a primitive n-th root of unity. Suppose that h(x) is the minimal polynomial of β^{−1} over F_q. Let D be the irreducible cyclic code of length n with generator polynomial g(x) = (x^n − 1)/h(x). As observed in [9], it then follows from Delsarte's Theorem that

D = { (Tr(a), Tr(aβ), . . . , Tr(aβ^{n−1})) : a ∈ F_{q^2} },

where Tr denotes the trace function from F_{q^2} onto F_q defined by Tr(x) = x + x^q. It is known that D is a [q + 1, 2] linear code over F_q [30]. Let H be a generator matrix of D. Let D′ be the linear code over F_q with generator matrix [H : I_2]. The parameters of the codes D, D′, D⊥ and D′⊥ can be determined. More specifically, the parameters of the codes D, D′, D⊥ and D′⊥ are respectively [q + 1, 2, q], [q + 3, 2, q + 1], [q + 1, q − 1, 3] and [q + 3, q + 1, 2].
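As a small sanity check on the Reed-Muller construction of Section 5.1.4, the sketch below (our own illustration, not taken from [9]) builds the matrix H of Eq. 8 for m = 3 from the Boolean functions u, x_1, . . . , x_m, forms [H : I_{m+1}], and recovers d(D⊥) = 4 and d(D′⊥) = 2 as the minimal numbers of linearly dependent columns.

```python
from itertools import combinations, product

m = 3
points = list(product(range(2), repeat=m))          # v_0, ..., v_{2^m - 1}

# Rows of H (Eq. 8): evaluations of u = 1 + x1 + ... + xm and of x1, ..., xm.
H = [[(1 + sum(v)) % 2 for v in points]] + \
    [[v[i] for v in points] for i in range(m)]
H_ext = [row + [1 if i == j else 0 for j in range(m + 1)]
         for i, row in enumerate(H)]                 # generator [H : I_{m+1}] of D'

def dual_distance(M):
    """Minimal number of F_2-linearly dependent columns of M (= d of the dual code)."""
    k, n = len(M), len(M[0])
    cols = [tuple(M[r][c] for r in range(k)) for c in range(n)]
    for t in range(1, n + 1):
        for subset in combinations(range(n), t):
            # Over F_2 the minimal dependent set is the smallest set of columns
            # whose XOR is zero, so it suffices to test the full subset.
            if all(sum(cols[c][r] for c in subset) % 2 == 0 for r in range(k)):
                return t
    return n + 1

print("d(D^perp)  =", dual_distance(H))      # expected: 4
print("d(D'^perp) =", dual_distance(H_ext))  # expected: 2
```

The output matches the parameters [2^m, 2^m − m − 1, 4] and [2^m + m + 1, 2^m, 2] stated above for the dual codes.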

6. Conclusions

The direct sum masking (DSM) technique represents a new avenue for reducing the cost of protection against side-channel attacks (SCA) and fault injection attacks (FIA). Such cost reduction is vital for ensuring privacy in the Internet of Things (IoT), since this recent framework gives new opportunities for devastating attacks, and since efficient countermeasures against them represent a cost often considered prohibitive by industry. This has considerably renewed the interest in linear complementary dual (LCD) codes, which are an old notion. Much work is still needed to put, at the disposal of the DSM technique, codes adapted to all situations. DSM was introduced in 2014


and the first study of the related LCD codes was published in 2016 (the notion of LCD codes dates back to 1992, but few papers about it appeared between that date and recently). In the three years between 2016 and 2019, about a hundred papers have been published on the subject of LCD codes and LCP of codes, only a minority of which are referred to in the present chapter. The case of codes over Fq with q > 3 is now settled and the effort is devoted to the cases q = 2 and q = 3. The former case is the most important for applications to privacy in networked IoT systems.

References

[1] M. Araya, M. Harada, On the classification of linear complementary dual codes, Discrete Math. 342 (2019), 270–278.
[2] M. Araya, M. Harada, On the minimum weights of binary linear complementary dual codes, arXiv:1807.03525v1 [math.CO].
[3] P. Beelen, L. Jin, Explicit MDS codes with complementary duals, IEEE Trans. Information Theory 64(11) (2018), 7188–7193.
[4] J. Bringer, C. Carlet, H. Chabanne, S. Guilley, H. Maghrebi, Orthogonal direct sum masking: a smartcard friendly computation paradigm in a code, with builtin protection against side-channel and fault attacks, WISTP, Heraklion, Crete, Springer LNCS 8501 (2014), 40–56.
[5] C. Carlet, S. Guilley, Complementary dual codes for counter-measures to side-channel attacks, Adv. Math. Commun. 10 (2016), 131–150.
[6] C. Carlet, C. Güneri, S. Mesnager, F. Özbudak, Construction of some codes suitable for both side channel and fault injection attacks, WAIFI, Bergen, LNCS 11321 (2018), 95–107.
[7] C. Carlet, C. Güneri, F. Özbudak, B. Özkaya, P. Solé, On linear complementary pairs of codes, IEEE Trans. Inf. Theory 64 (2018), 6583–6589.
[8] C. Carlet, C. Güneri, F. Özbudak, P. Solé, A new concatenated type construction for LCD codes and isometry codes, Discrete Math. 341 (2018), 830–835.
[9] C. Carlet, C. Li, S. Mesnager, Some (almost) optimally extendable linear codes, Des. Codes Cryptogr., to appear.
[10] C. Carlet, S. Mesnager, C. Tang, Y. Qi, New characterization and parametrization of LCD codes, IEEE Trans. Information Theory 65(1) (2019), 39–49.
[11] C. Carlet, S. Mesnager, C. Tang, Y. Qi, Euclidean and Hermitian LCD MDS codes, Des. Codes Cryptogr. 86(11) (2018), 2605–2618.
[12] C. Carlet, S. Mesnager, C. Tang, Y. Qi, On sigma-LCD codes, IEEE Trans. Information Theory 65(3) (2019), 1694–1704.
[13] C. Carlet, S. Mesnager, C. Tang, Y. Qi, R. Pellikaan, Linear codes over Fq are equivalent to LCD codes for q > 3, IEEE Trans. Inf. Theory 64 (2018), 3010–3017.
[14] B. Chen, H. Liu, New constructions of MDS codes with complementary duals, arXiv:1702.07831, 2017.
[15] I. Dumer, Concatenated codes and their multilevel generalizations, Handbook of Coding Theory, North-Holland, Amsterdam (1998), 1911–1988.
[16] C. Galindo, O. Geil, F. Hernando, D. Ruano, New binary and ternary LCD codes, IEEE Trans. Inf. Theory 65(2) (2019), 1008–1016.
[17] L. Galvez, J. L. Kim, N. Lee, Y. G. Roe, B. S. Won, Some bounds on binary LCD codes, Cryptography and Communications 10(4) (2018), 719–728.
[18] M. Grassl, Bounds on the minimum distance of linear codes and quantum codes, online available at http://www.codetables.de, accessed on 2019-3-9.
[19] C. Güneri, A. Neri, F. Özbudak, On the hull of linear codes, preprint.
[20] C. Güneri, F. Özbudak, F. Özdemir, On complementary dual additive cyclic codes, Adv. Math. Commun. 11 (2017), 353–357.
[21] C. Güneri, F. Özbudak, B. Özkaya, E. Saçıkara, Z. Sepasdar, P. Solé, Structure and performance of generalized quasi-cyclic codes, Finite Fields Appl. 47 (2017), 183–202.
[22] C. Güneri, F. Özbudak, E. Saçıkara, A concatenated construction of linear complementary pair of codes, Cryptogr. Commun., to appear (2019), https://doi.org/10.1007/s12095-019-0354-5.
[23] C. Güneri, B. Özkaya, S. Sayıcı, On linear complementary pair of nD cyclic codes, IEEE Commun. Lett. 22 (2018), 2404–2406.
[24] C. Güneri, B. Özkaya, P. Solé, Quasi-cyclic complementary dual codes, Finite Fields Appl. 42 (2016), 67–80.
[25] C. Li, Hermitian LCD codes from cyclic codes, Designs, Codes and Cryptography 86(10) (2018), 2261–2278.
[26] L. Jin, Construction of MDS codes with complementary duals, IEEE Trans. Information Theory 63(5) (2017), 2843–2847.
[27] L. Jin, C. Xing, Algebraic geometry codes with complementary duals exceed the asymptotic Gilbert-Varshamov bound, IEEE Trans. Inform. Theory 64 (2018), 6277–6282.
[28] C. Li, C. Ding, S. Li, LCD cyclic codes over finite fields, IEEE Trans. Information Theory 63(7) (2017), 4344–4356.
[29] S. Li, C. Li, C. Ding, H. Liu, Two families of LCD BCH codes, IEEE Trans. Information Theory 63(8) (2017), 5699–5717.
[30] F. J. MacWilliams, N. J. A. Sloane, The Theory of Error-Correcting Codes, North-Holland, Amsterdam, 1977.
[31] J. L. Massey, Linear codes with complementary duals, Discrete Math. 106 (1992), 337–342.
[32] S. Mesnager, C. Tang, Y. Qi, Complementary dual algebraic geometry codes, IEEE Trans. Inform. Theory 64 (2018), 2390–2396.
[33] G. Nebe, E. M. Rains, N. J. A. Sloane, Self-Dual Codes and Invariant Theory, Springer-Verlag, Berlin Heidelberg, 2006.
[34] M. Sari, M. E. Koroglu, On MDS negacyclic LCD codes, arXiv:1611.06371, 2016.
[35] N. Sendrier, Linear codes with complementary duals meet the Gilbert-Varshamov bound, Discrete Math. 285 (2004), 345–347.
[36] H. Stichtenoth, Algebraic Function Fields and Codes, Springer-Verlag, Berlin Heidelberg, 2009.
[37] X. Yang, J. L. Massey, The condition for a cyclic code to have a complementary dual, Discrete Math. 126 (1994), 391–393.
[38] X. T. Ngo, S. Bhasin, J.-L. Danger, S. Guilley, Z. Najm, Linear complementary dual code improvement to strengthen encoded circuit against hardware Trojan horses, IEEE International Symposium on Hardware Oriented Security and Trust (HOST 2015), Washington, DC, USA, May 5-7, 2015, 82–87.

Security and Privacy in the Internet of Things: Challenges and Solutions J.L.H. Ramos and A. Skarmeta (Eds.) IOS Press, 2020 © 2020 The authors and IOS Press. All rights reserved. doi:10.3233/AISE200009


A Framework for Security and Privacy for the Internet of Things (SPIRIT)

Julian MURPHY a (Canterbury, UK; E-mail: [email protected]), Gareth HOWELLS a, Klaus MCDONALD-MAIER b, Sami GHADFI c, Giles FALQUET c, Kais ROUIS d, Sabrine AROUA d, Nouredine TAMANI d, Mickael COUSTATY d, Petra GOMEZ-KRÄMER d and Yacine GHAMRI-DOUDANE d

a University of Kent
b University of Essex
c Université de Genève
d Université de La Rochelle

Abstract. As the adoption of digital technologies expands, it becomes vital to build trust and confidence in the integrity of such technology. The SPIRIT project investigates the Proof-of-Concept of employing novel secure and privacy-ensuring techniques in services set up in the Internet of Things (IoT) environment, aiming to increase the trust of users in IoT-based systems. In this paper, we 1) outline our research and results to date; and 2) propose a system that addresses the distinct issues related to security and privacy, hence overcoming the lack of user confidence which inhibits utilisation of IoT technology. The system integrates three highly novel technology concepts developed by the project partners.

1. Introduction

User Privacy Protection is a step further in user data protection mechanisms and aims at enabling users to take full control of their own personal data (i.e., to grant/deny sharing data with any entity, to be informed about all the processes the data are subject to and the conclusions drawn from them, to be able to retract their consent for any reason, and to delete for good the data collected by service providers). For this purpose, different privacy-preserving mechanisms have been developed [1,2,3], such as Anonymization applied to multi-attribute data, Differential Privacy applied to statistical databases, and Obfuscation/Perturbation, which can lead to data degradation and information loss. Meanwhile, the European regulation "General Data Protection Regulation" (GDPR - https://gdpr-info.eu, 2018) requires public institutions and service providers to provide a trustworthy environment for data sharing and usage. At the same time, the Internet of Things (IoT) has developed rapidly and introduced a whole new set of security issues. The SPIRIT project seeks to address these points holistically by developing a Proof-of-Concept system, which employs novel secure and privacy-ensuring techniques in services set up in an Internet of Things (IoT) environment, therefore aiming to increase the trust of users in IoT-based systems. Since privacy as a concept has a gradual nature, it cannot be perceived in an all-or-nothing manner [4].


In this paper, we outline our research and results to date, and propose a system that addresses the distinct issues related to security and privacy, hence overcoming the lack of user confidence which inhibits utilisation of IoT technology. The system integrates three highly novel technology concepts developed by the project partners, which are discussed in each of the following sections. The integration of these technologies will be demonstrated at the end of the project in an IoT use case scenario.

2. Content-based Signature

The aim of the content-based signature is to guarantee that the content read by the consumer has not been forged or modified intentionally from the original version. In order to create an appropriate automatic document security system, one needs to secure the textual and graphical information of the document, which requires complementary related topics. For instance, a recent tendency of graphical feature extraction conveys the idea of perceptually analyzing enclosed visual objects. Perceptual aspects would be approved within the proposed content authentication schemes, while preserving the sensibility constraint in view of forgery and post-processing operations. This should be appropriately achieved in a blind manner, i.e., the received document image is analyzed without referring to the original one. In the same vein, common image analysis processes could be involved to characterize graphical items, along with segmenting salient objects to separate textual and graphical information for further processing [5].

2.1. Digest Generation Scheme

Our approach is based on a hashing of the document's content [6] (Figure 1). The document is first straightened up, as rotations and geometrical distortions can appear during document acquisition. Then, the digital image of the document is analyzed by algorithms in order to extract the text, the tables, and the images/graphics. Afterwards, the layout, the text, the tables and the graphics are described by specific algorithms to obtain a compact representation. Finally, these descriptions are combined into a digest. Our recent work focused on the analysis, description and hashing of graphical parts of the document. Perceptual image hashing can be used to accomplish this task. Figure 2 shows a generic perceptual image hashing algorithm. A set of robust features is extracted from an image to generate a hash/digest as a compact representation [7]. This digest can be encrypted to make a signature and to compute a similarity measure between two image digests or signatures.

2.2. Content-based Hashing

In particular, we propose to use a hashing algorithm capable of securing the graphical parts of paper and digital documents with unprecedented performance and a very small digest. The main challenge for such an algorithm is that of stability with respect to print and scan noise. We define the generic notion of stability and how to evaluate it. Let fK(·) be a perceptual hash function with a secret key K, from the input space I to the output space O. This function


Figure 1. Algorithm for content-based digest generation.

Figure 2. Generic scheme of perceptual image hashing.

should be stable with respect to binary similarity functions s1 (for its input I) and s2 (for its output O) in such a way that ∀{a, b} ∈ I^2, s2(f(a), f(b)) = s1(a, b). Formally, suitable properties of fK(·) could be evinced as follows.

(1) Non-invertibility: the hash value is calculated from the image content and the secret key, where P(fK(I) −→ I) < th, with P denoting the probability.
(2) Sensibility: the hash discloses changes of perceptual content between two different images. Hence, P(fK(I) ≠ fK(ID)) > 1 − thd.
(3) Robustness: the hash produces very similar values for two perceptually content-similar images. Hence, P(fK(I) = fK(Is)) > 1 − ths.
(4) Security: the hash value can be calculated only with the secret key K.

Here, th, thd and ths are positive thresholds close to zero. To achieve a satisfactory performance, we used both dense local information and global descriptors and then defined the ASYCHA method [8]. We have tested our method on two datasets totaling nearly 45 000 logo images and compared our method to two state-of-the-art methods. Classical metrics have already been defined to estimate this probability on a given dataset. Among them we choose a set of four metrics regardless of the dataset bias towards positive or negative conditions:


• The false negative rate (FNR) is the probability that an authentic document is wrongly detected as modified. • The false positive rate (FPR) is the probability that a modified document is wrongly detected as authentic. • The false omission rate (FOR) is the probability that a document detected as modified is actually authentic. • The false discovery rate (FDR) is the probability that a document detected as authentic is actually modified. The provided results are shown in Table 1 according to the different metrics. The shown percentages demonstrate a significant improvement except for the FNR. This means that our algorithm is more sensitive in detecting perceptive changes in the image while maintaining an equivalent robustness to print and scan noise. Regardless, this algorithm has several shortcomings. The size of the digest is not fixed. It depends on the size of the image to be hashed which may be bigger as required. Furthermore, this hashing does not respect the non-invertibility requirement and an approximation of the original image content can be obtained from the digest. At least, the algorithm as is cannot compare two digests directly as the computation of the test image digest needs the hash of the original image. Table 1. Best results for the different methods. All the values should be as small as possible.

Metric       | Venkatesan [9] | Wu [10]       | ASYCHA
FNR (%)      | 0.3            | 5.2           | 8.2
FPR (%)      | 8.9            | 39.3          | 3.3 × 10^-3
FOR (%)      | 2.7 × 10^-3    | 3.4 × 10^-3   | 3.2 × 10^-3
FDR (%)      | 49.9           | 99.9          | 8.0
Digest size  | 500 Bytes      | 50 bits       | median 427 Bytes
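For reference, the four rates reported in Table 1 can be computed from the counts of true/false positives and negatives of an authentication decision ("authentic" vs. "modified"). The snippet below is a generic illustration of these definitions with made-up counts; it is not part of the SPIRIT evaluation code.

```python
def authentication_rates(tp, fp, tn, fn):
    """
    Rates for a detector labelling documents as 'authentic' (positive)
    or 'modified' (negative):
      FNR: authentic documents wrongly detected as modified
      FPR: modified documents wrongly detected as authentic
      FOR: documents detected as modified that are actually authentic
      FDR: documents detected as authentic that are actually modified
    """
    fnr = fn / (tp + fn)   # misses among truly authentic documents
    fpr = fp / (fp + tn)   # false alarms among truly modified documents
    for_ = fn / (fn + tn)  # authentic documents among those flagged as modified
    fdr = fp / (fp + tp)   # modified documents among those accepted as authentic
    return {"FNR": fnr, "FPR": fpr, "FOR": for_, "FDR": fdr}

# Hypothetical counts, for illustration only.
print(authentication_rates(tp=940, fp=12, tn=1030, fn=18))
```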

On the one hand, a multimedia authentication system targets perceptual content integrity, in contrast to a cryptographic hash function operating only on text messages. For example, a document image should still be considered authentic after bearing certain distortions (compression, filtering, additive noise, etc.). Geometric changes such as rotation and scaling operations would also be considered when recognizing the provided hash. On the other hand, a consistent hash function should be able to distinguish malicious tampering from content-preserving manipulations. There are many constraints on recognizing digitized documents in real-world applications, which typically rely on mobile devices that encounter lighting and acquisition problems [11,12]. Noisy and blurred versions of captured documents engender several challenges for security tasks. Accordingly, a consistent merge between an appropriate information extraction process and a corresponding modeling of the extracted features remains crucial. Superior


performance is required to point out different attacks such as copy-move forgeries. The underlying objective is to explore a meaningful space of data and to produce a digest of fixed size.
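To make the perceptual-hashing pipeline of Figure 2 concrete, here is a minimal, generic sketch: an average-hash over image blocks with a keyed bit permutation and a Hamming-distance comparison. It is only a simplified stand-in for illustration; it is not the ASYCHA method [8], and none of the names or parameters below come from the project code (numpy is assumed to be available).

```python
import hashlib
import numpy as np

def perceptual_hash(image, key, grid=8):
    """Toy keyed perceptual hash: block-average thresholding + keyed bit permutation."""
    h, w = image.shape
    bh, bw = h // grid, w // grid
    blocks = image[:bh * grid, :bw * grid].reshape(grid, bh, grid, bw).mean(axis=(1, 3))
    bits = (blocks > blocks.mean()).astype(np.uint8).ravel()     # 64 robust feature bits
    # Key-dependent permutation of the bit positions (a crude stand-in for "security").
    seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    perm = np.random.default_rng(seed).permutation(bits.size)
    return bits[perm]

def similarity(digest_a, digest_b):
    """Normalized Hamming similarity between two digests."""
    return 1.0 - np.mean(digest_a != digest_b)

rng = np.random.default_rng(0)
original = rng.random((256, 256))
noisy = np.clip(original + 0.02 * rng.standard_normal(original.shape), 0, 1)  # scan-like noise
tampered = original.copy()
tampered[64:128, 64:128] = 1.0                                                # forged region

key = b"secret-key"
d_orig = perceptual_hash(original, key)
print("noisy    vs original:", similarity(perceptual_hash(noisy, key), d_orig))
print("tampered vs original:", similarity(perceptual_hash(tampered, key), d_orig))
```

The small additive noise leaves the digest almost unchanged (robustness), whereas the forged region flips several bits (sensibility), mirroring the properties (2) and (3) listed above in a much weaker form than a real content-based hash.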

3. Knowledge Extraction The modules developed here provide knowledge (e.g., the name of a person on an identity card) contained in a document (e.g., an identity card or a payment slip) to a semantic firewall. The knowledge can be based on logical rules such as in [13,4], or learned from machine learning methods [14]. This will, based on a set of access rules, decide to allow or deny access to a given document. Where the decision is based on the information that the semantic firewall possesses about the data requester (e.g., an insurance company) and the requested data. In other words the goal of the firewall is to ensure that data is transmitted to the right requester. The two main tasks are (1) to implement a document classifier which helps to identify the document type before applying the right method for Information Extraction (IE), and (2) the implementation of an IE system. We have focused exclusively on extracting information from images, this choice is due to the agreement with the different partners of the project on how the use-case (that will involve all components produced by the partners of the project) will be conducted to produce a first version demonstrator. A description of the work now follows. Given an image document, the first step is to do Optical Character Recognition (OCR) to produce a set of tokens (words, numbers) that the document possesses along with the bounding boxes, which delimits the rectangular regions where these tokens are located. Since the OCR process is far from being perfect (two scans of the same document are likely to not result in an identical nor a 100% accurate OCR output), there is a need to correct this output. Such a process is called OCR Post-Correction (for brevity, we call it text correction), this correction is done on words only1 . The corrected words are used for both document classification and IE. The IE process involves the use of templates where each template is specific to one kind of document (e.g., invoices issued by a particular supplier correspond to one class of documents). Thus, in order to use the right template for IE, the category of the document has to be recognized beforehand. In the following, we describe the methods we have applied for each task. 3.1. OCR-Post Correction We treated OCR Post-Correction as an Information Retrieval task, where the OCR output (a string recognized by the OCR engine2 ) is treated as a query and the goal is to look for the set of words in a vocabulary (e.g., a vocabulary of English words) which are relevant to the query. The more similar (in its character-wise structure) a word of the vocabulary 1 on the other hand, the correction of numbers requires either hand design of rules or making a training datasets to learn how to correct numbers automatically. 2 We used Google's open source OCR engine Tesseract https://en.wikipedia.org/wiki/Tesseract (software)).

134

J. Murphy et al. / A Framework for Security and Privacy for the Internet of Things (SPIRIT)

is to the query, the more relevant that word is e.g., the word ”designed” is more relevant to the query ”deslijgned” than the word ”aligned” (a reference measure of similarity between tokens is the Levenshtein distance3 ). We have made two approaches to do OCR-Post correction. The first approach is based on (exact search instead of approximate search) finding the longest substrings that are included in both the query and at least one word in the vocabulary, these substrings are retrieved by mining the maximal frequent patterns45 from a sequence database composed of sequences each corresponding either to the query or a word of the vocabulary6 , and the search produces candidate corrections (which are words of the vocabulary whose corresponding sequences have the longest subsequences included in the query) then a final ranking based on the Levenshtein distance is applied to keep the best results7 . The novelty of this approach is how the bitmaps are built, if we denote by V the set of sequences corresponding to the vocabulary database (which contains sequences corresponding to the words of the vocabulary) and q the sequence corresponding to the query, the bitmaps are computed efficiently without a whole scan of the vocabulary database each time a new query is presented, the produced bitmaps are the same as if they were computed on V ∪ {q}, the mined patterns all have to be included in the query and also at least one sequence of the vocabulary database. The second approach is based on learning a vocabulary embeddings, which is based on the character-wise structure of words (as in fastText8 [16]) where the vector representation (the embedding) of a word is given by the sum of the embeddings of its n-grams9 which is useful to handle the Out Of Vocabulary words that are often produced by OCR engines, these embeddings associate any word to a vector where words with similar structure should be close (the closeness is measured using a geometrical measure of similarity like the Euclidean distance), then a KD-tree variant indexing the embedding vectors of the words in the vocabulary (the trees are balanced, yielding a O(log2 n) complexity for search, n denotes the vocabulary size) is used to find the (this is an Approximate Nearest Neighbor search) words of the vocabulary that are most similar to the query.

3 https://en.wikipedia.org/wiki/Levenshtein_distance
4 For space availability reasons, we refer the reader to [15] for introductory definitions of Sequential Pattern Mining (SPM) and also a pioneering use of bitmaps in Sequential Pattern Mining; the bitmaps serve to compute the support of a pattern efficiently. The support of a pattern is the number of sequences of the database in which it is included. A pattern P is said to be included in a sequence S if P is a subsequence of S. A pattern P is said to be frequent if it is included in at least σ sequences of the database, where σ ∈ N is a threshold set by the user.
5 A pattern is maximal if it is not included in any other pattern.
6 E.g., the word "invoice" is transformed into the sequence < (i)(n)(v)(o)(i)(c)(e) >; each itemset is delimited by parentheses (which is the way itemsets are typically represented in SPM, rather than using braces), and the items of the sequences that we produce are singletons containing characters.
7 E.g., given the query q="mterdlsmphnary" (an OCR result for the word "interdisciplinary"), the Levenshtein distance makes the candidate "interdisciplinary" more similar to the query than the candidate "interfilamentary".
8 https://en.wikipedia.org/wiki/FastText
9 E.g., for n = 5 and the word "different", the produced embedding (as described in [16]) is the sum of the embeddings of the 5-grams (the start "<" and end ">" symbols are added by the authors of [16] to distinguish between prefixes and suffixes) "<diff", "diffe", "iffer", "ffere", "feren", "erent", "rent>" and the special sequence "<different>".
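The two correction strategies described above can be approximated by a much simpler baseline: retrieve vocabulary words that share many character n-grams with the OCR output, then rerank the candidates by Levenshtein distance. The sketch below is only such a simplified stand-in with a toy vocabulary (our own illustration), not the project's pattern-mining or embedding implementation.

```python
def ngrams(word, n=3):
    """Character n-grams with boundary markers, as a set."""
    padded = f"<{word}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def correct(query, vocabulary, n=3, candidates=5):
    """Retrieve by n-gram overlap (Jaccard), then rerank by edit distance."""
    q = ngrams(query, n)
    scored = sorted(vocabulary,
                    key=lambda w: -len(q & ngrams(w, n)) / len(q | ngrams(w, n)))
    return min(scored[:candidates], key=lambda w: levenshtein(query, w))

vocabulary = ["interdisciplinary", "interfilamentary", "invoice", "designed",
              "aligned", "quantity", "total", "different"]
print(correct("mterdlsmphnary", vocabulary))   # -> "interdisciplinary"
print(correct("deslijgned", vocabulary))       # -> "designed"
```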


3.2. Document Classification Methods for document classification (involving only image documents) include the following. Generalized n-gram approaches which makes use of a generalization of n-grams to describe documents in the same way a sentence in NLP (Natural Language Processing) can be described by the n-grams it contains. Some examples of Generalized n-gram approaches for document classification are [17] where logical labels are combined with spatial (left, right, above, below) relationships between these labels, [18] which proposed a kind of n-grams (N x M grams) on bits of a binarized image (it is also a texture based approach, it uses texture features to describe documents). Layout based approaches which are based on finding the documents that have similar layout to an input document (some make also use of textual information e.g., [19]) [20], [19], [21]. Template-based approaches where the classification is based on finding which templates correspond most to an input document (e.g., [22]). While the simplest methods for Text-based document classification that yield a satisfying performance are bag of words models (typically they make use of n-grams as features), which assign a document to a feature vector where the features correspond to ngrams of words appearing in the sentences of the document, and then training a classifier (logistic regression, SVM, neural models) in a supervised way. The approach we used for document classification proceeds as follows: 1. It starts by extracting from the training data (each line output by the OCR engine is transformed into a sequence) the maximal frequent sequential patterns that are representative of a certain kind of documents (patterns that are indicative of one category of documents). The extracted patterns are used as features for classification, i.e. if a document contains a relevant pattern then its corresponding feature (in the feature vector) is set to 1, otherwise it is set to 0. The gap constraints10 we used for learning these patterns are minGap = 0 and maxGap = 1 (the minGap and maxGap constraints will be particularly useful when considering the spatial positions of words in the document in the next version of our classifier). We rank patterns by a measure similar to the C-value proposed in [23] used to extract relevant terms from text documents; 2. it uses the extracted patterns for each category as features for a multi-class classifier (there's one label to predict per document), we experimented with different kinds of models (multinomial logistic regression, SVM, feed forward neural networks with hidden layers), the one that performed well (at least 92% of macro F-Score) within reasonable runtime is the multinomial logistic regression model. The experiments we conducted for classification were done on a corpus of wikipedia articles containing 21 categories: astronomy, biology, economy, food, football, genetics, geography, health, heritage, informatics, literature, mathematics, medicine, movie, music, politics, religion, rugby, sculpture, skating, tennis. 10 Given gap constraints (minGap, maxGap) ∈ N2 where minGap ≤ maxGap, the pattern/sequence P =< J1 J2 J3 . . . Jm > is said to be included in another sequence S =< I1 I2 I3 . . . In > satisfying the gap constraints (minGap, maxGap), if there is a sequence of integers 1 ≤ u1 < u2 < u3 < · · · < um ≤ n, such that ∀k ∈ {1, . . . , m}, Jk ⊆ Iuk , and ∀k ∈ {2, . . . , m}, minGap ≤ uk uk 1 1 ≤ maxGap.

Still, since we are interested in classifying image documents, the evaluation has to be done on a proper dataset for image-document classification, with categories such as invoices, receipts, etc. We are working on creating such a dataset for testing the integration of both components, the document classifier and the IE module.

3.3. Information Extraction – IE

The IE system we have built is a template-based system using two kinds of templates. The first is called the Anchoring template and contains information about anchors, which are keywords that are specific to certain kinds of documents (e.g., the words "invoice", "quantity", "qty" and "total" are indicative of the category of invoice documents), together with where these keywords are located. The spatial information is used to compute relative positions between the different anchors when mapped onto a document; these relative positions help to match the anchors on an input document regardless of variations of scale and shift (the template must be matched however big, small or shifted the document is). The second kind of template is called the Filling template and contains information about the fields to be extracted, expressed with respect to the anchors defined in the Anchoring template. When we know where the anchors are located (e.g., the keyword "Invoice" in an invoice document of a specific supplier) and the relative position of the field of interest (e.g., the name of the client) with respect to the anchors, we can then have a good idea of the exact position of the field of interest. The IE process we built is robust to changes in scale and shift, and to cases where minor parts of the document are missing. We also built a Graphical User Interface to interact with the IE module.

Preliminary evaluations of the IE module (which is still under development, see the missing functionalities below) were conducted manually (hand checking) on a very small set of documents (twenty documents); some were hand-made (insurance cards), the others were collected (receipts). For each type of document we made the corresponding templates to perform IE. We are preparing a proper benchmark in which we generate a dataset of images (including the cases we want to test, i.e., changes of scale and shift, and missing parts in the documents) for specific categories (those that we have for now are insurance cards and receipts). This dataset is still under construction. The major functionalities still missing from our system are: (1) processing documents with varying fields (e.g. product lines in an invoice); (2) learning templates in an automatic way; (3) taking into account the spatial structure (where words are located) and visual information (pixels) of documents when doing document classification; and (4) doing named entity disambiguation. The IE system also still needs to integrate both modules (the classifier and the IE system), after which it needs to be tested and benchmarked using the datasets mentioned above for document classification and IE.
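As a hedged, minimal sketch (not the actual SPIRIT module; the anchor words, coordinates and helper name are illustrative assumptions), the following R snippet shows how two matched anchors can be used to estimate the per-axis scale and shift of a scanned document and then predict where a field of interest lies, which is what makes the template lookup robust to resized or shifted scans.

template_anchors <- data.frame(word = c("Invoice", "Total"),
                               x = c(100, 450), y = c(60, 720))
template_field   <- c(x = 300, y = 140)          # e.g. expected client-name position

locate_field <- function(observed_anchors, template_anchors, template_field) {
  # per-axis affine fit (scale + shift) from the two matched anchors
  sx <- diff(observed_anchors$x) / diff(template_anchors$x)
  sy <- diff(observed_anchors$y) / diff(template_anchors$y)
  tx <- observed_anchors$x[1] - sx * template_anchors$x[1]
  ty <- observed_anchors$y[1] - sy * template_anchors$y[1]
  c(x = tx + sx * template_field["x"], y = ty + sy * template_field["y"])
}

# anchors as reported by the OCR engine on a scan enlarged by 20% and shifted by 30 px
observed <- data.frame(x = c(150, 570), y = c(102, 894))
locate_field(observed, template_anchors, template_field)   # predicted field position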

4. Semantic firewall

The concept of a "semantic firewall" provides an environment where data owners and data consumers can agree upon a set of data-sharing rules which are compliant with the GDPR. For instance, it becomes possible to grant or deny access to data, as performed by
any Access Control rule, but the owner can also specify privacy functions that can be applied so that data is shared in a distorted form, such as aggregation, obfuscation, anonymization, etc. This work is an extension of a previously published work [13].

4.1. Knowledge Base Specification

We detail in this section our formal model for knowledge base implementation, in terms of the vocabulary and syntax operators used to build a knowledge base made of facts and rules. We consider the positive existential syntactic fragment of first-order logic, denoted by FOL(∧, ∃) [24,25], whose formulas are built with the connectors implication (→) and conjunction (∧), and the usual quantifiers (∃, ∀).

Definition 1 (Vocabulary) Let V = (C, P, V) be a vocabulary composed of 3 disjoint sets, where C is a finite set of constants, P is a finite set of predicates and V is an infinite set of variables, such that:
• a term t over V is a constant (t ∈ C) or a variable (t ∈ V) referring to a unique value (unique name assumption),
• an atomic formula (or atom) over V is of the form p(t1, ..., tn), where p is an n-ary predicate (n ∈ N) and t1, ..., tn are terms,
• a ground atom is an atom with no variables,
• a variable in a formula is free if it is not in the scope of any quantifier,
• a formula is said to be closed if it has no free variables,
• a conjunct is a finite conjunction of atoms,
• a ground conjunct is a conjunct of ground atoms,
• given an atom or a set of atoms A, vars(A), consts(A) and terms(A) denote its set of variables, constants and terms, respectively.

Definition 2 (Fact) A fact on V is the existential closure of a conjunct over V.

Rules are logical formulas that are used to infer new facts from other facts. Rules may contain variables and should account for unknown individuals, to capture the case where some information is incomplete. Therefore, we consider existential rules.

Definition 3 (Existential Rules) An existential rule (or simply a rule) is a first-order formula of the form r = ∀X ∀Y (H[X, Y] → ∃Z C[Y, Z]), where vars(H) = X ∪ Y and vars(C) = Y ∪ Z, and where H and C are conjuncts called the hypothesis (or body) and the conclusion (or head) of r, respectively. We denote by r = (H, C) a contracted form of rule r. H and C must not be empty.

Negative constraints are a special case of existential rules that allow representing knowledge dictating how things ought not to be in a given domain.

Definition 4 (Negative Constraints) A negative constraint (or simply constraint) N is a first-order formula of the form N = ∀X (H[X] → ⊥), where H[X] is a conjunct called the hypothesis of N and X is a sequence of variables appearing in the hypothesis.

We can define our knowledge base as a collection of facts, rules and negative constraints, denoted by K = (F, R, N). To be able to perform deduction, rules have to be
used with the facts to produce new facts. In logic, this is a simple application of the Modus Ponens inference rule. In our case, the application of a rule Ri on a fact Fj is the substitution of variables that makes the body of Ri look like Fj. Thus, it is a homomorphism that maps body(Ri) to Fj.

Definition 5 (Rule Application [25]) A rule R = H → C is applicable to a fact F if there is a homomorphism π from H to F. The application of R to F by π produces a fact α(F, R, π) = F ∪ π_safe(C), where π_safe is the safe substitution that replaces existential variables with fresh variables. α(F, R, π) is said to be an immediate derivation from F.

The process of rule application creates new facts, making other rules applicable. The process of such successive applications of rules on facts is called R-derivation [25].

Definition 6 (Closure) Given a fact F ∈ F and a set of rules R, the closure of F with respect to R, denoted by Cl_R(F), is defined as the smallest set (with respect to ⊆) which contains F and is closed under R-derivation.

Definition 7 (Consistency/Inconsistency) Given a knowledge base K = (F, R, N), a set F ⊆ F is R-inconsistent iff there exists a constraint N ∈ N such that Cl_R(F) |= H_N (or Cl_R(F) |= ⊥), where |= denotes first-order semantic entailment (given two facts f and f′, f |= f′ iff there is a homomorphism from f′ to f). Otherwise, F is said to be R-consistent.

4.2. Privacy-aware query answering

We express the problem of privacy preservation as a query evaluation process in a knowledge base, such that a given query is extended with information about the requester and with its privacy restrictions defined in the ontology. The privacy algorithm we developed starts when it receives a data query Q from a requester. The requester also includes its identification information. The ontology then checks the access rights defined by the data owner as privacy rules. If the requester does not have any access right to the data, the query is rejected. If the requester has no access limit to the data, the privacy-aware query is identical to the original query (QP = Q). If the requester has access to the data under some privacy rules, then a privacy-aware query, denoted by QP, is generated based on the privacy functions to apply to the requested data. Once query QP is obtained, it is processed by the data retrieval module to deliver the protected data to the requester.

4.3. Proof of Concept and Prototyping

For the implementation, we need to consider the decidability issue of the logical framework. We notice that the R-derivation process can be infinite when recursive rules are expressed. Therefore, in our implementation we discard recursive rules from our sublanguage. Furthermore, we limit our prototype to the sublanguage defined by Description Logics [26], which are decidable and for which many reasoning engines are already implemented and available online to check the consistency of the knowledge base. The access rules are implemented as SWRL rules [27]. We have shown in [28] the following result.

Proposition 1 Let K = (F, R, N) be a knowledge base, and let S be the set of all SWRL rules. Then S ⊆ R, i.e., ∀r ∈ S, r ∈ R.
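For illustration, here is a small worked example of Definitions 3–5 (the predicates are ours, not taken from the SPIRIT ontology). Consider the rule R = ∀x (wearableOf(x, alice) → ∃z generates(x, z) ∧ healthData(z)) and the fact F = wearableOf(w1, alice). The homomorphism π = {x ↦ w1} maps the body of R to F, so the rule application of Definition 5 yields the immediate derivation α(F, R, π) = F ∪ {generates(w1, z0), healthData(z0)}, where z0 is the fresh variable introduced by the safe substitution π_safe. If the knowledge base also contained the negative constraint N = ∀y (healthData(y) ∧ publicData(y) → ⊥), then any R-derivation that additionally marks z0 as public data would make the fact base R-inconsistent in the sense of Definition 7.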

Figure 3. Aggregated data sent to my employer.

Figure 4. Data sent to my doctor.

The use cases we implemented are as follows:
• My employer requests my sport-activity data, and my privacy rule is: "do not share the time and location, and aggregate the other attributes". Figure 3 displays the results computed for my employer (a hedged sketch of applying this rule is given just before Section 5).
• My doctor requests the same data, and the rule is: "share all the data with no limit". Figure 4 displays the results that would be shared with my doctor.

4.4. Remaining functions to implement

To evolve the prototype, the next step is to implement the privacy toolkit defining the privacy functions mentioned in the SPIRIT ontology, such as anonymization, obfuscation, data shelf-life, etc. Within the SPIRIT integration chain, the semantic firewall is located between the knowledge extraction module and the encryption module. Therefore, we need to implement the interface with the knowledge extraction module, to feed the ontology with the required data according to the use cases considered, and to define the format of the data to send to the encryption module.
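As announced above, the following hedged R sketch (not the SPIRIT prototype; column names and values are illustrative assumptions) shows what the employer rule amounts to when applied to a small activity log before it leaves the semantic firewall.

activity <- data.frame(
  time      = c("07:30", "12:10", "18:45"),
  location  = c("park", "office gym", "riverside"),
  steps     = c(4200, 1500, 6300),
  heart_bpm = c(110, 95, 122)
)

share_with_employer <- function(log) {
  kept <- log[, setdiff(names(log), c("time", "location")), drop = FALSE]  # drop time and location
  as.data.frame(lapply(kept, mean))                                        # aggregate the remaining attributes
}
share_with_doctor <- function(log) log     # "share all the data with no limit"

share_with_employer(activity)              # one aggregated row, cf. Figure 3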

5. Device-Based Data Encryption

The aim of this work is to develop practical approaches to facilitate secure communication between distributed IoT devices and the semantic firewall by analyzing the probability
density functions (PDFs) of data extracted from the sensors of the communicating IoT devices. This approach has already been developed in previous work, for example using processor cache signatures, and is known as ICMetrics. However, it has not been applied to sensor-enabled IoT devices, which feature limited computational ability. Furthermore, their raw sensor data is inherently analog before analog-to-digital conversion and exhibits mean-reverting volatility around some quiescent state (e.g., 0.145 at t0, 0.1578 at t1 and 0.1256 at t2 for an accelerometer axis or for ambient light sensor readings). Since the overall approach has already been developed, the work to date has focused on adapting it to this application domain. Volatility is the predominant issue in datasets that are not machine created (processor caches or software instruction signatures are machine created), because it causes unstable and inconsistent sensor PDFs. We therefore describe here a method, also documented in [29], which we have developed using hidden Markov models (HMMs) in an endeavor to replicate the operation of an opto-coupler; opto-couplers are frequently used in hardware security to obtain stable or less noisy signals from analog sources for analysis, e.g., in smartcard differential power analysis attacks.

and influenced by their environment; and obviously intuitive since one is digitally created and one is not. But it was not clear that perhaps something similar could be exploited to improve stability, and which spawned the following idea: if synthetic PDFs can be generated from sensor data, and thus making them machine generated, stability should improve (or at least by an appreciable percentage). In researching this, we also noted a similar process, conceptually, to this exists in hardware via opto-couplers, which take in a dirty signal and infer a clean digital signal from light induced variations. This method is also often used in side-channel analysis of smartcards to crack their secure IDs from cleaned up power signals. Therefore, we have investigated methods to emulate an opto-coupler’s operation with the purpose of generating synthetic PDFs from raw sensor data, and led to a method of using Hidden Markov Models (HMMs) to output synthetic PDFs as follows. A HMM is initially trained on a data set of sensor data (e.g., 1000 samples from an ambient light sensor). Its parameters are stored and then reused to rebuild the HMM later, then a new data set of sensor data is used with it to make a set of state predictions to form a synthetic PDF. More precisely, as the new sensor data is passed through the HMM the probability of being in a certain state changes over time—for example, moving from state 1 to state 4 to state 2—which naturally leads to a count for each state for the sensor data set, and which can be used to build a synthetic PDF. This new digital PDF can then be used in the regular ICMetrics approach to generate an ID. We now discuss in detail the method behind the idea, since we have not found any other works that are related. The work is also covered in [29]. The scientific novelty is using HMMs to improve stability of IDs from synthetic PDFs. 5.1.1. Background The technology and novelty of ICMetrics is that it is designed to be similar to what can be found in real life and in a human that is unique and can be used to generate digital IDs e.g., DNA, a thumb fingerprint or an eye’s iris. Other technologies from the literature that work in a similar way include: Physical Unclonable Functions (PUF) [32], hard-wired digital keys [33], biometrics [34], dynamic encryption and passwords [35]. However, ICMetrics can be considered technologically more of a hybrid approach, given that it exploits data which can be extracted from hardware (or software) as might be available or best suited to the application at hand. This is then used to form PDFs whose parameters are used to form unique IDs. In previous publications, namely [36] and [37], it was explored how viable and if at all possible it was to apply the core technology to generate stable IDs from a processors software execution signature. Here, the main idea was to exploit the program counter (PC) as the data source, given that the PC signature yields distinct PDFs from different programs, in a similar way malware analysis. The main findings of the work are presented in [38]. It can be observed of this approach, that the PDFs formed are purely digital and synthetic, and thus effectively noise free and highly stable. Conversely, any PDF formed from reading sensor values will have a high amount of noisy and changing values simply due to environmental effects. For example, an ambient light sensor’s values require light to be at exactly the same level to get repeatable readings. 
Therefore, a means to generate a stable synthetic PDF from analogue data is desirable. We have not found anything to this end in the literature, only an approach going the opposite way, from PDFs to synthetic time series, in [39].

5.1.2. Methodology

A hidden Markov model is a stochastic signal model which was first introduced in [40] and is based on the following assumptions:
1. an observation at time t was generated by a hidden state;
2. the hidden states are finite and satisfy the first-order Markov property;
3. the matrix of transition probabilities between these states is constant;
4. the observation at time t of an HMM has a certain probability distribution corresponding with one of the possible hidden states.

Although HMMs were developed in the 1960s, a maximization method to calibrate the model's parameters was not presented until the 1970s, in [41]. Since more than one observation can be generated by a hidden state, the authors in [42] introduced a maximum likelihood estimation method to train HMMs with multiple observation sequences, assuming that all the observations are independent. From this, two main types of hidden Markov models can be built: discrete HMMs and continuous HMMs.

The actual parameters of an HMM are the (constant) state transition matrix A, the observation probability matrix B and the vector p, as follows:

  λ ≡ {A, B, p}.

If we have infinitely many possible symbols for each hidden state, the symbol v_k is omitted from the model, and the conditional observation probability b_ik is:

  b_ik = b_i(O_t) = P(O_t | q_t = S_i).

If the probabilities are continuously distributed, we have a continuous HMM. In this work, we assume that the observation probability is a Gaussian distribution and, therefore,

  b_i(O_t) = N(O_t = v_k, μ_i, σ_i),

where μ_i and σ_i are the mean and variance of the distribution corresponding to the state S_i, respectively. The parameters of an HMM are then:

  λ ≡ {A, μ, σ, p},

where μ and σ are the vectors of means and variances of the Gaussian distributions.

Three main questions can be answered when applying an HMM to solve a real-world problem:
1. Given the observation data O = {O_t, t = 1, 2, ..., T} and the model parameters λ = {A, B, p}, calculate the probability of the observations, P(O | λ).
2. Given the observation data O = {O_t, t = 1, 2, ..., T} and the model parameters λ = {A, B, p}, find the "best fit" state sequence Q = {q_1, q_2, ..., q_T} of the observation sequence.
3. Given the observation sequence O = {O_t, t = 1, 2, ..., T}, calibrate the HMM's parameters λ = {A, B, p}.

These problems can be solved by using the main HMM algorithms, as below:
1. Find the probability of the observations: forward or backward algorithm.
2. Find the "best fit" hidden states of the observations: Viterbi algorithm.
3. Calibrate the parameters of the model: Baum–Welch algorithm.
The most important of the HMM algorithms is the Baum–Welch algorithm, which calibrates the parameters of an HMM given the observation data. HMMs have been widely used in mathematics to predict economic cycles and for speech/text recognition. However, in this paper we propose a method of using them for the purpose of generating stable PDFs by emulating the operation of opto-couplers. In this method, we start by training an HMM using a raw sensor data set of a fixed length for a given sensor:

O = {O_t, t = 1, 2, ..., T}, where O_t is the sensor sample at time t. The number of HMM states used should correspond to the number of bins of the PDF of the raw training data set. We also assume that the distribution corresponding with each hidden state is a Gaussian distribution. Next, given a trained HMM for a sensor, we simply predict the PDF given another raw data set from the same sensor. The prediction is executed by simply running the data set through the HMM and recording the predicted state changes. A synthetic PDF can then be constructed from the tally of states. If the correct sensor has been used, the resultant PDF should correspond to the trained HMM PDF, from which parameters can be extracted to generate an ID.

5.1.3. Evaluation

To develop and evaluate the method we have made extensive use of an Arty FPGA board, together with a variety of PMOD sensors to extract data from. For ease of analysis, sensor data was recorded straight into a database and file storage under various conditions, so as to exercise the full range of sensor data as per a given sensor's usage. For example, an accelerometer was moved through each axis, and the ambient light sensor's data was extracted under varying lighting conditions. This approach was taken to ease back-end analysis, rather than trying to perform it all in real time and potentially missing points or data anomalies of interest. Once the batch sensor data sets had been collected, training of the HMMs and prediction was performed in the open-source statistical analysis software known as R. Prior to settling on R, various other software solutions were investigated for their HMM capabilities (such as Ruby, Python and Matlab); however, the availability and open-source nature of R's HMM packages proved the most reliable and easiest to work with. Various HMM solutions are available in R, so the most practical was chosen and is discussed here, namely depmixS4 with R version 3.4.4. To use any of the code presented, the reader will need to first install version 3.4.4 and also load depmixS4 as an R library. For the experiments, the relevant sensor data was simply read from file as a CSV and then used with the following core code listings.

dataDiffT   <- as.numeric(diff(dataT))                                                              # line 1: first differences of the raw training samples
hmmT        <- depmix(dataDiffT ~ 1, family = gaussian(), nstates = 3, data = data.frame(dataDiffT = dataDiffT))  # line 2: Gaussian HMM with 3 states
hmmfitT     <- fit(hmmT, verbose = FALSE)                                                           # line 3: train the HMM
post_probsT <- posterior(hmmfitT)                                                                   # line 4: posterior state probabilities
hist(post_probsT$state)                                                                             # line 5: histogram (PDF) of the predicted states
Algorithm 1. Building a HMM model and histogram in R

dataDiffP   <- as.numeric(diff(dataP))                                                              # line 1: first differences of the prediction data
hmmP        <- depmix(dataDiffP ~ 1, family = gaussian(), nstates = 3, data = data.frame(dataDiffP = dataDiffP))  # line 2: placeholder HMM
hmmP        <- setpars(hmmP, getpars(hmmfitT))                                                      # line 3: reuse the trained parameters
hmmfitP     <- fit(hmmP, verbose = FALSE)                                                           # line 4: fit to the new data
post_probsP <- posterior(hmmfitP)                                                                   # line 5: posterior state probabilities
hist(post_probsP$state)                                                                             # line 6: synthetic PDF of the predicted states
Algorithm 2. Predicting a HMM model and histogram in R

The first listing, Algorithm 1, takes on line 1 the difference between all the samples to form a training data set named dataDiffT, from which a Gaussian HMM with three states is built on line 2; the state count was varied over differing ranges during the experiments, and three is just a placeholder in the code. We found that it is not guaranteed that an HMM can be built given the data and the number of states, or even that the resultant histogram is actually useful: all input data can map to just one HMM state, for instance. To address this, initializing R's seed, for example with "set.seed(1)", to different values helps mitigate the issue, as does choosing a state count that matches the number of bins in the raw data set's PDF. Next, the HMM is trained and fit to the input training sensor data on line 3, and the posterior probabilities are extracted on line 4 into the variable post_probsT. Lastly, a histogram is constructed by extracting the state tally from this variable with post_probsT$state. All the variable names in the code are appended with the letter 'T' to signify that they relate to training data.

Once an HMM had been trained on a sensor's raw data set, it was just a matter of saving the HMM parameters. They are simply reloaded as required to make a prediction with new sensor data and to output a synthetic PDF of the HMM states. The code listing to accomplish this in R is given in Algorithm 2. The differences of the sensor data to be used for prediction are read and stored into dataDiffP on line 1. Then a placeholder HMM is constructed on line 2, whose parameters are updated with those of the trained HMM on line 3. Next, an HMM is fit to the data using the parameters of the trained HMM, and the posterior probabilities are extracted into the variable post_probsP on lines 4 and 5. A prediction PDF is then built by extracting the states stored in post_probsP$state.

As an example of the method, Figure 5 shows a trained HMM PDF with three states for an ambient light sensor under room lighting conditions. Using two other data sets from the same sensor, also captured under room lighting, Figure 6 shows two synthetic PDFs constructed with the trained HMM in prediction mode.

Figure 5. Ambient sensor HMM trained PDF (histogram of frequency vs. predicted state).

Figure 6. Two ambient sensor HMM prediction PDFs (histograms of frequency vs. predicted state).
For each of the synthetic PDFs there is a clear consistency and self-evident stability in comparison to the trained HMM PDF.

6. Conclusions

We have outlined our research and results to date, and a proposed system to address the distinct issues related to security and privacy, hence overcoming the lack of user confidence which inhibits the utilisation of IoT technology. The system integrates three highly novel technology concepts developed by the project partners, as discussed. The next step is the integration of these technologies in a demonstrator, at the end of the project, in an IoT use-case scenario.

References
[1] I. J. Vergara-Laurens, L. G. Jaimes, and M. A. Labrador, "Privacy-preserving mechanisms for crowdsensing: Survey and research challenges," Internet of Things Journal, vol. 4, pp. 855–869, Aug. 2017.
[2] I. Wagner and D. Eckoff, "Technical privacy metrics: a systematic survey," ACM Computing Surveys, vol. 51, Jul. 2018.
[3] F. du Pin Calmon and N. Fawaz, "Privacy against statistical inference," in 50th Annual Allerton Conference on Communication, Control, and Computing, 2012, pp. 1401–1408.
[4] S. Alboaie, L. Alboaie, and A. Panu, "Levels of privacy for eHealth systems in the cloud era," in 24th International Conference on Information Systems Development (ISD), Harbin, 2015.
[5] X. Wang, K. Pang, X. Zhou, Y. Zhou, L. Li, and J. Xue, "A visual model-based perceptual image hash for content authentication," IEEE Transactions on Information Forensics and Security, vol. 10, no. 7, pp. 1336–1349, 2015.
[6] S. Eskenazi, P. Gomez-Krämer, and J.-M. Ogier, "When document security brings new challenges to document analysis," in International Workshop on Computational Forensics (IWCF), ser. Lecture Notes in Computer Science (LNCS), vol. 8915. Springer, 2015, pp. 104–116.
[7] M. Schneider and S.-F. Chang, "A robust content based digital signature for image authentication," in Proceedings of the 3rd IEEE International Conference on Image Processing, vol. 3. IEEE, 1996, pp. 227–230.
[8] S. Eskenazi, B. Bodin, P. Gomez-Krämer, and J.-M. Ogier, "A perceptual image hashing algorithm for hybrid document security," in 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1. IEEE, 2017, pp. 741–746.
[9] R. Venkatesan, S.-M. Koon, M. Jakubowski, and P. Moulin, "Robust image hashing," in International Conference on Image Processing (ICIP). IEEE, 2000, pp. 664–666.
[10] D. Wu, X. Zhou, and X. Niu, "A novel image hash algorithm resistant to print-scan," Signal Processing, vol. 89, no. 12, pp. 2415–2424, 2009.
[11] M. Ô. V. Ngoc, J. Fabrizio, and T. Géraud, "Saliency-based detection of identity documents captured by smartphones," in 2018 13th IAPR International Workshop on Document Analysis Systems (DAS). IEEE, 2018, pp. 387–392.
[12] K. Bulatov, V. V. Arlazarov, T. Chernov, O. Slavin, and D. Nikolaev, "Smart IDReader: Document recognition in video stream," in 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 6. IEEE, 2017, pp. 39–44.
[13] N. Tamani and Y. Ghamri-Doudane, "Towards a user privacy preservation system for IoT environments: a habit-based approach," in IEEE International Conference on Fuzzy Systems, 2016, pp. 2425–2432.
[14] D. Hu, X. Hu, W. Jiang, S. Zheng, and Z.-Q. Zhao, "Intelligent digital image firewall system for filtering privacy or sensitive images," Cognitive Systems Research, vol. 53, pp. 85–97, 2019.
[15] J. Ayres, J. Flannick, J. Gehrke, and T. Yiu, "Sequential pattern mining using a bitmap representation," in KDD '02: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Jul. 2002. [Online]. Available: https://doi.org/10.1145/775047.775109
[16] P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, "Enriching word vectors with subword information," Transactions of the Association for Computational Linguistics, vol. 5, pp. 135–146, 2016.
[17] R. Brugger, A. Zramdini, and R. Ingold, "Modeling documents for structure recognition using generalized n-grams," in Proceedings of the Fourth International Conference on Document Analysis and Recognition, Aug. 1997. [Online]. Available: https://doi.org/10.1109/ICDAR.1997.619813
[18] A. Soffer, "Image categorization using texture features," in Proceedings of the Fourth International Conference on Document Analysis and Recognition, Aug. 1997. [Online]. Available: https://doi.org/10.1109/ICDAR.1997.619847
[19] G. Maderlechner, P. Suda, and T. Brückner, "Classification of documents by form and content," Pattern Recognition Letters, vol. 18, pp. 1225–1231, 1997. [Online]. Available: https://doi.org/10.1016/S0167-8655(97)00098-6
[20] E. Appiani, F. Cesarini, A. Colla, M. Diligenti, M. Gori, S. Marinai, and G. Soda, "Automatic document classification and indexing in high-volume applications," International Journal on Document Analysis and Recognition, vol. 4, no. 2, pp. 69–83, Dec. 2001. [Online]. Available: https://doi.org/10.1007/PL00010904
[21] F. Cesarini, M. Lastri, S. Marinai, and G. Soda, "Encoding of modified x-y trees for document classification," in Proceedings of the Sixth International Conference on Document Analysis and Recognition, Sept. 2001. [Online]. Available: https://doi.org/10.1109/ICDAR.2001.953962
[22] P. Sarkar, "Learning image anchor templates for document classification and data extraction," in 2010 20th International Conference on Pattern Recognition, Aug. 2010. [Online]. Available: https://doi.org/10.1109/ICPR.2010.837
[23] K. Frantzi, S. Ananiadou, and H. Mima, "Automatic recognition of multi-word terms: the C-value/NC-value method," International Journal on Digital Libraries, vol. 3, no. 2, pp. 115–130, Aug. 2000. [Online]. Available: https://doi.org/10.1007/s007999900023
[24] M. Chein and M.-L. Mugnier, Graph-based Knowledge Representation: Computational Foundations of Conceptual Graphs, 1st ed. Springer Publishing Company, Incorporated, 2008.
[25] J.-F. Baget, M. Leclère, M.-L. Mugnier, and E. Salvat, "On rules with existential variables: Walking the decidability line," Artificial Intelligence, vol. 175, no. 9, pp. 1620–1654, 2011.
[26] D. Calvanese, G. D. Giacomo, D. Lembo, M. Lenzerini, and R. Rosati, "DL-Lite: Tractable description logics for ontologies," in American Association for Artificial Intelligence (AAAI), 2005, pp. 602–607.
[27] W3C, "SWRL: A semantic web rule language combining OWL and RuleML," May 2004. Last accessed Dec. 2008 at: http://www.w3.org/Submission/SWRL/
[28] N. Tamani, S. Ahvar, G. Santos, B. Istasse, I. Praça, P. Brun, Y. Ghamri-Doudane, N. Crespi, and A. Bécue, "Rule-based model for smart building supervision and management," in IEEE International Conference on Services Computing (SCC), San Francisco, CA, USA, 2018, pp. 9–16.
[29] J. Murphy, G. Howells, and K. D. McDonald-Maier, "A machine learning method for sensor authentication using hidden Markov models," in IEEE EST 2019, Jul. 2019.
[30] ——, "On quaternary 1-of-4 ID generator circuits," in IEEE EST 2018, Nov. 2018, pp. 323–326.
[31] Y. Kovalchuk, K. D. McDonald-Maier, and G. Howells, "Overview of ICMetrics technology: security infrastructure for autonomous and intelligent healthcare systems," International Journal of u- and e-Service, Science and Technology, vol. 4, 2011, pp. 49–60.
[32] G. E. Suh and S. Devadas, "Physical unclonable functions for device authentication and secret key generation," in 44th ACM/IEEE Design Automation Conference, 2007, pp. 9–14.
[33] H. Handschuh, G. Schrijen, and P. Tuyls, "Hardware intrinsic security from physically unclonable functions," in Towards Hardware-Intrinsic Security, A.-R. Sadeghi and D. Naccache, Eds. Springer Berlin Heidelberg, 2010, pp. 39–53.
[34] A. K. Jain, P. Flynn, and A. Ross, Handbook of Biometrics. Springer US, 2008.
[35] W. Sheng, G. Howells, M. C. Fairhurst, F. Deravi, and K. Harmer, "Consensus fingerprint matching with genetically optimised approach," Pattern Recognition, vol. 42, pp. 1399–1407, 2009.
[36] Y. Kovalchuk, W. G. J. Howells, H. Hu, D. Gu, and K. D. McDonald-Maier, "A practical proposal for ensuring the provenance of hardware devices and their safe operation," in 7th IET International Conference on System Safety, incorporating the Cyber Security Conference, 2012, pp. 1–6.
[37] ——, "ICMetrics for low resource embedded systems," in 3rd International Conference on Emerging Security Technologies, 2012, pp. 121–126.
[38] Y. Kovalchuk, H. Huosheng, G. Dongbing, K. McDonald-Maier, D. Newman, S. Kelly, et al., "Investigation of properties of ICMetrics features," in 3rd International Conference on Emerging Security Technologies (EST), 2012, pp. 115–120.
[39] M. Sinhuber, E. Bodenschatz, and M. Wilczek, "A probability distribution approach to synthetic turbulence time series," APS Division of Fluid Dynamics, 2016.
[40] L. E. Baum and T. Petrie, "Statistical inference for probabilistic functions of finite state Markov chains," The Annals of Mathematical Statistics, vol. 37, pp. 1554–1563, 1966.
[41] L. E. Baum, T. Petrie, G. Soules, and N. Weiss, "A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains," The Annals of Mathematical Statistics, vol. 41, pp. 164–171, 1970.
[42] S. E. Levinson, L. R. Rabiner, and M. M. Sondhi, "An introduction to the application of the theory of probabilistic functions of a Markov process to automatic speech recognition," The Bell System Technical Journal, vol. 62, pp. 1035–1074, 1983.

Security and Privacy in the Internet of Things: Challenges and Solutions J.L.H. Ramos and A. Skarmeta (Eds.) IOS Press, 2020 © 2020 The authors and IOS Press. All rights reserved. doi:10.3233/AISE200010

Direct Sum Masking as a Countermeasure to Side-Channel and Fault Injection Attacks 1

Claude CARLET a,b,2, Sylvain GUILLEY c,d and Sihem MESNAGER b,d

a University of Bergen, PB 7803, 5020 Bergen, Norway.
b Department of Mathematics, University of Paris VIII, 93526 Saint-Denis, France and University of Paris XIII, CNRS, LAGA UMR 7539, Sorbonne Paris Cité, 93430 Villetaneuse, France.
c Secure-IC S.A.S., 15 rue Claude Chappe, Bât. B, ZAC des Champs Blancs, 35510 Cesson-Sévigné, France.
d LTCI, Télécom Paris, Institut Polytechnique de Paris (IPP), 75013 Paris, France.

Abstract. Internet of Things is developing at a very fast rate. In order to ensure security and privacy, end-devices (e.g. smartphones, smart sensors, or any connected smartcards) shall be protected both against cyber attacks (coming down from the network) and against physical attacks (arising from attacker low-level interaction with the device). In this context, proactive protections shall be put in place to mitigate information theft from either side-channel monitoring or active computation/data corruption. Although both countermeasures have been developing fast and have become mature, there has surprisingly been little research to combine both. In this chapter, we tackle this difficult topic and highlight a viable solution. It is shown to be more efficient than mere fault detection by repetition (which is anyway prone to repeated correlated faults). The presented solution leverages the fact that both side-channel protection and fault attack detection are coding techniques. We explain how to both prevent (higher-order) side-channel analyses and detect (higher-order) fault injection attacks. The specificity of this method is that it works "end-to-end", meaning that the detection can be delayed until the computation is finished. This simplifies considerably the error management logic, as there is a single verification throughout the computation.

Keywords. Security, privacy, Internet of Things, side-channel analysis, fault injection attacks, countermeasure, high-order, coding theory, direct sum masking (DSM).

1 This work was supported by the ANR CHIST-ERA project SECODE (Secure Codes to thwart Cyber-physical Attacks) and by the National Natural Science Foundation of China (No. 61632020).
2 Corresponding Author: Claude Carlet. University of Bergen, PB 7803, 5020 Bergen, Norway and Department of Mathematics, University of Paris VIII, 93526 Saint-Denis, France. E-mail: [email protected]

1. Introduction

Along with the advent of the Internet of Things (IoT), sensitive data is basically flowing through network locations which cannot be determined in advance. This means that information can be recovered at any point of the network. Now, the weakest point is the end-device (e.g. the user's smartphone, smart sensor, or connected smartcard). Indeed, it can easily be procured by an attacker and then be thoroughly studied by him. This may allow, for instance, template attacks, which are a type of side-channel attack that requires experiments to be made on the device prior to the attack.

Side-channel and fault injection analyses are two independent albeit equally dangerous attacks on embedded devices. Such attacks compromise IoT data privacy and code security. They are, for instance, well documented in some application notes [8] from the Common Criteria ISO/IEC 15408 standard.

In this chapter, we leverage provable masking schemes that can detect errors. We also relax some constraints on previously discussed schemes; namely, we subsume the original article "Orthogonal Direct Sum Masking" (ODSM for short [5]), in that:
• The computation is not bound to be in F_2, but can take place in any finite field K. This allows the use of K = F_{2^l}, where the computation can be carried out on words (of l-bit width) rather than on individual bits. Therefore, algorithmic computation schemes can be leveraged (see Section 2.5).
• The information and the masking data must live in codes C and D such that the mapping (x, y) ∈ C × D ↦ x + y is injective, so that it is possible to retrieve the sensitive data coded by x from the masked data x + y; but there is no need for C and D to be orthogonal linear codes, which is the case in ODSM. It is even possible to search for them among unrestricted (i.e. linear or nonlinear) codes, like Z4-linear codes [14]. This leaves the possibility for more efficient codes, since linear complementary dual (LCD [18,6]) codes constitute only a subset of all possible complementary codes.
• The information is considered to embed a certain level of redundancy, allowing for an end-to-end fault detection capability. Historically, in the ODSM paper, only the masks could be checked for errors, not the information. Now, the check is made at the very end of the algorithm. This redundancy can be at bit or at word level, depending on the expected implementation.

Contributions. In this chapter, we review masking schemes which, as an additional feature, also enable the detection of faults. There are not many of them and, most of the time, fault detection is ad hoc. Our main novel contribution is to disclose a masking scheme with provable end-to-end fault detection, using optimized parameters. Moreover, this masking scheme generalizes most classical masking schemes (in particular, Boolean masking, BM, and inner product masking, IPM). For the sake of illustration, we provide instantiation examples in the Verilog Hardware Description Language (HDL).

Outline. Existing high-order masking schemes are reviewed in Section 2. The original Direct Sum Masking (DSM) is presented in Section 3, where we waive some constraints of the original ODSM paper. Our original contributions start in Section 4, where we expose our new modus operandi for end-to-end fault detection. Finally, Section 5 presents conclusions and perspectives.

2. State-of-the-art about high-order masking schemes

2.1. Notation

Cryptographic algorithms can be seen either at bit or at word level. At bit level, all computations are carried out in the finite field F_2. Such a representation is useful for implementation at hardware level (with parallelism) and at software level, in the case of bitslice implementations [2]. At the word level, the computations leverage acceleration in software; indeed, processor registers and memories are word-oriented, i.e., they typically manipulate several bits in parallel. The choice of word length (i.e., the bitwidth l) depends on the target processor but also on the target algorithm. Usually, the computation takes place in F_2^l for l = 4 or 8. One further advantage of working with words is that some operations can be better implemented in the finite field K = F_{2^l}. The mapping between the vector spaces F_2^l and F_{2^l}, based on the fact that these sets are two vector spaces of the same dimension l over F_2, is usually irrelevant, but we will make it precise whenever necessary.

Let n be a strictly positive integer. Then the Cartesian product K^n is endowed with a structure of vector space. The subsets of K^n are called unrestricted codes. Subsets which are linear, in that, for all pairs of elements c, c′ of the subset, any linear combination αc + βc′ (for arbitrary α, β ∈ K) also belongs to the subset, are simply called linear codes. They are generated by a basis of k non-zero vectors, whose representation as a k × n matrix of elements from K is called the generator matrix of the code. For both unrestricted and linear codes, the minimum number of nonzero positions in c + c′, over all c ≠ c′, is referred to as the minimum distance, and is customarily denoted by d. For linear codes, it coincides with the minimum Hamming weight of the nonzero codewords. An unrestricted code is characterized by its base field K of cardinality 2^l, its length n, its number of codewords m, and its minimum distance d; its parameters are denoted as (n, m, d)_{2^l}. A linear code C is characterized by its base field K of cardinality 2^l, its length n, its dimension k = dim(C), and its minimum distance d (also denoted d_C in case of ambiguity); its parameters are denoted as [n, k, d]_{2^l}. When the base field is obvious, the index (i.e., 2^l) can be omitted. The dual C⊥ of a linear code C is the code whose codewords are all orthogonal to those of C, according to the usual scalar product ⟨c, c′⟩ = Σ_{i=1}^{n} c_i c′_i ∈ K, where c, c′ ∈ K^n and n is the length of the codes C and C⊥. We have dim(C⊥) + dim(C) = n, and the so-called dual distance of C is defined as d_{C⊥}. Note that the notion of dual distance extends to unrestricted codes, see [16].

We are interested in a dth-order masking scheme. Traditionally (see for instance [3,24]), this means that each variable is split into (d + 1) shares, or, equivalently, that d random numbers are drawn to mask a sensitive variable. In this article, we consider d as a security parameter, as we will be using redundant shares: the number of shares is not directly linked to the security order. Therefore, assuming that there is no flaw in the scheme, we stick to the understanding that any attack combining d shares (or fewer) is doomed to fail. This expresses the definition of the dth-order probing model. The successful attack of lowest order is thus a (d + 1)th-order attack, as illustrated in [23]. Usually, masking schemes are chosen according to their affinity with the algorithm to protect.
For instance, in the case of a block cipher, many operations revolve around the XOR operation; therefore Boolean masking is chosen. In this respect, the functions which are linear with respect to XOR are simple, since they apply verbatim to each share. The difficulty
lies in the masked evaluation of non-linear functions. In order to be general, we denote them as (n, m)-functions, that is, mappings F_2^n → F_2^m. Such functions are used in block ciphers under a different name: substitution boxes (or S-boxes for short). All these terms are synonymous.

2.2. Problem statement

When cryptographic algorithms are run on smart cards and other mobile cryptographic devices, or on light hardware devices (e.g. FPGA, ASIC), side-channel information (through running time, power consumption, electromagnetic emanation, etc.) is leaked by the algorithm. Side-channel attacks (SCA) can take advantage of this additional information and use it for extracting the secret parameters of the algorithm. The classical countermeasure is to mask the sensitive data (which leaks a part of the secret), say x, assumed to be a binary vector (to simplify our presentation): vectors m_2, ..., m_n of the same length as x are drawn at random and the algorithm, instead of handling x, handles the n-tuple (x + Σ_{i=2}^{n} m_i, m_2, ..., m_n). Fault injection attacks (FIA) can also be performed, extracting the secret key when the algorithm is running on some device by injecting some fault into the computation, so as to obtain exploitable differences at the output.

Featuring both side-channel mitigation and fault detection is mandatory from a "threat model" point of view, but at the same time it is fairly difficult to combine those protections. Indeed, fault detection consists of replicating (giving redundancy to) information for consistency checking. Now, the way information is copied might induce uncontrolled leakages, which can reduce the security order of the countermeasure. Reciprocally, fault detection assumes some predictable formatting of variables (in terms of minimum distance); their representation is important for the detection to operate as intended. Now, masking replaces variables by a random sharing, thereby jeopardizing the encoding of codewords. For these reasons, the composability of independent side-channel and fault detection countermeasures can be termed non-obvious. Provable countermeasures against passive and active attacks are thus the topic of active research. Still, some research papers have proposed masking schemes amenable to fault injection detection. For instance:
• attempts have been made using sporadic checks on states (leaving computations unprotected) in [5];
• Private circuits III [11] also went in this direction, but with a special masking scheme.
Advantageously, the detected faults might as well be injected either adversarially (posing a security problem) or naturally (posing a safety problem, as addressed in ISO 26262 for the automotive field). Therefore, the effort to detect faults kills two birds with one stone. This is for instance required in mission-critical applications, such as automotive.

In the rest of this Section 2, we review existing high-order side-channel protection schemes. A scheme is a method to compute a complete algorithm (say AES [20], which is our running example) using the proposed protection. Namely, in Section 2.3, the global look-up table approach is presented. An equivalent concept, where such tables are recomputed just in time, therefore with a smaller footprint, is the topic of Section 2.4.
Last but not least, rewriting the algorithm in the form of computations in (one or several) field(s) is illustrated in Section 2.5. We intentionally focus on the protection of substitution boxes, as they are non-linear functions with respect to the addition in K = F_{2^l}, which makes them hard to protect.

2.3. Computation with global look-up table

It is always possible to tabulate the complete masking scheme computational parts. Regarding AES, a didactic explanation is provided in [10]. Of course, the described implementation regards field-programmable gate arrays (FPGAs), but it is transposable without difficulty to software code. In this respect, replace:
• block random access memories (BRAM) by look-up tables,
• logic by Boolean instructions, and
• digital signal processing (DSP) blocks by arithmetic operations.
But tables may really be too large. Then, one shall consider evaluating tables of smaller sizes. One such possibility is explained in the following lemma.

Lemma 1 (Sub-evaluation of substitution boxes). A table S : F_2^n → F_2^m can be evaluated as two evaluations of smaller tables S_0, S_1 : F_2^{n−1} → F_2^m. The definition of the tables S_0 and S_1 is as follows:

  ∀x = (x_1, ..., x_{n−1}) ∈ F_2^{n−1},   S_0(x) = S(x||0) = S(x_1, ..., x_{n−1}, 0),
                                          S_1(x) = S(x||1) = S(x_1, ..., x_{n−1}, 1).

The reconstruction of the original S-box S is as follows:

  S(x) = (x_n + 1) · S_0(x_1, ..., x_{n−1}) + x_n · S_1(x_1, ..., x_{n−1}),        (1)

where "·" is scalar multiplication.
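As an illustration of Lemma 1, the following hedged R sketch (the PRESENT S-box is used purely as an example table, and the last input bit x_n is taken here as the least significant bit of the integer encoding) splits a 4-bit table into two 3-bit tables and checks the reconstruction of Eq. (1).

S  <- c(0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD, 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2)
S0 <- S[seq(1, 16, by = 2)]            # entries with x_n = 0
S1 <- S[seq(2, 16, by = 2)]            # entries with x_n = 1

x_n   <- (0:15) %% 2                   # last input bit of each input value
rest  <- (0:15) %/% 2                  # the remaining n - 1 bits, as an integer
S_rec <- (1 - x_n) * S0[rest + 1] + x_n * S1[rest + 1]   # Eq. (1)
all(S_rec == S)                        # TRUE: the decomposition is lossless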

The decomposition presented in Lemma 1 can be applied to masked representations, namely x can be the concatenation of several shares (if the shares are made of several bits, we shall have to apply it iteratively). Ultimately, this strategy allows decomposing all the table look-ups. Other divide-and-conquer strategies are possible, for instance, when n is even, by directly replacing n by n/2 in Lemma 1. This strategy has been applied in the past to optimize generalizations of private circuits (Ishai, Sahai, Wagner [13]) from F_256 (Rivain, Prouff [24]) to F_16 (Kim, Hong, Lim [15]).

2.4. Computation with recomputed look-up tables

The size of precomputed tables can be prohibitive, despite possible memory-time trade-offs, allowed for instance by Lemma 1. For this reason, Coron prepared a new scheme which consumes only about l·2^l·n bits of memory (with n tables, each containing the 2^l values of the l-bit vectors). But actually, recomputation can be leveraged to maintain a target security order d whilst limiting this memory size. Recomputation requires about n^2 clock cycles. As underlined by Coron [9], this scheme is only efficient on random S-boxes; otherwise, algebraic computations (see the next subsection, 2.5) perform better.

2.5. Algebraic computation

In some situations (more in the framework of smart cards), the whole algorithm must be rewritten (in a polynomial form) so that all the sensitive data is in the masked version. This requires changing each transformation F(x) in the original algorithm into a transformation F′(z) in the masked algorithm, taking as input the masked version z of x and giving as output a masked version of F(x). We shall say that F′ is a masked version of F and speak of masked computation when dealing with the transformed algorithm.

For cryptographic algorithms which are efficiently described as operations in an extension K = F_{2^l} of F_2, it has been proven beneficial to perform computations in F_{2^l} rather than in F_2^l (which has less structure). Any computation in a finite field can be represented as the evaluation of a polynomial (which can be obtained by Lagrange interpolation). In usual cases, the algorithm control flow is independent of the inputs. Therefore, this Lagrange interpolation polynomial is static (the same whatever the inputs) and can be evaluated using basic field addition and multiplication operations. Horner's method can be leveraged for efficient polynomial evaluation. In practice in symmetric cryptography, most operations are explicitly carried out in some field of characteristic two, such as:
• K = F_16 for the PRESENT [4] (l = 4) lightweight block cipher,
• K = F_256 for the AES [20] (l = 8) standard block cipher, etc.
Therefore, a cryptographic algorithm can be decomposed into computations in some field F_{2^l}. For example:
• In F_16, represented as F_2[x]/(x^4 + x + 1), we denote3 by [0x0, 0x1, 0x2, 0x3, 0x4, ..., 0xf] the 16 elements [0, 1, x, x + 1, x^2, ..., x^3 + x^2 + x + 1]. Using Magma [27], we get for the S-box of PRESENT the expression:
  S-box(a) = 0xc + 0x7·a^2 + 0x7·a^3 + 0xe·a^4 + 0xa·a^5 + 0xc·a^6 + 0x4·a^7 + 0x7·a^8 + 0x9·a^9 + 0x9·a^10 + 0xe·a^11 + 0xc·a^12 + 0xd·a^13 + 0xd·a^14.
• In F_256, represented as F_2[x]/(x^8 + x^4 + x^3 + x + 1), we denote by [0x00, 0x01, 0x02, 0x03, 0x04, ..., 0xff] the 256 elements [0, 1, x, x + 1, x^2, ..., x^7 + x^6 + x^5 + x^4 + x^3 + x^2 + x + 1]. From the documentation on AES [19, Sec. 8.5, page 38/45], we get for SubBytes, the AES S-box, the following expression:
  S-box(a) = 0x63 + 0x8f·a^127 + 0xb5·a^191 + 0x01·a^223 + 0xf4·a^239 + 0x25·a^247 + 0xf9·a^251 + 0x09·a^253 + 0x05·a^254.        (2)

Taking into account that, by Fermat's little theorem, ∀a ∈ F_256 \ {0}, a^{−1} = a^254, Eqn. (2) rewrites:
  S-box(a) = 0x63 + 0x05·a^{−1} + 0x09·a^{−2} + 0xf9·a^{−4} + 0x25·a^{−8} + 0xf4·a^{−16} + 0x01·a^{−32} + 0xb5·a^{−64} + 0x8f·a^{−128}.

3 In hexadecimal notation; the prefix 0x is used in C and related languages.
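To illustrate the Horner evaluation mentioned above, the following hedged R sketch evaluates the PRESENT S-box polynomial quoted earlier, with a small helper for multiplication in F_16 = F_2[x]/(x^4 + x + 1); the helper and variable names are ours, and the final check holds only if the quoted coefficients are exact.

gf16_mul <- function(a, b) {           # carry-less multiplication modulo x^4 + x + 1
  r <- 0
  while (b > 0) {
    if (bitwAnd(b, 1) == 1) r <- bitwXor(r, a)
    a <- bitwShiftL(a, 1)
    if (bitwAnd(a, 0x10) != 0) a <- bitwXor(a, 0x13)   # reduce by the field polynomial
    b <- bitwShiftR(b, 1)
  }
  r
}

# coefficients c0..c14 of the PRESENT S-box polynomial above (0x0 marks the absent a^1 term)
coeffs <- c(0xc, 0x0, 0x7, 0x7, 0xe, 0xa, 0xc, 0x4, 0x7, 0x9, 0x9, 0xe, 0xc, 0xd, 0xd)

horner_eval <- function(a, coeffs) {   # Horner's rule: (((c14*a + c13)*a + ...)*a + c0)
  acc <- 0
  for (co in rev(coeffs)) acc <- bitwXor(gf16_mul(acc, a), co)
  acc
}

sapply(0:15, horner_eval, coeffs = coeffs)   # should reproduce the PRESENT S-box table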

Table 1. Summary of the use of the three computation paradigms, in conjunction with masking.

  K          | Global look-up table (GLUT)  | Table recomputation                 | Algebraic (Lagrange) computation
             | (Sec. 2.3)                   | (Sec. 2.4)                          | (Sec. 2.5)
  -----------+------------------------------+-------------------------------------+---------------------------------
  F_2        | Table, Source: [22]          | ?                                   | Source: ISW [13]
  F_{2^l}    | *                            | Table (htable), Source: Coron [9]   | Source: RP [24]

In this respect, many works have been optimizing masked computation. However, the algebraic manipulation (that is, addition and/or multiplication) of masked data is today only known in the case where each sensitive data element is masked by one or more shares. This means that the masked variables have a bit-width n which is a multiple of l, in which case we can exploit:
• in F_2, the Ishai-Sahai-Wagner scheme [13] (ISW);
• in F_{2^l}, the Rivain-Prouff generalization [24] of ISW.

2.6. Summary about the three computation paradigms

The usage of computation paradigms in the context of masking is recalled in Tab. 1. As we shall further detail, the masking schemes can come in several options. For instance, each sensitive variable might be masked independently (as in classical masking). This means that the information is a single scalar X (of dimensionality k = 1), whereas the masked variable consists of a vector Z (of length n > 1). The masking schemes which adhere to this masking principle are shown in gray cells. Some masking schemes are more generic, in that the information can be a vector of symbols X ∈ K^k (for some k, 1 ≤ k ≤ n), while the masked representation is another vector Z ∈ K^n, for n ≥ k. Schemes which do not have this limitation are represented in white cells. Let us comment on the contents of Tab. 1, on a per-column basis:
• Global look-up table (GLUT): the operation to be conducted on the masked material Z is performed through a statically precomputed table (computed once for all, and stored in read-only memory upon software delivery first-time load). Such a table takes as input the whole word Z. In terms of security analysis, this has led some to confusion, since in the probing model it could be (erroneously) assumed that the "item to be probed" is Z in its entirety, whereas the items to consider are actually the individual bits (= elements of the base field K = F_2); words are vectors of l bits, and a masked variable n words long is a word array (which cannot be probed at once: in the word-level leakage probing model, n probes are needed). Regarding the applicability of GLUT to K = F_{2^l}, there is a direct translation (hence the * symbol in Tab. 1), by subfield representation of F_{2^l} elements into F_2^l (linear transformations allow going from one representation to the other). Therefore, the new information size over F_2 is k′ = kl and the new masking length over F_2 is n′ = nl.
• Table recomputation: in this paradigm, a table (such as a cryptographic S-box) is applied to one element X ∈ K under its shared form Z ∈ K^n. The algorithm consists in on-the-fly computation of tables adapted to Z, for a specific masking scheme (only the X = Σ_{i=1}^{n} Z_i paradigm is supported). It would not seem unreasonable
• Algebraic computation: all transformations in the algorithm are expressed in polynomial form, which reduces the problem of masking them to that of securely adding and multiplying. Secure addition and multiplication have been studied in [13] at bit-level and in [24] at word-level. Nonetheless, these schemes only work for scalar information X ∈ K, which we extend to vectorial information X ∈ K^k in this article (refer to Section 4.2).

Therefore, all three computation paradigms have merits and drawbacks, in terms of implementation size, evaluation time, etc. But all three strategies described in Sections 2.3, 2.4 and 2.5 lack fault detection capability.
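To give a flavour of the kind of masked multiplication these references provide, here is a minimal Python sketch, in the spirit of the ISW construction and of its word-level generalization, not code taken from [13] or [24]; the GF(2^8) modulus x^8 + x^4 + x^3 + x + 1 is the AES one used elsewhere in this chapter.

import secrets

MOD, DEG = 0x11B, 8                      # AES field GF(2^8): x^8 + x^4 + x^3 + x + 1

def gf_mul(a, b):
    """Multiplication in GF(2^8) (carry-less multiply, then reduction)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << DEG):
            a ^= MOD
    return r

def share(x, n):
    """Additive (Boolean) sharing: x = s[0] xor ... xor s[n-1]."""
    s = [secrets.randbelow(256) for _ in range(n - 1)]
    last = x
    for v in s:
        last ^= v
    return s + [last]

def unshare(s):
    x = 0
    for v in s:
        x ^= v
    return x

def secure_mul(a, b):
    """ISW-style multiplication of two n-share values (word-level variant)."""
    n = len(a)
    c = [gf_mul(a[i], b[i]) for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r = secrets.randbelow(256)                              # fresh randomness r_ij
            c[i] ^= r
            c[j] ^= gf_mul(a[i], b[j]) ^ r ^ gf_mul(a[j], b[i])     # r_ji
    return c

x, y, n = 0x57, 0x83, 3
assert unshare(secure_mul(share(x, n), share(y, n))) == gf_mul(x, y)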

3. Direct Sum Masking

3.1. Introduction to Direct Sum Masking

This section presents a masking scheme called Direct Sum Masking (DSM), which is based on two complementary codes. We generalize in this section the original concept, presented in [5]. It consists of encoding some information X ∈ K^k such that it is randomized by a mask M ∈ K^{n-k}. The mixture occurs thanks to a direct sum in K^n between the two codes generated by G (a matrix of size k × n) and H (a matrix of size (n − k) × n). The protected information writes Z = XG + MH. The transformations from (X, M) to Z are described in Fig. 1.

Remark 1 (On notations G and H). In this section, we explore different "calibrations" of DSM. Therefore, for each one, a new definition of the matrices G and H is provided. The only requirement is that the codes they span are complementary, i.e., that the n × n matrix obtained by stacking G on top of H is invertible over K.

The basic result on DSM is that the side-channel protection is at order d in the probing model, where d is the dual distance of the code generated by H, minus one [5,21]. That is, d = d^⊥_{span(H)} − 1 = d_{span(H)^⊥} − 1. A security order of d means that all attacks of order d or less do fail. When the base field is K = F_2^l, with l > 1, the dual distance d^⊥_{span(H)} can be considered over K or over F_2, in which case the security order is considered at word-level or at bit-level, respectively. As the dual distance does not decrease after sub-field representation, the security level at bit-level is not smaller than at word-level. A successful attack must combine at least (d + 1) words (or bits), depending on whether d is the security order at word or at bit level.

Two examples of DSM codes are given hereafter.

Example 1 (DSM generalizes classical masking [3]). In classical masking, we have that:

G = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \end{pmatrix} \in K^{1 \times n},
H = \begin{pmatrix} 1 & 1 & 0 & \cdots & 0 \\ 1 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & 0 \\ 1 & 0 & 0 & \cdots & 1 \end{pmatrix} \in K^{(n-1) \times n}.

Example 2 (DSM generalizes Inner Product Masking [1]). In inner product masking, we have that:

G = \begin{pmatrix} L_1 & 0 & 0 & \cdots & 0 \end{pmatrix} \in K^{1 \times n},
H = \begin{pmatrix} L_2 & 1 & 0 & \cdots & 0 \\ L_3 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & 0 \\ L_n & 0 & 0 & \cdots & 1 \end{pmatrix} \in K^{(n-1) \times n}.

This scheme is distinct from the classical masking scheme, in that binary elements are replaced by elements L_i ∈ F_2^l (1 ≤ i ≤ n). Let us underline that in IPM masking, the coefficient L_1 (≠ 0) can also, without loss of generality, be chosen equal to 1. Indeed, both Z ∈ K^n and Z/L_1 ∈ K^n carry the same information. Note that neither classical masking (Example 1) nor IPM (Example 2) are orthogonal direct sum masking schemes.

3.2. Generalization from k = 1 to 1 ≤ k ≤ n, and from F_2 to F_2^l

When the code generated by the matrix G has dimension k > 1 over K and k does not divide n, the only computation algorithms actually available are those based on table look-ups, namely "global look-up tables" (Section 2.3) or "table recomputation" (Section 2.4). Indeed, today's algebraic evaluation (Sec. 2.5) requires k = 1, i.e., one datum masked by (n − 1) masks. Such computation is meaningful only on bits (classical ISW). On words, a multiplication algorithm such as the algebraic evaluation (Section 2.5) applies, but only provided that k = 1.

3.3. DSM vs ODSM

In the original ODSM paper, not only were the codes provided over F_2 (and not over K = F_2^l), but they also required the complementary codes C = span(G) and D = span(H) (with C ⊕ D = K^n) to be orthogonal. Such a configuration is convenient, and goes by the name of C and D being linear complementary dual (LCD) codes. It allows simplifying the write-up of some equations, such as Π_C(Z) = Z G^T (G G^T)^{-1} G or Π_D(Z) = Z H^T (H H^T)^{-1} H. Security-wise, it is not required that C^⊥ = D (i.e., we do not require G H^T = 0). Relaxing this constraint allows for a more efficient parameter selection. Let us give a couple of examples.

Example 3 (Best linear code). Consider a linear code over F_2 of length n = 8 and dimension k = 4. Such a code being linear, its generator matrix G can be written in systematic form, i.e., as

G = ( I_4 | P ),

where I_4 is the 4 × 4 identity matrix and P ∈ F_2^{4×4}.

If one row of P has four 1's, then:
• if a second row also has four 1's, the Hamming distance between the two corresponding codewords is 2;
• if a second row has strictly fewer than four 1's, then the codeword on that row has Hamming weight at most 4.
Therefore a minimum distance of 5 is not possible. We show that the minimum weight can actually reach 4; this implies that all rows of P have Hamming weight 3. It can be checked that the single 0 on each row of P cannot be at the same position for two rows. Thus, up to equivalence of codes by coordinate swapping, we have that

P = \begin{pmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{pmatrix}.

However, this (unique) binary code of parameters [8, 4, 4]_2 is self-dual. Therefore, it is not LCD. It can be deduced that all suitable codes for ODSM with length n = 8 and k = 4 have minimum distance at most d = 3. By contrast, it is possible to use two complementary (but not orthogonal) codes generated by G and H, such that:

G = \begin{pmatrix} 1&0&0&0&0&0&0&0 \\ 0&1&0&0&0&0&0&0 \\ 0&0&1&0&0&0&0&0 \\ 0&0&0&1&0&0&0&0 \end{pmatrix} \quad and \quad H = \begin{pmatrix} 1&0&0&0&0&1&1&1 \\ 0&1&0&0&1&0&1&1 \\ 0&0&1&0&1&1&0&1 \\ 0&0&0&1&1&1&1&0 \end{pmatrix}.

Example 4 (Best unrestricted code). It is known that the best binary code of length n = 16 with m = 256 codewords is the Nordstrom-Robinson code, the unique code of parameters (16, 256, 6)_2 [25]. This code is self-dual; therefore, it is not LCD. But as in the previous example, a non-orthogonal complementary code can be found, for instance span(I_8 | 0_8), where I_8 (resp. 0_8) is the identity (resp. null) matrix of size 8 × 8.

4. Fault detection with DSM

4.1. Venues for fault detection in the DSM representation

The concept of fault detection during the execution of a cryptographic algorithm is illustrated in Fig. 2, which we describe hereafter.

[Figure 1. Commutative diagram of the DSM masking scheme (valid for both K = F_2 and K = F_2^l): the information X and the mask M are encoded by the encoders/decoders E_C, D_C, E_D, D_D into XG and MH, combined through the direct sum "C ⊕ D" as Z = XG + MH (aka DSM), and recovered via the projections Π_C and Π_D; the masked state Z → Z′ → Z″ → . . . (see Figure 2) is then processed by a global look-up table, table recomputation, or algebraic computation.]

Without loss of generality, we consider the example of a block cipher (such as AES-128). All input variables, collectively referred to as "X" (which gathers plaintext, key, etc.), are masked by some random variables M. Classical block cipher design is based on the product cipher (or iterative cipher) pattern. The state is updated several times in the round logic, which is, most of the time, the same (or almost the same) operation iterated several times. Notice that both message and key are updated concomitantly in usual encryption schemes. Sometimes, the first and the last rounds are specialized versions of the inner rounds. The convention used in Figure 2 is that we denote by X the inputs and by X′ the outputs. The outputs can thus be the ciphertext along with the last round key (in the case of encryption) or the master key (in the case of decryption). The round logic is referred to as combinational logic (denoted "combi"), because, when implemented in hardware, this part consists of stateless functions. The intermediate states are denoted Z, Z′, Z″, etc. The number of primes indicates the depth of the round logic inside the algorithm.

In both schemes (a) and (b) presented in Fig. 2, the protected representation is a combination of the type Z = XG + MH. Precisely, the conversions are recalled in the diagram represented in Fig. 1. We highlight two computation schemes enabling fault detection:
• state-of-the-art direct sum masking, in Fig. 2(a), and
• our new end-to-end masking scheme with end-to-end fault detection capability, as illustrated in Fig. 2(b).
Both schemes feature exceptional alarms, which are checkpoints raising a computation abort in case of a verified invariant violation. In order to highlight the difference between the two schemes, the background of unprotected data/operations is shown in gray. Our new method is therefore definitely useful for legacy reasons.

In direct sum masking, the idea is to check whether the refreshed values of the masks have been changed or not once stored in the state register Z (or Z′, Z″, etc.). This requires projecting (securely) the masked state Z onto the corresponding mask M. This is always possible since information and masks are encoded in direct sum. But the combinational logic is not protected: indeed, predicting its parity is hard in practice. Hence several unprotected spots (actually, all the combinational logic inside the algorithm). Clearly, inputs and outputs are not checked, but this is of little practical importance, as attacks aiming at extracting the key shall target sensitive values, that is, values which depend on the key and on either the plaintext and/or the ciphertext.

[Figure 2. Comparison between the "check the masks" scheme ((a), [5]) and the "encode-information" scheme ((b), this paper) for fault detection in side-channel protected implementations. Panel (a), fault detection in ODSM: at the initial state and after each round of combinational logic, the projection Π_D of the state Z is compared with the stored mask M, and an alarm is raised on mismatch; the combinational logic itself remains unchecked. Panel (b), this paper, end-to-end detection: the input is encoded once as (X, . . . , X)G + MH, the rounds are computed on the redundant state, and only at the very end the k recovered copies X̂_1, . . . , X̂_k are compared for equality before returning X̂ = X̂_1; an alarm is raised if they differ.]

Remark 2 (On the security of recovering M from Z). When recovering the mask M from a sensitive variable Z = XG + MH, one shall not leak information on X. A priori, this is not trivial. But let us explain that there is, in general, no risk, provided the complementary codes spanned by G and H are built properly. The masking code, i.e. the one generated by H, can be chosen systematic and written as H = (L | I_{n−k}), where L is an (n − k) × k matrix. In the original DSM, only the properties of H were relevant for the side-channel security. Thus, G can be constructed arbitrarily, provided it complements H to build the universe code K^n. The simple choice G = (I_k | 0_{k×(n−k)}) fills this need: indeed, whatever L,

\begin{pmatrix} G \\ H \end{pmatrix} = \begin{pmatrix} I_k & 0_{k \times (n-k)} \\ L & I_{n-k} \end{pmatrix}

is an invertible matrix.
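As a minimal illustration of this remark (a sketch, not code from the chapter), the following Python fragment uses the systematic choice G = (I_k | 0) and H = (L | I_{n−k}) over F_2, so that Z = (X + M·L, M): the mask M is read off the last n − k coordinates, and the information X is recovered from the first k coordinates. The parameters and the matrix L are arbitrary examples.

import secrets

K, N = 4, 8                      # illustrative parameters: k = 4 information bits, n = 8
L = [[0, 1, 1, 1],               # arbitrary (n-k) x k binary matrix, chosen for the example
     [1, 0, 1, 1],
     [1, 1, 0, 1],
     [1, 1, 1, 0]]

def vec_mat(v, m):
    """Row vector times matrix over F_2."""
    cols = len(m[0])
    return [sum(v[i] & m[i][j] for i in range(len(v))) % 2 for j in range(cols)]

def encode(x):
    """Z = X G + M H with G = (I_k | 0) and H = (L | I_{n-k}), i.e. Z = (X + M L, M)."""
    m = [secrets.randbelow(2) for _ in range(N - K)]
    return [a ^ b for a, b in zip(x, vec_mat(m, L))] + m

def decode(z):
    m = z[K:]                                     # the mask part is directly visible
    x = [a ^ b for a, b in zip(z[:K], vec_mat(m, L))]
    return x, m

x = [1, 0, 1, 1]
z = encode(x)
assert decode(z)[0] == x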

4.2. Computation with end-to-end DSM fault detection

For the scheme depicted in Fig. 2(b) to work, the information X ∈ K shall be represented as Z = (X, . . . , X)G + MH, where M ∈ K^{n−k} is the masking material for the k-times replicated information (X, . . . , X) ∈ K^k.

Each coordinate of (X, . . . , X) is masked with the same mask M (see footnote 4). Therefore, evaluation is conducted as follows:
• Computation is carried out independently on each copy X of (X, . . . , X) ∈ K^k.
• Mask homogenization between the k computations is carried out: this step consists in making sure that the k shared values (each in (n − k + 1) shares) are using the same (n − k) masks.
It is therefore possible to rebuild a consistent word of format (X, . . . , X)G + MH, as outlined in Alg. 1. This algorithm takes two vectors T, T′, and assumes an IPM representation (recall Example 2). We write L = (L_1 = 1, L_2, . . . , L_n), where n is the length of the vectors T and T′. The security proof of Alg. 1 is provided in [7].

Algorithm 1: Homogenization of two sharings (in the case of IPM)

input : T = (T_1, . . . , T_n) and T′ = (T′_1, . . . , T′_n)
output: T̃′, a new sharing with these properties:
        • it is equivalent to T′, meaning that ⟨L, T′⟩ = ⟨L, T̃′⟩, and
        • it has the same masks as T, meaning that T̃′_i = T_i for 2 ≤ i ≤ n.

1  T̃′ ← T′
2  for i ∈ {2, . . . , n} do
3      ε ← T_i + T′_i
4      T̃′ ← T̃′ + (L_i ε, 0, . . . , 0, ε, 0, . . . , 0), where the value ε lies at position i
5  return T̃′
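A direct transcription of Alg. 1 in Python may help check its two output properties (a sketch under the same IPM assumptions; the field GF(2^8) and the random test values are chosen here only for illustration):

import secrets

MOD, DEG = 0x11B, 8                       # GF(2^8), modulus x^8 + x^4 + x^3 + x + 1

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << DEG):
            a ^= MOD
    return r

def ipm_value(L, T):
    """<L, T>: the value carried by an IPM sharing T for masking vector L."""
    v = 0
    for li, ti in zip(L, T):
        v ^= gf_mul(li, ti)
    return v

def homogenize(L, T, Tp):
    """Alg. 1: return a sharing equivalent to Tp but reusing the masks of T."""
    Th = list(Tp)
    for i in range(1, len(L)):            # positions 2..n (0-indexed 1..n-1)
        eps = T[i] ^ Th[i]
        Th[0] ^= gf_mul(L[i], eps)        # L_1 = 1, so coordinate 1 absorbs L_i * eps
        Th[i] ^= eps                      # now Th[i] == T[i]
    return Th

n = 3
L = [1] + [secrets.randbelow(255) + 1 for _ in range(n - 1)]   # L_1 = 1, L_i nonzero
T  = [secrets.randbelow(256) for _ in range(n)]
Tp = [secrets.randbelow(256) for _ in range(n)]
Th = homogenize(L, T, Tp)
assert ipm_value(L, Th) == ipm_value(L, Tp)      # same encoded value as T'
assert Th[1:] == T[1:]                           # same masks as T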

End-to-end computation is better achieved as per Fig. 2(b). Fault detection in the whole algorithm is deferred until the very end of the computation, which considerably simplifies the management of the alarm signal, whilst leaving no unprotected hole in the algorithm to protect.

4.3. Examples in F_2

4.3.1. Bitwise multiplication without error detection

Let us recall the original ISW scheme at hardware level, i.e., where we have l = 1 and n = 2 (whilst k = 1, necessarily). The masked data representation is that of DSM with:

G = (1 0)  and  H = (1 1).

Though it is trivial in this case, one can verify that H^⊥ = H = (1 1), which generates the binary repetition code of parameters [2, 1, 2]_2. Therefore, the scheme is protected at first order (recall the characterization enunciated in Section 3.1).

^4 If the initial clear material is already vectorial, such as 16 bytes of plaintext plus 16 bytes of key as in AES-128, then the input information is already denoted as a vector X ∈ K^32, where K = F_256. Hence the redundant information is (X, . . . , X) ∈ K^k, where 32 | k. This quantity could be denoted with a double stage of vector arrows, but for the sake of formula readability, we prefer to stick to one single stage of arrows when denoting vectors.

The product of two shares (a1, a2) of a ∈ F_2 and of b ∈ F_2, also represented in shared form as (b1, b2), can be computed according to [3]. In order to simplify the notations, we denote the multiplicand as Z = (a1, a2, a3), the multiplier as (b1, b2, b3), and the multiplication result as (c1, c2, c3). Without fault protection, we have, for instance:
• c1 = a1 b1 + a2 b2 + r1, and
• c2 = a1 b2 + a2 b1 + r1,
where r1 is an additional random mask which is needed while computing the masked product c. It is straightforward to check that the shared computation of c = c1 + c2 is correct, since indeed: c1 + c2 = ab,

where a = a1 + a2 and b = b1 + b2 .

The number of gates is eight (four AND and four XOR).

4.3.2. Bitwise multiplication with detection of a single error

Let us now consider a single data redundancy, meaning that n = 3 and k = 2. The first step is to expand the data representation, whilst keeping first-order side-channel security. The masked representation now becomes:



G = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}  and  H = (1 1 1).   (3)

One can see that

H^⊥ = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix},

which indeed spans a linear code of minimum distance two. Therefore, the representation

Z = (X_1, X_2)G + MH = (X_1 + M, X_2 + M, M)   (4)

is still protected at first order against side-channel attacks. According to the mode of operation for computation presented in Section 4.2, the multiplication shall be carried out on each coordinate of X = (X_1, X_2), as sketched in Fig. 3. Now, in order to know how to proceed with the multiplication, the recipe is to compute using (X_1, M) on the one hand, and (X_2, M) on the other. The representation is, in either case, the same.

[Figure 3. Decomposition of the computation with k = 2 information bits X_1 and X_2 into two computations: the (k = 2, n = 3) representation with G = (1 0 0; 0 1 0) and H = (1 1 1) collapses, for each working coordinate, into a (k = 1, n = 2) representation with G = (1 0) and H = (1 1), once on (X_1, M) and once on (X_2, M).]

Hence this yields, on the one hand:
• c11 = a1 b1 + a3 b3 + r1,
• c12 = a1 b3 + a3 b1 + r1,
and on the other hand:
• c21 = a2 b2 + a3 b3 + r2,
• c22 = a2 b3 + a3 b2 + r2,
where r1 and r2 are two independent random bits required for the multiplication to be secure. The homogenization now consists in merging the pairs (c11, c12) and (c21, c22) into the single representation of Eqn. (4). In our case, we apply Alg. 1 on the masks c12 and c22:
• c12 is the pivot, and c22 is turned into c′22 = c22 + ε = c12, where ε = c12 + c22;
• therefore the information c21 becomes in turn c′21 = c21 + ε = c21 + c12 + c22 = (a2 b2 + a3 b3 + r2) + (a1 b3 + a3 b1 + r1) + (a2 b3 + a3 b2 + r2).
Finally, we get for the multiplication in representation (4) the following formulae:
• c1 = a1 b1 + a3 b3 + r1,
• c2 = (a2 b2 + a3 b3 + r2) + (a1 b3 + a3 b1 + r1) + (a2 b3 + a3 b2 + r2),
• c3 = a1 b3 + a3 b1 + r1.
Now, these equations can be optimized, as the term (a1 b3 + a3 b1 + r1) is used both in c2 and in c3, and since it can play the role of the refresh bit r2 involved in the ISW algorithm for the multiplication of the second coordinate:
• c1 = a1 b1 + a3 b3 + r1,
• c2 = (a2 b2 + a3 b3 + c3) + (a2 b3 + a3 b2),
• c3 = a1 b3 + a3 b1 + r1.
Eventually, the product a3 b3 is needed both in c1 and c2, and can thus be shared. This results in 7 AND and 8 XOR gates, which is strictly less than twice the size of the example of Section 4.3.1. This shows the gain of leveraging codes to detect faults.

In terms of circuits, the version of the bit multiplication without redundancy is represented in Fig. 4, along with its Verilog [12] netlist. The version with one bit of error detection is represented in Fig. 5, also along with its netlist.
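The functional correctness of these optimized formulae, i.e., that both redundant copies decode to the same product A·B in the absence of faults, can be checked exhaustively with a few lines of Python (a verification sketch added here, not part of the original chapter; probing security is of course not assessed by such a test):

from itertools import product

# Representation (4): a = (A + m_a, A + m_a, m_a), so A = a1 ^ a3 = a2 ^ a3 (same for b).
ok = True
for A, B, ma, mb, r1 in product((0, 1), repeat=5):
    a1, a2, a3 = A ^ ma, A ^ ma, ma
    b1, b2, b3 = B ^ mb, B ^ mb, mb
    c1 = (a1 & b1) ^ (a3 & b3) ^ r1
    c3 = (a1 & b3) ^ (a3 & b1) ^ r1
    c2 = ((a2 & b2) ^ (a3 & b3) ^ c3) ^ ((a2 & b3) ^ (a3 & b2))
    # Both copies of the information must decode to A & B when no fault is injected;
    # a fault flipping any single share would make the two decodings disagree.
    ok &= ((c1 ^ c3) == (A & B)) and ((c2 ^ c3) == (A & B))
assert ok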

// Original ISW multiplication
module mult_n2_k1 (c, a, b, r);
  parameter n = 2;
  input  [1:n] a, b;   // Data to multiply
  input  [1:1] r;      // Random for internal refresh
  output [1:n] c;      // Multiplication result

  assign c[1] = a[1] & b[1] ^ a[2] & b[2] ^ r[1];
  assign c[2] = a[1] & b[2] ^ a[2] & b[1] ^ r[1];
endmodule

Listing 1. Verilog code for the multiplication in F_2, protected by first-order masking, with no fault detection capability (refer to Section 4.3.1)

Figure 4. Structure of ISW multiplication without error detection capability (k = 1, n = 2), obtained from logical synthesis with Cadence genus EDA tool of Listing 1 with optimizations disabled

Please be careful that those netlists can feature glitches [17], which potentially reduce the security order of the countermeasure. Those netlists must therefore be carefully analyzed by a tool such as Virtualyzr® [26].

5. Conclusions and perspectives

In addition to their functionality, Internet of Things devices are expected to enforce security and privacy functions. However, some attacks aim at breaking those assets (data/code). Therefore, masking schemes able to detect faults are of great practical importance. We have reviewed the state of the art, which focuses particularly on ODSM. We underlined that the original ODSM presents some shortcomings owing to non-optimal parameter selection (a rigid base field F_2) and a pair of codes which have to be orthogonal (an unnecessary convenience). But most importantly, we recalled that ODSM leaves holes regarding fault detection, as only states are verified, not the computations occurring between each state snapshot. Indeed, ODSM is designed to verify that the masks remain unaltered; unfortunately, masks are often refreshed, thereby cutting the computational integrity verification chain.

// ISW multiplication with redundancy
// (two interwoven ISW multiplications)
module mult_n3_k2 (c, a, b, r);
  parameter n = 3;
  input  [1:n] a, b;   // Data to multiply
  input  [1:1] r;      // Random for internal refresh
  output [1:n] c;      // Multiplication result

  assign a1b1 = a[1] & b[1];
  assign a2b2 = a[2] & b[2];
  assign a3b3 = a[3] & b[3];
  assign a1b3 = a[1] & b[3];
  assign a3b1 = a[3] & b[1];
  assign a2b3 = a[2] & b[3];
  assign a3b2 = a[3] & b[2];

  assign c[1] = a1b1 ^ a3b3 ^ r[1];
  assign c[2] = (a2b2 ^ a3b3 ^ c[3]) ^ (a2b3 ^ a3b2);
  assign c[3] = a1b3 ^ a3b1 ^ r[1];
endmodule

Listing 2. Verilog code for the multiplication in F_2, protected by first-order masking and with single-fault detection capability (refer to Section 4.3.2)

Figure 5. Structure of ISW multiplication with single-error detection capability (k = 2, n = 3), obtained from logical synthesis with the Cadence Genus EDA tool of Listing 2 with optimizations disabled

In this chapter, we contribute a method based on DSM to detect faults end-to-end, that is, from plaintext to ciphertext, covering both sequential and combinational logic. This method works by injecting into the computation not only the plain information, but also some redundant information. Therefore, verification can be carried out on the state (only when it is used as public information, e.g., as the ciphertext in the case of block ciphers).

We show that all currently known computation paradigms (i.e., global look-up table, table recomputation and algebraic computation) still apply, and we detail them. As a perspective, the quantitative gain of having redundancy within the masking scheme should be made clearer for several parameters k and n. Besides, it is desirable to provide a better error detection scheme than simply the k-fold repetition code X ↦ (X, . . . , X) ∈ K^k = (F_2^l)^k. Indeed, a proper encoding of X with a parity matrix could allow detecting more faults at a given code rate k/n.

Acknowledgments

The methods presented in this paper are implemented in Secure-IC Securyzr® solutions, and can be verified by the Secure-IC Virtualyzr® EDA (Electronic Design Automation) tool. The authors are also grateful to Prof. Thierry Berger for providing oral evidence (during a stay in Dakar, Senegal, in January 2019) for the proof associated with Example 3.

References

[1] Josep Balasch, Sebastian Faust, and Benedikt Gierlichs. Inner product masking revisited. In Elisabeth Oswald and Marc Fischlin, editors, Advances in Cryptology - EUROCRYPT 2015 - 34th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Sofia, Bulgaria, April 26-30, 2015, Proceedings, Part I, volume 9056 of Lecture Notes in Computer Science, pages 486–510. Springer, 2015.
[2] Eli Biham. A Fast New DES Implementation in Software. In Eli Biham, editor, FSE, volume 1267 of Lecture Notes in Computer Science, pages 260–272. Springer, 1997.
[3] Johannes Blömer, Jorge Guajardo, and Volker Krummel. Provably Secure Masking of AES. In Helena Handschuh and M. Anwar Hasan, editors, Selected Areas in Cryptography, volume 3357 of Lecture Notes in Computer Science, pages 69–83. Springer, 2004.
[4] Andrey Bogdanov, Lars R. Knudsen, Gregor Leander, Christof Paar, Axel Poschmann, Matthew J. B. Robshaw, Yannick Seurin, and Charlotte Vikkelsoe. PRESENT: An Ultra-Lightweight Block Cipher. In CHES, volume 4727 of LNCS, pages 450–466. Springer, September 10-13 2007. Vienna, Austria.
[5] Julien Bringer, Claude Carlet, Hervé Chabanne, Sylvain Guilley, and Houssem Maghrebi. Orthogonal Direct Sum Masking - A Smartcard Friendly Computation Paradigm in a Code, with Builtin Protection against Side-Channel and Fault Attacks. In David Naccache and Damien Sauveron, editors, Information Security Theory and Practice. Securing the Internet of Things - 8th IFIP WG 11.2 International Workshop, WISTP 2014, Heraklion, Crete, Greece, June 30 - July 2, 2014, Proceedings, volume 8501 of Lecture Notes in Computer Science, pages 40–56. Springer, 2014.
[6] Claude Carlet and Sylvain Guilley. Complementary dual codes for counter-measures to side-channel attacks. Adv. in Math. of Comm., 10(1):131–150, February 2016.
[7] Wei Cheng, Claude Carlet, Kouassi Goli, Jean-Luc Danger, and Sylvain Guilley. Detecting Faults in Inner Product Masking Scheme — IPM-FD: IPM with Fault Detection, August 24 2019. 8th International Workshop on Security Proofs for Embedded Systems (PROOFS), Atlanta, GA, USA.
[8] Common Criteria Management Board. Common Methodology for Information Technology Security Evaluation, Evaluation methodology, Version 3.1, Revision 4, CCMB-2012-09-004, September 2012. https://www.commoncriteriaportal.org/files/ccfiles/CEMV3.1R4.pdf.
[9] Jean-Sébastien Coron. Higher Order Masking of Look-Up Tables. In Phong Q. Nguyen and Elisabeth Oswald, editors, EUROCRYPT, volume 8441 of Lecture Notes in Computer Science, pages 441–458. Springer, 2014.
[10] Saar Drimer, Tim Güneysu, and Christof Paar. DSPs, BRAMs, and a Pinch of Logic: Extended Recipes for AES on FPGAs. TRETS, 3(1), 2010.
[11] Stefan Dziembowski, Sebastian Faust, and François-Xavier Standaert. Private circuits III: hardware trojan-resilience via testing amplification. In Edgar R. Weippl, Stefan Katzenbeisser, Christopher Kruegel, Andrew C. Myers, and Shai Halevi, editors, Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, October 24-28, 2016, pages 142–153. ACM, 2016.
[12] Institute of Electrical and Electronics Engineers (http://www.ieee.org/). IEEE Standard Verilog Description Language, Std 1364-2001, September 28 2001. ISBN: 0-7381-2826-0.
[13] Yuval Ishai, Amit Sahai, and David Wagner. Private Circuits: Securing Hardware against Probing Attacks. In CRYPTO, volume 2729 of Lecture Notes in Computer Science, pages 463–481. Springer, August 17-21 2003. Santa Barbara, California, USA.
[14] A. R. Hammons Jr., P. V. Kumar, A. R. Calderbank, N. J. A. Sloane, and P. Solé. The Z4-linearity of Kerdock, Preparata, Goethals and related codes. IEEE Transactions on Information Theory, 40, 1994.
[15] HeeSeok Kim, Seokhie Hong, and Jongin Lim. A Fast and Provably Secure Higher-Order Masking of AES S-Box. In Bart Preneel and Tsuyoshi Takagi, editors, CHES, volume 6917 of LNCS, pages 95–107. Springer, 2011.
[16] Florence Jessie MacWilliams and N. J. A. (Neil James Alexander) Sloane. The theory of error correcting codes. North-Holland mathematical library. North-Holland Pub. Co., Amsterdam, New York, 1977. Includes index.
[17] Stefan Mangard and Kai Schramm. Pinpointing the side-channel leakage of masked AES hardware implementations. In Louis Goubin and Mitsuru Matsui, editors, Cryptographic Hardware and Embedded Systems - CHES 2006, 8th International Workshop, Yokohama, Japan, October 10-13, 2006, Proceedings, volume 4249 of Lecture Notes in Computer Science, pages 76–90. Springer, 2006.
[18] James L. Massey. Linear codes with complementary duals. Discrete Mathematics, 106-107:337–342, 1992.
[19] NIST. AES Proposal: Rijndael (now FIPS PUB 197), 9 April 2003. http://csrc.nist.gov/archive/aes/rijndael/Rijndael-ammended.pdf.
[20] NIST/ITL/CSD. Advanced Encryption Standard (AES). FIPS PUB 197, Nov 2001. http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.197.pdf (also ISO/IEC 18033-3:2010).
[21] Romain Poussier, Qian Guo, François-Xavier Standaert, Claude Carlet, and Sylvain Guilley. Connecting and Improving Direct Sum Masking and Inner Product Masking. In Thomas Eisenbarth and Yannick Teglia, editors, Smart Card Research and Advanced Applications - 16th International Conference, CARDIS 2017, Lugano, Switzerland, November 13-15, 2017, Revised Selected Papers, volume 10728 of Lecture Notes in Computer Science, pages 123–141. Springer, 2017.
[22] Emmanuel Prouff and Matthieu Rivain. A Generic Method for Secure SBox Implementation. In Sehun Kim, Moti Yung, and Hyung-Woo Lee, editors, WISA, volume 4867 of Lecture Notes in Computer Science, pages 227–244. Springer, 2007.
[23] Emmanuel Prouff, Matthieu Rivain, and Régis Bevan. Statistical Analysis of Second Order Differential Power Analysis. IEEE Trans. Computers, 58(6):799–811, 2009.
[24] Matthieu Rivain and Emmanuel Prouff. Provably Secure Higher-Order Masking of AES. In Stefan Mangard and François-Xavier Standaert, editors, CHES, volume 6225 of LNCS, pages 413–427. Springer, 2010.
[25] Stephen L. Snover. The uniqueness of the Nordstrom-Robinson and the Golay binary codes. PhD thesis, Department of Mathematics, Michigan State University, MI, USA, 1973.
[26] Youssef Souissi, Adrien Facon, and Sylvain Guilley. Virtual Security Evaluation - An Operational Methodology for Side-Channel Leakage Detection at Source-Code Level. In Claude Carlet, Sylvain Guilley, Abderrahmane Nitaj, and El Mamoun Souidi, editors, Codes, Cryptology and Information Security - Third International Conference, C2SI 2019, Rabat, Morocco, April 22-24, 2019, Proceedings In Honor of Said El Hajji, volume 11445 of Lecture Notes in Computer Science, pages 3–12. Springer, 2019.
[27] University of Sydney (Australia). Magma Computational Algebra System. http://magma.maths.usyd.edu.au/magma/, Accessed on 2014-08-22.

Security and Privacy in the Internet of Things: Challenges and Solutions
J.L.H. Ramos and A. Skarmeta (Eds.)
IOS Press, 2020
© 2020 The authors and IOS Press. All rights reserved.
doi:10.3233/AISE200011

IoTCrawler. Managing Security and Privacy for IoT

Pedro GONZALEZ-GIL a,1, Juan Antonio MARTINEZ b, Hien Thi Thu TRUONG c, Alessandro SFORZIN c and Antonio F. SKARMETA a
a Department of Information and Communications Engineering, Computer Science Faculty, University of Murcia
b Odin Solutions, Research Department, Alcantarilla, Murcia, Spain
c NEC Laboratories Europe, Germany

Abstract. IoTCrawler is an H2020 project whose main objective is to become a search engine for IoT information. Its intention is not to become a new IoT platform competing with existing ones, but to be a higher frame of reference for all of them, creating an IoT ecosystem, much like any web-based search engine is for websites and webpages. IoTCrawler improves on other approaches by considering security and privacy as main driving pillars, from the information registration phase to users' and machines' requests for the stored information. In this chapter, we detail the different components responsible for identity management, authorisation and privacy, and how they interact to obtain the desired goal of controlling and managing the way in which information is registered by existing IoT platforms and later provided to legitimate consumers. We also present the introduction of Distributed Ledger Technologies in this IoT ecosystem as a way to enable distributed trust, avoiding single-point-of-failure threats and implementing smart contracts for authorisation, strategic features to be leveraged for the enablement of data markets.

Keywords. IoTCrawler, security, privacy, enablers

1. Introduction

It is a widely accepted fact that efficient and secure access to IoT data will be crucial for the prosperity of society, driving many different aspects of it and further broadening the horizon of what is possible. However, as of today, search and access to the vast repositories of information and services that are being continuously produced and integrated is still in its infancy, following practices better suited for static resource repositories, overlooking the dynamism and pervasiveness of the resources, and spending large amounts of time on integration. A dynamic and adaptable solution that allows the effective integration of heterogeneous and distributed IoT data, compliant with security and privacy, is yet to come, holding back the emergence and evolution of data markets. A quick summary of the issues limiting their evolution can be found in the following list:

1 Corresponding Author: Pedro Gonzalez-Gil, Department of Information and Communications Engineering, Computer Science Faculty, University of Murcia, Spain; E-mail: [email protected].

• The lack of a normalised and consistent way of communication, bridging the gap produced by the heterogeneity of the different data producers, severely hinders the development of innovative cross-domain applications.
• The absence of meta-data representing the meaning and context of the information makes it difficult to share the latter across platforms.
• Security and privacy, both major concerns in nearly every domain of modern IT, are still widely overlooked, and challenging for constrained IoT devices.
• The large-scale, distributed and dynamic nature of IoT resources requires novel approaches for finding, indexing and ranking the required information, as well as accessing it.

Some existing technologies, such as Shodan 2 and Thingful 3, already provide IoT search solutions, although they follow a centralised approach to indexing and ranking, while providing metadata manually, making for a rather static solution. Finally, they fail to address privacy and security issues. In order to fully enable the usage of IoT data in business scenarios, an effective approach must provide:
• Abstraction from heterogeneous sources of data and dynamic integration of volatile IoT resources through an adaptive distributed framework.
• Security, privacy and trust by design as an integral part of all processes involved, from the publication of information to high-level access by IoT applications.
• Scalable discovery, crawling, indexing and ranking of IoT resources in large-scale cross-platform and cross-disciplinary systems and scenarios.
• Semantic search, enabling automated, context-dependent access to IoT resources.
• Continuous monitoring and analysis of the Quality of Service (QoS) and Quality of Information (QoI) of IoT resources, supporting fault recovery and service continuity.

The IoTCrawler [14] project, funded by the EU under the H2020 programme, and whose key features are shown in Figure 1, approaches the aforementioned challenges by providing scalable and efficient methods for discovery, crawling, indexing and ranking of IoT resources in large-scale cross-platform, cross-disciplinary systems and scenarios. It provides enablers for secure and privacy-aware access to IoT resources, while also providing monitoring and analysis of QoS and QoI, which are evaluated during the ranking of suitable resources and support fault recovery and service continuity. The project's aim is to create a scalable, flexible and secure IoT search engine by using metadata and resource descriptions in a dynamic data model. To that end, the system should understand the user's priorities and provide results accordingly by using adaptive and dynamic techniques, not overlooking machine-initiated queries and search requests.

The IoTCrawler framework does not pretend to be another IoT platform, but to integrate existing ones and the IoT information stored in them, making it available to the community in a secure way. To do so, in the scope of this project, different cases have been considered for such integration of information. IoTCrawler does not only include a mechanism to integrate IoT platforms, but also IoT gateways (Gw), as representatives of very constrained devices, or even IoT devices with the appropriate processing capabilities.

2 https://www.shodan.io/
3 https://www.thingful.net/

Figure 1. IoTCrawler key concepts and general architecture

In this sense, the adoption of NGSI-LD as the standard for information representation is a core value of this proposal, guaranteeing the right integration of heterogeneous IoT sources of information. Context-aware search is one of the main driving pillars of IoTCrawler, since the trend of search engines being used only by humans has slowly drifted, and information is now also queried by other services or machines. Last but not least, the topic that brings this research project into this book, security and privacy, has been considered from the initial stages of the design of the IoTCrawler framework.

Although there are many potential benefits in implementing an ecosystem of IoT platforms, a number of potential risks related to security and privacy are to be expected. The distributed nature and the frequent lack of control over the environments on which they are deployed make trust, security and privacy a challenging task to tackle. The approach followed by IoTCrawler on the front of quality, privacy, security and trust is to employ a holistic, end-to-end approach, from data and service publication to the search and access workflow.

In the following section we go into detail about the architecture design of this framework, further explaining the responsibilities of each component. Next, we introduce the different security and privacy enablers that are part of the IoTCrawler framework, which are then explained in detail in the following sections. Finally, we sum up our conclusions in the last section of this chapter.
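To give a flavour of the NGSI-LD representation mentioned above, the fragment below sketches how an IoT observation could be modelled as an NGSI-LD entity, written here as a Python dictionary. It is only an illustration: the identifiers, attribute names and entity type are hypothetical and are not taken from the IoTCrawler deliverables.

# A hypothetical NGSI-LD entity for a temperature sensor.
# Properties carry values; Relationships point to other entities by URN.
entity = {
    "id": "urn:ngsi-ld:Sensor:living-room-temp-001",     # illustrative URN
    "type": "Sensor",
    "temperature": {
        "type": "Property",
        "value": 21.7,
        "unitCode": "CEL",                                # UN/CEFACT code for Celsius
        "observedAt": "2020-01-15T10:00:00Z",
    },
    "providedBy": {
        "type": "Relationship",
        "object": "urn:ngsi-ld:IoTPlatform:example-domain",
    },
    "@context": [
        # URL of the NGSI-LD core @context; check the current ETSI location.
        "https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld",
    ],
}
# Such an entity would typically be registered with an NGSI-LD broker
# (e.g. via its /ngsi-ld/v1/entities endpoint) by the crawling components.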

2. IoTCrawler Architecture

The ambitious objective of IoTCrawler of integrating existing IoT platforms, providing a federated scheme, with the added value of doing it in a secure and controlled manner, required an in-depth analysis of existing IoT platforms to identify the critical requirements for a correct design of the framework architecture. Particularly notable among them are the need for a cloud-based solution capable of interconnecting distinct IoT domains, as well as of extracting metadata to cope with data discovery/indexing and semantic search.

Figure 2. IoTCrawler overlay architecture

Thanks to this study we obtained the following list of requirements:
• The system must be flexible enough to incorporate IoT resources at different levels of granularity. The range of possible IoT resources goes from single IoT gateways to complete domains with brokering capabilities.
• Since the user is able to perform semantic search on top of the IoTCrawler framework, the system must allow the extraction of both data and metadata from the information registered in it.
• Scalability is a must, since IoTCrawler expects to integrate a high number of IoT domains. Here the computing and storage requirements must be specially considered, given the potential increase of data volumes and Big Data requirements in the coming years.
• The indexing and ranking modules already identified in Figure 1 must also support the extraction of metadata, as well as semantic features, for their processing tasks.
• Security and privacy have been considered from the very beginning of the design of this system, implementing the cross-layer nature of security features within IoTCrawler.
• Last but not least, the reputation of data must also be assured, which is a challenging task because of the very different nature of the resources to be integrated into the framework.

Bearing these requirements in mind, as well as the baseline and the general layered structure of the devised architecture already presented in Figure 1, the IoTCrawler consortium designed an overlay infrastructure, which is presented in Figure 2. Following a bottom-up approach, this architecture comprises the following layers: Micro layer, Domain layer, Inter-domain layer, Internal processing layer, and Application layer. According to this architecture, the different IoT domains are interconnected with the base IoTCrawler platform through a federation approach using Metadata Repositories (MDRs) at different levels, over which semantics empower user-level searches.

On the other hand, the endpoint of this framework, from the user perspective, is the Orchestrator, responsible for handling the different queries made by users. In the following paragraphs, we provide a thorough explanation of each of these layers.

Micro layer. This is the layer responsible for integrating IoT resources into the IoTCrawler framework. It is where crawling takes place. Three different sorts of sources of information can be crawled: local IoT gateways (Gw), IoT platforms, and IoT devices. Although IoT platforms are the most common context sources to be integrated in the framework, attending to the different nature of IoT resources, other cases have also been considered, such as the integration with local gateways acting as intermediary nodes for very constrained IoT devices, or even situations where final devices are directly available in domains with Internet access. For all three cases, the crawling task requires the representation to incorporate semantic annotation provided as an NGSI-LD representation [4]. This layer is therefore responsible for transforming the data into the NGSI-LD common data protocol and formats used by this framework.

Domain layer. In the Domain layer we consider not only the case of having a single MDR, but also a distribution of them. The usage of several MDRs in a domain allows load-balancing mechanisms that can be necessary when the number of users or services accessing or reporting data is too large, or when the data resources to store make it advisable to use different servers. We must also highlight the interaction with the Authorisation Enabler and the PEP Proxy, which guarantees a controlled registration of and access to the information stored in the IoTCrawler platform, as explained later.

Inter-domain layer. The inter-domain layer of the architecture federates metadata from different domains into a global (although distributed) data platform and exposes a distributed MDR. This layer is responsible for tracking where to search for information about IoT resources interconnected with the IoTCrawler ecosystem. Searching for non-indexed data can be initiated through a DHT approach, which provides the base of the IoTCrawler discovery mechanism. Additionally, a security mechanism based on Distributed Ledger Technology (DLT), using the Blockchain Handler, ensures secure communication between the distributed MDRs of this layer.

Internal processing layer. This layer contains a Semantic Enrichment module responsible for enriching and extending the existing knowledge of the integrated data sources by calculating certain Quality of Information (QoI) attributes. This process is done with the help of the monitoring component, which tracks the data sources and the queries made by users and can detect problems with the availability of the requested information. Additionally, the indexing component creates and updates indices for the metadata, while the ranking component, triggered by a search request, accesses the indices and retrieves an ordered list of the resources matching the request made by the user through the Orchestrator component. In this framework, we have considered algorithms and mechanisms for adaptive ranking and selection of IoT resources and services, providing a dynamic solution for the management and orchestration of the selected resources based on application requirements.

Application layer. The Application layer is the last layer of this platform. It comprises the Orchestrator, which is in charge of managing the requests coming from applications and of accessing both indexes and live rankings. When the specific data sources are identified, the Orchestrator provides a list of ranked results to the application. The application then selects one of the results by sending a message to the Orchestrator. Finally, the Orchestrator establishes the necessary data paths to the data source, involving the monitoring component to track these resources and queries as described above.

Security, Privacy & Trust. This layer is responsible for integrating the enablers that provide the data sources integrated in the IoTCrawler framework to the users in a secure and controlled manner. It comprises the Identity Manager, which stores and handles the identities and their attributes registered in the framework; the Blockchain Handler, which guarantees, thanks to this technology, the integrity of the authorisation policies defined in the framework; the authorisation enabler, comprising the XACML modules, the Capability Manager and the PEP Proxy; and finally the CP-ABE technology, which allows encrypting the integrated information. These components are described in depth in the following sections.

Thanks to this architecture, heterogeneous sources of information can be integrated through a uniform information representation based on NGSI-LD. Additionally, the semantic annotation introduced during the registration of such sources allows for further semantic enrichment, extending the current knowledge with QoI values. On the other hand, users and applications, making use of the Orchestrator component, submit their queries and access the legitimate information already defined in authorisation policies, which are validated in a distributed way thanks to the blockchain technology. Finally, the use of CP-ABE allows encrypting the information itself, enabling a secure broadcast of such content while guaranteeing that only authorised users can decrypt it.

3. Security and Privacy Enablers

IoTCrawler has been conceived as a platform to allow users to access the information provided by billions of IoT devices which are already connected to the Internet, but with one differentiating key point: this access must be provided in a secure and private manner. From the beginning, this transversal requirement has been considered and satisfied at every layer (local, intra-domain, inter-domain, meta and application layer), as was presented in Figure 2.

Many potential risks related to security and privacy need to be addressed when integrating such a diverse and complex collection of sources of information, accounting for all the nuances, particularities and needs of each of their domains, from privacy to data monetization. The distributed nature of IoT systems and the frequent lack of control over the environments on which they are deployed make trust, security and privacy a challenging task to tackle.

Although security is a very broad term, in the context of resource access the three terms most widely used are authentication, authorization and privacy. Another related term, which gains in relevance given the ecosystem nature of IoTCrawler, is trust, which defines the relationship between different platforms according to the truthfulness of the exchanged information.

In order to cover those aspects, IoTCrawler provides a number of security and privacy enablers: already existing or purposely built technologies that account for the aforementioned elements of security in the IoTCrawler scenario. These security and privacy enablers are covered in depth in the following sections, but for now let us briefly outline them in a big picture:
• The identity management enabler, a key component in which the information of the subjects accessing the information is registered.
• An authorization enabler, in charge of access control management, based on XACML and capabilities, which provides a decoupled approach where a Capability Token is presented as an authorization token that can easily be validated without consulting third parties.
• A privacy enabler for the information broadcast to a group of consumers, which allows only legitimate users to decrypt the received information, based on the attributes of the logical entity corresponding to these users, as stored in the identity management enabler.
• An inter-domain policy management system, which accounts for the distributed nature of the IoTCrawler domain ecosystem, avoiding single-point-of-failure conditions and addressing inter-domain trust.

4. Identity Management

Identity Management [1] is a security technology which allows for managing not only humans' identities, as envisioned in its initial design, but also the identities of IoT devices or so-called smart objects. This technology has many advantages in a world with a plethora of IoT devices accessed world-wide through different domains, like smart buildings, smart cities, e-health or smart industry, not only by humans but also in a machine-to-machine manner. Handling smart objects' identities allows for autonomous and independent entities with their respective attributes and identity management mechanisms, which allow for preserving the owner's privacy during their operation with other services.

Following this approach, in IoTCrawler we have considered the integration of an Identity Manager for both human and device entities. We have integrated an extension of the well-known enabler for Identity Management in the FIWARE community called KeyRock [2], which is also a derivation of the OpenStack Keystone [5] implementation. It has been extended in order to handle specific attributes of smart-object entities. Additionally, the adoption of the Identity Mixer [6] (Idemix) technology from IBM, as an extension of our IdM, endows users and smart objects with means to control and manage their private data. It also allows for the definition of partial identities as subsets of identity attributes from their whole identities. Idemix allows users to minimize personal data disclosure in electronic communications. For this reason, Idemix defines different roles: Issuer, Recipient, Prover, and Verifier. Recipients receive their credentials from an Issuer following a specific protocol. Such a credential comprises a set of attribute values and cryptographic information which allows the credential's owner to create a proof of possession.

Figure 3. IoT partial identities representation

Moreover, the Idemix technology also allows proving the possession of several credentials at the same time and stating different relationships among the attributes contained in such credentials. Figure 3 presents a diagram depicting the primary behaviour of our IdM. It manages the user's attributes as a traditional IdM does. Besides, the system can directly manage smart objects' identities and their attributes. Our IdM system also acts as a Credential Issuer: it is in charge of generating the Idemix credentials for both the users and their associated smart objects, according to the attributes held in the IdM. Each user has one or more associated attributes (e.g., ID, name, address, domain, and email), whereas a smart object has its particular set of attributes, such as ID, owner, vendor, manufacturer, date, and model. The attribute "owner" allows for the association between smart object and owner.

This IdM interacts with other components of our IoTCrawler framework. On the one hand, it has been integrated with our Distributed Capability-Based Access Control (DCapBAC) approach [8], a lightweight and distributed authorisation model for IoT environments. In this case, the identity attributes that are disclosed by using a specific proof are employed during the authorisation process, based on the eXtensible Access Control Markup Language (XACML) [13], to obtain the DCapBAC token. On the other hand, it has been used to obtain cryptographic keys based on the Ciphertext-Policy Attribute-Based Encryption scheme [3], a flexible approach for scenarios where information needs to be shared in a privacy-preserving manner.

5. Authorisation

Due to the heterogeneous nature of IoT devices and networks, most recent access control proposals have been designed as centralized approaches, in which a central entity or gateway is responsible for managing the corresponding authorization mechanisms, allowing or denying requests from external entities. Since this component is usually instantiated by unconstrained entities or back-end servers, standard access control technologies are directly applied. However, significant drawbacks arise when centralized approaches are considered in a real IoT deployment. On the one hand, the inclusion of a central entity for each access request clearly compromises end-to-end security properties, which are considered an essential requirement in IoT, due to the sensitivity level of potential applications.

Figure 4. DCapBAC approach for authorization application

On the other hand, the dynamic nature of IoT scenarios, with a potentially huge number of devices, complicates trust management with the central entity, affecting scalability. Moreover, access control decisions do not consider contextual conditions which are locally sensed by end devices. Finally, and further driving the point, current trends show a steady increase in M2M communications, meaning that we expect a great number of M2M authorisation requests from potentially constrained devices with low latency expectations.

In IoTCrawler, the proposed enabler for access control management is based on Distributed Capability-Based Access Control (DCapBAC), a combination of models and techniques making use of access control policies (e.g. eXtensible Access Control Markup Language (XACML) 3.0) [9], which are employed to generate lightweight authorization tokens based on the capabilities approach [12][11][7] that, in addition to identity management, takes context information into account.

DCapBAC (Figure 4) has been postulated as a feasible approach to be deployed in IoT scenarios, even in the presence of devices with tight resource constraints. Inspired by the Simple Public Key Infrastructure (SPKI) Certificate Theory and the Authorization-Based Access Control (ZBAC) foundations [10], it is based on a lightweight and flexible design that embeds authorization functionality in IoT devices, providing the advantages of a distributed security approach for IoT in terms of scalability, interoperability and end-to-end security. The key element of this approach is the concept of capability: a "token, ticket, or key that gives the possessor permission to access an entity or object in a computer system". This token is usually composed of a set of privileges which are granted to the entity holding the token. Additionally, the token must be tamper-proof and unequivocally identified in order to be usable in a real environment. Therefore, it is necessary to consider suitable cryptographic mechanisms which can be used even on resource-constrained devices and which enable an end-to-end secure access control mechanism. This concept is applied to IoT environments and extended by defining conditions which are locally verified on the constrained device.

This feature enhances the flexibility of DCapBAC, since any parameter which is read by the smart object can be used in the authorization process. DCapBAC is part of the access control system and is integrated with the policy-based approach by using XACML, in order to infer the access control privileges to be embedded into the capability token.

5.1. Token Life Cycle

Following again the process depicted in Figure 4, the component in charge of producing the token is the Capability Manager (CM). In order to do so, it verifies the authentication information of the user via interaction with the identity management system (IdM), and the feasibility of the request against the XACML policies via the Policy Decision Point (PDP). Once the request has been fully verified, fulfilling all of the requirements, the token is issued and sent back to the user. The PDP is the component in charge of verifying the request against XACML policies, issuing a verdict. This decoupling of CM and PDP, as opposed to having the CM check directly against XACML policies, allows for the adaptation of several different processes or access mechanisms to the XACML policies (like the use of DLTs, as will be seen in Section 7), as well as giving flexibility and allowing for horizontal scaling of services. Additionally, the interface that enables the management of policies is the Policy Administration Point (PAP), not shown in the picture, but a part of the system nonetheless. Once again, this component can have different implementations depending on the policy warehouse used to store the policies.

Once in the hands of the client, the token is attached to a request for an action on a resource, which is granted or denied depending on the permissions associated with that token and the specific request. The point at which that decision is made is called the Policy Enforcement Point (PEP). In the depiction shown in Figure 4, we can appreciate how the PEP Proxy lies between clients (an IoTStream provider in this case) and the Metadata Repository (MDR), enforcing the policies via the permissions attached to the token. This decision does not need to incur further queries for verification of the token, as the token itself is self-verifiable thanks to the SPKI signature verification, although further mechanisms for distributed trust management can be added (like, once again, using DLTs).
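To illustrate the life cycle just described, the sketch below mimics a capability token as a signed set of privileges that a PEP can check locally, with no call back to the issuer. It is only an illustration: the field names, the helper functions and the use of an HMAC (instead of the SPKI public-key signatures used by DCapBAC) are assumptions made for brevity and are not taken from the IoTCrawler implementation.

import hmac, hashlib, json, time

SECRET = b"issuer-demo-key"          # stand-in for the issuer's signing key (hypothetical)

def issue_token(subject, resource, actions, ttl_s=300):
    """Capability Manager side: bundle the granted privileges and sign them."""
    body = {
        "subject": subject,                  # who may use the token
        "resource": resource,                # what it applies to
        "actions": sorted(actions),          # granted privileges
        "expires": int(time.time()) + ttl_s,
    }
    payload = json.dumps(body, sort_keys=True).encode()
    body["signature"] = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return body

def pep_check(token, subject, resource, action):
    """PEP side: local validation of integrity, expiry and requested privilege."""
    sig = token.get("signature", "")
    payload = json.dumps({k: v for k, v in token.items() if k != "signature"},
                         sort_keys=True).encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False                         # tampered token
    if time.time() > token["expires"]:
        return False                         # expired token
    return token["subject"] == subject and token["resource"] == resource \
        and action in token["actions"]

tok = issue_token("urn:user:alice", "urn:ngsi-ld:Sensor:temp-001", ["GET"])
assert pep_check(tok, "urn:user:alice", "urn:ngsi-ld:Sensor:temp-001", "GET")
assert not pep_check(tok, "urn:user:alice", "urn:ngsi-ld:Sensor:temp-001", "DELETE")

In the real framework the PEP relies on public-key (SPKI) signature verification, so it does not need the issuer's secret; the symmetric HMAC above merely keeps the sketch short.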

6. Privacy

In addition to the more established token-based access control approaches, there will be common situations in which information needs to be outsourced or shared with a group of consumers through a central data management platform. For these scenarios, an approach based on advanced cryptographic schemes, such as Ciphertext-Policy Attribute-Based Encryption (CP-ABE) [3], is key to guaranteeing security properties when data needs to be shared with groups of users or services. In this case, high-level context information can be used to select a specific CP-ABE policy to encrypt a certain piece of data. To describe the principle behind this enabler more simply: whereas the token-based access control mechanism prevents or allows access to the resource by means of a


Figure 5. Privacy requirements for the IoTCrawler framework

“key” that opens the service, with the private data sharing enabler (shown in Figure 5) some or all of the information is encrypted, hidden in plain sight from prying eyes, so that only a user with the right key can decode it. Thanks to this cryptographic scheme, the IoTCrawler framework enables a secure data sharing mechanism with groups of consumers (i.e. communities and bubbles of smart objects) in such a way that only legitimate consumers are able to decrypt the received information. In particular, the authorization enabler could contain a set of sharing policies specifying how the information should be disseminated according to contextual data. These policies are intended to be evaluated before information is disseminated by the smart objects. The result of evaluating these policies could be a CP-ABE policy indicating the set of entities which will be able to decrypt the information to be shared. An example sharing policy could be IF contextA=atPub AND data=myLocation, THEN CP-ABEpolicy=myfriends OR myfamily, specifying that the location of a user is shared with friends or family members when he/she is at a pub. When a policy is successfully evaluated, the resulting CP-ABE policy is used to encrypt the information to be shared. In case two or more sharing policies are successfully evaluated, the most restrictive CP-ABE policy could be selected. After the information is encrypted and disseminated, the smart objects receiving such data will try to decrypt it with the CP-ABE keys related to their identity attributes. It should be noted that this approach could be integrated into end devices (e.g. smartphones) that share their data with other users through the platform. At the same time, it could be included in the platform itself, for less capable devices (e.g. sensors) or cases where it is not possible to deploy the mechanism on the device.
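As a rough illustration of how sharing policies could drive the choice of a CP-ABE policy, the sketch below evaluates a set of policies against the current context and, when several match, keeps the most restrictive expression. The policy structure, the context representation and the "count the AND terms" notion of restrictiveness are assumptions made for the example; the actual encryption step with a CP-ABE library is omitted.

```go
package main

import (
	"fmt"
	"strings"
)

// SharingPolicy is a hypothetical representation of a sharing policy:
// if every condition matches the current context, the data is encrypted
// under the associated CP-ABE policy expression.
type SharingPolicy struct {
	Conditions map[string]string // e.g. {"contextA": "atPub", "data": "myLocation"}
	CPABE      string            // e.g. "myfriends OR myfamily"
}

func (p SharingPolicy) matches(ctx map[string]string) bool {
	for k, v := range p.Conditions {
		if ctx[k] != v {
			return false
		}
	}
	return true
}

// selectCPABEPolicy evaluates all sharing policies against the context and,
// when several match, picks the "most restrictive" one. Restrictiveness is
// approximated here by the number of AND-connected attributes, which is an
// illustrative heuristic rather than the framework's actual criterion.
func selectCPABEPolicy(policies []SharingPolicy, ctx map[string]string) (string, bool) {
	best, found := "", false
	for _, p := range policies {
		if !p.matches(ctx) {
			continue
		}
		if !found || strings.Count(p.CPABE, " AND ") > strings.Count(best, " AND ") {
			best, found = p.CPABE, true
		}
	}
	return best, found
}

func main() {
	policies := []SharingPolicy{
		{Conditions: map[string]string{"contextA": "atPub", "data": "myLocation"},
			CPABE: "myfriends OR myfamily"},
		{Conditions: map[string]string{"contextA": "atWork", "data": "myLocation"},
			CPABE: "colleagues AND department:research"},
	}
	ctx := map[string]string{"contextA": "atPub", "data": "myLocation"}

	if policy, ok := selectCPABEPolicy(policies, ctx); ok {
		// The selected expression would then be handed to a CP-ABE library
		// to encrypt the data; the encryption step itself is omitted here.
		fmt.Println("Encrypt location under CP-ABE policy:", policy)
	}
}
```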

7. Inter-Domain Policy Management

An IoT search system refers to an engine that is connected to multiple IoT systems (domains) in order to crawl and index IoT data upon requests from users of the engine. The multiple domains to which the search engine connects do not hold pre-established trust relationships. How-


Figure 6. Inter-Domain Policy Management (domains 1..n expose data and local policies to the IoT search engine; global policies are handled by blockchain-based policy management and enforcement; data is served to applications/users)

ever, they are expected to comply with a set of data-sharing policies when they connect to the search engine. The federation of distributed MDRs previously presented in the inter-domain layer requires special mechanisms in order to control secure and safe access to shared resources. In this system, we define two levels of data sharing rules: global rules and domain-local rules. The global data sharing policies are those that all domains contractually agree on; for example, "No data owner (domain) can give untrusted parties, such as embargoed countries, access to its data". The domain-local policies are those that each individual domain creates to restrict access to its own data. In this system, only validated policies are executed. Furthermore, once a domain policy has been validated it can only be modified with the approval of the network, and an approved policy cannot be revoked without new approval of the removal. This layer introduces distributed ledger technology handlers to provide inter-domain policy management (Figure 6), in order to store and distribute these policies among participants. The IoTCrawler policy model includes not only global policies, which all domains (in this context, a domain is a virtual concept referring to data source providers) must comply with, but also domain-specific ones, by which each data source owner has full rights to set its own policies, typically about who can access data and under what circumstances. It is important to note that, under this scheme, domain-specific policies are not allowed to conflict with global policies, as the latter are created under the agreement of most or all domains in the federation. Another important point is that policies cannot be revoked, as this might break running services, but only modified with the approval of the remaining domains in the federation.
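A minimal sketch of how this two-level rule hierarchy could be evaluated is given below, assuming a simple Rule structure in which global rules have no owning domain. The precedence logic (only validated rules count, and a global deny can never be overridden by a domain-local allow) is an illustrative reading of the constraints above, not the IoTCrawler policy engine.

```go
package main

import "fmt"

// Rule is a hypothetical two-level data-sharing rule: global rules are agreed
// by all domains, while domain-local rules are created by a single domain.
type Rule struct {
	Domain    string // empty for global rules
	Consumer  string // who may access
	Allow     bool
	Validated bool
}

// allowed decides whether a consumer may access data of a given domain:
// only validated rules are considered, and a domain-local "allow" never
// overrides a global "deny" (domain rules must not conflict with global ones).
func allowed(rules []Rule, domain, consumer string) bool {
	decision := false
	for _, r := range rules {
		if !r.Validated || r.Consumer != consumer {
			continue
		}
		if r.Domain == "" && !r.Allow {
			return false // a global deny is final
		}
		if (r.Domain == "" || r.Domain == domain) && r.Allow {
			decision = true
		}
	}
	return decision
}

func main() {
	rules := []Rule{
		{Domain: "", Consumer: "embargoed-party", Allow: false, Validated: true}, // global
		{Domain: "domain-1", Consumer: "embargoed-party", Allow: true, Validated: true},
		{Domain: "domain-1", Consumer: "app-42", Allow: true, Validated: true},
	}
	fmt.Println(allowed(rules, "domain-1", "embargoed-party")) // false: the global policy wins
	fmt.Println(allowed(rules, "domain-1", "app-42"))          // true
}
```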


Distributed Ledger Technology (DLT) and Smart Contracts: a distributed ledger allows a peer-to-peer (P2P) network of untrusted entities to exchange transactions securely. The P2P network has no central administration and does not rely on a particular hierarchy, because all participants have the same capabilities. In a distributed ledger network, each new member must be accepted by all peers, and a single ledger of ordered transactions is shared by all peers in real time. Concretely, each peer stores a complete copy of the shared ledger. This full replication strategy not only boosts the reliability of the ledger, but also protects it against attacks, because unauthorized modifications can only succeed if an attacker controls the majority of the network's peers. Furthermore, additional restrictions can be applied so that access is granted only to a limited set of entities. A blockchain is one instance of DLT, in which the peers execute a consensus protocol to validate transactions and group them into special data structures called blocks, which are appended to the ledger. Blocks are chained by including the cryptographic hash of the previous block's header in the newly created block. This chaining of linked hash values prevents transactions from being tampered with undetected. Another benefit of using a blockchain is that it allows business logic to be executed on transactions, programmed in the form of smart contracts that are executed (verified) concurrently by many peers in the network, thus relying on their consensus as a strong security mechanism. Namely, smart contracts are computer programs, uploaded by the peers, that handle the business logic pre-agreed by the network members. Smart contracts are added to the blockchain in a similar fashion to transactions, and are thus included in blocks. Naturally, transactions that update smart contract state are also recorded in the ledger. The federated policy management design considers IoTCrawler a special domain, acting as an end-point for serving external clients such as applications, services, and other consumers. We argue that a distributed ledger is well suited to this particular type of application. Typically, a domain creates a policy, advertises it on the network, and waits for its approval (offer acceptance in our sharing scheme). Global policies can also be initiated by a node and approved by the other nodes. Validated policies are visible to all nodes, so that they can verify whether any given data sharing complies with these policies. Policy enforcement is performed via the blockchain, such that only valid policies are executed by the respective smart contracts. For the implementation of this DLT, we have chosen Hyperledger Fabric (commonly referred to simply as Fabric). Fabric is a widely used open-source blockchain platform managed by the Linux Foundation. Use cases of this technology include areas such as supply chain management, contract management, data provenance, and identity management. Most importantly, Fabric is designed as a modular and extensible permissioned blockchain, supporting the execution of distributed applications written in general-purpose programming languages such as Go and Java, and following the execute-order-validate paradigm for the execution of untrusted code in an untrusted environment. A distributed application consists of a chaincode (smart contract) and an endorsement policy. The chaincode implements the application logic and runs in the execution phase.
The endorsement policy is evaluated in the validation phase and can only be modified by trusted entities, e.g. administrators. Fabric uses a hybrid replication design that combines primary-backup (passive) replication and active replication. Primary-backup replication


Figure 7. Smart Contract application. Blockchain handler.

in Fabric means that every transaction is executed only by a subset of peers, based on the endorsement policies. Finally, Fabric adopts active replication so that transactions are written to the ledger only once consensus on their total order is reached. This hybrid design is what makes Fabric a scalable permissioned blockchain. Regarding the validation of the federation policies, applying this technology allows policies to be implemented as smart contracts. In this way, at the validation point, each policy can either be approved and added to the ledger, or rejected. Moreover, the management of policies (i.e. adding new ones or modifying existing ones) is verified and validated by the network. Finally, policy checks are performed by executing the smart contracts. By leveraging the blockchain, as shown in Figure 7, we also provide a way towards data monetization and transparency, where users are able to establish policies that set a limited time of use, license fees, and so on.
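To illustrate how such policy management could be expressed as Fabric chaincode, the following Go sketch uses the Hyperledger Fabric contract API to record a proposed policy on the ledger and activate it once approvals have been collected. The Policy fields, the quorum of two approvals and the function names are assumptions made for the example; they do not reflect the project's actual smart contracts.

```go
package main

import (
	"encoding/json"
	"fmt"

	"github.com/hyperledger/fabric-contract-api-go/contractapi"
)

// Policy is a hypothetical representation of a data-sharing policy
// advertised by a domain and stored on the ledger once approved.
type Policy struct {
	ID        string   `json:"id"`
	Domain    string   `json:"domain"`    // owning domain (a marker such as "global" could denote network-wide policies)
	Rule      string   `json:"rule"`      // policy body, e.g. a reference to an XACML document
	Approvals []string `json:"approvals"` // domains that endorsed the policy
	Active    bool     `json:"active"`
}

// PolicyContract manages the life cycle of sharing policies on the ledger.
type PolicyContract struct {
	contractapi.Contract
}

// ProposePolicy records a new, not-yet-active policy on the ledger.
func (c *PolicyContract) ProposePolicy(ctx contractapi.TransactionContextInterface, id, domain, rule string) error {
	existing, err := ctx.GetStub().GetState(id)
	if err != nil {
		return err
	}
	if existing != nil {
		return fmt.Errorf("policy %s already exists", id)
	}
	p := Policy{ID: id, Domain: domain, Rule: rule, Active: false}
	data, err := json.Marshal(p)
	if err != nil {
		return err
	}
	return ctx.GetStub().PutState(id, data)
}

// ApprovePolicy adds an approval; the policy becomes active once the
// (illustrative) quorum of two approvals is reached.
func (c *PolicyContract) ApprovePolicy(ctx contractapi.TransactionContextInterface, id, approver string) error {
	data, err := ctx.GetStub().GetState(id)
	if err != nil || data == nil {
		return fmt.Errorf("policy %s not found", id)
	}
	var p Policy
	if err := json.Unmarshal(data, &p); err != nil {
		return err
	}
	p.Approvals = append(p.Approvals, approver)
	if len(p.Approvals) >= 2 { // placeholder quorum; a real federation would agree on this threshold
		p.Active = true
	}
	updated, err := json.Marshal(p)
	if err != nil {
		return err
	}
	return ctx.GetStub().PutState(id, updated)
}

func main() {
	chaincode, err := contractapi.NewChaincode(&PolicyContract{})
	if err != nil {
		panic(err)
	}
	if err := chaincode.Start(); err != nil {
		panic(err)
	}
}
```

In this setting the proposal and the approvals are themselves ordered, endorsed transactions, so the resulting set of active policies is the one the whole federation has agreed on, which is the guarantee the surrounding text attributes to the blockchain layer.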

8. Conclusions

In this chapter we have outlined the overall IoTCrawler framework, going into more detail on its security aspects. For this purpose, we have presented a diagram defining the different layers comprising the framework, their entities, and the interactions among them. As explained in the Introduction, the adoption of security and privacy mechanisms is fundamental to controlling access to the information in our framework. We have also pinpointed the requirements associated with these properties at the different layers of the IoTCrawler framework. On the one hand, an authorization enabler is required to control the way producers and consumers add information to, and query information from, our framework. Privacy is also addressed by using an attribute-based encryption mechanism which allows information to be securely shared with groups of consumers. Finally, at the inter-domain level, we have introduced Distributed Ledger Technology, and more specifically the use of smart contracts and blockchain, for the definition and agreement of global policies. Building on the technologies presented in this chapter, we are already in the process of developing this ambitious framework, so as to allow users to securely access IoT information on the Internet without compromising scalability or performance.


Acknowledgements This work has been sponsored by the European Commission through the following projects: IoTCrawler (contract 779852), Olympus (contract 786725) and Use-IT (CHIST-ERA PCIN-2016-010); and by the Spanish Ministry of Economy and Competitiveness, through the Torres Quevedo program (grant TQ-15-08073).

References

[1] Jorge Bernal Bernabe, Jose L. Hernandez-Ramos, and Antonio F. Skarmeta Gomez. Holistic Privacy-Preserving Identity Management System for the Internet of Things. Mobile Information Systems, 2017.
[2] Jorge Bernal Bernabe, Jose L. Hernandez-Ramos, and Antonio F. Skarmeta Gomez. Holistic privacy-preserving identity management system for the Internet of Things. Mobile Information Systems, 2017, 2017.
[3] John Bethencourt, Amit Sahai, and Brent Waters. Ciphertext-policy attribute-based encryption. In 2007 IEEE Symposium on Security and Privacy (SP'07), pages 321–334. IEEE, 2007.
[4] ETSI CIM (Context Information Management). NGSI-LD Information Model. 2019.
[5] Baojiang Cui and Tao Xi. Security analysis of OpenStack Keystone. In 2015 9th International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing, pages 283–288. IEEE, 2015.
[6] IBM Research – Zurich. Specification of the Identity Mixer Cryptographic Library. 2010.
[7] José L. Hernández-Ramos, Antonio J. Jara, Leandro Marín, and Antonio F. Skarmeta Gómez. DCapBAC: embedding authorization logic into smart things through ECC optimizations. International Journal of Computer Mathematics, 2016.
[8] Jose L. Hernandez-Ramos, Marcin Piotr Pawlowski, Antonio J. Jara, Antonio F. Skarmeta, and Latif Ladid. Toward a lightweight authentication and authorization framework for smart objects. IEEE Journal on Selected Areas in Communications, 33(4):690–702, 2015.
[9] OASIS. eXtensible Access Control Markup Language (XACML). OASIS Standard, 1 February 2005.
[10] Alan H. Karp, Harry Haury, and Michael H. Davis. From ABAC to ZBAC: The Evolution of Access Control Models. ISSA Journal, 2010.
[11] Parikshit N. Mahalle, Bayu Anggorojati, Neeli Rashmi Prasad, and Ramjee Prasad. Identity driven capability based access control (ICAC) scheme for the Internet of Things. In 2012 IEEE International Conference on Advanced Networks and Telecommunications Systems (ANTS 2012), 2012.
[12] Martin Naedele. An access control protocol for embedded devices. In 2006 IEEE International Conference on Industrial Informatics (INDIN'06), 2007.
[13] Erik Rissanen. eXtensible Access Control Markup Language (XACML) Version 3.0. OASIS Standard, 2012.
[14] Antonio F. Skarmeta, Jose Santa, Juan A. Martinez, Josiane X. Parreira, Payam Barnaghi, Shirin Enshaeifar, Michail J. Beliatis, Mirko A. Presser, Thorben Iggena, Marten Fischer, Ralf Tönjes, Martin Strohbach, Alessandro Sforzin, and Hien Truong. IoTCrawler: Browsing the internet of things. In 2018 Global Internet of Things Summit (GIoTS 2018), 2018.


Subject Index

access control 61
ambient assisted living 94
attack tree 94
blockchains 76
coding theory 148
context 61
countermeasure 148
CP-ABE 1
direct sum masking (DSM) 108, 148
Distributed Ledger Technologies (DLT) 76
dynamicity 61
enablers 167
fault injection attacks 108, 148
GDPR 24
high-order 148
Internet of Things (IoT) 1, 61, 76, 148
IoT Model Checking 94
IoT Platform 24
IoTCrawler 167
linear complementary dual codes 108
linear complementary pair of codes 108
orchestration 1
policy-based 1
privacy 24, 61, 76, 148, 167
privacy awareness 44
privacy-by-design 44
security 1, 61, 76, 148, 167
security of Internet of Things (IoT) 108
side channel attacks 108
side-channel analysis 148
smart contracts 76
trustworthiness 61
user-centric 44


Author Index

Andreoletti, D. 44
Anton, P. 76
Aroua, S. 129
Augusto, J.C. 94
Baron, B. 44
Bellesini, F. 76
Bragatto, T. 76
Cardoso, F. 44
Carlet, C. 108, 148
Castelluccia, C. 44
Cavadenti, A. 76
Conzon, D. 24
Coustaty, M. 129
Cristescu, I.-D. 94
Croce, V. 76
del Re, E. viii
Falquet, G. 129
Ferrari, A. 44
Ferrera, E. 24
Ferry, N. 61
Fotiou, N. 76
Gallon, A. 61
Garcia-Carrillo, D. 1
Ghadfi, S. 129
Ghamri-Doudane, Y. 129
Giménez Manuel, J.G. 94
Giordano, S. 44
Gomez-Krämer, P. 129
Gonzalez-Gil, P. 167
Guilley, S. 108, 148
Güneri, C. 108
Haavala, M. 76
Howells, G. 129
Iturbe, E. 61
Kortesniemi, Y. 76
Lagutin, D. 76
le Métayer, D. 44
Leligou, H.C. 76
Luceri, L. 44
Manzoor, A. 76
Martinez, J.A. 167
Mcdonald-Maier, K. 129
Mesnager, S. 108, 148
Molina-Zarca, A. 1
Morel, V. 44
Murphy, J. 129
Musolesi, M. 44
Oikonomidis, Y. 76
Önen, M. 44
Oualha, N. 1
Özbudak, F. 108
Polyzos, G.C. 76
Rashid, M.R.A. 24
Raveduto, G. 76
Rios, E. 61
Rouis, K. 129
Santori, F. 76
Sforzin, A. 167
Siris, V. 76
Skarmeta, A. 1, 167
Song, H. 61
Tamani, N. 129
Tao, X. 24
Trakadas, P. 76
Truong, H.T.T. 167
Van Rompay, C. 44
Verber, M. 76
Wilk, C. v
