Progress in Cryptology – INDOCRYPT 2020: 21st International Conference on Cryptology in India, Bangalore, India, December 13–16, 2020, Proceedings [1st ed.] 9783030652760, 9783030652777

This book constitutes the refereed proceedings of the 21st International Conference on Cryptology in India, INDOCRYPT 20


English Pages XX, 906 [912] Year 2020


Table of contents :
Front Matter ....Pages i-xx
Front Matter ....Pages 1-1
Delayed Authentication: Preventing Replay and Relay Attacks in Private Contact Tracing (Krzysztof Pietrzak)....Pages 3-15
Proof-of-Reputation Blockchain with Nakamoto Fallback (Leonard Kleinrock, Rafail Ostrovsky, Vassilis Zikas)....Pages 16-38
Transciphering, Using FiLIP and TFHE for an Efficient Delegation of Computation (Clément Hoffmann, Pierrick Méaux, Thomas Ricosset)....Pages 39-61
Encrypted Key-Value Stores (Archita Agarwal, Seny Kamara)....Pages 62-85
Front Matter ....Pages 87-87
Formal Verification of Fair Exchange Based on Bitcoin Smart Contracts (Cheng Shi, Kazuki Yoneyama)....Pages 89-106
Certified Compilation for Cryptography: Extended x86 Instructions and Constant-Time Verification (José Bacelar Almeida, Manuel Barbosa, Gilles Barthe, Vincent Laporte, Tiago Oliveira)....Pages 107-127
Protocol Analysis with Time (Damián Aparicio-Sánchez, Santiago Escobar, Catherine Meadows, José Meseguer, Julia Sapiña)....Pages 128-150
Verifpal: Cryptographic Protocol Analysis for the Real World (Nadim Kobeissi, Georgio Nicolas, Mukesh Tiwari)....Pages 151-202
Front Matter ....Pages 203-203
On the Worst-Case Side-Channel Security of ECC Point Randomization in Embedded Devices (Melissa Azouaoui, François Durvaux, Romain Poussier, François-Xavier Standaert, Kostas Papagiannopoulos, Vincent Verneuil)....Pages 205-227
Efficient Hardware Implementations for Elliptic Curve Cryptography over Curve448 (Mojtaba Bisheh Niasar, Reza Azarderakhsh, Mehran Mozaffari Kermani)....Pages 228-247
Extending the Signed Non-zero Bit and Sign-Aligned Columns Methods to General Bases for Use in Cryptography (Abhraneel Dutta, Aaron Hutchinson, Koray Karabina)....Pages 248-270
Front Matter ....Pages 271-271
Cryptanalysis of the Permutation Based Algorithm SpoC (Liliya Kraleva, Raluca Posteuca, Vincent Rijmen)....Pages 273-293
More Glimpses of the RC4 Internal State Array (Pranab Chakraborty, Subhamoy Maitra)....Pages 294-311
Mixture Integral Attacks on Reduced-Round AES with a Known/Secret S-Box (Lorenzo Grassi, Markus Schofnegger)....Pages 312-331
Counting Active S-Boxes is not Enough (Orr Dunkelman, Abhishek Kumar, Eran Lambooij, Somitra Kumar Sanadhya)....Pages 332-344
Computing Expected Differential Probability of (Truncated) Differentials and Expected Linear Potential of (Multidimensional) Linear Hulls in SPN Block Ciphers (Maria Eichlseder, Gregor Leander, Shahram Rasoolzadeh)....Pages 345-369
Front Matter ....Pages 371-371
Quantum Cryptanalysis on Contracting Feistel Structures and Observation on Related-Key Settings (Carlos Cid, Akinori Hosoyamada, Yunwen Liu, Siang Meng Sim)....Pages 373-394
Evaluation of Quantum Cryptanalysis on SPECK (Ravi Anand, Arpita Maitra, Sourav Mukhopadhyay)....Pages 395-413
Front Matter ....Pages 415-415
Making the BKW Algorithm Practical for LWE (Alessandro Budroni, Qian Guo, Thomas Johansson, Erik Mårtensson, Paul Stankovski Wagner)....Pages 417-439
On a Dual/Hybrid Approach to Small Secret LWE (Thomas Espitau, Antoine Joux, Natalia Kharchenko)....Pages 440-462
Front Matter ....Pages 463-463
Adaptively Secure Threshold Symmetric-Key Encryption (Pratyay Mukherjee)....Pages 465-487
Vetted Encryption (Martha Norberg Hovd, Martijn Stam)....Pages 488-507
Security of Public Key Encryption Against Resetting Attacks (Juliane Krämer, Patrick Struck)....Pages 508-528
The Multi-Base Discrete Logarithm Problem: Tight Reductions and Non-rewinding Proofs for Schnorr Identification and Signatures (Mihir Bellare, Wei Dai)....Pages 529-552
Skipping the q in Group Signatures (Olivier Blazy, Saqib A. Kakvi)....Pages 553-575
Incremental Cryptography Revisited: PRFs, Nonces and Modular Design (Vivek Arte, Mihir Bellare, Louiza Khati)....Pages 576-598
Front Matter ....Pages 599-599
Gadget-Based iNTRU Lattice Trapdoors (Nicholas Genise, Baiyu Li)....Pages 601-623
Lattice-Based IBE with Equality Test Supporting Flexible Authorization in the Standard Model (Giang Linh Duc Nguyen, Willy Susilo, Dung Hoang Duong, HuyQuoc Le, Fuchun Guo)....Pages 624-643
Efficient Attribute-Based Proxy Re-Encryption with Constant Size Ciphertexts (Arinjita Paul, S. Sharmila Deva Selvi, C. Pandu Rangan)....Pages 644-665
Adaptive-Secure Identity-Based Inner-Product Functional Encryption and Its Leakage-Resilience (Linru Zhang, Xiangning Wang, Yuechen Chen, Siu-Ming Yiu)....Pages 666-690
CCA-Secure ABE Using Tag and Pair Encoding (Olivier Blazy, Sayantan Mukherjee)....Pages 691-714
Simpler Constructions of Asymmetric Primitives from Obfuscation (Pooya Farshim, Georg Fuchsbauer, Alain Passelègue)....Pages 715-738
Front Matter ....Pages 739-739
Adaptive Security of Practical Garbling Schemes (Zahra Jafargholi, Sabine Oechsner)....Pages 741-762
Constructive t-secure Homomorphic Secret Sharing for Low Degree Polynomials (Kittiphop Phalakarn, Vorapong Suppakitpaisarn, Nuttapong Attrapadung, Kanta Matsuura)....Pages 763-785
Perfectly-Secure Asynchronous MPC for General Adversaries (Extended Abstract) (Ashish Choudhury, Nikhil Pappu)....Pages 786-809
Improving the Efficiency of Optimally-Resilient Statistically-Secure Asynchronous Multi-party Computation (Ashish Choudhury)....Pages 810-831
High Throughput Secure MPC over Small Population in Hybrid Networks (Extended Abstract) (Ashish Choudhury, Aditya Hegde)....Pages 832-855
Front Matter ....Pages 857-857
Dual-Mode NIZKs: Possibility and Impossibility Results for Property Transfer (Vivek Arte, Mihir Bellare)....Pages 859-881
On Black-Box Extension of a Non-Interactive Zero-Knowledge Proof System for Secret Equality (Kyosuke Yamashita, Mehdi Tibouchi, Masayuki Abe)....Pages 882-904
Back Matter ....Pages 905-906


LNCS 12578

Karthikeyan Bhargavan Elisabeth Oswald Manoj Prabhakaran (Eds.)

Progress in Cryptology – INDOCRYPT 2020 21st International Conference on Cryptology in India Bangalore, India, December 13–16, 2020 Proceedings

Lecture Notes in Computer Science

Founding Editors
Gerhard Goos (Karlsruhe Institute of Technology, Karlsruhe, Germany)
Juris Hartmanis (Cornell University, Ithaca, NY, USA)

Editorial Board Members
Elisa Bertino (Purdue University, West Lafayette, IN, USA)
Wen Gao (Peking University, Beijing, China)
Bernhard Steffen (TU Dortmund University, Dortmund, Germany)
Gerhard Woeginger (RWTH Aachen, Aachen, Germany)
Moti Yung (Columbia University, New York, NY, USA)


More information about this subseries at http://www.springer.com/series/7410


Editors
Karthikeyan Bhargavan (Inria Paris, Paris, France)
Elisabeth Oswald (University of Klagenfurt, Klagenfurt, Austria)
Manoj Prabhakaran (Department of Computer Science and Engineering, Indian Institute of Technology Bombay, Mumbai, Maharashtra, India)

ISSN 0302-9743
ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISBN 978-3-030-65276-0
ISBN 978-3-030-65277-7 (eBook)
https://doi.org/10.1007/978-3-030-65277-7

LNCS Sublibrary: SL4 – Security and Cryptology

© Springer Nature Switzerland AG 2020

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

It is with great pleasure that we present the proceedings of the 21st International Conference on Cryptology (INDOCRYPT 2020) held in India during December 13–16, 2020, as a virtual conference. INDOCRYPT is an international cryptography conference held every year in December, under the aegis of the Cryptology Research Society of India (CRSI). In the two decades since the inaugural edition in 2000, INDOCRYPT has established itself as a reputable international venue for publishing cryptology research, as well as a valuable resource for promoting research in India. It is currently organized in cooperation with the International Association for Cryptologic Research (IACR).

This year's virtual conference was organized by a team centered at the International Institute of Information Technology Bangalore (IIITB). Originally planned to be held at the IIITB campus, the event was transformed into a virtual event to facilitate participation amidst the current pandemic. Past editions of INDOCRYPT were held in various cities in India: Kolkata (2000, 2006, 2012, 2016), Chennai (2001, 2004, 2007, 2011, 2017), Hyderabad (2002, 2010, 2019), New Delhi (2003, 2009, 2014, 2018), Bangalore (2005, 2015), Kharagpur (2008), and Mumbai (2013).

The Program Committee (PC) for INDOCRYPT 2020 consisted of 61 experts from around the world. About 36% of the PC (including two of the chairs) was based in Europe, 28% in the USA, 23% in India (including one chair), and the rest from other countries in Asia and from New Zealand.

The conference attracted 104 submissions. Of these, 20 papers were rejected by the PC chairs for not meeting the submission guidelines. The remaining 84 papers were reviewed by the PC, with most papers receiving 3 independent reviews (3 papers each received 1, 2, and 4 reviews). The double-blind reviews were carried out by the PC, with the help of 77 external reviewers, using the EasyChair conference management system. We take this opportunity to thank all the PC members and the external reviewers for a tremendous job! Despite the short review period, on the whole, the reviews were rigorous and detailed. In a handful of submissions, the reviews uncovered subtle errors; at the discretion of the PC, the authors were given a chance to respond, often resulting in fixes or withdrawal of those submissions.

At the end of the review phase, 39 papers were selected for publication in these proceedings, 9 of which went through a shepherding process to ensure that various concerns raised by the reviewers were addressed before publication. As usual, the final uploaded versions were not reviewed by the PC, and the authors bear the full responsibility for their contents.

The 104 submissions received involved 270 authors from 33 countries. Among the accepted papers, European authors had a share of about 42%, North American authors about 22%, Indian authors about 17%, authors from the rest of Asia about 16%, and Australian authors about 3%; the top 4 countries contributing to this list were the USA (21%), India (17%), France, and Japan (9% each).


The program also included three invited talks by Ran Canetti (Boston University, USA), Véronique Cortier (CNRS, France), and Carmela Troncoso (EPFL, Switzerland), a tutorial on “Constructive Cryptography” led by Ueli Maurer (ETH Zürich, Switzerland), and a Rump Session to announce new results and works in progress.

Apart from its traditional focus on areas in applied and theoretical cryptology, this year INDOCRYPT solicited papers in the area of Formal Methods for Cryptographic Systems. While this is an area which has the same high-level goal as cryptology – namely, providing information security guarantees – the avenues for interaction between these two areas have been limited. It is to mend this gap, and to work towards a future when the tools and techniques from these two areas would be developed and deployed synergistically, that INDOCRYPT 2020 has expanded its scope this year. These proceedings carry a short but interesting section on this theme featuring four papers. Two of the invited talks also fit this theme. In addition, a two-day long preconference workshop called “VeriCrypt: An Introduction to Tools for Verified Cryptography” provided students and practitioners of cryptography with a unique opportunity to learn about verified cryptography from the leading experts in the area. Altogether, we hope that these would intrigue and inspire many cryptographers to engage with Formal Methods. Conversely, we hope that the participants at INDOCRYPT 2020 from the Formal Methods community would be exposed to a wealth of new questions and concepts developed in the Cryptography community.

We would like to thank CRSI for entrusting us with putting together the program for INDOCRYPT 2020. Thanks to the authors of all the submissions and the contribution of the entire PC, we have ended up with a rich and exciting program. We would also like to acknowledge the major contribution of the organizers, headed by the general chairs Prof. S. Sadagopan (Director, IIITB) and Dr. Vishal Saraswat (RBEI/ESY). In particular, we thank the organizing chair, Prof. Srinivas Vivek (IIITB) and his efficient team for helping us in a range of tasks, including putting together the conference webpage with all the relevant information and instructions for the authors, implementing replacements for features not included in the free license of EasyChair, and assembling the proceedings. We also thank Springer for continuing to support INDOCRYPT by publishing the proceedings as part of the LNCS series. Finally, we thank all the participants in the conference, including the authors of all the submissions, the attendees and the presenters, for their enthusiastic participation. We hope you find INDOCRYPT 2020 and these proceedings to be valuable and enjoyable!

December 2020

Karthikeyan Bhargavan Elisabeth Oswald Manoj Prabhakaran

Organization

General Chairs
S. Sadagopan (IIIT Bangalore, India)
Vishal Saraswat (Robert Bosch Engineering and Business Solutions Pvt. Ltd., India)

Program Committee Chairs
Karthikeyan Bhargavan (Inria Paris, France)
Elisabeth Oswald (University of Klagenfurt, Austria)
Manoj Prabhakaran (Indian Institute of Technology Bombay, India)

Program Committee
Shweta Agrawal (Indian Institute of Technology Madras, India)
Shashank Agrawal (Western Digital, USA)
Saikrishna Badrinarayanan (Visa Research, USA)
Manuel Barbosa (HASLab - INESC TEC, Portugal)
Mihir Bellare (University of California, San Diego, USA)
Davide Bellizia (Université catholique de Louvain, Belgium)
Begül Bilgin (Université catholique de Louvain, Belgium)
Bruno Blanchet (Inria Paris, France)
Chris Brzuska (Aalto University, Finland)
Nishanth Chandran (Microsoft Research, India)
Ran Cohen (Northeastern University, USA)
Cas Cremers (CISPA Helmholtz Center for Information Security, Germany)
Apoorvaa Deshpande (Snap Inc., USA)
Maria Eichlseder (IAIK TU Graz, Austria)
Pooya Farshim (University of York, UK)
Chaya Ganesh (Indian Institute of Science, India)
Deepak Garg (Max Planck Institute for Software Systems, Germany)
Vipul Goyal (Carnegie Mellon University, USA)
Clémentine Gritti (University of Canterbury, New Zealand)
Divya Gupta (Microsoft Research, India)
Carmit Hazay (Bar-Ilan University, Israel)
James Howe (PQShield, UK)
Abhishek Jain (Johns Hopkins University, USA)
Bhavana Kanukurthi (Indian Institute of Science, India)
Dakshita Khurana (University of Illinois at Urbana-Champaign, USA)
Markulf Kohlweiss (The University of Edinburgh, UK)
Venkata Koppula (Indian Institute of Technology Delhi, India)
Steve Kremer (Inria Paris, France)
Ralf Küsters (University of Stuttgart, Germany)
N. V. Narendra Kumar (IDRBT Hyderabad, India)
Patrick Longa (Microsoft Research Redmond, USA)
Bart Mennink (Radboud University Nijmegen, The Netherlands)
Aikaterini Mitrokotsa (Chalmers University of Technology, Sweden)
Pratyay Mukherjee (Visa Research, USA)
Debdeep Mukhopadhyay (Indian Institute of Technology Kharagpur, India)
Prasad Naldurg (Inria Paris, France)
Ryo Nishimaki (NTT, Japan)
Adam O’Neill (University of Massachusetts, USA)
Sabine Oechsner (Aarhus University, Denmark)
Anat Paskin-Cherniavsky (Ariel University, Israel)
Arpita Patra (Indian Institute of Science, India)
Raphael C.-W. Phan (Monash University, Malaysia)
Romain Poussier (Nanyang Technological University, Singapore)
Sanjiva Prasad (Indian Institute of Technology Delhi, India)
R. Ramanujam (Institute of Mathematical Sciences Chennai, India)
Aseem Rastogi (Microsoft Research, India)
Mike Rosulek (Oregon State University, USA)
Arnab Roy (University of Klagenfurt, Austria)
Alessandra Scafuro (North Carolina State University, USA)
Peter Schwabe (Max Planck Institute for Security and Privacy, Germany, and Radboud University, The Netherlands)
Sourav Sen Gupta (Nanyang Technological University, Singapore)
Akshayaram Srinivasan (University of California, Berkeley, USA)
Pierre-Yves Strub (École Polytechnique, France)
S. P. Suresh (Chennai Mathematical Institute, India)
Ni Trieu (Arizona State University, USA)
Prashant Vasudevan (University of California, Berkeley, USA)
Bo-Yin Yang (Academia Sinica, Taiwan)
Santiago Zanella-Béguelin (Microsoft Research, USA)

Organizing Committee Chair
Srinivas Vivek (IIIT Bangalore, India)

Organizing Committee
Anju Alexander (IIIT Bangalore, India)
Janvi Chhabra (IIIT Bangalore, India)
Hemanth Chitti (IIIT Bangalore, India)
Aditya Hegde (IIIT Bangalore, India)
Deepak K. (National Institute of Technology Karnataka, India)
Shyam S. M. (IIIT Bangalore, India)
Girisha Shankar (IISc Bengaluru, India)
Santosh Upadhyaya (IIIT Bangalore, India)
Annapurna Valiveti (IIIT Bangalore, India)
Vivek Yadav (IIIT Bangalore, India)

Additional Reviewers
Aayush Jain
Abida Haque
Aditya Hegde
Amit Singh Bhati
Anna Guinet
Arpan Jati
Ashwin Jha
Bei Liang
Benedikt Bunz
Bhavana Obbattu
Chen-Da Liu-Zhang
Chetan Surana
Damien Stehlé
Daniel Coggia
Daniel Masny
Debapriya Basu Roy
Deevashwer Rathee
Divya Ravi
Gayathri Garimella
Georgia Tsaloli
Gustavo Banegas
Ian McQuoid
Jack Doerner
James Bartusek
Julian Liedtke
Kolin Paul
Laasya Bangalore
Lauren De Meyer
Lorenzo Grassi
Lukasz Chmielewski
Madhavan Mukund
Manaar Alam
Mang Zhao
Marc Rivinius
Marco Baldi
Markus Schofnegger
Mayank Rathee
Meenakshi Kansal
Michael Hamburg
Michael Tunstall
Michael Walter
Miruna Rosca
Monosij Maitra
Nilanjan Datta
Nishant Kumar
Nishat Koti
Paulo Barreto
Pedram Hosseyni
Pierrick Meaux
Pratik Sarkar
Rachit Garg
Rafaël Del Pino
Rishabh Bhadauria
Rogério Pontes
Ron Steinfeld
Sarah McCarthy
Satrajit Ghosh
Satyanarayana Vusirikala
Sebastian Hasler
Shai Halevi
Shivam Bhasin
Shreyas Gupta
Sikhar Patranabis
Sogol Mazaheri
Sruthi Sekar
Subhamoy Maitra
Subodh Sharma
Søren Eller Thomsen
Tahina Ramananandro
Tapas Pal
Vaishnavi Sundararajan
Varsha Bhat
Veronika Kuchta
Yilei Chen
Yoo-Seung Won
Zhengzhong Jin



Abstracts of Invited Talks

Can We Have Truly Secure Information Systems? Combining Formal Analysis and Cryptography is Key

Ran Canetti¹

Abstract. We constantly hear about new vulnerabilities in information systems. Beyond the direct harm caused by each vulnerability and exploit, the constant stream of exposed flaws, which range from small programming bugs to large-scale design errors, undermines our general trust in information systems. This lack of trust, in turn, undermines our ability to fully realize the potential societal benefits that information systems carry, in particular their ability to preserve the privacy of sensitive data while allowing controlled use of this data along with accountability for such use. This talk will argue that this unfortunate state of affairs is far from being inherent, and that if we combine technology and thinking from a number of communities, most prominently Programming Languages, Cryptography, and System Design, it will indeed be possible to build large-scale information systems that truly satisfy even intricate security and functionality requirements. We will survey some of the work done in this direction and will outline a potential road ahead. Key tools include: (a) developing techniques for using PL methodology and techniques to specify and assert higher-level security properties of multi-component systems, and (b) enabling security-preserving modular design and analysis of complex systems from simple building blocks. Key application areas will also be discussed.

¹ Boston University. Member of the CPIIS. Supported by NSF Awards 1931714, 1801564, 1414119, and DARPA awards HR00112020021 and HR00112020023.

Electronic Voting: How Formal Methods Can Help

Véronique Cortier
CNRS, Université de Lorraine, Inria, LORIA, F-54000 Nancy, France
[email protected]

Abstract. Electronic voting is now used in many countries and in many elections, from small elections such as parents' representatives in schools, to politically binding elections. Electronic voting facilitates counting and enables a wide variety of elections, such as preferential voting where voters rank their candidates by order of preference. In such a case, counting the votes by hand can be a very complex task that can, in contrast, easily be handled by a computer. Another advantage of electronic voting is that voters may vote from anywhere and that the election can last several weeks if needed. It is often used at least as a replacement for postal voting. However, electronic voting also comes with security threats and has been subject to severe attacks. For example, the Washington, D.C., Internet voting system was attacked [7] during a trial just before the election. Recently, Switzerland temporarily stopped using electronic voting after a public penetration test revealed severe weaknesses [6]. Ideally, electronic voting should offer the same guarantees as traditional paper-based elections. In particular, it should guarantee vote privacy (no one should know how I voted) and verifiability: anyone should be able to check that the votes are accurately counted and correspond to the voters' intent. In this talk, we will present the Belenios voting protocol [4] that has been used in more than 500 elections in 2020, through its voting platform¹. We will discuss its security properties as well as its limitations. As for security protocols in general, the design of electronic voting systems is error-prone and it is hard to assess whether a protocol ensures the required security properties. For these reasons, a common good practice is to design a protocol together with a formal security analysis, as has been done in the context of TLS 1.3 [5] for example. The Swiss Chancellerie requires both computational and formal proofs of security before the deployment of an electronic voting system [1]. We will explain how the tools developed for the automated analysis of security protocols, like ProVerif [3], can help gain better confidence in e-voting protocols. One particular issue in the context of voting is that the formalization of security properties is still ongoing work, even for crucial properties like vote secrecy [2]. Hence electronic voting still raises many challenges for our research community.

¹ https://belenios.loria.fr/admin


References

1. Exigences techniques et administratives applicables au vote électronique. Chancellerie fédérale ChF, 2014. Swiss recommendation on e-voting.
2. Bernhard, D., Cortier, V., Galindo, D., Pereira, O., Warinschi, B.: A comprehensive analysis of game-based ballot privacy definitions. In: Proceedings of the 36th IEEE Symposium on Security and Privacy (S&P 2015), San Jose, CA, USA, pp. 499–516. IEEE Computer Society Press, May 2015
3. Blanchet, B.: Automatic verification of security protocols in the symbolic model: the verifier proverif. In: Aldini, A., Lopez, J., Martinelli, F. (eds.) FOSAD 2013, FOSAD 2012. LNCS, vol. 8604, pp. 54–87. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10082-1_3
4. Cortier, V., Gaudry, P., Glondu, S.: Belenios: a simple private and verifiable electronic voting system. In: Guttman, J., Landwehr, C., Meseguer, J., Pavlovic, D. (eds.) Foundations of Security, Protocols, and Equational Reasoning. LNCS, vol. 11565, pp. 214–238. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-19052-1_14
5. Delignat-Lavaud, A., et al.: Implementing and proving the TLS 1.3 record layer. In: IEEE Symposium on Security and Privacy (Oakland), pp. 463–482 (2017)
6. Lewis, S.J., Pereira, O., Teague, V.: Trap-door commitments in the SwissPost e-voting shuffle proof (2019). https://people.eng.unimelb.edu.au/vjteague/SwissVote
7. Wolchok, S., Wustrow, E., Isabel, D., Halderman, J.A.: Attacking the Washington, D.C. Internet voting system. In: Financial Cryptography and Data Security (FC 2012) (2012)

Engineering Privacy in Contact Tracing Apps

Carmela Troncoso
EPFL/SPRING Lab
carmela.troncoso@epfl.ch

Abstract. A key measure to mitigate and slow down viral disease is contact tracing. Contact tracing traditionally relies on time-consuming activities performed by human contact tracers: interviewing positive patients to identify potentially infected contacts, and communicating with those contacts to ensure they take precautions (e.g., self-isolate or take a test). As the number of cases increases, human contact tracers cannot perform their activities in a timely manner, decreasing their effectiveness at breaking transmission chains and hence at slowing down the virus spread. This situation prompted institutions and governments to seek help from technology to be able to scale mitigation measures. During 2020 we have witnessed the appearance of a number of Digital Proximity Tracing proposals which have three main goals: notifying contacts in a timely manner, notifying close contacts that may not be identified by manual contact tracing, and operating even when manual contact tracers cannot scale to the number of positive cases. These proposals typically rely on smartphones to gather proximity information that serves to identify contacts. In this talk we will present the Decentralized Privacy-Preserving Proximity Tracing protocol (DP3T) [1], which inspired Google and Apple's Exposure Notification and is now the basis of dozens of proximity tracing mobile apps around the world. We will discuss the requirements and constraints that drove the protocol design, and the security and privacy trade-offs that we had to confront. The protocol, however, is only a small part of a Digital Proximity Tracing system, which includes communication with the server and integration with health services. This talk will also summarize our experience designing and implementing these mechanisms under time pressure and continuous changes in the underlying libraries.

Reference

1. Troncoso, C., et al.: Decentralized Privacy-Preserving Proximity Tracing, 25 May 2020. https://github.com/DP-3T/documents/blob/master/DP3T%20White%20Paper.pdf

Contents

Applications
Delayed Authentication: Preventing Replay and Relay Attacks in Private Contact Tracing (Krzysztof Pietrzak)....Page 3
Proof-of-Reputation Blockchain with Nakamoto Fallback (Leonard Kleinrock, Rafail Ostrovsky, and Vassilis Zikas)....Page 16
Transciphering, Using FiLIP and TFHE for an Efficient Delegation of Computation (Clément Hoffmann, Pierrick Méaux, and Thomas Ricosset)....Page 39
Encrypted Key-Value Stores (Archita Agarwal and Seny Kamara)....Page 62

Formal Methods
Formal Verification of Fair Exchange Based on Bitcoin Smart Contracts (Cheng Shi and Kazuki Yoneyama)....Page 89
Certified Compilation for Cryptography: Extended x86 Instructions and Constant-Time Verification (José Bacelar Almeida, Manuel Barbosa, Gilles Barthe, Vincent Laporte, and Tiago Oliveira)....Page 107
Protocol Analysis with Time (Damián Aparicio-Sánchez, Santiago Escobar, Catherine Meadows, José Meseguer, and Julia Sapiña)....Page 128
Verifpal: Cryptographic Protocol Analysis for the Real World (Nadim Kobeissi, Georgio Nicolas, and Mukesh Tiwari)....Page 151

Implementing Elliptic Curve Cryptography
On the Worst-Case Side-Channel Security of ECC Point Randomization in Embedded Devices (Melissa Azouaoui, François Durvaux, Romain Poussier, François-Xavier Standaert, Kostas Papagiannopoulos, and Vincent Verneuil)....Page 205
Efficient Hardware Implementations for Elliptic Curve Cryptography over Curve448 (Mojtaba Bisheh Niasar, Reza Azarderakhsh, and Mehran Mozaffari Kermani)....Page 228
Extending the Signed Non-zero Bit and Sign-Aligned Columns Methods to General Bases for Use in Cryptography (Abhraneel Dutta, Aaron Hutchinson, and Koray Karabina)....Page 248

Ciphers and Cryptanalysis
Cryptanalysis of the Permutation Based Algorithm SpoC (Liliya Kraleva, Raluca Posteuca, and Vincent Rijmen)....Page 273
More Glimpses of the RC4 Internal State Array (Pranab Chakraborty and Subhamoy Maitra)....Page 294
Mixture Integral Attacks on Reduced-Round AES with a Known/Secret S-Box (Lorenzo Grassi and Markus Schofnegger)....Page 312
Counting Active S-Boxes is not Enough (Orr Dunkelman, Abhishek Kumar, Eran Lambooij, and Somitra Kumar Sanadhya)....Page 332
Computing Expected Differential Probability of (Truncated) Differentials and Expected Linear Potential of (Multidimensional) Linear Hulls in SPN Block Ciphers (Maria Eichlseder, Gregor Leander, and Shahram Rasoolzadeh)....Page 345

Quantum Cryptanalysis
Quantum Cryptanalysis on Contracting Feistel Structures and Observation on Related-Key Settings (Carlos Cid, Akinori Hosoyamada, Yunwen Liu, and Siang Meng Sim)....Page 373
Evaluation of Quantum Cryptanalysis on SPECK (Ravi Anand, Arpita Maitra, and Sourav Mukhopadhyay)....Page 395

Learning with Errors
Making the BKW Algorithm Practical for LWE (Alessandro Budroni, Qian Guo, Thomas Johansson, Erik Mårtensson, and Paul Stankovski Wagner)....Page 417
On a Dual/Hybrid Approach to Small Secret LWE (Thomas Espitau, Antoine Joux, and Natalia Kharchenko)....Page 440

Encryption and Signatures
Adaptively Secure Threshold Symmetric-Key Encryption (Pratyay Mukherjee)....Page 465
Vetted Encryption (Martha Norberg Hovd and Martijn Stam)....Page 488
Security of Public Key Encryption Against Resetting Attacks (Juliane Krämer and Patrick Struck)....Page 508
The Multi-Base Discrete Logarithm Problem: Tight Reductions and Non-rewinding Proofs for Schnorr Identification and Signatures (Mihir Bellare and Wei Dai)....Page 529
Skipping the q in Group Signatures (Olivier Blazy and Saqib A. Kakvi)....Page 553
Incremental Cryptography Revisited: PRFs, Nonces and Modular Design (Vivek Arte, Mihir Bellare, and Louiza Khati)....Page 576

Functional Encryption
Gadget-Based iNTRU Lattice Trapdoors (Nicholas Genise and Baiyu Li)....Page 601
Lattice-Based IBE with Equality Test Supporting Flexible Authorization in the Standard Model (Giang Linh Duc Nguyen, Willy Susilo, Dung Hoang Duong, HuyQuoc Le, and Fuchun Guo)....Page 624
Efficient Attribute-Based Proxy Re-Encryption with Constant Size Ciphertexts (Arinjita Paul, S. Sharmila Deva Selvi, and C. Pandu Rangan)....Page 644
Adaptive-Secure Identity-Based Inner-Product Functional Encryption and Its Leakage-Resilience (Linru Zhang, Xiangning Wang, Yuechen Chen, and Siu-Ming Yiu)....Page 666
CCA-Secure ABE Using Tag and Pair Encoding (Olivier Blazy and Sayantan Mukherjee)....Page 691
Simpler Constructions of Asymmetric Primitives from Obfuscation (Pooya Farshim, Georg Fuchsbauer, and Alain Passelègue)....Page 715

xx

Contents

Secure Multi-party Computation Adaptive Security of Practical Garbling Schemes . . . . . . . . . . . . . . . . . . . . Zahra Jafargholi and Sabine Oechsner Constructive t-secure Homomorphic Secret Sharing for Low Degree Polynomials. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kittiphop Phalakarn, Vorapong Suppakitpaisarn, Nuttapong Attrapadung, and Kanta Matsuura

741

763

Perfectly-Secure Asynchronous MPC for General Adversaries (Extended Abstract). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ashish Choudhury and Nikhil Pappu

786

Improving the Efficiency of Optimally-Resilient Statistically-Secure Asynchronous Multi-party Computation . . . . . . . . . . . . . . . . . . . . . . . . . . . Ashish Choudhury

810

High Throughput Secure MPC over Small Population in Hybrid Networks (Extended Abstract). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ashish Choudhury and Aditya Hegde

832

Non-interactive Zero-Knowledge Proofs Dual-Mode NIZKs: Possibility and Impossibility Results for Property Transfer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Vivek Arte and Mihir Bellare

859

On Black-Box Extension of a Non-Interactive Zero-Knowledge Proof System for Secret Equality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kyosuke Yamashita, Mehdi Tibouchi, and Masayuki Abe

882

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

905

Applications

Delayed Authentication: Preventing Replay and Relay Attacks in Private Contact Tracing

Krzysztof Pietrzak

IST Austria, Klosterneuburg, Austria

Abstract. Currently several projects aim at designing and implementing protocols for privacy-preserving automated contact tracing to help fight the current pandemic. Those proposals are quite similar, and in their most basic form propose an app for mobile phones which broadcasts frequently changing pseudorandom identifiers via (low energy) Bluetooth, while at the same time storing the IDs broadcast by phones in its proximity. Only if a user tests positive do they upload either the beacons they broadcast (as in decentralized proposals such as DP-3T, east and west coast PACT or Covid Watch) or the beacons they received (as in Pepp-PT or ROBERT) during the last two weeks or so. Vaudenay [eprint 2020/399] observes that this basic scheme (he considers the DP-3T proposal) succumbs to relay and even replay attacks, and proposes more complex interactive schemes which prevent those attacks without giving up too many privacy aspects. Unfortunately interaction is problematic for this application for efficiency and security reasons. The countermeasures that have been suggested so far are either not practical or give up on key privacy aspects. We propose a simple non-interactive variant of the basic protocol that

– (security) provably prevents replay and (if location data is available) relay attacks;
– (privacy) ensures that the data of all parties (even jointly) reveals no information on the location or time where encounters happened;
– (efficiency) uses a broadcast message that can fit into 128 bits and only basic crypto (commitments and secret key authentication).

Towards this end we introduce the concept of “delayed authentication”, which is a message authentication code where verification can be done in two steps, where the first doesn’t require the key, and the second doesn’t require the message.

1 Introduction

Automated contact tracing aims to simplify and accelerate the process of identifying people who have been in contact with the SARS-CoV-2 virus. There are several similar projects, including east [CKL+20] and west coast PACT [CGH+20], Covid Watch [CW20], DP-3T [TPH+20], Robert [Rob20] or Pepp-PT [Pep20]. For concreteness, in this paper we’ll use the basic DP-3T protocol to illustrate the concept and issues with those proposals. The acronym stands for Decentralized Privacy-Preserving Proximity Tracing.

This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (682815 TOCNeT).

© Springer Nature Switzerland AG 2020. K. Bhargavan et al. (Eds.): INDOCRYPT 2020, LNCS 12578, pp. 3–15, 2020. https://doi.org/10.1007/978-3-030-65277-7_1

[Figure: protocol diagram (basic DP-3T). The advertising app (EphID is the current ephemeral id) sends EphID to the receiving app, which locally stores EphID.]

Fig. 1. The basic DP-3T protocol. If at some point a user is reported sick the app will learn (the keys required to recompute) their EphID’s of the last 14 days. If it locally stored one of those EphID’s this means they were likely close to an infected user.

In the DP-3T proposal users are assumed to have a mobile phone or another Bluetooth capable device with the application installed. At setup the app samples a random key $SK_0$. This key is updated every day as $SK_i = H(SK_{i-1})$ using a cryptographic hash function H. Each key defines n Ephemeral IDentifiers (EphID) derived from $SK_i$ using a pseudorandom generator and function as

$$\mathsf{EphID}_1 \,\|\, \mathsf{EphID}_2 \,\|\, \ldots \,\|\, \mathsf{EphID}_n := \mathsf{Prg}(\mathsf{Prf}(SK_i, \text{“broadcast key”}))$$

Those EphID’s are used in a random order during the day, each for 24 · 60/n minutes (say 30 min if we set n = 48). The current EphID is broadcast using Bluetooth in regular intervals to potential users in its proximity. Phones locally store the EphID’s they receive as well as their own $SK_i$’s of the last 14 days.

If a user tests positive they can upload their key $SK_i$ from 14 days ago to the backend server, which will distribute it to all users. From this key, one can recompute all the EphID’s the infected user broadcast in the last 14 days. So all other users can check if there’s a match with any of their locally stored EphID’s. If yes, it means they’ve presumably been in proximity to an infected person in the last two weeks and thus should take action (get tested, self isolate). This description oversimplifies several aspects that are not relevant for this work.

This is a more decentralized approach, which apart from DP-3T is followed by both PACTs and Covid Watch from the above mentioned proposals. In Pepp-PT or Robert the infected user uploads the EphID’s he received (not broadcast), and the server (which here must know the secret keys and contact details of all users) contacts the senders of those EphID’s.

Important aspects of this protocol are its simplicity, in particular the fact that the protocol is non-interactive as illustrated in Fig. 1, and its privacy properties: even the combined information stored by all honest users reveals which encounters happened, but no location or time (apart from the day). Some privacy and security properties have been formally analyzed [DDL+20], but once we take into account malicious actors, this scheme has serious issues with its privacy and robustness properties.
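The key schedule and matching step of the basic DP-3T protocol described above can be illustrated with a minimal Python sketch. It is not the DP-3T reference implementation: SHA-256 stands in for H, HMAC-SHA256 for Prf and SHAKE-256 for Prg, and the function names and the 16-byte identifier length are assumptions made for illustration only.

```python
import hashlib
import hmac


def next_day_key(sk_prev: bytes) -> bytes:
    """SK_i = H(SK_{i-1}); SHA-256 is an assumed stand-in for H."""
    return hashlib.sha256(sk_prev).digest()


def ephemeral_ids(sk_day: bytes, n: int = 48, id_len: int = 16) -> list:
    """EphID_1 || ... || EphID_n := Prg(Prf(SK_i, "broadcast key"))."""
    # Prf: HMAC-SHA256 keyed with the daily key (assumption).
    seed = hmac.new(sk_day, b"broadcast key", hashlib.sha256).digest()
    # Prg: SHAKE-256 expands the seed to n identifiers (assumption).
    stream = hashlib.shake_256(seed).digest(n * id_len)
    return [stream[i * id_len:(i + 1) * id_len] for i in range(n)]


def ids_from_uploaded_key(sk: bytes, days: int = 14, n: int = 48) -> set:
    """Recompute every EphID derivable from an uploaded key over `days` days."""
    ids = set()
    for _ in range(days):
        ids.update(ephemeral_ids(sk, n))
        sk = next_day_key(sk)
    return ids


def was_exposed(stored_ids: set, uploaded_key: bytes) -> bool:
    """A receiver is alerted if any locally stored EphID matches."""
    return not stored_ids.isdisjoint(ids_from_uploaded_key(uploaded_key))
```

A receiver that stored a single identifier broadcast on the second day would report a match once the key of the first day is published, since all later daily keys follow from it by hashing.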

2 Replay and Relay Attacks

Many papers including [ABIV20,KBS20,BDF+20] discuss privacy issues with DP-3T like protocols. Vaudenay [Vau20a,Vau20b] and Gvili [Gvi20] also discuss the robustness of this scheme, including replay and relay attacks. As an illustration of a replay attack consider an adversary who collects EphID’s in an environment where it’s likely infections will occur (like a hospital), and then broadcasts those EphID’s to users at another location, say a competing company it wants to hurt. Later, when a user from the high risk location tests positive, the people in the company will be instructed to self isolate.

[Figure: protocol diagram ([Vau20a], replay-secure). The advertising app (k, EphID are the current ephemeral key and id) sends EphID; the receiving app picks a challenge and sends it back; the advertising app computes tag ← Mac(k, challenge) and sends tag; the receiving app locally stores (EphID, challenge, tag).]

Fig. 2. Vaudenay’s protocol secure against replay attacks. Apart from ephemeral IDs EphID (as in DP-3T) also ephemeral keys k are generated from SK. When later the user of the receiving app gets keys of infected users they will check for every tuple (k, EphID′) derived from those keys if they locally store a triple (EphID, challenge, tag) with EphID = EphID′. If yes, they will check whether tag = Mac(k, challenge) and only if this check verifies, assume they were close to an infected party.
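The interactive exchange of Fig. 2 can be sketched in a few lines of Python. This is an illustration under assumptions, not Vaudenay’s specification: HMAC-SHA256 stands in for Mac, the challenge length of 16 bytes is arbitrary, and all function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def mac(k: bytes, msg: bytes) -> bytes:
    """Mac(k, msg); HMAC-SHA256 is an assumed instantiation."""
    return hmac.new(k, msg, hashlib.sha256).digest()


def receiver_pick_challenge() -> bytes:
    """The receiving app answers a received EphID with a fresh random challenge."""
    return secrets.token_bytes(16)


def advertiser_respond(k: bytes, challenge: bytes) -> bytes:
    """The advertising app authenticates the challenge with its ephemeral key."""
    return mac(k, challenge)


def receiver_check(stored: tuple, k: bytes, eph_id: bytes) -> bool:
    """Run once (k, EphID') of an infected user is published."""
    stored_id, challenge, tag = stored
    return stored_id == eph_id and hmac.compare_digest(tag, mac(k, challenge))
```

A replaying adversary holds only tags for old challenges, so its stored triple fails the final check once k is released.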

Vaudenay suggests an extension of the basic DP-3T protocol, shown in Fig. 2, which is secure against replay attacks, but it comes at the price of using interaction. For efficiency and security reasons, the current DP-3T proposal uses Bluetooth low energy beacons, which makes interaction problematic (we refer to [TPH+20] for more details).

To see that this protocol is secure against a replay attack consider an adversary who re-sends the received EphID at a later point in time. The receiving party will send a random challenge′ which is almost certainly different from all the challenge values that were sent when EphID was originally broadcast. The standard security notion for message authentication codes implies that the adversary, who does not know k, will almost certainly not be able to compute the right tag Mac(k, challenge′) for this challenge. Should the adversary still answer with an authenticator tag′, the receiving user will reject the triple (EphID, challenge′, tag′) once it learns k, as tag′ ≠ Mac(k, challenge′).

A relay attack is more sophisticated than a replay attack. Here the adversary relays the messages from one location (e.g., the hospital) to another (e.g., the company it wants to hurt) in real time. The protocol from Fig. 2 is not secure against relay attacks. Vaudenay also suggests a protocol which thwarts relay attacks assuming both devices know their location (using GPS) and with an additional round of interaction. Our relay-secure protocol also requires location data, but achieves security against relay attacks without interaction.

Replay and relay attacks are listed as one of the main security aspects in the document Mobile applications to support contact tracing in the EU’s fight against COVID-19 [eu20]: “Identifier relay/replay prevention safeguards should be implemented to prevent those attacks, meaning that User A should not be able to record an identifier of User B and, afterwards, send User B’s identifier as his/her own.”

2.1 Inverse-Sybil Attacks

This submission only deals with replay and relay attacks, but there’s at least one other attack by which one can launch false positives on a large scale in protocols including DP-3T: a group of users (malicious or with hacked devices) can use the same initial key on many devices. Should this key then be uploaded to the backend server, all users that were in proximity to one of those many devices would get an alert. This attack is one of the “terrorist attacks” described in [Vau20b] and is called an “inverse-sybil” attack in [AKP+20]. The latter work proposes protocols that harden the app against this attack by making the broadcast values unpredictable (concretely, dependent on either past encounters or location) and chaining them, so that later the chains of the individual devices cannot be merged, and thus only one device is allowed to upload its encounters. Like in this work, the protocols in [AKP+20] only use simple crypto; everything can be done using just a hash function, but they have to assume collision resistance (for the chaining property), while in this work universal hashing is sufficient. A related attack discussed in [Vau20b] considers bribing infected users to upload tokens; the feasibility of such an attack using smart contracts has been analyzed in [AFV20].

3 Simple Solutions

3.1 A Simple Solution Without Privacy

If one doesn’t care about storing the time of encounters, there’s a straightforward non-interactive protocol secure against replay and even relay attacks: take Vaudenay’s protocol as shown in Fig. 2 but replace challenge with the current time t, so the sender just broadcasts (EphID, tag = Mac(k, t)). The receiving app will store each such received tuple together with the coarse current time t′. If the sender later reports sick and their ephemeral ID/key pairs (EphID, k) become public, the receiver checks for every such pair and every locally stored entry (EphID′, tag, t′) where EphID = EphID′ if tag = Mac(k, t′). Note that this check verifies iff t = t′, i.e., the claimed time of the sender matches the time of the receiver. If the check fails the recipient rejects that tuple as a replay attack might have taken place. In centralized schemes (like Robert or Pepp-PT) the server, who knows all keys, can do this verification.

This protocol has terrible privacy properties for sender and receiver as the times of encounters are locally stored by the receiving parties. Replay attacks are also not completely ruled out: they are still possible within one unit of time, which must be measured sufficiently coarsely in this protocol to allow for some latency and asynchrony of the clocks of the sending and receiving parties. Setting it to a second should be sufficient. By additionally authenticating the location, e.g. using coarse GPS coordinates, one can also prevent relay attacks.

3.2 A Simple Solution Using Digital Signatures

Using digital signatures instead of a message authentication code one can solve the privacy problem of the protocol above. Consider the protocol from above, but where using the ephemeral key k one first samples a signature public/secret key pair (sk, pk) ← keygen(k) using k as randomness. Instead of the MAC, we now broadcast the public key and a signature of the current time, i.e., (EphID, sign(sk, t), pk). Upon receiving a tuple (EphID, σ, pk) the recipient retrieves its time t′ and then runs the signature verification algorithm accept = verify(pk, σ, t′) to check if σ is a valid signature for the current time t′ under pk. If not, the tuple is ignored; otherwise it stores (EphID, pk), but not the signature itself! If later ephemeral ID/key pairs (EphID, k) of infected parties become known, it checks for every such tuple and every locally stored (EphID′, pk) where EphID = EphID′ if pk is derived from k (i.e., pk = pk′ where (pk′, sk′) ← keygen(k)). Only if all checks pass does it assume it was close to an infected party.

As to security, it’s not hard to show that a successful relay attack requires the adversary to forge a signature. As to privacy, we observe that the tuple (EphID, pk) stored by the receiver is independent of the time at which the encounter happened. This protocol is less efficient than the previous construction simply because signatures are computationally more expensive than MACs, and already for this reason it might not be acceptable for a tracing app. There’s also no chance to fit the message (a signature and public key) into a 128-bit Bluetooth beacon as used by the current proposals.

3.3 Our Contribution

We propose a construction which basically combines the best features of the two simple constructions outlined above. That is, a simple non-interactive scheme using symmetric cryptography that prevents replay (and if we assume location data is available, also relay) attacks, does not leak any time (or location) information and where the broadcasted value fits into a 128 bit beacon.

4 Delayed Authentication

The main tool in our protocols is the concept of “delayed authentication”. By combining a (weak) commitment scheme commit : M × R → Y and a standard message authentication code Mac : K × M → T (where Y ⊆ M) we get a new (randomized) message authentication code DelayMac : K × M × R → T which allows for a two-step authentication process, where the first step doesn’t require the key, and the second is independent of the authenticated message. We need the following security properties from commit:

– (weak, computationally) binding: It must be hard to find a tuple of message/randomness pairs (m, ρ), (m′, ρ′) where m ≠ m′ but commit(m′, ρ′) = commit(m, ρ). This is the standard (computationally) binding property for commitment schemes, but we’ll actually just need a weaker binding property, where the adversary first must choose an m and gets a random ρ, and for this pair (m, ρ) must output a pair (m′, ρ′) where m ≠ m′ but commit(m′, ρ′) = commit(m, ρ). In Sect. 6 we’ll exploit the fact that just this weaker notion is required to reduce communication.
– (statistically) hiding: For any m ∈ M and a uniformly random ρ ←$ R the commitment commit(m, ρ) is close to uniform (or some other fixed distribution) given m.

In practice we can expect a well designed cryptographic hash function H to satisfy the above, i.e., use commit(m, ρ) := H(m‖ρ), if the range of H is sufficiently smaller than the length |ρ| of ρ. We define the following randomized message authentication code DelayMac:

(ρ, Mac(k, σ)) ← DelayMac(k, m) where ρ ←$ R, σ := commit(m, ρ)

We can verify such a message/tag pair m, (ρ, tag) in two steps.

1. Compute σ := commit(m, ρ), then store (σ, tag) and delete m, ρ. Note that this step does not require k; we call (σ, tag) the delayed tag.
2. Later, should the key k become available, authenticate the delayed tag by checking whether tag = Mac(k, σ).

We will require two security properties from DelayMac. First, it must satisfy the standard security notion for message authentication codes. Second, we want the delayed tag (σ, tag) together with k to statistically hide the authenticated message m. Both properties are easy to prove.
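The definition of DelayMac and its two-step verification can be made concrete with a short Python sketch. The instantiation is an assumption chosen for illustration: commit is a truncated SHA-256 of m‖ρ with a fixed-length ρ, Mac is HMAC-SHA256, and the lengths `RHO_LEN` and `COMMIT_LEN` as well as all function names are hypothetical.

```python
import hashlib
import hmac
import secrets

RHO_LEN = 32     # bytes of commitment randomness (illustrative choice)
COMMIT_LEN = 16  # commitment output shorter than rho, as the text suggests


def commit(m: bytes, rho: bytes) -> bytes:
    """commit(m, rho) := H(m || rho), truncated; rho has fixed length."""
    return hashlib.sha256(m + rho).digest()[:COMMIT_LEN]


def mac(k: bytes, msg: bytes) -> bytes:
    """Mac(k, msg); HMAC-SHA256 is an assumed instantiation."""
    return hmac.new(k, msg, hashlib.sha256).digest()


def delay_mac(k: bytes, m: bytes) -> tuple:
    """(rho, Mac(k, sigma)) with rho random and sigma := commit(m, rho)."""
    rho = secrets.token_bytes(RHO_LEN)
    return rho, mac(k, commit(m, rho))


def verify_step1(m: bytes, rho: bytes, tag: bytes) -> tuple:
    """Step 1, no key needed: keep only the delayed tag (sigma, tag)."""
    return commit(m, rho), tag


def verify_step2(k: bytes, delayed_tag: tuple) -> bool:
    """Step 2, no message needed: check tag = Mac(k, sigma)."""
    sigma, tag = delayed_tag
    return hmac.compare_digest(tag, mac(k, sigma))
```

The verifier never has to retain m or ρ after step 1, which is exactly what lets the protocol discard time and location before the key becomes known.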


Lemma 1. DelayMac is existentially unforgeable under chosen message attacks.

Proof. Consider an adversary $A^{\mathsf{DelayMac}(k,\cdot)}$ who can request tags $(\rho_i, \mathsf{tag}_i)$ for messages $m_i$ of its choice (for a secret random key k), and finally outputs a forgery, i.e., a message/tag pair $(m^*, (\rho^*, \mathsf{tag}^*))$ for an $m^*$ on which it did not query the oracle and which passes verification. We will show that this forgery can either be turned into a forgery for Mac or be used to break the (weak) binding property of commit. As assuming Mac and commit are secure means both those events have negligible probability, we can conclude that every efficient A will only be able to find such a forgery with negligible probability (and thus DelayMac is secure as stated in the lemma). Let $\sigma_i = \mathsf{commit}(m_i, \rho_i)$. If

$$\forall i : \sigma_i \neq \sigma^* \stackrel{\text{def}}{=} \mathsf{commit}(m^*, \rho^*)$$

then the forgery $(m^*, (\rho^*, \mathsf{tag}^*))$ for DelayMac can be turned into a forgery $(\sigma^*, \mathsf{tag}^*)$ for Mac. Otherwise, for some i it holds that

$$\mathsf{commit}(m^*, \rho^*) = \mathsf{commit}(m_i, \rho_i) \quad \text{and} \quad m^* \neq m_i$$

and thus we broke the binding property of commit. In fact, as the $\rho_i$ were chosen uniformly at random by the oracle, this already breaks the weak binding property (the simple reduction showing this is omitted).

Lemma 2. The delayed tag (σ, tag) = (commit(m, ρ), Mac(k, σ)) together with k statistically hides m.

Proof. As we assume commit is statistically hiding, there exists some distribution X such that for any m and a uniform ρ, the commitment σ = commit(m, ρ) is ε-close to X. As k is independent of (m, ρ) and thus of σ, by the data processing inequality also (σ, tag = Mac(k, σ), k) is ε-close to some fixed distribution Y (concretely, to the distribution Y ∼ (X, Mac(k, X), k)). In terms of indistinguishability, this means that even a computationally unbounded distinguisher who can choose two messages $m_0$, $m_1$, and then gets the delayed tag for $m_b$ for a random b ∈ {0, 1}, will not be able to guess b better than with probability 1/2 + ε/2. If commit is only computationally hiding, such indistinguishability only holds against computationally bounded distinguishers.



5 Our Protocol

Let us illustrate how we use delayed authentication in our protocol which is given in Fig. 3. As in [Vau20a], apart from the EphID’s,

$$\mathsf{EphID}_1 \,\|\, \mathsf{EphID}_2 \,\|\, \ldots \,\|\, \mathsf{EphID}_n := \mathsf{Prg}(\mathsf{Prf}(SK_i, \text{“broadcast key”}))$$

the app additionally computes an ephemeral secret key $k_i$ with each $\mathsf{EphID}_i$:

$$k_1 \,\|\, k_2 \,\|\, \ldots \,\|\, k_n = \mathsf{Prg}(\mathsf{Prf}(SK_i, \text{“secret key”}))$$

[Figure: protocol diagram (our protocol).
Advertising app: k, EphID are the current ephemeral key and id. Get coarse time and location: (x, y) ← get GPS coordinates, t ← get current time. Compute (ρ, tag) ← DelayMac(k, (t, x, y)), i.e., ρ ←$ R, σ := commit((t, x, y), ρ), tag := Mac(k, σ).
Broadcast message: ρ, tag, EphID, t|1, x|1, y|1.
Receiving app: (x′, y′) ← get GPS coordinates, t′ ← get current time. Round t′, x′, y′ so their least significant bits match the bits t|1, x|1, y|1. Compute σ′ := commit((t′, x′, y′), ρ) and locally store (σ′, tag, EphID).]

Fig. 3. Our non-interactive contact tracing protocol that is secure against relay attacks. By ignoring the blue text (the location-related parts) we get a protocol that is only secure against replay attacks but doesn’t require location data, which might not always be available or, for security reasons, not desired. If later a user gets sick and the app gets its (k, EphID′) pairs, it will check if it has stored a triple (σ, tag, EphID) with EphID = EphID′, and if so, finish verification by checking the delayed tag Mac(k, σ) = tag to detect potential replay or relay attacks.

Our protocol will use the current time, which must be measured rather coarsely so that the advertising and receiving applications have the same time, or are at most off by one unit (one or a few seconds seem reasonable for one unit of time). The replay-secure protocol now works as follows.

– (broadcast). In regular intervals a message is broadcast which is computed as follows. Let EphID, k be the current ephemeral ID and key and let t denote the coarse current time. Compute (ρ, tag) ← DelayMac(k, t) and broadcast (ρ, tag), EphID, t|1 where t|1 is the least significant bit of t.
– (receive broadcast message). If the app receives a message (ρ, tag), EphID, t|1 it gets its current time t′. If t′|1 ≠ t|1 then round t′ to the nearest value so the least significant bit matches t|1. Compute σ′ := commit(t′, ρ) and locally store (σ′, tag, EphID). Note that the stored σ′ will match the σ used by the sender if their measured times are off by at most one unit.
– (receive message from backend server). If the app learns ephemeral ID/key tuples (EphID′, k) of infected parties from the backend server, it checks if it stored a tuple (σ′, tag, EphID) where EphID = EphID′. For every such tuple it checks if Mac(k, σ′) = tag, and only if the check passes, it assumes it was close to an infected party.
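The three steps above (broadcast, receive, check against the backend data) can be sketched as follows for the replay-only variant. This is a sketch under stated assumptions rather than the paper’s implementation: time is given in milliseconds with a one-second unit, commit and Mac are instantiated with truncated SHA-256 and HMAC-SHA256, and “round to the nearest value” is interpreted as moving to the neighbouring unit indicated by the sub-unit remainder of the receiver’s clock. All names are hypothetical.

```python
import hashlib
import hmac
import secrets

UNIT_MS = 1000  # one coarse time unit (assumed: one second)
RHO_LEN = 32    # bytes of commitment randomness (illustrative)


def commit(m: bytes, rho: bytes) -> bytes:
    """commit(m, rho) := H(m || rho), truncated (assumed instantiation)."""
    return hashlib.sha256(m + rho).digest()[:16]


def mac(k: bytes, msg: bytes) -> bytes:
    return hmac.new(k, msg, hashlib.sha256).digest()


def broadcast(k: bytes, eph_id: bytes, now_ms: int) -> tuple:
    """Sender: (rho, tag) <- DelayMac(k, t), sent with EphID and the LSB of t."""
    t = now_ms // UNIT_MS
    rho = secrets.token_bytes(RHO_LEN)
    tag = mac(k, commit(t.to_bytes(8, "big"), rho))
    return rho, tag, eph_id, t & 1


def receive(message: tuple, now_ms: int) -> tuple:
    """Receiver: align own coarse time to the sender's LSB, keep the delayed tag."""
    rho, tag, eph_id, t_lsb = message
    t_r, rem = divmod(now_ms, UNIT_MS)
    if t_r & 1 != t_lsb:
        # Assumed rounding rule: pick the adjacent unit closer to the fine clock.
        t_r += 1 if 2 * rem >= UNIT_MS else -1
    sigma = commit(t_r.to_bytes(8, "big"), rho)
    return sigma, tag, eph_id  # time and rho are not stored


def check(stored: tuple, k: bytes, eph_id: bytes) -> bool:
    """Run once (EphID', k) of an infected user arrives from the backend."""
    sigma, tag, stored_id = stored
    return stored_id == eph_id and hmac.compare_digest(tag, mac(k, sigma))
```

With clocks that differ by a fraction of a unit across a unit boundary the receiver still recomputes the sender’s σ, while a message replayed several units later yields a different σ′ and fails the final check.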

Replay Security: To see how this approach prevents replay attacks, consider an adversary who records a broadcast message (ρ, tag, EphID) at time t that was computed by the sender using its (at this point secret) ephemeral key k. At a later time t* > t + 1 the adversary sends a potentially altered message (ρ*, tag*, EphID) to a receiver. The receiving party will reject this tuple after learning the key k (i.e., when the originally broadcasting party reports sick) unless

tag* = Mac(k, σ*) where σ* = commit(t*, ρ*)

But this is a forgery for DelayMac, and as stated in Lemma 1, hard to find.

Relay Security: If we additionally authenticate the location in the same way, as shown in blue in Fig. 3, we achieve security against relay attacks. Again the location must be coarse enough so that the sending and receiving parties’ x and y coordinates are off by at most one unit, while they shouldn’t be too coarse as a replay attack within neighbouring coordinates is possible. What an appropriate choice is depends on how precisely GPS works; units of 50 m seem reasonable.

Privacy: The tuple (σ, tag, EphID) stored by the receiver hides (in fact, is independent of) the time (and if used, also the location) at which the encounter happened, by Lemma 2. This only holds if the sender correctly computed the tag; we discuss malicious senders later in Sect. 7.2. Of course the receiving app needs to be careful how it stores such tuples, as e.g. simply storing them in the order in which they were received would leak an ordering.

6 Fitting the Broadcast Message into 128 Bits

In the current DP-3T proposal the ephemeral identifiers EphID which are broadcast are 128 bits long. In this section we outline how our protocol can potentially achieve reasonable security and robustness while also just broadcasting 128 bits, but having more wiggle room would certainly be better. We need 3 bits for the time and location least significant bits t|1, x|1, y|1, and distribute the remaining 125 bits by setting the length of the remaining 3 broadcast values to

|ρ| = 80, |tag| = 35, |EphID| = 10

In a nutshell, that gives us |ρ| = 80 bits of computational security for the commitment and |tag| = 35 bits of statistical security for the authenticator, while the probability of a false positive for a locally stored encounter with a reported key/identifier is $2^{-|\mathsf{tag}|-|\mathsf{EphID}|} = 2^{-45}$. We elaborate on this below.

|ρ| = 80 (security of commit): We suggest to implement the commitment using a standard cryptographic hash function

commit(m, ρ) := H(m‖ρ).

It is important to specify how long the commitment should be, i.e., where we truncate the output of H. A conservative choice would be a length of |H(m‖ρ)| = 256 bits, but then the hiding property of the commitment would only be computational (while the binding would be statistical). In practice this means a computationally unbounded adversary can break privacy (by finding the message/randomness in the locally stored commitment). If statistical hiding is required (due to legal or other reasons), we need to set the output length to 80 bits (or slightly shorter). Due to birthday attacks, that would be way too short to achieve binding, as a collision for commit could be found with just $\sqrt{2^{80}} = 2^{40}$ queries. As discussed in Sect. 4, we only need to satisfy a weak binding property where the adversary must find a preimage for a given output, not just any collision, and for this notion we can get 80 bits of security even with short 80-bit commitments. When doing this, one should harden the scheme against multi-instance attacks, where an adversary tries to find a ρ* s.t. $\sigma_i$ = commit((t, x, y), ρ*) for one of many previously intercepted $\sigma_1, \ldots, \sigma_n$ in order to replay it. This can be done by adding the EphID to the commitment, i.e., specifying that it’s computed as σ := commit((EphID, t, x, y), ρ). This way only broadcast messages with the same EphID can be simultaneously attacked.

|tag| = 35 (security of Mac): While we can choose strong ephemeral keys k for the Mac, say 128-bit keys, the fact that tag is only 35 bits means a random tag will verify with probability one in $2^{35}$ ≈ 34 billion. This seems good enough to discourage replay or relay attack attempts.

|EphID| + |tag| = 45 (probability of false positives): A stored encounter (σ′, tag, EphID) will match an unrelated identifier/key pair (k*, EphID*) from the backend server by pure chance if EphID = EphID* and tag = Mac(k*, σ′), which happens with probability $2^{-(|\mathsf{tag}|+|\mathsf{EphID}|)} = 2^{-45}$. If we assume that EphID’s are rotated every 30 min, so we have 48 per day, the probability of a false positive for encounters in any given day, assuming we have n users of which m will be reported sick in the next 14 days, is $m \cdot n \cdot 48 \cdot 2^{-45}$. For 10 million users and 10000 cases that’s one false positive per week in total.

|EphID| = 10 (efficiency of verifying messages from backend server): The discussion so far suggests one should set |EphID| = 0 and instead use those 10 bits to increase |tag|. The reason to keep a non-empty EphID is efficiency. For every locally stored encounter (σ′, tag, EphID) and every identifier/key pair (k*, EphID*) (computed from keys received) from the backend server we need to compute the tag Mac(k*, σ′) whenever EphID = EphID*, thus for a $2^{-|\mathsf{EphID}|}$ fraction of such pairs. With |EphID| = 10 the unnecessary computation due to such identifier collisions should be small compared to the computation required to expand the keys received from the backend server to the identifier/key pairs.
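The numbers quoted in this section can be rechecked with a few lines of Python. The sketch simply evaluates the bit budget and the formula m · n · 48 · 2^(−45) from the text; the constant and function names are hypothetical.

```python
# Bit budget for one 128-bit beacon, as proposed in the text.
RHO_BITS, TAG_BITS, EPHID_BITS, LSB_BITS = 80, 35, 10, 3


def chance_match_probability() -> float:
    """Probability that an unrelated (k*, EphID*) matches a stored encounter."""
    return 2.0 ** -(TAG_BITS + EPHID_BITS)


def expected_false_positives_per_day(n_users: int, m_sick: int,
                                     ids_per_day: int = 48) -> float:
    """The text's estimate m * n * 48 * 2^-45 for one day."""
    return m_sick * n_users * ids_per_day * chance_match_probability()


def birthday_queries_log2(output_bits: int) -> float:
    """log2 of the work for a generic collision on a hash of this length."""
    return output_bits / 2


total_bits = RHO_BITS + TAG_BITS + EPHID_BITS + LSB_BITS
per_week = 7 * expected_false_positives_per_day(10**7, 10**4)
print(total_bits, 2**TAG_BITS, round(per_week, 2), birthday_queries_log2(80))
```

For 10 million users and 10000 cases the weekly estimate comes out slightly below one, in line with the “one false positive per week” stated above.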

7 Digital Evidence

The main design goal of the presented protocol is security against replay and relay attacks, but there are also other issues with DP-3T and similar protocols discussed in [TPH+20] and [Vau20a] which our scheme does not solve. There also is at least one privacy aspect that is aggravated by our protocol compared to the basic DP-3T: for parties deviating from the protocol, it’s easier to produce digital evidence of when and where an encounter took place. While we feel this is a minor issue compared to replay and relay attacks, one needs to be aware of such trade-offs.

7.1 Malicious Receiver

A malicious receiver can produce digital evidence about the advertising user. This is also the case for Vaudenay’s protocols (and already mentioned in [Vau20a]). As a concrete example, the receiver can put a hash of the entire transcript (for our protocol that would mean the broadcast message, time and location) on a blockchain to timestamp it. Later, should the advertising user test positive and their keys be released, this transcript serves as evidence about when and where the encounter took place. Timestamping is necessary here, as once the keys are released everyone can produce arbitrary transcripts. To actually break the privacy of an individual, one of course also needs to somehow link the released keys to that person.

7.2 Malicious Sender

If the sending party stores the randomness ρ it broadcasts (together with the current location and time), it can later prove that it met a receiving user at time t and location (x, y) by providing the opening information (t, x, y, ρ) for a commitment σ = commit((t, x, y), ρ) stored by the receiver. Note that unlike in the malicious receiver case discussed above, the commitments σ are never supposed to leave the phone, even if the receiving user reports being sick. Thus this privacy break requires a successful attack on the receiver’s phone or the receiver must be forced to reveal this data.

8 Conclusions, Apple and Google

We propose a contact tracing protocol which is almost as simple and efficient as the basic versions of the DP-3T, PACT, Covid Watch, etc. proposals, but which also provides security against replay attacks and, if location data is available, even against relay attacks. Apple and Google [app20] recently announced an exposure notification API for iOS and Android which addresses two major problems one faces when implementing such protocols: (1) by allowing rotations of the Bluetooth MAC addresses and ephemeral keys to be synced, one prevents tracing a device even as the key is rotated; (2) using the API, the app can run in the background, saving battery and adding convenience. Unfortunately, instead of allowing the community to invent and implement decentralized/privacy-preserving/robust tracing apps using these great tools, they decided to implement a tracing protocol on top of it, and the only way to use those features is via this particular protocol, leaving few choices to the actual protocol designers. Many, if not most, of the suggestions made by the community, including in this work, simply cannot be built on top of their solution. The Apple-Google protocol is basically the “simple solution without privacy” outlined in Subsect. 3.1. They chose very coarse time units of 10 min, which means replay attacks are still possible within a 10 min window, while the time of encounters is stored to a precision of 10 min. While 10 min is way too short to give meaningful privacy guarantees, it is certainly long enough for substantial replay attacks.¹ Such attacks, even if detected, would seriously hurt the trust in and deployment of the app. Another concern is the possibility of voter suppression using such attacks [GKK]. It would also give strong arguments to proponents of centralized solutions, preventing which seems to be a main motivation of Apple and Google for not allowing direct access to those features in the first place. 
We hope they will reconsider their stance on the issue and allow the community to again contribute towards coming up with the best tracking solutions.
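The effect of coarse time units on replay detection can be sketched with a toy model. It assumes a receiver that can only compare the index of the time unit in which a beacon was first broadcast with the index of the unit in which it is heard; the function names and timestamps are hypothetical, and the model is not the Apple-Google specification.

```python
WINDOW_SECONDS = 10 * 60  # the 10 min time unit mentioned in the text


def interval_number(unix_time: int, window: int = WINDOW_SECONDS) -> int:
    """Index of the coarse time unit containing unix_time."""
    return unix_time // window


def replay_flagged_by_time(sent_at: int, replayed_at: int,
                           window: int = WINDOW_SECONDS) -> bool:
    """A check based only on the time unit can flag a replay only if the
    replay lands in a different unit than the original broadcast."""
    return interval_number(sent_at, window) != interval_number(replayed_at, window)


sent = 1_600_000_200  # hypothetical broadcast time, chosen at the start of a unit

same_unit_flagged = replay_flagged_by_time(sent, sent + 599)   # still inside the unit
next_unit_flagged = replay_flagged_by_time(sent, sent + 600)   # first second of the next unit
one_second_flagged = replay_flagged_by_time(sent, sent + 599, window=1)
```

In this model a replay almost ten minutes after the original broadcast goes unflagged with 10 min units, whereas with 1 s units the same replay is flagged; replays that are fast enough still pass even then.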

¹ Consider an app by which malicious covidiots can collect Bluetooth beacons, share them amongst each other, and then re-broadcast the beacons they jointly collected in the last 10 min, leading to many false positives. This attack arguably even works with fairly short (say 1 s) time windows as used in this paper. To really prevent such attacks it seems one either needs location data (as shown in this work using coarse GPS locations), interaction as in Vaudenay’s protocol [Vau20a] or public-key crypto [ABIV20, LAY+20, WL20, CKL+20].

References

[ABIV20] Avitabile, G., Botta, V., Iovino, V., Visconti, I.: Towards defeating mass surveillance and SARS-CoV-2: The pronto-C2 fully decentralized automatic contact tracing system. Cryptology ePrint Archive, Report 2020/493 (2020). https://eprint.iacr.org/2020/493
[AFV20] Avitabile, G., Friolo, D., Visconti, I.: TEnK-U: terrorist attacks for fake exposure notifications in contact tracing systems. Cryptology ePrint Archive, Report 2020/1150 (2020). https://eprint.iacr.org/2020/1150
[AKP+20] Auerbach, B., et al.: Inverse-sybil attacks in automated contact tracing. Cryptology ePrint Archive, Report 2020/670 (2020). https://eprint.iacr.org/2020/670
[app20] Privacy-preserving contact tracing (2020). https://www.apple.com/covid19/contacttracing
[BDF+20] Baumgärtner, L., et al.: Mind the gap: security and privacy risks of contact tracing apps (2020)
[CGH+20] Chan, J., et al.: PACT: privacy sensitive protocols and mechanisms for mobile contact tracing. CoRR, abs/2004.03544 (2020)
[CKL+20] Canetti, R., et al.: Privacy-preserving automated exposure notification. Cryptology ePrint Archive, Report 2020/863 (2020). https://eprint.iacr.org/2020/863
[CW20] COVID watch (2020). https://www.covid-watch.org/
[DDL+20] Danz, N., Derwisch, O., Lehmann, A., Puenter, W., Stolle, M., Ziemann, J.: Security and privacy of decentralized cryptographic contact tracing. Cryptology ePrint Archive, Report 2020/1309 (2020). https://eprint.iacr.org/2020/1309
[eu20] Mobile applications to support contact tracing in the EU’s fight against COVID-19 (2020). https://ec.europa.eu/health/sites/health/files/ehealth/docs/covid-19_apps_en.pdf. Version 1.0, 15 Apr 2020
[GKK] Gennaro, R., Krellenstein, A., Krellenstein, J.: Exposure notification system may allow for large-scale voter suppression
[Gvi20] Gvili, Y.: Security analysis of the COVID-19 contact tracing specifications by Apple inc. and Google inc. Cryptology ePrint Archive, Report 2020/428 (2020). https://eprint.iacr.org/2020/428
[KBS20] Kuhn, C., Beck, M., Strufe, T.: COVID notions: towards formal definitions - and documented understanding - of privacy goals and claimed protection in proximity-tracing services. CoRR, abs/2004.07723 (2020)
[LAY+20] Liu, J.K., et al.: Privacy-preserving COVID-19 contact tracing app: a zero-knowledge proof approach. Cryptology ePrint Archive, Report 2020/528 (2020). https://eprint.iacr.org/2020/528
[Pep20] PEPP-PT: Pan-European privacy-preserving proximity tracing (2020). https://www.pepp-pt.org/
[Rob20] ROBERT: ROBust and privacy-presERving proximity Tracing (2020). https://github.com/ROBERT-proximity-tracing
[TPH+20] Troncoso, C., et al.: DP3T: decentralized privacy-preserving proximity tracing (2020). https://github.com/DP-3T
[Vau20a] Vaudenay, S.: Analysis of DP3T. Cryptology ePrint Archive, Report 2020/399 (2020). https://eprint.iacr.org/2020/399
[Vau20b] Vaudenay, S.: Centralized or decentralized? The contact tracing dilemma. Cryptology ePrint Archive, Report 2020/531 (2020). https://eprint.iacr.org/2020/531
[WL20] Wan, Z., Liu, X.: ContactChaser: a simple yet effective contact tracing scheme with strong privacy. Cryptology ePrint Archive, Report 2020/630 (2020). https://eprint.iacr.org/2020/630

Proof-of-Reputation Blockchain with Nakamoto Fallback

Leonard Kleinrock¹, Rafail Ostrovsky¹, and Vassilis Zikas²(B)

¹ Computer Science Department, UCLA, Los Angeles, USA {lk,rafail}@cs.ucla.edu
² School of Informatics, University of Edinburgh, Edinburgh, UK [email protected]

Abstract. Reputation is a major component of trustworthy systems. However, the subjective nature of reputation makes it tricky to base a system’s security on it. In this work, we describe how to leverage reputation to establish a highly scalable and efficient blockchain. Our treatment puts emphasis on reputation fairness as a key feature of reputation-based protocols. We devise a definition of reputation fairness that ensures fair participation while giving chances to newly joining parties to participate and potentially build reputation. We also describe a concrete lottery in the random oracle model which achieves this definition of fairness. Our treatment of reputation-fairness can be of independent interest. To avoid potential safety and/or liveness concerns stemming from the subjective and volatile nature of reputation, we propose a hybrid design that uses a Nakamoto-style ledger as a fallback. To our knowledge, our proposal is the first cryptographically secure design of a proof-of-reputation-based (in short PoR-based) blockchain that fortifies its PoR-based security by optimized Nakamoto-style consensus. This results in a ledger protocol which is provably secure if the reputation system is accurate, and preserves its basic safety properties even if it is not, as long as the fallback blockchain does not fail.

Keywords: Blockchain · Proof of reputation · Byzantine agreement

© Springer Nature Switzerland AG 2020. K. Bhargavan et al. (Eds.): INDOCRYPT 2020, LNCS 12578, pp. 16–38, 2020. https://doi.org/10.1007/978-3-030-65277-7_2

1 Introduction

Many decisions taken in modern society are based on reputation: Fans are likely to follow suggestions from their idols, social network followers are likely to adopt suggestions of their friends and groups, and people often use publicly available ranking systems such as E-Bay, Yelp, AirBnB, Amazon, etc. to make decisions regarding online-shopping, choice of vacation, accommodation, eating out, insurances, investment, medical, etc. In this work we leverage the power of reputation to establish a reliable, permissionless blockchain. Our design assumes that certain parties in the system have a reputation-rank, which is recorded on the blockchain itself. Similar to public ranking systems as above, where the “stars” of a party are interpreted as a prediction of the quality of its offered services, the interpretation of our reputation-ranks is that the higher a party’s reputation, the higher are the chances that the party will behave honestly in maintaining the blockchain.

In a nutshell, our design goals are two-fold: (1) To rely on the reputation system to build a simple, scalable decentralized ledger with optimized finality, communication, and computation, and with a formal proof that relates the security of the protocol to the quality of the reputation system; importantly, our PoR-blockchain aims to satisfy a new intuitive notion of participation fairness that promotes inclusivity. (2) To address the subjectivity of reputation as a resource, by backing our blockchain’s safety and liveness with a fallback mechanism that ensures that even if the reputation estimate is severely flawed, our protocol does not create long forks.

Our Results. We devise a hybrid blockchain-ledger design which is primarily based on reputation but uses a Nakamoto ledger as fallback. We use the term Nakamoto ledger similar to [3] to refer to blockchain-based ledger protocols that follow the eventual consensus paradigm (e.g., Bitcoin, Ouroboros, etc.) and realize a ledger as described in [2,4]. For the purpose of exposition, in this work, we focus on the fallback ledger being proof-of-stake-based and tune our analysis accordingly; we refer to this paradigm as PoR/PoS-hybrid design. However, our treatment can be extended to other types of fallback blockchains (even those that do not follow the Nakamoto paradigm) under their respective assumptions. In the following we provide an overview of our design and discuss its properties in comparison to existing approaches.

PoR-Blockchain. We assume a (dynamically updatable) set P of parties where a subset $\hat{P} \subseteq P$ of them are special parties called reputation parties.
The parties wish to leverage the reputation of the reputation parties to securely maintain a ledger containing a sequence of blocks, each block containing a collection of messages that can represent arbitrary data (throughout this paper, we refer to this data as transactions).¹

¹ We do not specify here how this data is efficiently encoded into a block, e.g., so that they can be updated and addressed in an efficient manner; however, one can use the standard Merkle-tree approach used in many common blockchains, e.g., Bitcoin, Ethereum, Ouroboros, Algorand, etc.

Informally, a reputation system for a party set $\hat{P}$ is similar to a probabilistic adversary structure [15]: It assigns to each subset of $\hat{P}$ a probability that this subset is corrupted by the adversary (we consider active, aka Byzantine, corruption). In its simplest form, which we refer to as static correlation-free, a reputation system can be described by a vector of $|\hat{P}|$ independent boolean random variables. The probability that the $i$-th variable is 1 is the probability that the $i$-th party in $\hat{P}$ (in a given canonical ordering, e.g., using the party-IDs) is honest. This is similar to independent reputation systems investigated in [1]. We also extend this notion by allowing the reputation to evolve as rounds advance, yielding a dynamic reputation system. The update happens in an epoch-based manner, where an epoch consists of a fixed number of rounds. To capture feasibility of PoR-based consensus, we introduce the notion of a feasible reputation
system (cf. Definition 2) which for static reputation systems requires that there is a party-sampling algorithm, such that the probability that a super-majority of the selected parties is honest is overwhelming. (This is analogous to the feasibility condition from [1].)

Our ledger protocol proceeds in rounds. Each block is associated with a slot, where a slot lasts a predefined number of rounds. We use the reputation system as follows (we restrict the discussion here to static correlation-free reputation): The contents of the genesis block, which is assumed to be available to any party joining the protocol and includes the (initial) reputation system along with a random nonce, are hashed to generate common randomness used in the first epoch. Since every party is assumed to have access to the genesis block, every party can locally run a lottery which is “biased” by the parties’ reputation using these coins to choose a committee for each slot. (For dynamic reputation, the contents of an epoch are used to extract coins for the following epoch.)

The above implicit lottery is used to elect a slot committee $C^i_{BA}$ for each slot $i$, which is responsible for gathering all known transactions at the beginning of the slot that have not yet been inserted in the blockchain, and proposing the block corresponding to this slot along with evidence to enable all parties to agree on this block. More concretely, in every slot, every party in a random subset $C^i_{BC}$ of $C^i_{BA}$ pools all new and valid transactions received by the blockchain users into a set, and broadcasts this set to $C^i_{BA}$, by means of a Byzantine broadcast protocol. The union of the transactions broadcast by $C^i_{BC}$ is then signed by every member of $C^i_{BA}$ and circulated to the whole party set P, who accept it if and only if it has at least $|C^i_{BA}|/2$ signatures from parties in $C^i_{BA}$. As long as for each of the slot committees the majority of its members is honest (a property which will be induced by an accurate and feasible reputation system), the above consensus protocol will achieve agreement on the proposed block.

Of prime importance in our construction, and a novelty of our PoR-ledger, is a mechanism which ensures an intuitive notion of inclusivity. For clarity, we focus our description and analysis on static correlation-free reputation systems. Different ways to extend our result to dynamic and correlated adversaries are discussed in [21]. As proved in [1], for any feasible static correlation-free reputation system, the following sampler, denoted as $A_{max}$, outputs a committee with honest majority: Order the parties according to their reputation and choose a sufficient number (cf. [1]) of reputation parties with the highest reputations. The above simple $A_{max}$ algorithm is optimal (up to negligible error) for minimizing the risk of electing a dishonest-majority committee, but it suffers from the following issues: First, if some malicious party establishes (e.g., by behaving honestly) a high reputation, then this party will be included in almost every committee. Second, the approach lacks a natural fairness property which would give all parties voting power according to their reputation (even to parties with low reputation). Such a fairness property is important for sustainable decentralization, as it does not deter participation of parties with low reputation making
the overall system more inclusive.² We propose and instantiate an appropriate notion of reputation fairness, termed PoR-fairness, which addresses the above concerns. The idea behind PoR-fairness is that every reputation party should get a “fair” chance to be part of each slot-$i$ committee $C^i_{BA}$. Defining such a notion turns out to be non-trivial (cf. Sect. 3.1). Intuitively, a reputation-based lottery (i.e., committee selection protocol) is reputation-fair, if, (1) on average, the representation of parties on the selected committee becomes higher, the more likely those parties are to be honest (according to their reputation); (2) this representation increases as the ratio of parties with higher over parties with lower reputation increases; and (3) the probabilities of parties included in the lottery increase proportionally to their reputation. Realizing such a reputation-fair lottery also turns out to be a challenging task and a core technical contribution of our work.

The Nakamoto Fallback. Arguably, reputation is a subjective and manipulable resource. For instance, the way reputation of new parties is assigned might be flawed or a malicious party might act honestly to build up its reputation, and then use its earned trust to attack the system. We address this in the following way: We back our PoR-blockchain with a light-weight use of a Nakamoto-style blockchain³ which will ensure that if the reputation system fails to provide security, this will be observed and agreed upon on the Nakamoto-chain. This ensures that reputation parties, who try to actively cheat (e.g., sign conflicting messages) or abstain to hurt liveness or safety, will be exposed on the fallback chain.

² For instance, one can consider a mechanism which rewards honest behavior by increasing the parties’ reputation.
³ As discussed above, here we focus on a proof-of-stake Nakamoto-style blockchain, e.g., [20], but our fallback uses the Nakamoto blockchain in a blackbox manner and can therefore be instantiated using any blockchain that realizes a Bitcoin-style transaction ledger [4].

The idea for our fallback is as follows: Parties report on the fallback chain (an appropriate hash of) their view of the PoR-blockchain. If any party observes an inconsistency of this with its own view, it contests the reported view, by posting an accusation along with appropriate evidence (signatures and hash-pointers). The original reporting party is then expected to respond to this accusation with the contents and signatures of the disputed block. Once the accusation is answered, it will either expose one of the two disputing parties as a cheater, or it will expose a misbehavior on the PoR-chain (e.g., signing conflicting messages). In either case the exposed party gets its reputation zeroed out and is excluded from the system. This fallback mechanism fortifies the security of the blockchain under an independent backup assumption, e.g., majority of honest stake. Note that the fallback blockchain is not used for communicating transactions, but only digests (hashes) of blocks from the main, reputation chain as discussed below. Furthermore, it does not need to run in a synchronized manner with the main PoR-chain. This allows a very light-weight use of the fallback chain, which as it is a blackbox use, can even be outsourced to an existing Nakamoto-style chain. We refer
to this type of blockchain design as a PoR/PoS-Hybrid Blockchain. We remark that the generic fallback mechanism allows to recover from safety attacks. By an additional assumption on the quality of the reputation system we can also prevent liveness attacks as discussed in Sect. 5.

Properties of Our Construction. The proposed reputation-based blockchain takes advantage of the nature of the reputation-system to improve on several properties of existing constructions as discussed below. Provided that the reputation is accurate, the parties will enjoy such improvements. As is the case with any assumption on resources, it is impossible to know a priori if the underlying assumption, in our case accuracy of the reputation, is true. However, unlike constructions which completely fail when their core assumption is false (e.g., dishonest majority of stake in PoS), our fallback mechanism will ensure that even if our primary assumption, i.e., accuracy of the reputation, is violated, still the basic security properties are not (or any violation is swiftly detected) as long as the secondary, fallback assumption holds. This yields a natural optimistic mechanism for use of resources as discussed below. In the following we discuss the advantages of our protocol in this optimistic mode, i.e., under the primary assumption that the reputation system is accurate.

Efficiency and Scalability: A reputation system induces probabilities that a party behaves maliciously, which, technically, provides information to the protocol designer (and parties) on the corruption capabilities of the adversary. This allows us to avoid the need for communication- and setup-heavy adaptive security tools, e.g., compute and communicate proofs of verifiable random functions.⁴ Additionally, it allows us to associate reputation parties with public keys and public physical IDs, e.g., public IP address, which means they can communicate through direct point-to-point channels rather than a diffusion/gossip network. This yields both concrete and asymptotic improvements. First, depending on the network topology, this can improve the overall concrete message complexity and yield denial-of-service (DoS) attack protection in practice, since open diffusion networks are more susceptible to DoS. Additionally, as most communication occurs only among the (polylogarithmic) size slot committees and between these committees and the player set, even ignoring the overhead of gossiping, the overall message complexity per slot is O(n log n) as opposed to the $\Omega(n^2)$ complexity of standard blockchains relying solely on flooding.

⁴ As a side note, our blockchain does address concerns about adaptivity in corruptions through its fallback mechanism, which can be adaptively secure.

High Throughput (Transaction-Liveness): Existing solutions implicitly assign to each block one effective-block proposer [2,12,17,20]: multiple parties might win the lottery but only the proposal of one is adopted. Instead, in our PoR-blockchain a (small) committee $C_{BC}$ of proposers is chosen in each slot, and the union of their transactions-views is included. To ensure that honest transactions are included in a block it suffices that any of the block proposers in the corresponding $C_{BC}$ is honest. This will be true with probability at least
$1 - 1/2^L$, where $L$ is the size of $C_{BC}$, as we are choosing $L$ parties out of $C_{BA}$, which has honest majority. In comparison, in systems that choose one proposer per block, this probability is upper-bounded by (roughly) t/n, where t is the number of corrupted parties/resources (e.g., the amount of adversarially owned stake) and n is the total amount of resources. The above discrepancy can be impactful in situations where the transaction-submitting mechanism has high latency, e.g., due to bad/restricted access to the blockchain network, or some points to the blockchain network are unreliable.

Finality: Since the parties decide on the next block by means of a Byzantine broadcast protocol, agreement is achieved instantly by the end of the slot. This is similar to standard synchronous BFT-based blockchains, and is in contrast to Nakamoto-style blockchains [7,20,24] which achieve eventual consistency, aka the common prefix property [16,26]. We stress that this is the case assuming the reputation system is accurate. One can argue that this might be an insufficient guarantee for full confidence, and some users might want to wait for the situation to settle also on the fallback chain, i.e., ensure that no honest party contests their view. Nonetheless, it allows for a tiered use of the assumptions, which naturally fits situations where different transactions have different degrees of importance, as is for example the case in cryptocurrencies: For small-amount transactions the users can trust the PoR-blockchain and consider the transaction settled as soon as it is finalized there. The more risk-averse users (or users with high-stake transactions) can wait for the backup chain to confirm that there is no accusation. The above mode is the natural blockchain analog of how reputation is used in reality: If a service or an investment is recommended by a highly reputable source, then it typically enjoys higher trust. However, for risky actions, the actors usually seek further assurances that might take longer time.

Related Literature. Asharov et al. [1] defined reputation systems for multiparty computation (MPC) and proved necessary and sufficient conditions on a static reputation system for the existence of fair MPC, in particular, for the existence of an algorithm for selecting a committee where the majority of the participants is honest. To our knowledge, ours is the first work which puts forth a rigorous specification, protocol, model, and treatment of reputation-system-based blockchains. Attempts to combine consensus with reputation were previously made in the context of blockchains and cryptocurrencies. None of these attempts addresses the subjective nature of the reputation systems, i.e., if the reputation system is inaccurately estimated, their security fails. This is in contrast to our fallback guarantee which allows us to preserve basic safety (unforkability) properties which are essential in blockchains and cryptocurrencies. Additionally many of these works lack a protocol specification, security model and proofs, and often even a credible security argument [14], and/or rely on complicated reputation mechanisms and exogenous limitations on the adversary’s corruption power [9]. Alternative approaches use the proof-of-work (Bitcoin) paradigm to assign reputation, by considering miners more reputable if they allocate, over a long period, more hashing-power to their protocol [28].

Notably, [6] proposed a reputation-module which can build a scalable blockchain on top of a BFT-style consensus protocol, e.g., PBFT or Honey Badger [23]. The idea is that this reputation module can be used by the parties to select smaller next round committees. In addition to lacking a security proof, the entire module needs to operate over a broadcast channel created by the original BFT consensus protocol, as it uses a global view of the computation to accurately readjust the reputations. Hence, its security relies on the security of the underlying consensus protocol, even if reputation is accurate. Instead our PoR-blockchain construction is secure under the assumption of accuracy of the reputation system, irrespective of the properties of the fallback blockchain. The result from [6] also proposed a notion of reputation-fairness, which renders a reputation-based lottery more fair the closer its outcome is to a uniform sample. This notion of fairness seems unsuitable for our goals, as it is unclear why low distance from uniform is a desirable property. Why should it be considered fair that a large set of parties with low reputation has better relative representation in the output than a small set with higher reputation? And how would this incentivize parties to build up their reputation? Our fairness definition addresses this concern, at a very low overhead.

Hybrid blockchains which use an alternative consensus mechanism as a fallback were also previously used in Thunderella [27] and Meshcash [5]. Their protocols rely on smart refinements of the proof-of-work and/or proof-of-space-time paradigms, and use novel methods to accelerate the blockchain and improve scalability and finality when a higher amount of the underlying resource is in honest hands while ensuring safety even under weaker guarantees. Finally, Afgjort [22] devises a finality layer module on top of a proof-of-stake blockchain. Their construction achieves fast finality under the combination of the assumptions underlying the PoS-blockchain (typically, honest majority of stake) and the assumption supporting the security of the finality layer. In contrast, our PoR/PoS-hybrid blockchain is secure as long as the reputation-system is accurate irrespective of the security of the underlying PoS-blockchain.

Preliminaries. We use the standard definition of negligible and overwhelming: A function $\mu : \mathbb{N} \to \mathbb{R}^+$ is negligible if for any polynomial $p(k)$: $\mu(k) = O(1/p(k))$. We say that a function $f : \mathbb{N} \to [0, 1]$ is overwhelming if $f(k) = 1 - \mu(k)$ for some negligible function $\mu$. Many of our statements and definitions assume an (often implicit) security parameter $k$. For two strings $s_1, s_2 \in \{0, 1\}^*$ we denote by $s_1 \| s_2$ the concatenation of $s_1$ and $s_2$. For some $n \in \mathbb{N}$ we will denote by $[n]$ the set $[n] = \{1, \ldots, n\}$. For a string $s \in \{0, 1\}^k$ and for some $D \le k$ we will say that $T(s) \ge D$ if $s$ has at least $D$ leading zeros, i.e., $s$ is of the form $s = 0^D \| s'$ for some $s' \in \{0, 1\}^{k-D}$.

Organization of the Remainder of the Paper. After discussing our model in Sect. 2, in Sect. 3 we define and instantiate a reputation-fair lottery. Section 4 describes a PoR-based blockchain-ledger protocol for static reputation systems, and Sect. 5 describes the hybrid PoR/PoS ledger protocol. Due to space limitations detailed proofs are deferred to the full version [21].
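The leading-zeros condition T(s) ≥ D from the Preliminaries can be made concrete with a short sketch. It assumes that T(s) is read as the number of leading zeros of s and that bit strings are represented as Python strings over the characters 0 and 1; both choices, and the function names, are illustrative only.

```python
def leading_zeros(s: str) -> int:
    """T(s), read here as the number of leading '0' characters of the bit string s."""
    if set(s) - {"0", "1"}:
        raise ValueError("s must be a string over {0, 1}")
    return len(s) - len(s.lstrip("0"))


def meets_threshold(s: str, d: int) -> bool:
    """True if s has the form 0^d || s' for some bit string s'."""
    return leading_zeros(s) >= d


def concat(s1: str, s2: str) -> str:
    """The concatenation s1 || s2 of two bit strings."""
    return s1 + s2


s = concat("000", "101")        # s = 0^3 || 101
has_three = meets_threshold(s, 3)
has_four = meets_threshold(s, 4)
```

Here s has exactly three leading zeros, so the condition holds for D = 3 and fails for D = 4.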

2 The Model

Our ledger protocol is among a set of parties P = {P1 , . . . , Pn }. A subset Pˆ of the parties, called reputation parties, have a distinguished role in the protocol, and are tasked with proposing and confirming blocks to be added in the blockchain. To avoid overcomplicating our description, we describe our protocol here for a static set of parties; in [21] we discuss how this set can be dynamically updatable, similar to the dynamic participation of [2,20]. As is common in the blockchain literature, our statements are in the random oracle model, where a hash function is assumed to behave as a random function. As a setup, we assume that all parties have the genesis block which includes sufficient initial randomness and the reference reputation-system. In terms of cryptographic assumptions we will assume existentially unforgeable digital signatures [18] along with pseudorandom function. Efficient (and even post-quantum) variants of these primitives are known to exist under standard complexity assumptions, namely, existence of one-way functions or hardness of lattice problems. Communication and Synchrony. We assume synchronous communication, where messages sent by honest parties in some round are delivered by the beginning of the following round, and a rushing adversary [8]. The protocol timeline which is divided into slots, where each slot consists of a fixed number of rounds. An epoch consist of a predefined number of slots. We assume a cryptographic adversary who gets to actively corrupt parties—i.e., takes full control over them. Parties have two means of communicating. (1) A diffusion (multicast) network available to everyone in P, build by means of a standard flooding/gossiping protocol over a (potentially incomplete but) connected communication graph of unicast channels [4]—this is similar to [4,24], Ethereum [7], Cardano/Ouroboros [2,20], Algorand [17], Thunderella [27], etc. 
For simplicity, we abstract this network by means of a zero-delay multicast primitive (cf. [4]): When an honest party multicasts a message, this message is delivered to all (honest) parties in the network by the beginning of the following round.⁵ We note that one can use techniques from the blockchain literature [2,17] to relax this perfect synchrony assumption, at the cost of a stricter feasibility condition on the reputation system. A discussion of such a relaxation is included in [21]. (2) The second type of communication is among reputation parties. These parties have known physical identities (e.g., IP addresses) and can communicate through direct channels, without flooding (e.g., via TCP/IP). We remark that the existence of such channels is not necessary for our security analysis: if they are available, they can be used; otherwise, communication via the diffusion network is sufficient for security. However, if they do exist and are used, then they can considerably reduce the traffic and yield a more scalable and efficient solution, as discussed in the introduction.

⁵ Observe that the adversary might send a message to a subset of parties, but if any honest party is instructed by the protocol to forward it, then the message will be delivered (to all other honest parties) in the round when this forwarding occurs.


L. Kleinrock et al.

Reputation Systems. A reputation system Rep for $m = O(k)$ reputation parties from a set $\hat{P} = \{\hat{P}_1, \ldots, \hat{P}_m\}$ is a family of probability distributions (parameterized by $m$) over binary reputation vectors of length $m$, i.e., vectors of the type $(h_1, \ldots, h_m) \in \{0,1\}^m$.⁶ Each $h_i$ is an indicator bit which takes the value 1 if $\hat{P}_i$ is honest and 0 otherwise. For example, $\Pr[(h_1, \ldots, h_m) = (0, \ldots, 0, 1)] = 0.6$ means that with probability 0.6 every reputation party except $\hat{P}_m$ is dishonest. We consider two types of reputation systems: A static reputation system is a probability distribution as discussed above. This is similar to the reputation system considered in [1]. A dynamic reputation system instead is a sequence (ensemble) $\mathrm{Rep} = \{\mathrm{Rep}_\rho\}_{\rho \in \mathbb{N}}$ of distributions, where each $\mathrm{Rep}_\rho$ is a static reputation system for a set $\hat{P}_\rho$ of $m_\rho \in \mathbb{N}$ reputation parties. Such dynamic reputation systems are useful in an evolving and reactive primitive such as a ledger protocol, where the reputation of parties might change depending on their behavior in the system and/or other exogenous factors.

We focus on the setting where each $h_i$ corresponds to the output of an independent indicator random variable $H_i$, i.e., whether or not a reputation party $\hat{P}_i$ behaves honestly does not depend on what other reputation parties do. In this case, a static reputation system can be described by a vector of $m$ numbers between 0 and 1, i.e., $\mathrm{Rep} = (R_1, \ldots, R_m) \in [0,1]^m$, where the interpretation of Rep is that with probability equal to $R_i$ the party $\hat{P}_i$ will play honestly (i.e., $\Pr[H_i = 1] = R_i$).⁷ We then say that $R_i$ is the reputation of party $\hat{P}_i$. We refer to such a reputation system as a correlation-free reputation system. In the following we provide more details on the corruption capabilities that a correlation-free reputation system (static or dynamic) gives to the adversary. The adversary's corruption capabilities are specified by the reputation system.
A static reputation-bounded adversary for reputation system Rep, also referred to as a static Rep-adversary, corrupts the set of parties at the beginning of the protocol according to Rep, and sticks to this choice. In particular, given a reputation system Rep for $m$ reputation parties, corruption with a static adversary occurs as follows: A vector $(h_1, \ldots, h_m) \in \{0,1\}^m$ is sampled according to the distribution defined in Rep, and for each $h_i = 0$ the reputation party $\hat{P}_i \in \hat{P}$ is corrupted by the adversary. For dynamic reputation systems, a stronger type of adversary, which we call an epoch-resettable adversary, corrupts a completely new set of parties at the beginning of each epoch, according to the reputation system at the beginning of that epoch; this is similar to mobile adversaries [25]. Here we focus our analysis on the static case; an extension to epoch-resettable adversaries is discussed in [21].
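The static corruption process described above can be illustrated with a short Python sketch. It assumes a correlation-free reputation system given as a list of reputations; the function names and the fixed seed are illustrative choices, not part of the paper.

```python
import random

def sample_corruptions(rep, rng):
    """Sample a reputation vector (h_1, ..., h_m) from a correlation-free Rep.

    rep[i] plays the role of R_i = Pr[H_i = 1]; h_i = 1 means honest.
    """
    return [1 if rng.random() < r else 0 for r in rep]

def corrupted_set(h):
    """Indices of the parties a static Rep-adversary corrupts (h_i = 0)."""
    return {i for i, bit in enumerate(h) if bit == 0}

rng = random.Random(2020)          # fixed seed, for reproducibility only
rep = [0.9, 0.75, 0.6, 1.0, 0.0]   # hypothetical reputations
h = sample_corruptions(rep, rng)
print(h, corrupted_set(h))
```

A party with reputation 1 is never corrupted in this sketch and a party with reputation 0 always is, matching the interpretation $\Pr[H_i = 1] = R_i$.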

⁶ For notational simplicity, we often refer to Rep as a probability distribution rather than an ensemble, i.e., we omit the explicit reference to the parameter $m$.

⁷ Adaptive correlation-free reputation systems are described, analogously, as an ensemble of static reputation systems.

Proof-of-Reputation Blockchain with Nakamoto Fallback

3 Reputation-Based Lotteries

At the heart of our construction is a lottery that chooses a (sublinear) set of parties according to their reputation. To demonstrate the idea, let us first consider two extreme scenarios:

Scenario 1: No reputation party $\hat{P}_i$ has $R_i > 0.5$.

Scenario 2: All reputation parties $\hat{P}_i$ have (independent) reputation $R_i > 0.5 + \epsilon$ for a constant $\epsilon$, e.g., $R_i > 0.51$.

In Scenario 1, one can prove that users cannot use the recommendation of the reputation parties to agree on a sequence of transactions. Roughly, the reason is that with good probability, the majority of the reputation parties might be dishonest and try to split the network of users, so that they accept conflicting transaction sequences. In Scenario 2, on the other hand, the situation is different. Here, by choosing a polylogarithmic random committee we can guarantee (except with negligible probability)⁸ that the majority of those parties will be honest (recall that we assume independent reputations), and we can therefore employ a consensus protocol to achieve agreement on each transaction (block).

Definition 1. For a reputation system Rep for parties from a reputation set $\hat{P}$, a (possibly probabilistic) algorithm $A$ for sampling a subset of parties from $\hat{P}$, and a Rep-adversary $\mathcal{A}$, we say that Rep is $(\epsilon, A)$-feasible for $\mathcal{A}$ if, with overwhelming probability,⁹ $A$ outputs a set of parties such that at most a $1/2 - \epsilon$ fraction of these parties is corrupted by $\mathcal{A}$.

In the above definition, the corrupted parties are chosen according to Rep from the entire reputation-party set $\hat{P}$, and independently of the coins of $A$. (Indeed, otherwise it would be trivial to corrupt a majority.)

Definition 2. We say that a reputation system is $\epsilon$-feasible for a Rep-adversary $\mathcal{A}$ if there exists a probabilistic polynomial-time (PPT) sampling algorithm $A$ such that Rep is $(\epsilon, A)$-feasible for $\mathcal{A}$.

It is easy to verify that to maximize the (expected) number of honest parties in the committee it suffices to always choose the parties with the highest reputation.
In fact, [1] generalized this to arbitrary correlation-free reputation systems by proving that for any $\epsilon$-feasible reputation system Rep (i.e., for any Rep-adversary $\mathcal{A}$), the algorithm which orders the parties according to their reputation and chooses sufficiently many (see [1]) parties with the highest reputation induces a set which has an honest majority. We denote this algorithm by $A_{\max}$.

Lemma 1 ([1]). A correlation-free reputation system is $\epsilon$-feasible for a Rep-adversary $\mathcal{A}$ if and only if it is $(\epsilon, A_{\max})$-feasible for $\mathcal{A}$.

As discussed in the introduction, despite yielding a maximally safe lottery, $A_{\max}$ has issues with fairness which render it suboptimal for use in a blockchain ledger protocol. In the following we introduce an appropriate notion of reputation-fairness for lotteries and an algorithm for achieving it.

⁸ All our security statements here involve a negligible probability of error. For brevity we at times omit this from the statement.

⁹ The probability is taken over the coins associated with the distribution of the reputation system, and the coins of $\mathcal{A}$ and $A$.
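To make the role of $A_{\max}$ and of feasibility concrete, the following Python sketch selects the highest-reputation parties and empirically estimates how often more than a $1/2 - \epsilon$ fraction of that committee is corrupted. It is a Monte Carlo illustration under the correlation-free assumption, not the analysis of [1]; the committee size `t`, the function names, and the trial count are assumptions made for the example.

```python
import random

def a_max(rep, t):
    """Indices of the t parties with the highest reputation (ties broken by index)."""
    order = sorted(range(len(rep)), key=lambda i: (-rep[i], i))
    return order[:t]

def estimate_failure(rep, committee, eps, trials, rng):
    """Estimate Pr[more than a (1/2 - eps) fraction of the committee is corrupted].

    Reputations are independent and the committee is fixed before corruption
    is sampled, so it is enough to sample the committee members' bits.
    """
    limit = (0.5 - eps) * len(committee)
    bad = 0
    for _ in range(trials):
        corrupted = sum(1 for i in committee if rng.random() >= rep[i])
        if corrupted > limit:
            bad += 1
    return bad / trials

rep = [0.6] * 50 + [0.95] * 50     # hypothetical reputation vector
committee = a_max(rep, 20)
print(estimate_failure(rep, committee, 0.1, 2000, random.Random(7)))
```

An empirical estimate of this kind can only suggest feasibility for a concrete reputation vector; the definitions above require the failure probability to be negligible.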

3.1 PoR-Fairness

As a warm-up, let us consider a simple case, where all reputation parties can be partitioned into two subsets: $\hat{P}_1$, consisting of parties with reputation at least 0.76, and $\hat{P}_2$, consisting of parties with reputation between 0.51 and 0.75. Let $|\hat{P}_1| = \alpha_1$ and $|\hat{P}_2| = \alpha_2$. We want to sample a small (sublinear in $|\hat{P}| = \alpha_1 + \alpha_2$) number $y$ of parties in total. Recall that we want to give every reputation party a chance (to be part of the committee) while ensuring that the higher the reputation, the better the party's relative chances.

A first attempt would be to first sample a set where each party $\hat{P}_i$ is sampled according to his reputation (i.e., with probability $R_i$) and then reduce the size of the sampled set by randomly picking the desired number of parties. This seemingly natural idea suffers from the following problem: if there are many parties with low reputation, then it will not yield an honest-majority committee even when the reputation system is feasible. (This is not the case in our example above, where everyone has reputation at least 0.51, but it might be the case in reality.)

A second attempt is the following. Observe that, per our specification of the above tiers, the parties in $\hat{P}_1$ are about twice as likely to be honest as parties in $\hat{P}_2$. Hence we can try to devise a lottery which selects (in expectation) twice as many parties from $\hat{P}_1$ as the number of parties selected from $\hat{P}_2$. This will make the final set sufficiently biased towards high reputations (which can ensure honest majorities) but has the following side effect: The chances of a party being selected diminish with the number of parties in his reputation tier. This effectively penalizes large sets of high-reputation parties; but the formation of such sets should be a desideratum for a blockchain protocol. To avoid this situation we tune our goals to require that when the higher-reputation set $\hat{P}_1$ is much larger than $\hat{P}_2$, we start shifting the selection towards $\hat{P}_1$.
This leads to the following informal fairness goal:

Goal (Informal): We want to randomly select $x_1$ parties from $\hat{P}_1$ and $x_2$ parties from $\hat{P}_2$ so that:
1. $x_1 + x_2 = y$
2. $x_1 = 2 \max\{1, \frac{\alpha_1}{\alpha_2}\} x_2$ (representation fairness)
3. For each $i \in \{1, 2\}$: No party in $\hat{P}_i$ has significantly lower probability of getting picked than other parties in $\hat{P}_i$ (non-discrimination), but parties in $\hat{P}_1$ are twice as likely to be selected as parties in $\hat{P}_2$ (selection fairness).

Assuming $\alpha_1$ and $\alpha_2$ are sufficiently large, the above goal can be achieved by the following sampler: For appropriately chosen numbers $\ell_1$ and $\ell_2 \geq 0$ with $\ell_1 + \ell_2 = y$, sample $\ell_1$ parties from $\hat{P}_1$, and then sample $\ell_2$ parties from $\hat{P}_1 \cup \hat{P}_2$ (where if a party from $\hat{P}_1$ is sampled twice, it is replaced with a random, not yet sampled party from $\hat{P}_1$). As will become clear in the following general analysis, by carefully choosing $\ell_1$ and $\ell_2$ we can ensure that the conditions of the above goal are met. For the interested reader, we analyze the above lottery in [21]. Although this is a special case of the general lottery which follows, going over that simpler analysis might be helpful to a reader who wishes to ease into our techniques and design choices.

Our PoR-Fairness Definition and Lottery. We next discuss how to turn the above informal fairness goals into a formal definition, and generalize the above lottery mechanism to handle more than two reputation tiers and to allow for arbitrary reputations. To this end we partition, as in the simple example, the reputations into $m = O(1)$ tiers¹⁰ as follows: For a given small $\delta > 0$, the first tier includes parties with reputation between $\frac{m-1}{m} + \delta$ and 1, the second tier includes parties with reputation between $\frac{m-2}{m} + \delta$ and $\frac{m-1}{m} + \delta$, and so on. Parties with reputation 0 are ignored.¹¹ We refer to the above partitioning of the reputations as an $m$-tier partition. The main differences of the generalized reputation-fairness notion from the above informal goal are that (1) we parameterize the relation between the representation of different tiers by a parameter $c$ (in the above informal goal $c = 2$) and (2) we do not only require an appropriate relation on the expectations of the numbers of parties from the different tiers, but require that these numbers are concentrated around numbers that satisfy this relation. The formal reputation-fairness definition follows.

Definition 3. Let $\hat{P}_1, \ldots, \hat{P}_m$ be a partition of the reputation-party set $\hat{P}$ into $m$ tiers as above (where the parties in $\hat{P}_1$ have the highest reputation) and let $L$ be a lottery which selects $x_i$ parties from each $\hat{P}_i$. For some $c \geq 1$, we say that $L$ is $c$-reputation-fair, or simply $c$-fair, if it satisfies the following properties:

1. ($c$-Representation Fairness): For $j = 1, \ldots, m$, let $c_j = \max\{c, c \cdot \frac{|\hat{P}_j|}{|\hat{P}_{j+1}|}\}$. Then $L$ is $c$-representation-fair if for each $j \in \{1, \ldots, m-1\}$ and for every constant $\epsilon \in (0, c)$:

$$\Pr[(c_j - \epsilon)\, x_{j+1} \leq x_j \leq (c_j + \epsilon)\, x_{j+1}] \geq 1 - \mu(k),$$

for some negligible function $\mu$.

2. ($c$-Selection Fairness): For any $p_j \in \cup_{i=1}^{m} \hat{P}_i$, let $\mathrm{Member}_j$ denote the indicator (binary) random variable which is 1 if $p_j$ is selected by the lottery and 0 otherwise. Then $L$ is $c$-selection-fair if for any $i \in \{1, \ldots, m-1\}$, for any pair $(\hat{P}_{i_1}, \hat{P}_{i_2}) \in \hat{P}_i \times \hat{P}_{i+1}$, and any constant $c' < c$:

$$\frac{\Pr[\mathrm{Member}_{i_1} = 1]}{\Pr[\mathrm{Member}_{i_2} = 1]} \geq c' - \mu(k)$$

for some negligible function $\mu$.

¹⁰ This is analogous to the rankings of common reputation/recommendation systems, e.g., in Yelp, a party might have reputation represented by a number of stars from 0 to 5, along with their midpoints, i.e., 0.5, 1.5, 2.5, etc.

¹¹ This also gives us a way to effectively remove a reputation party, e.g., in case it is publicly caught cheating.


3. (Non-Discrimination): Let $\mathrm{Member}_i$ be defined as above. Then $L$ is non-discriminatory if for any $\hat{P}_{i_1}, \hat{P}_{i_2}$ in the same $\hat{P}_i$: $\mathrm{Member}_{i_1} \approx \mathrm{Member}_{i_2}$, where $\approx$ means that the random variables are computationally indistinguishable.

At a high level the lottery for the $m$-tier case is similar in spirit to the two-tier case: First we sample a number of $\ell_1$ parties from the highest reputation set $\hat{P}_1$, then we sample $\ell_2$ parties from the union of the second-highest and the highest, $\hat{P}_1 \cup \hat{P}_2$, then we sample $\ell_3$ parties from the union of the three highest reputation tiers, $\hat{P}_1 \cup \hat{P}_2 \cup \hat{P}_3$, and so on. As we prove, the values $\ell_1, \ell_2, \ell_3$, etc. can be carefully chosen so that the above fairness goal is reached whenever there are sufficiently many parties in the different tiers. We next detail our generalized sampling mechanism and prove its security properties.

We start by describing two standard methods for sampling a size-$t$ subset of a party set $P$, where each party $P \in P$ is associated with a unique identifier pid,¹² which will both be utilized in our fair sampling algorithm. Intuitively, the first sampler samples the set with replacement and the second without. The first method, denoted by Rand, takes as input/parameters the set $P$, the size $t$ of the target set (where naturally $t < |P|$) and a nonce $r$. It also makes use of a hash function $h$ which we will assume behaves as a random oracle.¹³ In order to sample the set, for each party with ID pid, the sampler evaluates the random oracle on input (pid, $r$), and if the output has more than $\log \frac{|P|}{t}$ trailing 0's the party is added to the output set. By a simple Chernoff bound, the size of the output set $\bar{P}$ will be concentrated around $t$.
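The Rand sampler just described can be sketched in Python as follows. SHA-256 stands in for the random oracle $h$, and the encoding of (pid, $r$) is an arbitrary choice for the example. The sketch includes a party when the oracle output has at least $z = \mathrm{round}(\log_2(|P|/t))$ trailing zero bits, so that each party is included with probability $2^{-z}$, which equals $t/|P|$ when $|P|/t$ is a power of two; this exact threshold is an assumption of the sketch rather than a detail taken from the paper.

```python
import hashlib
import math

def oracle(pid, nonce):
    """SHA-256 stand-in for the random oracle h, read as an integer."""
    digest = hashlib.sha256(f"{pid}|{nonce}".encode()).digest()
    return int.from_bytes(digest, "big")

def trailing_zeros(x, width=256):
    """Number of trailing zero bits of x (width if x == 0)."""
    if x == 0:
        return width
    return (x & -x).bit_length() - 1

def rand_sampler(parties, t, nonce):
    """Keep each party whose oracle output has at least z trailing zero bits."""
    z = round(math.log2(len(parties) / t))
    return [p for p in parties if trailing_zeros(oracle(p, nonce)) >= z]

parties = [f"pk{i}" for i in range(1024)]   # hypothetical identifiers
print(len(rand_sampler(parties, 64, "r")))  # concentrated around 64
```

Because each party is included independently, the output size is binomially distributed, which is where the Chernoff bound mentioned above applies.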
The second sampler, denoted by RandSet, is the straightforward way to sample a random subset of $t$ parties from $P$ without replacement: Order the parties according to the output of $h$ on input (pid, $r$) and select the ones with the highest value (where the output of $h$ is interpreted as the standard binary representation of an integer). It follows directly from the fact that $h$ behaves as a random oracle (and, therefore, assigns to each $P_i \in P$ a random number from $\{0, \ldots, 2^k - 1\}$) that the above algorithm uniformly samples a set $\bar{P} \subset P$ of size $t$ out of all the possible size-$t$ subsets of $P$. For completeness we have included a detailed description of both samplers in [21].

Given the above two samplers, we can provide the formal description of our PoR-fair lottery, see Fig. 1. Theorem 1 states the achieved security.

Theorem 1 (Reputation-Fair Lottery for $m = O(1)$ tiers). In the above lottery $L(\hat{P}, \mathrm{Rep}, (c_1, \ldots, c_m), \delta, \epsilon, r)$, let $\epsilon, \delta > 0$ be strictly positive constants, and for each $i \in \{1, \ldots, m = O(1)\}$, let $X_i$ be the random variable (r.v.) corresponding to the number of parties in the final committee that are from set $\hat{P}_i$;

¹² In our blockchain construction, pid will be $P$'s public key.

¹³ In the random oracle model, $r$ can be any unique nonce; however, for the epoch-resettable-adversary extension of our lottery we will need $r$ to be a sufficiently fresh random value. Although most of our analysis here is in the static setting, we will still have $r$ be such a random value to ensure compatibility with dynamic reputation.


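$L(\hat{P}, \mathrm{Rep}, (c_1, \ldots, c_m = 1), \delta, \epsilon, r)$

1. Divide the reputation parties into $m$ tiers $\hat{P}_1, \ldots, \hat{P}_m$ as follows:ᵃ For $i = 0, \ldots, m-1$, define $\hat{P}_{m-i}$ to be the set of parties in $\hat{P}$ with reputation $\mathrm{Rep}_j \in (\frac{i}{m} + \delta, \frac{i+1}{m} + \delta]$.
2. Initialize $\bar{P}_i := \emptyset$ for each $i \in [m]$.
3. For each $i = 1, \ldots, m$: $\alpha_i := |\hat{P}_i|$.
4. For each $i = 1, \ldots, m-1$: Let

$$\ell_i = \frac{\sum_{j=1}^{i} \alpha_j}{\sum_{j=1}^{m} \prod_{q=j}^{m} c_q} \cdot \frac{\alpha_{i+1} \prod_{j=i}^{m} c_j - \alpha_i \prod_{j=i+1}^{m} c_j}{\alpha_{i+1}\, \alpha_i} \cdot \log^{1+\epsilon} n$$

   and let

$$\ell_m := \frac{\sum_{j=1}^{m} \alpha_j}{\sum_{j=1}^{m} \prod_{q=j}^{m} c_q} \cdot \frac{1}{\alpha_m} \cdot \log^{1+\epsilon} n.$$

5. For each $i = 1, \ldots, m-1$ do the following:
   – If $|\cup_{j=1}^{i} \hat{P}_j| \geq \ell_i$:
     (a) Invoke $\mathrm{Rand}(\cup_{j=1}^{i} \hat{P}_j, \ell_i; (r\|i))$ to choose a set $Q_i$ of parties uniformly at random from $\cup_{j=1}^{i} \hat{P}_j$.
     (b) For each $j \in [i]$ compute $Q^{col}_{i,j} := Q_i \cap \bar{P}_j$ and update $\bar{P}_j := \bar{P}_j \cup (Q_i \cap \hat{P}_j)$.
     (c) For each $j \in [i]$, if $Q^{col}_{i,j} \neq \emptyset$, then
         i. If $|\hat{P}_j \setminus \bar{P}_j| < |Q^{col}_{i,j}|$, then reset the lottery and select $P_{sel}$ as the output of $A_{\max}$.
         ii. Else
             • Invoke $\mathrm{RandSet}(\hat{P}_j \setminus \bar{P}_j, |Q^{col}_{i,j}|; (r\|i\|j))$ to choose a set $Q^{+}_{i,j}$;
             • For each $j \in [i]$ update $\bar{P}_j := \bar{P}_j \cup Q^{+}_{i,j}$.
         iii. Set $Q^{+}_i := \cup_{j=1}^{m} Q^{+}_{i,j}$.
   – Else (i.e., $|\cup_{j=1}^{i} \hat{P}_j| < \ell_i$): Reset the lottery and select (and output) $P_{sel}$ as the output of $A_{\max}$ for choosing $\log^{1+\epsilon} n$ parties.
6. If the lottery was not reset in any of the above steps, then set $P_{sel} := \cup_{j=1}^{m} \bar{P}_j$ $(= \cup_{i=1}^{m} (Q_i \cup Q^{+}_i))$ and output $P_{sel}$.

ᵃ Where $\delta$ can be an arbitrarily small constant.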

Fig. 1. c-fair reputation-based lottery for m = O(1) tiers

and for each $i \in [m]$ let $c_i = \max\{c, c \cdot \frac{|\hat{P}_i|}{|\hat{P}_{i+1}|}\}$, where $c = O(1)$ is such that for some constant $\xi \in (0, 1)$: $\frac{1}{c_{m-1}} \leq \frac{m-2}{2m} - \xi$. If for some constant $f \in (0, 1/2)$ the reputation system Rep is $f$-feasible for a static Rep-bounded adversary $\mathcal{A}$, then for the set $P_{sel}$ of parties selected by $L$ the following properties hold with overwhelming probability (in the security parameter $k$):

1. $|P_{sel}| = \Theta(\log^{1+\epsilon} n)$.
2. For some constant $\delta > 0$, adversary $\mathcal{A}$ corrupts at most a $1/2 - \delta$ fraction of the parties in $P_{sel}$.
3. If every set $\hat{P}_i$ has at least $\gamma \cdot \log^{1+\epsilon} n$ parties for some $\gamma > 1$, then the lottery is $c$-fair.
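Two building blocks of the lottery in Fig. 1, the tier partition of Step 1 and the RandSet sampler, can be sketched in Python as follows. SHA-256 again stands in for the random oracle, reputations are passed as a dictionary from a hypothetical party identifier to $R_i$, and all function names are illustrative. The sketch does not implement the full lottery (in particular, the collision handling of Step 5 and the reset to $A_{\max}$ are omitted).

```python
import hashlib

def oracle(pid, nonce):
    """SHA-256 stand-in for the random oracle h, read as an integer."""
    digest = hashlib.sha256(f"{pid}|{nonce}".encode()).digest()
    return int.from_bytes(digest, "big")

def rand_set(parties, t, nonce):
    """RandSet: order parties by oracle output and keep the t highest values."""
    ranked = sorted(parties, key=lambda p: oracle(p, nonce), reverse=True)
    return ranked[:t]

def tier_partition(rep, m, delta):
    """Step 1 of Fig. 1: tier m-i holds reputations in (i/m + delta, (i+1)/m + delta]."""
    tiers = {j: [] for j in range(1, m + 1)}
    for pid, r in rep.items():
        for i in range(m):
            if i / m + delta < r <= (i + 1) / m + delta:
                tiers[m - i].append(pid)
                break
    return tiers

rep = {"a": 0.95, "b": 0.70, "c": 0.40, "d": 0.0}  # hypothetical reputations
tiers = tier_partition(rep, m=3, delta=0.01)
print(tiers)
print(rand_set(tiers[1], 1, "r|1|1"))
```

Parties whose reputation is at most $\delta$ (such as `"d"` above) fall into no tier, which matches the remark that parties with reputation 0 are ignored.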


The complete proof can be found in [21]. In the following we include a sketch of the main proof ideas.

Proof (sketch). We consider two cases: Case 1, in which $L$ does not reset, and Case 2, in which $L$ resets.

In Case 1, the lottery is never reset. This case is the bulk of the proof. First, using a combination of Chernoff bounds, we prove that the random variable $X_i$ corresponding to the number of parties from $\hat{P}_i$ selected in the lottery is concentrated around the (expected) value

$$x_i := \mathrm{Exp}(X_i) = \alpha_i \cdot \sum_{j=i}^{m} \frac{\ell_j}{\sum_{q=1}^{j} \alpha_q}, \tag{1}$$

i.e., for any constant $\lambda_i \in (0, 1)$:

$$\Pr[(1 - \lambda_i)\, x_i \leq X_i \leq (1 + \lambda_i)\, x_i] \geq 1 - \mu_i(k). \tag{2}$$

Hence, by inspection of the protocol one can verify that the $x_i$'s and the $\ell_j$'s satisfy the following system of linear equations:

$$(x_1, \ldots, x_m)^T = B \cdot (\ell_1, \ldots, \ell_m)^T, \tag{3}$$

where $B$ is an $m \times m$ matrix whose $(i, j)$ entry is

$$B_{i,j} = \begin{cases} \dfrac{\alpha_i}{\sum_{q=1}^{j} \alpha_q}, & \text{if } i \leq j \\ 0, & \text{otherwise.} \end{cases}$$

The above system of $m$ equations has $2m$ unknowns. To solve it we add the following $m$ equations, which are derived from the desired reputation fairness: For each $i \in [m-1]$:

$$x_i = c_i \cdot x_{i+1} \tag{4}$$

and

$$\sum_{i=1}^{m} x_i = \log^{1+\epsilon} k. \tag{5}$$

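This yields $2m$ linear equations. By solving the above system of equations we can compute

$$\ell_i = \frac{\sum_{j=1}^{i} \alpha_j}{\sum_{j=1}^{m} \prod_{q=j}^{m} c_q} \cdot \frac{\alpha_{i+1} \prod_{j=i}^{m} c_j - \alpha_i \prod_{j=i+1}^{m} c_j}{\alpha_{i+1}\, \alpha_i} \cdot \log^{1+\epsilon} n$$

for each $i \in [m-1]$, and

$$\ell_m := \frac{\sum_{j=1}^{m} \alpha_j}{\sum_{j=1}^{m} \prod_{q=j}^{m} c_q} \cdot \frac{1}{\alpha_m} \cdot \log^{1+\epsilon} n.$$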


This already explains some of the mystery around the seemingly complicated choice of the $\ell_i$'s in the protocol. Next we observe that for each $j \in [m]$: $\sum_{i=1}^{m} B_{i,j} = 1$, which implies that

$$\sum_{j=1}^{m} \ell_j = \sum_{i=1}^{m} x_i \overset{\text{Eq. 5}}{=} \log^{1+\epsilon} k. \tag{6}$$
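The closed-form choice of the $\ell_i$'s can be checked numerically. The following Python sketch computes the $\ell_i$'s from the formulas above with exact rational arithmetic, recomputes the $x_i$'s via Eq. 1, and lets one confirm Eqs. 4, 5 and 6 on a small instance. The tier sizes, the value of $c$, and the target committee size `total` (standing in for $\log^{1+\epsilon} n$) are made-up example values, and the function names are illustrative.

```python
from fractions import Fraction
from math import prod

def tier_ratios(alpha, c):
    """c_j = max{c, c * alpha_j / alpha_{j+1}} for j < m, and c_m = 1."""
    ratios = [max(Fraction(c), Fraction(c) * alpha[j] / alpha[j + 1])
              for j in range(len(alpha) - 1)]
    return ratios + [Fraction(1)]

def lottery_sizes(alpha, c, total):
    """Return (ell, x): the closed-form ell_i's and the x_i's they induce via Eq. (1)."""
    m = len(alpha)
    suffix = [prod(c[j:], start=Fraction(1)) for j in range(m)]  # prod_{q>=j} c_q
    denom = sum(suffix)
    prefix = [sum(alpha[: j + 1]) for j in range(m)]             # sum_{q<=j} alpha_q
    ell = []
    for i in range(m - 1):
        num = alpha[i + 1] * suffix[i] - alpha[i] * suffix[i + 1]
        ell.append(Fraction(prefix[i]) / denom * num / (alpha[i + 1] * alpha[i]) * total)
    ell.append(Fraction(prefix[m - 1]) / (alpha[m - 1] * denom) * total)
    x = [alpha[i] * sum(ell[j] / prefix[j] for j in range(i, m)) for i in range(m)]
    return ell, x

alpha = [100, 200, 400]            # hypothetical tier sizes alpha_1..alpha_3
c = tier_ratios(alpha, 2)          # [2, 2, 1]
ell, x = lottery_sizes(alpha, c, total=70)
print(ell, x)
```

For this instance the sketch gives $\ell = (30, 22.5, 17.5)$ and $x = (40, 20, 10)$, so $x_i = c_i x_{i+1}$ and $\sum_j \ell_j = \sum_i x_i = 70$, as Eqs. 4 to 6 require.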

Because in each round we choose a number of parties drawn from a distribution centered around $\ell_i$, the above implies that the total number of parties we sample is centered around $\sum_{j=1}^{m} \ell_j = \log^{1+\epsilon} k$, which proves Property 1.

Property 2 is proven by a delicate counting of the parties which are corrupted, using Chernoff bounds for bounding the number of corrupted parties selected by Rand (which selects with replacement) and Hoeffding's inequality for bounding the number of parties selected by RandSet (which selects without replacement). The core idea of the argument is that because the reputation in different tiers reduces in a linear manner but the representation in the output of the lottery reduces in an exponential manner, even if the adversary corrupts for free all the selected parties from the lower half of the reputation tiers, the upper half will still have a strong super-majority to compensate, so that overall the majority is honest.

Finally, the $c$-fairness (Property 3) is argued as follows:
– The $c$-Representation Fairness follows directly from Eqs. 1, 2 and 4.
– The non-discrimination property follows from the fact that our lottery picks each party in every $\hat{P}_i$ with exactly the same probability as any other party.
– The $c$-Selection Fairness is proved by using the fact that the non-discrimination property mandates that each party in $P_{sel} \cap \hat{P}_i$ is selected with probability $p_i = \frac{|P_{sel} \cap \hat{P}_i|}{|\hat{P}_i|}$. By using our counting of the sets' cardinalities computed above we can show that for any constant $c' < c$: $\frac{p_i}{p_{i+1}} \geq c'$.

In Case 2, the lottery is reset and the output $P_{sel}$ is selected by means of an invocation of algorithm $A_{\max}$. This is the simpler case, since Lemma 1 ensures that if the reputation system is $f$-feasible, then at least a $1/2 + f$ fraction of the parties in $P_{sel}$ will be honest except with negligible probability.

Note that $A_{\max}$ is only invoked if a reset occurs, i.e., if in some step there are not sufficiently many parties to select from; this can occur only if some set $\hat{P}_i$ does not have sufficiently many parties to choose from. But by the above analysis, for $\delta < \gamma - 1$, the sampling algorithms choose at most $(1 + \delta) \log^{1+\epsilon} n$ parties with overwhelming probability. Hence when each $\hat{P}_i$ has size at least $\gamma \cdot \log^{1+\epsilon} n$, with overwhelming probability no reset occurs. In this case, by inspection of the protocol one can verify that the number of selected parties is $|P_{sel}| = \log^{1+\epsilon} n$.

4 The PoR-Blockchain

We next describe how a PoR-fair lottery can be used to obtain a PoR-blockchain. The ground truth of the blockchain is recorded on the genesis block, which includes the (initial) set of reputation parties, their public keys, and the corresponding reputation vector. We assume a canonical way of validating transactions submitted in the same round, e.g., if two received transactions which have not yet been inserted into a block would contradict each other (e.g., correspond to double-spending), a default rule of ignoring both can be adopted. We abstract this by means of a predicate Validate, which takes as input a sequence $T$ (of transactions) along with a current vector $T_H$ of transaction history (composed by concatenating the transactions in the blockchain), and outputs a subset $T' \subseteq T$ such that $T_H \| T'$ is a valid sequence of transactions.

The idea of the protocol for proposing and agreeing on the block of any given slot is as follows: A small (i.e., polylogarithmic) slot committee $C_{BA}$ is chosen using our above lottery; recall that the lottery guarantees that the majority in $C_{BA}$ is honest, and therefore it can run Byzantine agreement protocols (Consensus and Broadcast). From $C_{BA}$ a smaller (constant-size) committee $C_{BC}$ is randomly chosen to broadcast its transactions to everyone. Note that whenever in our protocol we say that a party $P$ broadcasts a message, we mean that a Byzantine broadcast protocol is executed with $P$ as sender; to avoid confusion we will signify this by saying that the message is broadcasted by means of protocol Broadcast. We will use multicasting to refer to the process of sending a message through the diffusion network to all parties.

Using Broadcast to communicate the transactions known to $C_{BC}$ allows us to agree on the union of all transactions known to these parties. However, broadcasting to the whole player set has a big communication and round overhead. To avoid the overhead we use the following idea: Recall that the parties in $C_{BA}$ are all reputation parties and can therefore communicate directly. Thus, instead of directly broadcasting to the whole party set $P$, the parties in $C_{BC}$ broadcast to the parties in $C_{BA}$.¹⁴ The security of the broadcast protocol ensures that the parties in $C_{BA}$ agree on the broadcasted messages and therefore also on the union. What remains is to extend this agreement to the whole player set. This can be easily done since the majority of the parties in $C_{BA}$ is honest: Every party in $C_{BA}$ signs the union of the received broadcasted messages and sends it to every party in $P$. The fact that $C_{BA}$ has an honest majority implies that the only message that might be accepted is this agreed-upon union. Hence, once any $P \in P$ receives such a majority-supported set, he can adopt it as the contents of this slot's block.

The above approach brings an asymptotic reduction in the communication complexity of the protocol from $\Omega(n^2)$ down to $O(n \log^{\epsilon} n)$, for some constant $\epsilon > 1$. (The worst-case round complexity also has an asymptotic reduction, but this depends on the actual reputation system and the choice of protocol Broadcast.) Additionally, the fact that reputation parties communicate over a point-to-point (rather than gossiping) network is likely to further improve the concrete communication complexity, at least for certain network topologies.

A remaining question is: Where does the randomness for the selection of $C_{BA}$ and $C_{BC}$ come from? For the static reputation-restricted adversary considered here, we extract the randomness for choosing each $C_{BA}$ by repeated calls to the random oracle. In [21] we discuss how we can extend our treatment using standard techniques to tolerate epoch-resettable adversaries. The formal description of the protocol for announcing and agreeing on the next block can be found in Fig. 2. The proof of the following theorem follows the above line of argument and can also be found in [21].

¹⁴ For clarity in our description we will use a deterministic broadcast protocol for Broadcast, e.g., the Dolev-Strong broadcast protocol [13], for which we know the exact number of rounds. However, since our lottery will ensure an honest majority in $C_{BA}$, using the techniques by Cohen et al. [10, 11], we can replace the round-expensive Dolev-Strong broadcast protocol by a randomized, expected-constant-round broadcast protocol for honest majorities, e.g., [19].

BlockAnnounce($\hat{P}$, $P$, Rep, $B_{\rho-1}$, $\rho$, $\delta$, $\epsilon$, $L = O(1)$):

1. Each party in $P$ locally runs the reputation-fair lottery $L(\hat{P}, \mathrm{Rep}, (c_1, \ldots, c_m), \delta, \epsilon, h(\rho\|0))$, where the $c_j$'s are as in Theorem 1, to sample a set $C_{BA} \subset \hat{P}$ (of size polylog($n$)); out of this set, the parties choose a random subset $C_{BC}$ of constant size $L = O(1)$ by invoking $\mathrm{RandSet}(C_{BA}, L; h(\rho\|1))$.
2. Each party $\hat{P}_i \in C_{BC}$ acts as sender in an invocation of Broadcast with the parties in $C_{BA}$ as receivers and with $\hat{P}_i$'s current transaction pool $T_i$ as input; $\hat{P}_i$ removes the broadcasted transactions from its local transaction pool.
3. All parties in $C_{BA}$ compute $\hat{T} = \mathrm{Validate}(T_H, T)$ for $T = \cup_{p_i \in C_{BC}} T_i$. If some party $\hat{P}_j \in C_{BC}$ did not broadcast a valid message in the previous round of the protocol, then all parties in $C_{BA}$ set $T_j = \{(\mathrm{abort}, j)\}$.
4. Every $\hat{P}_j \in C_{BA}$ signs $h(\hat{T}, h = h(B_{\rho-1}), \rho)$, where $B_{\rho-1}$ is the previous block,ᵃ and sends it to every party in $C_{BC}$.
5. Each $\hat{P}_i \in C_{BC}$: If $\hat{P}_i$ receives at least $|C_{BA}|/2$ signatures from parties in $C_{BA}$ on some $(\hat{T}, h)$, where $\rho$ is the current slot and $h = h(B_{\rho-1})$ is a valid hash pointer to the previous block, then $\hat{P}_i$ multicasts $(\hat{T}, h, \rho)$ along with all the corresponding signatures to all parties in $P$.
6. Each $P_i \in P$: Upon receiving any $(\hat{T}, h, \rho)$ along with signatures on it from at least $|C_{BA}|/2$ parties from $C_{BA}$, create a block consisting of $(\hat{T}, h, \rho)$ and the related signatures, add this block to the local (blockchain) ledger state as the current slot's block, and mark the current slot as completed.

ᵃ As is common in blockchains, assuming a compact representation of the block, e.g., a Merkle tree, will allow for more efficiency by signing just the root.

Fig. 2. Block announcing protocol for Slot ρ
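The acceptance rule in Steps 5 and 6 of Fig. 2 can be sketched in Python as follows. The sketch only illustrates the counting of distinct committee signatures on a common digest; `toy_sign` and `toy_verify` are hash-based placeholders invented for this example and provide no unforgeability, so a real deployment would substitute an existentially unforgeable signature scheme. The JSON encoding inside `block_digest` is likewise an assumption.

```python
import hashlib
import json

def block_digest(txs, prev_hash, slot):
    """Digest of (T, h(B_prev), slot); SHA-256 over JSON is a stand-in for h."""
    payload = json.dumps([sorted(txs), prev_hash, slot]).encode()
    return hashlib.sha256(payload).hexdigest()

def accept_block(signatures, committee, digest, verify):
    """Accept once at least |C_BA|/2 distinct committee members signed digest."""
    signers = {pid for pid, sig in signatures.items()
               if pid in committee and verify(pid, digest, sig)}
    return 2 * len(signers) >= len(committee)

# Toy stand-ins for signing and verification, NOT a real signature scheme.
def toy_sign(pid, digest):
    return hashlib.sha256(f"{pid}:{digest}".encode()).hexdigest()

def toy_verify(pid, digest, sig):
    return sig == toy_sign(pid, digest)

committee = {"p1", "p2", "p3", "p4"}            # hypothetical C_BA
d = block_digest(["tx2", "tx1"], "prev-hash", 7)
sigs = {p: toy_sign(p, d) for p in ["p1", "p2"]}
print(accept_block(sigs, committee, d, toy_verify))
```

Signatures from parties outside the committee, or on a different digest, do not count towards the threshold in this sketch.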

Theorem 2. Let Rep be a reputation system, $\epsilon$ and $\delta$ be small positive constants, $L$ denote our sampling algorithm (lottery) discussed in the previous section, used for choosing $C_{BA}$ according to Rep, and the $c_i$'s be as in Theorem 1. If for a static adversary $\mathcal{A}$, Rep is $\epsilon$-feasible, and all parties in $\hat{P}_\rho$ participate in slot $\rho$, then in protocol BlockAnnounce, every node in $P$ adds the same block to his local blockchain for slot $\rho$. Moreover, this block will include the union of all transactions known to parties in $C_{BC}$ at the beginning of round $\rho$.


Using BlockAnnounce it is straightforward to build a blockchain-based ledger: In each round parties collect all valid transactions they know and execute BlockAnnounce. For completeness, we provide a description of this simple reputation-based blockchain ledger protocol in [21]. Its security follows directly from the above theorem. We remark that although the above theorem is proven for static reputations and adversaries, its proof can be extended, using standard techniques, to dynamic reputation systems with epoch-resettable adversaries. This extension will also extend the corresponding PoR-based blockchain protocol to that setting.
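The Validate predicate used by BlockAnnounce, with the default rule of ignoring both of two contradicting transactions, can be sketched in Python as follows. The transaction model here (a pair of a transaction identifier and the identifier of the single coin it spends) is a deliberately simplified assumption made for the example and is not the paper's transaction format.

```python
from collections import Counter

def validate(history, pending):
    """Return the sub-sequence T' of pending such that history || T' is valid.

    A pending transaction is dropped if its coin was already spent in the
    history, or if another pending transaction spends the same coin
    (in which case both are ignored).
    """
    spent = {coin for _, coin in history}
    claims = Counter(coin for _, coin in pending)
    return [(tx, coin) for tx, coin in pending
            if coin not in spent and claims[coin] == 1]

history = [("t0", "c0")]
pending = [("t1", "c1"), ("t2", "c2"), ("t3", "c2"), ("t4", "c0")]
print(validate(history, pending))
```

Here `t2` and `t3` conflict with each other and `t4` conflicts with the history, so only `t1` survives.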

5 The PoR/PoS-Hybrid Blockchain

As discussed in the introduction, a purely PoR-based blockchain protocol can be insecure if the reputation system is inaccurate. To fortify our ledger against such a situation we do the following: In parallel to the PoR-based blockchain above, we run an independent (and potentially slower) Nakamoto-style ledger protocol. As discussed, we focus our description on a Proof-of-Stake-based blockchain (in short, PoS-blockchain), but our treatment can be applied to proof-of-work or even iterated-BFT ledger protocols [3]. As long as the majority of the stake is in honest hands, the backup PoS-blockchain will ensure that an inaccurate (or manipulated) reputation system does not result in an insecure ledger. More concretely, our PoR/PoS-hybrid design ensures the following: If the reputation system is accurate, i.e., it reflects the actual probabilities that the parties behave honestly, then our protocol will be secure and achieve the claimed efficiency. If the reputation is inaccurate and, as a result, a fork is created, but the honest-majority-of-stake assumption holds, then parties will be able to detect the fork, and agree on this fact, within a small number of rounds. We stress that our design uses the assumptions in a tiered manner: as long as the reputation system is accurate, the backup blockchain can neither yield false detection positives nor slow down the progress of the PoR-blockchain, even if the majority stake in the system has shifted to malicious parties (in which case the backup PoS-blockchain might fork). However, if both accuracy of reputation and honest majority of stake fail, the security of the system cannot be guaranteed, as with any construction based on assumptions.

Here is how our hybrid ledger works: In every round the reputation parties post the current slot's hash pointers to the backup PoS-blockchain. This way every party will be able to efficiently verify their local view by comparing their local hashes to the ones posted on the backup blockchain. If any honest party observes a discrepancy, then that party can complain by posting the block in its local PoR-chain which is inconsistent with the pointer seen on the backup PoS-blockchain, along with the supporting set of signatures. We refer to this complaint as an accusation and to the party that posts it as an accuser. If the accuser fails to present a valid view (i.e., a block with sufficient signatures from the corresponding slot committee), then the accusation is considered invalid and the dispute is resolved in favor of the reputation party that had initially posted the disputed hash pointer, hereafter referred to as the accused party. Otherwise, the accused party will need to respond by posting (as a transaction on the backup PoS-blockchain) his own view of the disputed block. If this party fails, then the dispute is considered resolved in favor of the accuser. Otherwise, if both the accuser and the accused party post support for their claims, every party will be able to observe this fact and detect that the PoR-chain forked. In either case, any such accusation will result in detecting a malicious party: either the accuser, or the accused, or some party that issued conflicting signatures on the PoR-blockchain. (The detected party's reputation will then be reduced to 0 and the party will be excluded from the protocol.)

The detailed specification of our PoR/PoS-hybrid protocol $\Pi^{BC}_{PoR/PoS}$ is provided in [21] along with the proof of the following theorem, where we say that the reputation system is accurate if it reflects, within a negligible error, the actual probabilities that the reputation parties are honest:

Theorem 3. Let $\epsilon$, $\delta$, $c$ and $L$ be as in Theorem 2. The following properties hold with overwhelming probability assuming that the reputation system is $f$-feasible for some constant $f \in (0, 1)$: If the PoR-system is accurate, then protocol $\Pi^{BC}_{PoR/PoS}$ satisfies the following properties except with negligible probability in the presence of a static Rep-adversary:
– Safety (with Finality): At the end of every block-slot, all reputation parties have the same view of the blockchain.
– Block Liveness: A new block is added to each local chain held by any reputation party at the end of each slot.
– Transaction Liveness: Assuming all honest reputation parties receive a transaction, this transaction will be added to the blockchain within $\ell$ slots except with probability $2^{-\ell |C_{BC}|}$.
Furthermore, even if the reputation system is faulty (e.g., inaccurate) but the majority of the stake is held by honest parties, then if safety is violated, $\Pi^{\mathrm{BC}}_{\mathrm{PoR/PoS}}$ will publicly detect it (i.e., agreement on this fact will be reached among all parties) within 2k blocks of the PoS-blockchain.

Note that since all honest (non-reputation) parties are connected with the reputation parties via a diffusion network, all the above guarantees are also offered to them, with a delay equal to the maximum delay for any message from a reputation party to reach a non-reputation party in the network.

Detecting Liveness Attacks. The above design detects safety violations, i.e., forks, but does not detect the following attack on liveness: a flawed reputation system might allow an attacker controlling a majority of slot committee members to prevent the blockchain from advancing, by not issuing a block signed with sufficiently many signatures. Nonetheless, a mild additional assumption on the accuracy of the reputation system, namely that within a polylogarithmic number of rounds an honest-majority committee will be elected, allows us to break this deadlock and detect such an attack. In a nutshell, to make sure the above attack


on liveness is exposed, we employ the following mechanism: In every round, if a reputation party does not receive any block with majority support from the current committee in the last round of BlockAnnounce, then it issues a complaint on the fallback chain. If for any upcoming round ρ there have been complaints posted on the main chain by a majority of the members of $C_{\mathrm{BA}}$ corresponding to round ρ, then the parties decide that the PoR-blockchain has halted. This ensures that, at the latest when the next honest-majority committee is elected, the above liveness attack will be exposed. We refer to [21] for a more detailed discussion.
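The accusation and complaint rules described above can be summarized as a small decision procedure. The following Python sketch is only an illustration under stated assumptions: the names `Outcome`, `has_valid_view`, `resolve_dispute` and `por_chain_halted` are hypothetical, and "sufficient signatures" is modelled here as a strict majority of the slot committee, which the text above does not spell out; the actual rules are specified in [21].

```python
from enum import Enum


class Outcome(Enum):
    ACCUSER_DETECTED = "invalid accusation: resolved in favor of the accused"
    ACCUSED_DETECTED = "no valid response: resolved in favor of the accuser"
    FORK_DETECTED = "both views are supported: the PoR-chain forked"


def has_valid_view(num_signatures, committee_size):
    # Assumption: a view is valid when a strict majority of the slot
    # committee signed the block (the threshold is not fixed in the text).
    return 2 * num_signatures > committee_size


def resolve_dispute(accuser_sigs, accused_sigs, committee_size):
    """Resolve an accusation; accused_sigs is None if the accused never responds."""
    if not has_valid_view(accuser_sigs, committee_size):
        return Outcome.ACCUSER_DETECTED
    if accused_sigs is None or not has_valid_view(accused_sigs, committee_size):
        return Outcome.ACCUSED_DETECTED
    # Both sides posted supported views: some parties signed conflicting blocks.
    return Outcome.FORK_DETECTED


def por_chain_halted(complainers, committee):
    """True if a majority of the committee members posted a complaint."""
    return 2 * len(set(complainers) & set(committee)) > len(committee)
```

In this sketch every branch of `resolve_dispute` identifies a misbehaving side, mirroring the observation that any accusation leads to the detection of a malicious party.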

6 Conclusions and Open Problems

Reputation has the potential to yield a highly scalable decentralized ledger. However, one needs to be careful in using it, as it is a manipulable and subjective resource. We put forth and proved the security of a hybrid design which enjoys efficiency and scalability benefits by using reputation, while fortifying its security with a fallback blockchain relying on standard assumptions such as honest majority of stake. Central in our treatment is a new (reputation-)fairness notion which aims to facilitate inclusivity of the resulting system. Our work establishes the basic security principles and the core distributed protocol of such a fair PoR/PoS-hybrid blockchain ledger. We believe that our treatment will seed further investigation of reputation as an avenue for scalable decentralization. In this direction, in addition to the various extensions pointed to throughout the paper and discussed in [21], there are two important research directions: (1) a rational analysis and associated mechanism that add economic robustness to the arguments and demonstrate the decentralization forces, and (2) a reliable mechanism for assigning reputation to the parties, e.g., using AI, and adjusting it according to their behavior both in the protocol and potentially on external recommendation systems.

Acknowledgements. This research was supported by Sunday Group, Inc. A full version of this work can be found on the Cryptology ePrint Archive [21]. The authors would like to thank Yehuda Afek for useful discussions.

References

1. Asharov, G., Lindell, Y., Zarosim, H.: Fair and efficient secure multiparty computation with reputation systems. In: Sako, K., Sarkar, P. (eds.) ASIACRYPT 2013. LNCS, vol. 8270, pp. 201–220. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-42045-0_11
2. Badertscher, C., Gazi, P., Kiayias, A., Russell, A., Zikas, V.: Ouroboros genesis: composable proof-of-stake blockchains with dynamic availability. In: Lie, D., Mannan, M., Backes, M., Wang, X. (eds.) ACM CCS 2018, pp. 913–930. ACM Press (2018)
3. Badertscher, C., Gazi, P., Kiayias, A., Russell, A., Zikas, V.: Consensus redux: distributed ledgers in the face of adversarial supremacy. Cryptology ePrint Archive, Report 2020/1021 (2020). https://eprint.iacr.org/2020/1021
4. Badertscher, C., Maurer, U., Tschudi, D., Zikas, V.: Bitcoin as a transaction ledger: a composable treatment. In: Katz, J., Shacham, H. (eds.) CRYPTO 2017. LNCS, vol. 10401, pp. 324–356. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63688-7_11
5. Bentov, I., Hubáček, P., Moran, T., Nadler, A.: Tortoise and hares consensus: the meshcash framework for incentive-compatible, scalable cryptocurrencies. IACR Cryptology ePrint Archive 2017/300 (2017)
6. Biryukov, A., Feher, D., Khovratovich, D.: Guru: universal reputation module for distributed consensus protocols. Cryptology ePrint Archive, Report 2017/671 (2017). http://eprint.iacr.org/2017/671
7. Buterin, V.: A next-generation smart contract and decentralized application platform (2013). https://github.com/ethereum/wiki/wiki/White-Paper
8. Canetti, R.: Security and composition of multiparty cryptographic protocols. J. Cryptol. 13(1), 143–202 (2000)
9. Chow, S.S.M.: Running on karma – P2P reputation and currency systems. In: Bao, F., Ling, S., Okamoto, T., Wang, H., Xing, C. (eds.) CANS 2007. LNCS, vol. 4856, pp. 146–158. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-76969-9_10
10. Cohen, R., Coretti, S., Garay, J., Zikas, V.: Probabilistic termination and composability of cryptographic protocols. In: Robshaw, M., Katz, J. (eds.) CRYPTO 2016. LNCS, vol. 9816, pp. 240–269. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-53015-3_9
11. Cohen, R., Coretti, S., Garay, J.A., Zikas, V.: Round-preserving parallel composition of probabilistic-termination cryptographic protocols. In: Chatzigiannakis, I., Indyk, P., Kuhn, F., Muscholl, A. (eds.) ICALP 2017, LIPIcs, vol. 80, pp. 37:1–37:15. Schloss Dagstuhl (2017)
12. David, B., Gaži, P., Kiayias, A., Russell, A.: Ouroboros Praos: an adaptively-secure, semi-synchronous proof-of-stake blockchain. In: Nielsen, J.B., Rijmen, V. (eds.) EUROCRYPT 2018. LNCS, vol. 10821, pp. 66–98. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-78375-8_3
13. Dolev, D., Strong, H.R.: Authenticated algorithms for byzantine agreement. SIAM J. Comput. 12(4), 656–666 (1983)
14. Gai, F., Wang, B., Deng, W., Peng, W.: A reputation-based consensus protocol for peer-to-peer network. In: DASFAA, Proof of Reputation (2018)
15. Garay, J., Ishai, Y., Ostrovsky, R., Zikas, V.: The price of low communication in secure multi-party computation. In: Katz, J., Shacham, H. (eds.) CRYPTO 2017. LNCS, vol. 10401, pp. 420–446. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63688-7_14
16. Garay, J., Kiayias, A., Leonardos, N.: The bitcoin backbone protocol: analysis and applications. In: Oswald, E., Fischlin, M. (eds.) EUROCRYPT 2015. LNCS, vol. 9057, pp. 281–310. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46803-6_10
17. Gilad, Y., Hemo, R., Micali, S., Vlachos, G., Zeldovich, N.: Algorand: scaling byzantine agreements for cryptocurrencies. Cryptology ePrint Archive, Report 2017/454 (2017). http://eprint.iacr.org/2017/454
18. Goldwasser, S., Micali, S., Rivest, R.L.: A digital signature scheme secure against adaptive chosen-message attacks. SIAM J. Comput. 17(2), 281–308 (1988)
19. Katz, J., Koo, C.-Y.: On expected constant-round protocols for byzantine agreement. In: Dwork, C. (ed.) CRYPTO 2006. LNCS, vol. 4117, pp. 445–462. Springer, Heidelberg (2006). https://doi.org/10.1007/11818175_27
20. Kiayias, A., Russell, A., David, B., Oliynykov, R.: Ouroboros: a provably secure proof-of-stake blockchain protocol. In: Katz, J., Shacham, H. (eds.) CRYPTO 2017. LNCS, vol. 10401, pp. 357–388. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63688-7_12
21. Kleinrock, L., Ostrovsky, R., Zikas, V.: A por/pos-hybrid blockchain: proof of reputation with nakamoto fallback. Cryptology ePrint Archive, Report 2020/381 (2020). https://eprint.iacr.org/2020/381
22. Magri, B., Matt, C., Nielsen, J.B., Tschudi, D.: Afgjort - a semi-synchronous finality layer for blockchains. IACR Cryptology ePrint Archive, 2019/504 (2019)
23. Miller, A., Xia, Y., Croman, K., Shi, E., Song, D.: The honey badger of BFT protocols. In: Weippl, E.R., Katzenbeisser, S., Kruegel, C., Myers, A.C., Halevi, S. (eds.) ACM CCS 2016, pp. 31–42. ACM Press (2016)
24. Nakamoto, S.: Bitcoin: a peer-to-peer electronic cash system (2008). http://bitcoin.org/bitcoin.pdf
25. Ostrovsky, R., Yung, M.: How to withstand mobile virus attacks (extended abstract). In: Logrippo, L. (ed.) Proceedings of the 10th ACM PODC, pp. 51–59. ACM (1991)
26. Pass, R., Seeman, L., Shelat, A.: Analysis of the blockchain protocol in asynchronous networks. In: Coron, J.-S., Nielsen, J.B. (eds.) EUROCRYPT 2017. LNCS, vol. 10211, pp. 643–673. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-56614-6_22
27. Pass, R., Shi, E.: Thunderella: blockchains with optimistic instant confirmation. In: Nielsen, J.B., Rijmen, V. (eds.) EUROCRYPT 2018. LNCS, vol. 10821, pp. 3–33. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-78375-8_1
28. Yu, J., Kozhaya, D., Decouchant, J., Esteves-Verissimo, P.: RepuCoin: your reputation is your power. IEEE Trans. Comput. 68(8), 1225–1237 (2019)

Transciphering, Using FiLIP and TFHE for an Efficient Delegation of Computation

Clément Hoffmann1(B), Pierrick Méaux1(B), and Thomas Ricosset2(B)

1 ICTEAM/ELEN/Crypto Group, Université catholique de Louvain, Louvain-la-Neuve, Belgium
{clement.hoffmann,Pierrick.meaux}@uclouvain.be
2 Thales, Gennevilliers, France
[email protected]

Abstract. Improved filter permutators are designed to build stream ciphers that can be efficiently evaluated homomorphically. So far the transciphering with such ciphers has been implemented with homomorphic schemes from the second generation. In theory the third generation is better suited to the particular design of these ciphers. In this article we study how suitable it is in practice. We implement the transciphering of different instances of the stream cipher family FiLIP with homomorphic encryption schemes of the third generation using the TFHE library. We focus on two kinds of filter for FiLIP. First we consider the direct sum of monomials, already evaluated using HElib, and we show the improvements on these results. Then we focus on the XOR-threshold filter: we develop strategies to efficiently evaluate any symmetric Boolean function in a homomorphic way, allowing us to give the first timings for such filters. We investigate different approaches for the homomorphic evaluation: using the leveled homomorphic scheme TGSW, a hybrid approach combining the TGSW and TLWE schemes, and the gate-bootstrapping approach. We discuss the costs in time and memory and the impact on delegation of computation of these different approaches, and we perform a comparison with other transciphering schemes.

Keywords: Homomorphic encryption · TFHE · Improved filter permutator · Transciphering

C. Hoffmann: This work has been done during an internship at Thales.
© Springer Nature Switzerland AG 2020. K. Bhargavan et al. (Eds.): INDOCRYPT 2020, LNCS 12578, pp. 39–61, 2020. https://doi.org/10.1007/978-3-030-65277-7_3

1 Introduction

Fully homomorphic encryption (FHE) makes it possible to perform computations over encrypted data without decrypting it or learning information about the data. Since the first construction due to Gentry [21] in 2009, FHE has been considered the main solution for building a secure delegation of computation. The principle is the following: the delegating party, say Alice, encrypts her data using an FHE scheme and sends it to the computing party, say Bob. He evaluates the functions asked by Alice on her encrypted data, and sends back the encrypted results, without learning the values of the data sent by Alice. Two main efficiency problems arise with this framework: the FHE ciphertexts are costly to compute for Alice, and the expansion factor between the plaintext size and the ciphertext size is prohibitive. Instead, an efficient framework to delegate computations is obtained with


a hybrid scheme, combining symmetric encryption (SE) and homomorphic encryption [31]. In this framework, Alice uses a classic symmetric scheme to encrypt her data before sending it to the server. The advantages of SE schemes are that they can be computed by devices with limited computational power or storage, and that their expansion factor is optimal: the ciphertext size is exactly the data size. Then, Bob transciphers these SE ciphertexts: he homomorphically evaluates the SE decryption to get homomorphic ciphertexts of Alice's data. Finally, Bob performs the computations required by Alice on the encrypted data and sends back the encrypted results, as in the initial framework. With such a hybrid framework, an efficient transciphering gives an efficient delegation of computation.

Fully homomorphic encryption allows any function to be evaluated, so the efficiency of this evaluation depends on how the function can be expressed in the native operations of the FHE scheme. Following Gentry's blueprint, FHE schemes are based on a somewhat homomorphic encryption scheme and a bootstrapping. The somewhat (or leveled) scheme allows a limited number of two different operations to be performed, for example XOR and AND. In a homomorphic ciphertext the message is masked by a quantity of noise, and this noise increases during the operations, by an amount that depends on the nature of the operation. The bootstrapping is the key technique that resets a ciphertext so that it can be used again in the somewhat homomorphic encryption scheme. The different costs in time (and storage) between the homomorphic operations and the bootstrapping depend on the FHE scheme; usually¹ there is the following hierarchy of costs: one operation is more efficient than the other, and the bootstrapping is far more costly than the operations. Therefore, an efficient transciphering is obtained with an SE scheme whose decryption function fits the FHE cost hierarchy.

State-of-the-Art. The so-called second generation (2G) of FHE, which is represented by the schemes [5, 6, 20, 26], has been widely used to implement transcipherings, using open-source libraries such as HElib [24] and SEAL [10]. In these schemes the multiplication increases the noise far more than the addition, and the bootstrapping is very costly. Therefore, when evaluating a function as a circuit, the multiplicative depth dictates the efficiency. The SE schemes considered for transciphering have been standard schemes with low multiplicative depth such as AES [15, 22], Simon [25], Prince [17] and Trivium [8]. More recent SE schemes are designed for advanced primitives such as multi-party computation, zero-knowledge and FHE; they share the property of having a low multiplicative depth, and they give the most efficient transcipherings so far. This is the case of the block cipher LowMC [3, 4], and of the stream ciphers Kreyvium [8], FLIP [30], Rasta [16], and FiLIP [29].

The third generation (3G), beginning with the GSW scheme [23], has a very different cost hierarchy. The multiplication is still more costly than the addition, but the error growth is asymmetric in the two operands, so long chains of multiplications can be evaluated without bootstrapping. The multiplicative depth does not dictate the efficiency in this case, and other SE schemes could give a better transciphering. This generation also allows gate-bootstrapping [11, 18], which evaluates a Boolean gate and performs the bootstrapping at the same time, in as little as 13 ms [12] with the TFHE library [14]. The 3G is promising for transciphering but no such transciphering has been realized so far. The main reasons were the crippling sizes of the original ciphertexts in comparison with the 2G, and the difficulty of adapting an SE scheme to this particular cost hierarchy.

¹ The situation is different for FHE schemes that apply a bootstrapping at each gate.


In this work we realize a transciphering with an FHE scheme of the third generation. The SE scheme we consider is FiLIP [29], a stream cipher designed for homomorphic evaluation. The principle of this stream cipher is to restrict the homomorphic computation to the evaluation of its filter: only one Boolean function f. From the security point of view, the filter needs to fulfill various cryptographic criteria on Boolean functions to resist known attacks up to a fixed level of security. If this filter can be evaluated in a way that fits the homomorphic cost hierarchy, then the whole transciphering is efficient. We implement the transciphering using the TFHE library [14], which offers different HE schemes of the third generation. We focus primarily on a version of the TGSW scheme, a variant of GSW [23] over the torus.

Contributions. We analyse the homomorphic evaluation of FiLIP with the TGSW scheme, and we implement the transcipherings with two families of filters, using three homomorphic schemes of the third generation. We study the homomorphic error growth of FiLIP with TGSW for two kinds of filters: direct sums of monomials (DSM) [29] and the XOR-threshold functions suggested in [28]. For the DSM filter the bound on the error generalizes the bound of [30] on FLIP functions with GSW. To analyze the error growth of the second filter we show how to efficiently evaluate any symmetric Boolean function in 3G, and more particularly threshold functions. Then we bound the ciphertext error for XOR-threshold filters, confirming that a function with high multiplicative depth can be efficiently evaluated. We implement the two different kinds of filters for instances designed for 128-bit security with TGSW. We analyse the noise in practice and the timings of this transciphering, which gives a latency of less than 2 s for the whole transciphering. We give a comparison with transcipherings from former works using the second homomorphic generation (on HElib for instance). For an equivalent resulting noise and security level, the latency of our transciphering is better than that of the existing ones. Finally, we implement the same variants of FiLIP with a hybrid TGSW/TLWE scheme and with the gate-bootstrapping FHE of [12], reaching a latency of 1.0 s for an only-additive homomorphic scheme. We provide comparisons between the three evaluations of FiLIP we implemented and the evaluation over HElib in [29].

Roadmap. In Sect. 2, we recall definitions and properties of the TFHE scheme and FiLIP, and describe the TGSW scheme we will use. In Sect. 3, we study the homomorphic evaluation of FiLIP filters and give a theoretical bound on the noise after the transciphering. Finally, we present our practical results (resulting noises and timings) in Sect. 4 and compare our implementations to the existing ones.

2 Preliminaries

We use the following notations:
– $\mathbb{B}$ denotes the set {0, 1}, and [n] the set of integers from 1 to n.
– $x \xleftarrow{\$} S$ means that x is chosen uniformly at random in a set S.
– $\mathbb{T}$ denotes the real torus $\mathbb{R}/\mathbb{Z}$, i.e. the real numbers modulo 1.
– $\mathbb{T}_N[X]$ denotes $\mathbb{R}[X]/(X^N + 1) \pmod 1$, $\mathcal{R}$ denotes the ring of polynomials $\mathbb{Z}[X]/(X^N + 1)$ and $\mathbb{B}_N[X]$ denotes $\mathbb{B}[X]/(X^N + 1)$.


– Vectors and matrices are denoted in bold (e.g. v, A). $\mathcal{M}_{m,n}(S)$ refers to the space of m × n matrices with coefficients in S.
– $w_H$ denotes the Hamming weight of a binary vector.
– MUX refers to the multiplexer: for binary variables, $\mathrm{MUX}(x_1, x_2, x_3)$ gives $x_2$ if $x_1 = 0$ and $x_3$ if $x_1 = 1$. We call $x_1$ the control bit, $x_2$ the value at 0 and $x_3$ the value at 1.

2.1 Homomorphic Encryption and TFHE

In this section, we start by introducing definitions and properties from [11] on homomorphic encryption schemes and operations implemented in the TFHE library [14]. We then describe the leveled homomorphic encryption scheme we will use for transciphering, based on the TFHE definitions.

TFHE Toolbox. The TFHE library [14] implements a gate-by-gate bootstrapping based on [11–13]. Different homomorphic encryption schemes are combined for this bootstrapping: LWE, TLWE and TGSW. We present only the definitions and properties needed for the evaluation of the TGSW leveled encryption scheme we will use in this work, and refer to [11] for more details.

Definition 1 (LWE samples). Let $n \in \mathbb{N}$ be a dimension parameter, $\alpha \in \mathbb{R}^+$ a noise parameter, and $s$ a uniformly distributed secret in some bounded set $S \subset \mathbb{Z}^n$. An LWE sample under the key $s$ with noise parameter $\alpha$ is a couple $(a, b) \in \mathbb{T}^n \times \mathbb{T}$ where $b - \langle s, a \rangle$ follows a Gaussian distribution of standard deviation $\alpha$.

Definition 2 (TLWE samples). Let $k \geq 1$, $N$ a power of 2, $\alpha$ a noise parameter, and $s \xleftarrow{\$} \mathbb{B}_N[X]^k$ a TLWE secret key. A fresh TLWE sample of a message $\mu \in \mathbb{T}_N[X]$ with noise parameter $\alpha$ under the key $s$ is a couple $(a, b) \in \mathbb{T}_N[X]^k \times \mathbb{T}_N[X]$, where $a$ is uniformly chosen in $\mathbb{T}_N[X]^k$ and $b - \langle s, a \rangle$ follows a Gaussian distribution of standard deviation $\alpha$ centered in $\mu$.

The scheme introduced in [11] gives a gate-bootstrapping of LWE ciphers. Instead, we focus on the homomorphic properties of TLWE: TLWE samples can be used to encrypt messages $\mu \in P \subset \mathbb{T}_N[X]$ as $c = (a, b) \in \mathbb{T}_N[X]^k \times \mathbb{T}_N[X]$, where $b = \langle a, s \rangle + \mu + e \in \mathbb{T}_N[X]$. This variant of Regev's secret key encryption scheme is additively homomorphic as long as each coefficient of $e$ is smaller than half the minimal distance between the coefficients of two possible messages. The introduction of TGSW ciphers with a decomposition of TLWE ciphers gives us a multiplicative homomorphic scheme.

Definition 3 (Gadget decomposition). For $B_g \in \mathbb{N}$, let us define the gadget matrix $H \in \mathcal{M}_{(k+1)\ell,\,k+1}(\mathbb{T}_N[X])$ as:

$$H = \begin{pmatrix} 1/B_g & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 1/B_g^{\ell} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1/B_g \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1/B_g^{\ell} \end{pmatrix}$$

$\mathrm{Decomp}_{H,\beta,\epsilon}(v)$ is a decomposition algorithm on the gadget $H$ with quality $\beta$ and precision $\epsilon$ if and only if, for any TLWE sample $v \in \mathbb{T}_N[X]^{k+1}$, it efficiently and publicly outputs a small vector $u \in \mathcal{R}^{(k+1)\ell}$ such that $\|u\|_\infty \leq \beta$ and $\|u \times H - v\|_\infty \leq \epsilon$. Furthermore, the expectation of $u \times H - v$ must be 0 when $v$ is uniformly distributed in $\mathbb{T}_N[X]^{k+1}$.

Such a decomposition with $\beta = B_g/2$ and $\epsilon = \frac{1}{2B_g^{\ell}}$ exists, and an example is described in [11]. It allows us to define TGSW samples:

Definition 4 (TGSW samples). Let $\ell$ and $k \geq 1$ be two integers, $\alpha \geq 0$ a noise parameter and $H$ the gadget matrix. Let $s \in \mathbb{B}_N[X]^k$ be a TLWE key, then $C \in \mathcal{M}_{(k+1)\ell,\,k+1}(\mathbb{T}_N[X])$ is a fresh TGSW sample of $\mu \in \mathcal{R}$ such that $\mu \cdot H = 0$ with noise parameter $\alpha$ if and only if $C = Z + \mu \cdot H$, where each row of $Z \in \mathcal{M}_{(k+1)\ell,\,k+1}(\mathbb{T}_N[X])$ is a TLWE cipher of 0 with noise parameter $\alpha$.

Note that the product between $\mu$ and $H$ is the $\mathcal{R}$-module product, which means that each coefficient of $H$ is multiplied by $\mu$. TGSW ciphers remain homomorphically additive, and we can now introduce the homomorphic multiplications:

Definition 5 (External and internal products). Let us define the external product $\boxdot$ and the internal product $\boxtimes$ as:

$$\boxdot : \mathrm{TGSW} \times \mathrm{TLWE} \to \mathrm{TLWE}, \qquad (A, b) \mapsto A \boxdot b = \mathrm{Decomp}_{H,\beta,\epsilon}(b) \cdot A$$

$$\boxtimes : \mathrm{TGSW} \times \mathrm{TGSW} \to \mathrm{TGSW}, \qquad (A, B) \mapsto A \boxtimes B = \begin{pmatrix} A \boxdot b_1 \\ \vdots \\ A \boxdot b_{(k+1)\ell} \end{pmatrix}$$

where $\forall i \in \{1, \ldots, (k+1)\ell\}$, $b_i$ corresponds to the i-th line of $B$.

A TGSW Somewhat Homomorphic Scheme. We describe a version of TGSW as a somewhat homomorphic encryption scheme, allowing a bounded number of additions and multiplications to be performed. We consider a secret key scheme with plaintext space P = {0, 1}.
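To make the gadget decomposition of Definition 3 more concrete, the following Python sketch decomposes a single torus element (a scalar, rather than a polynomial vector) into $\ell$ balanced digits in base $B_g$. It is only an illustration: the function names `gadget_decompose` and `recompose` are hypothetical, $B_g$ is assumed to be even, and exact rationals stand in for torus elements; it is not the routine used by the TFHE library.

```python
from fractions import Fraction


def gadget_decompose(v, Bg, ell):
    """Return digits u_1..u_ell in [-Bg/2, Bg/2) with sum_j u_j / Bg**j close to v mod 1.

    Assumes Bg is even; v is a Fraction representing a torus element.
    """
    V = round(v * Bg**ell) % Bg**ell      # nearest multiple of 1/Bg**ell, as an integer
    digits = []
    for _ in range(ell):                  # least significant digit first
        d = V % Bg
        if d >= Bg // 2:                  # recentre the digit into [-Bg/2, Bg/2)
            d -= Bg
        digits.append(d)
        V = (V - d) // Bg
    # The remaining V is an integer carry, which vanishes modulo 1.
    digits.reverse()                      # digits[0] is the coefficient of 1/Bg
    return digits


def recompose(digits, Bg):
    """Compute sum_j u_j / Bg**j, i.e. the product of the digit vector with one gadget column."""
    return sum(Fraction(d, Bg**j) for j, d in enumerate(digits, start=1))
```

For instance, with $B_g = 4$ and $\ell = 3$, the digits of 37/100 recompose to a value within $1/(2B_g^{\ell}) = 1/128$ of 37/100 on the torus, and every digit has absolute value at most $B_g/2 = 2$, matching the quality and precision stated above.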


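Definition 6 (TGSW leveled homomorphic encryption scheme). Let $k, N \in \mathbb{N}^*$, $N$ a power of 2, be the dimension parameters. Let $\ell, B_g \in \mathbb{N}$ be the decomposition parameters. Let $\alpha \in \mathbb{R}^+$ be the noise parameter.
– KeyGen(k, N). From the dimension parameters $k, N$, output $sk \xleftarrow{\$} \mathbb{B}_N[X]^k$.
– Enc(sk, μ, ℓ, Bg, α). Using as inputs $sk \in \mathbb{B}_N[X]^k$, $\mu \in \{0, 1\}$, the decomposition parameters $\ell, B_g$, and the noise parameter $\alpha$:
  • Pick $A \xleftarrow{\$} \mathcal{M}_{(k+1)\ell,\,k}(\mathbb{T}_N[X])$.
  • Compute $e = (e_i)_{i \in [(k+1)\ell]} \in \mathbb{T}_N[X]^{(k+1)\ell}$ where each $e_i$ follows a centered Gaussian distribution of standard deviation $\alpha$.
  • Compute $Z = \left( A \;\middle|\; \begin{matrix} \langle a_1, sk \rangle + e_1 \\ \vdots \\ \langle a_{(k+1)\ell}, sk \rangle + e_{(k+1)\ell} \end{matrix} \right) \in \mathcal{M}_{(k+1)\ell,\,k+1}(\mathbb{T}_N[X])$, where $a_i$ denotes the i-th row of $A$.
  • Output $C = Z + \mu \cdot H$.
– Dec(sk, C, ℓ, Bg). Using as inputs the secret key $sk$ and a ciphertext $C$:
  • Denote by $(a, b) \in \mathbb{T}_N[X]^k \times \mathbb{T}_N[X]$ the $(k\ell + 1)$-th line of $C$, and compute $\varphi = b - \langle sk, a \rangle \in \mathbb{T}_N[X]$.
  • Round the constant coefficient of $\varphi$ to the closest $\frac{i}{B_g} \in \mathbb{T}$ where $i \in \mathbb{N}$, denoted $\frac{i_m}{B_g}$.
  • Output $m \in \{0, 1\}$, the parity of $i_m$.
– The Eval algorithm consists in iterating the homomorphic operations Add and Mul.
  • Add($C_1$, $C_2$): $C_+ = C_1 + C_2$.
  • Mul($C_1$, $C_2$, $B_g$, $\ell$): $C_\times = C_1 \boxtimes C_2$.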
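The rounding step of Dec can be illustrated on its own. The Python sketch below is a simplified model, not the scheme itself: it works on a scalar phase (as if only the constant coefficient were kept), represents torus elements by exact rationals, and assumes $B_g$ is even so that the parity of $i_m$ is well defined modulo $B_g$. The helper names `noisy_phase` and `decode_phase` are hypothetical.

```python
from fractions import Fraction


def noisy_phase(bits, errors, Bg):
    """Phase obtained after adding ciphertexts of the given bits: sum(bit / Bg) + sum(errors)."""
    return sum(Fraction(b, Bg) for b in bits) + sum(errors)


def decode_phase(phi, Bg):
    """Round phi (taken modulo 1) to the closest i/Bg and return the parity of i.

    Assumes Bg is even, so that reducing i modulo Bg preserves its parity.
    """
    i_m = round(phi * Bg) % Bg
    return i_m % 2
```

With $B_g = 4$, the phase of a sum of ciphertexts of 1, 1 and 1 with total error 1/20 decodes to 1 (the XOR of the three bits), whereas an error larger than $1/(2B_g) = 1/8$ can flip the result.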

With this scheme, a TGSW ciphertext remains valid as long as the error terms are lower than $\frac{1}{2B_g}$. To follow the noise evolution during the homomorphic computations we use a worst-case bound on the error coefficients (infinity norm), or an average bound on the variance of these coefficients, using the independence assumption formalized in [11]. To relate the error norm and the variance we use Markov's inequality applied to subgaussians as in [11]; it allows us to estimate the maximal variance that can be tolerated for a fixed decryption failure probability.

Assumption 1 (Independence Heuristic ([11] Assumption 3.6, [13] Assumption 3.11)). All the coefficients of the error of TLWE or TGSW samples that occur in all the operations we consider are independent and concentrated. More precisely, they are σ-subgaussian where σ is the square root of their variance.

Proposition 1 (Markov's inequality on subgaussians [11]). Let $X$ be a σ-subgaussian, then $\forall t > 0$, $P(|X| > t) \leq 2e^{-\frac{t^2}{2\sigma^2}}$.

We summarize the noise evolution during the homomorphic evaluation proven in [11]. The equations are simplified since $\mu \in \{0, 1\}$: because the plaintexts are binary, they appear directly in the noise formulas.
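Before turning to the noise formulas, Proposition 1 can be made concrete by solving the bound for σ. The Python sketch below is an illustration only: the function names are hypothetical, and the values $B_g = 1024$ and a target failure probability of $2^{-40}$ per coefficient are arbitrary examples, not parameters taken from this work.

```python
import math


def failure_bound(t, sigma):
    """Upper bound 2*exp(-t^2 / (2*sigma^2)) on P(|X| > t) for a sigma-subgaussian X."""
    return 2.0 * math.exp(-(t * t) / (2.0 * sigma * sigma))


def max_sigma(t, p_fail):
    """Largest sigma for which the bound of Proposition 1 is at most p_fail."""
    return t / math.sqrt(2.0 * math.log(2.0 / p_fail))


# Illustrative values (assumptions): decryption threshold t = 1/(2*Bg) with Bg = 1024.
t = 1.0 / (2 * 1024)
sigma = max_sigma(t, 2.0 ** -40)
```

The tolerated variance is then `sigma ** 2`; a smaller target failure probability, or a larger $B_g$, shrinks it.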


Proposition 2 (TGSW noise evolution, adapted from [11]). Using the notations of Definition 6, for $i \in [3]$ let $C_i$ be a TGSW cipher of $\mu_i$ with error variance $V_i$ and error infinity norm $\varepsilon_i$. We denote by $V_+$ and $\varepsilon_+$ the error variance and infinity norm of $C_1 + C_2$, by $V_\times$ and $\varepsilon_\times$ the error variance and infinity norm of $C_1 \boxtimes C_2$, and by $V_M$ and $\varepsilon_M$ the error variance and infinity norm of $\mathrm{MUX}(C_1, C_2, C_3) = C_1 \boxtimes (C_3 - C_2) + C_2$. Then:

$$\begin{cases} \varepsilon_+ \leq \varepsilon_1 + \varepsilon_2, & V_+ \leq V_1 + V_2, \\ \varepsilon_\times \leq c_1 \varepsilon_1 + \mu_1 (c_2 + \varepsilon_2), & V_\times \leq c_3 V_1 + \mu_1 (c_4 + V_2), \\ \varepsilon_M \leq \max(\varepsilon_2, \varepsilon_3) + c_1 \varepsilon_1 + c_2, & V_M \leq \max(V_2, V_3) + c_3 V_1 + c_4, \end{cases}$$

where $c_1 = (k+1)\ell N (B_g/2)$, $c_2 = (1 + kN)/(2B_g^{\ell})$, $c_3 = (k+1)\ell N (B_g/2)^2$, $c_4 = (1 + kN)/(2B_g^{\ell})^2$. The variance bounds are obtained under Assumption 1.

2.2 Boolean Functions and FiLIP

We recall the definition of a Boolean function, their common representation, and some families of functions.

Definition 7 (Boolean function). A Boolean function $f$ with $n$ variables is a function from $\mathbb{F}_2^n$ to $\mathbb{F}_2$.

Definition 8 (Algebraic Normal Form (ANF)). We call Algebraic Normal Form of a Boolean function $f$ its n-variable polynomial representation over $\mathbb{F}_2$:

$$f(x) = \sum_{I \subseteq [n]} a_I \Big( \prod_{i \in I} x_i \Big) = \sum_{I \subseteq [n]} a_I x^I,$$

where $a_I \in \mathbb{F}_2$.
• The algebraic degree of $f$ equals the global degree $\max_{\{I \,|\, a_I = 1\}} |I|$ of its ANF.
• Any term $\prod_{i \in I} x_i$ in such an ANF is called a monomial and its degree equals $|I|$.

Definition 9 (Direct sum of monomials & direct sum vector [30]). Let $f$ be a Boolean function of $n$ variables. We call $f$ a Direct Sum of Monomials (or DSM) if the following holds for its ANF: $\forall (I, J)$ such that $a_I = a_J = 1$, $I \cap J \in \{\emptyset, I \cup J\}$. Let $f$ be a DSM; we define its direct sum vector $m_f = [m_1, m_2, \ldots, m_k]$ of length $k = \deg(f)$, where $m_i$ is the number of monomials of degree $i$ of $f$: for $i > 0$, $m_i = |\{a_I = 1 \text{ such that } |I| = i\}|$.

Note that DSM corresponds to functions where each variable appears at most once in the ANF.

Definition 10 (Threshold function). Let $n \in \mathbb{N}^*$. For any positive integer $d \leq n + 1$ we define the Boolean function $T_{d,n}$ as:

$$\forall x = (x_1, \ldots, x_n) \in \mathbb{F}_2^n, \quad T_{d,n}(x) = \begin{cases} 1 & \text{if } w_H(x) \geq d, \\ 0 & \text{otherwise.} \end{cases}$$
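Definitions 9 and 10 translate directly into code when the functions are evaluated in the clear. The following Python sketch is a minimal illustration; the names `dsm` and `threshold` are hypothetical, and the convention that monomials consume consecutive input variables in order of increasing degree is an assumption made here for concreteness (any assignment of distinct variables to monomials gives a DSM with the same direct sum vector).

```python
def dsm(direct_sum_vector, x):
    """Evaluate a direct sum of monomials with direct sum vector [m_1, ..., m_k] on bits x.

    Assumption: monomials use consecutive, disjoint variables, lowest degree first.
    """
    pos, out = 0, 0
    for degree, count in enumerate(direct_sum_vector, start=1):
        for _ in range(count):
            out ^= int(all(x[pos:pos + degree]))   # one monomial of this degree
            pos += degree
    if pos != len(x):
        raise ValueError("input length does not match the direct sum vector")
    return out


def threshold(d, x):
    """T_{d,n}(x): 1 if the Hamming weight of x is at least d, and 0 otherwise."""
    return int(sum(x) >= d)
```

For example, `dsm([1, 1], [1, 0, 1])` evaluates $x_1 + x_2 x_3$ and returns 1, and `threshold(2, [1, 0, 1])` returns 1.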


Definition 11 (XOR-THR function). For any positive integers $k$, $d$ and $n$ such that $d \leq n + 1$ we define $\mathrm{XTHR}_{k,d,n}$ for all $z = (x_1, \ldots, x_k, y_1, \ldots, y_n) \in \mathbb{F}_2^{k+n}$ as:

$$\mathrm{XTHR}_{k,d,n}(z) = x_1 + \cdots + x_k + T_{d,n}(y_1, \ldots, y_n) = \mathrm{XOR}_k(x) + T_{d,n}(y).$$

The symmetric encryption schemes we will evaluate are binary stream ciphers following the improved filter permutator construction [29], illustrated in Fig. 1. The encryption process is the following, denoting by $N$ the size of the key, $n$ the size of the input of the filtering function ($n \leq N$) and $f$ the n-variable filtering function:
– the forward secure PRNG outputs the subset, the permutation and the whitening at each clock cycle,
– the subset $S_i$ is chosen as a subset of $n$ elements over $N$,
– the permutation $P_i$ from $n$ to $n$ elements is chosen,
– the whitening $w_i \in \mathbb{F}_2^n$ is chosen,
– the key-stream bit is computed as $s_i = f(P_i(S_i(K)) + w_i)$.
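The steps listed above can be sketched in Python as follows. This is a toy model with loud caveats: Python's `random.Random` is used as a stand-in for the forward secure PRNG (it is neither forward secure nor cryptographically secure), the parameters in the usage note are far smaller than any real instance, and the function names are hypothetical.

```python
import random


def xthr(k, d, bits):
    """XTHR_{k,d,n}: XOR of the first k bits plus the threshold T_{d,n} of the remaining bits."""
    x, y = bits[:k], bits[k:]
    return (sum(x) + int(sum(y) >= d)) % 2


def keystream(key, n, f, length, seed):
    """Toy improved filter permutator: s_i = f(P_i(S_i(K)) + w_i)."""
    prng = random.Random(seed)            # stand-in PRNG, NOT forward secure
    out = []
    for _ in range(length):
        subset = sorted(prng.sample(range(len(key)), n))     # S_i: n key positions out of N
        perm = list(range(n))
        prng.shuffle(perm)                                   # P_i: wire-cross permutation
        whitening = [prng.getrandbits(1) for _ in range(n)]  # w_i
        filter_input = [key[subset[perm[j]]] ^ whitening[j] for j in range(n)]
        out.append(f(filter_input))
    return out


def xor_bits(a, b):
    """Bitwise XOR of two equal-length bit lists (encryption and decryption)."""
    return [u ^ v for u, v in zip(a, b)]
```

With a 64-bit toy key and `f = lambda bits: xthr(9, 4, bits)` on $n = 16$ inputs, a message is encrypted as `xor_bits(message, keystream(key, 16, f, len(message), seed))` and decrypted with the same call. Only the key is secret here: the subset, permutation and whitening are derived from public PRNG output.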

Fig. 1. Improved filter permutator constructions.

The stream cipher family FiLIP is an instantiation of the improved filter permutator paradigm where the PRNG is a variant of AES in counter mode, and the wire-cross permutations are generated by the Knuth shuffle. We will focus on 3 candidates proposed for 128-bit security:
– FiLIP-1216 [29], DSM filter, $m_f = [128, 64, 0, 80, 0, 0, 0, 80]$, $n = 1216$, $N = 2^{14}$,
– FiLIP-1280 [29], DSM filter, $m_f = [128, 64, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 64]$, $n = 1280$, $N = 2^{12}$,
– FiLIP-144 [28], XOR-THR filter, $f = \mathrm{XTHR}_{81,32,63}$, $n = 144$, $N = 2^{14}$.
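As a quick consistency check on the three candidates, the number of filter inputs follows from the definitions: a DSM with direct sum vector $[m_1, \ldots, m_k]$ uses $\sum_i i \cdot m_i$ variables, and $\mathrm{XTHR}_{k,d,n}$ uses $k + n$ variables. The short Python sketch below (with a hypothetical helper name) verifies this arithmetic for the parameters listed above.

```python
def dsm_num_variables(direct_sum_vector):
    """Number of variables of a DSM: each of the m_i monomials of degree i uses i fresh variables."""
    return sum(degree * count for degree, count in enumerate(direct_sum_vector, start=1))


filip_1216 = [128, 64, 0, 80, 0, 0, 0, 80]
filip_1280 = [128, 64] + [0] * 13 + [64]      # the last entry is the count of degree-16 monomials

n_1216 = dsm_num_variables(filip_1216)        # 128 + 2*64 + 4*80 + 8*80
n_1280 = dsm_num_variables(filip_1280)        # 128 + 2*64 + 16*64
n_144 = 81 + 63                               # XTHR_{81,32,63}: 81 XOR inputs, 63 threshold inputs
```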


3 Homomorphic Evaluation of FiLIP

During the transciphering with FiLIP only the filter evaluation has an impact on the ciphertexts' noise. Indeed, as explained in [29], the subset selection and wire-cross permutation are performed in clear. The whitening gives the indexes where a homomorphic NOT needs to be applied, which can be realized without increasing the noise. In this section we study how to evaluate FiLIP filters efficiently with TGSW. First we give a Boolean circuit to compute any symmetric Boolean function with few gates. This circuit improves the homomorphic evaluation for several FHE schemes, but its design is primarily based on optimizing GSW-like FHE evaluation. It will be the core of the evaluation of XOR-THR filters. Then we specify which representation, and therefore which circuit, we use for the two kinds of filters, giving a lower bound on the Boolean gates to homomorphically evaluate in each case. Finally we give an upper bound on the noise in the TGSW ciphertext obtained after evaluating the different filters.

3.1 Evaluation of Symmetric Boolean Functions

In this part we focus on efficiently evaluating threshold functions. Since they are symmetric Boolean functions (the result does not depend on the order of the $n$ inputs), they can be evaluated in different ways. The ANF of such functions is characterized in [27]; when $n \geq 2^{\log d}$ all the monomials of this degree appear in the ANF, which seems prohibitive for any evaluation. Nevertheless, evaluating branching programs [7] or finite automata [11] has been shown to be very promising with the third generation. Hence, we consider the evaluation of threshold functions, or other symmetric Boolean functions, with MUX gates (multiplexers) rather than based on the ANF representation.

Definition 12 (Circuit for symmetric functions). Let $n \in \mathbb{N}$, $n \geq 2$. We define the Boolean circuit $C_n$ with $n$ inputs $x_1$ to $x_n$ and $n + 1$ outputs $y_0$ to $y_n$ with the following gates:
– $n$ NOT gates $N_1$ to $N_n$, where for $i \in [n]$ each $N_i$ has input $x_i$.
– $2(n - 1)$ AND gates $A_{i,0}$ and $A_{i,i}$ for $i \in [2, n]$, where:
  • the inputs of $A_{2,0}$ are $N_1$ and $N_2$, and for $i \in [3, n]$ the inputs of $A_{i,0}$ are $N_i$ and $A_{i-1,0}$,
  • the inputs of $A_{2,2}$ are $x_1$ and $x_2$, and for $i \in [3, n]$ the inputs of $A_{i,i}$ are $x_i$ and $A_{i-1,i-1}$.
– $\binom{n}{2}$ MUX gates $M_{i,j}$ where $i \in [2, n]$ and $j \in [i - 1]$, where:
  • the inputs of $M_{2,1}$ are $x_2$ for the control bit, $x_1$ for the value at 0, and $N_1$ for the value at 1,
  • the inputs of $M_{i,1}$ for $i \in [3, n]$ are $x_i$ for the control bit, $M_{i-1,1}$ for the value at 0, and $A_{i-1,0}$ for the value at 1,
  • the inputs of $M_{i,i-1}$ for $i \in [3, n]$ are $x_i$ for the control bit, $A_{i-1,i-1}$ for the value at 0, and $M_{i-1,i-2}$ for the value at 1,
  • the inputs of $M_{i,j}$ for $i \in [4, n]$ and $j \in [2, i - 2]$ are $x_i$ for the control bit, $M_{i-1,j}$ for the value at 0, and $M_{i-1,j-1}$ for the value at 1.
The outputs are given by $A_{n,0}, M_{n,1}, \ldots, M_{n,n-1}, A_{n,n}$.


To illustrate, we give the example of $C_7$ in Fig. 2. The principle of this multi-output circuit is to compute the Hamming weight of the vector of inputs, as formalized in the following proposition.

Proposition 3. Let $n \in \mathbb{N}$, $n \geq 2$. For all $x \in \mathbb{F}_2^n$, the vector of outputs of $C_n$ gives 1 at index $w_H(x)$ and 0 elsewhere.

Proof. We proceed by induction. We show that for all $i \in [2, n]$ only one of the gates $A_{i,j}$ or $M_{i,j}$ (where $j \in [0, i]$) gives 1. The index $j$ of this gate corresponds to the Hamming weight of the length-$i$ prefix of $x$: $(x_1, \ldots, x_i)$. Note that the statements always neglect the NOT gates.

For the initialization step we exhaust the four cases. The values of $(x_1, x_2; A_{2,0}, M_{2,1}, A_{2,2})$ for $(x_1, x_2) \in \mathbb{F}_2^2$ are: (0, 0; 1, 0, 0), (1, 0; 0, 1, 0), (0, 1; 0, 1, 0), and (1, 1; 0, 0, 1). In the four cases only one of the three gates of level 2 gives 1, the one with index $j = w_H(x_1, x_2)$, which validates the initialization.

Assume that for some $i \in [2, n - 1]$ only the gate of index $j = w_H(x_1, \ldots, x_i)$ outputs 1 and all the others at level $i$ give 0. Note that at level $i + 1$ the AND gate $A_{i+1,0}$ could be written as a MUX gate with control bit $x_{i+1}$, the output of $A_{i,0}$ for value at 0, and 0 for value at 1. Similarly, the AND gate $A_{i+1,i+1}$ could be written as a MUX gate with control bit $x_{i+1}$, the output of $A_{i,i}$ for value at 1, and 0 for value at 0. Consequently, the $i + 2$ gates at level $i + 1$ correspond to MUX gates: they all output their value at 0 (i.e. the value of their right parent) if $x_{i+1} = 0$, or their value at 1 (i.e. the value of their left parent) if $x_{i+1} = 1$. Therefore, the vector of outputs at level $i + 1$ is the one of level $i$ with a 0 appended on the right if $x_{i+1} = 0$, or on the left if $x_{i+1} = 1$. This guarantees that there is only one 1 at level $i + 1$, the index being $w_H(x_1, \ldots, x_i)$ if $x_{i+1} = 0$ and $w_H(x_1, \ldots, x_i) + 1$ if $x_{i+1} = 1$; in both cases it corresponds to $w_H(x_1, \ldots, x_{i+1})$. This proves the induction step and concludes the induction.
Since the property applies for $i = n$, the outputs $y_0, \ldots, y_n$ of $C_n$ for $x \in \mathbb{F}_2^n$ are such that $y_{w_H(x)} = 1$ and the others are equal to 0. □

The interest of the circuit $C_n$ is to compute symmetric Boolean functions with a low number of gates. Note that for $i \in [0, n]$ the output $y_i$ gives the result of the indicator function of the set $\{x \in \mathbb{F}_2^n \mid w_H(x) = i\}$. Since these functions form a basis of the symmetric $n$-variable Boolean functions, any symmetric $n$-variable Boolean function can be computed by XORing outputs of $C_n$. For the threshold function $T_{d,n}$ it would consist in XORing the outputs from $y_d$ to $y_n$. Nevertheless, computing $C_n$ and XORing some of the outputs involves unnecessary computations; therefore we prefer a simplified circuit for $T_{d,n}$ which has about half as many gates. To simplify the description we introduce a new Boolean gate, used instead of some MUX gates in $C_n$.

Definition 13 (Modified MUX gate). Let $x_1, x_2, x_3 \in \mathbb{F}_2$. The modified MUX gate WUX is defined as $\mathrm{WUX}(x_1, x_2, x_3) = x_1 x_3 + x_2$, where by analogy with the MUX $x_1$ is called the control bit, $x_2$ the value at 0, and $x_3$ the value at 1.

Note that the only difference with the classic gate is that $x_3$ is replaced by $x_3 + x_2$ in the MUX gate, that is $\mathrm{MUX}(x_1, x_2, x_3) = x_1(x_3 + x_2) + x_2$, and that the WUX gate is simply computed with one AND and one XOR. The simpler circuit to compute $T_{d,n}$ is obtained by two modifications. First, we delete the gates only


Fig. 2. Boolean circuit $C_7$. The control bit of each level of MUX is the input $x_i$ on the left side, the left arrow (→) in input of a MUX corresponds to the input for a control bit equal to 0 and the right arrow (→) for the value at 1.

used to compute the $y_i$ such that $i < d$ (this corresponds to deleting $A_{n-d+1,0}$ and all the gates depending on its value). Then, we substitute sums for the parts leading to the $y_i$ such that $i > d$, as soon as possible in the computation, using WUX gates (this corresponds to deleting $A_{d+1,d+1}$ and all the gates depending on its value, and converting the MUX gates depending on the value of $A_{d,d}$ into WUX gates). We formalize the obtained circuit in Definition 14, illustrate it with the example of $\mathcal{T}_{4,7}$ in Fig. 3, and establish its correctness in the following proposition. In the following, $\mathcal{T}_{d,n}$ denotes the circuit and $T_{d,n}$ the threshold function.

Definition 14 (Threshold circuit). Let $d, n \in \mathbb{N}$, $n \geq 4$, $2 \leq d \leq n - 2$. We define the Boolean circuit $\mathcal{T}_{d,n}$ with $n$ inputs $x_1$ to $x_n$ and one output with the following gates:
– $n - d$ NOT gates $N_1$ to $N_{n-d}$, where for $i \in [n - d]$ each $N_i$ has input $x_i$.
– $n - 2$ AND gates $A_{i,0}$ for $i \in [2, n - d]$ and $A_{i,i}$ for $i \in [2, d]$, where:
  • the inputs of $A_{2,0}$ are $N_1$ and $N_2$, and for $i \in [3, n - d]$ the inputs of $A_{i,0}$ are $N_i$ and $A_{i-1,0}$,
  • the inputs of $A_{2,2}$ are $x_1$ and $x_2$, and for $i \in [3, d]$ the inputs of $A_{i,i}$ are $x_i$ and $A_{i-1,i-1}$.
– $(n - d)(d - 1)$ MUX gates $M_{i,j}$ for $i \in [2, n - 1]$, $j \in [\max(1, i - (n - d)), \min(i - 1, d - 1)]$, where:
  • the inputs of $M_{2,1}$ are $x_2$ for the control bit, $x_1$ for the value at 0, and $N_1$ for the value at 1,
  • the inputs of $M_{i,1}$ for $i \in [3, n - d + 1]$ are $x_i$ for the control bit, $M_{i-1,1}$ for the value at 0, and $A_{i-1,0}$ for the value at 1,


  • the inputs of $M_{i,i-1}$ for $i \in [3, d]$ are $x_i$ for the control bit, $A_{i-1,i-1}$ for the value at 0, and $M_{i-1,i-2}$ for the value at 1,
  • the inputs of $M_{i,j}$ for $i \in [4, n - 1]$ and $j \in [\max(1, i - (n - d)), \min(i - 1, d - 1)]$ are $x_i$ for the control bit, $M_{i-1,j}$ for the value at 0, and $M_{i-1,j-1}$ for the value at 1.
– $(n - d)$ WUX gates $W_{i,d}$ for $i \in [d + 1, n]$, where:
  • the inputs of $W_{d+1,d}$ are $x_{d+1}$ for the control bit, $A_{d,d}$ for the value at 0, and $M_{d,d-1}$ for the value at 1,
  • the inputs of $W_{i,d}$ for $i \in [d + 2, n]$ are $x_i$ for the control bit, $W_{i-1,d}$ for the value at 0, and $M_{i-1,d-1}$ for the value at 1.
The output is given by $W_{n,d}$.
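As a sanity check of Definition 14, the threshold circuit can be simulated on clear bits and compared with the direct test $w_H(x) \geq d$. This is only a sketch under stated assumptions: it works on plaintext bits rather than ciphertexts, stores the gates of a level in a dictionary keyed by the index $j$ (with $W_{i,d}$ stored at key $d$), and the names `mux`, `wux`, `threshold_circuit` and `check_threshold` are hypothetical.

```python
from itertools import product


def mux(control, value_at_0, value_at_1):
    """Multiplexer on bits: value_at_1 if control is 1, else value_at_0."""
    return value_at_1 if control else value_at_0


def wux(control, value_at_0, value_at_1):
    """Modified MUX of Definition 13: control * value_at_1 + value_at_0 over F_2."""
    return (control & value_at_1) ^ value_at_0


def threshold_circuit(x, d):
    """Evaluate the threshold circuit of Definition 14 on a tuple of n bits."""
    n = len(x)
    if not (n >= 4 and 2 <= d <= n - 2):
        raise ValueError("Definition 14 requires n >= 4 and 2 <= d <= n - 2")
    xs = (None,) + tuple(x)  # 1-indexed inputs, as in the definition
    # Level 2: A_{2,0}, M_{2,1}, A_{2,2}.
    prev = {
        0: (1 - xs[1]) & (1 - xs[2]),
        1: mux(xs[2], xs[1], 1 - xs[1]),
        2: xs[1] & xs[2],
    }
    for i in range(3, n + 1):
        cur = {}
        if i <= n - d:                       # A_{i,0} exists for i in [2, n-d]
            cur[0] = (1 - xs[i]) & prev[0]
        low, high = max(1, i - (n - d)), min(i - 1, d - 1)
        for j in range(low, high + 1):       # M_{i,j}; the range is empty at level n
            cur[j] = mux(xs[i], prev[j], prev[j - 1])
        if i <= d:                           # A_{i,i} exists for i in [2, d]
            cur[i] = xs[i] & prev[i - 1]
        else:                                # W_{i,d}, stored at key d
            cur[d] = wux(xs[i], prev[d], prev[d - 1])
        prev = cur
    return prev[d]                           # W_{n,d}


def check_threshold(max_n=8):
    """Exhaustively compare the circuit with the test w_H(x) >= d."""
    for n in range(4, max_n + 1):
        for d in range(2, n - 1):
            for x in product((0, 1), repeat=n):
                if threshold_circuit(x, d) != int(sum(x) >= d):
                    return False
    return True
```

The check also illustrates why a WUX suffices on the right side: the value at 0 (weight already at least $d$) and the value at 1 (weight exactly $d - 1$) are never both equal to 1, so the XOR in the WUX acts as an OR.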

Fig. 3. Boolean circuit $\mathcal{T}_{4,7}$. The control bit of each level of MUX is the input $x_i$ on the left side, the left arrow (→) in input of a MUX (or WUX) corresponds to the input for a control bit equal to 0 and the right arrow (→) for the value at 1.

Proposition 4. Let $d, n \in \mathbb{N}$, $n \geq 4$, $2 \leq d \leq n - 2$. For all $x \in \mathbb{F}_2^n$, the Boolean circuit $\mathcal{T}_{d,n}$ computes $T_{d,n}(x)$.

Proof. By construction $\mathcal{T}_{d,n}$ is obtained from $C_n$ by two main transformations: deleting $A_{n-d+1,0}$ and the gates depending on its output (plus the $d$ NOT gates $N_i$ such that $i > n - d$), and merging the gates depending on $A_{d,d}$ into one WUX gate at each level $i$ for $i \in [d + 1, n]$. We will first prove by induction that in the circuit obtained from $C_n$ by


merging the gates depending on $A_{d,d}$ in $n - d$ WUX gates $W_{i,d}$, the following property holds: $W_{i,d}$ gives 1 if and only if $w_H(x_1, \ldots, x_i) \geq d$. Then we will show that deleting all the part depending on $A_{n-d+1,0}$ does not impact the computation of $W_{n,d}$, which gives the output.

We define the Boolean circuit $C_{d,n}$ as $C_n$ where for $i \in [d + 1, n]$ the AND gate $A_{i,i}$ and the MUX gates $M_{i,j}$ for $j \in [d, i - 1]$ are merged into the WUX gate $W_{i,d}$. The gate $W_{d+1,d}$ has control bit $x_{d+1}$, $A_{d,d}$ as value at 0, and $M_{d,d-1}$ as value at 1. For $i \in [d + 2, n]$, $W_{i,d}$ has control bit $x_i$, $W_{i-1,d}$ as value at 0, and $M_{i-1,d-1}$ as value at 1. We show by induction that for all $i \in [d + 1, n]$, $W_{i,d}$ gives 1 if and only if $w_H(x_1, \ldots, x_i) \geq d$, and only one of the gates $A_{i,0}$, $W_{i,d}$, $M_{i,j}$ for $j \in [d - 1]$ gives 1.

For the initialization step, $i = d + 1$, the circuit $C_{d,n}$ up to level $d$ is exactly $C_d$. Therefore, using Proposition 3, $M_{d,d-1}$ gives 1 if and only if $w_H(x_1, \ldots, x_d) = d - 1$, and $A_{d,d}$ gives 1 if and only if $w_H(x_1, \ldots, x_d) = d$. We consider the three possible cases: both gates give 0, $M_{d,d-1}$ gives 1, and $A_{d,d}$ gives 1. In the first case the value of $W_{d+1,d}$ is $x_{d+1} \cdot 0 + 0 = 0$, and since $w_H(x_1, \ldots, x_d) < d - 1$ we get $w_H(x_1, \ldots, x_{d+1}) < d$. In the second case the value of $W_{d+1,d}$ is $x_{d+1} \cdot 1 + 0 = x_{d+1}$, and since $w_H(x_1, \ldots, x_d) = d - 1$ we get $w_H(x_1, \ldots, x_{d+1}) = d$ if $x_{d+1} = 1$ and $w_H(x_1, \ldots, x_{d+1}) = d - 1$ if $x_{d+1} = 0$. In the third case the value of $W_{d+1,d}$ is $x_{d+1} \cdot 0 + 1 = 1$, and since $w_H(x_1, \ldots, x_d) = d$ we get $w_H(x_1, \ldots, x_{d+1}) \geq d$. Summarizing the three cases, $W_{d+1,d}$ gives 1 if and only if $w_H(x_1, \ldots, x_{d+1}) \geq d$. Note that the gates $A_{d+1,0}$ and $M_{d+1,j}$ for $j \in [d - 1]$ are computed exactly as in $C_{d+1}$. Hence, from Proposition 3, only one of them gives 1 if $w_H(x_1, \ldots, x_{d+1}) < d$ and none if $w_H(x_1, \ldots, x_{d+1}) \geq d$. Therefore only one of the gates $A_{d+1,0}$, $W_{d+1,d}$, $M_{d+1,j}$ for $j \in [d - 1]$ gives 1, which validates the initialization.
Assume that for some $i \in [d + 1, n - 1]$, $W_{i,d}$ gives 1 if and only if $w_H(x_1, \ldots, x_i) \geq d$, and only one of the gates $A_{i,0}$, $W_{i,d}$, $M_{i,j}$ for $j \in [d - 1]$ outputs 1. Note that the gates $A_{i+1,0}$ and $M_{i+1,j}$ for $j \in [d - 1]$ are computed exactly as in $C_{i+1}$; therefore Proposition 3 guarantees that at most one of them outputs 1, namely the one whose index is $j = w_H(x_1, \ldots, x_{i+1})$. We consider the three possible cases: $W_{i,d}$ and $M_{i,d-1}$ both give 0, $M_{i,d-1}$ gives 1, and $W_{i,d}$ gives 1.

In the first case the value computed by $W_{i+1,d}$ is 0. Since $w_H(x_1, \ldots, x_i) < d - 1$ ($W_{i,d}$ and $M_{i,d-1}$ are both null), we get $w_H(x_1, \ldots, x_{i+1}) < d$, and only one of the gates $A_{i+1,0}$, $M_{i+1,j}$ for $j \in [d - 1]$ gives 1. In the second case the value computed by $W_{i+1,d}$ is $x_{i+1}$. If $x_{i+1} = 0$ then $w_H(x_1, \ldots, x_{i+1}) = w_H(x_1, \ldots, x_i) = d - 1$ and $M_{i+1,d-1}$ is the only gate at level $i + 1$ giving 1. If $x_{i+1} = 1$ then $w_H(x_1, \ldots, x_{i+1}) = w_H(x_1, \ldots, x_i) + 1 = d$, none of the gates $A_{i+1,0}$, $M_{i+1,j}$ for $j \in [d - 1]$ gives 1, hence $W_{i+1,d}$ is the only one giving 1. In the last case the value computed by $W_{i+1,d}$ is 1, $w_H(x_1, \ldots, x_i) \geq d$ so $w_H(x_1, \ldots, x_{i+1}) \geq d$, and only the gate $W_{i+1,d}$ gives 1 at level $i + 1$. Summarizing the different cases, $W_{i+1,d}$ gives 1 if and only if $w_H(x_1, \ldots, x_{i+1}) \geq d$, and only one of the gates $A_{i+1,0}$, $W_{i+1,d}$, $M_{i+1,j}$ for $j \in [d - 1]$ outputs 1, which concludes the induction.

To finalize the proof, note that the output of $C_{d,n}$ given by $W_{n,d}$, which we call $z_d$, gives the value of $T_{d,n}(x)$. None of the gates depending on $A_{n-d+1,0}$, nor the gates $N_i$ for $i \in [n - d + 1, n]$, are evaluated in the path leading to $z_d$. Hence $z_d$ can be computed by the circuit obtained from $C_{d,n}$ by removing all these gates. This is exactly the circuit $\mathcal{T}_{d,n}$, concluding the proof. □


Remark 1. The restrictions $d \geq 2$ and $n - d \geq 2$ come from the circuit description: always having AND gates on the left and right side. Valid circuits for the remaining values can be obtained by removing some of these gates (changing the general description): the $A_{i,0}$ gates are unnecessary for $d \geq n - 2$ and the $A_{i,i}$ gates are unnecessary for $d \leq 2$.

3.2 Evaluating FiLIP Filters

For the homomorphic evaluation the filter needs to be computed without knowing the values in the input ciphertexts, hence using the same Boolean circuit for the $2^n$ possible inputs. A circuit to evaluate a Boolean function $f$ can be derived from its ANF: each monomial can be computed as the AND of the variables of this monomial, and the sum over all the monomials in the ANF can be performed with XOR gates. This is the strategy we use to evaluate DSM since they have a very sparse ANF (at most $n$ monomials over $2^n$). The situation is different for XOR-THR functions, since the threshold part has a dense ANF. The threshold function we consider for FiLIP-144, $T_{32,63}$, belongs to the subfamily $T_{2^t, 2^{t+1}-1}$ and therefore its ANF consists of the $\binom{2^{t+1}-1}{2^t}$ monomials of degree $2^t$ (see e.g. [27], Theorem 1). In this case the circuit based on the ANF would require XORing around $9 \cdot 10^{17}$ ANDs of 32 terms. Instead we use the circuit $\mathcal{T}_{d,n}$ of Sect. 3.1. In the following proposition we give the number (and nature) of Boolean gates required to homomorphically compute the different filters.

Proposition 5 (FiLIP's filter evaluation). Let $f$ be the filter function of FiLIP. Then $f$ can be computed with at most the following number of Boolean gates:
– if $f$ is the DSM with direct sum vector $m_f = [m_1, \ldots, m_k]$: 0 NOT, $m - 1$ XOR, and $n - m$ AND, where $m = \sum_{i=1}^{k} m_i$.
– if $f$ is the XOR-threshold function $\mathrm{XTHR}_{k,d,n}$: $n - d$ NOT, $(n - d)(2d - 1) + k$ XOR, and $(n - d)d + n - 2$ AND.

Proof. For a DSM, since each variable appears only once in the ANF, the result can be computed with $n - 1$ XOR and AND gates in total. The number of monomials being $m$, $m - 1$ XOR are needed, which also gives the number of AND, $n - m$. For a XOR-THR function, using Proposition 4 the Boolean circuit $\mathcal{T}_{d,n}$ computes $T_{d,n}$, which gives the number of NOT, AND, MUX, and WUX gates to evaluate it. Counting that a MUX can be computed with 2 XOR and 1 AND, a WUX with 1 XOR and 1 AND, and that $k$ XOR are necessary to sum the $\mathrm{XOR}_k$ part leads to the final number of gates. □
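The counts of Proposition 5 can be cross-checked by enumerating the gates of Definition 14 and comparing with the closed forms. The sketch below is illustrative only; the function names are hypothetical, the direct sum vector `[2, 1, 1]` is a toy example, and the value $k = 81$ used in the usage note is an assumption inferred from the name FiLIP-144 and the threshold $T_{32,63}$ ($144 - 63 = 81$), not a figure stated in this section.

```python
def xthr_gate_counts(k, d, n):
    """Count NOT/XOR/AND gates of XTHR_{k,d,n} by enumerating Definition 14."""
    if not (n >= 4 and 2 <= d <= n - 2):
        raise ValueError("Definition 14 requires n >= 4 and 2 <= d <= n - 2")
    n_not = n - d
    # A_{i,0} for i in [2, n-d] and A_{i,i} for i in [2, d].
    n_and_gates = (n - d - 1) + (d - 1)
    # M_{i,j} for i in [2, n-1], j in [max(1, i-(n-d)), min(i-1, d-1)].
    n_mux = sum(
        max(0, min(i - 1, d - 1) - max(1, i - (n - d)) + 1)
        for i in range(2, n)
    )
    n_wux = n - d
    # MUX = 2 XOR + 1 AND, WUX = 1 XOR + 1 AND, plus k XOR for the XOR_k part.
    return {
        "NOT": n_not,
        "XOR": 2 * n_mux + n_wux + k,
        "AND": n_and_gates + n_mux + n_wux,
    }


def xthr_gate_counts_closed_form(k, d, n):
    """Closed forms stated in Proposition 5 for the XOR-threshold filter."""
    return {
        "NOT": n - d,
        "XOR": (n - d) * (2 * d - 1) + k,
        "AND": (n - d) * d + n - 2,
    }


def dsm_gate_counts(direct_sum_vector):
    """Proposition 5 for a DSM: m_i monomials of degree i, i = 1..k."""
    m = sum(direct_sum_vector)
    n = sum(i * m_i for i, m_i in enumerate(direct_sum_vector, start=1))
    return {"NOT": 0, "XOR": m - 1, "AND": n - m}
```

For instance, `dsm_gate_counts([2, 1, 1])` describes a function of the shape $x_1 + x_2 + x_3x_4 + x_5x_6x_7$, and `xthr_gate_counts_closed_form(81, 32, 63)` gives the counts for the assumed FiLIP-144 parameters.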
3.3 Noise Evolution with TGSW

In this part we bound the error infinite norm and variance of the ciphertexts after transciphering in terms of the error parameters of the initial ciphertexts. Each XOR is evaluated by Add, each AND by Mul, and each NOT gate by subtracting the ciphertext from H.


Proposition 6 (FiLIP error-growth). Let $f$ be the filter function of FiLIP, let $C_i$, $i \in [n]$, be fresh TGSW ciphertexts with error variance and infinite norm $V$ and $\varepsilon$, and let $C_f$ be the ciphertext obtained by homomorphically evaluating $f$, with error variance and infinite norm $V_f$ and $\varepsilon_f$. The following bounds apply:
– if $f$ is the DSM with direct sum vector $m_f = [m_1, \ldots, m_k]$ then:
$$\varepsilon_f \leq (n - m)(c_1\varepsilon + c_2) + m\varepsilon, \quad \text{and} \quad V_f \leq (n - m)(c_3 V + c_4) + mV,$$
– if $f$ is the XOR-threshold function $\mathrm{XTHR}_{k,d,n}$ then:
$$\varepsilon_f \leq \frac{(n + d - 2)(n - d + 1)}{2}(c_1\varepsilon + c_2) + (n - d + k + 1)\varepsilon, \quad \text{and}$$
$$V_f \leq \frac{(n + d - 2)(n - d + 1)}{2}(c_3 V + c_4) + (n - d + k + 1)V,$$

where the variance bounds assume the independence heuristic Assumption 1.

Proof. We refer to noise parameters for the two quantities error infinite norm and variance. The results on the error variance assume Assumption 1, unlike the ones on the error infinite norm. We begin the proof by considering the DSM filter. Using Proposition 2, the noise parameters associated to a product of $i \in \mathbb{N}^*$ fresh ciphertexts (the noisiest ciphertext always taken as the second operand) are $\varepsilon_{\Pi_i} \leq (i - 1)(c_1\varepsilon + c_2) + \varepsilon$ and $V_{\Pi_i} \leq (i - 1)(c_3 V + c_4) + V$. This gives the noise parameters of the ciphertexts corresponding to the monomials of degree $i$. We use the noise formulas for the addition on the $m_i$ products of $i$ ciphertexts (for $i \in [k]$) and then for the sum of these $k$ ciphertexts. It finally gives:
$$\varepsilon_f \leq \sum_{i=1}^{k} m_i \varepsilon_{\Pi_i} \leq \sum_{i=1}^{k} m_i\big((i - 1)(c_1\varepsilon + c_2) + \varepsilon\big) = \sum_{i=1}^{k} i\, m_i (c_1\varepsilon + c_2) + \sum_{i=1}^{k} m_i\big(\varepsilon - (c_1\varepsilon + c_2)\big) = (n - m)(c_1\varepsilon + c_2) + m\varepsilon,$$
where $m = \sum_{i=1}^{k} m_i$ is the number of monomials of $f$ and $n = \sum_{i=1}^{k} i\, m_i$ its number of variables. Similarly, $V_f \leq (n - m)(c_3 V + c_4) + mV$.

For the XOR-threshold function we start with the noise parameters for the evaluation of $T_{d,n}$ using the circuit $\mathcal{T}_{d,n}$. We bound the noise parameters of each ciphertext of the binary value obtained at each gate of the circuit, since the ciphertext of $T_{d,n}$ is the output of the circuit. For readability we write noise parameters of the gate for the noise parameters of the ciphertext obtained after the evaluation of the circuit up to this gate. We proceed by studying separately the NOT, AND, MUX, and WUX gates, and we refer to their level in the circuit (the index $i$ in Definition 14). First, the noise parameters of a NOT gate are the same as those of its input. Then, the AND gates are obtained by products of fresh ciphertexts, so the noise parameters of $A_{i,j}$ are $\varepsilon_{\Pi_i}$ and $V_{\Pi_i}$. For the MUX gates we show by induction on the level $i \in [2, n - 1]$ that all MUX of level $i$ have noise


parameters $\varepsilon_{M_i} \leq (i - 1)(c_1\varepsilon + c_2) + \varepsilon$ and $V_{M_i} \leq (i - 1)(c_3 V + c_4) + V$ (the same bounds as for $\varepsilon_{\Pi_i}$ and $V_{\Pi_i}$). For the initialization step, $i = 2$, the only MUX gate is computed as $\mathrm{MUX}(C_2, C_1, H - C_1)$. Then the noise formula of the MUX in Proposition 2 gives $\varepsilon_{M_2} \leq \varepsilon + c_1\varepsilon + c_2$, validating the initialization. For the induction step, we assume that at level $i$ we have the bound $\varepsilon_{M_i} \leq \varepsilon + (i - 1)(c_1\varepsilon + c_2)$. By definition of $\mathcal{T}_{d,n}$ each MUX gate at level $i + 1$ has input control bit $x_{i+1}$ and the two other inputs are AND or MUX gates of level $i$. From the induction hypothesis the error infinite norm relative to both these inputs is lower than or equal to $\varepsilon + (i - 1)(c_1\varepsilon + c_2)$. Hence, the noise formula for the MUX gives $\varepsilon_{M_{i+1}} \leq \varepsilon + (i - 1)(c_1\varepsilon + c_2) + c_1\varepsilon + c_2$, validating the induction step. The proof for $V_{M_i}$ follows similar arguments, which concludes the induction proof for the MUX gates.

Finally, for the WUX gates we show by induction on $k \in [n - d]$ that the WUX gate $W_{d+k,d}$ has noise parameters $\varepsilon_{W_{d+k}} \leq (2d + k - 2)(k + 1)(c_1\varepsilon + c_2)/2 + (k + 1)\varepsilon$ and similarly $V_{W_{d+k}} \leq (2d + k - 2)(k + 1)(c_3 V + c_4)/2 + (k + 1)V$, assuming the independence heuristic. Note that $(2d + k - 2)(k + 1)(c_1\varepsilon + c_2)/2 + (k + 1)\varepsilon = \sum_{i=0}^{k} \big((d - 1 + i)(c_1\varepsilon + c_2) + \varepsilon\big)$, which is the formula we will use for the induction. For the initialization step, $k = 1$, the WUX gate $W_{d+1,d}$ is computed as $\mathrm{XOR}(\mathrm{AND}(x_{d+1}, M_{d,d-1}), A_{d,d})$. Then the noise formulas of Mul and Add of Proposition 2 give $\varepsilon_{W_{d+1}} \leq d(c_1\varepsilon + c_2) + \varepsilon + (d - 1)(c_1\varepsilon + c_2) + \varepsilon$, which is equal to $\sum_{i=0}^{1} \big((d - 1 + i)(c_1\varepsilon + c_2) + \varepsilon\big)$, validating the initialization. For the induction step, we assume that for one value of $k$ in $[n - d - 1]$ we have $\varepsilon_{W_{d+k}} \leq \sum_{i=0}^{k} \big((d - 1 + i)(c_1\varepsilon + c_2) + \varepsilon\big)$. By definition of $\mathcal{T}_{d,n}$ the WUX gate at level $d + k + 1$ has control bit $x_{d+k+1}$ and the two other inputs are a MUX and the WUX of level $k + d$.
Then, we know the noise parameters of the MUX from the previous part and the error infinite norm of the WUX gate in input from the induction hypothesis. Hence, the product and addition noise formulas give $\varepsilon_{W_{d+k+1}} \leq \sum_{i=0}^{k+1} \big((d - 1 + i)(c_1\varepsilon + c_2) + \varepsilon\big)$, validating the induction step. The proof for $V_{W_{d+k}}$ follows similar arguments, which concludes the induction proof for the WUX gates.

The last WUX gate ($W_{n,d}$) corresponds to the ciphertext of $T_{d,n}$, therefore its noise parameters are $\varepsilon_{T_{d,n}} \leq (n + d - 2)(n - d + 1)(c_1\varepsilon + c_2)/2 + (n - d + 1)\varepsilon$ and $V_{T_{d,n}} \leq (n + d - 2)(n - d + 1)(c_3 V + c_4)/2 + (n - d + 1)V$. Finally, the $\mathrm{XTHR}_{k,d,n}$ filter is evaluated by performing $k$ Add of fresh ciphertexts to the ciphertext corresponding to $T_{d,n}$. Using the previous part of the proof and the noise formulas of Proposition 2 for the addition we obtain $\varepsilon_f \leq (n + d - 2)(n - d + 1)(c_1\varepsilon + c_2)/2 + (n - d + 1 + k)\varepsilon$ and $V_f \leq (n + d - 2)(n - d + 1)(c_3 V + c_4)/2 + (n - d + 1 + k)V$. □

Remark 2. The homomorphic evaluation of $T_{d,n}$ can also be based on the circuit $C_n$, adding the last $n - d + 1$ output ciphertexts. It would lead to an error infinite norm $\varepsilon_{T_{d,n}}$ such that $\varepsilon_{T_{d,n}} \leq (n - d + 1)\big((n - 1)(c_1\varepsilon + c_2) + \varepsilon\big)$. With the same strategy, any $n$-variable symmetric Boolean function $f$ can be evaluated with $\varepsilon_f \leq (n + 1)\big((n - 1)(c_1\varepsilon + c_2) + \varepsilon\big)$. Also, multiplying the outputs $y_i$ of $C_n$ by $x_i$ for $i \in [n]$ and summing these $n$ products allows to compute the Hidden Weighted Bit Function (HWBF), which is known to have good cryptographic properties [32]. The corresponding evaluation would lead to $\varepsilon_f \leq n\big(n(c_1\varepsilon + c_2) + \varepsilon\big)$.
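The arithmetic behind the bounds on the error infinite norm can be verified exactly with rational numbers. The sketch below is a hedged illustration: it only checks that the closed forms agree with the per-level sums used in the proof, it does not model TGSW noise itself, and the numeric values chosen for $\varepsilon$, $c_1$ and $c_2$ are placeholders (the actual constants come from Proposition 2, which is outside this section). All function names are hypothetical.

```python
from fractions import Fraction


def wux_bound_sum(d, k, eps, c1, c2):
    """Bound on W_{d+k,d} written as the sum over i = 0..k used in the induction."""
    return sum((d - 1 + i) * (c1 * eps + c2) + eps for i in range(k + 1))


def wux_bound_closed(d, k, eps, c1, c2):
    """Closed form (2d + k - 2)(k + 1)(c1*eps + c2)/2 + (k + 1)*eps."""
    return Fraction((2 * d + k - 2) * (k + 1), 2) * (c1 * eps + c2) + (k + 1) * eps


def xthr_bound(n, d, k, eps, c1, c2):
    """Bound for XTHR_{k,d,n}: bound of W_{n,d} plus k additions of fresh ciphertexts."""
    return wux_bound_closed(d, n - d, eps, c1, c2) + k * eps


def dsm_bound(direct_sum_vector, eps, c1, c2):
    """Bound for a DSM filter: (n - m)(c1*eps + c2) + m*eps."""
    m = sum(direct_sum_vector)
    n = sum(i * m_i for i, m_i in enumerate(direct_sum_vector, start=1))
    return (n - m) * (c1 * eps + c2) + m * eps


def dsm_bound_sum(direct_sum_vector, eps, c1, c2):
    """Same bound as the sum of m_i * eps_{Pi_i} over the monomial degrees i."""
    return sum(
        m_i * ((i - 1) * (c1 * eps + c2) + eps)
        for i, m_i in enumerate(direct_sum_vector, start=1)
    )
```

Using `Fraction` rather than floats makes the equality between the sum and the closed form an exact identity check instead of a comparison up to rounding.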


4 Implementation of FiLIP with TGSW

In this section we present our implementation of the transciphering of FiLIP with the TGSW scheme. First, we specify our selection of parameters and the settings of our implementation. Then, we analyze the noise obtained during the evaluation of FiLIP, and we compare it to the bound beyond which decryption fails. We give the timings of the transciphering for the different filter choices, and compare these results with the transciphering from other works. Finally, we implement two evaluations of FiLIP with other 3G schemes implemented in TFHE, and compare the different options.

4.1 Selection of Parameters

Since the SE scheme has a bit-security of 128, we select the parameters to ensure the same security for the homomorphic scheme. Furthermore, we decide to fix the maximum probability of failure for the decryption at $2^{-128}$, which is more restrictive than the usual choices in homomorphic evaluation. It is more coherent with the SE scheme, whose decryption is always correct, and it thwarts attacks using low decryption failure [19]. We use the estimator [2] for the concrete security of the LWE problem to fix the parameters $k$, $N$ and $\alpha$ ($\alpha$ being the noise of a fresh ciphertext). More precisely, for the TGSW scheme we estimate that the modulus for LWE (i.e. the coefficient ring) is equal to $2^{32}$, since the TFHE library represents elements $x \in \mathbb{T}$ as the integers $2^{32}x$ encoded on 32 bits. Fixing $k = 1$ the scheme relies more precisely on RLWE, but since there are no known attacks that are more efficient against RLWE than against LWE, we believe that this estimator is accurate for this scheme. Accordingly, we use the following parameters: $k = 1$, $N = 1024$ and $\alpha$ = 1e−9, which ensures 128-bit security. Then we choose two sets of decomposition parameters $\ell$ and $B_g$:
– Set 1: $k = 1$, $N = 1024$, $\alpha$ = 1e−9, $B_g = 2^5$, $\ell = 6$.
– Set 2: $k = 1$, $N = 1024$, $\alpha$ = 1e−9, $B_g = 2$, $\ell = 20$.
Using Proposition 1 we can determine the maximal variance allowed with these parameters: $2e^{-(8 B_g^2 V_{max})^{-1}} = 2^{-128} \iff V_{max} = (1032\, B_g^2 \ln(2))^{-1}$. It gives $V_{max}$ = 1.37e−6 for Set 1 and $V_{max}$ = 3.49e−5 for Set 2. As we will see in the following subsections, the first set of parameters gives better results in terms of noise and speed, but the second one is useful for the comparison with the hybrid TGSW/TLWE scheme in Sect. 4.4.

4.2 Noise Evolution in Practice

We experimentally study the noise of the transciphering, which allows us to compare it with the theoretical upper bounds of Proposition 6, which do not take into account the rounding error inherent in the representation of real numbers. In order to evaluate the concrete noise of the transciphering output ciphertexts, we implement a function that computes the noise of a ciphertext. This function behaves like the decryption one, except that it does not round to the closest message but compares to what a noiseless ciphertext would


be: it starts by isolating the coefficient in which we retrieve the message (the constant coefficient of the polynomial obtained by computing a scalar product with the $(k+1)$-th line). It then outputs the difference between this value and the closest $i/B_g$, which would be the message in the decryption algorithm. We compute the average noises and the noise variances over samples of at least 100 independent ciphertexts (we do not gain much precision on the variance with more samples). We give the resulting ciphertext noises in Table 1, where the average noises are given normalized to the decryption limit (i.e. divided by $1/(2B_g)$, since decryption fails if the noise exceeds this limit). We do not present the evaluation of FiLIP-144 with Set 2 because the noise variance growth is too large. We could get around this issue by allowing a slightly higher probability of decryption failure or by finding a better inequality to bound the variance of subgaussians. However, Set 2 is mainly presented to compare with the hybrid TLWE/TGSW scheme (which uses this set of parameters), and we only implemented this hybrid scheme with DSM filters since they are more efficient.

Table 1. Transciphering noises of FiLIP over TGSW, comparison with different filters and sets of parameters for 128 bits security. The average noise represents the ciphertext noise normalized by the maximum decryption noise. $V_{em}$ denotes the measured variance and $V_{max}$ denotes the maximum variance defined above.

Filter      Parameters  Average noise  V_em      V_max
FiLIP-1280  Set 1       1.17e−3        1.34e−10  1.37e−6
FiLIP-1216  Set 1       2.18e−3        2.88e−10  1.37e−6
FiLIP-144   Set 1       3.26e−3        7.19e−10  1.37e−6
FiLIP-1280  Set 2       9.22e−2        7.41e−6   3.49e−5
FiLIP-1216  Set 2       1.75e−1        1.62e−5   3.49e−5
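The closed form for $V_{max}$ given in Sect. 4.1 can be reproduced numerically. The following sketch assumes the failure-probability bound has the shape $2e^{-1/(8 B_g^2 V)}$, as in the equivalence stated there, and solves it for $V$ at a target of $2^{-128}$; it is only evaluated for Set 1 ($B_g = 2^5$). The function names are hypothetical.

```python
import math


def failure_bound(bg, variance):
    """Failure-probability bound 2 * exp(-1 / (8 * bg^2 * variance))."""
    return 2.0 * math.exp(-1.0 / (8 * bg**2 * variance))


def max_variance(bg, log2_failure=-128):
    """Largest variance V such that failure_bound(bg, V) = 2^log2_failure.

    Solving 2 * exp(-1 / (8 bg^2 V)) = 2^t gives V = 1 / (8 (1 - t) bg^2 ln 2),
    i.e. 1 / (1032 bg^2 ln 2) for t = -128.
    """
    return 1.0 / (8 * (1 - log2_failure) * bg**2 * math.log(2))


if __name__ == "__main__":
    v_set1 = max_variance(2**5)
    print(f"V_max (Set 1) = {v_set1:.3e}")
    print(f"log2 of failure bound = {math.log2(failure_bound(2**5, v_set1)):.1f}")
```

Plugging the result back into `failure_bound` is a simple way to confirm that the variance and the target failure probability are consistent with each other.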

With this HE scheme, the noise evolution during the homomorphic product is asymmetrical, and as long as one of the operands is a fresh ciphertext the noise remains small. Unfortunately, when performing transciphering, we lose the ability to use fresh ciphertexts as operands. This limits the number of multiplications we can perform with our resulting ciphertexts (without bootstrapping) with these sets of parameters, allowing only operations over the R-module. Allowing more multiplications directly would require increasing the LWE modulus in order to get a better exit noise, which is not possible without in-depth changes to the TFHE library [14].

4.3 Performance and Comparisons

We provide the timings for the transciphering of FiLIP, for instances of DSM filter and XOR-threshold filter with the TGSW scheme. The timings in this section were obtained on a personal laptop with an Intel(R) Core(TM) i5-6600K CPU @ 3.50 GHz processor. We summarize the results in Table 2. The latency corresponds to the homomorphic evaluation of FiLIP with TGSW, and the key size refers to the FiLIP secret key encrypted


with TGSW. Note that the key size is not a limiting factor since the key is only used by the party doing the homomorphic computations, and it is reasonable for this context. Since FiLIP is a stream cipher, its latency is the time per bit necessary to evaluate the decryption. For the DSM filters, FiLIP-1216 has the best latency: less than 2 s/bit for a transciphering of 128 bits security. FiLIP-1280 is slower but allows the use of a smaller symmetric key. The XOR-threshold filter has a latency competitive with the two DSM filters, which shows that filters other than DSM can be used with a similar efficiency.

We give a comparison of the transciphering we implemented and former ones. The other works were all evaluated with a 2G scheme [5] on the HElib [24] library. We consider SE schemes of 128 bits security, using homomorphic parameters guaranteeing at least the same security. The transcipherings we compare correspond to the fastest variants: having the smallest ciphertexts allowing a correct decryption after transciphering. We summarize the results in Table 3. The security parameter in the table comes from the HElib library estimator for the first part of the schemes (source [29]), but the real security would be lower [1]. We did not find the security parameters for these timings of Kreyvium but we assume it is more than 128 bits, and the ones for our evaluations come from the estimation in Sect. 4.1.

Table 2. Transciphering timings of FiLIP over TGSW, comparison with 2 DSM filters (FiLIP-1280 and FiLIP-1216) and a XOR-threshold filter (FiLIP-144), with 2 sets of parameters for 128 bits security.

            Set 1                        Set 2
Filter      Latency (s)  Key size (MB)   Latency (s)  Key size (MB)
FiLIP-1280  2.2          200             22.7         800
FiLIP-1216  1.9          800             20.0         2680
FiLIP-144   2.5          800             -            -

Our transciphering has a better latency than all the existing implementations, with a clear gain: homomorphic FiLIP-1216 decryption is 10 times faster than previous transcipherings. It shows that a 3G implementation can lead to a very competitive latency. However, since TGSW does not support message batching yet, it has a much lower throughput than other transcipherings implemented on HElib, which are able to take advantage of homomorphic batching techniques to decrypt hundreds of bits at once, thus compensating for the latency. Indeed, homomorphic batching consists in encrypting up to $N$ plaintexts into a single homomorphic ciphertext. After batching encryption, the resulting ciphertext allows only homomorphic evaluation in an SIMD fashion, i.e. homomorphic operations perform the corresponding componentwise operations over the plaintext vector. As a consequence, transcipherings have a better bit-rate when batching is supported, but using it limits homomorphic evaluations to vectorized operations and rotations via the Frobenius endomorphism.


Table 3. Performances for minimal FHE parameters, 128 bits security. The latency refers to the time in seconds required to obtain the first homomorphic ciphertext, λ is the estimated security parameter of the HE scheme, and Source refers to the article presenting these timings.

Cipher                Latency (s)  λ      Source
LowMCv2(14, 63, 256)  1629.3       132.1  [29]
FLIP-1394             26.53        146.8  [29]
Agrasta (129, 4)      20.26        134.7  [29]
Rasta (525, 5)        277.24       128.9  [29]
Rasta (351, 6)        195.40       128.9  [29]
FiLIP-1216            24.37        186.3  [29]
FiLIP-1280            26.59        146.8  [29]
Kreyvium-12           1547.0       -      [9]
FiLIP-1216            1.9          157    This work
FiLIP-144             2.5          157    This work

4.4 Different Homomorphic Evaluations of FiLIP

We implement two other evaluations of FiLIP with 3G schemes. First we use the gate-bootstrapping of TFHE. In this scheme, since the noise is not important anymore due to the bootstrapping at each computation, the efficiency of a computation can be inferred from the number of gates in its circuit. The other scheme we implement is inspired by the automata circuit in [11]. The idea is to start the FiLIP evaluation with TGSW ciphertexts and a set of parameters such that $B_g = 2$, and switch to TLWE ciphertexts when evaluating the filter. Indeed, the two evaluations of Sect. 3.2 are possible with at most one of the operands being a TGSW ciphertext at each step of the evaluation. The main idea is to extract a TLWE ciphertext at the beginning of the filter evaluation, and perform external products to get the resulting ciphertext. The transciphering outputs a TLWE ciphertext, which means we have a simply homomorphic scheme: we cannot perform products between two such ciphertexts. This method guarantees good timings since it reduces the number of external products performed, but a bootstrapping to a leveled scheme is necessary for further computations.

We compare the timings of these different approaches, and the evaluation of FiLIP with HElib, in Table 4. Evaluating FiLIP-1280 with this scheme gives ciphertexts with the same noise as in the TGSW scheme with the second set of parameters. This scheme allows us to compute transciphering with our best timings: 1018 ms/b. It is around twice as fast as with TGSW but does not allow as many operations; a bootstrapping could be used to compensate for this, but the final timing would be less competitive. The timing we present for the FHE scheme provided by the TFHE library [14] comes from an implementation we wrote to verify that our leveled scheme is faster in practice. We measured that the average time to compute a gate with bootstrapping on our computer is about 20 ms instead of the 13 ms in [13], which is coherent with the total evaluation time.
This approach gives a latency similar to that of the evaluation on HElib; therefore we can conclude from these timings that the TGSW scheme, alone or with TLWE, is more


efficient to transcipher FiLIP cipher with a DSM filter. However, using TFHE gives us fully homomorphic ciphers on which we can perform any operation.

Table 4. Transciphering timings of FiLIP over different HE schemes.

HE scheme    FiLIP-1280  FiLIP-1216
TGSW         2.2         1.9
TGSW/TLWE    1.2         1.0
TFHE         25.8        22.7
BGV (HElib)  24.4        26.7

5 Conclusion

In this paper, we presented the first implementation of transciphering with a third generation homomorphic scheme, and gave a transciphering with the smallest latency so far: less than 2 s. We have also implemented for the first time the XOR-threshold filter variant of FiLIP, showing competitive timings with the DSM variant. We compared FiLIP evaluation with three different HE schemes, which showed that the TGSW scheme was more appropriate in this case. Despite a large improvement in latency, the throughput is not competitive with transcipherings implemented on HElib, which compensate by the number of bits batched in each ciphertext. A possible improvement would be to batch messages by encoding them in all the coefficients of our message polynomial. This would mean we could store $N$ bits of message in a ciphertext instead of 1, which would be a huge gain. In order to implement batching, one could look at the strategies developed in [12], which describes different batching methods for a leveled HE, and try to adapt them to Boolean functions such as the filters we consider.

Acknowledgments. This work has been funded in part by the European Union PROMETHEUS project (Horizon 2020 Research and Innovation Program, grant 780701), and by the French RISQ project (BPI-France, grant P141580). This work has been funded in part by the European Union (EU) and the Walloon Region through the FEDER project USERMedia (convention number 501907-379156). This work has been funded in part by the European Union (EU) through the ERC project 724725 (acronym SWORD). Pierrick Méaux is funded by a F.R.S. Incoming PostDoc Fellowship.

References

1. Albrecht, M.R.: On dual lattice attacks against small-secret LWE and parameter choices in HElib and SEAL. In: Coron, J.-S., Nielsen, J.B. (eds.) EUROCRYPT 2017. LNCS, vol. 10211, pp. 103–129. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-56614-6_4
2. Albrecht, M.R., Player, R., Scott, S.: On the concrete hardness of learning with errors. Cryptology ePrint Archive, Report 2015/046 (2015)


3. Albrecht, M.R., Rechberger, C., Schneider, T., Tiessen, T., Zohner, M.: Ciphers for MPC and FHE. In: Oswald, E., Fischlin, M. (eds.) EUROCRYPT 2015. LNCS, vol. 9056, pp. 430–454. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46800-5_17
4. Albrecht, M.R., Rechberger, C., Schneider, T., Tiessen, T., Zohner, M.: Ciphers for MPC and FHE. IACR Crypt. ePrint Arch. 687 (2016)
5. Brakerski, Z., Gentry, C., Vaikuntanathan, V.: (Leveled) fully homomorphic encryption without bootstrapping. In: Goldwasser, S. (ed.) ITCS 2012, pp. 309–325. ACM, January 2012
6. Brakerski, Z., Vaikuntanathan, V.: Efficient fully homomorphic encryption from (standard) LWE. In: Ostrovsky, R. (ed.) 52nd FOCS, pp. 97–106. IEEE Computer Society Press, October 2011
7. Brakerski, Z., Vaikuntanathan, V.: Lattice-based FHE as secure as PKE. In: Naor, M. (ed.) ITCS 2014, pp. 1–12. ACM, January 2014
8. Canteaut, A., et al.: Stream ciphers: a practical solution for efficient homomorphic-ciphertext compression. In: Peyrin, T. (ed.) FSE 2016. LNCS, vol. 9783, pp. 313–333. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-52993-5_16
9. Canteaut, A., et al.: Stream ciphers: a practical solution for efficient homomorphic-ciphertext compression. J. Cryptol. 31, 885–916 (2018)
10. Chen, H., Laine, K., Player, R.: Simple encrypted arithmetic library - seal v2.1. Cryptology ePrint Archive, Report 2017/224 (2017)
11. Chillotti, I., Gama, N., Georgieva, M., Izabachène, M.: Faster fully homomorphic encryption: bootstrapping in less than 0.1 seconds. In: Cheon, J.H., Takagi, T. (eds.) ASIACRYPT 2016. LNCS, vol. 10031, pp. 3–33. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-53887-6_1
12. Chillotti, I., Gama, N., Georgieva, M., Izabachène, M.: Faster packed homomorphic operations and efficient circuit bootstrapping for TFHE. In: Takagi, T., Peyrin, T. (eds.) ASIACRYPT 2017. LNCS, vol. 10624, pp. 377–408. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-70694-8_14
13. Chillotti, I., Gama, N., Georgieva, M., Izabachène, M.: TFHE: fast fully homomorphic encryption over the torus. J. Cryptol. 33, 34–91 (2020). https://doi.org/10.1007/s00145-019-09319-x
14. Chillotti, I., Gama, N., Georgieva, M., Izabachène, M.: TFHE: fast fully homomorphic encryption library, August 2016. https://tfhe.github.io/tfhe/
15. Coron, J.-S., Lepoint, T., Tibouchi, M.: Scale-invariant fully homomorphic encryption over the integers. In: Krawczyk, H. (ed.) PKC 2014. LNCS, vol. 8383, pp. 311–328. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54631-0_18
16. Dobraunig, C., et al.: Rasta: a cipher with low ANDdepth and few ANDs per bit. In: Shacham, H., Boldyreva, A. (eds.) CRYPTO 2018. LNCS, vol. 10991, pp. 662–692. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-96884-1_22
17. Doröz, Y., Shahverdi, A., Eisenbarth, T., Sunar, B.: Toward practical homomorphic evaluation of block ciphers using prince. In: Böhme, R., Brenner, M., Moore, T., Smith, M. (eds.) FC 2014. LNCS, vol. 8438, pp. 208–220. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44774-1_17
18. Ducas, L., Micciancio, D.: FHEW: bootstrapping homomorphic encryption in less than a second. In: Oswald, E., Fischlin, M. (eds.) EUROCRYPT 2015. LNCS, vol. 9056, pp. 617–640. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46800-5_24
19. D’Anvers, J.-P., Guo, Q., Johansson, T., Nilsson, A., Vercauteren, F., Verbauwhede, I.: Decryption failure attacks on IND-CCA secure lattice-based schemes. In: Lin, D., Sako, K. (eds.) PKC 2019. LNCS, vol. 11443, pp. 565–598. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-17259-6_19
20. Fan, J., Vercauteren, F.: Somewhat practical fully homomorphic encryption. Cryptology ePrint Archive, Report 2012/144 (2012)


21. Gentry, C.: Fully homomorphic encryption using ideal lattices. In: Mitzenmacher, M. (ed.) 41st ACM STOC, pp. 169–178. ACM Press, May/June 2009
22. Gentry, C., Halevi, S., Smart, N.P.: Homomorphic evaluation of the AES circuit. In: Safavi-Naini, R., Canetti, R. (eds.) CRYPTO 2012. LNCS, vol. 7417, pp. 850–867. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-32009-5_49
23. Gentry, C., Sahai, A., Waters, B.: Homomorphic encryption from learning with errors: conceptually-simpler, asymptotically-faster, attribute-based. In: Canetti, R., Garay, J.A. (eds.) CRYPTO 2013. LNCS, vol. 8042, pp. 75–92. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40041-4_5
24. Halevi, S., Shoup, V.: Algorithms in HElib. In: Garay, J.A., Gennaro, R. (eds.) CRYPTO 2014. LNCS, vol. 8616, pp. 554–571. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44371-2_31
25. Lepoint, T., Naehrig, M.: A comparison of the homomorphic encryption schemes FV and YASHE. In: Pointcheval, D., Vergnaud, D. (eds.) AFRICACRYPT 2014. LNCS, vol. 8469, pp. 318–335. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-06734-6_20
26. López-Alt, A., Tromer, E., Vaikuntanathan, V.: On-the-fly multiparty computation on the cloud via multikey fully homomorphic encryption. In: Karloff, H.J., Pitassi, T. (eds.) 44th ACM STOC, ACM Press, May 2012
27. Méaux, P.: On the fast algebraic immunity of majority functions. In: Schwabe, P., Thériault, N. (eds.) LATINCRYPT 2019. LNCS, vol. 11774, pp. 86–105. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-30530-7_5
28. Méaux, P., Carlet, C., Journault, A., Standaert, F.: Improved filter permutators: combining symmetric encryption design, Boolean functions, low complexity cryptography, and homomorphic encryption, for private delegation of computations. Cryptology ePrint Archive, Report 2019/483 (2019)
29. Méaux, P., Carlet, C., Journault, A., Standaert, F.-X.: Improved filter permutators for efficient FHE: better instances and implementations. In: Hao, F., Ruj, S., Sen Gupta, S. (eds.) INDOCRYPT 2019. LNCS, vol. 11898, pp. 68–91. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-35423-7_4
30. Méaux, P., Journault, A., Standaert, F.-X., Carlet, C.: Towards stream ciphers for efficient FHE with low-noise ciphertexts. In: Fischlin, M., Coron, J.-S. (eds.) EUROCRYPT 2016. LNCS, vol. 9665, pp. 311–343. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49890-3_13
31. Naehrig, M., Lauter, K.E., Vaikuntanathan, V.: Can homomorphic encryption be practical? In: Proceedings of the 3rd ACM Cloud Computing Security Workshop, CCSW 2011, Chicago, IL, USA, 21 October 2011, pp. 113–124 (2011)
32. Wang, Q., Carlet, C., Stănică, P., Tan, C.H.: Cryptographic properties of the hidden weighted bit function. Discret. Appl. Math. 174, 1–10 (2014)

Encrypted Key-Value Stores

Archita Agarwal and Seny Kamara
Brown University, Providence, USA
{archita_agarwal,seny_kamara}@brown.edu

Abstract. Distributed key-value stores (KVS) are distributed databases that enable fast access to data distributed across a network of nodes. Prominent examples include Amazon’s Dynamo, Facebook’s Cassandra, Google’s BigTable and LinkedIn’s Voldemort. The design of secure and private key-value stores is an important problem because these systems are being used to store an increasing amount of sensitive data. Encrypting data at rest and decrypting it before use, however, is not enough because each decryption exposes the data and increases its likelihood of being stolen. End-to-end encryption, where data is kept encrypted at all times, is the best way to ensure data confidentiality. In this work, we study end-to-end encryption in distributed KVSs. We introduce the notion of an encrypted KVS and provide formal security definitions that capture the properties one would desire from such a system. We propose and analyze a concrete encrypted KVS construction which can be based on any unencrypted KVS. We first show that this construction leaks at most the operation equality (i.e., if and when two unknown queries are for the same search key), which is standard for similar schemes in the non-distributed setting. However, we also show that if the underlying KVS satisfies read your writes consistency, then the construction only leaks the operation equality of search keys that are handled by adversarially corrupted nodes, effectively showing that a certain level of consistency can improve the security of a system. In addition to providing the first formally analyzed end-to-end encrypted key-value store, our work identifies and leverages new and interesting connections between distributed systems and cryptography.

1 Introduction

A distributed key-value store (KVS) is a distributed storage system that stores label/value¹ pairs and supports get and put queries. KVSs provide one of the simplest data models but have become fundamental to modern systems due to their high performance, scalability and availability. For example, some of the largest social networks, e-commerce websites, cloud services and community forums depend on key-value stores for their storage needs. Prominent examples of distributed KVSs include Amazon’s Dynamo [25], Facebook’s Cassandra [43], Google’s BigTable [22], LinkedIn’s Voldemort [54], Redis [5], MemcacheDB [4] and Riak [55].

¹ In this work we use the term label and reserve the term key to denote cryptographic keys.

© Springer Nature Switzerland AG 2020. K. Bhargavan et al. (Eds.): INDOCRYPT 2020, LNCS 12578, pp. 62–85, 2020. https://doi.org/10.1007/978-3-030-65277-7_4


Distributed KVSs are closely related to distributed hash tables (DHT) and, in fact, most are built on top of a DHT. However, since DHTs do not necessarily guarantee fault-tolerance, KVSs use various techniques to achieve availability in the face of node failures. The simplest approach is to replicate each label/value pair on multiple nodes and to use a replica control protocol to guarantee some form of consistency.

End-to-End Encryption in KVSs. As an increasing amount of data is being stored and managed by KVSs, their security has become an important problem. Encryption is often proposed as a solution, but encrypting data in transit and at rest and decrypting it before use is not enough since each decryption exposes the data and increases its likelihood of being stolen. A better way to protect data is to use end-to-end encryption where a data owner encrypts its data with its own secret key (that is never shared). End-to-end encryption guarantees that data is encrypted at all times, even in use, which ensures data confidentiality.

Our Contributions. In this work, we formally study the use of end-to-end encryption in KVSs. In particular, we extend the recently proposed framework of Agarwal and Kamara [7] from DHTs to KVSs. We formalize the goals of encryption in KVSs by introducing the notion of an encrypted key-value store (EKVS) and propose formal syntax and security definitions for these objects. The simplest way to design an EKVS is to store label/value pairs (ℓ, v) as (F_{K_1}(ℓ), Enc_{K_2}(v)) in a standard/plaintext KVS, where F is a pseudo-random function and Enc is a symmetric encryption scheme. The underlying KVS will then replicate the encrypted pair, store the replicas on different storage nodes, and handle routing, node failures and consistency. Throughout, we will refer to this approach as the standard scheme and we will use our framework to formally study its security properties. We make the following contributions:

– formalizing KVSs: we provide an abstraction of KVSs that enables us to isolate and analyze several important properties of standard/plaintext KVSs that impact the security of the standard EKVS. More precisely, we find that the way a KVS distributes its data and the extent to which it load balances have a direct effect on what information an adversary can infer about a client’s queries.
– distributed leakage analysis: an EKVS can be viewed as a distributed version of an encrypted dictionary which is a fundamental building block in the design of sub-linear encrypted search algorithms (ESA). All sub-linear ESAs leak some information, whether they are built from property-preserving encryption, structured encryption or oblivious RAMs, so our goal is to identify and prove the leakage profile of the standard scheme. Leakage analysis in the distributed setting is particularly challenging because the underlying distributed system (in our case the underlying KVS) can create very subtle correlations between encrypted data items and queries. As we will see, replication makes this even more challenging. We consider two cases: the single-user case where the EKVS stores the datasets of multiple clients but each dataset can only be read and updated by its owner; and the multi-user case where each dataset can be read and updated by multiple users.


– leakage in the multi-user case: We show that in the multi-user setting, the standard scheme leaks the operation equality (i.e., if and when get and put operations are for the same label) over all operations, even operations that are not handled by corrupted nodes.² This may seem surprising since it is not clear a priori why an adversary would learn anything about data that it never “sees”.
– leakage in the single-user case: In the single-user scenario, we show that, if the standard scheme’s underlying KVS achieves read your writes (RYW) consistency, then it only leaks the operation equality over operations that are handled by corrupted nodes. This is particularly interesting as it suggests that stronger consistency guarantees improve the security of end-to-end encrypted KVSs.
– comparison with DHTs: As mentioned earlier, the main difference between a DHT and a KVS is that the latter replicates data on multiple nodes. To ensure a consistent view of this data, KVSs need to implement some consistency model. Achieving strong consistency, however, is very costly, so almost all practical systems achieve weaker notions which cannot guarantee that a unique value will always be associated to a given label. In particular, the value that will be returned will depend on factors such as network delay, synchronization policy and the ordering of concurrent operations. Therefore, an adversary that controls one or more of these factors can affect the outputs of a KVS. It is thus crucial to understand and analyze this correlation when considering the security of an encrypted KVS. In contrast, this is not needed in the case of encrypted DHTs since they do not maintain replicas and hence consistency is not an issue.
– concrete instantiations: We use our framework to study two concrete instantiations of the standard scheme. The first uses a KVS based on consistent hashing with zero-hop routing whereas the second uses a KVS based on consistent hashing with multi-hop routing.
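The standard scheme described above, which stores (ℓ, v) as (F_{K_1}(ℓ), Enc_{K_2}(v)), can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the construction analyzed in this work: HMAC-SHA256 stands in for the pseudo-random function F, a textbook PRF-based stream cipher with a fresh nonce stands in for Enc, and PlainKVS is a hypothetical single-node dictionary standing in for a replicated plaintext KVS. All class and function names are illustrative.

```python
import hashlib
import hmac
import os


def prf(key: bytes, label: bytes) -> bytes:
    """F_K(label): HMAC-SHA256 as a stand-in pseudo-random function (assumption)."""
    return hmac.new(key, label, hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Expand (key, nonce) into `length` bytes by applying the PRF to a counter."""
    out, counter = b"", 0
    while len(out) < length:
        out += prf(key, nonce + counter.to_bytes(8, "big"))
        counter += 1
    return out[:length]


def enc(key: bytes, value: bytes) -> bytes:
    """Toy randomized encryption: 16-byte nonce followed by value XOR keystream."""
    nonce = os.urandom(16)
    pad = _keystream(key, nonce, len(value))
    return nonce + bytes(a ^ b for a, b in zip(value, pad))


def dec(key: bytes, ciphertext: bytes) -> bytes:
    """Inverse of enc: split off the nonce and XOR the keystream back."""
    nonce, body = ciphertext[:16], ciphertext[16:]
    pad = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, pad))


class PlainKVS:
    """Hypothetical stand-in for the underlying KVS (no replication, no routing)."""

    def __init__(self):
        self.store = {}

    def put(self, label, value):
        self.store[label] = value

    def get(self, label):
        return self.store.get(label)


class StandardEKVS:
    """Sketch of the standard scheme on top of any object with put/get."""

    def __init__(self, kvs):
        self.kvs = kvs

    @staticmethod
    def gen():
        """Return K = (K1, K2): one key for the PRF, one for encryption."""
        return os.urandom(32), os.urandom(32)

    def put(self, key, label: bytes, value: bytes):
        k1, k2 = key
        self.kvs.put(prf(k1, label), enc(k2, value))

    def get(self, key, label: bytes):
        k1, k2 = key
        e = self.kvs.get(prf(k1, label))
        return None if e is None else dec(k2, e)
```

In this sketch the underlying store only ever holds pseudo-random tags and ciphertexts. The tag is deterministic, so repeated operations on the same label are visible to whoever handles them, which is the intuition behind the operation equality leakage discussed above.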

2 Related Work

Key-Value Stores. NoSQL databases were developed as an alternative to relational databases to satisfy the performance and scalability requirements of large Internet companies. KVSs are the simplest kind of NoSQL databases. Although such databases already existed, they gained popularity when Amazon developed Dynamo [25], a KVS for its internal use. Since then many KVSs have been developed both in industry and academia. The most prominent ones are Facebook’s Cassandra [43], Google’s BigTable [22], LinkedIn’s Voldemort [54], Redis [5], MemcacheDB [4] and Riak [55]. All of them are eventually consistent but some of them can be tuned to provide strong consistency [5,43,55]. There have also been efforts to develop KVSs with stronger consistency such as causal consistency [11,44,45,56], and strong consistency [1–3,6].

² Note that the operation equality is a common leakage pattern in practical ESAs.


Encrypted Search. An encrypted search algorithm (ESA) is a search algorithm that operates on encrypted data. ESAs can be built from various cryptographic primitives including oblivious RAM (ORAM) [33], fully-homomorphic encryption (FHE) [31], property-preserving encryption (PPE) [8,12,14,46,50] and structured encryption (STE) [23], which is a generalization of searchable symmetric encryption [51]. Each of these approaches achieves different tradeoffs between efficiency, expressiveness and security/leakage. For large datasets, structured encryption seems to provide the best tradeoffs between these three dimensions: achieving sub-linear (and even optimal) search times and rich queries while leaking considerably less than PPE-based solutions and either the same as [39] or slightly more than ORAM-based solutions. Various aspects of STE have been extensively studied in the cryptographic literature including dynamism [15,16,19,28,32,41,42,52], locality [9,10,21,26,27], expressiveness [20,23,29,30,36,37,40,48,49,57] and leakage [13,18,34,38,39]. Encrypted key-value stores can be viewed as a form of distributed STE scheme. Such schemes were first considered by Agarwal and Kamara in [7], where they studied encrypted distributed hash tables.

3 Preliminaries

Notation. The set of all binary strings of length n is denoted as {0,1}^n, and the set of all finite binary strings as {0,1}^*. [n] is the set of integers {1, . . . , n}, and 2^[n] is the corresponding power set. We write x ← χ to represent an element x being sampled from a distribution χ, and x ←$ X to represent an element x being sampled uniformly at random from a set X. The output x of an algorithm A is denoted by x ← A. If S is a set then |S| refers to its cardinality. If s is a string then |s| refers to its bit length. We denote by Ber(p) the Bernoulli distribution with parameter p.

Dictionaries. A dictionary structure DX of capacity n holds a collection of n label/value pairs {(ℓ_i, v_i)}_{i≤n} and supports get and put operations. We write v_i := DX[ℓ_i] to denote getting the value associated with label ℓ_i and DX[ℓ_i] := v_i to denote the operation of associating the value v_i in DX with label ℓ_i.

Leakage Profiles. Many cryptographic primitives and protocols leak information. Examples include encryption schemes, which reveal the length of the plaintext; secure multi-party computation protocols, which (necessarily) reveal about the parties’ inputs whatever can be inferred from the output(s); order-preserving encryption schemes, which reveal implicit and explicit bits of the plaintext; structured encryption schemes, which reveal correlations between queries; and oblivious algorithms, which reveal their runtime and the volume of data they read. Leakage-parameterized security definitions [23,24] extend the standard provable security paradigm used in cryptography by providing adversaries (and simulators) access to leakage over plaintext data. This leakage is formally and precisely captured by a leakage profile which can then be analyzed through cryptanalysis and further theoretical study. Leakage profiles can themselves be functions of one or several leakage patterns. Here, the only pattern we will consider is the operation equality which reveals if and when two (unknown) operations are for the same label.

Consistency Guarantees. The consistency guarantee of a distributed system specifies the set of acceptable responses that a read operation can output. There are multiple consistency guarantees studied in the literature, including linearizability, sequential consistency, causal consistency and eventual consistency. Though strong consistency notions like linearizability are desirable, the Consistency, Availability, and Partition tolerance (CAP) Theorem states that strong consistency and availability cannot be achieved simultaneously in the presence of network partitions. Therefore, many practical systems settle for weaker consistency guarantees like sequential consistency, causal consistency and eventual consistency. We note that all these weaker consistency guarantees, with the exception of eventual consistency, satisfy what is known as “Read Your Writes” (RYW) consistency, which states that all the writes performed by a single client are visible to its subsequent reads. Many practical systems [1,2,5,44,45] guarantee RYW consistency.
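The operation equality pattern introduced under Leakage Profiles can be made concrete with a short Python sketch. The encoding used here, in which each operation is mapped to the positions of the earlier operations on the same label, is one possible representation chosen for illustration; the function name operation_equality is hypothetical.

```python
def operation_equality(labels):
    """For each operation i, list the indices j < i of operations on the same label."""
    seen = {}       # label -> indices of the operations on it so far
    pattern = []
    for i, label in enumerate(labels):
        earlier = seen.setdefault(label, [])
        pattern.append(list(earlier))   # copy, so later appends do not alter it
        earlier.append(i)
    return pattern


ops = ["a", "b", "a", "c", "b", "a"]
print(operation_equality(ops))  # [[], [], [0], [], [1], [0, 2]]
```

Renaming the labels consistently (for example replacing a, b, c by x, y, z) yields the same output, which reflects that the pattern reveals if and when operations repeat but not the labels themselves.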

4 Key-Value Stores

Here we extend the formal treatment of encrypted DHTs given by Agarwal and Kamara [7] to key-value stores. A key-value store is a distributed storage system that provides a key-value interface and that guarantees resiliency against node failures. It does so by replicating label/value pairs on multiple nodes. Similar to DHTs, there are two kinds of KVSs: perpetual and transient. Perpetual KVSs are composed of a fixed set of nodes that are all known at setup time. Transient KVSs, on the other hand, are designed for settings where nodes are not known a priori and can join and leave at any time. Perpetual KVSs are suitable for “permissioned” settings like the backend infrastructure of large companies whereas transient KVSs are better suited to “permissionless” settings like peer-to-peer networks and permissionless blockchains. In this work, we study the security of perpetual KVSs.

Perpetual KVSs. We formalize KVSs as a collection of six algorithms KVS = (Overlay, Alloc, FrontEnd, Daemon, Put, Get). The first three algorithms, Overlay, Alloc and FrontEnd, are executed only once by the entity responsible for setting up the system. Overlay takes as input an integer n ≥ 1, and outputs a parameter ω from a space Ω. Alloc takes as input parameters ω, n and an integer ρ ≥ 1, and outputs a parameter ψ from a space Ψ. FrontEnd takes as input parameters ω and n and outputs a parameter φ from a space Φ. Intuitively, the parameter φ will be used to determine a front end node for each label. These front end nodes will serve as the clients’ entry points in the network whenever they need to perform an operation on a label. We refer to ω, ψ, φ as the KVS parameters and represent them by Γ = (ω, ψ, φ). Each KVS has an address space A and the KVS parameters in Γ define different components of the KVS over this address space. For example, ω maps node names to addresses in A, ψ maps labels to addresses in A, and φ determines the address of a front-end node (or starting node). The fourth algorithm, Daemon, takes Γ and n as input and is executed by every node in the network. Daemon is halted only when a node wishes to leave the network and it is responsible for setting up its calling node’s state for routing messages and for storing and retrieving label/value pairs from the node’s local storage. The fifth algorithm, Put, is executed by a client to store a label/value pair on the network. Put takes as input Γ and a label/value pair (ℓ, v), and outputs nothing. The sixth algorithm, Get, is executed by a client to retrieve the value associated to a given label from the network. Get takes as input Γ and a label ℓ and outputs a value v. Since all KVS algorithms take Γ as input we sometimes omit it for visual clarity.

Abstracting KVSs. To instantiate a KVS, the parameters ω and ψ must be chosen together with a subset C ⊆ N of active nodes (i.e., the nodes currently in the network) and an active set of labels K ⊆ L (i.e., the labels stored in the KVS). Once a KVS is instantiated, we describe it using a tuple of function families (addr, replicas, route, fe) that are all parameterized by a subset of the parameters in Γ. These functions are defined as

\[ \mathsf{addr}_\omega : N \to A, \qquad \mathsf{replicas}_{\omega,\psi} : L \to 2^A, \qquad \mathsf{route}_\omega : A \times A \to 2^A, \qquad \mathsf{fe}_\phi : L \to A, \]

where addr_ω maps node names from a name space N to addresses from an address space A, replicas_{ω,ψ} maps labels from a label space L to the set of addresses of the ρ nodes that store it, route_ω maps two addresses to the addresses of the nodes on the route between them, and fe_φ maps labels to the addresses of the nodes that forward client requests to the rest of the network.³ For visual clarity we abuse notation and represent the path between two addresses by a set of addresses instead of as a sequence of addresses, but we stress that paths are sequences. Given an address a and a set of addresses S, we also sometimes write route_ω(a, S) to represent the set of routes from a to all the addresses in S. Note that this is an abstract representation of a KVS that will be particularly useful to us to define the random variables we need for our probabilistic analysis but, in practice, the overlay network, including its addressing and routing functions, is implemented by the Daemon algorithm.

³ For KVSs that allow their clients to connect directly to the replicas and do not use front end nodes, the abstraction can drop the fe mapping and be adjusted in the natural way.

We sometimes refer to a pair (ω, C) as an overlay and to a pair (ψ, K) as an allocation. Abstractly speaking, we can think of an overlay as an assignment from active nodes to addresses and of an allocation as an assignment of active labels to addresses. In this sense, overlays and allocations are determined by a pair (ω, C) and (ψ, K), respectively.⁴

Visible Addresses. As in [7], a very useful notion for our purposes will be that of visible addresses. For a fixed overlay (ω, C) and a fixed replication parameter ρ, an address a ∈ A is s-visible to a node N ∈ C if there exists a label ℓ ∈ L such that if ψ allocates ℓ to a, then either: (1) addr_ω(N) ∈ replicas_{ω,ψ}(ℓ); or (2) addr_ω(N) ∈ route_ω(s, replicas_{ω,ψ}(ℓ)). The intuition behind this is that if a label ℓ is mapped to an address in Vis(s, N) then N either stores the label ℓ or routes it when the operation for ℓ starts at address s. We point out that the visibility of a node changes as we change the starting address s. For example, the node may be present on the path to one of the addresses if s is the starting address but not on the path if some other address s′ is the starting address. Throughout we assume the set of visible addresses to be efficiently computable. Since the set of s-visible addresses depends on the parameters ω and ρ, and on the set C of nodes that are currently active, we subscript Vis_{ω,C,ρ}(s, N) with all these parameters. Finally, as in [7], we also extend the notion to the set of s-visible addresses Vis_{ω,C,ρ}(s, S) for a set of nodes S ⊆ C, which is defined simply as Vis_{ω,C,ρ}(s, S) = ∪_{N∈S} Vis_{ω,C,ρ}(s, N). Again, for visual clarity, we will drop the subscripts wherever they are clear from the context.

Front-End Distribution. As in [7], another important notion in our analysis is that of a label’s front-end distribution, which is the probability distribution that governs the address of an operation’s “entry point” into the KVS network. It is captured by the random variable fe_φ(ℓ), where φ is sampled by the algorithm FrontEnd. In this work we assume front-end distributions to be label-independent in the sense that every label’s front-end node distribution is the same.
We therefore simply refer to this distribution as the KVS’s front-end distribution.

Allocation Distribution. The next notion important to our analysis is what we refer to as a label’s allocation distribution, which is the probability distribution that governs the address at which a label is allocated. More precisely, this is captured by the random variable ψ(ℓ), where ψ is sampled by the algorithm Alloc. In this work, we assume allocation distributions are label-independent in the sense that every label’s allocation distribution is the same. We refer to this distribution as the KVS’s allocation distribution.⁵ Given a KVS’s allocation distribution, we also consider a distribution Δ(S) that is parameterized by a set of addresses S ⊆ A. This distribution is over S and has probability mass function

\[ f_{\Delta(S)}(a) = \frac{f_\psi(a)}{\sum_{a' \in S} f_\psi(a')} = \frac{\Pr[\psi(\ell) = a]}{\Pr[\psi(\ell) \in S]}, \]

where f_ψ is the probability mass function of the KVS’s allocation distribution.

⁴ Note that for simplicity, we assume that ψ maps labels to a single address. This, however, can be extended in a straightforward way so that ψ maps a label to multiple addresses. This would be required to model KVSs where replicas of a label are independent of each other.
⁵ This is true for every KVS we are aware of [25, 43, 54, 55].

Non-committing Allocations. As we will see in Sect. 6, our EKVS construction can be based on any KVS but the security of the resulting scheme will depend on certain properties of the underlying KVS. We describe these properties here. The first property that we require of a KVS is that the allocations it produces be non-committing in the sense that it supports a form of equivocation. More precisely, for some fixed overlay (ω, C) and allocation (ψ, K), there should exist some efficient mechanism to arbitrarily change/program ψ. In other words, there should exist a polynomial-time algorithm Program such that, for all (ω, C) and (ψ, K), given a label ℓ ∈ L and an address a ∈ A, Program(ℓ, a) modifies the KVS so that ψ(ℓ) = a. For the special case of consistent hashing based KVSs, which we study in Sect. 7, this can be achieved by modeling one of its hash functions as a random oracle.

Balanced Overlays. The second property is related to how well the KVS load balances the label/value pairs it stores. While load balancing is clearly important for storage efficiency we will see, perhaps surprisingly, that it also has an impact on security. Intuitively, we say that an overlay (ω, C) is balanced if, for all labels ℓ, the probability that any set of θ nodes sees ℓ is not too large.

Definition 1 (Balanced overlays). Let ω ∈ Ω be an overlay parameter, C ⊆ N be a set of active nodes, and ρ ≥ 1 be a replication parameter. We say that an overlay (ω, C) is (ε, θ)-balanced if for all ℓ ∈ L, and for all S ⊆ C with |S| = θ,

\[ \Pr\left[\, \mathsf{replicas}_{\omega,\psi}(\ell) \cap \mathsf{Vis}_{\omega,C,\rho}(\mathsf{fe}_\phi(\ell), S) \neq \emptyset \,\right] \le \varepsilon, \]

where the probability is over the coins of Alloc and FrontEnd, and where ε can depend on θ.

Definition 2 (Balanced KVS). We say that a key-value store KVS = (Overlay, Alloc, FrontEnd, Daemon, Put, Get) is (ε, δ, θ)-balanced if for all C ⊆ N, the probability that an overlay (ω, C) is (ε, θ)-balanced is at least 1 − δ over the coins of Overlay, where ε and δ can depend on C and θ.
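To build intuition for Definition 1, the following Python sketch estimates by simulation how often a fixed set S of corrupted nodes "sees" a label in a toy overlay. Everything specific here is an assumption made for illustration and not the instantiation studied in this work: addresses are 64-bit hashes, a label is replicated on the ρ nodes that follow its allocated address on a ring (consistent hashing), routing is zero-hop and is taken to involve only the front-end node and the replicas, and the front end is a uniformly random active node. The names ToyOverlay and estimate_epsilon are hypothetical.

```python
import hashlib
import random
from bisect import bisect_left


def h(*parts: str) -> int:
    """Hash strings to a 64-bit address in the toy address space A."""
    data = "|".join(parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class ToyOverlay:
    """Hypothetical overlay (omega, C): consistent hashing with zero-hop routing."""

    def __init__(self, nodes, rho):
        self.rho = rho                                  # replication parameter
        self.addr = {n: h("node", n) for n in nodes}    # addr: node name -> address
        self.ring = sorted(self.addr.values())          # active addresses in ring order

    def replicas(self, label, salt):
        """Addresses of the rho nodes that follow psi(label) on the ring."""
        start = bisect_left(self.ring, h("alloc", salt, label))
        n = len(self.ring)
        return {self.ring[(start + i) % n] for i in range(self.rho)}

    def route(self, s, targets):
        """Zero-hop routing: only the start address and the targets are involved."""
        return {s} | set(targets)

    def sees(self, corrupted, label, salt, rng):
        """True if some corrupted node stores `label` or routes it from its front end."""
        fe = rng.choice(self.ring)                      # label-independent front end
        involved = self.route(fe, self.replicas(label, salt))
        return any(self.addr[n] in involved for n in corrupted)


def estimate_epsilon(overlay, corrupted, label, trials=2000, seed=0):
    """Monte Carlo estimate over the coins of Alloc (salt) and FrontEnd (fe)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        salt = str(rng.random())    # fresh allocation coins in each trial
        hits += overlay.sees(corrupted, label, salt, rng)
    return hits / trials


nodes = [f"node{i}" for i in range(50)]
overlay = ToyOverlay(nodes, rho=3)
print(estimate_epsilon(overlay, nodes[:5], "some-label"))
```

The overlay is fixed while the allocation and front-end coins are resampled, mirroring the probability space of Definition 1. Repeating the estimate for several sets S of size θ gives an empirical feel for the smallest ε for which this toy overlay would be (ε, θ)-balanced.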

5 Encrypted Key-Value Stores

In this section, we formally define encrypted key-value stores. An EKVS is an end-to-end encrypted distributed system that instantiates a replicated dictionary data structure.

5.1 Syntax and Security Definitions

Syntax. We formalize EKVSs as a collection of seven algorithms EKVS = (Gen, Overlay, Alloc, FrontEnd, Daemon, Put, Get). The first algorithm Gen is executed by a client and takes as input a security parameter 1^k and outputs a secret key K. All the other algorithms have the same syntax as before (see Sect. 4), with the difference that Get and Put also take the secret key K as input.

Fig. 1. F^L_KVS: The KVS functionality parameterized with leakage function L.

Security. The definition is roughly the same as the one in [7] and is based on the real/ideal-world paradigm. This approach consists of defining two probabilistic experiments Real and Ideal where the former represents a real-world execution of the protocol where the parties are in the presence of an adversary, and the latter represents an ideal-world execution where the parties interact with a trusted functionality shown in Fig. 1. The protocol is secure if no environment can distinguish between the outputs of these two experiments. To capture the fact that a protocol could leak information to the adversary, we parameterize the definition with a leakage profile that consists of a leakage function L that captures the information leaked by the Put and Get operations. Our motivation for making the leakage explicit is to highlight its presence. Due to space constraints, we detail both the experiments more formally in the full version of the paper.

Definition 3 (L-security). We say that an encrypted key-value store EKVS = (Gen, Overlay, Alloc, FrontEnd, Daemon, Put, Get) is L-secure, if for all ppt adversaries A and all ppt environments Z, there exists a ppt simulator Sim such that for all z ∈ {0,1}^*,

\[ \left|\, \Pr[\mathbf{Real}_{A,Z}(k) = 1] - \Pr[\mathbf{Ideal}_{Sim,Z}(k) = 1] \,\right| \le \mathsf{negl}(k). \]

Correctness. In the real/ideal-world paradigm, the security of a protocol is tied to its correctness. It is therefore important that our ideal functionality capture the correctness of the KVS as well. What this means is that the functionality should produce outputs that follow the same distribution as the outputs from a KVS.
Unfortunately, in a setting with multiple clients sharing the data, even with the strongest consistency guarantees (e.g., linearizability), there are multiple possible responses for a read, and the one which the KVS actually outputs depends on the behaviour of the network. Since the network behaviour is non-deterministic, the distribution over the possible outputs is also non-deterministic and hence


the functionality cannot model the distribution over outputs correctly without modelling the network inside it. However, if we restrict ourselves to a single-client setting, the RYW property ensures that a Get always outputs the latest value written to the KVS. Therefore the functionality $F_{KVS}$ models the correct distribution over the outputs: on a Get($\ell$), it outputs the last value written to DX[$\ell$], and on a Put($\ell$, v), it updates DX[$\ell$] to v.
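The single-client behaviour described above can be sketched in a few lines of Python. This is only an illustrative sketch and not a reproduction of Fig. 1: the class name `IdealKVS`, the `leak` callback and the `leaked` log (standing in for what a simulator would receive) are assumptions made for the example.

```python
class IdealKVS:
    """Toy single-client ideal functionality: a dictionary DX plus a leakage hook.

    `leak` is a hypothetical callable (DX, (op, label, value)) -> leakage;
    its outputs are collected in `leaked`, standing in for the simulator's view.
    """

    def __init__(self, leak):
        self.dx = {}
        self.leak = leak
        self.leaked = []

    def put(self, label, value):
        # Record the leakage for this operation, then update DX[label] to value.
        self.leaked.append(self.leak(self.dx, ("put", label, value)))
        self.dx[label] = value

    def get(self, label):
        # Record the leakage, then return the last value written to DX[label].
        self.leaked.append(self.leak(self.dx, ("get", label, None)))
        return self.dx.get(label)
```

For instance, with `f = IdealKVS(lambda dx, op: op[0])`, two calls `f.put("a", 1)` and `f.put("a", 2)` followed by `f.get("a")` return the latest value, which is the read-your-writes behaviour the functionality is meant to model.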

6 The Standard EKVS Scheme in the Single-User Setting

We now describe the standard approach to storing sensitive data on a KVS. This approach relies on simple cryptographic primitives and a non-committing and balanced KVS.

Overview. The scheme EKVS = (Gen, Overlay, Alloc, FrontEnd, Daemon, Put, Get) is described in detail in Fig. 2 and, at a high level, works as follows. It makes black-box use of a key-value store KVS = (Overlay, Alloc, FrontEnd, Daemon, Put, Get), a pseudo-random function F and a symmetric-key encryption scheme SKE = (Gen, Enc, Dec). The Gen algorithm takes as input a security parameter $1^k$ and uses it to generate a key $K_1$ for the pseudo-random function F and a key $K_2$ for the symmetric encryption scheme SKE. It then outputs a key $K = (K_1, K_2)$. The Overlay, Alloc, FrontEnd and Daemon algorithms respectively execute KVS.Overlay, KVS.Alloc, KVS.FrontEnd and KVS.Daemon to generate and output the parameters ω, ψ and φ. The Put algorithm takes as input the secret key K and a label/value pair $(\ell, v)$. It first computes $t := F_{K_1}(\ell)$ and $e \leftarrow \mathrm{Enc}(K_2, v)$ and then executes KVS.Put(t, e). The Get algorithm takes as input the secret key K and a label $\ell$. It computes $t := F_{K_1}(\ell)$ and executes $e \leftarrow$ KVS.Get(t). It then outputs SKE.Dec($K_2$, e).

Security. We now describe the leakage of EKVS. Intuitively, it reveals to the adversary the times at which a label is stored or retrieved with some probability. More formally, it is defined with the following stateful leakage function:

– $L_\varepsilon(DX, (op, \ell, v))$:
  1. if $\ell$ has never been seen
     (a) sample and store b ← Ber(ε)
  2. if b = 1
     (a) if op = put, output (put, opeq($\ell$))
     (b) else if op = get, output (get, opeq($\ell$))
  3. else if b = 0
     (a) output ⊥

where opeq is the operation equality pattern which reveals if and when a label was queried or put in the past.

Discussion. We now explain why the leakage function is probabilistic and why it depends on the balance of the underlying KVS. Intuitively, one expects that the


Fig. 2. The standard EKVS scheme

adversary’s view is only affected by get and put operations on labels that are either: (1) allocated to a corrupted node; or (2) allocated to an uncorrupted node whose path includes a corrupted node. In such a case, the adversary’s view would not be affected by all operations but only a subset of them. Our leakage function captures this intuition precisely and it is probabilistic because, in the real world, the subset of operations that affect the adversary’s view is determined by the choice of overlay, allocation and front-end function—all of which are chosen at random. The way this is handled in the leakage function is by sampling a bit b with some probability and revealing leakage on the current operation if b = 1. This determines the subset of operations whose leakage will be visible to the adversary. Now, for the simulation to go through, the operations simulated by the simulator need to be visible to the adversary with the same probability as in the real execution. But these probabilities depend on ω, ψ and φ which are not known to the leakage function. Note that this implies a rather strong definition in the sense that the scheme hides information about the overlay, the allocation and front-end function of the KVS.
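The probabilistic leakage function discussed here can be sketched as follows. This is a minimal sketch under stated assumptions: the bit b is stored per label, opeq is encoded as the tuple of earlier operation indices on the same label, and the class name `EpsLeakage` and its fields are hypothetical.

```python
import random


class EpsLeakage:
    """Sketch of a stateful leakage function with visibility probability eps."""

    def __init__(self, eps, rng=None):
        self.eps = eps
        self.rng = rng or random.Random()
        self.bit = {}      # label -> stored bit b (assumed to be kept per label)
        self.history = {}  # label -> indices of earlier operations on that label
        self.clock = 0     # index of the current operation

    def __call__(self, dx, operation):
        op, label, _value = operation
        t = self.clock
        self.clock += 1
        # Step 1: the first time a label is seen, sample and store b <- Ber(eps).
        if label not in self.bit:
            self.bit[label] = 1 if self.rng.random() < self.eps else 0
        # opeq(label): when this label was put or queried in the past.
        earlier = tuple(self.history.setdefault(label, []))
        self.history[label].append(t)
        # Step 2: visible label, reveal the operation type and opeq.
        if self.bit[label] == 1:
            return (op, earlier)
        # Step 3: invisible label, output nothing (None plays the role of ⊥).
        return None
```

With `eps = 1.0` every label is visible and the function reveals the full operation equality pattern; with `eps = 0.0` it always returns `None`.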


Since ω, ψ and φ are unknown to the leakage function, the leakage function can only guess what they could be. But because the KVS is guaranteed to be (ε, δ, θ)-balanced, the leakage function can assume that, with probability at least 1 − δ, the overlay will be (ε, θ)-balanced which, in turn, guarantees that the probability that a label is visible to any adversary with at most θ corruptions is at most ε. Therefore, in our leakage function, we can set the probability that b = 1 to be ε in the hope that the simulator can “adjust” the probability internally to be in accordance with the ω that it sampled. Note that the simulator can adjust the probability only if, for its own chosen ω, the probability that a query is visible to the adversary is less than ε. But this will happen with probability at least 1 − δ, so the simulation will work with probability at least 1 − δ. We are now ready to state our main security theorem, whose proof is in the full version of the paper.

Theorem 1. If |I| ≤ θ and if KVS is RYW consistent, (ε, δ, θ)-balanced, has non-committing allocations and has label-independent allocation and front-end distributions, then EKVS is $L_\varepsilon$-secure with probability at least 1 − δ − negl(k).

Efficiency. The standard scheme does not add any overhead to the time, round, communication and storage complexities of the underlying KVS.
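The Put and Get algorithms of the standard scheme can be illustrated with a short Python sketch. Everything concrete here is an assumption made for the example: HMAC-SHA256 stands in for the pseudo-random function F, a toy HMAC-based stream cipher stands in for SKE (it is unauthenticated and for illustration only), and a plain dictionary stands in for the underlying KVS.

```python
import hashlib
import hmac
import secrets


def gen(k=32):
    """Gen: sample K = (K1, K2), a PRF key and an encryption key of k bytes each."""
    return secrets.token_bytes(k), secrets.token_bytes(k)


def prf(k1, label):
    """F_{K1}(label), instantiated here (as an assumption) with HMAC-SHA256."""
    return hmac.new(k1, label, hashlib.sha256).digest()


def _keystream(k2, nonce, n):
    # Toy keystream: HMAC in counter mode. Not a vetted cipher.
    out, counter = b"", 0
    while len(out) < n:
        block = nonce + counter.to_bytes(8, "big")
        out += hmac.new(k2, block, hashlib.sha256).digest()
        counter += 1
    return out[:n]


def enc(k2, value):
    """Randomized encryption: fresh nonce followed by value XOR keystream."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(k2, nonce, len(value))
    return nonce + bytes(a ^ b for a, b in zip(value, stream))


def dec(k2, ciphertext):
    nonce, body = ciphertext[:16], ciphertext[16:]
    stream = _keystream(k2, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


def put(key, kvs, label, value):
    """Put: store Enc(K2, v) under the token t = F_{K1}(label)."""
    k1, k2 = key
    kvs[prf(k1, label)] = enc(k2, value)


def get(key, kvs, label):
    """Get: fetch the ciphertext stored under t = F_{K1}(label) and decrypt it."""
    k1, k2 = key
    e = kvs.get(prf(k1, label))
    return None if e is None else dec(k2, e)
```

The store only ever sees PRF outputs and ciphertexts, never the label or the value. A real deployment would replace the toy cipher with a vetted symmetric encryption scheme.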

7 A Concrete Instantiation Based on Consistent Hashing

In this section, we analyze the security of the standard EKVS when its underlying KVS is instantiated with a consistent hashing based KVS (CH-KVS). We first give a brief overview of consistent hashing and then show that: (1) it has non-committing allocations in the random oracle model; and (2) it is balanced under two commonly used routing protocols.

Setting Up a CH-KVS. For CH-KVSs, the space Ω is the set of all hash functions $\mathcal{H}_1$ from N to $A = \{0, \ldots, 2^m - 1\}$. Overlay samples a hash function $H_1$ uniformly at random from $\mathcal{H}_1$ and outputs $\omega = H_1$. The map $\mathrm{addr}_\omega$ is the hash function itself, so CH-KVSs assign to each active node N ∈ C an address $H_1(N)$ in A. We call the set $\chi_C = \{H_1(N_1), \ldots, H_1(N_n)\}$ of addresses assigned to active nodes a configuration. The parameter space Ψ is the set of all hash functions $\mathcal{H}_2$ from L to $A = \{0, \ldots, 2^m - 1\}$. Alloc samples a hash function $H_2$ uniformly at random from $\mathcal{H}_2$ and outputs $\psi = H_2$. The map $\mathrm{replicas}_{\omega,\psi}$ maps every label $\ell$ in L to the addresses of the ρ active nodes that follow $H_2(\ell)$ in the clockwise direction. More formally, $\mathrm{replicas}_{\omega,\psi}$ is the mapping $(\mathrm{succ}_{\chi_C} \circ H_2, \ldots, \mathrm{succ}^{\rho}_{\chi_C} \circ H_2)$, where $\mathrm{succ}_{\chi_C}$ is the successor function that assigns each address in A to its least upper bound in $\chi_C$. Here, $\{0, \ldots, 2^m - 1\}$ is viewed as a “ring” in the sense that the successor of $2^m - 1$ is 0.

CH-KVSs allow their clients to choose any node as the front-end node to issue their operations. Moreover, they do not restrict them to connect to the same node $\mathrm{fe}_\varphi(\ell)$ every time the client wants to query the same $\ell$. This means that for


CH-KVSs, $\mathrm{fe}_\varphi$ is not necessarily a function but can be a one-to-many relation. Unfortunately, we cannot prove CH-KVSs to be balanced for arbitrary $\mathrm{fe}_\varphi$. We therefore modify CH-KVSs and model their space Φ as the set of all hash functions $\mathcal{H}_3$ from L to addresses of active nodes. FrontEnd samples a hash function $H_3$ uniformly at random from $\mathcal{H}_3$ and outputs $\varphi = H_3$. The map $\mathrm{fe}_\varphi$ is the hash function $H_3$ itself, so it assigns a front-end node with address $H_3(\ell)$ to each label $\ell$.

Routing Protocols. There are two common routing protocols for CH-KVSs, each with trade-offs in storage and efficiency.

– Multi-hop routing. Based on $H_1$, the Daemon algorithm constructs a routing table by storing the addresses of the node’s $2^i$-th successors, where 0 ≤ i ≤ log n (we refer the reader to [53] for more details). Note that a routing table contains at most log n other nodes. The routing protocol is fairly simple: given a message destined to a node $N_d$, a node N checks if $N = N_d$. If not, the node forwards the message to the node $N'$ in its routing table with an address closest to $N_d$. Note that the $\mathrm{route}_\omega$ map is deterministic given a fixed set of active nodes and it guarantees that any two nodes have a path of length at most log n.

– Zero-hop routing. Based on $H_1$, the Daemon algorithm constructs a routing table by storing the addresses of all the other nodes. Routing is then straightforward: given a message for $N_d$, simply forward it to the address of $N_d$. In short, for any two addresses s and d, $\mathrm{route}_\omega(s, d) = \{s, d\}$.

Storing and Retrieving. When a client wants to execute a Get/Put operation on a label $\ell$, it forwards the operation to the front-end node of $\ell$. The front-end node executes the operation on the client’s behalf as follows. It computes replicas($\ell$) and forwards the operation to one of them. This replica is called the coordinator node.
The coordinator then sends the operation to all (or a subset of) the other replicas, which then either update their state (on Put) or return a response back to the coordinator (on Get). In case more than one value is returned to the coordinator, it decides which value(s) is to be returned to the front-end. The choice of the coordinator node for a label $\ell$ varies from KVS to KVS. It can be a fixed node or a different node between requests for label $\ell$. Either way, it is always a node chosen from the set of replicas. This guarantees that the visibility of a label (and hence the leakage) does not change between requests. KVSs also employ different synchronization mechanisms, like Merkle trees and read repairs, to synchronize divergent replicas.

Non-committing Allocation. Given a label $\ell$ and an address a, the allocation $(H_2, K)$ can be changed by programming the random oracle $H_2$ to output a when it is queried on $\ell$.

Allocation Distribution. We now describe the allocation distribution of CH-KVSs. Since CH-KVSs assign labels to addresses using a random oracle $H_2$, it follows that for all overlays $(H_1, C)$, all labels $\ell \in L$ and addresses a ∈ A,

$$f_{H_2}(a) = \Pr[H_2(\ell) = a] = \frac{1}{|A|},$$

which implies that CH-KVSs have label-independent allocations. From this it also follows that Δ(S) has a probability mass function

$$f_{\Delta(S)}(a) = \frac{f_\psi(a)}{\sum_{a \in S} f_\psi(a)} = \frac{1}{|A|} \left( \frac{|S|}{|A|} \right)^{-1} = \frac{1}{|S|}.$$
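The CH-KVS setup described in this section (node addresses from $H_1$, label addresses from $H_2$, replicas chosen by walking clockwise from the successor) can be sketched as below. This is an illustrative sketch only: SHA-256 with a domain-separation tag stands in for the random oracles, the address space uses an assumed m = 32, and the names `Ring`, `h` and `succ_index` are hypothetical.

```python
import bisect
import hashlib

M = 32  # assumed parameter: address space A = {0, ..., 2^M - 1}


def h(tag, x):
    """Hash a string into A; the tag separates the roles of H1 and H2."""
    digest = hashlib.sha256(f"{tag}|{x}".encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** M)


class Ring:
    def __init__(self, nodes):
        # The configuration: sorted addresses H1(N) of the active nodes.
        self.addr_to_node = {h("H1", n): n for n in nodes}
        self.config = sorted(self.addr_to_node)

    def succ_index(self, address):
        """Index of the least upper bound of `address` in the configuration.

        Addresses above the largest node address wrap around to index 0,
        which is what makes the address space a ring.
        """
        i = bisect.bisect_left(self.config, address)
        return i % len(self.config)

    def replicas(self, label, rho):
        """The rho nodes that follow H2(label) clockwise (assumes rho <= #nodes)."""
        i = self.succ_index(h("H2", label))
        n = len(self.config)
        return [self.addr_to_node[self.config[(i + j) % n]] for j in range(rho)]
```

For example, `Ring([f"node{i}" for i in range(8)]).replicas("some-label", 3)` returns three distinct node names, and the same label always maps to the same replicas as long as the set of active nodes is unchanged.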

Before describing the visibility of nodes in CH-KVSs and analyzing their balance under zero-hop and multi-hop routing protocols, we define notation that will be useful in our analysis.

Notation. The arc of a node N is the set of addresses in A between N’s predecessor and itself. Note that the arc of a node depends on a configuration χ. More formally, we write $\mathrm{arc}_\chi(N) = (\mathrm{pred}_\chi(H_1(N)), \ldots, H_1(N)]$, where $\mathrm{pred}_\chi$ is the predecessor function which assigns each address in A to its largest lower bound in χ. We extend the notion of the arc of a node to ρ-arcs of a node. A ρ-arc of a node N is the set of addresses between N’s ρth predecessor and itself. More formally, we write $\mathrm{arc}^{\rho}_\chi(N) = (\mathrm{pred}^{\rho}_\chi(H_1(N)), \ldots, H_1(N)]$, where $\mathrm{pred}^{\rho}_\chi(H_1(N))$ represents the predecessor function applied ρ times on $H_1(N)$. Intuitively, if $H_2$ hashes a label $\ell$ anywhere in the ρ-arc of N, then N becomes one of the ρ replicas of $\ell$. We denote by maxareas(χ, x) the sum of the lengths (sizes) of the x largest arcs in configuration χ. The maximum area of a configuration χ is equal to maxareas(χ, ρθ). As we will later see, the maximum area is central to analyzing the balance of CH-KVSs.

7.1 Zero-Hop CH-KVSs

In this section, we analyse the visibility and balance of zero-hop CH-KVSs.

Visible Addresses. Given a fixed overlay $(H_1, C)$, an address s ∈ A and a node N ∈ C, if the starting address is $s = H_1(N)$, then $\mathrm{Vis}_{\chi_C}(s, N) = A$. This is because $H_1(N)$ lies on $\mathrm{route}_{\chi_C}(s, a)$ for all a ∈ A. Now for an address s ∈ A such that $s \neq H_1(N)$, we have

$$\begin{aligned}
\mathrm{Vis}_{\chi_C}(s, N) &= \Big\{ \mathrm{arc}^{\rho}_{\chi_C}(N') : H_1(N) \in \mathrm{route}_{\chi_C}(s, H_1(N')) \Big\} \cup \mathrm{arc}^{\rho}_{\chi_C}(N) \\
&= \Big\{ \mathrm{arc}^{\rho}_{\chi_C}(N') : H_1(N) \in \{s, H_1(N')\} \Big\} \cup \mathrm{arc}^{\rho}_{\chi_C}(N) \\
&= \Big\{ \mathrm{arc}^{\rho}_{\chi_C}(N') : H_1(N) = H_1(N') \Big\} \cup \mathrm{arc}^{\rho}_{\chi_C}(N) \\
&= \mathrm{arc}^{\rho}_{\chi_C}(N),
\end{aligned}$$

where the second equality follows from the fact that $\mathrm{route}_{\chi_C}(s, H_1(N')) = \{s, H_1(N')\}$, the third follows from the assumption that $H_1(N) \neq s$, and the


fourth from the fact that $\mathrm{arc}^{\rho}_{\chi_C}(N) = \mathrm{arc}^{\rho}_{\chi_C}(N')$ if $H_1(N) = H_1(N')$. Finally, for any set S ⊆ C, $\mathrm{Vis}_{\omega,C}(s, S) = \bigcup_{N \in S} \mathrm{Vis}_{\omega,C}(s, N)$.

Balance of Zero-Hop CH-KVSs. Before analyzing the balance of CH-KVSs, we first recall a lemma from Agarwal and Kamara [7] that upper bounds the sum of the lengths of the x largest arcs in a configuration χ in Chord. The sum is denoted by maxareas(χ, x). Since Chord is also based on consistent hashing, we use it to bound the maximum area of CH-KVSs by substituting x = ρθ.

Lemma 1 [7]. Let C ⊆ N be a set of active nodes. Then, for x ≤ |C|/e,

$$\Pr\left[ \mathrm{maxareas}(\chi_C, x) \le \frac{6|A|x}{|C|} \log \frac{|C|}{x} \right] \ge 1 - \frac{1}{|C|^2} - \left( e^{-\sqrt{|C|}} \cdot \log |C| \right).$$

We are now ready to analyze the balance of zero-hop CH-KVSs.

Theorem 2. Let C ⊆ N be a set of active nodes. If $\mathrm{maxareas}(\chi_C, \rho\theta) \le \lambda$, then $\chi_C$ is (ε, θ)-balanced with

$$\varepsilon = \frac{\theta}{|C|} + \frac{\lambda}{|A|}.$$

The proof of Theorem 2 is in the full version of the paper.

Corollary 1. Let C be a set of active nodes. For all ρθ ≤ |C|/e, a zero-hop CH-KVS is (ε, δ, θ)-balanced for

$$\varepsilon = \frac{\theta}{|C|} \left( 1 + 6\rho \log \frac{|C|}{\rho\theta} \right) \quad \text{and} \quad \delta = \frac{1}{|C|^2} + \left( e^{-\sqrt{|C|}} \cdot \log |C| \right).$$

The proof of Corollary 1 is in the full version of the paper.

Remark. It follows from Corollary 1 that

$$\varepsilon = O\left( \frac{\rho\theta}{|C|} \log \frac{|C|}{\rho\theta} \right)$$

and $\delta = O(1/|C|^2)$. Note that assigning labels uniformly at random to ρ nodes would achieve ε = ρθ/|C|, so zero-hop CH-KVSs balance data fairly well.

The Security of a Zero-Hop CH-KVS Based EKVS. In the following corollary, we formally state the security of the standard scheme when its underlying KVS is instantiated with a zero-hop CH-KVS.

Corollary 2. If $|L| = \Theta(2^k)$, |I| ≤ |C|/(ρe), and if EKVS is instantiated with a RYW zero-hop CH-KVS, then it is $L_\varepsilon$-secure with probability at least $1 - 1/|C|^2 - (e^{-\sqrt{|C|}} \cdot \log |C|) - \mathrm{negl}(k)$ in the random oracle model, where

$$\varepsilon = \frac{|I|}{|C|} \left( 1 + 6\rho \log \frac{|C|}{\rho|I|} \right).$$
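To make the quantities in Theorem 2 concrete, the following sketch computes arc lengths, maxareas and the resulting ε for a given configuration. It is an illustration under assumptions: a configuration is passed as a list of distinct integer addresses, `space` stands for |A|, λ is taken to be the exact value of maxareas(χ, ρθ) (which trivially satisfies the hypothesis of the theorem), and all function names are hypothetical.

```python
def arc_lengths(config, space):
    """Length of each node's arc (pred, addr] on a ring with `space` addresses."""
    cfg = sorted(config)
    # cfg[i - 1] is the predecessor; for i = 0 this wraps around to the last node.
    # A single node owns the whole ring, hence the `or space` fallback.
    return [((cfg[i] - cfg[i - 1]) % space) or space for i in range(len(cfg))]


def maxareas(config, space, x):
    """Sum of the lengths of the x largest arcs of the configuration."""
    return sum(sorted(arc_lengths(config, space), reverse=True)[:x])


def epsilon_zero_hop(config, space, rho, theta):
    """Theorem 2 bound: theta/|C| + lambda/|A| with lambda = maxareas(config, rho*theta)."""
    lam = maxareas(config, space, rho * theta)
    return theta / len(config) + lam / space
```

For the toy configuration `[0, 10, 30, 60]` on a ring of 100 addresses, the arcs have lengths 40, 10, 20 and 30, so the two largest arcs cover 70 addresses, and with ρ = 2 and θ = 1 the bound evaluates to 1/4 + 70/100 = 0.95. Such a small ring is far from the regime of Corollary 1 and only serves to show how the terms are computed.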


The proof of Corollary 2 is in the full version of the paper. From the discussion of Corollary 1, we know that

$$\varepsilon = O\left( \frac{\rho|I|}{|C|} \log \frac{|C|}{\rho|I|} \right)$$

and $\delta = O(1/|C|^2)$. Setting |I| = |C|/(ρα), for some α ≥ e, we have ε = O(log(α)/α). Recall that, on each query, the leakage function leaks the operation equality with probability at most ε. So intuitively this means that the adversary can expect to learn the operation equality of an O(log(α)/α) fraction of client operations if ρ|I| = |C|/α. Note that this confirms the intuition that distributing data suppresses its leakage.

7.2 Multi-hop CH-KVSs

In this section, we analyse the visibility and balance of multi-hop CH-KVSs. Since most of the details are similar to those of the last section, we keep the description high level.

Visible Addresses. Given a fixed overlay $(H_1, C)$, an address s ∈ A and a node N ∈ C, if the starting address is $s = H_1(N)$, then $\mathrm{Vis}_{\chi_C}(s, N) = A$. For an address s ∈ A such that $s \neq H_1(N)$, we have

$$\mathrm{Vis}_{\chi_C}(s, N) = \Big\{ \mathrm{arc}^{\rho}_{\chi_C}(N') : H_1(N) \in \mathrm{route}_{\chi_C}(s, H_1(N')) \Big\} \cup \mathrm{arc}^{\rho}_{\chi_C}(N).$$

Finally, for any set S ⊆ C, $\mathrm{Vis}_{\omega,C}(s, S) = \bigcup_{N \in S} \mathrm{Vis}_{\omega,C}(s, N)$.

Balance of Multi-hop CH-KVSs. We now analyze the balance of multi-hop CH-KVSs.

Theorem 3. Let C ⊆ N be a set of active nodes. If $\mathrm{maxareas}(\chi_C, \rho\theta) \le \lambda$, then $\chi_C$ is (ε, θ)-balanced with

$$\varepsilon = \frac{\rho\theta \log |C|}{|C|} + \frac{\lambda}{|A|}.$$

The proof of Theorem 3 is in the full version of the paper.

Corollary 3. Let C be a set of active nodes. For all ρθ ≤ |C|/(e log |C|), a multi-hop CH-KVS is (ε, δ, θ)-balanced for

$$\varepsilon = \frac{\rho\theta}{|C|} \left( \log |C| + 6 \log \frac{|C|}{\rho\theta} \right) \quad \text{and} \quad \delta = \frac{1}{|C|^2} + \left( e^{-\sqrt{|C|}} \cdot \log |C| \right).$$

The corollary follows directly from Corollary 1 and Theorem 3. Notice that multi-hop CH-KVSs are not only less balanced than zero-hop CH-KVSs but also tolerate fewer corruptions. This is the case because in a multi-hop CH-KVS there is a higher chance that an adversary sees a label since the routes are longer.
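The difference in route length between the two routing protocols can be illustrated with a small sketch. It makes simplifying assumptions: nodes are identified by their position 0, ..., n − 1 in the sorted configuration rather than by address, each node knows its $2^i$-th successors, and forwarding is greedy without overshooting the destination (a Chord-style rule in the spirit of [53], not necessarily the exact rule of any deployed KVS). The function names are hypothetical.

```python
def multi_hop_route(n, src, dst):
    """Positions visited from src to dst (endpoints included) on a ring of n nodes."""
    path = [src]
    cur = src
    while cur != dst:
        gap = (dst - cur) % n                # clockwise distance still to cover
        step = 1 << (gap.bit_length() - 1)   # largest 2^i-th successor not past dst
        cur = (cur + step) % n
        path.append(cur)
    return path


def zero_hop_route(src, dst):
    """Zero-hop routing: the message goes straight from src to dst."""
    return [src] if src == dst else [src, dst]
```

On a ring of 16 nodes, `multi_hop_route(16, 0, 15)` visits positions 0, 8, 12, 14 and 15, so three intermediate nodes see the message, whereas `zero_hop_route(0, 15)` involves only the two endpoints. Every extra node on a route is one more place where a corrupted node may observe a label, which is the intuition behind the weaker balance of multi-hop CH-KVSs.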


Remark. It follows from Corollary 3 that

$$\varepsilon = O\left( \frac{\rho\theta \log |C|}{|C|} \right)$$

and $\delta = O(1/|C|^2)$. As discussed earlier, the optimal balance is ε = ρθ/|C|, which is achieved when labels are assigned uniformly at random to ρ nodes. Note that the balance of multi-hop CH-KVSs is only a log |C| factor away from the optimal balance, which is very good given that the optimal balance is achieved with no routing at all.

The Security of a Multi-hop CH-KVS Based EKVS. In the following corollary, we formally state the security of the standard scheme when its underlying KVS is instantiated with a multi-hop CH-KVS.

Corollary 4. If $|L| = \Theta(2^k)$, |I| ≤ |C|/(ρe log |C|), and if EKVS is instantiated with a RYW multi-hop CH-KVS, then it is $L_\varepsilon$-secure with probability at least $1 - 1/|C|^2 - (e^{-\sqrt{|C|}} \cdot \log |C|) - \mathrm{negl}(k)$ in the random oracle model, where

$$\varepsilon = \frac{\rho|I|}{|C|} \left( \log |C| + 6 \log \frac{|C|}{\rho|I|} \right).$$

From the discussion of Theorem 3, we know that

$$\varepsilon = O\left( \frac{\rho|I| \log |C|}{|C|} \right)$$

and $\delta = O(1/|C|^2)$. Setting |I| = |C|/(ρα log |C|), for some α ≥ e, we have ε = O(1/α), which intuitively means that the adversary can expect to learn the operation equality of an O(1/α) fraction of client operations.
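The two substitutions behind the asymptotic leakage fractions can be written out explicitly. The following short derivation only plugs the stated choices of |I| into the big-O expressions given in the text for the zero-hop and multi-hop cases; it is a worked check, not an additional result.

```latex
% Zero-hop: set \rho |I| = |C| / \alpha with \alpha \ge e.
\varepsilon
  = O\!\left( \frac{\rho |I|}{|C|} \log \frac{|C|}{\rho |I|} \right)
  = O\!\left( \frac{1}{\alpha} \log \alpha \right)

% Multi-hop: set \rho |I| = |C| / (\alpha \log |C|) with \alpha \ge e.
\varepsilon
  = O\!\left( \frac{\rho |I| \log |C|}{|C|} \right)
  = O\!\left( \frac{|C|}{\alpha \log |C|} \cdot \frac{\log |C|}{|C|} \right)
  = O\!\left( \frac{1}{\alpha} \right)
```

In both cases the fraction of operations whose operation equality is leaked shrinks as α grows, at the price of tolerating fewer corruptions |I|; the multi-hop case pays an extra log |C| factor in the number of corruptions it can tolerate.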

8 The Standard EKVS Scheme in the Multi-user Setting

We now analyze the security of the standard scheme in a more general setting, i.e., where we no longer require the underlying KVS to satisfy RYW and where we no longer assume that a single client operates on the data. We call this setting the multi-user setting, where multiple clients operate on the same data concurrently. We start by extending our security definition to the multi-user setting and then analyze the security of the standard scheme (from Fig. 2) in this new setting.

The Ideal Multi-user KVS Functionality. The ideal multi-user KVS functionality $F^{L}_{mKVS}$ is described in Fig. 3. The functionality stores all the values that were ever written to a label. It also associates a time τ with every value indicating when the value was written. On a Get operation, it sends leakage to the simulator, which returns a time $\tau'$. The functionality then returns the value associated with $\tau'$ to the client. Notice that, unlike the single-user ideal functionality $F^{L}_{KVS}$, the multi-user ideal functionality can be influenced by the simulator.
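The Get behaviour of the multi-user functionality can be sketched as follows. This is not a reproduction of Fig. 3: the class name `IdealMultiUserKVS`, the use of a counter as the time τ, and modelling the simulator as a callable from leakage to a time $\tau'$ are all assumptions, and the sketch covers only the Get interaction described above.

```python
class IdealMultiUserKVS:
    """Toy multi-user ideal functionality that keeps every value ever written."""

    def __init__(self, leak, simulator):
        self.dx = {}                # label -> list of (tau, value) pairs
        self.tau = 0                # logical clock (assumed to count Put operations)
        self.leak = leak            # hypothetical leakage function
        self.simulator = simulator  # hypothetical: leakage -> time tau'

    def put(self, label, value):
        # Store the value together with the time at which it was written.
        self.tau += 1
        self.dx.setdefault(label, []).append((self.tau, value))

    def get(self, label):
        # Send the leakage to the simulator, which answers with a time tau'.
        tau_prime = self.simulator(self.leak(self.dx, ("get", label, None)))
        # Return the value written at time tau', if any.
        for tau, value in self.dx.get(label, []):
            if tau == tau_prime:
                return value
        return None
```

A simulator that always answers with time 1 makes `get` return the first value ever written even after later writes, which shows how the simulator, unlike in the single-user functionality, can influence what a client reads.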


Fig. 3. $F^{L}_{mKVS}$: The ideal multi-user KVS functionality parameterized with leakage function L.

Security Definition. The real and ideal experiments are the same as in Sect. 5 with the following differences. First, the experiments are executed not with a single client but with c clients $C_1, \ldots, C_c$; second, the environment adaptively sends operations to all these clients; and third, the ideal functionality of Fig. 1 is replaced with the ideal functionality described in Fig. 3.

8.1 Security of the Standard Scheme

We now analyze the security of the standard scheme when its underlying KVS is instantiated with a KVS that does not necessarily satisfy RYW consistency. We start by describing its stateful leakage function.

– $L(DX, (op, \ell, v))$:
  1. if op = put, output (put, opeq($\ell$))
  2. else if op = get, output (get, opeq($\ell$))

where opeq is the operation equality pattern which reveals if and when a label was queried or put in the past.

Single-User vs. Multi-user Leakage. Notice that the leakage profile achieved in the multi-user setting is a function of all the labels, whereas the leakage profile achieved in the single-user setting was only a function of the labels that were (exclusively) stored and routed by the corrupted nodes. In particular, this implies that the multi-user leakage is worse than the single-user leakage and equivalent to the leakage achieved by standard (non-distributed) schemes. In the following, we will refer to the labels stored and routed exclusively by honest nodes as “honest labels” and to all the other labels as “corrupted labels”. The reason that the single-user leakage is independent of the honest labels is the RYW consistency of the underlying KVS. More precisely, RYW consistency guarantees that for a given label, the user will read the latest value that it stored. This implies that the value it reads will be independent of any


other label, including the corrupted labels. This is not the case, however, in the multi-user setting, where RYW consistency does not guarantee that the honest labels will be independent of the corrupted labels. To see why, consider the following example. Let $\ell_1$ be a corrupted label and let $\ell_2$ be an honest label. Assume that both $\ell_1$ and $\ell_2$ initially have the value 0. Now consider the two sequences of operations executed by clients $C_1$ and $C_2$ shown in Fig. 4. Notice that both sequences are RYW consistent (this is the case because they satisfy a stronger consistency guarantee called sequential consistency). However, in sequence 1, Get($\ell_2$) can output either 0 or 1 whereas, in sequence 2, if Get($\ell_1$) outputs a 0, then Get($\ell_2$) can only output 1. This example points out that operations on corrupted labels can impact operations on honest labels. Capturing exactly how operations on one label can affect operations on other labels for different consistency guarantees is challenging but might be helpful in designing solutions with better leakage profiles. We leave this as an open problem. Alternatively, it would be interesting to know if there is some consistency notion one could assume (in the multi-user setting) under which a better leakage profile could be achieved.

Fig. 4. Sequence 1 is on the left and Sequence 2 is on the right.

Security. We now state our security theorem, the proof of which is in the full version of the paper. Theorem 4. EKVS is L-secure with probability at least 1 − negl(k).

9 Conclusions and Future Work

In this work, we study end-to-end encryption in the context of KVSs. We formalize the security properties of the standard scheme in both the single-user and multi-user settings. We then use our framework to analyze the security of the standard scheme when its underlying KVS is instantiated with a consistent hashing based KVS (with zero-hop and multi-hop routing). We see our work as an important step towards designing provably-secure end-to-end encrypted distributed systems like off-chain networks, distributed storage systems, distributed databases and distributed caches. Our work motivates several open problems and directions for future work.


Relationship Between Consistency Guarantees and Leakage. Recall that the standard scheme leaks the operation equality of all the labels in the multi-user setting (with no assumption on the consistency guarantees). However, if the underlying KVS satisfies RYW consistency, the scheme only leaks the operation equality of a subset of labels, but in a single-user setting. The most immediate question is whether the leakage can be improved in the multi-user setting by assuming a stronger consistency guarantee. We believe, however, that even assuming linearizability, which is much stronger than RYW consistency, the standard scheme would still leak more in the multi-user setting than it would in the single-user setting with RYW consistency. The question then is to find a lower bound on leakage in the multi-user setting.

Beyond CH-KVS. Another direction is to study the security of the standard EKVS when it is instantiated with a KVS that is not based on consistent hashing or on the two routing schemes that we described. Instantiations based on Kademlia [47] and Koorde [35] would be particularly interesting due to the former’s popularity in practice and the latter’s theoretical efficiency. Because Koorde uses consistent hashing in its structure (though its routing is different and based on De Bruijn graphs), the bounds we introduce in this work to study the balance of CH-KVSs might find use in analyzing Koorde. Kademlia, on the other hand, has a very different structure than CH-KVSs, so it is likely that new custom techniques and bounds are needed to analyze its balance.

New EKVS Constructions. A third direction is to design new EKVS schemes with better leakage profiles. Here, a “better” profile could be the same profile $L_\varepsilon$ achieved in this work but with a smaller ε than what we show. Alternatively, it could be a completely different leakage profile. This might be done, for example, by using more sophisticated techniques from structured encryption and oblivious RAMs.

EKVSs in the Transient Setting. Another important direction of immediate practical interest is to study the security of EKVSs in the transient setting. As mentioned in Sect. 4, in the transient setting, nodes are not known a priori and can join and leave at any time. This setting is particularly suited to peer-to-peer networks and permissionless blockchains. Agarwal and Kamara [7] study DHTs in the transient setting and it would be interesting to extend their work to transient KVSs as well.

Stronger Adversarial Models. Our security definitions are in the standalone model and against an adversary that makes static corruptions. Extending our work to handle arbitrary compositions (e.g., using universal composability [17]) and adaptive corruptions would be very interesting.

82

A. Agarwal and S. Kamara

References

1. Apache Ignite. https://ignite.apache.org/
2. Couchbase. https://www.couchbase.com/
3. FoundationDB. https://www.foundationdb.org/
4. MemcacheDB. https://github.com/LMDB/memcachedb/
5. Redis. https://redis.io/
6. XAP. https://www.gigaspaces.com/
7. Agarwal, A., Kamara, S.: Encrypted distributed hash tables. Cryptology ePrint Archive, Report 2019/1126 (2019). https://eprint.iacr.org/2019/1126
8. Agrawal, R., Kiernan, J., Srikant, R., Xu, Y.: Order preserving encryption for numeric data. In: ACM SIGMOD International Conference on Management of Data, pp. 563–574 (2004)
9. Asharov, G., Naor, M., Segev, G., Shahaf, I.: Searchable symmetric encryption: optimal locality in linear space via two-dimensional balanced allocations. In: ACM Symposium on Theory of Computing, STOC 2016, pp. 1101–1114. ACM, New York (2016)
10. Asharov, G., Segev, G., Shahaf, I.: Tight tradeoffs in searchable symmetric encryption. In: Shacham, H., Boldyreva, A. (eds.) CRYPTO 2018. LNCS, vol. 10991, pp. 407–436. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-96884-1_14
11. Bailis, P., Ghodsi, A., Hellerstein, J.M., Stoica, I.: Bolt-on causal consistency. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pp. 761–772 (2013)
12. Bellare, M., Boldyreva, A., O’Neill, A.: Deterministic and efficiently searchable encryption. In: Menezes, A. (ed.) CRYPTO 2007. LNCS, vol. 4622, pp. 535–552. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-74143-5_30
13. Blackstone, L., Kamara, S., Moataz, T.: Revisiting leakage abuse attacks. In: Network and Distributed System Security Symposium (NDSS 2020) (2020)
14. Boldyreva, A., Chenette, N., Lee, Y., O’Neill, A.: Order-preserving symmetric encryption. In: Joux, A. (ed.) EUROCRYPT 2009. LNCS, vol. 5479, pp. 224–241. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-01001-9_13
15. Bost, R.: Sophos - forward secure searchable encryption. In: ACM Conference on Computer and Communications Security (CCS 2016) (2016)
16. Bost, R., Minaud, B., Ohrimenko, O.: Forward and backward private searchable encryption from constrained cryptographic primitives. In: ACM Conference on Computer and Communications Security (CCS 2017) (2017)
17. Canetti, R.: Universally composable security: a new paradigm for cryptographic protocols. In: Proceedings 42nd IEEE Symposium on Foundations of Computer Science, pp. 136–145. IEEE (2001)
18. Cash, D., Grubbs, P., Perry, J., Ristenpart, T.: Leakage-abuse attacks against searchable encryption. In: ACM Conference on Communications and Computer Security (CCS 2015), pp. 668–679. ACM (2015)
19. Cash, D., et al.: Dynamic searchable encryption in very-large databases: data structures and implementation. In: Network and Distributed System Security Symposium (NDSS 2014) (2014)
20. Cash, D., Jarecki, S., Jutla, C., Krawczyk, H., Roşu, M.-C., Steiner, M.: Highly-scalable searchable symmetric encryption with support for boolean queries. In: Canetti, R., Garay, J.A. (eds.) CRYPTO 2013. LNCS, vol. 8042, pp. 353–373. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40041-4_20


21. Cash, D., Tessaro, S.: The locality of searchable symmetric encryption. In: Nguyen, P.Q., Oswald, E. (eds.) EUROCRYPT 2014. LNCS, vol. 8441, pp. 351–368. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-55220-5_20
22. Chang, F., et al.: BigTable: a distributed storage system for structured data. ACM Trans. Comput. Syst. (TOCS) 26(2), 4 (2008)
23. Chase, M., Kamara, S.: Structured encryption and controlled disclosure. In: Abe, M. (ed.) ASIACRYPT 2010. LNCS, vol. 6477, pp. 577–594. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-17373-8_33
24. Curtmola, R., Garay, J., Kamara, S., Ostrovsky, R.: Searchable symmetric encryption: improved definitions and efficient constructions. In: ACM Conference on Computer and Communications Security (CCS 2006), pp. 79–88. ACM (2006)
25. DeCandia, G., et al.: Dynamo: Amazon’s highly available key-value store. ACM SIGOPS Oper. Syst. Rev. 41, 205–220 (2007)
26. Demertzis, I., Papadopoulos, D., Papamanthou, C.: Searchable encryption with optimal locality: achieving sublogarithmic read efficiency. In: Shacham, H., Boldyreva, A. (eds.) CRYPTO 2018. LNCS, vol. 10991, pp. 371–406. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-96884-1_13
27. Demertzis, I., Papamanthou, C.: Fast searchable encryption with tunable locality. In: ACM International Conference on Management of Data, SIGMOD 2017, pp. 1053–1067. ACM, New York (2017)
28. Etemad, M., Küpçü, A., Papamanthou, C., Evans, D.: Efficient dynamic searchable encryption with forward privacy. PoPETs 2018(1), 5–20 (2018)
29. Faber, S., Jarecki, S., Krawczyk, H., Nguyen, Q., Rosu, M., Steiner, M.: Rich queries on encrypted data: beyond exact matches. In: Pernul, G., Ryan, P.Y.A., Weippl, E. (eds.) ESORICS 2015. LNCS, vol. 9327, pp. 123–145. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24177-7_7
30. Fisch, B.A., et al.: Malicious-client security in blind seer: a scalable private DBMS. In: IEEE Symposium on Security and Privacy, pp. 395–410. IEEE (2015)
31. Gentry, C.: Fully homomorphic encryption using ideal lattices. In: ACM Symposium on Theory of Computing (STOC 2009), pp. 169–178. ACM Press (2009)
32. Goh, E.-J.: Secure indexes. Technical report 2003/216, IACR ePrint Cryptography Archive (2003). http://eprint.iacr.org/2003/216
33. Goldreich, O., Ostrovsky, R.: Software protection and simulation on oblivious RAMs. J. ACM 43(3), 431–473 (1996)
34. Islam, M.S., Kuzu, M., Kantarcioglu, M.: Access pattern disclosure on searchable encryption: ramification, attack and mitigation. In: Network and Distributed System Security Symposium (NDSS 2012) (2012)
35. Kaashoek, M.F., Karger, D.R.: Koorde: a simple degree-optimal distributed hash table. In: Kaashoek, M.F., Stoica, I. (eds.) IPTPS 2003. LNCS, vol. 2735, pp. 98–107. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45172-3_9
36. Kamara, S., Moataz, T.: Boolean searchable symmetric encryption with worst-case sub-linear complexity. In: Coron, J.-S., Nielsen, J.B. (eds.) EUROCRYPT 2017. LNCS, vol. 10212, pp. 94–124. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-56617-7_4
37. Kamara, S., Moataz, T.: SQL on structurally-encrypted databases. In: Peyrin, T., Galbraith, S. (eds.) ASIACRYPT 2018. LNCS, vol. 11272, pp. 149–180. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-03326-2_6
38. Kamara, S., Moataz, T.: Computationally volume-hiding structured encryption. In: Ishai, Y., Rijmen, V. (eds.) EUROCRYPT 2019. LNCS, vol. 11477, pp. 183–213. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-17656-3_7

84

A. Agarwal and S. Kamara

39. Kamara, S., Moataz, T., Ohrimenko, O.: Structured encryption and leakage suppression. In: Shacham, H., Boldyreva, A. (eds.) CRYPTO 2018. LNCS, vol. 10991, pp. 339–370. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-968841 12 40. Kamara, S., Moataz, T., Zdonik, S., Zhao, Z.: An optimal relational database encryption scheme. Cryptology ePrint Archive, Report 2020/274 (2020). https:// eprint.iacr.org/2020/274 41. Kamara, S., Papamanthou, C.: Parallel and dynamic searchable symmetric encryption. In: Sadeghi, A.-R. (ed.) FC 2013. LNCS, vol. 7859, pp. 258–274. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39884-1 22 42. Kamara, S., Papamanthou, C., Roeder, T.: Dynamic searchable symmetric encryption. In: ACM Conference on Computer and Communications Security (CCS 2012). ACM Press (2012) 43. Lakshman, A., Malik, P.: Cassandra: a decentralized structured storage system. ACM SIGOPS Oper. Syst. Rev. 44(2), 35–40 (2010) 44. Lloyd, W., Freedman, M.J., Kaminsky, M., Andersen, D.G.: Don’t settle for eventual: scalable causal consistency for wide-area storage with COPS. In: Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, pp. 401– 416 (2011) 45. Lloyd, W., Freedman, M.J., Kaminsky, M., Andersen, D.G.: Stronger semantics for low-latency geo-replicated storage. In: Presented as Part of the 10th {USENIX} Symposium on Networked Systems Design and Implementation ({NSDI} 2013), pp. 313–328 (2013) 46. Macedo, R., et al.: A practical framework for privacy-preserving NoSQL databases. In: 2017 IEEE 36th Symposium on Reliable Distributed Systems (SRDS), pp. 11– 20. IEEE (2017) 47. Maymounkov, P., Mazi`eres, D.: Kademlia: a peer-to-peer information system based on the XOR metric. In: Druschel, P., Kaashoek, F., Rowstron, A. (eds.) IPTPS 2002. LNCS, vol. 2429, pp. 53–65. Springer, Heidelberg (2002). https://doi.org/10. 1007/3-540-45748-8 5 48. 
Meng, X., Kamara, S., Nissim, K., Kollios, G.: GRECS: graph encryption for approximate shortest distance queries. In: ACM Conference on Computer and Communications Security (CCS 2015) (2015) 49. Pappas, V., et al.: Blind seer: a scalable private DBMS. In: 2014 IEEE Symposium on Security and Privacy (SP), pp. 359–374. IEEE (2014) 50. Poddar, R., Boelter, T., Popa, R.A.: Arx: an encrypted database using semantically secure encryption. Proc. VLDB Endow. 12(11), 1664–1678 (2019) 51. Song, D., Wagner, D., Perrig, A.: Practical techniques for searching on encrypted data. In: IEEE Symposium on Research in Security and Privacy, pp. 44–55. IEEE Computer Society (2000) 52. Stefanov, E., Papamanthou, C., Shi, E.: Practical dynamic searchable encryption with small leakage. In: Network and Distributed System Security Symposium (NDSS 2014) (2014) 53. Stoica, I., Morris, R., Karger, D., Kaashoek, M.F., Balakrishnan, H.: Chord: a scalable peer-to-peer lookup service for internet applications. ACM SIGCOMM Comput. Commun. Rev. 31(4), 149–160 (2001) 54. Sumbaly, R., Kreps, J., Gao, L., Feinberg, A., Soman, C., Shah, S.: Serving largescale batch computed data with project Voldemort. In: Proceedings of the 10th USENIX conference on File and Storage Technologies, p. 18. USENIX Association (2012)

Encrypted Key-Value Stores

85

55. Basho Technologies: Riak. https://docs.basho.com/riak/kv/2.2.2/learn/dynamo/ 56. Wu, Z., Butkiewicz, M., Perkins, D., Katz-Bassett, E., Madhyastha, H.V.: SPANStore: cost-effective geo-replicated storage spanning multiple cloud services. In: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, pp. 292–308 (2013) 57. Zheng, W., Li, F., Popa, R.A., Stoica, I., Agarwal, R.: MiniCrypt: reconciling encryption and compression for big data stores. In: Proceedings of the Twelfth European Conference on Computer Systems, pp. 191–204 (2017)

Formal Methods

Formal Verification of Fair Exchange Based on Bitcoin Smart Contracts

Cheng Shi and Kazuki Yoneyama

Ibaraki University, 4-12-1, Nakanarusawa, Hitachi-shi, Ibaraki, Japan

Abstract. Smart contracts are protocols that automatically execute a transaction, including an electronic contract, when a condition is satisfied, without a trusted third party. In a representative use case, a smart contract is executed when multiple parties fairly trade a blockchain asset. On blockchain systems, a smart contract can be regarded as a system participant that responds to the information it receives, receives and stores values, and sends information and values outwards. A smart contract can also temporarily hold assets, and it always performs operations in accordance with prior rules. Many cryptocurrencies have implemented smart contracts. At POST 2018, Atzei et al. gave formulations of seven fair exchange protocols using smart contracts on Bitcoin: oracle, escrow, intermediated payment, timed commitment, micropayment channels, fair lotteries, and contingent payment. However, they gave only an informal discussion of security. In this paper, we verify the fairness of their seven protocols by using the formal verification tool ProVerif. As a result, we show that five protocols (the oracle, intermediated payment, timed commitment, micropayment channels and fair lotteries protocols) satisfy fairness, which had not been proved formally. We also rediscover known attacks that break the fairness of two protocols (the escrow and contingent payment protocols). For the escrow protocol, we formalize the two-party scheme and the three-party scheme with an arbitrator, and show that the two-party scheme does not satisfy fairness, as Atzei et al. showed. For the contingent payment protocol, we formalize the protocol with a non-interactive zero-knowledge proof (NIZK), and rediscover the attack shown by Campanelli et al. at CCS 2017. We also show that a countermeasure against the attack based on subversion NIZK works properly, which had not been formally proved.

Keywords: Formal methods · Smart contracts · Fairness · ProVerif

1 Introduction

1.1 Background

C. Shi: presently with Panasonic Corporation.
© Springer Nature Switzerland AG 2020. K. Bhargavan et al. (Eds.): INDOCRYPT 2020, LNCS 12578, pp. 89–106, 2020. https://doi.org/10.1007/978-3-030-65277-7_5

Szabo [37] proposed the concept of smart contracts. A smart contract is a protocol that can automatically execute a transaction including an electronic contract without a trusted third party when the conditions are satisfied. Recently, implementations of smart contracts using distributed ledger technology and blockchain have attracted attention. A user who wants to execute a contract sends an appropriate transaction to a node in a peer-to-peer (P2P) network. Each user's transaction is taken into the blockchain as an unprocessed transaction, and is executed when the transactions satisfy the contract conditions. For example, in the case of a contract representing a transaction on Bitcoin, if the contract is established, the transaction corresponds to the transfer of a specified amount of bitcoins. An important feature of smart contracts is that they can guarantee that contracts are executed correctly without a centralized trusted authority. Also, the nodes that process transactions are considered mutually untrusted. Atzei et al. [11] proposed a formal framework for Bitcoin transactions. Using this framework, they introduced various types of fair exchange protocols [10] (oracle, escrow, intermediated payment, timed commitment, micropayment channels, fair lotteries, and contingent payment). These protocols represent several situations in which bitcoins are transferred. For example, the escrow protocol provides that a specified amount of bitcoins is transferred from a buyer to a seller if the contract condition is satisfied, and that the deposited bitcoins of the buyer are refunded otherwise. Since nodes are not trusted, we must consider the case that a malicious buyer or seller is involved. Hence, these protocols must guarantee fairness, meaning that the protocol never terminates in a situation where one participant unilaterally loses. In the case of the escrow protocol, the protocol must terminate in one of the following two situations: either bitcoins are transferred and the contract condition is satisfied, or bitcoins are refunded and the contract condition is not satisfied.
Although it is well known that general two-party fair exchange is impossible without a trusted third party [27], in their protocols the blockchain plays the role of the trusted third party. However, since they provide neither a formal security proof nor a formal verification, it is not clear that their protocols satisfy the desired fairness. In general, evaluating security by hand is difficult because mistakes are easy to make when a protocol is complicated. Formal methods, on the other hand, are known to be useful for verifying the security of cryptographic protocols rigorously, and several automated security verification tools have been developed. ProVerif [5] is one of the best-known of these tools and can capture various security requirements such as confidentiality, authenticity, and indistinguishability. ProVerif has thus been used to verify many cryptographic protocols [8,9,15,19,29,31,32,35].

1.2 Related Work

There have been various studies on the security verification of cryptographic protocols with formal methods. ProVerif and CryptoVerif [17] have been used to verify standardized protocols such as ZRTP [19], SNMPv3 [9], the mutual authentication protocol for GSM [8], LTE [29], the OpenID Connect protocol [32], Google's QUIC [35], TLS 1.3 [15], Signal [31], and others. Cremers and Horvat [25] verified 30 key exchange protocols standardized in ISO/IEC 11770-2 [3] and ISO/IEC 11770-3 [4] with the automated security verification tool Scyther [21], and found some spoofing attacks. Scyther and Tamarin Prover [36] have also been used to verify other standardized protocols such as IKEv1/v2 in IPsec [22], the entity authentication protocols in ISO/IEC 9798 [13], TLS 1.3 [26], 5G-AKA [23], the smart grid authentication protocol SAv5 in IEEE 1815-2012 [24], and others. Backes et al. [12] verified fairness of several protocols (ASW, GJM and Secure Conversation Protocol) by using the notion of liveness.

The contingent payment protocol on Bitcoin using a non-interactive zero-knowledge proof (NIZK) was originally proposed by Maxwell [34] (under the name zero-knowledge contingent payment), and was concretely implemented as pay-to-sudoku [18]. Campanelli et al. [20] showed an attack on the contingent payment protocol by a malicious buyer. Their attack uses the property that information about the witness of the NIZK is leaked if the malicious buyer generates an arbitrary common reference string (CRS). They also gave some countermeasures against their attack, one of which is based on subversion NIZK, because subversion NIZK is resilient to malicious CRSs.

There are some previous works that formally verify the security of smart contracts. Luu et al. [33] introduced a symbolic execution tool, Oyente, to find potential security bugs in smart contracts for the existing Ethereum system. However, Oyente is neither sound nor complete because it can produce several false alarms even for trivial contracts. Bhargavan et al. [16] proposed a framework to formally verify Ethereum smart contracts written in Solidity, a language for implementing smart contracts. Their framework cannot deal with important constructs such as loops. Kalra et al. [30] introduced a framework, ZEUS, for automatic formal verification of smart contracts using abstract interpretation and symbolic model checking. They showed that fairness of several representative contracts can be verified by ZEUS.
Specifically, CrowdFundDao [1], DiceRoll [2], StandardToken [6], Wallet [7] and Common policy are verified. These tools aim to verify Ethereum smart contracts and cannot be straightforwardly applied to Bitcoin smart contracts.

1.3 Our Contribution

In this paper, we formalize Atzei et al.'s seven fair exchange protocols [10] and verify fairness of these protocols by using ProVerif [5]. Our key idea for formalizing fairness is to combine the verification of reachability and of correspondence assertions provided by ProVerif. Reachability means that there is a process reaching an event. A correspondence assertion means that an event (post-event) always occurs if another event (pre-event) occurs. We use reachability to formalize whether the event of transferring or refunding bitcoins occurs, and use correspondence assertions to formalize whether the event of transferring (resp. refunding) bitcoins always occurs in the case that the contract condition is satisfied (resp. fails). The verification of reachability is necessary because a correspondence assertion is always true if the post-event (i.e., transferring or refunding bitcoins) never occurs in the process. Also, we use a correspondence assertion in the contingent payment protocol to formalize whether the event of transferring bitcoins from a buyer to a seller always occurs in the case that the buyer gets a secret of the seller. As far as we know, this is the first result to formalize and verify fairness of fair exchange protocols with ProVerif.

As the verification result, we first show that five protocols (the oracle, the intermediated payment, the timed commitment, the micropayment channels, and the fair lotteries protocols) satisfy fairness. Since there was no security proof for these five protocols, this is the first formal verification of their security. On the other hand, we also show that the other two protocols (the escrow and the contingent payment protocols) do not satisfy fairness; that is, ProVerif finds concrete attacks. Specifically, the escrow protocol has two settings, the two-party setting and the three-party setting with an arbitrator, and an attack is rediscovered for the two-party setting, as already shown in [10]. The attack causes situations in which a malicious buyer does not transfer bitcoins to a seller even if the contract condition is satisfied, or bitcoins are not refunded even if the contract condition is not satisfied. Also, we find an attack against the contingent payment protocol, which is the same as that of Campanelli et al. [20]. Thus, ProVerif is useful for finding attacks caused by a misuse of NIZK. Moreover, we verify the contingent payment protocol with the countermeasure based on subversion NIZK. As a result, we show that no attack is found; thus, the countermeasure works correctly. This is also meaningful because no formal security proof is given in [20].

Paper Organization. Section 2 shows some preliminaries including Bitcoin transactions and scripts, the formal model of smart contracts on Bitcoin, and ProVerif. Section 3 shows our formalization and verification of the escrow protocol. Section 4 shows our formalization and verification of the contingent payment protocol.
Note that we cannot show the formalizations and verifications of all protocols due to space limitations. The verification results for the oracle, intermediated payment, timed commitment, micropayment channels and fair lotteries protocols are omitted; we will give them in the full version of this paper.

2 Preliminaries

2.1 Bitcoin Transactions

Here, we briefly recall the formulation of Bitcoin transactions. For details, please see [11]. Transactions represent transfers of bitcoins. The balance of a user is determined by the amount of unspent bitcoins across one or more transactions. A coinbase transaction $T_0$ can generate fresh bitcoins. Each transaction has three fields: in, wit and out. The field in points to another transaction, and wit specifies a witness which makes the script within the out field of the transaction pointed to by in evaluate to true. $T_0.\mathsf{in}$ and $T_0.\mathsf{wit}$ are set to $\bot$ because $T_0$ does not point backwards to any other transaction (since $T_0$ is the first one on the blockchain). For example, when $T_0.\mathsf{out}$ is set to $(\lambda x.\, x < 51,\ 1\,\mathrm{BTC})$, the script $\lambda x.\, x < 51$ takes a value $x$ as input, and 1 BTC can be transferred to other transactions if $x < 51$ holds. Then, we consider a transaction $T_A$ with $T_A.\mathsf{in} : T_0$, $T_A.\mathsf{wit} : 42$ and $T_A.\mathsf{out} : (\lambda x.\, \mathsf{versig}_{k_A}(x),\ 1\,\mathrm{BTC})$. To redeem 1 BTC from $T_0$, $A$ must provide a witness which makes the script within $T_0.\mathsf{out}$ evaluate to true. In this case, the witness is 42, hence the redemption succeeds, and $T_0$ is spent. The script within $T_A.\mathsf{out}$ is the most common one in Bitcoin: it verifies the signature $x$ with $A$'s public key. The message against which the signature is verified is the transaction which attempts to redeem $T_A$.
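The redeem rule in this example can be made concrete with a minimal Python sketch. It is not the formal model of [11]: the class `Transaction`, the function `redeem`, and the string comparison standing in for the signature check are all assumptions introduced for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Transaction:
    """Toy stand-in for a transaction with fields in, wit and out."""
    name: str
    inp: Optional["Transaction"]       # 'in' field; None models ⊥
    wit: object                        # witness; None models ⊥
    script: Callable[[object], bool]   # script part of 'out'
    value_btc: int                     # amount part of 'out'
    spent: bool = False


def redeem(tx: Transaction) -> bool:
    """Spend the output that tx.inp points to, if tx.wit satisfies its script."""
    prev = tx.inp
    if prev is None or prev.spent or tx.wit is None:
        return False
    if tx.value_btc > prev.value_btc:  # cannot pay out more than was locked
        return False
    if not prev.script(tx.wit):
        return False
    prev.spent = True
    return True


# Coinbase transaction T0 with out = (λx. x < 51, 1 BTC).
T0 = Transaction("T0", None, None, lambda x: x < 51, 1)

# TA redeems T0 with witness 42. Its own output script should be versig_kA;
# the comparison against a fixed string below is only a placeholder for that.
TA = Transaction("TA", T0, 42, lambda sig: sig == "signature-by-A", 1)

first_attempt = redeem(TA)    # 42 < 51 holds, so T0 becomes spent
second_attempt = redeem(TA)   # T0 is already spent, so this fails
print(first_attempt, second_attempt, T0.spent)
```

The second call fails because the sketch treats an output as spendable only once, which is the sense in which $T_0$ "is spent" above.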

2.2 Bitcoin Scripts

A witness that makes the script evaluate to “true” can redeem the bitcoins retained in the associated (unspent) output. In the abstract model, scripts are terms of the form $\lambda z.e$, where $z$ is a sequence of variables occurring in $e$, and $e$ is an expression with the following syntax.

$$e ::= x \mid k \mid e + e \mid e - e \mid e = e \mid e < e \mid \mathsf{if}\ e\ \mathsf{then}\ e\ \mathsf{else}\ e \mid |e| \mid H(e) \mid \mathsf{versig}_k(e) \mid \mathsf{absAfter}\ t : e \mid \mathsf{relAfter}\ t : e$$

Here $x$ denotes variables, $k$ denotes constants, and (+, −, =, […]

[…] z' = z is true” and the verification of reachability outputs “RESULT not table(block(T,b,z')) is false”, then fairness of the transferring case is satisfied. Fairness of the refunding case is verified similarly. This formalization is of some theoretical interest because previous techniques that apply reachability to security requirements (e.g., confidentiality) verify that an adversarial event does not occur (i.e., if ProVerif outputs “RESULT not attacker(s) is true”, then the protocol satisfies confidentiality for secret s). In our verification, on the other hand, we use reachability to verify that a successful event does occur.

    (* case of transferring *)
    query T:transaction, b:host, z':bitcoin; table(block(T,b,z')).
    query T:transaction, b:host, z':bitcoin; table(block(T,b,z')) ==> z'=z.

    (* case of refunding *)
    query T:transaction, a:host, z':bitcoin; table(block(T,a,z')).
    query T:transaction, a:host, z':bitcoin; table(block(T,a,z')) ==> z'=z.
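To see why the correspondence query alone is not enough, the following Python sketch evaluates both kinds of query over explicit sets of traces. It is a toy illustration and not how ProVerif works internally; the names `reachable`, `correspondence`, `fair` and the example traces are hypothetical.

```python
from typing import List, Tuple

# One entry inserted into table `block`: (transaction, host, amount).
Entry = Tuple[str, str, int]
Trace = List[Entry]


def reachable(traces: List[Trace]) -> bool:
    """Reachability: some trace inserts at least one entry into `block`."""
    return any(len(trace) > 0 for trace in traces)


def correspondence(traces: List[Trace], z: int) -> bool:
    """Correspondence: every inserted entry carries exactly z bitcoins.

    Vacuously true when no entry is ever inserted.
    """
    return all(amount == z for trace in traces for (_, _, amount) in trace)


def fair(traces: List[Trace], z: int) -> bool:
    """Fairness in the sense used here: both queries must succeed."""
    return reachable(traces) and correspondence(traces, z)


z = 1
honest = [[("T'B", "b", 1)]]      # the agreed amount reaches host b
silent = [[]]                     # nothing is ever inserted
short_paid = [[("T'B", "b", 0)]]  # an entry exists, but with the wrong amount

print(correspondence(silent, z), fair(silent, z))   # True False
print(fair(honest, z), fair(short_paid, z))         # True False
```

The `silent` case is the one that motivates the reachability query: the correspondence check passes even though no bitcoins ever move.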

Formalize Participants and Blockchain. The buyer and seller subprocesses are easily defined from the protocol specification. We formalize the blockchain as a participant in the system. The role of the blockchain is to verify the signatures and multi-signatures for each transaction, and to operate the table (i.e., the transfer of bitcoins). In order to manage transactions, the blockchain has the table tran. Here, we show the blockchain subprocess for the case that the transaction is completed (transferring).

    let blockchain(T:transaction,Asignpu:verkey,T'B:transaction,
                   Bsignpu:verkey)=
      (* receive the signature of A on T and record T if it verifies *)
      in(c,(signAtoblock':bitstring));
      if Schecksign(signAtoblock',Asignpu,T) then
      insert tran(T);
      (* receive the signatures of A and B and check the multi-signature *)
      in(c,(signAtoB'':bitstring,signBtoblock':bitstring));
      if Mchecksign(signAtoB'',signBtoblock',Asignpu,Bsignpu,T'B) then
      insert tran(T'B);
      (* record the transfer in table block *)
      insert block(T'B,b,z').
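The control flow of this subprocess can be paraphrased in Python as a rough sketch. Signatures are modelled symbolically as tuples; `pk`, `sign`, `schecksign`, `mchecksign` and the `inbox` list standing in for channel `c` are assumptions made for this illustration and are not taken from the authors' code.

```python
from typing import List, Set, Tuple

Sig = Tuple[str, str, str]  # ("sig", message, signing key)


def pk(sk: str) -> Tuple[str, str]:
    """Symbolic verification key derived from signing key sk."""
    return ("pk", sk)


def sign(msg: str, sk: str) -> Sig:
    return ("sig", msg, sk)


def schecksign(sig: Sig, vk, msg: str) -> bool:
    """Single-signature check (plays the role of Schecksign)."""
    return sig[0] == "sig" and sig[1] == msg and pk(sig[2]) == vk


def mchecksign(sig1: Sig, sig2: Sig, vk1, vk2, msg: str) -> bool:
    """Two-of-two multi-signature check (plays the role of Mchecksign)."""
    return schecksign(sig1, vk1, msg) and schecksign(sig2, vk2, msg)


def blockchain(T, a_vk, T_B, b_vk, inbox: List, host: str, amount: int):
    """Process the messages in `inbox` in order; return tables (tran, block)."""
    tran: Set[str] = set()
    block: Set[Tuple[str, str, int]] = set()
    if not inbox:
        return tran, block
    sig_a = inbox.pop(0)
    if not schecksign(sig_a, a_vk, T):
        return tran, block
    tran.add(T)                       # insert tran(T)
    if not inbox:
        return tran, block            # second message never arrives
    sig_ab, sig_b = inbox.pop(0)
    if not mchecksign(sig_ab, sig_b, a_vk, b_vk, T_B):
        return tran, block
    tran.add(T_B)                     # insert tran(T'B)
    block.add((T_B, host, amount))    # insert block(T'B, b, z')
    return tran, block


a_sk, b_sk = "skA", "skB"
deposit = [sign("T", a_sk)]
payout = (sign("T'B", a_sk), sign("T'B", b_sk))

tran_ok, block_ok = blockchain("T", pk(a_sk), "T'B", pk(b_sk), deposit + [payout], "b", 1)
tran_stuck, block_stuck = blockchain("T", pk(a_sk), "T'B", pk(b_sk), list(deposit), "b", 1)
print(block_ok, block_stuck)
```

In the second run the deposit is recorded but the multi-signature message is never delivered, so in this sketch nothing is added to `block`.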

In the three-party protocol, arbitrator C sends a signature if an unfair situation occurs. Here, we show the arbitrator subprocess for the case that the transaction is completed (transferring).

    let C(T:transaction,T'B:transaction,Csignpr:sigkey)=
      get tran(=T) in
      let signCtoB=sign(T'B,Csignpr) in
      out(c,(signCtoB)).

3.3 Verification Result

Here, we show the verification result for the escrow protocol in Table 3. For the two-party setting, if both the buyer and the seller are honest, fairness is satisfied. However, if the buyer or the seller is malicious, fairness is not satisfied. For the three-party setting, even if the buyer or the seller is malicious, fairness is satisfied.

Table 3. Verification results of escrow protocol

| Setting | Case | Party | Fairness |
|---|---|---|---|
| Two-party setting w/o arbitrator | Transferring | Honest buyer | ✓ |
| Two-party setting w/o arbitrator | Transferring | Malicious buyer | × |
| Two-party setting w/o arbitrator | Refunding | Honest seller | ✓ |
| Two-party setting w/o arbitrator | Refunding | Malicious seller | × |
| Three-party setting with arbitrator | Transferring | Malicious buyer | ✓ |
| Three-party setting with arbitrator | Refunding | Malicious seller | ✓ |

✓ means that fairness is satisfied. × means that fairness is not satisfied.

4 Formalization and Verification of Contingent Payment

In this section, we give our formalization of the contingent payment protocol and the verification result with ProVerif.

4.1 Non-interactive Zero-Knowledge Proof

First, we recall the notions of NIZK and subversion NIZK. A NIZK scheme $\Pi$ for a relation $R$ consists of the following polynomial-time algorithms. $\mathsf{crs} \leftarrow \Pi.\mathsf{Pg}(1^\lambda)$ generates a common reference string $\mathsf{crs}$. $\pi \leftarrow \Pi.\mathsf{P}(1^\lambda, \mathsf{crs}, x, w)$, run by an honest prover on statement $x$ and witness $w \in R(x)$, generates a proof $\pi$ that $x \in L(R)$ (the relation $R$ defines the language $L(R) \in NP$). $d \leftarrow \Pi.\mathsf{V}(1^\lambda, \mathsf{crs}, x, \pi)$, run by a verifier, generates a decision $d \in \{\mathsf{true}, \mathsf{false}\}$ indicating whether $\pi$ is a valid proof that $x \in L(R)$. In NIZK schemes, zero-knowledge must be satisfied for an honestly generated CRS. Zero-knowledge intuitively means that no information about the witness $w$ is leaked from the proof $\pi$. The ordinary definition of zero-knowledge gives no guarantee if the CRS is not generated by a trusted party (e.g., the verifier may know the randomness used to generate the CRS).

Bellare et al. [14] proposed the notion of subversion NIZK. In subversion NIZK, zero-knowledge is preserved even when the verifier chooses the CRS. Note that due to known impossibility results [28], the zero-knowledge obtained in this case is somewhat weak.

4.2 Protocol Description

Next, we recall the contingent payment protocol with NIZK (called zero-knowledge contingent payment). In the contingent payment protocol, the seller proves that he/she indeed has the secret, and the buyer pays bitcoins if the proof is valid. The seller $B$ knows a secret $s$ and generates a fresh key $k$. Then, $B$ encrypts the secret $s$ under the key $k$, obtaining $c = \mathsf{Enc}_k(s)$. $B$ also computes $h_k = \mathsf{SHA256}(k)$. $B$ then sends $c$ and $h_k$ to the buyer $A$, together with a zero-knowledge proof that $c$ is an encryption of $s$ under the key $k$ and that $\mathsf{SHA256}(k) = h_k$ (i.e., the witness is $(k, s)$). To eliminate the need for a trusted third party, the buyer $A$ generates the CRS in place of the trusted third party. Once $A$ has verified the proof, $A$ puts a transaction on the blockchain in which $A$ pays bitcoins and which specifies that $B$ can redeem it only if $B$ provides a value $k'$ such that $\mathsf{SHA256}(k') = h_k$. $B$ then publishes $k$ and claims the redemption. Having learned $k$, $A$ can now decrypt $c$, and hence $A$ learns $s$.

The Bitcoin script supports checking whether a value $k$ is a preimage of $h$, i.e., $h = \mathsf{SHA256}(k)$. The contingent payment protocol uses the transactions shown in Table 4. $\mathsf{in} : (T_A, 1)$ indicates that $A$'s bitcoins are being deposited. Transaction $T_{cp}(h)$ checks the signature of $B$ and that the provided value hashes to $h_k = \mathsf{SHA256}(k)$ as calculated by $B$ (non-timeout situation). If no solution is provided by the deadline $t$, $T_{cp}(h)$ allows $A$ to refund the bitcoins by using the transaction $T_{refund}(h)$ (timeout situation). In transaction $T_{open}(h)$, $B$ provides a value $k'$ such that $\mathsf{SHA256}(k') = h_k$. $A$'s process $P_A$ is as follows:

$$\begin{aligned}
P_A &= B\,?\,(c, h_k, z).P + \tau \\
P &= \mathsf{if}\ \mathit{verify}(c, h_k, z)\ \mathsf{then}\ \mathsf{put}\ T_{cp}(h_k)\{\mathsf{wit} \mapsto \mathit{sig}^{aa}_{k_A}(T_{cp}(h_k))\}.P'\ \mathsf{else}\ 0 \\
P' &= \mathsf{ask}\ T_{open}(h_k)\ \mathsf{as}\ x.P''(D_{\mathit{get}_k(x)}(c)) + \mathsf{put}\ T_{refund}(h_k)\{\mathsf{wit} \mapsto \mathit{sig}^{aa}_{k_A}(T_{refund}(h_k))\}
\end{aligned}$$

Upon receiving $c$, $h_k$, and the proof $z$, the buyer verifies $z$. If the verification succeeds, $A$ puts $T_{cp}(h_k)$ on the blockchain. Then, $A$ waits for $T_{open}$, from which $A$ can retrieve the key $k$ and use the solution $D_{\mathit{get}_k(x)}(c)$ in $P''$. In this way, $B$ can redeem $n$ bitcoins. If $B$ does not put $T_{open}$ within $t$ time units, $A$ can get the deposit back through $T_{refund}$.
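The hash-lock part of this exchange can be traced in a few lines of Python. This is only a sketch under loud assumptions: the NIZK proof, the signatures and the timeout branch are left out, and the encryption is a toy XOR keystream built from SHA-256 that was chosen for brevity and is not the scheme used in the protocol. The helper names `keystream`, `enc`, `dec` and `opens` are hypothetical.

```python
import hashlib
import secrets


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def keystream(key: bytes, n: int) -> bytes:
    """Toy keystream SHA256(key || counter); for illustration, not a vetted cipher."""
    out = b""
    counter = 0
    while len(out) < n:
        out += sha256(key + counter.to_bytes(4, "big"))
        counter += 1
    return out[:n]


def enc(key: bytes, msg: bytes) -> bytes:
    """XOR msg with the keystream; decryption is the same operation."""
    return bytes(m ^ s for m, s in zip(msg, keystream(key, len(msg))))


dec = enc


def opens(h: bytes, candidate: bytes) -> bool:
    """Hash-lock condition of T_cp: SHA256(candidate) = h (signature check omitted)."""
    return sha256(candidate) == h


# Seller B: secret s, fresh key k, ciphertext c and hash h_k.
s = b"secret sold by B"
k = secrets.token_bytes(32)
c = enc(k, s)
h_k = sha256(k)

# Buyer A would verify the NIZK proof here (omitted), then lock the payment under h_k.
# B redeems by publishing k, which satisfies the hash lock.
redeemed = opens(h_k, k)

# A reads k from the published transaction and decrypts c.
recovered = dec(k, c)
print(redeemed, recovered == s)
```

The point of the hash lock is visible here: the same value $k$ that lets $B$ claim the payment is what lets $A$ decrypt $c$.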

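Table 4. Transactions of contingent payment protocol

| Transaction | in | wit | out | relLock |
|---|---|---|---|---|
| $T_{cp}(h)$ | $(T_A, 1)$ | $\bot$ | $(\lambda x\zeta.\,(\mathsf{versig}_{k_B}(\zeta)\ \mathsf{and}\ \mathsf{SHA256}(x) = h)\ \mathsf{or}\ \mathsf{relAfter}\ t : \mathsf{versig}_{k_A}(\zeta),\ n\,\mathrm{BTC})$ | |
| $T_{open}(h)$ | $(T_{cp}(h), 1)$ | $\bot$ | $(\lambda\zeta.\,\mathsf{versig}_{k_B}(\zeta),\ n\,\mathrm{BTC})$ | |
| $T_{refund}(h)$ | $(T_{cp}(h), 1)$ | $\bot$ | $(\lambda\zeta.\,\mathsf{versig}_{k_A}(\zeta),\ n\,\mathrm{BTC})$ | $t$ |

4.3 Our Formalization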

Here, we show our formalization of the contingent payment protocol. We give separate ProVerif verification codes for the non-timeout and timeout situations. We also show a formalization of the countermeasure with subversion NIZK. As with the escrow protocol, we explain only the distinctive points of the code and omit the trivial ones.

Formalize NIZK and Subversion NIZK. Since NIZK does not guarantee zero-knowledge if the CRS is generated by the buyer, we define adversarial functions subvert1 and subvert2 that recover the witness from the proof and knowledge of the randomness used to generate the CRS. We need two functions because a function that outputs two terms cannot be defined in ProVerif. Hence, we separately define subvert1 as the function that recovers the first witness (i.e., hash key k) and subvert2 as the function that recovers the second witness (i.e., secret s). An honest buyer does not use subvert1 and subvert2. Also, if the randomness used to generate the CRS is not known, function prove does not leak any information about witness s. Our formalization of NIZK is as follows:

    (* type of proof *)
    type pi.
    (* function of generating CRS from randomness *)
    fun crsgen(bitstring):bitstring.
    (* function of generating proof from CRS, hash value, ciphertext, hash key and secret *)
    fun prove(bitstring,bitstring,bitstring,key,bitstring):pi.
    (* function of subverting 1st witness *)
    reduc forall k:key,s:bitstring,rnd:bitstring;
      subvert1(rnd,prove(crsgen(rnd),Hash(k),senc(k,s),k,s)) = k.
    (* function of subverting 2nd witness *)
    reduc forall k:key,s:bitstring,rnd:bitstring;
      subvert2(rnd,prove(crsgen(rnd),Hash(k),senc(k,s),k,s)) = s.
    (* function of verification *)
    reduc forall k:key,s:bitstring,rnd:bitstring;
      verify(crsgen(rnd),Hash(k),senc(k,s),
             prove(crsgen(rnd),Hash(k),senc(k,s),k,s)) = true.
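The effect of these rewrite rules can be mimicked with symbolic terms in Python. This is an illustrative sketch, not a translation that ProVerif would accept: terms are plain tuples, a destructor that does not apply returns `None`, and the helper `_witness` is hypothetical. Unlike ProVerif terms, Python tuples are not opaque, so the sketch only shows when the rules fire, not what an attacker can or cannot compute.

```python
def crsgen(rnd):
    return ("crs", rnd)


def Hash(k):
    return ("hash", k)


def senc(k, s):
    return ("senc", k, s)


def prove(crs, h, c, k, s):
    return ("proof", crs, h, c, k, s)


def _witness(crs, proof):
    """Return (k, s) if proof = prove(crs, Hash(k), senc(k, s), k, s), else None."""
    if not (isinstance(proof, tuple) and len(proof) == 6 and proof[0] == "proof"):
        return None
    _, p_crs, h, c, k, s = proof
    if p_crs == crs and h == Hash(k) and c == senc(k, s):
        return (k, s)
    return None


def subvert1(rnd, proof):
    """Yields k only if rnd is the randomness behind the CRS used in proof."""
    w = _witness(crsgen(rnd), proof)
    return None if w is None else w[0]


def subvert2(rnd, proof):
    """Yields s only if rnd is the randomness behind the CRS used in proof."""
    w = _witness(crsgen(rnd), proof)
    return None if w is None else w[1]


def verify(crs, h, c, proof):
    """True only for a proof built over exactly this crs, h and c."""
    if not (isinstance(crs, tuple) and len(crs) == 2 and crs[0] == "crs"):
        return False
    w = _witness(crs, proof)
    return w is not None and h == Hash(w[0]) and c == senc(w[0], w[1])


rnd_buyer = "rnd'"                 # randomness chosen by the buyer
crs = crsgen(rnd_buyer)
k, s = "k", "s"                    # the seller's witness
zk = prove(crs, Hash(k), senc(k, s), k, s)

print(verify(crs, Hash(k), senc(k, s), zk))
print(subvert2(rnd_buyer, zk))
print(subvert2("other randomness", zk))
```

Whoever knows the randomness behind the CRS can apply `subvert1` and `subvert2`; with any other randomness the rules do not apply, which is the behaviour the `reduc` declarations encode.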

In subversion NIZK, even if the CRS is generated by the buyer, no information about witness s is leaked. Hence, we can formalize subversion NIZK by deleting the functions subvert1 and subvert2.

Formalize Fairness. Fairness in the contingent payment protocol must guarantee that bitcoins are always transferred from a buyer to a seller if the buyer obtains the seller's secret (non-timeout), and that the deposited bitcoins of the buyer are always refunded otherwise (timeout). The timeout situation is almost the same as in the escrow protocol; thus, we can formalize fairness by using the combination of reachability and correspondence assertions. In the non-timeout situation, we formalize fairness as follows: block(Topen,b,z) is always added to table block when the malicious buyer obtains secret s (in ProVerif, this event is represented as attacker(s)). We use a correspondence assertion to verify whether the event of adding to table block always occurs if attacker(s) occurs. Since ProVerif does not allow event block(Topen,b,z) to appear directly in a correspondence assertion, we define event payment such that it occurs if and only if block(Topen,b,z) occurs.

    (* case of timeout *)
    query T:transaction, a:host, z':bitcoin; table(block(T,a,z')).
    query T:transaction, a:host, z':bitcoin; table(block(T,a,z')) ==> z'=z.

    (* case of non-timeout *)
    query x:transaction; attacker(s)==>event(payment(x)).

4.4 Verification Result

Here, we show the verification result for the contingent payment protocol in Table 5. With ordinary NIZK, fairness is satisfied in the timeout situation. However, fairness is not satisfied in the non-timeout situation because ProVerif finds an attack. With subversion NIZK, fairness is satisfied even in the non-timeout situation.

Found Attack. We describe the found attack against the protocol with ordinary NIZK in the non-timeout situation.

1. A malicious buyer generates randomness rnd'.
2. By using function crsgen, the buyer generates crs' with randomness rnd'. The buyer sets crs' as the CRS.
3. The seller generates hash key k, encrypts secret s with k to ciphertext c and computes hash value hk of k.
4. By using crs', hk, c, k and s, the seller also generates proof zk where (k, s) is the witness.
5. The seller sends hash value hk, ciphertext c and proof zk to the buyer.
6. By using randomness rnd', proof zk and function subvert2, the buyer can obtain secret s.

This attack is essentially the same as the attack by Campanelli et al. [20]. Thus, we can rediscover their attack with ProVerif.

Table 5. Verification results of contingent payment protocol

| NIZK | Non-timeout | Timeout |
|---|---|---|
| Using ordinary NIZK | × | ✓ |
| Using subversion NIZK | ✓ | ✓ |

✓ means that fairness is satisfied. × means that fairness is not satisfied.

5 Conclusion

In this paper, we verified the fairness of the seven fair exchange protocols formulated in [10] by using the automated verification tool ProVerif. We showed that the escrow and contingent payment protocols do not satisfy fairness, while the other protocols do. Concretely, for the escrow protocol, we examined the two-party setting and the three-party setting with an arbitrator, and showed that the two-party setting does not satisfy fairness against a malicious buyer or seller. For the contingent payment protocol, we rediscovered the attack shown by Campanelli et al. [20], and showed that their countermeasure using subversion NIZK works correctly. From a theoretical point of view, we gave the first formalization of the notion of fairness of fair exchange protocols in ProVerif. It will be useful for formalizing and verifying other cryptographic protocols that need some kind of fairness.

References 1. CrowdFundDAO. https://live.ether.camp/account/9b37508b5f859682382d8cb646 7a5c7fc5d02e9c/contract 2. DiceRoll. https://ropsten.etherscan.io/address/0xb95bbe8ee98a21b5ef7778ec1bb 5910ea843f8f7 3. ISO/IEC 11770-2:2018 - IT Security techniques - Key management - Part 2: Mechanisms using symmetric techniques. https://www.iso.org/standard/73207.html 4. ISO/IEC 11770-3:2015 - Information technology - Security techniques - Key management - Part 3: Mechanisms using asymmetric techniques. https://www.iso.org/ standard/60237.html 5. ProVerif 2.00. http://prosecco.gforge.inria.fr/personal/bblanche/proverif 6. StandardToken. https://git.io/vFAlg 7. Wallet. https://etherscan.io/address/0xab7c74abc0c4d48d1bdad5dcb26153fc8780 f83e 8. Ammayappan, K.: Seamless interoperation of LTE-UMTS-GSM requires flawless UMTS and GSM. In: International Conference on Advanced Computing, Networking and Security, pp. 169–174 (2013) 9. Asadi, S., Shahhoseini, H.S.: Formal security analysis of authentication in SNMPv3 protocol by an automated tool. In: IST 2012, pp. 306–313 (2012) 10. Atzei, N., Bartoletti, M., Cimoli, T., Lande, S., Zunino, R.: SoK: unraveling bitcoin smart contracts. In: Bauer, L., K¨ usters, R. (eds.) POST 2018. LNCS, vol. 10804, pp. 217–242. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-89722-6 9

Formal Verification of Fair Exchange Based on Bitcoin Smart Contracts

105

11. Atzei, N., Bartoletti, M., Lande, S., Zunino, R.: A formal model of Bitcoin transactions. In: Meiklejohn, S., Sako, K. (eds.) FC 2018. LNCS, vol. 10957, pp. 541–560. Springer, Heidelberg (2018). https://doi.org/10.1007/978-3-662-58387-6_29
12. Backes, M., Dreier, J., Kremer, S., Künnemann, R.: A novel approach for reasoning about liveness in cryptographic protocols and its application to fair exchange. In: Euro S&P 2017, pp. 76–91 (2017)
13. Basin, D., Cremers, C., Meier, S.: Provably repairing the ISO/IEC 9798 standard for entity authentication. J. Comput. Secur. 21(6), 817–846 (2013)
14. Bellare, M., Fuchsbauer, G., Scafuro, A.: NIZKs with an untrusted CRS: security in the face of parameter subversion. In: Cheon, J.H., Takagi, T. (eds.) ASIACRYPT 2016. LNCS, vol. 10032, pp. 777–804. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-53890-6_26
15. Bhargavan, K., Blanchet, B., Kobeissi, N.: Verified models and reference implementations for the TLS 1.3 standard candidate. In: IEEE Symposium on Security and Privacy 2017, pp. 483–502 (2017)
16. Bhargavan, K., et al.: Formal verification of smart contracts: short paper. In: PLAS@CCS 2016, pp. 91–96 (2016)
17. Blanchet, B.: CryptoVerif: cryptographic protocol verifier in the computational model. https://prosecco.gforge.inria.fr/personal/bblanche/cryptoverif/
18. Bowe, S.: Pay-to-sudoku (2016). https://github.com/zcash/pay-to-sudoku
19. Bresciani, R., Butterfield, A.: ProVerif analysis of the ZRTP protocol. Int. J. Infonom. (IJI) 3(3), 1060–1064 (2010)
20. Campanelli, M., Gennaro, R., Goldfeder, S., Nizzardo, L.: Zero-knowledge contingent payments revisited: attacks and payments for services. In: ACM Conference on Computer and Communications Security 2017, pp. 229–243 (2017)
21. Cremers, C.: The Scyther Tool. https://people.cispa.io/cas.cremers/scyther/
22. Cremers, C.: Key exchange in IPsec revisited: formal analysis of IKEv1 and IKEv2. In: Atluri, V., Diaz, C. (eds.) ESORICS 2011. LNCS, vol. 6879, pp. 315–334. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23822-2_18
23. Cremers, C., Dehnel-Wild, M.: Component-based formal analysis of 5G-AKA: channel assumptions and session confusion. In: NDSS 2019 (2019)
24. Cremers, C., Dehnel-Wild, M., Milner, K.: Secure authentication in the grid: a formal analysis of DNP3: SAv5. J. Comput. Secur. 27(2), 203–232 (2019)
25. Cremers, C., Horvat, M.: Improving the ISO/IEC 11770 standard for key management techniques. Int. J. Inf. Secur. 15(6), 659–673 (2015). https://doi.org/10.1007/s10207-015-0306-9
26. Cremers, C., Horvat, M., Hoyland, J., Scott, S., van der Merwe, T.: A comprehensive symbolic analysis of TLS 1.3. In: CCS 2017, pp. 1773–1788 (2017)
27. Garbinato, B., Rickebusch, I.: Impossibility results on fair exchange. In: IICS 2010, pp. 507–518 (2010)
28. Goldreich, O., Oren, Y.: Definitions and properties of zero-knowledge proof systems. J. Cryptol. 7(1), 1–32 (1994)
29. Ben Henda, N., Norrman, K.: Formal analysis of security procedures in LTE - a feasibility study. In: Stavrou, A., Bos, H., Portokalidis, G. (eds.) RAID 2014. LNCS, vol. 8688, pp. 341–361. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11379-1_17
30. Kalra, S., Goel, S., Dhawan, M., Sharma, S.: ZEUS: analyzing safety of smart contracts. In: NDSS 2018 (2018)
31. Kobeissi, N., Bhargavan, K., Blanchet, B.: Automated verification for secure messaging protocols and their implementations: a symbolic and computational approach. In: Euro S&P 2017, pp. 435–450 (2017)


32. Lu, J., Zhang, J., Li, J., Wan, Z., Meng, B.: Automatic verification of security of OpenID connect protocol with ProVerif. In: 3PGCIC 2016. LNDECT, vol. 1, pp. 209–220. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-49109-7_20
33. Luu, L., Chu, D.H., Olickel, H., Saxena, P., Hobor, A.: Making smart contracts smarter. In: ACM Conference on Computer and Communications Security 2016, pp. 254–269 (2016)
34. Maxwell, G.: Zero knowledge contingent payment (2011). https://en.bitcoin.it/wiki/Zero_Knowledge_Contingent_Payment
35. Sakurada, H., Yoneyama, K., Hanatani, Y., Yoshida, M.: Analyzing and fixing the QACCE security of QUIC. In: Chen, L., McGrew, D., Mitchell, C. (eds.) SSR 2016. LNCS, vol. 10074, pp. 1–31. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49100-4_1
36. Schmidt, B., Meier, S., Cremers, C., Basin, D.: Tamarin prover. http://tamarin-prover.github.io/
37. Szabo, N.: Formalizing and securing relationships on public networks. First Monday, 1 September (1997). http://firstmonday.org/ojs/index.php/fm/article/view/548/469

Certified Compilation for Cryptography: Extended x86 Instructions and Constant-Time Verification

José Bacelar Almeida1, Manuel Barbosa2, Gilles Barthe3, Vincent Laporte4, and Tiago Oliveira2(B)

1 INESC TEC, Universidade do Minho, Braga, Portugal
2 INESC TEC and FCUP, Universidade do Porto, Porto, Portugal [email protected]
3 Max Planck Institute for Security and Privacy, IMDEA Software Institute, Madrid, Spain
4 Université de Lorraine, CNRS, Inria, LORIA, Nancy, France

Abstract. We present a new tool for the generation and verification of high-assurance high-speed machine-level cryptography implementations: a certified C compiler supporting instruction extensions to the x86. We demonstrate the practical applicability of our tool by incorporating it into supercop: a toolkit for measuring the performance of cryptographic software, which includes over 2000 different implementations. We show i. that the coverage of x86 implementations in supercop increases significantly due to the added support of instruction extensions via intrinsics and ii. that the obtained verifiably correct implementations are much closer in performance to unverified ones. We extend our compiler with a specialized type system that acts at pre-assembly level; this is the first constant-time verifier that can deal with extended instruction sets. We confirm that, by using instruction extensions, the performance penalty for verifiably constant-time code can be greatly reduced.

Keywords: Certified compiler · simd · supercop · Constant-time

1 Introduction

A key challenge in building secure software systems is to develop good implementations of cryptographic primitives like encryption schemes, hash functions, and signatures. Because these primitives are so heavily relied upon, it is important that their implementations achieve maximal efficiency, functional correctness, and protection against side-channel attacks. Unfortunately, it is difficult to achieve these goals: severe security vulnerabilities are exposed in many implementations despite heavy scrutiny [1,2,40]. Computer-aided cryptography is an area of research that aims to address the challenges in creating high-assurance cryptography designs and implementations, by creating domain-specific formal verification techniques and tools [10]. This work extends the range of tools that can be used for computer-aided cryptography.

© Springer Nature Switzerland AG 2020. K. Bhargavan et al. (Eds.): INDOCRYPT 2020, LNCS 12578, pp. 107–127, 2020. https://doi.org/10.1007/978-3-030-65277-7_6


There are many paths to achieve high-assurance cryptography for assembly-level implementations. The most direct path is naturally to prove functional correctness and side-channel protection directly at assembly level. This is the approach taken in [21] for proving functional correctness of Curve25519 and more recently in [18,26] for proving functional correctness of several implementations. However, verifying assembly programs is less intuitive, for instance due to the unstructured control flow, and the choice of verification tools is more limited.

An alternative, and popular, approach is to verify source-level programs. This approach is taken for instance in [41], where Zinzindohoue et al. develop functionally verified implementations of popular cryptographic algorithms. The benefits are two-fold. First, there is a broader choice of verification tools. Second, the verification can benefit from the high-level abstractions enforced at source level. However, this approach introduces the question of trust in the compiler. Indeed, formal guarantees can be undermined by incorrect compilers. For instance, bugs in optimization passes can generate functionally incorrect assembly code from functionally correct source code [29,36,39]. Worse yet, since compiler optimizations do not generally even purport to preserve security [23], compilers can generate assembly code vulnerable to side-channel attacks from carefully scrutinized source code.

The prevailing approach to eliminate trust in the compiler is to use certified compilation, see e.g. [30]. Informally, a certified compiler is a compiler in which each compiler pass is augmented with a formal proof asserting that it preserves the behavior of programs. This entails that functionally correct source code is compiled into functionally correct assembly. While certified compilation is a promising approach to guarantee functional correctness, it has key limitations, described next. These limitations are particularly relevant for cryptographic code, where sub-optimal execution time can be a deal-breaker for code deployment.

First, the optimizations supported by currently available certified compilers do not produce code competitive with that produced by the most aggressively optimizing (non-certifying) compilers. Indeed, the certification requirement limits the set of optimizations that can currently be performed. For example, global value numbering is an optimization that is missing from most certified compilers; see however [13]. The preeminent certified compiler, CompCert [30], generates code whose performance typically lies between GCC's -O1 and -O2 optimization levels. While this is an impressive achievement, and arguably a small price to pay for provably correct optimizations, the efficiency requirements on implementations of cryptographic primitives may necessitate more aggressive optimizations.

Second, the current certified compilers do not apply to the most efficient implementations. This is primarily due to two factors: the presence of special compiler intrinsics and inline assembly code, and the incompleteness of automatic assembly-level verification. For example, many implementations invoke compiler intrinsics corresponding to sse bit-wise and word-shuffling instructions. Therefore, a last line of work is to develop certified compilers for high-speed cryptography.

Contributions. In this paper we further develop the route of obtaining competitive high-assurance cryptography implementations via general-purpose certified compilers. We present a new version of CompCert that significantly narrows the


measured performance gap and implements a static analysis capable of verifying constant-time for CompCert-generated pre-assembly code. This new certified compiler comes with the ability to handle the special compiler intrinsics used in implementations of cryptographic primitives. To generate performance measurements, we consider the supercop benchmarking framework, which includes an extensive repository of implementations of cryptographic primitives. We posit that the performance penalty incurred by formal verification should be measured with respect to the fastest known implementations, even if these are hand-crafted directly at the assembly level and validated using heuristic means. Thus, for each cryptographic primitive P, we consider the performance penalty of a given compiler C to be the ratio of the performance of the best-performing implementation of P that C is actually able to compile to the performance of the best-performing implementation of P compiled with any compiler. In particular, these ratios can grow either because C produces less optimal code than some other compiler, or because C cannot process a given implementation, e.g., due to special compiler intrinsics or inlined assembly. Our findings are that the average performance penalties due to certified compilation lie between factors of 16 and 21, depending on the version of CompCert. For several primitives, we observe penalties in the range between two and three orders of magnitude, although the majority of implementations compensates for these degenerate cases.

Next, we turn to side-channel protection. The gold standard for cryptographic implementations is the constant-time property, whereby programs' memory access behaviors are independent from secret data, for both code and data memory. In other words, for fixed public data such as input length, the memory locations accessed by loads and stores, including instruction loads, are fixed.
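
To make this definition concrete, the following C sketch contrasts a comparison routine that branches on secret data with one whose control flow and memory accesses depend only on the public length n. The function names are illustrative assumptions and do not come from the paper or from supercop. This is a source-level illustration only: a compiler is free to reintroduce secret-dependent branches, which is one reason constant-time checks are usually run late in the compilation chain.

```c
#include <stddef.h>
#include <stdint.h>

/* NOT constant-time: the early return makes the number of loop
 * iterations (and hence the instruction trace) depend on the secret
 * contents of a and b. */
int leaky_equal(const uint8_t *a, const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i]) {
            return 0;
        }
    }
    return 1;
}

/* Constant-time in the sense above: for a fixed public length n the
 * loop always runs n times and reads a[0..n-1] and b[0..n-1]. */
int ct_equal(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint8_t diff = 0;
    for (size_t i = 0; i < n; i++) {
        diff |= (uint8_t)(a[i] ^ b[i]);
    }
    /* diff == 0: (0 - 1) wraps to 0xFFFFFFFF, so bit 8 is 1 and we return 1.
     * diff in 1..255: the value is in 0..254, so bit 8 is 0 and we return 0. */
    return (int)((((uint32_t)diff - 1u) >> 8) & 1u);
}

/* Branch-free selection: returns a if bit == 1 and b if bit == 0. */
uint32_t ct_select(uint32_t bit, uint32_t a, uint32_t b)
{
    uint32_t mask = (uint32_t)0 - bit; /* all ones or all zeros */
    return (a & mask) | (b & ~mask);
}
```

Both comparison functions compute the same result; only the second keeps the sequence of executed instructions and accessed addresses independent of the compared bytes.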
The constant-time property is a countermeasure against timing attacks, whereby attackers learn secrets by measuring execution time variations, e.g. due to cache behavior. While verifying this property traditionally amounted to manual scrutiny of the generated assembly code by cryptographic engineers, recent work has shown progress in automatic verification to alleviate this burden [3,6,7,20,22,38]. The prevailing trend in this line of work is to carry out verification of side-channel protection towards the end of the compilation chain or directly on the generated assembly code. This is because even certified compilers may not preserve countermeasures against side-channel attacks; a notable exception is the CompCertCT compiler, which has been formally proved to preserve these countermeasures [12]. Following this trend, our version of CompCert includes an intrinsics-aware constant-time verifier, following the type-checking approach at Mach level of [11]. The reason for focusing on the Mach intermediate language is that, although it is very close to assembly, it is more suitable for analysis. The type system is described as a data-flow analysis, which keeps track of secret-dependent data and rejects programs that potentially use this data to violate the constant-time property. Because our type system is able to check programs that rely on instruction extensions, we are able to compile C code into functionally correct and verifiably constant-time implementations offering unprecedented efficiency. As an example, we can verify an implementation of aez relying on aes-ni that executes


100 times faster than the fastest implementation that could be compiled with the original version of CompCert.
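
The per-primitive performance penalty defined earlier in this section can be sketched in C as follows. This is a minimal sketch under stated assumptions: performance is taken to be a cost such as a cycle count (lower is better), so that the penalty is at least 1; the type `impl_cost_t` and the function `performance_penalty` are hypothetical names, not part of supercop or CompCert.

```c
#include <math.h>   /* INFINITY */
#include <stddef.h>

/* One implementation of a primitive P (hypothetical record layout). */
typedef struct {
    double cycles_with_c;   /* cost when built with compiler C;
                               INFINITY if C cannot compile it */
    double cycles_best_any; /* lowest cost over all compilers tried */
} impl_cost_t;

/* Penalty of compiler C for one primitive: cost of the best
 * implementation C can compile divided by the cost of the best
 * implementation under any compiler. Requires n >= 1. The result is
 * INFINITY when C cannot compile any implementation of the primitive. */
double performance_penalty(const impl_cost_t *impls, size_t n)
{
    double best_c = INFINITY;
    double best_any = INFINITY;
    for (size_t i = 0; i < n; i++) {
        if (impls[i].cycles_with_c < best_c) {
            best_c = impls[i].cycles_with_c;
        }
        if (impls[i].cycles_best_any < best_any) {
            best_any = impls[i].cycles_best_any;
        }
    }
    return best_c / best_any;
}
```

The two ways in which the ratio can grow are both visible here: C may produce slower code for an implementation it accepts, or it may reject the fastest implementation altogether, in which case the numerator comes from a slower one.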

2 Related Work

Our work follows prior work in verified compilers and computer-aided cryptography. We refer the interested reader to [10] for an extensive recent review of the state of the art in computer-aided cryptography, namely the different techniques and tools for functional correctness and constant-time verification. We focus here on closely related works on (secure) verified compilation for cryptography.

The earlier applications of verified compilers to cryptographic implementations were inspired by CompCert [30], a moderately optimizing verified compiler for a significant fragment of C. Almeida et al. [4] leverage the correctness theorem of CompCert to derive both functional correctness and side-channel resistance (in the program counter model [32]) for an implementation of the rsa-oaep encryption scheme as standardized in pkcs#1 v2.1. In a subsequent, related work, Barthe et al. [11] build a verified static analysis at pre-assembly level for cryptographic constant-time, ensuring that programs which pass the analysis do not branch on secrets and do not perform secret-dependent memory accesses, which guarantees that such implementations are protected (to some extent) against cache-based side-channel attacks; moreover, they use their static analysis to validate several implementations from the NaCl library. Our work builds on this development.

More recently, Almeida et al. [5] propose a general methodology for carrying provable security of algorithmic descriptions of cryptographic constructions and functional correctness of source-level implementations to assembly-level implementations, and for proving that assembly-level implementations are constant-time; moreover, they apply their methodology to an implementation of mee-cbc. In parallel, Appel et al. [8,9] have developed general-purpose program logics to reason about functional properties of source-level programs, and applied these program logics to prove functional correctness of a realistic sha-256 implementation; in a follow-up work, Beringer et al. [15] use the Foundational Cryptography Framework of [33] to build a machine-checked proof of the (elementary) reductionist argument for hmac.

Fiat-Crypto [24] is a recently proposed framework for the development of high-speed functionally correct implementations of arithmetic libraries for cryptography. Certified compilation is used to convert a high-level algebraic specification of the library functionality into C code that can subsequently be compiled into executable code. Our approach is complementary to Fiat-Crypto, in that our verified compiler can be used to compile the generated C code, carrying the functional correctness and constant-time guarantees to low-level code, including code that relies on intrinsics.

Hacl∗ [41] is a library of formally verified, high-speed implementations. Hacl∗ is included in recent versions of Mozilla Firefox's NSS security engine. It has recently been extended to vectorized implementations [34]. F∗ programs from the Hacl∗ library can be compiled into C code using KreMLin [35] and then compiled into assembly, for instance, using CompCert. KreMLin is a high-assurance


compiler/extractor tool that generates provably correct C code from F∗ specifications. Additionally, KreMLin is the tool used to generate the TLS 1.3 code verified in Project Everest.1 Again, our verified compiler can be used to compile C code relying on intrinsics generated by KreMLin with the guarantee that functional correctness is preserved.

Finally, we mention work on Jasmin [3,6], a new pre-assembly language for the implementation of high-assurance crypto code. Jasmin comes with a verified compiler that is guaranteed to preserve not only functional correctness but also source-level constant-time properties [12,14]. The Jasmin language also supports intrinsics and has been shown to give rise to competitive assembly implementations of cryptographic libraries. Our work provides an alternative to Jasmin, and indeed to direct assembly verification using, e.g., Vale [18], when functionally correct implementations are generated at C level. Very recently, Fromherz et al. [25] demonstrated the feasibility of formally verifying C code with inlined assembly. These lines of work are complementary to our own.

3 Background on x86 Instruction Extensions

x86 is a family of instruction sets that dates back to the Intel 8086/8088 processors launched in the late 1970s. Throughout the years, successor Intel processors (and compatible models from other manufacturers, namely amd) evolved from 16-bit to 32-bit architectures. The x86 designation is typically used to refer to instruction sets that are backward compatible with this family of processors. In this paper we will use x86 to (loosely) refer to the set of instructions that is supported transparently by most compilers that claim to support x86-compatible 32-bit architectures. We will use amd64 to refer to x86 extensions for 64-bit architectures, which we do not address specifically in this paper (although we present some data that permits evaluating what is lost by imposing this restriction).

In addition to the core x86 instructions, some architectures support additional domain-specific instruction sets, so-called instruction extensions. We will describe the instruction extensions introduced by Intel in the x86 architecture, since these are the ones that we target in this work. Intel introduced mmx in 1997, which included eight 64-bit registers (called mm0-mm7) in 32-bit machines and allowed for single instruction, multiple data (simd) operations over these registers for integer values of sizes 8 to 64 bits. A simd instruction permits computing the same operation simultaneously over several values, e.g., by seeing a 64-bit register as two 32-bit values. In mmx the new 64-bit registers were overlapped with floating-point registers, which was a limitation. The Streaming simd Extensions (sse) introduced in 1999 removed this limitation by introducing eight 128-bit registers (xmm0-xmm7), each of which could hold four single-precision floating-point values processed in the simd style. The sse2 improvement introduced in 2001 provided a better alternative to the original mmx extension2 by allowing the xmm registers to be also used to process integer

1 https://project-everest.github.io.
2 In some cases, relying on mmx in parallel to sse can give a performance advantage.


data of various sizes. For this reason we do not consider the original mmx extensions in our tools. Subsequent sse3 and sse4 extensions increased the number of operations that can be performed over the xmm registers. More recently, Intel launched the Advanced Vector Extensions (avx and avx2) that introduced 256-bit registers and instructions to allow simd computations over these registers. Our tools do not yet provide support for the avx extensions. Another important class of instruction extensions are those associated with cryptographic computations. Intel added support for hardware-assisted aes computations in 2011 (aes-ni), and announced extensions for the sha hash function family in 2013.3

Intrinsics. Instruction extensions are usually domain-specific, and they provide relatively complex operations that should be directly available to the programmer, even if the programmer is using a high-level language such as C. For this reason, compilers typically provide a special mechanism to allow a programmer to specifically request the usage of an extended instruction; this mechanism is typically called an intrinsic. Intrinsics in C compilers such as gcc and clang are simply special function names and data types that are handled by the compiler in a different way to normal function declarations/definitions; for the most part, usage of these special functions is passed transparently through various compiler passes, and eventually translated into a single assembly instruction.
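
As an illustration of what intrinsics look like at source level, the sketch below adds four 32-bit lanes at once using the sse2 intrinsics declared in `<emmintrin.h>`, as shipped with gcc and clang. The wrapper name `add4x32` and the scalar fallback are our own additions for portability; whether a particular certified compiler accepts exactly this header is not something this example claims.

```c
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* sse2 intrinsics and the __m128i vector type */
#endif

/* Lane-wise addition: out[i] = a[i] + b[i] (mod 2^32) for i = 0..3. */
void add4x32(uint32_t out[4], const uint32_t a[4], const uint32_t b[4])
{
#if defined(__SSE2__)
    /* Unaligned loads view the four 32-bit values as one 128-bit value. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* One intrinsic call, typically compiled to a single paddd. */
    __m128i vr = _mm_add_epi32(va, vb);
    _mm_storeu_si128((__m128i *)out, vr);
#else
    /* Portable fallback with the same lane-wise semantics. */
    for (int i = 0; i < 4; i++) {
        out[i] = a[i] + b[i];
    }
#endif
}
```

Each lane wraps around independently, which is the behaviour the simd view of a wide register implies: a carry out of one 32-bit lane never reaches its neighbour.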

4 Adding x86 Instruction Extensions to CompCert

Our extension to CompCert was adapted from the 2.2 distribution of CompCert,4 and it focuses only on the part of the distribution that targets the ia32 architecture. There is no particular reason for our choice of CompCert version, except that this was the most recent release when our project started. Equivalent enhancements can be made to more recent versions of CompCert with some additional development effort.

Relevant CompCert features. The architecture of CompCert is depicted in Fig. 1. We follow [30] in this description. The source language of CompCert is called CompCertC, which covers most of the iso C 99 standard and some features of iso C 2011 such as the _Alignof and _Alignas attributes. Some features of C not directly supported in CompCertC v2.2, such as struct-returning functions, are supported via rewriting from the C source during parsing. The semantics of CompCertC is formalized in Coq and it makes precise many behaviors unspecified or undefined in the C standard, whilst marking other undefined behaviours as “bad”. Memory is modeled as a collection of byte-addressable disjoint blocks, which permits formalizing in detail pointer arithmetic and pointer casts. CompCert gradually converts the CompCertC input down to assembly going through several intermediate languages. Parts of CompCert are not implemented

3 We did not have access to a machine running sha instruction extensions.
4 http://compcert.inria.fr/.


Fig. 1. CompCert architecture.

directly in Coq. These include the non-certified translator from C to CompCertC and the pretty-printer of the assembly output file. Additionally, some internal transformations of the compiler, notably register allocation, are implemented outside of Coq, but are then subject to a translation validation step that guarantees that the transformation preserves the program semantics.

The front-end of the compiler comprises the translations down to Cminor: this is a typeless version of C, where side-effects have been removed from expression evaluation; local variables are independent from the memory model; memory loads/stores and address computations are made explicit; and the control structure is simplified (e.g. a single infinite loop construct with explicit block exit statements). The back-end starts by converting the Cminor program into one that uses processor-specific instructions, when these are recognized as beneficial; the result is then converted into a standard Register Transfer Language (RTL) format, where the control flow is represented using a Control Flow Graph (CFG): each node contains a machine-level instruction operating over pseudo-registers. Optimizations including constant propagation and common sub-expression elimination are then carried out in the RTL format, before the register allocation phase that produces what is called an LTL program: here pseudo-registers are replaced with hardware registers or abstract stack locations. The transformation to Linear format linearizes the CFG, introducing labels and explicit branching. The remaining transformation steps comprise the Mach format, which deals with the layout of stack frames in accordance with the function calling conventions, and the final Asm language modeling the target assembly language.
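
The back-end pipeline just described can be summarized as data. The C sketch below is only a reading aid: it lists the languages and transformations named in the text above (the real compiler has further intermediate languages and passes that are omitted here), and all identifiers are hypothetical rather than CompCert's own.

```c
#include <stdbool.h>
#include <stddef.h>

/* Back-end languages named in the text, in pipeline order. */
typedef enum {
    LANG_CMINOR, LANG_RTL, LANG_LTL, LANG_LINEAR, LANG_MACH, LANG_ASM
} lang_t;

typedef struct {
    lang_t from;
    lang_t to;
    const char *what;
    bool validated_a_posteriori; /* computed outside Coq, result checked */
} pass_t;

/* Only register allocation is marked as validated a posteriori, since it
 * is the one example the text names explicitly. */
static const pass_t backend_passes[] = {
    { LANG_CMINOR, LANG_RTL,    "select processor-specific instructions, build CFG", false },
    { LANG_RTL,    LANG_RTL,    "constant propagation, common sub-expression elimination", false },
    { LANG_RTL,    LANG_LTL,    "register allocation", true },
    { LANG_LTL,    LANG_LINEAR, "CFG linearization", false },
    { LANG_LINEAR, LANG_MACH,   "stack-frame layout", false },
    { LANG_MACH,   LANG_ASM,    "assembly generation", false },
};

size_t backend_pass_count(void)
{
    return sizeof backend_passes / sizeof backend_passes[0];
}

/* Sanity check: each pass starts in the language where the previous ended. */
bool pipeline_is_chained(const pass_t *p, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (p[i].from != p[i - 1].to) {
            return false;
        }
    }
    return true;
}
```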


The generation of the executable file is not included in the certified portion of CompCert; instead, the Asm abstract syntax is pretty-printed and the resulting program is assembled/linked using standard system tools.

Semantic preservation. CompCert is proven to ensure the classical notion of correctness for compilers known as semantic preservation. Intuitively, this property guarantees that, for any given source program S, the compiler will produce a compiled program T that operates consistently with the semantics of S. Consistency is defined based on a notion of observable behaviour of a program, which captures the interaction between the program's execution and the environment. Let us denote the evaluation of a program $P$ over inputs $p$, resulting in outputs $o$ and observable behaviour $B$, as $P(p) \Downarrow (o, B)$. Semantic preservation can then be written as

$$\forall B, p, o.\quad T(p) \Downarrow (o, B) \implies S(p) \Downarrow (o, B)$$

meaning that any observable behaviour of the target program is an admissible observable behaviour of the source program. Observable behaviours in CompCert are possibly infinite sequences of events that model interactions of the program with the outside world, such as accesses to volatile variables, calls to system libraries, or user-defined events (so-called annotations).

High-level view of our CompCert extension. Our extension to CompCert is consistent with the typical treatment of instruction extensions in widely used compilers such as gcc: instruction extensions appear as intrinsics, i.e., specially named functions at source level. Calls to intrinsics are preserved during the first stages of the compilation, and eventually they are mapped into (typically) one assembly instruction at the end of the compilation. Intrinsic-specific knowledge is added to the compiler infrastructure only when this is strictly necessary, e.g., to deal with special register allocation restrictions; transformations and optimizations therefore treat intrinsic calls as black-box operations. We have extended CompCert with generic intrinsics configuration files. Our current configuration was automatically generated from the gcc documentation5 and the machine-readable x86 assembly documentation from x86asm.net.6 This configuration file allows the CompCert parser to recognize gcc-like intrinsics as a new class of built-in functions that were added to the CompCert semantics. For this, we needed to extend the core libraries of CompCert with a new integer type corresponding to 128-bit integers; in turn this implies introducing matching changes to the various intermediate languages and compiler passes to deal with 128-bit registers and memory operations (e.g., a new set of alignment constraints; calling conventions; etc.). The new built-ins associated with intrinsics are similar to other CompCert built-ins, apart from the fact that they will be recognized by their name, and they may carry immediate arguments (i.e., constant arguments that must be known at compile-time, and are mapped directly to the generated assembly code). These extended built-in operations are propagated down

5 http://gcc.gnu.org/onlinedocs/gcc/x86-Built-in-Functions.html.
6 http://ref.x86asm.net.


to assembly level, and are replaced with the corresponding assembly instructions at the pretty-printing pass. All changes were made so as to be, as much as possible, intrinsics-agnostic, which means that new instruction extensions can be added simply by modifying the configuration file. Overall, the development modified/added approx. 6.3k lines of Coq and ML, spread among 87 files from the CompCert distribution. We now present our modifications to CompCert in more detail.

Modifications to the CompCert front-end. Modifications at the compiler front-end are generally dedicated to making sure that uses of intrinsics in the source file are recognized and adequately mapped into the CompCertC abstract syntax tree, and that they are subsequently propagated down to the Cminor level. This includes modifications and extensions to the C parser to recognize the gcc-style syntax extensions for simd vector types (e.g., the vector_size attribute), as well as adapted versions of intrinsics header files giving reasonable support for source-level compatibility between both compilers. These header files trigger the generation of the added built-ins, whose specification is included in the configuration file. For each new built-in, the following data is specified:

– the function identifier that is used to recognize the intrinsic by name;
– the signature of the intrinsic, i.e., the types of the input parameters and return type;
– an instruction class identifier that is used to group different intrinsics into different sets that can be activated/deactivated for recognition in different platforms (this is linked to a set of command-line option switches);
– the assembly instruction(s) that should be used when pretty-printing an assembly file in which that particular built-in operation appears;
– a Boolean value indicating whether the associated assembly instruction is two-address, which is relevant for register allocation later on.
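
The five pieces of data listed above can be pictured as one record per built-in. The C sketch below is a hypothetical rendering: the actual configuration file format is not shown in the text, and the type names, class flags and lookup function are assumptions made for illustration. The two entries use built-in names documented for gcc (`__builtin_ia32_paddd128`, `__builtin_ia32_aesenc128`) purely as examples.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Value types appearing in signatures (hypothetical encoding). */
typedef enum { TY_I32, TY_V128 } ty_t;

/* Instruction classes that can be switched on or off (hypothetical flags). */
enum { CLASS_SSE2 = 1 << 0, CLASS_AESNI = 1 << 1 };

#define MAX_ARGS 3

typedef struct {
    const char *name;         /* identifier used to recognize the intrinsic */
    ty_t ret;                 /* signature: return type ...                 */
    ty_t args[MAX_ARGS];      /* ... and parameter types                    */
    size_t nargs;
    unsigned insn_class;      /* class used to enable/disable recognition   */
    const char *asm_mnemonic; /* instruction emitted by the pretty-printer  */
    bool two_address;         /* destination must coincide with a source    */
} intrinsic_desc_t;

static const intrinsic_desc_t intrinsic_table[] = {
    { "__builtin_ia32_paddd128",  TY_V128, { TY_V128, TY_V128 }, 2,
      CLASS_SSE2,  "paddd",  true },
    { "__builtin_ia32_aesenc128", TY_V128, { TY_V128, TY_V128 }, 2,
      CLASS_AESNI, "aesenc", true },
};

/* Returns the descriptor for name if its class is enabled, else NULL. */
const intrinsic_desc_t *lookup_intrinsic(const char *name, unsigned enabled)
{
    size_t n = sizeof intrinsic_table / sizeof intrinsic_table[0];
    for (size_t i = 0; i < n; i++) {
        const intrinsic_desc_t *d = &intrinsic_table[i];
        if ((d->insn_class & enabled) != 0 && strcmp(d->name, name) == 0) {
            return d;
        }
    }
    return NULL;
}
```

Keeping everything the compiler needs to know about an intrinsic in such a table is what makes the approach intrinsics-agnostic: supporting a new extension amounts to adding rows rather than changing compiler passes.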
Translation into CompCertC maps all vector types/values into a new 128-bit scalar type. Subsequent transformations were extended to support this data type.

Modifications to the CompCert back-end. The most intrusive modifications to CompCert were done at the back-end level, most prominently in the register allocation stage. CompCert uses a non-verified graph-coloring algorithm to compute a candidate register allocation, whose output is then checked within Coq for correctness. We added the bank of eight 128-bit xmm registers to the machine description, taking into account that floating point operations in CompCert were already using the 64-bit halves of these registers. This implied extending the notion of interference used during register allocation and adapting the corresponding proof of correctness. During the construction of the stack-frame layout, the calling convention for vector parameters/return-values was implemented, supporting up to 4 parameters and the return-value passed in registers. The final component of our extensions was the addition to the assembly pretty-printer, supporting a flexible specification of the code to be produced by each built-in.

Consequences for semantics preservation. Our new version of CompCert comes with an extended semantics preservation theorem that has essentially


the same statement as the original one. The difference resides in the fact that the machine model now explicitly allows built-in functions to manipulate 128-bit values. Note that, although we did not add a detailed formalization of the semantics of all instruction extensions, this is not a limitation when it comes to the correctness of the compiler itself: indeed, our theorem says that, whatever semantics are associated by a machine architecture to a particular extended instruction, these will have precisely the same meaning at source level. This is a powerful statement, since it allows us to deal with arbitrary instruction extensions in a uniform way. Such detailed semantics would be important if one wished to reason about the meaning of a program at source level, e.g., to prove that it computes a particular function. In these cases a formal semantics can be given just for the relevant instructions.
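
The compute-then-check scheme used for register allocation in this section (an untrusted algorithm proposes a result, a verified checker accepts or rejects it) can be illustrated with a much simplified validator in C. This is a sketch of the general idea only: it checks a plain graph coloring, does not model spilling or the overlap between xmm registers and their 64-bit floating-point halves, and is not CompCert's validator. All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Two pseudo-registers that are live at the same time and therefore
 * must not share a hardware register. */
typedef struct {
    size_t u;
    size_t v;
} interference_t;

/* Accepts a candidate allocation reg_of[0..n_pseudo-1] if every
 * pseudo-register is mapped to a register in 0..n_regs-1 and no two
 * interfering pseudo-registers share a register. The candidate may come
 * from any (untrusted) allocator; only this check needs to be trusted. */
bool allocation_is_valid(const int *reg_of, size_t n_pseudo,
                         const interference_t *edges, size_t n_edges,
                         int n_regs)
{
    for (size_t i = 0; i < n_pseudo; i++) {
        if (reg_of[i] < 0 || reg_of[i] >= n_regs) {
            return false;
        }
    }
    for (size_t e = 0; e < n_edges; e++) {
        size_t u = edges[e].u;
        size_t v = edges[e].v;
        if (u >= n_pseudo || v >= n_pseudo) {
            return false; /* malformed interference graph */
        }
        if (reg_of[u] == reg_of[v]) {
            return false; /* interfering values share a register */
        }
    }
    return true;
}
```

The appeal of this design is that the checker is far simpler than the allocator, so only the checker has to be proved correct.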

5 Experimental Results

5.1 Coverage

supercop (System for Unified Performance Evaluation Related to Cryptographic Operations and Primitives) is a toolkit for measuring the performance of cryptographic software. In this work we looked at version 20170105 of the toolkit. supercop evaluates cryptographic primitives according to several criteria, the most prominent of which for the purpose of this work is execution time. A supercop release contains all the machinery required to carry out performance evaluation on a specific computer; any computer can be used as a representative for a particular architecture, and collected data can be submitted to a central repository of collected benchmarks.

The implementations contained in a supercop release are organized in three hierarchical levels. At the top level reside so-called operations; these correspond to the algorithms that constitute an abstract interface to a cryptographic component. For example, for digital signatures the operations comprise key generation, signature generation and signature verification. For each cryptographic component there can be many instantiations. These are called primitives in supercop. Different primitives will provide different tradeoffs; for example, some primitives may rely on standardised components such as aes, which may be a requirement for some applications, whereas other primitives may offer better performance by relying on custom-designed stream ciphers. Finally, for each primitive there may be many implementations, e.g., targeting different architectures, with different side-channel countermeasures, or simply adopting different implementation strategies.

In total, the supercop release we considered contained 2153 such implementations for 593 primitives. In Table 1 we present a more detailed summary of these counts, focusing on some interesting categories for this work. In particular, we detail the following successive refinements of the original implementation set: i. the number of implementations that target the x86-32 architecture (x86), which we identified by excluding all implementations that explicitly indicate a different target architecture; ii. how many of the above implementations remain (x86-C) if we exclude those that are given (even partially) in languages other than C, such as assembly; and iii. how many of these use instruction extensions

Table 1. supercop implementation histogram.

Operations      x86   x86-C   x86-ext   amd64
aead            644     548       118     660
auth             19      19         6      19
box               2       2         0       2
core             25      25         0      29
dh              123      17        11     185
encrypt          13      11         4      13
hash            550     464       144     664
hashblocks       16      15         5      21
onetimeauth       9       2         0      11
scalarmult        7       5         0      14
secretbox         2       2         0       2
sign             62      47        11      65
stream          168      95        31     228
verify            3       3         0       3
Total count    1643    1255       330    1916

(x86-ext) such as those described in Sect. 3. Additionally, we give an implementation count that extends the x86 one by also including implementations that explicitly target 64-bit architectures (amd64); this gives an idea of how much coverage is lost by restricting attention to 32-bit architectures. One can see that 330 implementations resorting to x86 instruction extensions can be found in supercop, corresponding to 168 primitives, out of a total of 576 primitives that come equipped with an x86 implementation. This set of primitives represents the universe over which the new formal verification tools that we put forth in this work will provide benefits over pre-existing tools.

Before moving to this detailed analysis, we conclude this sub-section with a high-level view of the data we collected in supercop that permits comparing certified compilers to general-purpose compilers. This statistic is a byproduct of our work and we believe it may be of independent interest, as it gives us an indication of what the state-of-the-art in certified compilation implies for cryptography.

Table 2 gives coverage statistics, i.e., how many implementations each compiler was able to successfully convert into executable code accepted by supercop on the machine we used for benchmarking. This machine has the following characteristics: Intel Core i7-4600U processor, clocked at 2.1 GHz, with 8 GB of RAM, running Ubuntu version 16.04. We note that supercop exhaustively tries many possible compilation strategies for each compiler on a given machine. The baseline here corresponds to implementations in the set tagged as x86-C in Table 1 that were successfully compiled with gcc version 5.4.0 or clang version 3.8.0. The apparent discrepancy (1020 versus 1643) to the number of possible

Table 2. supercop coverage statistics for various compilers.

Architecture   x86-32                                        amd64
Operations     Baseline  ccomp-2.2  ccomp-ext  ccomp-3.0     Baseline  ccomp-3.0
aead                343        258        290        178          506        269
auth                 16         10         10          8           19         10
box                   2          2          2          2            2          2
core                 21         25         25         25           29         25
dh                    2          2          2          2            7          3
encrypt               4          5          1          5            5          6
hash                471        323        356        239          562        380
hashblocks           12          8          8          8           16         11
onetimeauth           5          5          5          6            7          7
scalarmult            6          6          6          6           13          9
secretbox             2          2          2          2            2          2
sign                 12          0          0          2           18          3
stream              121         91        114         91          152         19
verify                3          3          3          3            3          3
Total count        1020        740        824        577         1341        749

x86 implementations indicated in Table 1 is justified by the fact that some implementations omit the target architecture and incompatibility with x86 is detected only at compile-time.⁷ In the table, ccomp refers to CompCert and ccomp-ext refers to the CompCert extension we presented in Sect. 4.

One important conclusion we draw from this table is that, at the moment, the version of CompCert we present in this paper has the highest coverage out of all certified compiler versions, due to its support for intrinsics. Nevertheless, we still do not have full coverage of all intrinsics, which justifies the coverage gap to the baseline. In particular, we do not support the __m64 mmx type nor avx operations, as mentioned in Sect. 3. Furthermore, we do not use any form of syntactic sugar to hide the use of intrinsics, e.g., allowing xor operations (^) over 128-bit values, which is assumed by some implementations fine-tuned for specific compilers.

5.2 Methodology for Performance Evaluation

We will be measuring and comparing performance penalties incurred by using a particular compiler. These penalties originate in two types of limitations: i. the compiler does not cover the most efficient implementations, i.e., it simply does not compile them; or ii. intrinsic limitations in the optimization capabilities of the compiler. In particular, we will evaluate the trade-off between assurance and performance when compiling cryptographic code written in C for different versions of CompCert. Our metric will be based on average timing ratios with respect to a baseline measurement. In all cases, the timing ratio is reported relative to the fastest implementation overall, often given in assembly, as compiled by a non-verified optimizing compiler in the best possible configuration selected by supercop. We now detail how we compute our metrics.

⁷ The degenerate value in Table 2 (shown in red in the original) is caused by implementations that use macros to detect intrinsic support; ccomp-ext activates these macros, but then raises an error in a gcc-specific cast.

Performance Metrics. We consider each supercop operation separately, so let us fix an arbitrary one called $o \in O$, where $O$ is the set of all operations in supercop. Let $C$ be the set of compilation tools activated in supercop and $P(o)$ the set of primitives that realize $o$. Denote by $I(p)$ the set of all implementations provided for primitive $p \in P(o)$. Let also $t^p_C$ denote the fastest timing value reported by supercop over all implementations $i \in I(p)$, for primitive $p \in P(o)$, when compiled with all of the compilers in $C$. Note that, if such a value $t^p_C$ has been reported by supercop, then this means that at least one implementation $i \in I(p)$ was correctly compiled by at least one of the configured compilers in $C$. Furthermore, $t^p_C$ corresponds to the target code that runs fastest over all the implementations given for $p$, and over all compilation options that were exhaustively attempted over $C$.

To establish a baseline, we have configured supercop with gcc version 5.4.0 and clang version 3.8.0 and collected measurements for all primitives. Let us denote this set of compilers by $C^*$. We then independently configured and executed supercop with different singleton sets of compilers corresponding to different versions of CompCert. Let us designate these by $C_{2.2}$, $C_{3.0}$ and $C_{2.2\text{-ext}}$, where the last one corresponds to our extension to CompCert described in Sect. 4. Again we collected information for all primitives in supercop. For a given operation $o \in O$ we assess a compiler configuration $C$ by computing the average ratio:

$$R^o_C = \frac{1}{|P|} \cdot \sum_{p \in P} \frac{t^p_C}{t^p_{C^*}},$$

where we impose that $t^p_C$ and $t^p_{C^*}$ have both been reported by supercop, i.e., that at least one implementation in $I(p)$ was successfully compiled via $C$ and one (possibly different) implementation in $I(p)$ was successfully compiled by $C^*$. When we compare two compiler configurations $C_1$ and $C_2$ we simply compute independently $R^o_{C_1}$ and $R^o_{C_2}$. However, in this case we first filter out any primitives for which either $C_1$ or $C_2$ did not successfully compile any implementations. The same principle is applied when more than two compiler configurations are compared; hence, as we include more compiler versions, the number of primitives considered in the ratios tends to decrease. In all tables we report the number of primitives $|P|$ considered in the reported ratios.

Finally, since we are evaluating the penalty for using certified compilers, we introduced an extra restriction on the set of selected primitives: we want to consider only the performance of implementations covered by the correctness theorems. Our approach was heuristic here: if supercop reports that the most efficient implementation compiled by a CompCert version (including our new one) includes assembly snippets, we treat this primitive as if no implementation was successfully compiled.
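The averaging and filtering steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the scripts used for the paper: the function name average_ratios, the dictionary layout, and all primitive names and timings are hypothetical. A primitive whose fastest CompCert build contains assembly is modelled simply by leaving it out of that compiler's dictionary.

```python
def average_ratios(baseline, configs):
    """Average timing ratio R^o_C for each compiler configuration.

    baseline: {primitive: fastest time under the baseline compilers C*}
    configs:  {config name: {primitive: fastest time under that config}}
    A primitive missing from a dictionary means nothing compiled for it.
    """
    # Keep only primitives reported by the baseline and by every configuration.
    common = set(baseline)
    for timings in configs.values():
        common &= set(timings)
    if not common:
        raise ValueError("no primitive was compiled by every configuration")

    ratios = {
        name: sum(timings[p] / baseline[p] for p in common) / len(common)
        for name, timings in configs.items()
    }
    return ratios, len(common)  # len(common) plays the role of |P|


# Hypothetical timings (e.g. cycle counts) for a single operation.
baseline = {"prim_a": 100.0, "prim_b": 200.0, "prim_c": 50.0}
ccomp_22 = {"prim_a": 300.0, "prim_b": 400.0}  # prim_c did not compile
ccomp_ext = {"prim_a": 150.0, "prim_b": 400.0, "prim_c": 100.0}

ratios, size = average_ratios(
    baseline, {"ccomp-2.2": ccomp_22, "ccomp-ext": ccomp_ext}
)
print(size, ratios)
```

Comparing both configurations drops prim_c, because ccomp-2.2 has no timing for it; this is the mechanism by which |P| shrinks as more compiler versions are compared.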


The cost of certified compilation. If one looks at the performance penalty per operation for CompCert version 2.2 and version 3.0, as detailed above, and takes the average over all operations, then one obtains a factor of 3.34 and 2.58, respectively. Note that, in primitives such as AES-GCM, the timing ratio is huge and can reach 700-fold because baseline implementations use AES-NI, while CompCert compiles AES implemented in software. Nevertheless, these findings are consistent with what is usually reported for other application domains, and they show that CompCert version 3.0 has significantly reduced the performance penalty when compared to previous versions. Note, however, that this is at the cost of a reduction in coverage (cf. Table 2). More recent versions of CompCert that we have benchmarked using a different set-up confirm a gradual improvement in the optimization capabilities of the compiler.

5.3 Performance Boost from Certified Intrinsics-Aware Compilation

In this section we measure the performance improvements achieved by our new version of CompCert supporting instruction extensions. Table 3 shows two views of the collected results: the top table compares three versions of CompCert, whereas the bottom table compares only the vanilla version of CompCert 2.2 with our extended version of it. In the bottom table we list only the lines where the set of considered primitives differs from the top table. The results speak for themselves: for operations where a significant number of primitives come equipped with an intrinsics-relying implementation, the performance penalty falls by a factor of 5 when comparing to CompCert 2.2, and a factor above 3 when comparing to CompCert 3.0. In Table 3 we are including primitives for which no implementation relying on instruction extensions is given. In that case our new version of CompCert does not give an advantage, and so the performance gain is diluted. To give a better idea of the impact for primitives where instruction extensions are considered, we present in Table 4 the average ratios that result from restricting the analysis to primitives where instruction extensions are used. These results show that, as would be expected, intrinsics-based implementations allow a huge speed-up when compared to implementations in plain C. The most significant improvements are visible in the aead operations, where one important contributing factor is the enormous speed boost that comes with relying on an aes hardware implementation, rather than a software one.
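The dilution effect mentioned above can be illustrated with a small calculation. The sketch below uses made-up per-primitive ratios (they are not values from Tables 3 or 4), and the names mean_ratio, plain, ext and uses_ext are hypothetical.

```python
def mean_ratio(ratios, keep=None):
    """Average the per-primitive ratios, optionally restricted to a subset."""
    selected = [r for p, r in ratios.items() if keep is None or p in keep]
    if not selected:
        raise ValueError("no primitive selected")
    return sum(selected) / len(selected)


# Made-up timing ratios with respect to the baseline, per primitive.
plain = {"p1": 2.0, "p2": 2.0, "p3": 40.0}  # compiler without intrinsics
ext = {"p1": 2.0, "p2": 2.0, "p3": 4.0}     # intrinsics-aware compiler
uses_ext = {"p3"}  # only p3 ships an implementation using intrinsics

# Improvement factor over all primitives versus over the restricted set.
gain_all = mean_ratio(plain) / mean_ratio(ext)
gain_restricted = mean_ratio(plain, uses_ext) / mean_ratio(ext, uses_ext)
print(gain_all, gain_restricted)
```

With these invented numbers the improvement is roughly 5.5 when averaged over all primitives and 10 when restricted to the primitive that actually uses intrinsics, which is the kind of gap that motivates reporting both views.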

Table 3. Performance ratios aggregated by instantiated operation.

Table 4. Performance ratios aggregated by instantiated operation, restricted to primitives including at least one implementation relying on instruction extensions.

6 An Intrinsics-Aware Constant-Time Checker

We now address two limitations of existing approaches to verifying constant-time implementations. The first limitation is the lack of support for instruction extensions. The second limitation is that, if one is looking to use a certified compiler that is not guaranteed to preserve the constant-time property, then using a constant-time verifier at source level does not guarantee constant-time at the target level. We integrated a new constant-time verification tool into the extended version of CompCert that we introduced in Sect. 4, and it follows the type-checking approach at Mach level of [11]. The reason for focusing on the Mach intermediate language is that, although it is very close to assembly, it is more suitable for analysis. The checker operates in three steps. First, a value analysis computes an overapproximation of the values of the pointers: this is key for the precision of the checker when the program to verify stores sensitive data into memory. Then, a type system infers which run-time values may depend on sensitive data. Finally, the policy checker validates that neither the control-flow nor the memory accesses depend on sensitive data.

Type system overview. The type system assigns a security level at each program point and in each calling context to each register and memory location (collectively called locations). Here, a calling context is a stack of call sites. Security levels are taken in the usual security lattice with two points High and Low. Locations are labeled High at a particular program point and calling context if they may hold a value that depends on secret data whenever execution reaches that point in that context. The ones that are labeled Low are guaranteed to always hold values derived from public data only.

The type system is described as a data-flow analysis. Typing rules describe how the type information evolves when executing a single instruction. For instance, for an arithmetic operation like x = y + z; the corresponding rule mandates that the security level of x, after the execution of this instruction, should be at least the least upper bound of the security levels of y and z. Finally, rules for instructions that manipulate the memory rely on external points-to information to resolve the targets of pointers. As an example, the rule for instruction x = *p; states that the security level of x after the instruction is above the security levels of all memory cells that may be targeted by pointer p. Note that the type-system applies to whole programs, rather than to individual functions; therefore a typing derivation actually unfolds the call-graph and it cannot be used in the presence of recursion. The implementation of this type system relies on the generic implementation of Kildall’s algorithm in CompCert [30].

Once a typing derivation is found for a function, we check that the inferred type is consistent with the constant-time policy. For instance, the type information before a branch if (b) ... else ... should be such that all locations involved in condition b have security level Low. Furthermore, the security levels of all pointers that are used in memory accesses are required to be Low.

Program analysis required for type-checking. Our type-system relies on a general purpose value analysis that is targeted to the inference of points-to information. It builds a flow-sensitive and context-sensitive approximation of the targets of pointers. However, in low-level languages, the boundary between pointer arithmetic and other computations is blurred.
We thus need an analysis that can precisely cope with bit-vector arithmetic so as to infer precise approximations of the pointer offsets. Our implementation builds on the ideas present in the Verasco static analyzer [28]. On the one hand, we reuse one of its non-relational numerical abstract domains that is suitable for the analysis of pointer offsets [16]. On the other hand, we implemented a memory abstract domain similar to Verasco’s [17]. The result of the analysis is computed by iterating the abstract semantics of the program until a fixed point is found. It uses Bourdoncle's algorithm [19] to build, for each function, an iteration strategy; when encountering a function call, the called function is analyzed in the current state through a recursive call of the analyzer, effectively unfolding the call-graph, for maximal precision. Between the memory abstract domain and the iterator, we inserted a trace partitioning domain [31,37] that is dedicated to the full unrolling of array initialisation loops. This domain is driven by annotations in the source code: the programmer must indicate that the loop should be fully unrolled in order to take advantage of the added precision of this analysis.

Support for intrinsics. Handling the intrinsic instructions in the analyses needs special care. To keep the analyses general (i.e., not tied to a specific instruction set), the type-system relies on an external oracle that classifies every built-in call in one of the following categories: pure, memory load, and memory store. This oracle is trusted and is built on the configuration files described in Sect. 4. A call to a built-in is pure when it has no effect beyond explicitly writing to some registers. Moreover, the security level of the result is the least upper bound of the levels of its arguments. For instance, the call y = _mm_and_si128(x, mask); which computes the bitwise logical AND of its 128-bit arguments, is pure: its effect is limited to writing to the y variable. Also, the content of the y variable will be considered public only if the two arguments are public. The built-ins that belong to the memory load category are the ones which treat one of their arguments as a pointer and read memory through it. They need to be treated as a memory load in the constant-time analysis. For instance, the call v = _mm_loadu_si128(p); is classified as a load through pointer p. Therefore, to comply with the constant-time policy, the value of this pointer must have the Low security level. Finally, the built-ins in the memory store category are the ones that write to the memory, and must be treated as such in both analyses. For instance, the call _mm_storeu_si128(p, x); is classified as a memory store of the value of x to the address targeted by p.

Enhancements with respect to [11]. This work improves the checker for constant-time of Barthe et al. [11] in several ways. First, the value analysis is much more precise than their alias type-system: not only is it inter-procedural and context-sensitive, but it also finely analyzes the pointer offsets, so that the type-system for constant-time can cope with memory blocks that hold high and low data at the same time (in disjoint regions of the block). In particular, this means that local variables that may hold sensitive data need not be promoted to global variables. Second, our checker is inter-procedural, and can therefore analyze programs with function calls without complete inlining prior to the analysis. Finally, our analyses soundly and precisely handle compiler intrinsics.
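The interplay between the typing rules, the built-in oracle and the policy check can be sketched on a toy straight-line program. The Python below is an illustration only, not the checker integrated into CompCert: it omits control-flow joins, the Kildall fixed-point iteration and calling contexts, assumes the points-to sets are given as input, and applies a weak update on every store. The instruction encoding, the check function and the BUILTIN_CATEGORY table are hypothetical; only the three intrinsic names come from the text.

```python
LOW, HIGH = 0, 1  # two-point security lattice, LOW below HIGH


def join(*levels):
    """Least upper bound of the given security levels."""
    return max(levels, default=LOW)


# Hypothetical stand-in for the trusted oracle classifying built-ins.
BUILTIN_CATEGORY = {
    "_mm_and_si128": "pure",
    "_mm_loadu_si128": "load",
    "_mm_storeu_si128": "store",
}


def check(program, env, points_to):
    """Type a straight-line program; return (violations, final levels).

    Locations absent from env are assumed public (LOW).
    """
    env = dict(env)
    violations = []

    def level(loc):
        return env.get(loc, LOW)

    def require_low(i, locs, what):
        for loc in locs:
            if level(loc) == HIGH:
                violations.append((i, f"{what} depends on secret {loc}"))

    def do_load(i, dst, ptr):
        require_low(i, [ptr], "memory address")
        env[dst] = join(*(level(cell) for cell in points_to[ptr]))

    def do_store(i, ptr, src):
        require_low(i, [ptr], "memory address")
        for cell in points_to[ptr]:  # weak update: keep the old level too
            env[cell] = join(level(cell), level(src))

    for i, instr in enumerate(program):
        kind = instr[0]
        if kind == "op":            # dst = f(srcs)
            _, dst, srcs = instr
            env[dst] = join(*(level(s) for s in srcs))
        elif kind == "load":        # dst = *ptr
            do_load(i, instr[1], instr[2])
        elif kind == "store":       # *ptr = src
            do_store(i, instr[1], instr[2])
        elif kind == "branch":      # if (cond) ...
            require_low(i, instr[1], "branch condition")
        elif kind == "builtin":
            _, name, dst, args = instr
            category = BUILTIN_CATEGORY[name]
            if category == "pure":
                env[dst] = join(*(level(a) for a in args))
            elif category == "load":
                do_load(i, dst, args[0])
            else:
                do_store(i, args[0], args[1])
        else:
            raise ValueError(f"unknown instruction kind: {kind}")
    return violations, env


program = [
    ("builtin", "_mm_loadu_si128", "k", ["p_key"]),
    ("builtin", "_mm_and_si128", "y", ["k", "mask"]),
    ("op", "z", ["mask", "mask"]),
    ("branch", ["z"]),   # accepted: z is public
    ("branch", ["y"]),   # rejected: y depends on the key
    ("builtin", "_mm_storeu_si128", None, ["p_out", "y"]),
    ("load", "w", "p_out"),
]
violations, final = check(
    program,
    env={"key_cell": HIGH},
    points_to={"p_key": {"key_cell"}, "p_out": {"out_cell"}},
)
print(violations)
```

In this toy run only the branch on y is reported; the store through p_out is accepted because the pointer itself is public, even though the stored value (and hence out_cell and w) becomes High.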


An example. aez [27] is an authenticated encryption scheme that was designed with the use of hardware support for aes computations in mind. The implementations of aez included in supercop comprise both reference code written purely in C, and high-speed code relying on aes-ni extensions. Our experiments in running CompCert 2.2 and CompCert with intrinsics support over aez indicate that the ratio with respect to the non-verified baseline compilation is over 700 in the case of the former and drops to roughly 7 when intrinsics support is added. We ran our new type-system over the aez implementation, and we found a constant-time violation, albeit a benign one. The code causing the violation is the following:

    if (d && !is_zero(vandnot(loadu(pad+abytes), final0))) return -1;

This is part of the AEZCore implementation for the decryption operation: it checks whether the correct padding has been returned upon decryption and immediately exits the function if the check fails. Strictly speaking this is a violation of the constant-time policy, as the inverted value depends on the secret key. However, this violation can be justified by the fact that the result of the check will be made publicly available anyway. Rather than doing this, we modified the aez implementation so as to store the result of the check and return it only after all subsequent operations are carried out. The modified code was accepted by our type-checker. As a result, we obtain a verifiably correct and verifiably constant-time implementation of aez. The combined level of assurance and speed of this implementation is unprecedented and is only possible due to the guarantees provided by the tools presented in this paper.
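The shape of this change can be sketched as follows. The Python below only models the control flow (Python itself offers no timing guarantees), and the function names and the remaining_steps parameter are hypothetical stand-ins for the rest of the decryption routine, not the actual aez code.

```python
def finish_early_exit(padding_ok, remaining_steps):
    """Original shape: leave as soon as the padding check fails."""
    if not padding_ok:  # branch on a value derived from secret data
        return -1
    for step in remaining_steps:
        step()
    return 0


def finish_deferred(padding_ok, remaining_steps):
    """Modified shape: record the check, finish the work, then report."""
    status = int(padding_ok) - 1  # 0 on success, -1 on failure, no branch
    for step in remaining_steps:  # runs whatever the check result was
        step()
    return status


calls = []
steps = [lambda: calls.append("step1"), lambda: calls.append("step2")]
print(finish_early_exit(False, steps), list(calls))
print(finish_deferred(False, steps), list(calls))
```

Both variants return the same status for the same input; the difference is that the deferred variant performs the remaining steps even when the check fails, so the sequence of operations no longer depends on the check.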

7 Conclusions and Upcoming Developments

Our work initiates a systematic study of the coverage of formal methods tools for cryptographic implementations, and develops generalizations of the CompCert verified compiler and of a constant-time static analysis to accommodate intrinsics. The statistics are encouraging, but there is significant room for achieving further coverage. The development is available at https://github.com/haslab/ccomp-simd. We are currently porting our work to version 3.7 of CompCert, which will allow us to benefit from numerous new features that have been added since. Most notable is support for 64-bit architectures (in particular amd64), which by itself widens the applicability of the tool and opens the way to supporting intrinsics for new vector extensions such as avx, avx2 and avx-512. Finally, we are also updating our benchmarking set-up to the most recent versions of supercop, GCC and clang. We do not expect the main conclusions to change, but the number of assessed implementations will grow significantly.

Acknowledgements. This work is financed by National Funds through the FCT, Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology), within the project PTDC/CCI-INF/31698/2017, and by the Norte Portugal Regional Operational Programme (NORTE 2020) under the Portugal 2020 Partnership Agreement, through the European Regional Development Fund (ERDF) and also by national funds through the FCT, within project NORTE-01-0145-FEDER-028550 (REASSURE).

References

1. Albrecht, M.R., Paterson, K.G.: Lucky microseconds: a timing attack on Amazon’s s2n implementation of TLS. In: Fischlin, M., Coron, J.-S. (eds.) EUROCRYPT 2016. LNCS, vol. 9665, pp. 622–643. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49890-3_24
2. AlFardan, N.J., Paterson, K.G.: Lucky thirteen: breaking the TLS and DTLS record protocols. In: IEEE Symposium on Security and Privacy, SP 2013, pp. 526–540. IEEE Computer Society (2013)
3. Almeida, J.B., et al.: Jasmin: high-assurance and high-speed cryptography. In: Thuraisingham, B.M., Evans, D., Malkin, T., Xu, D. (eds.) Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS 2017, Dallas, TX, USA, 30 October–03 November 2017, pp. 1807–1823. ACM (2017)
4. Almeida, J.B., Barbosa, M., Barthe, G., Dupressoir, F.: Certified computer-aided cryptography: efficient provably secure machine code from high-level implementations. In: ACM CCS (2013)
5. Almeida, J.B., Barbosa, M., Barthe, G., Dupressoir, F.: Verifiable side-channel security of cryptographic implementations: constant-time MEE-CBC. In: Peyrin, T. (ed.) FSE 2016. LNCS, vol. 9783, pp. 163–184. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-52993-5_9
6. Almeida, J.B., et al.: The last mile: high-assurance and high-speed cryptographic implementations. In: 2020 IEEE Symposium on Security and Privacy, SP 2020, San Francisco, CA, USA, 18–21 May 2020, pp. 965–982. IEEE (2020)
7. Almeida, J.C.B., Barbosa, M., Barthe, G., Dupressoir, F., Emmi, M.: Verifying constant-time implementations. In: 25th USENIX Security Symposium (USENIX Security 2016), Austin, TX, August 2016. USENIX Association (2016)
8. Appel, A.W.: Program Logics - For Certified Compilers. Cambridge University Press, Cambridge (2014)
9. Appel, A.W.: Verification of a cryptographic primitive: SHA-256. ACM Trans. Program. Lang. Syst. 37(2), 7:1–7:31 (2015)
10. Barbosa, M., et al.: SoK: computer-aided cryptography. In: IEEE Symposium on Security and Privacy (SP). IEEE (2021). https://oaklandsok.github.io/papers/barbosa2021.pdf
11. Barthe, G., Betarte, G., Campo, J.D., Luna, C., Pichardie, D.: System-level noninterference for constant-time cryptography. In: ACM SIGSAC Conference on Computer and Communications Security, CCS 2014. ACM (2014)
12. Barthe, G., et al.: Formal verification of a constant-time preserving C compiler. Proc. ACM Program. Lang. 4(POPL), 7:1–7:30 (2020)
13. Barthe, G., Demange, D., Pichardie, D.: Formal verification of an SSA-based middle-end for CompCert. ACM Trans. Program. Lang. Syst. (TOPLAS) 36(1), 4 (2014)
14. Barthe, G., Grégoire, B., Laporte, V.: Secure compilation of side-channel countermeasures: the case of cryptographic “constant-time”. In: 31st IEEE Computer Security Foundations Symposium, CSF 2018, Oxford, United Kingdom, 9–12 July 2018, pp. 328–343. IEEE Computer Society (2018)


15. Beringer, L., Petcher, A., Ye, K.Q., Appel, A.W.: Verified correctness and security of OpenSSL HMAC. In: Jung, J., Holz, T. (eds.) 24th USENIX Security Symposium, USENIX Security 2015, Washington, D.C., USA, 12–14 August 2015, pp. 207–221. USENIX Association (2015)
16. Blazy, S., Laporte, V., Maroneze, A., Pichardie, D.: Formal verification of a C value analysis based on abstract interpretation. In: Logozzo, F., Fähndrich, M. (eds.) SAS 2013. LNCS, vol. 7935, pp. 324–344. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38856-9_18
17. Blazy, S., Laporte, V., Pichardie, D.: An abstract memory functor for verified C static analyzers. In: Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming (ICFP), pp. 325–337 (2016)
18. Bond, B., et al.: Vale: verifying high-performance cryptographic assembly code. In: Proceedings of the 26th USENIX Conference on Security Symposium, SEC 2017, USA, pp. 917–934. USENIX Association (2017)
19. Bourdoncle, F.: Efficient chaotic iteration strategies with widenings. In: Bjørner, D., Broy, M., Pottosin, I.V. (eds.) FMP&TA 1993. LNCS, vol. 735, pp. 128–141. Springer, Heidelberg (1993). https://doi.org/10.1007/BFb0039704
20. Cauligi, S., et al.: FaCT: a DSL for timing-sensitive computation. In: McKinley, K.S., Fisher, K. (eds.) Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2019, Phoenix, AZ, USA, 22–26 June 2019, pp. 174–189. ACM (2019)
21. Chen, Y., et al.: Verifying curve25519 software. In: Ahn, G., Yung, M., Li, N. (eds.) Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, Scottsdale, AZ, USA, 3–7 November 2014, pp. 299–309. ACM (2014)
22. Daniel, L.-A., Bardin, S., Rezk, T.: BINSEC/REL: efficient relational symbolic execution for constant-time at binary level. In: 2020 IEEE Symposium on Security and Privacy, SP 2020, San Francisco, CA, USA, 18–21 May 2020, pp. 1021–1038. IEEE (2020)
23. D’Silva, V., Payer, M., Song, D.X.: The correctness-security gap in compiler optimization. In: 2015 IEEE Symposium on Security and Privacy Workshops, SPW 2015, San Jose, CA, USA, 21–22 May 2015, pp. 73–87. IEEE Computer Society (2015)
24. Erbsen, A., Philipoom, J., Gross, J., Sloan, R., Chlipala, A.: Simple high-level code for cryptographic arithmetic - with proofs, without compromises. In: 2019 IEEE Symposium on Security and Privacy, SP 2019, San Francisco, CA, USA, 19–23 May 2019, pp. 1202–1219. IEEE (2019)
25. Fromherz, A., Giannarakis, N., Hawblitzel, C., Parno, B., Rastogi, A., Swamy, N.: A verified, efficient embedding of a verifiable assembly language. In: Principles of Programming Languages (POPL 2019). ACM (2019)
26. Fu, Y., Liu, J., Shi, X., Tsai, M., Wang, B., Yang, B.: Signed cryptographic program verification with typed CryptoLine. In: Cavallaro, L., Kinder, J., Wang, X., Katz, J. (eds.) Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, CCS 2019, London, UK, 11–15 November 2019, pp. 1591–1606. ACM (2019)
27. Hoang, V.T., Krovetz, T., Rogaway, P.: Robust authenticated-encryption AEZ and the problem that it solves. In: Oswald, E., Fischlin, M. (eds.) EUROCRYPT 2015. LNCS, vol. 9056, pp. 15–44. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46800-5_2


28. Jourdan, J.-H., Laporte, V., Blazy, S., Leroy, X., Pichardie, D.: A formally-verified C static analyzer. In: Proceedings of the 42nd Symposium on Principles of Programming Languages (POPL). ACM (2015)
29. Le, V., Afshari, M., Su, Z.: Compiler validation via equivalence modulo inputs. In: O’Boyle, M.F.P., Pingali, K. (eds.) ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2014, Edinburgh, United Kingdom, 09–11 June 2014, pp. 216–226. ACM (2014)
30. Leroy, X.: Formal verification of a realistic compiler. Commun. ACM 52(7), 107–115 (2009)
31. Mauborgne, L., Rival, X.: Trace partitioning in abstract interpretation based static analyzers. In: Sagiv, M. (ed.) ESOP 2005. LNCS, vol. 3444, pp. 5–20. Springer, Heidelberg (2005). https://doi.org/10.1007/978-3-540-31987-0_2
32. Molnar, D., Piotrowski, M., Schultz, D., Wagner, D.: The program counter security model: automatic detection and removal of control-flow side channel attacks. In: Won, D.H., Kim, S. (eds.) ICISC 2005. LNCS, vol. 3935, pp. 156–168. Springer, Heidelberg (2006). https://doi.org/10.1007/11734727_14
33. Petcher, A., Morrisett, G.: The foundational cryptography framework. In: Focardi, R., Myers, A. (eds.) POST 2015. LNCS, vol. 9036, pp. 53–72. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46666-7_4
34. Polubelova, M., et al.: HACL×N: verified generic SIMD crypto (for all your favorite platforms). IACR Cryptology ePrint Archive, 2020:572 (2020)
35. Protzenko, J., et al.: Verified low-level programming embedded in F*. CoRR, abs/1703.00053 (2017)
36. Regehr, J., Chen, Y., Cuoq, P., Eide, E., Ellison, C., Yang, X.: Test-case reduction for C compiler bugs. In: Vitek, J., Lin, H., Tip, F. (eds.) ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2012, Beijing, China, 11–16 June 2012, pp. 335–346. ACM (2012)
37. Rival, X., Mauborgne, L.: The trace partitioning abstract domain. ACM Trans. Program. Lang. Syst. (TOPLAS) 29(5), 1–44 (2007)
38. Rodrigues, B., Pereira, F., Aranha, D.: Sparse representation of implicit flows with applications to side-channel detection. In: Proceedings of Compiler Construction (2016)
39. Sun, C., Le, V., Zhang, Q., Su, Z.: Toward understanding compiler bugs in GCC and LLVM. In: Zeller, A., Roychoudhury, A. (eds.) Proceedings of the 25th International Symposium on Software Testing and Analysis, ISSTA 2016, Saarbrücken, Germany, 18–20 July 2016, pp. 294–305. ACM (2016)
40. Yarom, Y., Genkin, D., Heninger, N.: CacheBleed: a timing attack on OpenSSL constant time RSA. In: Gierlichs, B., Poschmann, A.Y. (eds.) CHES 2016. LNCS, vol. 9813, pp. 346–367. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-53140-2_17
41. Zinzindohoué, J.-K., Bhargavan, K., Protzenko, J., Beurdouche, B.: HACL*: a verified modern cryptographic library. In: SIGSAC Conference on Computer and Communications Security, pp. 1789–1806 (2017)

Protocol Analysis with Time

Damián Aparicio-Sánchez¹, Santiago Escobar¹, Catherine Meadows²(✉), José Meseguer³, and Julia Sapiña¹

¹ VRAIN, Universitat Politècnica de València, Valencia, Spain
{daapsnc,jsapina}@upv.es, [email protected]
² Naval Research Laboratory, Washington DC, USA
[email protected]
³ University of Illinois at Urbana-Champaign, Champaign, IL, USA
[email protected]

Abstract. We present a framework suited to the analysis of cryptographic protocols that make use of time in their execution. We provide a process algebra syntax that makes time information available to processes, and a transition semantics that takes account of fundamental properties of time. Additional properties can be added by the user if desirable. This timed protocol framework can be implemented either as a simulation tool or as a symbolic analysis tool in which time references are represented by logical variables, and in which the properties of time are implemented as constraints on those time logical variables. These constraints are carried along the symbolic execution of the protocol. The satisfiability of these constraints can be evaluated as the analysis proceeds, so attacks that violate the laws of physics can be rejected as impossible. We demonstrate the feasibility of our approach by using the Maude-NPA protocol analyzer together with an SMT solver that is used to evaluate the satisfiability of timing constraints. We provide a sound and complete protocol transformation from our timed process algebra to the Maude-NPA syntax and semantics, and we prove its soundness and completeness. We then use the tool to analyze Mafia fraud and distance hijacking attacks on a suite of distance-bounding protocols.

1 Introduction

Time is an important aspect of many cryptographic protocols, and there has been increasing interest in the formal analysis of protocols that use time. Model checking of protocols that use time can be done either by using an explicit time model, or by using an untimed model and showing it is sound and complete with respect to a timed model. The former is more intuitive for the user, but the latter is often chosen because not all cryptographic protocol analysis tools support reasoning about time. In this paper we describe a solution that combines the advantages of both approaches. An explicit timed specification language is developed with a timed syntax and semantics, and is automatically translated to an existing untimed language. The user, however, writes protocol specifications and queries in the timed language. In this paper we describe how such an approach has been applied to the Maude-NPA tool by taking advantage of its built-in support for constraints. We believe that this approach can be applied to other tools that support constraint handling as well.

This paper was partially supported by the EU (FEDER) and the Spanish MCIU under grant RTI2018-094403-B-C32, by the Spanish Generalitat Valenciana under grant PROMETEO/2019/098 and APOSTD/2019/127, by the US Air Force Office of Scientific Research under award number FA9550-17-1-0286, and by ONR Code 311.
© Springer Nature Switzerland AG 2020. K. Bhargavan et al. (Eds.): INDOCRYPT 2020, LNCS 12578, pp. 128–150, 2020. https://doi.org/10.1007/978-3-030-65277-7_7

There are a number of security protocols that make use of time. In general, there are two types: those that make use of assumptions about time, most often assuming some sort of loose synchronization, and those that guarantee these assumptions. The first kind includes protocols such as Kerberos [18], which uses timestamps to defend against replay attacks, the TESLA protocol [23], which relies on loose synchronization to amortize digital signatures, and blockchain protocols, which use timestamps to order blocks in the chain. The other kind provides guarantees based on physical properties of time: for example, distance bounding, which guarantees that a prover is within a certain distance of a verifier, and secure time synchronization, which guarantees that the clocks of two different nodes are synchronized within a certain margin of error. In this paper, we concentrate on protocols using distance bounding, both because it has been well-studied, and because the timing constraints are relatively simple.

A number of approaches have been applied to the analysis of distance bounding protocols. In [15], an epistemic logic for distance bounding analysis is presented where timing is captured by means of timed channels, which are described axiomatically. Time of sending and receiving messages can be deduced by using these timed channel axioms. In [2], Basin et al.
define a formal model for reasoning about physical properties of security protocols, including timing and location, which they formalize in Isabelle/HOL and use to analyze several distance bounding protocols by applying a technique similar to Paulson’s inductive approach [22]. In [8], Debant et al. develop a timing model for AKiSS, a tool for verifying protocol equivalence in the bounded session model, and use it to analyze distance bounding protocols. In [21], Nigam et al. develop a model of timing side channels in terms of constraints and use it to define a timed version of observational equivalence for protocols. They have developed a tool for verifying observational equivalence that relies on SMT solvers.

Other work concentrates on simplifying the problem so it can be more easily analyzed by a model checker, while proving that the simplified problem is sound and complete with respect to the original problem so that the analysis is useful. In this regard, Nigam et al. [20] and Debant et al. [9] show that it is safe to limit the size and complexity of the topologies, and Mauw et al. [14] and Chothia et al. [5] develop timed and untimed models and show that analysis in the untimed model is sound and complete with respect to the timed model.

In this paper we illustrate our approach by developing a timed protocol semantics suitable for the analysis of protocols that use constraints on time and distance, such as distance bounding, and that can be implemented as either a simulation tool for generating and checking concrete configurations, or as a symbolic analysis tool that allows the exploration of all relevant configurations. We realize the timed semantics by translating it into the semantics of the Maude-NPA protocol analysis tool, in which timing properties are expressed as constraints. The constraints generated during the Maude-NPA search are then checked using an SMT solver.

There are several things that help us. One is that we consider a metric space with distance constraints. Many tools support constraint handling, e.g., Maude-NPA [10] and Tamarin [16]. Another is that time can be naturally added to a process algebra. Many tools support processes, e.g., Maude-NPA [27] and AKISS [8].

The rest of this paper is organized as follows. In Sect. 2, we recall the Brands-Chaum protocol, which is used as the running example throughout the paper. In Sect. 3, we present the timed process algebra with its intended semantics. In Sect. 4, we present a sound and complete protocol transformation from our timed process algebra to an untimed process algebra. In Sect. 5, we show how our timed process algebra can be transformed into Maude-NPA strand notation. In Sect. 6, we present our experiments. We conclude in Sect. 7.

2 The Brands-Chaum Distance Bounding Protocol

In the following, we recall the Brands-Chaum distance bounding protocol of [3], which we will use as the running example for the whole paper.

Example 1. The Brands-Chaum protocol specifies communication between a verifier V and a prover P. P needs to authenticate itself to V, and also needs to prove that it is within a distance “d” of it. X;Y denotes concatenation of two messages X and Y, commit(N, Sr) denotes commitment of secret Sr with a nonce N, open(N, Sr, C) denotes opening a commitment C using the nonce N and checking whether it carries the secret Sr, ⊕ is the exclusive-or operator, and sign(A, M) denotes A signing message M. A typical interaction between the prover and the verifier is as follows:

P → V : commit(N_P, S_P)
    // The prover sends his name and a commitment
V → P : N_V
    // The verifier sends a nonce and records the time when this message was sent
P → V : N_P ⊕ N_V
    // The verifier checks that this exclusive-or answer arrives within two times a fixed distance
P → V : S_P
    // The prover sends the committed secret and the verifier checks open(N_P, S_P, commit(N_P, S_P))
P → V : sign_P(N_V ; N_P ⊕ N_V)
    // The prover signs the two rapid exchange messages
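The message flow of Example 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the commitment is modeled as a SHA-256 hash of nonce and secret, the signature is replaced by an HMAC under a key shared with the verifier, concatenation is byte concatenation of fixed-length fields, and all function names are hypothetical. Timing is not modeled at this point.

```python
import hashlib
import hmac
import secrets


def commit(nonce: bytes, secret: bytes) -> bytes:
    """Toy commitment standing in for commit(N, Sr): a hash of nonce and secret."""
    return hashlib.sha256(nonce + secret).digest()


def open_commitment(nonce: bytes, secret: bytes, commitment: bytes) -> bool:
    """Stand-in for open(N, Sr, C): recompute the commitment and compare."""
    return hmac.compare_digest(commit(nonce, secret), commitment)


def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise exclusive-or of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def sign(key: bytes, message: bytes) -> bytes:
    """HMAC as a placeholder for sign(A, M); this is not a real digital signature."""
    return hmac.new(key, message, hashlib.sha256).digest()


def run_brands_chaum(prover_key: bytes) -> bool:
    """Run the five messages once and return the verifier's decision."""
    n_p, s_p = secrets.token_bytes(16), secrets.token_bytes(16)
    commitment = commit(n_p, s_p)               # P -> V : commit(N_P, S_P)
    n_v = secrets.token_bytes(16)               # V -> P : N_V
    answer = xor(n_p, n_v)                      # P -> V : N_P xor N_V
    revealed = s_p                              # P -> V : S_P
    signature = sign(prover_key, n_v + answer)  # P -> V : sign_P(N_V ; N_P xor N_V)

    # Verifier side: recover N_P from the answer, open the commitment,
    # and check the signature over the two rapid-exchange messages.
    recovered_n_p = xor(answer, n_v)
    return (open_commitment(recovered_n_p, revealed, commitment)
            and hmac.compare_digest(signature, sign(prover_key, n_v + answer)))
```

Calling `run_brands_chaum` with any key yields an accepting run, since both sides are honest and no distance check is involved here.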

The previous informal Alice&Bob notation can be naturally extended to include time. We consider wireless communication between the participants located at an arbitrary given topology (participants do not move from their assigned locations) with distance constraints, where time and distance are equivalent for simplification and are represented by a real number. We assume a metric space with a distance function d : A × A → Real from a set A of participants such that d(A, A) = 0, d(A, B) = d(B, A), and d(A, B) ≤ d(A, C) + d(C, B).

Then, time information is added to the protocol. First, we add the time when a message was sent or received as a subindex, P_t1 → V_t2. Second, time constraints associated to the metric space are added: (i) the sending and receiving times of a message differ by the distance between them and (ii) the time difference between two consecutive actions of a participant must be greater than or equal to zero. Third, the distance bounding constraint of the verifier is represented as an arbitrary distance d. Time constraints are written using quantifier-free formulas in linear real arithmetic. For convenience, in linear equalities and inequalities (with <, ≤, > or ≥), we allow both 2 ∗ x = x + x and the monus function x ∸ y = if y < x then x − y else 0 as definitional extensions.

In the following timed sequence of actions, a vertical bar is included to differentiate between the process and some constraints associated to the metric space. We remove the constraint open(N_P, S_P, commit(N_P, S_P)) for simplification.

P_t1 → V_t1' : commit(N_P, S_P)         | t1' = t1 + d(P, V)
V_t2 → P_t2' : N_V                      | t2' = t2 + d(P, V) ∧ t2 ≥ t1'
P_t3 → V_t3' : N_P ⊕ N_V                | t3' = t3 + d(P, V) ∧ t3 ≥ t2'
V            : t3' ∸ t2 ≤ 2 ∗ d
P_t4 → V_t4' : S_P                      | t4' = t4 + d(P, V) ∧ t4 ≥ t3 ∧ t4' ≥ t3'
P_t5 → V_t5' : sign_P(N_V ; N_P ⊕ N_V)  | t5' = t5 + d(P, V) ∧ t5 ≥ t4 ∧ t5' ≥ t4'
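The timing constraints of the first three messages can be evaluated numerically. The following is a minimal sketch, assuming that every participant acts as soon as it is allowed to (so each ≥ constraint is taken as an equality); the function names and the dictionary keys are illustrative only.

```python
def monus(x: float, y: float) -> float:
    """Truncated subtraction: x - y if y < x, otherwise 0."""
    return x - y if y < x else 0.0


def honest_rapid_exchange(dist_pv: float, start: float = 0.0) -> dict:
    """Earliest timestamps of the first three messages between P and V.

    Each participant is assumed to act as soon as it is allowed to, so the
    inequalities t2 >= t1' and t3 >= t2' are taken as equalities.
    """
    t1 = start
    t1_recv = t1 + dist_pv        # t1' = t1 + d(P, V)
    t2 = t1_recv                  # t2 >= t1'
    t2_recv = t2 + dist_pv        # t2' = t2 + d(P, V)
    t3 = t2_recv                  # t3 >= t2'
    t3_recv = t3 + dist_pv        # t3' = t3 + d(P, V)
    return {"t1": t1, "t1'": t1_recv, "t2": t2,
            "t2'": t2_recv, "t3": t3, "t3'": t3_recv}


def verifier_accepts(times: dict, bound: float) -> bool:
    """Distance-bounding check t3' monus t2 <= 2 * d."""
    return monus(times["t3'"], times["t2"]) <= 2 * bound
```

With d = 1.0, a prover at distance 0.5 passes the check (round trip 1.0), whereas a prover at distance 2.0 fails it (round trip 4.0).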

The Brands-Chaum protocol is designed to defend against mafia frauds, where an honest prover is outside the neighborhood of the verifier (i.e., d(P, V) > d) but an intruder is inside (i.e., d(I, V) ≤ d), pretending to be the honest prover. The following is an example of an attempted mafia fraud, in which the intruder simply forwards messages back and forth between the prover and the verifier. We write I(P) to denote an intruder pretending to be an honest prover P.

P_t1 → I_t2        : commit(N_P, S_P)         | t2 = t1 + d(P, I)
I(P)_t2 → V_t3     : commit(N_P, S_P)         | t3 = t2 + d(V, I)
V_t3 → I(P)_t4     : N_V                      | t4 = t3 + d(V, I)
I_t4 → P_t5        : N_V                      | t5 = t4 + d(P, I)
P_t5 → I_t6        : N_P ⊕ N_V                | t6 = t5 + d(P, I)
I(P)_t6 → V_t7     : N_P ⊕ N_V                | t7 = t6 + d(V, I)
V                  : t7 ∸ t3 ≤ 2 ∗ d
P_t8 → I_t9        : S_P                      | t9 = t8 + d(P, I) ∧ t8 ≥ t5
I(P)_t10 → V_t11   : S_P                      | t11 = t10 + d(V, I) ∧ t11 ≥ t7
I(P)_t12 → V_t13   : sign_P(N_V ; N_P ⊕ N_V)  | t13 = t12 + d(V, I) ∧ t13 ≥ t11

Note that, for this trace to be consistent with the metric space, it would require that 2 ∗ d(V, I) + 2 ∗ d(P, I) ≤ 2 ∗ d, which is unsatisfiable given d(V, P) > d > 0 and the triangular inequality d(V, P) ≤ d(V, I) + d(P, I). Hence the attack is not possible.

However, a distance hijacking attack is possible (i.e., the time and distance constraints are satisfiable), where an intruder located outside the neighborhood of the verifier (i.e., d(V, I) > d) succeeds in convincing the verifier that he is inside the neighborhood by exploiting the presence of an honest prover in the neighborhood (i.e., d(V, P) ≤ d) to achieve his goal. The following is an example of a successful distance hijacking, in which the intruder listens to the messages exchanged between the prover and the verifier but builds the last message.

P_t1 → V_t2          : commit(N_P, S_P)         | t2 = t1 + d(P, V)
V_t2 → P_t3, I_t3'   : N_V                      | t3 = t2 + d(P, V) ∧ t3' = t2 + d(I, V)
P_t3 → V_t4, I_t4'   : N_P ⊕ N_V                | t4 = t3 + d(P, V) ∧ t4' = t3 + d(I, V)
V                    : t4 ∸ t2 ≤ 2 ∗ d
P_t5 → V_t6          : S_P                      | t6 = t5 + d(P, V) ∧ t5 ≥ t3 ∧ t6 ≥ t4
I(P)_t7 → V_t8       : sign_I(N_V ; N_P ⊕ N_V)  | t8 = t7 + d(I, V) ∧ t7 ≥ t4' ∧ t8 ≥ t6

3 A Timed Process Algebra

In this section, we present our timed process algebra and its intended semantics. We restrict ourselves to a semantics that can be used to reason about time and distance. We discuss how this could be extended in Sect. 7. To illustrate our approach, we use Maude-NPA’s process algebra and semantics described in [27], extending it with a global clock and time information.

3.1 New Syntax for Time

In our timed protocol process algebra, the behaviors of both honest principals and the intruders are represented by labeled processes. Therefore, a protocol is specified as a set of labeled processes. Each process performs a sequence of actions, namely sending (+m) or receiving (−m) a message m, but without knowing who actually sent or received it. Each process may also perform deterministic or non-deterministic choices. We define a protocol 𝒫 in the timed protocol process algebra, written 𝒫_TPA, as a pair of the form 𝒫_TPA = ((Σ_TPA𝒫, E_TPA𝒫), P_TPA), where (Σ_TPA𝒫, E_TPA𝒫) is the equational theory specifying the equational properties of the cryptographic functions and the state structure, and P_TPA is a Σ_TPA𝒫-term denoting a well-formed timed process. The timed protocol process algebra’s syntax Σ_TPA is parameterized by a sort Msg of messages. Moreover, time is represented by a new sort Real, since we allow conditional expressions on time using linear arithmetic for the reals.

Similar to [27], processes support four different kinds of choice: (i) a process expression P ? Q supports explicit non-deterministic choice between P and Q; (ii) a choice variable X? appearing in a send message expression +m supports implicit non-deterministic choice of its value, which can furthermore be an unbounded non-deterministic choice if X? ranges over an infinite set; (iii) a conditional if C then P else Q supports explicit deterministic choice between P and Q determined by the result of its condition C; and (iv) a receive message expression −m(X1, ..., Xn) supports implicit deterministic choice about accepting or rejecting a received message, depending on whether or not it matches the pattern m(X1, ..., Xn). This deterministic choice is implicit, but it could be made explicit by replacing −m(X1, ..., Xn) · P by the semantically equivalent conditional expression −X · if X = m(X1, ..., Xn) then P else nilP · P, where X is a variable of sort Msg, which therefore accepts any message.

The timed process algebra has the following syntax, also similar to that of [27] plus the addition of the suffix @Real to the sending and receiving actions:

ProcConf ::= LProc | ProcConf & ProcConf | ∅
ProcId   ::= (Role, Nat)
LProc    ::= (ProcId, Nat) Proc
Proc     ::= nilP | +(Msg@Real) | −(Msg@Real) | Proc · Proc | Proc ? Proc | if Cond then Proc else Proc

– ProcConf stands for a process configuration, i.e., a set of labeled processes, where the symbol & is used to denote set union for sets of labeled processes.
– ProcId stands for a process identifier, where Role refers to the role of the process in the protocol (e.g., prover or verifier) and Nat is a natural number denoting the identity of the process, which distinguishes different instances (sessions) of a process specification.
– LProc stands for a labeled process, i.e., a process Proc with a label (ProcId, J). For convenience, we sometimes write (Role, I, J), where J indicates that the action at stage J of the process (Role, I) will be the next one to be executed, i.e., the first J − 1 actions of the process for role Role have already been executed. Note that the I and J of a process (Role, I, J) are omitted in a protocol specification.
– Proc defines the actions that can be executed within a process, where +Msg@T and −Msg@T respectively denote sending out a message or receiving a message Msg. Note that T must be a variable where the underlying metric space determines the exact sending or receiving time, which can be used later in the process. Moreover, “Proc · Proc” denotes sequential composition of processes, where the symbol · is associative and has the empty process nilP as identity. Finally, “Proc ? Proc” denotes an explicit non-deterministic choice, whereas “if Cond then Proc else Proc” denotes an explicit deterministic choice, whose continuation depends on the satisfaction of the constraint Cond.

Note that choice is explicitly represented by either a non-deterministic choice between P1 ? P2 or by the deterministic evaluation of a conditional expression if Cond then P1 else P2, but it is also implicitly represented by the instantiation of a variable in different runs. In all process specifications we assume four disjoint kinds of variables, similar to the variables of [27] plus time variables:

– fresh variables: each one of these variables receives a distinct constant value from a data type V_fresh, denoting unguessable values such as nonces. Throughout this paper we will denote this kind of variables as f, f1, f2, . . ..
– choice variables: variables first appearing in a sent message +M, which can be substituted by any value arbitrarily chosen from a possibly infinite domain. A choice variable indicates an implicit non-deterministic choice. Given a protocol with choice variables, each possible substitution of these variables denotes a possible run of the protocol. We always denote choice variables by letters postfixed with the symbol “?” as a subscript, e.g., A?, B?, . . ..
– pattern variables: variables first appearing in a received message −M. These variables will be instantiated when matching sent and received messages. Implicit deterministic choices are indicated by terms containing pattern variables, since failing to match a pattern term leads to the rejection of a message. A pattern term plays the implicit role of a guard, so that, depending on the different ways of matching it, the protocol can have different continuations. Pattern variables are written with uppercase letters, e.g., A, B, N_A, . . ..
– time variables: a process cannot access the global clock, which implies that a time variable T of a reception or sending action +(M@T) can never appear in M but can appear in the remaining part of the process. Also, given a receiving action −(M1@t1) and a sending action +(M2@t2) in a process of the form P1 · −(M1@t1) · P2 · +(M2@t2) · P3, the assumption that timed actions are performed from left to right forces the constraint t1 ≤ t2. Time variables are always written with a (subscripted) t, e.g., t1, t1', t2, t2', . . ..

These conditions about variables are formalized by the function wf : Proc → Bool defined in Fig. 1, for well-formed processes. The definition of wf uses an auxiliary function shVar : Proc → VarSet, which is defined in Fig. 2.

wf(P · +(M@T)) = wf(P)
    if (Vars(M) ∩ Vars(P)) ⊆ shVar(P) ∧ T ∉ Vars(M) ∪ Vars(P)
wf(P · −(M@T)) = wf(P)
    if (Vars(M) ∩ Vars(P)) ⊆ shVar(P) ∧ T ∉ Vars(M) ∪ Vars(P)
wf(P · (if T then Q else R)) = wf(P · Q) ∧ wf(P · R)
    if P ≠ nilP and Q ≠ nilP and Vars(T) ⊆ shVar(P)
wf(P · (Q ? R)) = wf(P · Q) ∧ wf(P · R)
    if Q ≠ nilP or R ≠ nilP
wf(P · nilP) = wf(P)
wf(nilP) = True.

Fig. 1. The well-formed function

shVar(+(M@T) · P) = Vars(M) ∪ shVar(P)
shVar(−(M@T) · P) = Vars(M) ∪ shVar(P)
shVar((if T then P else Q) · R) = Vars(T) ∪ (shVar(P) ∩ shVar(Q)) ∪ shVar(R)
shVar((P ? Q) · R) = (shVar(P) ∩ shVar(Q)) ∪ shVar(R)
shVar(nilP) = ∅

Fig. 2. The shared variables auxiliary function
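The two figures can be read as recursive functions over a process term. The sketch below transcribes them literally in Python under several assumptions that are not part of the paper: a process is a tuple of actions (the empty tuple plays the role of nilP), messages and conditions are abstracted to their variable sets, Vars(P) is taken to include time variables, and a failed side condition makes wf return False. The class and function names are hypothetical. Under this literal reading, time variables never enter shVar, so a condition over time variables would need an extension that is not attempted here.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Send:                      # +(M@T)
    msg_vars: FrozenSet[str]     # Vars(M)
    time_var: str                # T


@dataclass(frozen=True)
class Recv:                      # -(M@T)
    msg_vars: FrozenSet[str]
    time_var: str


@dataclass(frozen=True)
class If:                        # if T then P else Q
    cond_vars: FrozenSet[str]    # Vars(T)
    then: Tuple
    orelse: Tuple


@dataclass(frozen=True)
class Choice:                    # P ? Q
    left: Tuple
    right: Tuple


def all_vars(proc) -> frozenset:
    """Vars(P): every variable occurring in P, time variables included (assumption)."""
    out = set()
    for a in proc:
        if isinstance(a, (Send, Recv)):
            out |= a.msg_vars | {a.time_var}
        elif isinstance(a, If):
            out |= a.cond_vars | all_vars(a.then) | all_vars(a.orelse)
        else:
            out |= all_vars(a.left) | all_vars(a.right)
    return frozenset(out)


def sh_var(proc) -> frozenset:
    """shVar(P), following Fig. 2 case by case."""
    if not proc:
        return frozenset()
    head, rest = proc[0], sh_var(proc[1:])
    if isinstance(head, (Send, Recv)):
        return head.msg_vars | rest
    if isinstance(head, If):
        return head.cond_vars | (sh_var(head.then) & sh_var(head.orelse)) | rest
    return (sh_var(head.left) & sh_var(head.right)) | rest


def wf(proc) -> bool:
    """wf(P), following Fig. 1; a failed side condition yields False (assumption)."""
    if not proc:
        return True
    prefix, last = proc[:-1], proc[-1]
    if isinstance(last, (Send, Recv)):
        ok = ((last.msg_vars & all_vars(prefix)) <= sh_var(prefix)
              and last.time_var not in last.msg_vars | all_vars(prefix))
        return ok and wf(prefix)
    if isinstance(last, If):
        ok = bool(prefix) and bool(last.then) and last.cond_vars <= sh_var(prefix)
        return ok and wf(prefix + last.then) and wf(prefix + last.orelse)
    ok = bool(last.left) or bool(last.right)
    return ok and wf(prefix + last.left) and wf(prefix + last.right)
```

For instance, a process that reuses a time variable in two actions, or that uses after a choice a variable bound in only one branch, is rejected by `wf`, while a linear sequence of sends and receives with distinct time variables is accepted.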

Example 2. Let us specify the Brands and Chaum protocol of Example 1, where variables are distinct between processes. A nonce is represented as n(A?, f), whereas a secret value is represented as s(A?, f). The identifier of each process is represented by a choice variable A?. Recall that there is an arbitrary distance d > 0.

(Verifier) : −(Commit@t1) ·
             +(n(V?, f1)@t2) ·
             −((n(V?, f1) ⊕ N_P)@t3) ·
             if t3 ∸ t2 ≤ 2 ∗ d
             then −(S_P@t4) ·
                  if open(N_P, S_P, Commit)
                  then −(sign(P, n(V?, f1); N_P ⊕ n(V?, f1))@t5)
                  else nilP
             else nilP

(Prover) : +(commit(n(P?, f1), s(P?, f2))@t1) ·
           −(N_V@t2) ·
           +((N_V ⊕ n(P?, f1))@t3) ·
           +(s(P?, f2)@t4) ·
           +(sign(P?, N_V; n(P?, f2) ⊕ N_V)@t5)

3.2 Timed Intruder Model

The active Dolev-Yao intruder model is followed, which implies an intruder can intercept, forward, or create messages from received messages. However, intruders are located. Therefore, they cannot change the physics of the metric space, e.g., they cannot send messages from a different location or intercept a message that is not within range.

In our timed intruder model, we consider several located intruders, modeled by the distance function d : ProcId × ProcId → Real, each with a family of capabilities (concatenation, deconcatenation, encryption, decryption, etc.), and each capability may have arbitrarily many instances. The combined actions of two intruders require time, i.e., their distance; but a single intruder can perform many actions in zero time. Adding time cost to single-intruder actions could be done with additional time constraints, but is outside the scope of this paper. Note that, unlike in the standard Dolev-Yao model, we cannot assume just one intruder, since the time required for a principal to communicate with a given intruder is an observable characteristic of that intruder. Thus, although the Mafia fraud and distance hijacking attacks considered in the experiments presented in this paper only require configurations with just one prover, one verifier and one intruder, the framework itself allows general participant configurations with multiple intruders.

Example 3. In our timed process algebra, the family of capabilities associated to an intruder k are also described as processes. For instance, concatenating two received messages is represented by the process (where time variables t1, t2, t3 are not actually used by the process)

(k.Conc) : −(X@t1) · −(Y@t2) · +(X;Y@t3)

and extracting one of them from a concatenation is described by the process

(k.Deconc) : −(X;Y@t1) · +(X@t2)

Roles of intruder capabilities include the identifier of the intruder, and it is possible to combine several intruder capabilities from the same or from different intruders. For example, we may say that the +(X;Y@T) of a process I1.Conc associated to an intruder I1 may be synchronized with the −(X;Y@T') of a process I2.Deconc associated to an intruder I2. The metric space fixes T' = T + d(I1, I2), where d(I1, I2) > 0 if I1 ≠ I2 and d(I1, I2) = 0 if I1 = I2.

A special forwarding intruder capability, not considered in the standard Dolev-Yao model, has to be included in order to take into account the time taken by a message to travel from an honest participant to the intruder and later to another participant, possibly an intruder again.

(k.Forward) : −(X@t1) · +(X@t2)

3.3 Timed Process Semantics

A state of a protocol 𝒫 consists of a set of (possibly partially executed) labeled processes, a set of terms in the network {Net}, and the global clock. That is, a state is a term of the form {LP1 & · · · & LPn | {Net} | t̄}. In the timed process algebra, the only time information available to a process is the variable T associated to input and output messages M@T. However, once these messages have been sent or received, we include them in the network Net with extra information. When a message M@T is sent, we store M @ (A : t → ∅), denoting that message M was sent by process A at the global clock time t, and propagate T ↦ t within the process A. When this message is received by an action M'@T' of process B (honest participant or intruder) at the global clock time t', M is matched against M' modulo the cryptographic functions, T' ↦ t' is propagated within the process B, and B : t' is added to the stored message, following the general pattern M @ (A : t → (B1 : t1 · · · Bn : tn)).

The rewrite theory (Σ_TPA𝒫+State, E_TPA𝒫, R_TPA𝒫) characterizes the behavior of a protocol 𝒫, where Σ_TPA𝒫+State extends Σ_TPA𝒫 by adding state constructor symbols. We assume that a protocol run begins with an empty state, i.e., a state with an empty set of labeled processes, an empty network, and at time zero. Therefore, the initial empty state is always of the form {∅ | {∅} | 0.0}. Note that, in a specific run, all the distances are provided a priori according to the metric space and a chosen topology, whereas in a symbolic analysis, they will simply be variables, probably occurring within time constraints.

State changes are defined by a set R_TPA𝒫 of rewrite rules given below. Each transition rule in R_TPA𝒫 is labeled with a tuple (ro, i, j, a, n, t), where:

– ro is the role of the labeled process being executed in the transition.
– i denotes the instance of the same role being executed in the transition.
– j denotes the process’s step number since its beginning.
– a is a ground term identifying the action that is being performed in the transition. It has different possible values: “+m” or “−m” if the message m was sent (and added to the network) or received, respectively; “m” if the message m was sent but did not increase the network; “?” if the transition performs an explicit non-deterministic choice; “T” if the transition performs an explicit deterministic choice; “Time” when the global clock is incremented; or “New” when a new process is added.
– n is a number that, if the action that is being executed is an explicit choice, indicates which branch has been chosen as the process continuation. In this case n takes the value of either 1 or 2. If the transition does not perform any explicit choice, then n = 0.
– t is the global clock at each transition step.

Note that in the transition rules R_TPA𝒫 shown below, Net denotes the network, represented by a set of messages of the form M @ (A : t → (B1 : t1 · · · Bn : tn)), P denotes the rest of the process being executed and PS denotes the rest of labeled processes of the state (which can be the empty set ∅).

– Sending a message is represented by the two transition rules below, depending on whether the message M is stored, (TPA++), or just discarded, (TPA+). In (TPA++), we store the sent message with its sending information, (ro, i) : t̄, and add an empty set for those who will be receiving the message in the future (Mσ' @ (ro, i) : t̄ → ∅).

{(ro, i, j) (+M@t · P) & PS | {Net} | t̄}
  −→(ro, i, j, +(Mσ'), 0, t̄)
{(ro, i, j + 1) Pσ' & PS | {(Mσ' @ (ro, i) : t̄ → ∅), Net} | t̄}
  if (Mσ' : (ro, i) : t̄ → ∅) ∉ Net
  where σ is a ground substitution binding choice variables in M and σ' = σ ⊎ {t ↦ t̄}    (TPA++)

{(ro, i, j) (+M@t · P) & PS | {Net} | t̄}
  −→(ro, i, j, Mσ', 0, t̄)
{(ro, i, j + 1) Pσ' & PS | {Net} | t̄}
  where σ is a ground substitution binding choice variables in M and σ' = σ ⊎ {t ↦ t̄}    (TPA+)

– Receiving a message is represented by the transition rule below. We add the reception information to the stored message, i.e., we replace (M' @ ((ro', k) : t' → AS)) by (M' @ ((ro', k) : t' → (AS (ro, i) : t̄))).

{(ro, i, j) (−(M@t) · P) & PS | {(M' @ ((ro', k) : t' → AS)), Net} | t̄}
  −→(ro, i, j, −(Mσ'), 0, t̄)
{(ro, i, j + 1) Pσ' & PS | {(M' @ ((ro', k) : t' → (AS (ro, i) : t̄))), Net} | t̄}
  if ∃σ : M' =_EP Mσ, t̄ = t' + d((ro', k), (ro, i)), σ' = σ ⊎ {t ↦ t̄}    (TPA-)

– An explicit deterministic choice is defined as follows. More specifically, the rule (TPAif1) describes the then case, i.e., if the constraint T is satisfied, then the process continues as P, whereas rule (TPAif2) describes the else case, that is, if the constraint T is not satisfied, the process continues as Q.

{(ro, i, j) ((if T then P else Q) · R) & PS | {Net} | t̄}
  −→(ro, i, j, T, 1, t̄)
{(ro, i, j + 1) (P · R) & PS | {Net} | t̄}  if T    (TPAif1)

{(ro, i, j) ((if T then P else Q) · R) & PS | {Net} | t̄}
  −→(ro, i, j, T, 2, t̄)
{(ro, i, j + 1) (Q · R) & PS | {Net} | t̄}  if ¬T    (TPAif2)

– An explicit non-deterministic choice is defined as follows. The process can continue either as P, denoted by rule (TPA?1), or as Q, denoted by rule (TPA?2).

{(ro, i, j) ((P ? Q) · R) & PS | {Net} | t̄}
  −→(ro, i, j, ?, 1, t̄)
{(ro, i, j + 1) (P · R) & PS | {Net} | t̄}    (TPA?1)

{(ro, i, j) ((P ? Q) · R) & PS | {Net} | t̄}
  −→(ro, i, j, ?, 2, t̄)
{(ro, i, j + 1) (Q · R) & PS | {Net} | t̄}    (TPA?2)

– Global Time advancement is represented by the transition rule below, which increments the global clock enough to make one sent message arrive at its closest destination.

{PS | {Net} | t̄}
  −→(⊥, ⊥, ⊥, Time, 0, t̄ + t')
{PS | {Net} | t̄ + t'}
  if t' = mte(PS, Net, t̄) ∧ t' ≠ 0    (PTime)

where the function mte is defined as follows:

mte(∅, Net, t̄) = ∞
mte(P & PS, Net, t̄) = min(mte(P, Net, t̄), mte(PS, Net, t̄))
mte((ro, i, j) nilP, Net, t̄) = ∞
mte((ro, i, j) +(M@t) · P, Net, t̄) = 0
mte((ro, i, j) −(M@t) · P, Net, t̄) = min{ d((ro, i), (ro', i')) | (M' @ (ro', i') : t0 → AS) ∈ Net ∧ ∃σ : Mσ =_B M' }
mte((ro, i, j) (if T then P else Q) · R, Net, t̄) = 0
mte((ro, i, j) P1 ? P2, Net, t̄) = 0

Note that the function mte evaluates to 0 if some instantaneous action by the previous rules can be performed. Otherwise, mte computes the smallest non-zero time increment required for some already sent message (existing in the network) to be received by some process (by matching with such an existing message in the network).

Remark. The timed process semantics assumes a metric space with a distance function d : ProcId × ProcId → Real such that (i) d(A, A) = 0, (ii) d(A, B) = d(B, A), and (iii) d(A, B) ≤ d(A, C) + d(C, B). For every message M @ (A : t → (B1 : t1 · · · Bn : tn)) stored in the network Net, our semantics assumes that (iv) ti = t + d(A, Bi), ∀ 1 ≤ i ≤ n. Furthermore, according to our wireless communication model, our semantics assumes (v) a time sequence monotonicity property, i.e., there is no other process C such that d(A, C) ≤ d(A, Bi) for some i, 1 ≤ i ≤ n, and C is not included in the set of recipients of the message M. Also, for each class of attacks such as the Mafia fraud or the hijacking attack, (vi) some extra topology constraints may be necessary. However, in Sect. 4, timed processes are transformed into untimed processes with time constraints and the transformation takes care only of conditions (i), (ii), and (iv). For a fixed number of participants, all the instances of the triangle inequality (iii) as well as constraints (vi) should be added by the user. In the general case, conditions (iii), (v), and (vi) can be partially specified and fully checked on a successful trace.

– New processes can be added as follows.

∀ (ro) P_k ∈ P_PA :
{PS | {Net} | t̄}
  −→(ro, i + 1, 1, New, 0, t̄)
{(ro, i + 1, 1, x?σ, y?σ) P_k σ ρ_ro,i+1 & PS | {Net} | t̄}
  where ρ_ro,i+1 is a fresh substitution, σ is a ground substitution binding x? and y?, and i = id(PS, ro)    (TPA&)

The auxiliary function id counts the instances of a role:

id(∅, ro) = 0
id((ro', i, j) P & PS, ro) = max(id(PS, ro), i)  if ro = ro'
id((ro', i, j) P & PS, ro) = id(PS, ro)          if ro ≠ ro'

where PS denotes a process configuration, P a process, and ro, ro' role names.

Therefore, the behavior of a timed protocol in the process algebra is defined by the set of transition rules R_TPA𝒫 = {(TPA++), (TPA+), (PTime), (TPA-), (TPAif1), (TPAif2), (TPA?1), (TPA?2)} ∪ (TPA&).

Example 4. Continuing Example 2, a possible run of the protocol is represented in Fig. 3 for a prover p, an intruder i, and a verifier v. A simpler, graphical representation of the same run is included at the top of the figure. There, the neighborhood distance is d = 1.0, the distance between the prover and the verifier is d(p, v) = 2.0, but the distance between the prover and the intruder as well as the distance between the verifier and the intruder are d(v, i) = d(p, i) = 1.0, i.e., the honest prover p is outside v’s neighborhood, d(v, p) > d, where d(v, p) = d(v, i) + d(p, i). Only the first part of the rapid message exchange sequence is represented and the forwarding action of the intruder is denoted by i.F. The prover sends the commitment m1 = commit(n(p, f1), s(p, f2)) at instant t̄0 = 0.0, which is received by the intruder at instant t̄1 = 1.0. The intruder forwards m1 at instant t̄1, which is received by the verifier at instant t̄2 = 2.0. Then, the verifier sends m2 = n(v, f3) at instant t̄2, which is received by the intruder at instant t̄3 = 3.0. The intruder forwards m2 at instant t̄3, which is received by the prover at instant t̄4 = 4.0. Then, the prover sends m4 = (m2 ⊕ n(p, f1)) at instant t̄4, which is received by the intruder at instant t̄5 = 5.0. Finally, the intruder forwards m4 at instant t̄5, which is received by the verifier at instant t̄6 = 6.0. Thus, the verifier sent m2 at time t̄2 = 2.0 and received m4 at time t̄6 = 6.0. But the protocol cannot complete the run, since t̄6 − t̄2 = 4.0 < 2 ∗ d = 2.0 is unsatisfiable.

Fig. 3. Brands-Chaum execution for a prover, an intruder, and a verifier

Our timed protocol semantics can already be implemented straightforwardly as a simulation tool. For instance, [15] describes distance bounding protocols using an authentication logic, which describes the evolution of the protocol, [20] provides a strand-based framework for distance bounding protocols based on simulation with time constraints, and [8] defines distance bounding protocols using an applied-pi calculus. Note, however, that, since the number of metric space configurations is infinite, model checking a protocol for a concrete configuration with a simulation tool is very limited, since it cannot prove the absence of an attack for all configurations.
For this reason, we follow a symbolic approach that can explore all relevant configurations. In the following section, we provide a sound and complete protocol transformation from our timed process algebra to the untimed process algebra of the Maude-NPA tool. In order to do this, we make use of an approach introduced by Nigam et al. [20] in which properties of time, which can include both those following from physics and those checked by principals, are represented by linear constraints on the reals. As a path is built, an SMT solver can be used to check that the constraints are satisfiable, as is done in [21].
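The concrete run described above can be replayed in a few lines of Python. This is only an illustrative sketch of one configuration, not the authors' tool: the positions on a line, the names `POSITIONS`, `distance` and `replay`, and the hop list are our own assumptions chosen to match the distances d(p, i) = d(v, i) = 1.0 and d = 1.0.

```python
# Replay of the relayed run on a straight line (assumed layout, for illustration).
# Prover at 0.0, intruder at 1.0, verifier at 2.0; travel time equals distance.
POSITIONS = {"p": 0.0, "i": 1.0, "v": 2.0}
NEIGHBORHOOD = 1.0  # the neighborhood distance d


def distance(a: str, b: str) -> float:
    """Distance between two principals placed on a line."""
    return abs(POSITIONS[a] - POSITIONS[b])


def replay(hops, start=0.0):
    """Return the instants t0..tn for a chain of (sender, receiver) hops."""
    instants = [start]
    for sender, receiver in hops:
        instants.append(instants[-1] + distance(sender, receiver))
    return instants


# m1: p -> i -> v, then m2: v -> i -> p, then m4: p -> i -> v
HOPS = [("p", "i"), ("i", "v"), ("v", "i"), ("i", "p"), ("p", "i"), ("i", "v")]
t = replay(HOPS)

round_trip = t[6] - t[2]                     # time between sending m2 and receiving m4
accepted = round_trip <= 2 * NEIGHBORHOOD    # the verifier's check

print(t, round_trip, accepted)
```

Such a replay only says something about this single placement of the principals, which is exactly the limitation of simulation noted above.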

4 Timed Process Algebra into Untimed Process Algebra with Time Variables and Timing Constraints

In this section, we consider a more general constraint satisfiability approach, where all possible (not only some) runs are symbolically analyzed. This provides both a trace-based insecure statement, i.e., a run leading to an insecure secrecy or authentication property is discovered given enough resources, and an unsatisfiability-based secure statement, i.e., there is no run leading to an insecure secrecy or authentication property due to time constraint unsatisfiability.

D. Aparicio-Sánchez et al.

Example 5. Consider again the run of the Brands-Chaum protocol given in Fig. 3. All the terms of sort Real, written in blue color, are indeed variables that get an assignment during the run based on the distance function. Then, it is possible to obtain a symbolic trace from the run of Fig. 3, where the following time constraints are accumulated:

\[
\begin{aligned}
\bar{t}_1 &= \bar{t}_0 + d((p,0),(i.F,0)), & d((p,0),(i.F,0)) &\ge 0 \\
\bar{t}_2 &= \bar{t}_1 + d((v,0),(i.F,0)), & d((v,0),(i.F,0)) &\ge 0 \\
\bar{t}_3 &= \bar{t}_2 + d((v,0),(i.F,1)), & d((v,0),(i.F,1)) &\ge 0 \\
\bar{t}_4 &= \bar{t}_3 + d((p,0),(i.F,1)), & d((p,0),(i.F,1)) &\ge 0 \\
\bar{t}_5 &= \bar{t}_4 + d((p,0),(i.F,2)), & d((p,0),(i.F,2)) &\ge 0 \\
\bar{t}_6 &= \bar{t}_5 + d((v,0),(i.F,2)), & d((v,0),(i.F,2)) &\ge 0
\end{aligned}
\]
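As a worked check of why this symbolic trace cannot pass the verifier's test when the prover lies outside the neighborhood, the last four equalities can be summed. The derivation below is a sketch under stated assumptions: a single intruder location, written i (so all instances (i.F, k) are at distance 0 from each other), the abbreviations p and v for (p, 0) and (v, 0), symmetry of d, the verifier check \bar{t}_6 - \bar{t}_2 \le 2 * d, and d(p, v) > d. The abbreviations are ours.

```latex
\[
\begin{aligned}
\bar{t}_6 - \bar{t}_2 &= d(v,i) + d(p,i) + d(p,i) + d(v,i) \\
                      &= 2\,\bigl(d(p,i) + d(i,v)\bigr) \\
                      &\ge 2\,d(p,v) && \text{(triangle inequality)} \\
                      &> 2\,d        && \text{(prover outside the neighborhood)}
\end{aligned}
\]
```

Hence \bar{t}_6 - \bar{t}_2 \le 2 * d cannot hold together with d(p, v) > d, whatever concrete values the distances take.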

Note that these constraints are unsatisfiable when combined with (i) the assumption d > 0, (ii) the verifier check t̄6 − t̄2 ≤ 2 ∗ d, (iii) the assumption that the honest prover is outside the verifier’s neighborhood, d((p, 0), (v, 0)) > d, (iv) the triangular inequality from the metric space, d((p, 0), (v, 0)) ≤ d((p, 0), (i.F, 0)) + d((i.F, 0), (v, 0)), and (v) the assumption that there is only one intruder, d((i.F, 0), (i.F, 1)) = 0 and d((i.F, 0), (i.F, 2)) = 0.

As explained previously in the remark, there are some implicit conditions based on the mte function to calculate the time increment to the closest destination of a message. However, the mte function disappears in the untimed process algebra and those implicit conditions are incorporated into the symbolic run.

In the following, we define a transformation of the timed process algebra by (i) removing the global clock; (ii) adding the time data into untimed messages of a process algebra without time (as done in [20]); and (iii) adding linear arithmetic conditions over the reals for the time constraints (as is done in [21]). The soundness and completeness proof of the transformation is included in the full version of the paper, available at https://arxiv.org/abs/2010.13707.

Since all the relevant time information is actually stored in messages of the form M @ (A : t → (B1 : t1 · · · Bn : tn)) and controlled by the transition rules (TPA++), (TPA+), and (TPA-), the mapping tpa2pa of Definition 1 below transforms each message M @ t of a timed process into a message M @ (A : t? → AS?) of an untimed process. That is, we use a timed choice variable t? for the sending time and a variable AS? for the reception information (B1 : t1 · · · Bn : tn) associated to the sent message. Since choice variables are replaced by specific values, both t? and AS? will be replaced by the appropriate values that make the execution and all its time constraints possible. Note that these two choice variables will be replaced by logical variables during the symbolic execution.

Definition 1 (Adding Time Variables and Time Constraints to Untimed Processes). The mapping tpa2pa from timed processes into untimed processes and its auxiliary mapping tpa2pa* are defined as follows:


tpa2pa(∅) = ∅
tpa2pa((ro, i, j) P & PS) = (ro, i, j) tpa2pa*(P, ro, i) & tpa2pa(PS)
tpa2pa*(nilP, ro, i) = nilP
tpa2pa*(+(M @ t) . P, ro, i) = +(M @ ((ro, i) : t? → AS?)) . tpa2pa*(Pγ, ro, i)   where γ = {t → t?}
tpa2pa*(−(M @ t) . P, ro, i) = −(M @ ((ro′, i′) : t′ → ((ro, i) : t) AS)) .
    if t = t′ + d((ro, i), (ro′, i′)) ∧ d((ro, i), (ro′, i′)) ≥ 0 then tpa2pa*(P, ro, i) else nilP
tpa2pa*((if C then P else Q) . R, ro, i, x, y) = (if C then tpa2pa*(P, ro, i, x, y) else tpa2pa*(Q, ro, i, x, y)) . tpa2pa*(R, ro, i, x, y)
tpa2pa*((P ? Q) . R, ro, i, x, y) = (tpa2pa*(P, ro, i, x, y) ? tpa2pa*(Q, ro, i, x, y)) . tpa2pa*(R, ro, i, x, y)

where t? and AS? are choice variables different for each one of the sending actions, ro′, i′, t′, d, AS are pattern variables different for each one of the receiving actions, P, Q, and R are processes, M is a message, and C is a constraint.

Example 6. The timed processes of Example 2 are transformed into the following untimed processes. We remove the “else nilP” branches for clarity.

(Verifier) :
  −(Commit @ A1 : t1′ → V? : t1 AS1) ·
  if t1 = t1′ + d(A1, V?) ∧ d(A1, V?) ≥ 0 then
  +(n(V?, f1) @ V? : t2? → AS2?) ·
  −((n(V?, f1) ⊕ NP) @ A3 : t3′ → V? : t3 AS3) ·
  if t3 = t3′ + d(A3, V?) ∧ d(A3, V?) ≥ 0 then
  if t3 ∸ t2? ≤ 2 ∗ d then
  −(SP @ A4 : t4′ → V? : t4 AS4) ·
  if t4 = t4′ + d(A4, V?) ∧ d(A4, V?) ≥ 0 then
  if open(NP, SP, Commit) then
  −(sign(P, n(V?, f1); NP ⊕ n(V?, f1)) @ A5 : t5′ → V? : t5 AS5)
  if t5 = t5′ + d(A5, V?) ∧ d(A5, V?) ≥ 0

(Prover) :
  +(commit(n(P?, f1), s(P?, f2)) @ P? : t1? → AS1?) ·
  −(V; NV @ A2 : t2′ → V? : t2 AS2) ·
  if t2 = t2′ + d(A2, P?) ∧ d(A2, P?) ≥ 0 then
  +((NV ⊕ n(P?, f1)) @ P? : t3? → AS3?) ·
  +(s(P?, f2) @ P? : t4? → AS4?) ·
  +(sign(P?, NV; n(P?, f2) ⊕ NV) @ P? : t5? → AS5?))
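The send and receive cases of Definition 1 can be pictured with a small Python sketch. It is a simplification and not the authors' implementation: it handles only straight-line processes (no else branches, no nondeterministic choice), represents messages and constraints as plain strings, and the class names `Send`, `Recv`, `Check`, the helper `substitute`, and the generated variable names (`A1`, `AS1`, `AS_t2?`) are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Send:            # +(M @ t)
    msg: str
    time: str


@dataclass
class Recv:            # -(M @ t)
    msg: str
    time: str


@dataclass
class Check:           # if C then ... (the else-nilP branch is left implicit)
    constraint: str


Action = Union[Send, Recv, Check]


def substitute(text: str, gamma: Dict[str, str]) -> str:
    """Apply gamma to whole identifiers only (t2 must not match inside t20)."""
    for old, new in gamma.items():
        text = re.sub(rf"\b{re.escape(old)}\b(?!\?)", new, text)
    return text


def tpa2pa_star(actions: List[Action], who: str) -> List[str]:
    """Translate a straight-line timed process of principal `who`."""
    out: List[str] = []
    gamma: Dict[str, str] = {}      # accumulated renaming {t -> t?}
    receives = 0
    for act in actions:
        if isinstance(act, Send):
            # Sending time becomes a choice variable, also in the continuation.
            gamma[act.time] = act.time + "?"
            out.append(f"+({act.msg} @ {who} : {act.time}? -> AS_{act.time}?)")
        elif isinstance(act, Recv):
            # Fresh pattern variables for the sender and its sending time.
            receives += 1
            sender, sent = f"A{receives}", f"{act.time}'"
            out.append(f"-({act.msg} @ {sender} : {sent} -> {who} : {act.time} AS{receives})")
            out.append(f"if {act.time} = {sent} + d({sender}, {who}) and d({sender}, {who}) >= 0 then")
        else:
            out.append(f"if {substitute(act.constraint, gamma)} then")
    return out


verifier = [
    Recv("Commit", "t1"),
    Send("n(V,f1)", "t2"),
    Recv("n(V,f1) xor NP", "t3"),
    Check("t3 - t2 <= 2 * d"),
]
lines = tpa2pa_star(verifier, "V")
print("\n".join(lines))
```

Note how the rapid-exchange check mentions `t2?` after translation: the renaming introduced at the send is applied to the rest of the process, as in the substitution γ of the definition.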


Example 7. The timed processes of Example 3 for the intruder are transformed into the following untimed processes. Note that we use the intruder identifier I associated to each role instead of a choice variable I?.

(I.Conc) :
  −(X @ A1 : t1′ → I : t1 AS1) ·
  if t1 = t1′ + d(A1, I) ∧ d(A1, I) ≥ 0 then
  −(Y @ A2 : t2′ → I : t2 AS2) ·
  if t2 = t2′ + d(A2, I) ∧ d(A2, I) ≥ 0 then
  +(X; Y @ I : t3? → AS?)

(I.Deconc) :
  −(X; Y @ A1 : t1′ → I : t1 AS1) ·
  if t1 = t1′ + d(A1, I) ∧ d(A1, I) ≥ 0 then
  +(X @ I : t2? → AS?)

(I.Forward) :
  −(X @ A1 : t1′ → I : t1 AS1) ·
  if t1 = t1′ + d(A1, I) ∧ d(A1, I) ≥ 0 then
  +(X @ I : t2? → AS?)

Once a timed process is transformed into an untimed process with time variables and time constraints using the notation of Maude-NPA, we rely on both a soundness and completeness proof from the Maude-NPA process notation into Maude-NPA forward rewriting semantics and on a soundness and completeness proof from Maude-NPA forward rewriting semantics into Maude-NPA backwards symbolic semantics, see [26,27]. Since the Maude-NPA backwards symbolic semantics already considers constraints in a very general sense [10], we only need to perform the additional satisfiability check for linear arithmetic over the reals.

5 Timed Process Algebra into Strands in Maude-NPA

This section is provided to help in understanding the experimental output. Although Maude-NPA accepts protocol specifications in either the process algebra language or the strand space language, it still gives outputs only in the strand space notation. Thus, in order to make our experimental output easier to understand, we describe the translation from timed process into strands with time variables and time constraints. This translation is also sound and complete, as it imitates the transformation of Sect. 4 and the transformation of [26,27].

Strands [25] are used in Maude-NPA to represent both the actions of honest principals (with a strand specified for each protocol role) and those of an intruder (with a strand for each action an intruder is able to perform on messages). In Maude-NPA, strands evolve over time. The symbol | is used to divide past and future. That is, given a strand [msg±_1, ..., msg±_i | msg±_{i+1}, ..., msg±_k], messages msg±_1, ..., msg±_i are the past messages, and messages msg±_{i+1}, ..., msg±_k are the future messages (msg±_{i+1} is the immediate future message). Constraints can be also inserted into strands. A strand [msg±_1, ..., msg±_k] is shorthand for [nil | msg±_1, ..., msg±_k, nil]. An initial state is a state where the bar is at the beginning for all strands in the state, and the network has no possible intruder fact of the form m ∈ I. A final state is a state where the bar is at the end for all strands in the state and there is no negative intruder fact of the form m ∉ I.

In the following example, we illustrate how the timed process algebra can be transformed into strands specifications of Maude-NPA.

Example 8. The timed processes of Example 2 are transformed into the following strand specification.

(Verifier) :
  [ −(Commit @ A1 : t1′ → V : t1 AS1),
    (t1 = t1′ + d(A1, V) ∧ d(A1, V) ≥ 0),
    +(n(V, f1) @ V : t2 → AS2),
    −((n(V, f1) ⊕ NP) @ A3 : t3′ → V : t3 AS3),
    (t3 = t3′ + d(A3, V) ∧ d(A3, V) ≥ 0),
    (t3 ∸ t2 ≤ 2 ∗ d),
    −(SP @ A4 : t4′ → V : t4 AS4),
    (t4 = t4′ + d(A4, V) ∧ d(A4, V) ≥ 0),
    open(NP, SP, Commit),
    −(sign(P, n(V, f1); NP ⊕ n(V, f1)) @ A5 : t5′ → V : t5 AS5),
    (t5 = t5′ + d(A5, V) ∧ d(A5, V) ≥ 0) ]

(Prover) :
  [ +(commit(n(P, f1), s(P, f2)) @ P : t1 → AS1),
    −(NV @ A2 : t2′ → V : t2 AS2),
    (t2 = t2′ + d(A2, P) ∧ d(A2, P) ≥ 0),
    +((NV ⊕ n(P, f1)) @ P : t3 → AS3),
    +(s(P, f2) @ P : t4 → AS4),
    +(sign(P, NV; n(P, f2) ⊕ NV) @ P : t5 → AS5) ]

We specify the desired security properties in terms of attack patterns including logical variables, which describe the insecure states that Maude-NPA is trying to prove unreachable. Specifically, the tool attempts to find a backwards narrowing sequence path from the attack pattern to an initial state until it can no longer form any backwards narrowing steps, at which point it terminates. If it has not found an initial state, the attack pattern is judged unreachable.
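The bar notation for strands used in this section can be pictured with a short Python sketch. It is a toy model only: the class `Strand`, its methods, and the message labels are hypothetical, and the intruder-knowledge conditions that are also part of initial and final states are deliberately left out.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Strand:
    """A strand [past | future]; `bar` is the position of the | symbol."""
    messages: Tuple[str, ...]
    bar: int = 0

    @property
    def past(self) -> Tuple[str, ...]:
        return self.messages[:self.bar]

    @property
    def future(self) -> Tuple[str, ...]:
        return self.messages[self.bar:]

    def forward(self) -> "Strand":
        """Move the bar one message to the right (one execution step)."""
        if self.bar >= len(self.messages):
            raise ValueError("bar is already at the end")
        return replace(self, bar=self.bar + 1)

    def backward(self) -> "Strand":
        """Move the bar one message to the left (one backwards-search step)."""
        if self.bar == 0:
            raise ValueError("bar is already at the beginning")
        return replace(self, bar=self.bar - 1)


def is_initial(strands) -> bool:
    """Bar at the beginning of every strand (intruder facts not modeled)."""
    return all(s.bar == 0 for s in strands)


def is_final(strands) -> bool:
    """Bar at the end of every strand (intruder facts not modeled)."""
    return all(s.bar == len(s.messages) for s in strands)


prover = Strand(("+m1", "-m2", "+m4"))
verifier = Strand(("-m1", "+m2", "-m4"))
print(prover.forward().past, prover.forward().future)
```

A backwards search in this picture starts from strands whose bar is at the end and repeatedly applies `backward` until every bar is at the beginning or no step applies.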
The following example shows how a classic mafia fraud attack for the Brands-Chaum protocol can be encoded in Maude-NPA’s strand notation.

Example 9. Following the strand specification of the Brands-Chaum protocol given in Example 8, the mafia attack of Example 1 is given as the following attack pattern. Note that Maude-NPA uses symbol === for equality on the reals, +=+ for addition on the reals, *=* for multiplication on the reals, and -=- for subtraction on the reals. Also, we consider one prover p, one verifier v, and one intruder i at fixed locations. Extra time constraints are included in an smt section, where a triangular inequality has been added. The Brands-Chaum protocol is secure against the mafia fraud attack: no initial state is found in the backwards search.

eq ATTACK-STATE(1) --- Mafia fraud
= :: r :: --- Verifier
  [ nil,
    -(commit(n(p,r1),s(p,r2)) @ i : t1 -> v : t2),
    ((t2 === t1 +=+ d(i,v)) and d(i,v) >= 0/1),
    +(n(v,r) @ v : t2 -> i : t2''),
    -(n(v,r) * n(p,r1) @ i : t3 -> v : t4),
    (t3 >= t2 and (t4 === t3 +=+ d(i,v)) and d(i,v) >= 0/1),
    ((t4 -=- t2) i : t1''),
    -(n(v,r) @ i : t2'' -> p : t3'),
    ((t3' === t2'' +=+ d(i,p)) and d(i,p) >= 0/1),
    +(n(v,r) * n(p,r1) @ p : t3' -> i : t3'')
    | nil ]
  || smt(d(v,p) > 0/1 and d(i,p) > 0/1 and d(i,v) > 0/1 and d(v,i) = d(v,p) and d(v,p) > d)
  || nil || nil || nil
  [nonexec] .

6 Experiments

As a feasibility study, we have encoded several distance bounding protocols in Maude-NPA. It was necessary to slightly alter the Maude-NPA tool by (i) including minor modifications to the state space reduction techniques to allow for timed messages; (ii) the introduction of the sort Real and its associated operations; and (iii) the connection of Maude-NPA to a Satisfiability Modulo Theories (SMT) solver¹ (see [19] for details on SMT). The specifications, outputs, and the modified version of Maude-NPA are available at http://personales.upv.es/sanesro/indocrypt2020/.

Although the timed model allows an unbounded number of principals, the attack patterns used to specify insecure goal states allow us to limit the number of principals in a natural way. In this case we specified one verifier, one prover, and one attacker, but allowed an unbounded number of sessions.

In Table 1 we present the results for the different distance-bounding protocols that we have analyzed. Two attacks have been analyzed for each protocol: a mafia fraud attack (i.e., an attacker tries to convince the verifier that an honest prover is closer to him than he really is), and a distance hijacking attack (i.e., a dishonest prover located far away succeeds in convincing a verifier that they are actually close, and he may only exploit the presence of honest participants in the neighborhood to achieve his goal). Symbol ✓ means the property is satisfied and × means an attack was found. The columns labelled tm (s) give the times in seconds that it took for a search to complete. Finally, the column labelled PreProc gives the time it takes Maude-NPA to perform some preprocessing on the specification that eliminates searches for some provably unreachable states. This only needs to be done once, after which the results can be used for any query, so it is displayed separately.

¹ Several SMT solvers are publicly available, but the programming language Maude [6] currently supports CVC4 [7] and Yices [28].


Table 1. Experiments performed for different distance-bounding protocols

Protocol                            PreProc (s)   Mafia   tm (s)   Hijacking   tm (s)
Brands and Chaum [3]                3.0           ✓       4.3      ×           11.4
Meadows et al. (nV ⊕ nP, P) [15]    3.7           ✓       1.3      ✓           22.5
Meadows et al. (nV, nP ⊕ P) [15]    3.5           ✓       1.1      ×           1.5
Hancke and Kuhn [11]                1.2           ✓       12.5     ✓           0.7
MAD [4]                             5.1           ✓       110.5    ×           318.8
Swiss-Knife [12]                    3.1           ✓       4.8      ✓           24.5
Munilla et al. [17]                 1.7           ✓       107.1    ✓           4.5
CRCS [24]                           3.0           ✓       450.1    ×           68.6
TREAD [1]                           2.4           ✓       4.7      ×           4.2

We note that, since our semantics is defined over arbitrary metric spaces, not just Euclidean space, it is also necessary to verify that an attack returned by the tool is realizable over Euclidean space. The mafia and hijacking attacks returned by Maude-NPA in these experiments are all realizable on a straight line, and hence are realizable over n-dimensional Euclidean space for any n. In general, this realizability check can be done via a final step in which the constraints, with the Euclidean metric substituted for distance, are checked via an SMT solver that supports checking quadratic constraints over the reals, such as Yices [28], Z3 [29], or Mathematica [13]. Although this feature is not yet implemented in Maude-NPA, we have begun experimenting with these solvers.
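For scenarios with only three principals, realizability on a straight line has a simple characterization: three pairwise distances fit on a line exactly when the largest equals the sum of the other two. The Python sketch below illustrates that special case only; the function name and tolerance are our own, and it is not the SMT-based check over quadratic constraints mentioned above.

```python
import math


def realizable_on_line(d_pv: float, d_pi: float, d_iv: float,
                       tol: float = 1e-9) -> bool:
    """True if three points with these pairwise distances fit on a straight line."""
    smallest, middle, largest = sorted((d_pv, d_pi, d_iv))
    if smallest < 0:
        return False  # distances must be non-negative
    # On a line, the point in the middle splits the largest distance exactly.
    return math.isclose(largest, smallest + middle, abs_tol=tol)


# Distances of the relay run discussed earlier: d(p,v) = 2, d(p,i) = d(i,v) = 1.
print(realizable_on_line(2.0, 1.0, 1.0))
# An equilateral placement needs two dimensions.
print(realizable_on_line(1.0, 1.0, 1.0))
```

With more principals, or when locations appear explicitly in the constraints, this shortcut no longer suffices and a solver for nonlinear real arithmetic is needed.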

7 Conclusions

We have developed a timed model for protocol analysis based on timing constraints, and provided a prototype extension of Maude-NPA handling protocols with time by taking advantage of Maude’s support of SMT solvers, as was done by Nigam et al. in [21], and Maude-NPA’s support of constraint handling. We also performed some initial analyses to test the feasibility of the approach. This approach should be applicable to other tools that support constraint handling.

There are several ways this work can be extended. One is to extend the ability of the tool to reason about a larger number of principals, in particular an unbounded number of principals. This includes an unbounded number of attackers; since each attacker must have its own location, we cannot assume a single attacker as in Dolev-Yao. Our specification and query language, and its semantics, support reasoning about an unbounded number of principals, so this is a question of developing means of telling when a principal or state is redundant and developing state space reduction techniques based on this.


Another important extension is to protocols that require the full Euclidean space model, in particular those in which location needs to be explicitly included in the constraints. This includes for example protocols used for localization. For this, we have begun experimenting with SMT solvers that support solving quadratic constraints over the reals.

Looking further afield, we consider adding different types of timing models. In the timing model used in this paper, time is synonymous with distance. But we may also be interested in including other ways in which time is advanced, e.g. the amount of time a principal takes to perform internal processing tasks. In our model, the method in which timing is advanced is specified by the mte function, which is in turn used to generate constraints on which messages can be ordered. Thus changing the way in which timing is advanced can be accomplished by modifying the mte function. Potential future research therefore includes the design of generic mte functions together with rules on their instantiation that guarantee soundness and completeness.

Finally, there is also no reason for us to limit ourselves to time and location. This approach should be applicable to other quantitative properties as well. For example, the inclusion of cost and utility would allow us to tackle new classes of problems not usually addressed by cryptographic protocol analysis tools, such as performance analyses (e.g., resistance against denial of service attacks), or even analysis of game-theoretic properties of protocols, thus opening up a whole new set of problems to explore.

References

1. Avoine, G., et al.: A terrorist-fraud resistant and extractor-free anonymous distance-bounding protocol. In: Proceedings of the Asia Conference on Computer and Communications Security (AsiaCCS 2017), pp. 800–814. ACM Press (2017)
2. Basin, D.A., Capkun, S., Schaller, P., Schmidt, B.: Formal reasoning about physical properties of security protocols. ACM Trans. Inf. Syst. Secur. 14(2), 16:1–16:28 (2011)
3. Brands, S., Chaum, D.: Distance-bounding protocols. In: Helleseth, T. (ed.) EUROCRYPT 1993. LNCS, vol. 765, pp. 344–359. Springer, Heidelberg (1994). https://doi.org/10.1007/3-540-48285-7_30
4. Capkun, S., Buttyán, L., Hubaux, J.-P.: SECTOR: secure tracking of node encounters in multi-hop wireless networks. In: Proceedings of the 1st ACM Workshop on Security of Ad Hoc and Sensor Networks (SASN 2003), pp. 21–32. Association for Computing Machinery (2003)
5. Chothia, T., de Ruiter, J., Smyth, B.: Modelling and analysis of a hierarchy of distance bounding attacks. In: Proceedings of the 27th USENIX Security Symposium (USENIX Security 2018), pp. 1563–1580. USENIX (2018)
6. Clavel, M., et al.: Maude Manual (Version 3.0). Technical report, SRI International Computer Science Laboratory (2020). http://maude.cs.uiuc.edu
7. The CVC4 SMT Solver (2020). https://cvc4.github.io
8. Debant, A., Delaune, S.: Symbolic verification of distance bounding protocols. In: Nielson, F., Sands, D. (eds.) POST 2019. LNCS, vol. 11426, pp. 149–174. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-17138-4_7


9. Debant, A., Delaune, S., Wiedling, C.: A symbolic framework to analyse physical proximity in security protocols. In: Proceedings of the 38th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2018), Leibniz International Proceedings in Informatics (LIPIcs), vol. 122, pp. 29:1–29:20. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2018)
10. Escobar, S., Meadows, C., Meseguer, J., Santiago, S.: Symbolic protocol analysis with disequality constraints modulo equational theories. In: Bodei, C., Ferrari, G.L., Priami, C. (eds.) Programming Languages with Applications to Biology and Security. LNCS, vol. 9465, pp. 238–261. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-25527-9_16
11. Hancke, G.P., Kuhn, M.G.: An RFID distance bounding protocol. In: Proceedings of the 1st IEEE International Conference on Security and Privacy for Emerging Areas in Communications Networks (SecureComm 2005), pp. 67–73. IEEE Computer Society Press (2005)
12. Kim, C.H., Avoine, G., Koeune, F., Standaert, F.-X., Pereira, O.: The Swiss-Knife RFID distance bounding protocol. In: Lee, P.J., Cheon, J.H. (eds.) ICISC 2008. LNCS, vol. 5461, pp. 98–115. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-00730-9_7
13. Wolfram Mathematica (2020). https://www.wolfram.com/mathematica
14. Mauw, S., Smith, Z., Toro-Pozo, J., Trujillo-Rasua, R.: Distance-bounding protocols: verification without time and location. In: Proceedings of the 39th IEEE Symposium on Security and Privacy (S&P 2018), pp. 549–566. IEEE Computer Society Press (2018)
15. Meadows, C., Poovendran, R., Pavlovic, D., Chang, L.W., Syverson, P.: Distance bounding protocols: authentication logic analysis and collusion attacks. In: Poovendran, R., Roy, S., Wang, C. (eds.) Secure Localization and Time Synchronization for Wireless Sensor and Ad Hoc Networks: Advances in Information Security, vol. 30, pp. 279–298. Springer, Boston (2007). https://doi.org/10.1007/978-0387-46276-9_12
16. Meier, S., Schmidt, B., Cremers, C., Basin, D.: The TAMARIN prover for the symbolic analysis of security protocols. In: Sharygina, N., Veith, H. (eds.) CAV 2013. LNCS, vol. 8044, pp. 696–701. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39799-8_48
17. Munilla, J., Peinado, A.: Distance bounding protocols for RFID enhanced by using void-challenges and analysis in noisy channels. Wirel. Commun. Mob. Comput. 8(9), 1227–1232 (2008)
18. Neumann, C., Yu, T., Hartman, S., Raeburn, K.: The Kerberos network authentication service (V5). Request Comments 4120, 1–37 (2005)
19. Nieuwenhuis, R., Oliveras, A., Tinelli, C.: Solving SAT and SAT modulo theories: from an abstract Davis-Putnam-Logemann-Loveland procedure to DPLL(T). Commun. ACM 53(6), 937–977 (2006)
20. Nigam, V., Talcott, C., Aires Urquiza, A.: Towards the automated verification of cyber-physical security protocols: bounding the number of timed intruders. In: Askoxylakis, I., Ioannidis, S., Katsikas, S., Meadows, C. (eds.) ESORICS 2016. LNCS, vol. 9879, pp. 450–470. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-45741-3_23
21. Nigam, V., Talcott, C., Urquiza, A.A.: Symbolic timed observational equivalence. Computing Research Repository, abs/1801.04066 (2018)
22. Paulson, L.C.: The inductive approach to verifying cryptographic protocols. J. Comput. Secur. 6(1–2), 85–128 (1998)


23. Perrig, A., Song, D., Canetti, R., Tygar, J.D., Briscoe, B.: Timed efficient stream loss-tolerant authentication (TESLA): multicast source authentication transform introduction. Request Comments 4082, 1–22 (2005)
24. Rasmussen, K.B., Capkun, S.: Realization of RF distance bounding. In: Proceedings of the 19th USENIX Security Symposium (USENIX Security 2010), pp. 389–402. USENIX (2010)
25. Thayer, F.J., Herzog, J.C., Guttman, J.D.: Strand spaces: proving security protocols correct. J. Comput. Secur. 7(1), 191–230 (1999)
26. Yang, F., Escobar, S., Meadows, C., Meseguer, J.: Strand spaces with choice via a process algebra semantics. Computing Research Repository, abs/1904.09946 (2019)
27. Yang, F., Escobar, S., Meadows, C., Meseguer, J., Santiago, S.: Strand spaces with choice via a process algebra semantics. In: Proceedings of the 18th International Symposium on Principles and Practice of Declarative Programming (PPDP 2016), pp. 76–89. ACM Press (2016)
28. The Yices SMT Solver (2020). https://yices.csl.sri.com
29. The Z3 SMT Solver (2020). https://github.com/Z3Prover/z3

Verifpal: Cryptographic Protocol Analysis for the Real World

Nadim Kobeissi¹, Georgio Nicolas¹, and Mukesh Tiwari²

¹ Symbolic Software, Paris, France
² University of Melbourne, Melbourne, Australia

Abstract. Verifpal is a new automated modeling framework and verifier for cryptographic protocols, optimized with heuristics for common-case protocol specifications, that aims to work better for real-world practitioners, students and engineers without sacrificing comprehensive formal verification features. In order to achieve this, Verifpal introduces a new, intuitive language for modeling protocols that is easier to write and understand than the languages employed by existing tools. Its formal verification paradigm is also designed explicitly to provide protocol modeling that avoids user error. Verifpal is able to model protocols under an active attacker with unbounded sessions and fresh values, and supports queries for advanced security properties such as forward secrecy or key compromise impersonation. Furthermore, Verifpal’s semantics have been formalized within the Coq theorem prover, and Verifpal models can be automatically translated into Coq as well as into ProVerif models for further verification. Verifpal has already been used to verify security properties for Signal, Scuttlebutt, TLS 1.3 as well as the first formal model for the DP-3T pandemic-tracing protocol, which we present in this work. Through Verifpal, we show that advanced verification with formalized semantics and sound logic can exist without any expense towards the convenience of real-world practitioners.

Keywords: Formal analysis · Protocol analysis · Protocol modeling

1 Introduction

Internet communications rely on a handful of protocols, such as Transport Layer Security (TLS), SSH and Signal, in order to keep user data confidential. These protocols often aim to achieve ambitious security properties (such as post-compromise security [30]) across complex use-cases (such as support for message synchronization across multiple devices). Given the broad set of operations and states supported by these protocols, verifying that they do indeed achieve their desired security goals across all use-case scenarios has proven to be non-trivial [14,17,18].

© Springer Nature Switzerland AG 2020. K. Bhargavan et al. (Eds.): INDOCRYPT 2020, LNCS 12578, pp. 151–202, 2020. https://doi.org/10.1007/978-3-030-65277-7_8


N. Kobeissi et al.

Automated formal verification tools have seen an encouraging success in helping to model the security of these protocols. Recently, the Signal secure messaging protocol [54], the TLS 1.3 web encryption standard [15], the 5G wireless communication standard [9,32], the Scuttlebutt decentralized messaging protocol [35], the Bluetooth standard [35], the Let’s Encrypt certificate issuance system [16,51], the Noise Protocol Framework [48,55] and the WireGuard [44] Virtual Private Network (VPN) protocol [59] have all been analyzed using automated formal verification (Fig. 1).

Fig. 1. Comparison between Verifpal and other tools for symbolic security analysis, using established impartial third-party criteria [7]. Verifpal analysis supports unbounded executions (including interleaving protocol sessions), equational theory (although not as refined as Tamarin’s), mutable principal states, trace properties and is able to link results to implementations via Coq (and soon Go). Verifpal does not support equivalence properties at the same level as Maude-NPA, ProVerif, Tamarin and DEEPSPEC, but does offer queries for notions of unlinkability between values. Verifpal also focuses on providing a substantially more intuitive overall framework for real-world protocol modeling and analysis through its language and built-in primitive definitions, although such a claim is more tricky to compare.

Despite this increase in the usage of formal verification tools, and despite the success obtained with this approach, automated formal verification technology remains unused outside certain specific realms of academia: an illustrative fact is that almost all of the example results cited above have, as a co-author, one of the designers of the automated formal verification tool that was used to obtain the research result. We conjecture that this lack of adoption is leading to an increase in the number of weaknesses in cryptographic protocols: in the case of TLS, protocol designers did not use formal verification technology in the protocol’s design phase up until TLS 1.3, and that was only due to automated formal verification helping discover a large number of attacks in TLS 1.2 and below [14,15,17], and was, again, only accomplished via collaboration with the designers of the formal verification tools themselves.

1.1 Verifpal’s Design Goals

It is important to discern that Verifpal does not aim to produce security proofs in the traditions of tools such as CryptoVerif [19]. In deciding Verifpal’s priorities, we slam the brakes at the moment where the learning curve, effort and analysis cost begin to have strongly diminishing returns for the user while still maintaining a responsible level of rigor via a formal treatment of Verifpal’s semantics and analysis methodology. Our bet is that this path forward for Verifpal will lead to a hugely more substantial impact for engineers and practitioners than traditional automated proof modeling tools. In this paper, we will for example see how Verifpal makes compromises in analysis completeness that preclude its ability to output full proofs but that greatly increase the likelihood of analysis termination (a significant problem for tools such as ProVerif) without having an apparently significant impact on the analysis of real-world, non-Ivory-Tower protocols.

Verifpal is able to analyze the security of complex protocols, such as Signal, and query for complex attack scenarios such as post-compromise security and key compromise impersonation, across unbounded session executions of the protocol and with fresh values not being shared across sessions. By giving practitioners this powerful symbolic analysis paradigm in an intuitive package, Verifpal stands a chance at making symbolic formal verification a staple in the diet of any protocol designer.

1.2 Simplifying Protocol Analysis with Verifpal

Extensive experience with automated formal verification tools has led us to the hypothesis that the prerequisite knowledge, modeling languages and structure in which the tools formalize their results are a significant barrier against wider adoption. Verifpal is an attempt to overcome this barrier. Building upon contemporary research in symbolic formal verification, Verifpal’s main aim is to appeal more to real-world practitioners, students and engineers without sacrificing comprehensive formal verification features. Verifpal has four main design principles:

An Intuitive Language for Modeling Protocols. Verifpal’s internal logic relies on the deconstruction and reconstruction of abstract terms, similar to existing symbolic verification tools. However, it reasons about the protocol model with explicit principals: Alice and Bob exist, they have independent states, they know certain values and perform operations with cryptographic primitives. They send messages to each other over the network, and so on. The Verifpal language is meant to illustrate protocols close to how one may describe them in an informal conversation, while still being precise and expressive enough for formal modeling. We argue that this paradigm goes beyond mere convenience and extends protocol modeling and verification towards a necessary level of intuitiveness for real adoption.

Modeling That Avoids User Error. Verifpal does not allow users to define their own cryptographic primitives. Instead, it comes with built-in cryptographic functions: ENC and DEC representing encryption and decryption, AEAD_ENC and AEAD_DEC representing authenticated encryption and decryption, RINGSIGN and SIGN representing asymmetric primitives, etc. This is meant to remove the potential for users to define fundamental cryptographic operations incorrectly. Verifpal also adopts a global name-space for all constants and does not allow constants to be redefined or assigned to one another. This enforces models that are clean and easy to follow. Furthermore, Sect. 3.1 briefly describes Verifpal’s use of heuristics in order to avoid non-termination due to state space explosion, a common problem with automated protocol verification tools.

Easy to Understand Analysis Output. Existing tools provide “attack traces” that illustrate a deduction using session-tagged values in a chain of symbolic deconstructions. Verifpal follows a different approach: while it is analyzing a model, it outputs notes on which values it is able to deconstruct, conceive of, or reconstruct. When a contradiction is found for a query, the result is related in a readable format that ties the attack to a real-world scenario. This is done by using terminology to indicate how the attack could have been possible, such as through a man-in-the-middle attack on ephemeral keys.

Compatibility with the Coq Theorem Prover. The Verifpal language and passive attacker analysis methodology has recently been formalized within the Coq theorem prover [13]. Consequently, Verifpal models can be automatically translated into Coq using the Verifpal software. This allows for further analysis in more established frameworks while also granting a higher level of confidence in Verifpal’s analysis methodology. Currently, the Coq work provides a complete formalized illustration of the Verifpal language semantics and of Verifpal analysis under a passive attacker. Our eventual goal is to use Coq as an attestation layer to Verifpal’s soundness logic and show that Verifpal analysis results can be attested as sound via the generated Coq implementations. In addition, Verifpal models can also be translated into ProVerif models for an additional parallel verification venue.

1.3 Related Work

Verifpal arrives roughly two decades after automated formal verification became a research focus. Here, we outline some of the more pertinent formal verification tools, use cases and broader methodologies this research area has seen, and which Verifpal aims to supersede in terms of accessibility and real-world usability.

Verifpal is heavily inspired by the ProVerif [21,22] protocol verifier, designed by Bruno Blanchet. It does not construct all terms out of Horn clauses [26] in the way that ProVerif does, and it does not use the applied pi-calculus [1] as its modeling language. However, its analysis logic is inspired by ProVerif and is similarly based on the Dolev-Yao model [43]. ProVerif’s construction/deconstruction/rewrite logic is also mirrored in Verifpal’s own design. ProVerif has recently been used to formally verify TLS 1.2 and TLS 1.3 [15], Let’s Encrypt’s ACME certificate issuance protocol [16], the Signal secure messaging protocol [54], the Noise Protocol Framework [55], the Plutus network filesystem [23], e-voting protocols [5,31,33,41], FIDO [63] and many more use cases.

The Tamarin [66] protocol prover also works under the symbolic model, but derives its analysis from principals’ state transitions rather than from the viewpoint of an attacker observing and manipulating network messages. It is also different from ProVerif in its analysis style, and its modeling language is unique within the domain. Tamarin has recently been used to formally verify Scuttlebutt [35], TLS [34], WireGuard [45], 5G [9,32], the Noise Protocol Framework [48], multiple e-voting protocols [10,25] and many more use cases.

Scyther¹ [11,38], whose authors also work on Tamarin, offers unbounded verification with guarantees of termination but uses a more accessible and explicit modeling language than Tamarin. Scyther has been used to analyze IKEv1 and IKEv2 [39] (used in IPSec), a large number of Authenticated Key Exchange (AKE) protocols such as HMQV, UM and NAXOS [8], and to check for “multiprotocol attacks” [37]. Research focus seems to be moving towards Tamarin, but Scyther is still sometimes used.

The modeling language of AVISPA [4] is somewhat similar to Verifpal’s: both have a focus on describing “actors” with “roles”, and explicitly attempt to allow the user to illustrate the protocol intuitively, as if describing actors in a theatrical play. Despite this, work on AVISPA seems to have largely moved to a successor tool, AVANTSSAR [3], which shares many of the same authors. In 2016, a new authentication protocol was designed and prototyped with AVISPA [2]. In 2011, Facebook’s Connect single sign-on protocol was modeled with AVISPA [60].

FDR [47] is not specifically a protocol verifier, but rather a refinement and equivalence checker for processes written using the Communicating Sequential Processes (CSP) language [50]. CSP can be used to illustrate processes that capture secure channel protocols, and security queries can be illustrated as refinements or properties resulting from these processes. In that sense, FDR can act as a protocol verifier. In 2014, an RFID authentication protocol was formally verified using FDR [69].

A performance analysis of symbolic formal verification tools by Lafourcade and Puys [56], conducted in 2015, as well as a preceding study by Cremers and Lafourcade in 2011 [36], found mixed results, with ProVerif coming out on top more often than not. ProVerif and Tamarin appear to be the current titans of the symbolic verification space, and they tend to complement each other due to diverging design decisions. For example, ProVerif does not require human assistance for verification, but sometimes may not terminate and may also sometimes find false attacks (although it is proven not to miss attacks). Tamarin, on the other hand, claims to always yield a proof or an attack, but may require human assistance, which makes it less suited for fully automated analysis; in some cases, fully automated analysis can be necessary to achieve certain research goals [55].

1.4 Formal Verification Paradigms

Verifpal, as well as all of the tools cited above, analyzes protocols in the symbolic model. There are other methodologies with which to formally verify protocols, including the computational model or, for example, the use of SMT solvers. We choose the symbolic model as the focus of our research due to its academic success record in verifying contemporary protocols and due to its propensity for fully automated analysis. It should be noted, however, that more precise analysis can often be achieved using the aforementioned alternative methodologies. Cryptographers, on the other hand, prefer to use computational models and do their proofs by hand. A full comparison between these styles [20] is beyond the scope of this work; here we briefly outline their differences in terms of the tools currently used in the field.

¹ Not to be confused with the bug/flying-type Pokémon of the same name, which, despite its “ninja-like agility and speed” [62], does not appear to have published work in formal verification.

ProVerif, Tamarin, AVISPA and other tools analyze symbolic protocol models, whereas tools such as CryptoVerif [19] verify computational models. The input languages for both types of tools can be similar. However, in the symbolic model, messages are modeled as abstract terms. Processes can generate new nonces and keys, which are treated as atomic opaque terms that are fresh and unguessable. Functions map terms to terms. For example, encryption constructs a complex term from its arguments (key and plaintext) that can only be deconstructed by decryption (with the same key). In ProVerif, for example, the attacker is an arbitrary process running in parallel with the protocol, which can read and write messages on public channels and can manipulate them symbolically.

In the computational model, messages are concrete bit-strings. Freshly generated nonces and keys are randomly sampled bit-strings that the attacker can guess with some probability (depending on their length). Encryption and decryption are functions on bit-strings to which we may associate standard cryptographic assumptions such as IND-CCA. The attacker is a probabilistic polynomial-time process running in parallel. The analysis techniques employed by the two types of tools are quite different.
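As a rough illustration of the symbolic model described above, the following Python sketch represents messages as abstract terms and lets an attacker deconstruct an encryption only when the matching key is already known. The tuple encoding and the names enc and attacker_closure are assumptions made for this illustration; they are not taken from Verifpal or ProVerif.

```python
# Symbolic terms: atoms are strings, compound terms are tuples such as
# ("ENC", key, plaintext). No real cryptography is performed.

def enc(key, plaintext):
    """Build the abstract term ENC(key, plaintext)."""
    return ("ENC", key, plaintext)


def attacker_closure(observed):
    """Saturate the attacker's knowledge under one rule:
    from ENC(k, m) and k, learn m."""
    known = set(observed)
    changed = True
    while changed:
        changed = False
        for term in list(known):
            is_enc = isinstance(term, tuple) and term[0] == "ENC"
            if is_enc and term[1] in known and term[2] not in known:
                known.add(term[2])
                changed = True
    return known


# Hypothetical network observations: k1 is never revealed, k2 and k3 are.
c1 = enc("k1", "m1")
c2 = enc("k2", enc("k3", "m2"))
leaked = attacker_closure({c1, c2, "k2", "k3"})
```

In this sketch m2 becomes known after two deconstruction steps, while m1 stays out of reach because k1 is an opaque atom that the attacker cannot guess.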
Symbolic verifiers search for a protocol trace that violates the security goal, whereas computational model verification tries to construct a cryptographic proof that the protocol is equivalent (with high probability) to a trivially secure protocol. Symbolic verifiers are easy to automate, while computational model tools, such as CryptoVerif, are semi-automated: they can search for proofs but require human guidance for non-trivial protocols. Queries can also be modeled similarly in symbolic and computational models, as correspondences between events, but analysis differs: in symbolic analysis, we typically ask whether the attacker can derive a secret, whereas in the computational model, we ask whether it can distinguish a secret from a random bit-string.

Recently, the F* programming language [65], which exports type definitions to the Z3 theorem prover [40], has been used to produce implementations of TLS [65] and Signal that are formally verified for functional correctness at the level of the implementation itself [64].

1.5 Contributions

We present the following contributions:

– In Sect. 1, we introduce Verifpal and provide a comparison against existing automated verification tools in the symbolic model, as well as a recap of the current state of the art.
– In Sect. 2, we introduce the Verifpal modeling language and provide some justifications for the language’s design choices as well as examples.
– In Sect. 3, we discuss Verifpal’s protocol analysis logic and whether we can be certain that Verifpal will not miss an attack on a protocol model. We also show that Verifpal can find attacks on sophisticated protocols, matching results previously obtained in ProVerif, and demonstrate Verifpal’s improved protocol analysis trace output which makes discovered attacks easier to discern for the user.
– In Sect. 4, we provide the first formal model of the DP-3T decentralized pandemic-tracing protocol [68], written in Verifpal, with queries and results on unlinkability, freshness, confidentiality and message authentication.
– In Sect. 5, we introduce Verifpal’s Coq compatibility layer. We show how Verifpal’s semantics and verification logic (for the passive attacker only) are captured in the Coq theorem prover, as well as how Verifpal can translate arbitrary Verifpal models into Coq and ProVerif for further analysis.
– In Sect. 6, we conclude with a discussion of future work.

Verifpal is available as free and open source software at https://verifpal.com. In addition, Verifpal provides a Visual Studio Code extension that enables it to function as an IDE for the modeling, analysis and verification of cryptographic protocols (Fig. 2).

Fig. 2. A complete example Verifpal model of a simple protocol is shown on the left.

2 The Verifpal Language

Verifpal’s language is meant to be simple while allowing the user to capture comprehensive protocols. We posit that an intuitive language that reads similarly to regular descriptions of secure channel protocols will provide a valuable asset in terms of modeling cryptographic protocols, and we design Verifpal’s language around that assertion. This is radically different from how the languages of tools such as ProVerif and Tamarin are designed: the former is derived from the applied pi-calculus and the latter from a formalism of state transitions, making it reasonable to say that readability and intuitiveness were not the primary goals of these languages.

When describing a protocol in Verifpal, we begin by defining whether the model will be analyzed under a passive or active attacker. Then, we define the principals engaging in activity other than the attacker. These could be Alice and Bob, a Server and one or more Clients, etc. Once we have described the actions of more than one principal, it’s time to illustrate the messages being sent across the network. Then, after having illustrated the principals’ actions and their messages, we may finally describe the questions, or queries, that we will ask Verifpal (can a passive attacker read the first message that Alice sent to Bob? Can Alice be impersonated by an active attacker?).

2.1 Primitives in Verifpal

In Verifpal, cryptographic primitives are essentially “perfect”. That is to say, hash functions are perfect one-way functions, and not susceptible to something like length extension attacks. It is also not possible to model, say, encryption primitives that use 40-bit keys, which could be guessed easily, since encryption functions are perfect pseudo-random permutations, and so on.

Internally in Verifpal’s standard implementation, all primitives are defined using a common spec called PrimitiveSpec, which restricts how they can be expressed to a set of common rules. Aside from information such as the primitive’s name, arity and number of outputs, each PrimitiveSpec defines a primitive solely via a combination of four standard rules:

– Decompose. Given a primitive’s output and a defined subset of its inputs, reveal one of its inputs. (Given ENC(k, m) and k, reveal m.)
– Recompose. Given a subset of a primitive’s outputs, reveal one of its inputs. (Given a, b, reveal x if a, b, _ = SHAMIR_SPLIT(x).)
– Rewrite. Given a matching defined pattern within a primitive’s inputs, rewrite the primitive expression itself into a logical subset of its inputs. (Given DEC(k, ENC(k, m)), rewrite the entire expression DEC(k, ENC(k, m)) to m.)
– Rebuild. Given a primitive whose inputs are all the outputs of some same other primitive, rewrite the primitive expression itself into a logical subset of its inputs. (Given SHAMIR_JOIN(a, b) where a, b, c = SHAMIR_SPLIT(x), rewrite the entire expression SHAMIR_JOIN(a, b) to x.)

If analyzing under a passive attacker, then Verifpal will only execute the model once. Therefore, if a checked primitive fails, the entire verification procedure will abort. Under an active attacker, however, Verifpal is forced to execute the model once over for every possible permutation of the inputs that can be affected by the attacker. Therefore, a failed checked primitive may not abort all executions, and messages obtained before the failure of the checked primitive are still valid for analysis, perhaps even in future sessions.

Messages, Guarded Constants, Checked Primitives and Phases. Sending messages over the network is simple. Only constants may be sent within messages:

Example: Messages
    Alice → Bob: ga, e1
    Bob → Alice: [gb], e2

In the first line of the above, Alice is the sender and Bob is the recipient. Notice how Alice is sending Bob her long-term public key ga = G^a. An active attacker could intercept ga and replace it with a value that they control. But what if we want to model our protocol such that Alice has pre-authenticated Bob’s public key gb = G^b? This is where guarded constants become useful. In the second message from the above example, we see that gb is surrounded by brackets ([]). This makes it a “guarded” constant, meaning that while an active attacker can still read it, they cannot tamper with it. In that sense it is “guarded” against the active attacker.

In Verifpal, ASSERT, SPLIT, AEAD_DEC, SIGNVERIF and RINGSIGNVERIF are “checkable” primitives: if we add a question mark (?) after one of these primitives, then the model execution will abort should AEAD_DEC fail authenticated decryption, or should ASSERT fail to find its two provided inputs equal, or should SIGNVERIF fail to verify the signature against the provided message and public key. For example: SIGNVERIF(k, m, s)? makes this instantiation of SIGNVERIF a “checked” primitive.

Phases allow Verifpal to express notions of temporal logic, which allow for reliable modeling of post-compromise security properties such as forward secrecy or future secrecy. When modeling with an active attacker, a new phase can be declared:

Example: Phases
    Bob → Alice: b1
    phase[1]
    principal Alice[leaks a2]

In the above example, the attacker won’t be able to learn a2 until the execution of everything that occurred in phase 0 (the initial phase of any model) is concluded. Furthermore, the attacker can only manipulate a2 within the confines of the phases in which it is communicated. That is to say, the attacker will have knowledge of b1 when doing analysis in phase 1, but won’t be able to manipulate b1 in phase 1. The attacker won’t have knowledge of a2 during phase 0, but will be able to manipulate b1 in phase 0. Phases are useful to model scenarios where, for example, the attacker manages to steal Alice’s keys strictly after a protocol has been executed, allowing the attacker to use their knowledge of that key material, but only outside of actually injecting it into a running protocol session.
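The phase rules just described can be summarized in a small Python sketch. It is only an illustration under stated assumptions: the communicated table and the two helper names are hypothetical bookkeeping invented for this example, not structures from Verifpal itself.

```python
# Phases in which each value is communicated or leaked, mirroring the
# "Example: Phases" listing (assumed bookkeeping for this sketch).
communicated = {
    "b1": {0},  # Bob sends b1 to Alice during phase 0
    "a2": {1},  # Alice leaks a2 during phase 1
}


def attacker_knows(value, phase):
    """A value is known from the earliest phase in which it is communicated."""
    return min(communicated[value]) <= phase


def attacker_can_mutate(value, phase):
    """A value can only be tampered with in a phase where it is communicated."""
    return phase in communicated[value]
```

Under these assumptions, attacker_knows("b1", 1) is true while attacker_can_mutate("b1", 1) is false, which is the distinction the text draws between knowing a value in a later phase and injecting it into a running session.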

Values are learned at the earliest phase in which they are communicated, and can only be manipulated within phases in which they are communicated, which can be more than one phase since Alice can, for example, send a2 later to Carol, to Damian, etc. Importantly, values derived from mutations of b1 in phase 0 cannot be used to construct new values in phase 1.

2.2 Queries

Here are examples of three different types of queries:

Simple Example Protocol: Queries
    queries[
        confidentiality? m1
        authentication? Bob → Alice: e1
        unlinkability? ga, m1
    ]

The above example is drawn from Verifpal’s current four query types:

– Confidentiality Queries: Confidentiality queries are the most basic of all Verifpal queries. We ask: “can the attacker obtain m1?”, where m1 is a sensitive message. If the answer is yes, then the attacker was able to obtain the message, despite it being presumably encrypted. When used in conjunction with phases, confidentiality queries can also be used to model advanced security properties such as forward secrecy.
– Authentication Queries: Authentication queries rely heavily on Verifpal’s notion of “checked” or “checkable” primitives. Intuitively, the goal of authentication queries is to ask whether Bob will rely on some value e1 in an important protocol operation (such as signature verification or authenticated decryption) if and only if he received that value from Alice. If Bob is successful in using e1 for signature verification or a similar operation without it having been necessarily sent by Alice, then authentication is violated for e1, and the attacker was able to impersonate Alice in communicating that value.
– Freshness Queries: Freshness queries are useful for detecting replay attacks, where an attacker could manipulate one message to make it seem valid in two different contexts. In passive attacker mode, a freshness query will check whether a value is “fresh” between sessions (i.e. if it has at least one composing element that is generated, non-static). In active attacker mode, it will check whether a value can be rendered “non-fresh” (i.e. static between sessions) and subsequently successfully used between sessions.
– Unlinkability Queries: Protocols such as DP-3T (see Sect. 4), voting protocols and RFID-based protocols posit an “unlinkability” security property on some of their components or processes. Definitions for unlinkability vary wildly despite the best efforts of researchers [6,49,67], but in Verifpal, we adopt the following definition: “for two observed values, the adversary cannot distinguish between a protocol execution in which they belong to the same user and a protocol execution in which they belong to two different users.”
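Two of the query types above lend themselves to a very small illustration. The Python sketch below checks confidentiality as membership in the attacker's knowledge, and passive-mode freshness as the presence of at least one generated atom inside a term. The tuple encoding of terms and both function names are assumptions of this sketch; Verifpal's actual checks are more involved.

```python
def confidentiality_holds(value, attacker_knowledge):
    """The query is contradicted as soon as the attacker obtains the value."""
    return value not in attacker_knowledge


def is_fresh(term, generated):
    """Passive-mode freshness: a term is fresh if at least one atom it is
    composed of was generated (and is therefore different in every session).
    Compound terms are tuples of the form (NAME, arg1, arg2, ...)."""
    if isinstance(term, tuple):
        return any(is_fresh(arg, generated) for arg in term[1:])
    return term in generated


# Hypothetical values: n is generated per session, k is a static key.
generated_atoms = {"n"}
fresh_tag = ("HASH", "k", "n")
static_tag = ("HASH", "k")
```

Here fresh_tag contains the generated atom n and so counts as fresh, while static_tag is identical in every session and could be replayed.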

2.3 Query Options

Imagine that we want to check whether Alice will only send some message to Carol if she has first authenticated it as coming from Bob. This can be accomplished by adding the precondition option to the authentication query for e:

Query Options Example
    queries[
        authentication? Bob → Alice: e[
            precondition[Alice → Carol: m2]
        ]
    ]

The above query essentially expresses: “The event of Carol receiving m2 from Alice shall only occur if Alice has previously received and authenticated an encryption of m2 as coming from Bob.”
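One way to read this precondition is as an ordering constraint over the events of a protocol run. The following Python sketch checks that constraint on a list of events; the event tuples and the function name are hypothetical and only serve to illustrate the idea, not how Verifpal evaluates the query.

```python
def precondition_respected(trace, guarded_send, required_auth):
    """Return True if every occurrence of guarded_send in the trace is
    preceded by at least one occurrence of required_auth."""
    authenticated = False
    for event in trace:
        if event == required_auth:
            authenticated = True
        elif event == guarded_send and not authenticated:
            return False
    return True


# Hypothetical events for the query above.
auth_e = ("authenticated", "Alice", "Bob", "e")
send_m2 = ("sent", "Alice", "Carol", "m2")

honest_run = [auth_e, send_m2]
attacked_run = [send_m2]  # Alice forwards m2 without authenticating e first
```

In this reading, honest_run satisfies the query while attacked_run contradicts it.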

3 Analysis in Verifpal

Verifpal’s active attacker analysis methodology follows a simple set of procedures and algorithms. The overall process comprises five steps (see Fig. 3 for an illustration):

1. Gather values. Attacker passively observes a protocol execution and gathers all values shared publicly between principals.
2. Insert learned values into attacker state. Attacker’s state (V_A) obtains newly learned values.
3. Apply transformations. Attacker applies the four transformations (detailed below) on all obtained values.
4. Prepare mutations for next session. If the attacker has learned new values due to the transformations executed in the previous step, they create a combinatorial table of all possible value substitutions, and from that, derive a set of all possible value substitutions across future executions of the protocol on the network.
5. Iterate across protocol mutations. Attacker proceeds to execute the protocol across sessions, each time “mutating” the execution by man-in-the-middling a value. Attacker then returns to step 1 of this list.

The process continues so long as the attacker keeps learning new values. After each step, Verifpal checks to see if it has found a contradiction to any of the queries specified in the model and informs the user if such a contradiction is found.

Fig. 3. Verifpal analysis methodology. On the left, the three fundamental types usable in Verifpal models are illustrated. As noted in Sect. 2.1, all primitives are defined via a standard PrimitiveSpec structure with four logical rules. On the right, a model analysis is illustrated: first, the Verifpal model is parsed and translated into a global immutable “knowledge map” structure from which a “principal state” is derived for each declared principal. Based on the messages exchanged between these principal states, the attacker obtains values to which it can recursively apply the four transformations discussed in Sect. 3 before executing mutated sessions while still following the heuristics touched upon in Sect. 3.1, until it is unable to learn new values.

The four main transformations mentioned above are the following:

– Resolve. Resolves a certain constant to its assigned value (for example, a primitive or an equation). Executed on V_A, the set of all values known by the attacker.
– Deconstruct. Attempts to deconstruct a primitive or an equation. In order to deconstruct a primitive, the attacker must possess sufficient values to satisfy the primitive’s rewrite rule. For example, the attacker must possess k and e in order to obtain m by deconstructing e = ENC(k, m) with k. In order to deconstruct an equation, the attacker must similarly possess all but one private exponent. Executed on V_A, the set of all values known by the attacker.
– Reconstruct. Attempts to reconstruct primitives and equations given that the attacker possesses all of the component values. Executed on V_A, the set of all values known by the attacker, as well as on V_P, the values known by the principal whose state is currently being evaluated by the attacker.
– Equivalize. Determines if the attacker can reconstruct or equivalize any values within V_P from V_A. If so, then these equivalent values are added to V_A.

Verifpal’s goal is to obtain as many values as is logically possible from its viewpoint as an attacker on the network. As a passive attacker, Verifpal can only do this by deconstructing the values made available as they are shared between principals, and potentially reconstructing them into different values. As an active attacker, Verifpal can modify unguarded constants as they cross the network. Each modification could result in learning new values, so an unbounded number of modifications can occur over an unbounded number of protocol executions.
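The five steps and the "keep going while something new is learned" behaviour described above can be sketched roughly as follows. This is a heavily simplified Python illustration and not Verifpal's implementation: the toy protocol, the single PKE_ENC deconstruction rule, and the names run_session, deconstruct and analyze are all assumptions made for the example.

```python
def run_session(substitute=None):
    """One protocol run: Alice sends her public key unguarded; Bob encrypts
    the secret m to whichever key arrives (the attacker's substitute, if any)."""
    alice_pk = ("PUB", "a")
    received = alice_pk if substitute is None else substitute
    return {alice_pk, ("PKE_ENC", received, "m")}


def deconstruct(known):
    """Single illustrative rule: from PKE_ENC(PUB(sk), m) and sk, learn m."""
    learned = set()
    for term in known:
        if isinstance(term, tuple) and term[0] == "PKE_ENC":
            public_key, plaintext = term[1], term[2]
            if public_key[0] == "PUB" and public_key[1] in known:
                learned.add(plaintext)
    return learned - known


def analyze(secret, active=True, max_rounds=10):
    """Return True if the attacker obtains `secret` (query contradicted)."""
    known = {"x"}        # private key generated by the attacker
    pending = [None]     # None stands for an unmodified session
    tried = set()
    for _ in range(max_rounds):
        new_values = set()
        for mutation in pending:              # step 5: run (mutated) sessions
            if mutation in tried:
                continue
            tried.add(mutation)
            new_values |= run_session(mutation) - known  # step 1: gather
        known |= new_values                   # step 2: update attacker state
        learned = deconstruct(known)          # step 3: apply transformations
        while learned:
            known |= learned
            new_values |= learned
            learned = deconstruct(known)
        if secret in known:
            return True
        if not new_values:                    # nothing new was learned: stop
            return False
        # step 4: prepare substitutions built from atoms the attacker holds
        pending = [("PUB", v) for v in known if isinstance(v, str)] if active else []
    return False
```

In this toy setting a passive attacker never learns m, whereas an active attacker learns it in the second session by substituting a public key whose private exponent it generated itself. Marking the key as guarded would correspond to run_session ignoring the substitute.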

“Fresh” (i.e. generated) values are not kept across different protocol executions, as they are assumed to be different for every session of the protocol. An active attacker can also generate their own values, such as a key pair that they control, and fabricate new values that they use as substitutes for any unguarded constants sent between principals. If, during a protocol execution, a checked primitive fails, that session execution is aborted and the attacker moves on to the next one. However, values obtained thus far in that particular session execution are kept.

3.1 Preventing State Space Explosion

A common problem among symbolic model protocol verifiers is that for complex protocols, the space of the user states and value combinations that the verifier must assess becomes too large for the verifier to terminate in a reasonable time. Verifpal mitigates this problem via certain heuristic techniques. First, Verifpal separates its analysis into a number of stages in which it gradually allows itself to modify more and more elements of principals’ states. Only in later stages are the internal values of certain primitives (which are labeled “explosive” in their PrimitiveSpec) mutated. Verifpal also imposes other restrictions, such as limiting the maximum number of inputs and outputs of any primitive to five.

Thus, Verifpal achieves unbounded state analysis, similarly to ProVerif, but also applies a set of heuristics that are hopefully more likely to achieve termination in a more reasonable time for large models (such as those seen for TLS 1.3 or Signal with more than three messages). Verifpal also leverages multi-threading and other such techniques to achieve faster analysis.

Verifpal’s stages segment its search strategy in essentially the following way, with the aim of holding back infinite mutation recursion depth as far as possible, unless queries cannot be contradicted without it:

– Stage 1: All of the elements of passive attacker analysis, plus constants and equation exponents may be mutated to nil only and not to each other (for equations, this means that g^a mutates to g^nil but not to g^b).
– Stage 2: All of the elements of Stage 1, plus non-explosive primitives are mutated but without exceeding a call depth that is pre-determined in relation to the way in which they were employed by principals in the Verifpal model. For example, HASH(HASH(x)) will not mutate to HASH(HASH(HASH(y))) (since the call depth is deeper in the mutation), and ENC(HASH(k), G^y) will not mutate to ENC(PW_HASH(k), k) (since the “skeleton” of the original primitive does not employ PW_HASH, but HASH, and employs an equation (G^y) as the second argument and not a constant (k)).
– Stage 3: All of the elements of Stage 2, with the inclusion of explosive primitives.
– Stage 4: All of the elements of Stage 3, with the addition of constants and equation exponents being replaced with one another and not just nil.
– Stage 4 and beyond: All of the elements of Stage 3, with the addition of primitives being allowed a mutation depth of n − 3 where n represents the current Stage, so long as the resulting mutations have the same “skeleton” as defined in Stage 2.

3.2 Analysis Results of Real-World Protocols

It is important to understand that the measures Verifpal takes to encourage analysis termination, as touched upon earlier in Sect. 3.1, do not affect the comprehensiveness of results that Verifpal can obtain from the analysis of real-world protocols. Verifpal ships with an integration testing suite comprising 54 test protocols. Of these, we highlight the following non-trivial protocols which have also been modeled in other symbolic analysis tools:

– Signal is modeled in Verifpal as well as in ProVerif [54] and in Tamarin [29]. All three analyses obtain matching results when checking for message confidentiality, authentication and post-compromise security. Post-compromise security is modeled using temporal logic (see Sect. 2.1) in all three analysis frameworks.
– Scuttlebutt is modeled in Verifpal as well as in ProVerif [57], CryptoVerif and in Tamarin [35]. All analyses obtain matching results when checking for message confidentiality and authentication queries.
– Verifpal also obtained results matching state-of-the-art analysis on the Telegram MTProto “secure chat” protocol [52,58], Firefox Sync and ProtonMail’s email encryption feature towards recipients that do not use ProtonMail [53].

Queries were contradicted (or not contradicted) in the same scenarios across Verifpal, ProVerif and Tamarin, depending on which key materials were leaked, and when. Various forms of partial state leakage were tested. Aside from the above relatively sophisticated protocols, Verifpal obtained matching results on many variants of Needham-Schroeder, the “FFGG” “parallel attack” protocol discussed in Sect. 3.3, and over 50 other test protocols, some of which are mirrored in ProVerif’s own test suite.

Finally, Verifpal was used by the team behind the popular Zoom telecommunications software in May 2020 during the entire conception and design process of their revised [24] end-to-end encryption protocol. During this collaboration, Verifpal not only helped the Zoom team design their protocol from scratch but also spotted non-obvious attacks which the Zoom team were able to fix prior to publication.

3.3 Improving Readability of Protocol Analysis Traces

ProVerif and Verifpal both ship with a set of example protocol models. Of those protocol models, the “FFGG” protocol [61] is included due to it requiring a parallel attack in order for confidentiality queries to be contradicted. We take Verifpal and ProVerif’s models of FFGG and modify them to be as functionally and structurally similar as possible.²

² The full Verifpal and ProVerif FFGG models are available at https://source.symbolic.software/verifpal/verifpal/-/tree/master/examples.

Fig. 4. ProVerif and Verifpal attack traces for the protocol discussed in Sect. 3.3.

Figure 4 shows a ProVerif trace compared to a Verifpal trace for the confidentiality query contradiction on message m in FFGG.³ The Verifpal trace shows the two parallel sessions required for the attack to be pulled off, clearly noting which values had to be mutated by the attacker alongside their original resolved values. Each session ends with the message that had to be obtained by the attacker and re-used in the following session for the attack to work. In this case, msg2, which resolved to PKE_ENC(G^skb, CONCAT(n1, m, n1)), was injected by the active attacker to replace msg in the second session as it traveled across the network.

As discussed in Sect. 1.2, one of Verifpal’s design principles is to improve the readability of protocol analysis traces. In line with this goal, Verifpal’s trace also makes it easier to see how the mutation of preceding values affects the resolution of values that are composed of those mutated values. In longer, more complex protocol models, Verifpal is able to still output traces of relatively similar size and simplicity, whereas the growth of complexity and length in ProVerif traces is more substantial.

4 Case Study: Contact Tracing

During the COVID-19 pandemic, a rise was observed in the number of proposals for privacy-preserving pandemic and contact tracing protocols. Arguably the most popular and well-analyzed of these proposals is the Decentralized Privacy-Preserving Proximity Tracing (DP-3T) protocol [68], which aims to “simplify and accelerate the process of identifying people who have been in contact with an infected person, thus providing a technological foundation to help slow the spread of the SARS-CoV-2 virus”, and to “minimize privacy and security risks for individuals and communities and guarantee the highest level of data protection.”

³ ProVerif is also capable of outputting graphical representations of attack traces, which could make them easier to read and understand.

4.1 Modeling DP-3T in Verifpal

To demonstrate DP-3T, we will assume that the principals participating in this simulation are the following:

– A population of 3 individuals: Alice, Bob, and Charlie, each of them possessing a smartphone: SmartphoneA, SmartphoneB, and SmartphoneC respectively;
– A Healthcare Authority serving this population;
– A Backend Server, which individuals can communicate with to obtain daily information.

We begin by defining an attacker that matches our security model, which, in this case, is an active attacker. We then proceed to illustrate our model as a sequence of days in which DP-3T is in operation within the life cycle of a pandemic.

Day 0: Setup Phase. We assume that no new individuals were diagnosed with the disease on Day 0 of using DP-3T. This means that the Healthcare Authority and the Backend Server will not act at this stage and we can simply ignore them for now. The DP-3T specification states that every principal, when first joining the system, should generate a random secret key (SK) to be used for one day only. From every SK value and a public “broadcast key” value, principals should compute multiple Unique Ephemeral ID values (EphID) using a combination of a PRG and a PRF. The method of generating EphIDs is analogous to the HKDF function in Verifpal. We could add the following lines of code to our file in order to model Alice’s SmartphoneA:

DP-3T: SmartphoneA, B and C Setup
    principal SmartphoneA[
        knows public BroadcastKey
        generates SK0A
        EphID00A, EphID01A, EphID02A = HKDF(nil, SK0A, BroadcastKey)
    ]

Whenever two principals come into physical proximity of each other, they automatically exchange EphIDs. Once a principal uses an EphID value, they discard it and use another one when performing an exchange with another principal. Let’s imagine that Alice and Bob came into contact. This means that Alice sent EphID00A in a message to Bob and that Bob sent EphID00B to Alice. Further, let’s say that at the end of Day 0, Bob sits behind Charlie on the bus.

Day 1. The Backend Server will automatically publish the SK values of people who were infected to the members of the general population. These values were previously unpublished and were thus private, known only to their generators and the server.


DP-3T: BackendServer Communication

    principal BackendServer[
        knows private infectedPatients0
    ]
    BackendServer → SmartphoneA: infectedPatients0
    // Also to SmartphoneB/C

Every day starting from Day 1, DP-3T mandates that principals generate new SK values. The new value is equal to the hash of the SK value from the day before. Principals also generate EphIDs just like before.

DP-3T: EphID Generation

    principal SmartphoneA[
        SK1A = HASH(SK0A)
        EphID10A, EphID11A, EphID12A = HKDF(nil, SK1A, BroadcastKey)
    ]
    // Similar principal blocks for SmartphoneB/C here

Thankfully, Alice, Bob and Charlie were committed to self-confinement and have stayed at home, so they did not exchange EphIDs with anyone.

Day 2. A similar sequence of events takes place. Since it is sufficient to define the values that we will need later on in our model, we will just define a block for Alice.

DP-3T: EphID Generation

    principal SmartphoneA[
        SK2A = HASH(SK1A)
        EphID20A, EphID21A, EphID22A = HKDF(nil, SK2A, BroadcastKey)
    ]
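The daily key rotation described above is a one-way hash chain. The sketch below illustrates it in Python, assuming SHA-256 as the hash (the model only says HASH) and using a fixed seed so the example is reproducible; `next_day_key` and `key_chain` are hypothetical helper names.

```python
import hashlib
from typing import List


def next_day_key(sk: bytes) -> bytes:
    """SK for day t+1 is the hash of SK for day t."""
    return hashlib.sha256(sk).digest()


def key_chain(sk0: bytes, days: int) -> List[bytes]:
    """Return [SK0, SK1, ..., SK_days] by repeated hashing."""
    keys = [sk0]
    for _ in range(days):
        keys.append(next_day_key(keys[-1]))
    return keys


# Fixed seed for illustration only; a real SK0 would be random.
chain = key_chain(b"\x00" * 32, 2)  # SK0A, SK1A, SK2A
```

Anyone who learns SK1A can walk the chain forward to SK2A, but going backward to SK0A would require inverting the hash. This is why values derived from SK0A can stay confidential once SK1A is published.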

Fast-Forward to Day 15. Unfortunately, Alice tests positive for COVID-19. Since this breaks the routine that happened between Day 1 and Day 15, we will announce a new phase (see Sect. 2.1) in our protocol model:

DP-3T: Declaring a New Phase

    phase[1]

Alice decides to announce her infection anonymously using DP-3T. This means that she will have to securely communicate SK1A (her SK value from 14 days ago) to the Backend Server, using a unique trigger token provided by the Healthcare Authority. Assuming that the Backend Server and the Healthcare Authority share a secure connection, and that a key ephemeral_sk has been exchanged off the wire by the Healthcare Authority, Alice, and the Backend Server, the Healthcare Authority will encrypt a freshly generated triggerToken using ephemeral_sk and send it to both Alice and the Backend Server.

DP-3T: Sending Tokens to HealthCareAuthority

    principal HealthCareAuthority[
        generates triggerToken
        knows private ephemeral_sk
        m1 = ENC(ephemeral_sk, triggerToken)
    ]
    HealthCareAuthority → BackendServer: [m1]
    HealthCareAuthority → SmartphoneA: m1

Then, Alice would have to use an AEAD cipher to encrypt SK1A, using ephemeral_sk as the key and triggerToken as additional data, and send the output to the BackendServer. Note that Alice can only obtain triggerToken after decrypting m1 using ephemeral_sk.

DP-3T: Communicating with BackendServer

    principal SmartphoneA[
        knows private ephemeral_sk
        m1_dec = DEC(ephemeral_sk, m1)
        m2 = AEAD_ENC(ephemeral_sk, SK1A, m1_dec)
    ]
    SmartphoneA → BackendServer: m2
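The token exchange above can be traced with a small symbolic model, in the spirit of how a symbolic verifier treats encryption as perfect. The following Python sketch is only an illustration: terms are nested tuples, atoms are strings, and the functions `enc`, `dec`, `aead_enc` and `aead_dec` are hypothetical stand-ins for the Verifpal primitives, not Verifpal's implementation.

```python
def enc(key, plaintext):
    """Symbolic encryption: just record the key and the plaintext."""
    return ("ENC", key, plaintext)


def dec(key, ciphertext):
    """Return the plaintext only if the same key is supplied, else None."""
    if isinstance(ciphertext, tuple) and ciphertext[0] == "ENC" and ciphertext[1] == key:
        return ciphertext[2]
    return None


def aead_enc(key, plaintext, ad):
    """Symbolic AEAD: the additional data is bound to the ciphertext."""
    return ("AEAD_ENC", key, plaintext, ad)


def aead_dec(key, ciphertext, ad):
    """Decryption succeeds only with the same key and the same additional data."""
    if (isinstance(ciphertext, tuple) and ciphertext[0] == "AEAD_ENC"
            and ciphertext[1] == key and ciphertext[3] == ad):
        return ciphertext[2]
    return None


# Healthcare Authority: encrypt a fresh trigger token.
m1 = enc("ephemeral_sk", "triggerToken")
# SmartphoneA: recover the token, then bind it to SK1A as additional data.
m1_dec = dec("ephemeral_sk", m1)
m2 = aead_enc("ephemeral_sk", "SK1A", m1_dec)
# BackendServer: decrypt m2 using the token it received in m1.
recovered = aead_dec("ephemeral_sk", m2, dec("ephemeral_sk", m1))
```

In this model the Backend Server recovers SK1A only when it uses both the shared key and the matching trigger token; a mismatched token or key yields None.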

4.2 DP-3T Analysis Results

Since SK1A is now shared publicly, the DP-3T software running on anyone’s phone should be able to re-generate all EphID values generated by the owner of SK1A starting from 14 days prior to the day of diagnosis. These values would then be compared with the list of EphIDs that the phone has received. Everyone who came in contact with Alice will therefore be notified that they have exchanged EphIDs with someone who has been diagnosed with the illness, without the identity of that person being revealed.

DP-3T: Queries

    queries[
        // Check if values shared 15 days before testing get flagged
        confidentiality? EphID02A
        // Check if Alice’s previous EphIDs can be computed by passersby
        confidentiality? EphID10A, EphID11A, EphID12A, EphID20A, EphID21A, EphID22A
        // Is the server able to authenticate Alice as the sender of m2?
        authentication? SmartphoneA → BackendServer: m2
        // Unlinkability of HKDF values
        unlinkability? EphID02A, EphID00A, EphID01A
    ]
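The re-generation and comparison step described at the start of this subsection can be sketched as follows. This is a simplified illustration, not the DP-3T client logic: EphIDs are derived with a plain HMAC-SHA256 call instead of the PRF/PRG combination, the 14-day window and three EphIDs per day follow the example in the text, and all function names are hypothetical.

```python
import hashlib
import hmac


def next_day_key(sk):
    """Daily rotation: the next SK is the hash of the current one."""
    return hashlib.sha256(sk).digest()


def ephids_for_day(sk, broadcast_key, count=3):
    """Simplified EphID derivation (assumption: HMAC-SHA256, 16-byte IDs)."""
    return [hmac.new(sk, broadcast_key + bytes([i]), hashlib.sha256).digest()[:16]
            for i in range(count)]


def exposed_ephids(published_sk, broadcast_key, days):
    """Regenerate every EphID from the published key's day onwards."""
    out = set()
    sk = published_sk
    for _ in range(days):
        out.update(ephids_for_day(sk, broadcast_key))
        sk = next_day_key(sk)
    return out


def was_exposed(observed, published_sk, broadcast_key, days):
    """True if any locally observed EphID belongs to the published key chain."""
    return bool(set(observed) & exposed_ephids(published_sk, broadcast_key, days))


BROADCAST_KEY = b"broadcast key"
sk0_a = b"\x11" * 32            # fixed value for illustration only
sk1_a = next_day_key(sk0_a)     # the key Alice publishes
sk2_a = next_day_key(sk1_a)

day0_contact = [ephids_for_day(sk0_a, BROADCAST_KEY)[0]]  # EphID00A, before the window
day2_contact = [ephids_for_day(sk2_a, BROADCAST_KEY)[1]]  # EphID21A, inside the window
```

A phone that stored EphID21A finds a match after SK1A is published, while a phone that only stored a Day 0 identifier does not, mirroring the difference between the first two confidentiality queries.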

The results of our initial modeling in Verifpal suggest the following:

– No EphIDs generated by Alice are known by any parties before Alice announces her illness.
– EphID02A remains confidential even after Alice declares her illness. Note that it was generated 15 days before Alice got tested.
– All of the values EphID10A, EphID11A, EphID12A, EphID20A, EphID21A, EphID22A are recoverable by an attacker in phase[1], after Alice announces her illness.

These results are in line with what is expected from the protocol. We note that the security of the communication channels between Healthcare Authorities, Backend Servers, and Individuals has not been defined, and that we have placed hypothetical security conditions in order to focus on quickly sketching the DP-3T protocol. While further analysis will be required in order to better elucidate the extent of the obtained security guarantees, Verifpal radically speeds up this process by allowing for the automated translation of easy-to-write Verifpal models into full-fledged Coq and ProVerif models, as discussed in Sect. 5.

5 Verifpal in Coq and ProVerif

Verifpal’s core verification logic and semantics can be captured in Coq using our Verifpal Coq Library. This library includes high-level functions that can be used to perform analysis on any valid protocol modeled using the Verifpal language. Additionally, Verifpal can automatically generate, from a protocol file, Coq code that uses the high-level functions from our library. This automates the process of translating Verifpal models into representations that can be further analysed using Coq’s powerful paradigm of constructive logic. Once executed, this Coq code yields results for the queries defined in the protocol model.

Parallel Analysis Confirmation in Coq and ProVerif. In addition to being able to output Coq implementations of Verifpal models, Verifpal is also able to translate Verifpal protocol models into ProVerif models. A similar approach is used: the generated models include a pre-defined library implementing all Verifpal primitives in the applied pi calculus. ProVerif tables are used to keep track of principal states, with principal blocks being converted to let declarations. A public channel is used to exchange values and to potentially leak them to the attacker. Finally, the top-level process is declared as a parallel execution of all principal let declarations. This latter formulation of the Verifpal model in ProVerif allows us to make use of ProVerif’s ability to model the parallel execution of processes.

By providing robust support for automatic translation of arbitrary models into Coq and ProVerif, Verifpal allows its own semantics to be defined more concretely and in relation to established verification paradigms. It also increases confidence in its own verification methodology by mirroring its results on security queries within the analysis framework of tools that have existed for decades.

Verifpal Semantics in Coq. We define several types to capture all of the primitives of the Verifpal language in Coq. For example, we have defined constant, Principal, and knowledgemap as inductive types to capture the notions of constant, principal and knowledgemap from Verifpal respectively. Whenever a principal declares, generates, assigns, leaks, or receives a message, an item of knowledge is added to their state. Suppose that Alice wants to send c to Bob, and that the latest knowledgemap contains Alice’s internal state a, ma, ka as well as Bob’s state, most relevantly a. We use send_message to send c from Alice to Bob and thereby update the knowledgemap of both principals. After the function is executed, Bob’s state contains both a and c.

All of the primitives supported by Verifpal are formally specified in our Coq library. Outputs of certain primitives are defined as sub-types of the type constant. As an illustrative example, we demonstrate a lemma that proves decidable equality between elements of type constant. This lemma essentially captures the functionality of the ASSERT core primitive. When Alice performs c = ENC(ka, ma), and then sends c over the wire, we would expect that the decryption of c yields the plaintext ma if and only if the key used to decrypt c is the same one that was used for encrypting ma, as defined in our formalization of the DEC primitive (see Sect. 2). We provide additional lemmas to prove that our model satisfies the behavior expected from our primitives. For example, we can prove that DEC(kb, ENC(ka, ma)) yields ma when kb equals ka using the enc_dec theorem (see Sect. 2).

Verifpal Analysis in Coq. Using the functionality provided by the Verifpal Coq library, and the Coq code generation feature of Verifpal, it is possible to perform a symbolic execution of any protocol that can be modeled using Verifpal. In addition, it is possible to independently check the axioms on which our primitives and analysis methodology are defined by running the included proofs, which are written in the Ltac tactics language supported by Coq. The passive attacker methodology in the Verifpal Coq Library is analogous to that defined in Sect. 3:

1. The attacker can gather values: any value leaked or declared as public is automatically added to the attacker’s list of knowledge. In addition, any value sent over the wire is known by the attacker.
2. The attacker attempts to apply transformations on the values learned. The definitions of these transformations accompany our primitive definitions and can be independently verified.
3. This process is repeated so long as the attacker is able to learn new values.

We formalize this methodology using an Attacker inductive type. An instance of type Attacker contains the attacker type, a list of constant values that are known by the attacker, as well as the mutability status for every item of knowledge. constant_meta acts as a wrapper type for constant with the purpose of adding metadata relevant to the declaration of a constant. constant_meta, along with some helper types, is defined as follows:

[Coq listing not preserved in this copy.]

constant_meta elements are stored inside the Principal data structure and constitute the principal’s knowledge.
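As an informal aid, the knowledge-tracking structures described above can be mirrored in a few lines of Python. This is not the Coq library: the field names follow the qualifier and leak-state metadata mentioned in the text, but the class layout, the default values and the behaviour of `send_message` (here it only extends the recipient's knowledge) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class ConstantMeta:
    """A constant name wrapped with declaration metadata."""
    name: str
    qualifier: str = "private"      # "public" or "private"
    leak_state: str = "not_leaked"  # "leaked" or "not_leaked"


@dataclass
class Principal:
    name: str
    knowledge: List[ConstantMeta] = field(default_factory=list)


def send_message(kmap: Dict[str, Principal], sender: str, recipient: str, constant_name: str):
    """Copy one of the sender's constants into the recipient's knowledge."""
    item = next(c for c in kmap[sender].knowledge if c.name == constant_name)
    if item not in kmap[recipient].knowledge:
        kmap[recipient].knowledge.append(item)
    return kmap


# Alice knows a, ma, ka and the ciphertext c she computed; Bob only knows a.
kmap = {
    "Alice": Principal("Alice", [ConstantMeta("a"), ConstantMeta("ma"),
                                 ConstantMeta("ka"), ConstantMeta("c")]),
    "Bob": Principal("Bob", [ConstantMeta("a")]),
}
send_message(kmap, "Alice", "Bob", "c")
```

After the call, Bob's knowledge holds a and c, matching the state update described for the Alice and Bob example.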
Any value that is transmitted over the wire is also sent as a constant_meta along with its corresponding metadata. Step 1 of the analysis methodology is modeled with the help of two functions:

– absorb_message_attacker enables an Attacker to learn any value when it is being sent over the wire.
– absorb_knowledgemap_attacker enables an Attacker to iterate over Principal elements found in the knowledgemap and their lists of constant_meta items. The attacker can learn a constant_meta that they come across only if its (l:leak_state) value equals leaked or if its (q:qualifier) equals public; otherwise the value is ignored.

At the end of phase[0] of the example protocol, the attacker would have learned the constant c because it was sent over the wire. At the end of phase[1], the attacker would have learned a in addition to c because it was leaked by Alice. In phase[1], the attacker was able to reconstruct HASH1_c a after learning a, and consequently attempted DEC (HASH1_c a) c. As discussed earlier, the DEC operation reveals the plaintext if the key provided is equivalent to the encryption key. Developing further, we obtain DEC (HASH1_c a) (ENC_c ka ma) and then DEC (HASH1_c a) (ENC (HASH1_c a) ma). The attacker then automatically applies the enc_dec lemma (shown in Sect. 2) to deduce ma and adds it to its knowledge. It is worth noting that all transformations that can be applied by the attacker, just like primitives, are accompanied by independently provable lemmas and theorems.

Verifpal queries are analogous to decidable processes and help us reason about protocols. The confidentiality query defined would translate to “is the attacker able to obtain the value ma after the protocol is executed?” To answer this, we search the attacker’s knowledge for a value that is equal to ma using the search_by_name_attacker function; if such a value is found, the query “fails”, otherwise it “passes”. In this case the query fails, as the attacker was able to obtain ma by applying the methodology from the previous section. Generating a Coq implementation of the protocol discussed will yield an identical result, and could allow the user to verify the soundness of this result by executing the proofs included in the code.
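The passive attacker loop and the confidentiality check walked through above can be sketched in Python as a fixed-point computation over symbolic terms. This is an illustrative model under simplifying assumptions (only HASH and ENC are handled, terms are tuples, atoms are strings); the function names `can_build`, `saturate` and `confidentiality` are hypothetical and do not correspond to the Coq library.

```python
def can_build(term, known):
    """True if the attacker holds the term or can compose it from known parts."""
    if term in known:
        return True
    if isinstance(term, tuple) and term[0] in ("HASH", "ENC"):
        return all(can_build(sub, known) for sub in term[1:])
    return False


def saturate(initial):
    """Repeat decryption attempts until no new value is learned."""
    known = set(initial)
    changed = True
    while changed:
        changed = False
        for term in list(known):
            if isinstance(term, tuple) and term[0] == "ENC":
                key, plaintext = term[1], term[2]
                # DEC reveals the plaintext only if the attacker can produce the key.
                if plaintext not in known and can_build(key, known):
                    known.add(plaintext)
                    changed = True
    return known


def confidentiality(name, known):
    """The query fails when the attacker's knowledge contains the value."""
    return "fail" if name in known else "pass"


ka = ("HASH", "a")      # Alice's key is the hash of a
c = ("ENC", ka, "ma")   # c = ENC(ka, ma), sent over the wire

phase0 = saturate({c})              # attacker has only seen c
phase1 = saturate(phase0 | {"a"})   # a is leaked in phase[1]
```

With only c, the attacker cannot build HASH(a) and learns nothing more. Once a is leaked, the key becomes constructible, ma is added to the attacker's knowledge, and the confidentiality query on ma fails.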

6 Discussion and Conclusion

Verifpal’s focus on prioritizing usability leads it to have no road map to support, for example, declaring custom primitives or rewrite rules as supported in ProVerif and Tamarin. However, future work focuses on giving Verifpal the fine control that tools such as ProVerif can offer over how protocol processes are executed. In addition, Verifpal has recently gained support for protocol phases and parametrized queries (useful for modeling post-compromise security), as well as for querying unlinkability and other advanced features.

Verifpal also ships with a Visual Studio Code extension that essentially turns Verifpal into an IDE for the modeling, development, testing and analysis of protocol models. The extension offers live analysis feedback and diagram visualizations of models being described, and supports translating models automatically into Coq. We also plan to launch, within the coming weeks, support for translating Verifpal models directly into prototype Go implementations, allowing for live real-world testing of described protocols.

Verifpal is also fully capable of supporting the more nuanced definitions of primitives recently seen in other symbolic verifiers. For example, recent, more precise models for signature schemes [51] in Tamarin can be fully integrated into Verifpal’s design. We also plan to add support for more primitives as these are suggested by the Verifpal user community.

We believe that Verifpal’s verification framework gives it full jurisdiction over maturing its language and feature set, such that it can grow to satisfy the fundamental verification needs of protocol developers without the barrier to entry present in tools such as ProVerif and Tamarin. Verifpal is currently available as free and open source software for Windows, Linux, macOS and FreeBSD, along with a user manual that goes into more depth on the Verifpal language and analysis methodology, at https://verifpal.com.

A Partial Extract of DP-3T Verifpal Model Automatic Coq Translation


References

1. Abadi, M., Blanchet, B., Fournet, C.: The applied pi calculus: mobile values, new names, and secure communication. J. ACM 65(1), 1:1–1:41 (2018). https://doi.org/10.1145/3127586
2. Amin, R., Islam, S.H., Karati, A., Biswas, G.: Design of an enhanced authentication protocol and its verification using AVISPA. In: 2016 3rd International Conference on Recent Advances in Information Technology (RAIT), pp. 404–409. IEEE (2016)
3. Armando, A., et al.: The AVANTSSAR platform for the automated validation of trust and security of service-oriented architectures. In: Flanagan, C., König, B. (eds.) TACAS 2012. LNCS, vol. 7214, pp. 267–282. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-28756-5_19
4. Armando, A., et al.: The AVISPA tool for the automated validation of internet security protocols and applications. In: Etessami, K., Rajamani, S.K. (eds.) CAV 2005. LNCS, vol. 3576, pp. 281–285. Springer, Heidelberg (2005). https://doi.org/10.1007/11513988_27
5. Backes, M., Hritcu, C., Maffei, M.: Automated verification of remote electronic voting protocols in the applied pi-calculus. In: IEEE Computer Security Foundations Symposium, pp. 195–209. IEEE (2008)
6. Baelde, D., Delaune, S., Moreau, S.: A method for proving unlinkability of stateful protocols. Ph.D. thesis, Irisa (2020)
7. Barbosa, M., et al.: SoK: computer-aided cryptography. In: IEEE Symposium on Security and Privacy (S&P). IEEE (2021)
8. Basin, D., Cremers, C.: Modeling and analyzing security in the presence of compromising adversaries. In: Gritzalis, D., Preneel, B., Theoharidou, M. (eds.) ESORICS 2010. LNCS, vol. 6345, pp. 340–356. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15497-3_21
9. Basin, D., Dreier, J., Hirschi, L., Radomirovic, S., Sasse, R., Stettler, V.: A formal analysis of 5G authentication. In: Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 1383–1396. ACM (2018)
10. Basin, D., Radomirovic, S., Schmid, L.: Alethea: a provably secure random sample voting protocol. In: IEEE 31st Computer Security Foundations Symposium (CSF), pp. 283–297. IEEE (2018)
11. Basin, D., Cremers, C.: Degrees of security: protocol guarantees in the face of compromising adversaries. In: Dawar, A., Veith, H. (eds.) CSL 2010. LNCS, vol. 6247, pp. 1–18. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15205-4_1
12. Bengtson, J., Bhargavan, K., Fournet, C., Gordon, A.D., Maffeis, S.: Refinement types for secure implementations. ACM Trans. Program. Lang. Syst. (TOPLAS) 33(2), 1–45 (2011)
13. Bertot, Y., Castéran, P.: Interactive Theorem Proving and Program Development: Coq’Art: The Calculus of Inductive Constructions. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-662-07964-5


14. Beurdouche, B., et al.: A messy state of the union: taming the composite state machines of TLS. In: IEEE Symposium on Security and Privacy (S&P), pp. 535–552. IEEE (2015)
15. Bhargavan, K., Blanchet, B., Kobeissi, N.: Verified models and reference implementations for the TLS 1.3 standard candidate. In: IEEE Symposium on Security and Privacy (S&P), pp. 483–502. IEEE (2017)
16. Bhargavan, K., Delignat-Lavaud, A., Kobeissi, N.: Formal modeling and verification for domain validation and ACME. In: Kiayias, A. (ed.) FC 2017. LNCS, vol. 10322, pp. 561–578. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-70972-7_32
17. Bhargavan, K., Lavaud, A.D., Fournet, C., Pironti, A., Strub, P.Y.: Triple handshakes and cookie cutters: breaking and fixing authentication over TLS. In: IEEE Symposium on Security and Privacy (S&P), pp. 98–113. IEEE (2014)
18. Bhargavan, K., Leurent, G.: On the practical (in-) security of 64-bit block ciphers: collision attacks on HTTP over TLS and OpenVPN. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 456–467 (2016)
19. Blanchet, B.: CryptoVerif: computationally sound mechanized prover for cryptographic protocols. In: Dagstuhl Seminar on Applied Formal Protocol Verification, p. 117 (2007)
20. Blanchet, B.: Security protocol verification: symbolic and computational models. In: Degano, P., Guttman, J.D. (eds.) POST 2012. LNCS, vol. 7215, pp. 3–29. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-28641-4_2
21. Blanchet, B.: Automatic verification of security protocols in the symbolic model: the verifier ProVerif. In: Aldini, A., Lopez, J., Martinelli, F. (eds.) FOSAD 2012-2013. LNCS, vol. 8604, pp. 54–87. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10082-1_3
22. Blanchet, B.: Modeling and verifying security protocols with the applied pi calculus and ProVerif. Found. Trends® Priv. Secur. 1(1–2), 1–135 (2016)
23. Blanchet, B., Chaudhuri, A.: Automated formal analysis of a protocol for secure file sharing on untrusted storage. In: IEEE Symposium on Security and Privacy (S&P), pp. 417–431. IEEE (2008)
24. Blum, J., et al.: E2E encryption for Zoom meetings (2020). https://github.com/zoom/zoom-e2e-whitepaper
25. Bruni, A., Drewsen, E., Schürmann, C.: Towards a mechanized proof of selene receipt-freeness and vote-privacy. In: Krimmer, R., Volkamer, M., Braun Binder, N., Kersting, N., Pereira, O., Schürmann, C. (eds.) E-Vote-ID 2017. LNCS, vol. 10615, pp. 110–126. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68687-5_7
26. Chandra, A.K., Harel, D.: Horn clause queries and generalizations. J. Log. Program. 2(1), 1–15 (1985)
27. Cheval, V., Blanchet, B.: Proving more observational equivalences with ProVerif. In: Basin, D., Mitchell, J.C. (eds.) POST 2013. LNCS, vol. 7796, pp. 226–246. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-36830-1_12
28. Cheval, V., Kremer, S., Rakotonirina, I.: DEEPSEC: deciding equivalence properties in security protocols theory and practice. Research report, INRIA, Nancy, May 2018. https://hal.inria.fr/hal-01698177
29. Cohn-Gordon, K., Cremers, C., Dowling, B., Garratt, L., Stebila, D.: A formal security analysis of the signal messaging protocol. In: 2017 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 451–466. IEEE (2017)


30. Cohn-Gordon, K., Cremers, C., Garratt, L.: On post-compromise security. In: IEEE Computer Security Foundations Symposium (CSF), pp. 164–178. IEEE (2016)
31. Cortier, V., Wiedling, C.: A formal analysis of the norwegian E-voting protocol. In: Degano, P., Guttman, J.D. (eds.) POST 2012. LNCS, vol. 7215, pp. 109–128. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-28641-4_7
32. Cremers, C., Dehnel-Wild, M.: Component-based formal analysis of 5G-AKA: channel assumptions and session confusion. In: 2019 Network and Distributed System Security Symposium (NDSS) (2019)
33. Cremers, C., Hirschi, L.: Improving automated symbolic analysis of ballot secrecy for E-voting protocols: a method based on sufficient conditions. In: IEEE European Symposium on Security and Privacy (EuroS&P) (2019)
34. Cremers, C., Horvat, M., Hoyland, J., Scott, S., van der Merwe, T.: A comprehensive symbolic analysis of TLS 1.3. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 1773–1788. ACM (2017)
35. Cremers, C., Jackson, D.: Prime, order please! Revisiting small subgroup and invalid curve attacks on protocols using Diffie-Hellman. In: 2019 IEEE Computer Security Foundations Symposium (CSF) (2019)
36. Cremers, C.J.F., Lafourcade, P., Nadeau, P.: Comparing state spaces in automatic security protocol analysis. In: Cortier, V., Kirchner, C., Okada, M., Sakurada, H. (eds.) Formal to Practical Security. LNCS, vol. 5458, pp. 70–94. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-02002-5_5
37. Cremers, C.: Feasibility of multi-protocol attacks. In: Proceedings of the First International Conference on Availability, Reliability and Security (ARES), pp. 287–294. IEEE Computer Society, Vienna, April 2006. http://www.win.tue.nl/~ecss/downloads/mpa-ares.pdf
38. Cremers, C.J.F.: The Scyther tool: verification, falsification, and analysis of security protocols. In: Gupta, A., Malik, S. (eds.) CAV 2008. LNCS, vol. 5123, pp. 414–418. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-70545-1_38
39. Cremers, C.: Key exchange in IPsec revisited: formal analysis of IKEv1 and IKEv2. In: Atluri, V., Diaz, C. (eds.) ESORICS 2011. LNCS, vol. 6879, pp. 315–334. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23822-2_18
40. de Moura, L., Bjørner, N.: Z3: an efficient SMT solver. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78800-3_24
41. Delaune, S., Kremer, S., Ryan, M.: Verifying privacy-type properties of electronic voting protocols. J. Comput. Secur. 17(4), 435–487 (2009)
42. Doghmi, S.F., Guttman, J.D., Thayer, F.J.: Searching for shapes in cryptographic protocols. In: Grumberg, O., Huth, M. (eds.) TACAS 2007. LNCS, vol. 4424, pp. 523–537. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-71209-1_41
43. Dolev, D., Yao, A.: On the security of public key protocols. IEEE Trans. Inf. Theory 29(2), 198–208 (1983)
44. Donenfeld, J.A.: WireGuard: next generation kernel network tunnel. In: Network and Distributed System Security Symposium (NDSS) (2017)
45. Donenfeld, J.A., Milner, K.: Formal verification of the WireGuard protocol. Technical report (2017)
46. Escobar, S., Meadows, C., Meseguer, J.: Maude-NPA: cryptographic protocol analysis modulo equational properties. In: Aldini, A., Barthe, G., Gorrieri, R. (eds.) FOSAD 2007-2009. LNCS, vol. 5705, pp. 1–50. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03829-7_1


47. Gibson-Robinson, T., Armstrong, P., Boulgakov, A., Roscoe, A.W.: FDR3 – a modern refinement checker for CSP. In: Ábrahám, E., Havelund, K. (eds.) TACAS 2014. LNCS, vol. 8413, pp. 187–201. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54862-8_13
48. Girol, G., Hirschi, L., Sasse, R., Jackson, D., Cremers, C., Basin, D.: A spectral analysis of noise: a comprehensive, automated, formal analysis of Diffie-Hellman protocols. In: 29th USENIX Security Symposium (USENIX Security 2020). USENIX Association, Boston, August 2020. https://www.usenix.org/conference/usenixsecurity20/presentation/girol
49. Hirschi, L., Baelde, D., Delaune, S.: A method for verifying privacy-type properties: the unbounded case. In: 2016 IEEE Symposium on Security and Privacy (SP), pp. 564–581. IEEE (2016)
50. Hoare, C.A.R.: Communicating sequential processes. In: Hansen, P.B. (ed.) The Origin of Concurrent Programming, pp. 413–443. Springer, New York (1978). https://doi.org/10.1007/978-1-4757-3472-0_16
51. Jackson, D., Cremers, C., Cohn-Gordon, K., Sasse, R.: Seems legit: automated analysis of subtle attacks on protocols that use signatures. In: ACM CCS 2019 (2019)
52. Jakobsen, J., Orlandi, C.: On the CCA (in) security of MTProto. In: Proceedings of the 6th Workshop on Security and Privacy in Smartphones and Mobile Devices, pp. 113–116 (2016)
53. Kobeissi, N.: An analysis of the protonmail cryptographic architecture. IACR Cryptology ePrint Archive 2018/1121 (2018)
54. Kobeissi, N., Bhargavan, K., Blanchet, B.: Automated verification for secure messaging protocols and their implementations: a symbolic and computational approach. In: IEEE European Symposium on Security and Privacy (EuroS&P), pp. 435–450. IEEE (2017)
55. Kobeissi, N., Nicolas, G., Bhargavan, K.: Noise explorer: fully automated modeling and verification for arbitrary noise protocols. In: IEEE European Symposium on Security and Privacy (EuroS&P) (2019)
56. Lafourcade, P., Puys, M.: Performance evaluations of cryptographic protocols verification tools dealing with algebraic properties. In: Garcia-Alfaro, J., Kranakis, E., Bonfante, G. (eds.) FPS 2015. LNCS, vol. 9482, pp. 137–155. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-30303-1_9
57. Lapiha, O.: A cryptographic investigation of secure scuttlebutt. Technical report, École Normale Supérieure (2019)
58. Lee, J., Choi, R., Kim, S., Kim, K.: Security analysis of end-to-end encryption in telegram. In: Simposio en Criptografía Seguridad Informática, Naha, Japón (2017). https://bit.ly/36aX3TK
59. Lipp, B., Blanchet, B., Bhargavan, K.: A mechanised cryptographic proof of the WireGuard virtual private network protocol. In: IEEE European Symposium on Security and Privacy (EuroS&P) (2019)
60. Miculan, M., Urban, C.: Formal analysis of Facebook connect single sign-on authentication protocol. In: SOFSEM, vol. 11, pp. 22–28. Citeseer (2011)
61. Millen, J.: A necessarily parallel attack. In: Workshop on Formal Methods and Security Protocols. Citeseer (1999)
62. Oak, P.: Kanto regional Pokédex. Kanto Region J. Pokémon Res. 19 (1996)
63. Pereira, O., Rochet, F., Wiedling, C.: Formal analysis of the FIDO 1.x protocol. In: Imine, A., Fernandez, J.M., Marion, J.-Y., Logrippo, L., Garcia-Alfaro, J. (eds.) FPS 2017. LNCS, vol. 10723, pp. 68–82. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-75650-9_5


64. Protzenko, J., Beurdouche, B., Merigoux, D., Bhargavan, K.: Formally verified cryptographic web applications in WebAssembly. In: IEEE Symposium on Security and Privacy (S&P). IEEE (2019)
65. Protzenko, J., et al.: Verified low-level programming embedded in F*. In: 2017 Proceedings of the ACM on Programming Languages (ICFP), vol. 1 (2017)
66. Schmidt, B., Meier, S., Cremers, C., Basin, D.: Automated analysis of Diffie-Hellman protocols and advanced security properties. In: Chong, S. (ed.) IEEE Computer Security Foundations Symposium (CSF), Cambridge, MA, USA, 25–27 June 2012, pp. 78–94. IEEE (2012)
67. Steinbrecher, S., Köpsell, S.: Modelling unlinkability. In: Dingledine, R. (ed.) PET 2003. LNCS, vol. 2760, pp. 32–47. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-40956-4_3
68. Troncoso, C., et al.: Decentralized privacy-preserving proximity tracing, April 2020
69. Woo-Sik, B.: Formal verification of an RFID authentication protocol based on hash function and secret code. Wireless Pers. Commun. 79(4), 2595–2609 (2014). https://doi.org/10.1007/s11277-014-1745-8

Implementing Elliptic Curve Cryptography

On the Worst-Case Side-Channel Security of ECC Point Randomization in Embedded Devices

Melissa Azouaoui1,2(✉), François Durvaux1,3, Romain Poussier4, François-Xavier Standaert1, Kostas Papagiannopoulos2, and Vincent Verneuil2

1 Université Catholique de Louvain, Louvain-la-Neuve, Belgium
[email protected]
2 NXP Semiconductors, Hamburg, Germany
3 Silex Insight, Mont-Saint-Guibert, Belgium
4 Temasek Laboratories, Nanyang Technological University, Singapore, Singapore

Abstract. Point randomization is an important countermeasure to protect Elliptic Curve Cryptography (ECC) implementations against side-channel attacks. In this paper, we revisit its worst-case security in front of advanced side-channel adversaries taking advantage of analytical techniques in order to exploit all the leakage samples of an implementation. Our main contributions in this respect are the following: first, we show that due to the nature of the attacks against the point randomization (which can be viewed as Simple Power Analyses), the gain of using analytical techniques over simpler divide-and-conquer attacks is limited. Second, we take advantage of this observation to evaluate the theoretical noise levels necessary for the point randomization to provide strong security guarantees and compare different elliptic curve coordinate systems. Then, we turn this simulated analysis into actual experiments and show that reasonable security levels can be achieved by implementations even on low-cost (e.g. 8-bit) embedded devices. Finally, we are able to bound the security on 32-bit devices against worst-case adversaries.

Keywords: Side-Channel Analysis · Elliptic Curve Cryptography · Point randomization · Belief Propagation · Single-trace attacks

1 Introduction

Elliptic Curve Cryptography (ECC) is a building block of many security components and critical applications, including passports, ID cards, banking cards, digital certificates and TLS. Its relative efficiency (compared to other public-key cryptosystems) is usually considered an advantage for implementation in small embedded devices. As a result, it is also a natural target for side-channel attackers. In this context, the Elliptic Curve Scalar Multiplication (ECSM) operation is critical, and many attacks against it are described in the literature [3,6,8,16,24,28,29]. These attacks can exploit different sources of secret-dependent leakages, under different assumptions on the attack model. As a result, countermeasures have been developed, for example based on regular execution [19], scalar randomization [7] and point blinding/randomization [8].

© Springer Nature Switzerland AG 2020. K. Bhargavan et al. (Eds.): INDOCRYPT 2020, LNCS 12578, pp. 205–227, 2020. https://doi.org/10.1007/978-3-030-65277-7_9

In recent years, research on Side-Channel Analysis (SCA) has been shifting towards more powerful attacks that exploit as much of the available information as possible, with the goal of assessing (or at least estimating) the worst-case security level of cryptographic implementations. For example, the use of Soft-Analytical Side-Channel Attacks (SASCA) in the context of AES implementations [12,14,34] and, more recently, lattice-based cryptography [30] aims at exploiting more secret-dependent leakages that cannot be easily exploited by classical Divide-and-Conquer (D&C) attacks (e.g. the leakage of the MixColumns operation for the AES). Following this direction, Poussier et al. designed a nearly worst-case single-trace horizontal attack against ECSM implementations [29]. Their attack exploits all the single-precision multiplications executed during one step of the Montgomery ladder ECSM.

Since this attack relies on the knowledge of the input point of the ECSM, it seems natural to use point randomization as a countermeasure against it, and the evaluation of this countermeasure was left as an important direction for further research. In this work, we aim at assessing the possibility of recovering the randomized input point so that the attack of Poussier et al. (or more generally horizontal attacks) can be applied again. Besides its importance for the understanding of side-channel protected ECC implementations in general, we note that it is also of interest for pairing-based cryptography [23] and isogeny-based cryptography [25].

Our first contribution in this respect is to show how to efficiently apply SASCA in the case of point randomization by targeting field multiplications. We study the impact of different parameters of the implementation's graph representation required to perform SASCA.
Then, in order to compare the efficiency of SASCA to a (simpler) D&C method, we extend the Local Random Probing Model (LRPM) introduced by Guo et al. [15] to our use case. Using this extension, we show that for realistic noise levels, SASCA does not provide a significant gain over the D&C strategy, especially when the latter is augmented with enumeration. This matches the intuition that such analytical attacks work best in the continuous setting of a Differential Power Analysis (DPA), while we are in the context of a Simple Power Analysis (SPA). Yet, we also show that this is not the case for very high Signal-to-Noise Ratio (SNR) values, especially for 32-bit implementations and the Jacobian coordinates system. Based on this result, we then infer the SNR required for such implementations to be secure against SCA. We show that the noise level it implies is relatively low (in comparison, for example, with the high noise levels required for practically-secure higher-order masked implementations [2,5]), and should enable secure implementations even in low-cost (e.g. 8-bit) micro-controllers. We also compare different elliptic curve coordinate systems w.r.t. their side-channel resistance against advanced side-channel attackers targeting the point randomization countermeasure. In addition, we evaluate the concrete security of the point randomization countermeasure on such low-cost devices. For this purpose, we consider two different options for implementing multi-precision multiplications.


Interestingly, we observe that the level of optimization of the multiplication significantly impacts the level of leakage of the implementation. Concretely, we show that, while a naive schoolbook multiplication leads to successful single-trace attacks on an Atmel ATmega target, the operand-caching optimization reduces the SNR enough that attacks become hard, even with high enumeration power. Finally, we discuss the cases of two other implementations, one using Jacobian coordinates on an 8-bit device, and one using homogeneous coordinates on a 32-bit device. Based on our detailed analyses of 8-bit implementations, we show how the guessing entropy of the randomized point can be bounded using the LRPM for 32-bit implementations, even against worst-case attackers.

2 Background

2.1 Elliptic Curve Cryptography

In the following, we introduce the background on elliptic curve cryptography necessary for the understanding of this paper. We only consider elliptic curves over a field of prime characteristic > 3. An elliptic curve $E(\mathbb{F}_p)$ over a field of prime characteristic $p \neq 2, 3$ can be defined using a short Weierstrass equation:

$$E : y^2 = x^3 + ax + b,$$

where $a, b \in \mathbb{F}_p$ and $4a^3 + 27b^2 \neq 0$. Associated to a point addition operation +, the points $(x, y) \in \mathbb{F}_p^2$ verifying the equation E, along with the point at infinity $\mathcal{O}$ as the neutral element, form an Abelian group. Point arithmetic (addition and doubling) in affine coordinates (using 2 coordinates x and y) requires field inversions, which are expensive compared to other field operations. In order to avoid them, most elliptic curve implementations use projective coordinates $(X : Y : Z)$, such as the homogeneous ones where $x = X/Z$ and $y = Y/Z$ with $X, Y, Z \in \mathbb{F}_p$ and $Z \neq 0$. The equation of the curve is then given by:

$$Y^2 Z = X^3 + aXZ^2 + bZ^3,$$

so that each point $(x, y)$ can be mapped to $(\lambda x : \lambda y : \lambda)$ for any $\lambda \in \mathbb{F}_p^*$. Alternatively, considering the Jacobian projective Weierstrass equation:

$$Y^2 = X^3 + aXZ^4 + bZ^6,$$

a point $(x, y)$ of the curve can this time be represented using so-called Jacobian projective coordinates as $(\lambda^2 x : \lambda^3 y : \lambda)$, with $\lambda \in \mathbb{F}_p^*$. In the rest of this paper, the NIST P-256 curve [31] is used to illustrate the principle of our security assessment. Note that this choice is not restrictive, as the study is fully independent of the choice of curve.
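To make the two randomized representations concrete, the following minimal Python sketch maps an affine point to homogeneous and Jacobian coordinates with a random λ and checks the corresponding curve equations. It is only an illustration under stated assumptions: the curve $y^2 = x^3 + 2x + 3$ over $\mathbb{F}_{97}$ and the point (3, 6) are toy values chosen for readability (they are not NIST P-256), and all function names are hypothetical.

```python
import secrets

# Toy parameters (assumption for illustration): y^2 = x^3 + 2x + 3 over F_97.
P, A, B = 97, 2, 3

def on_curve_affine(x, y):
    """Short Weierstrass equation in affine coordinates."""
    return (y * y - (x**3 + A * x + B)) % P == 0

def on_curve_homogeneous(X, Y, Z):
    """Y^2 Z = X^3 + a X Z^2 + b Z^3."""
    return (Y * Y * Z - (X**3 + A * X * Z * Z + B * Z**3)) % P == 0

def on_curve_jacobian(X, Y, Z):
    """Y^2 = X^3 + a X Z^4 + b Z^6."""
    return (Y * Y - (X**3 + A * X * Z**4 + B * Z**6)) % P == 0

def randomize_homogeneous(x, y, lam):
    """(x, y) -> (lam*x : lam*y : lam)."""
    return (lam * x % P, lam * y % P, lam % P)

def randomize_jacobian(x, y, lam):
    """(x, y) -> (lam^2*x : lam^3*y : lam)."""
    return (lam**2 * x % P, lam**3 * y % P, lam % P)

def homogeneous_to_affine(X, Y, Z):
    z_inv = pow(Z, -1, P)  # modular inverse, Python 3.8+
    return (X * z_inv % P, Y * z_inv % P)

def jacobian_to_affine(X, Y, Z):
    z_inv = pow(Z, -1, P)
    return (X * z_inv**2 % P, Y * z_inv**3 % P)

if __name__ == "__main__":
    x, y = 3, 6                           # toy base point on the toy curve
    lam = 1 + secrets.randbelow(P - 1)    # random nonzero field element
    print(randomize_homogeneous(x, y, lam), randomize_jacobian(x, y, lam))
```

Every nonzero λ yields a different representation of the same affine point, which is what the countermeasure relies on.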

2.2 Problem Statement

At CHES 2017, Poussier et al. designed a nearly worst-case single-trace horizontal differential SCA (HDPA) against an implementation of the Montgomery ladder ECSM [29]. This attack relies on the knowledge of the input point and the operations executed. It aims at distinguishing between the two possible sequences of intermediate values depending on the processed scalar bit. Once the first bit has been recovered, the current state of the algorithm is known. The intermediate values corresponding to the next scalar bit can then also be predicted and matched against the side-channel leakages. Randomizing the input point naturally prevents HDPA: considering an implementation using projective coordinates, for each execution, a random value $\lambda \in \mathbb{F}_p^*$ is generated and the representation of the point P = (x, y) is replaced by (λx, λy, λ) before the scalar multiplication takes place. Thus the hypothesis space to recover one bit using HDPA is increased from 2 to $2^{|\lambda|+1}$, with |λ| the size of λ in bits. If |λ| is large enough, it renders the attack impractical. Given that HDPA is close to a worst-case attack against the ECSM, we want to investigate the security of the point randomization that is considered as its natural countermeasure. More precisely: could a side-channel adversary exploiting the additional information brought by the randomization process itself recover a sufficient amount of information about λ (e.g. by reducing the number of λ candidates to an enumerable set) so that an HDPA can again be applied? For that purpose, we examine two types of profiled attacks that are detailed in the next section.

3 Attacking and Evaluating the Point Randomization

Let us consider the homogeneous projective coordinates randomization method previously described. For each execution, a modular multiplication is performed between the coordinates of the input point (for instance the public base point G) and a random value λ. For an elliptic curve system, the implementation of an optimized modular multiplication depends on the underlying field and on various trade-offs. For instance, one can first perform a multiplication and then a modular reduction. Alternatively, these steps can be interleaved to reduce the memory footprint. In order to illustrate our two approaches to attack the point randomization, we first focus on this initial multiplication, for which an abstract architecture is given in Fig. 1. The figure represents the overall functioning of a schoolbook-like multiplication of two 256-bit operands yielding a 512-bit result, implemented on an 8-bit platform: $\{\lambda_0, \lambda_1, \ldots, \lambda_{31}\}$ correspond to the words of λ, $\{x_{G_0}, x_{G_1}, \ldots, x_{G_{31}}\}$ to the known words of $x_G$ (the x coordinate of the base point G), and $c = \{c_0, c_1, \ldots, c_{63}\}$ to the 512-bit result. Every word of λ is multiplied by every word of $x_G$. The resulting 2048 low (R0) and high (R1) parts of the single-precision multiplications are then combined in a series of additions (shown by the accumulation block in Fig. 1) to yield the 64-word result.

Fig. 1. Multi-precision multiplication of 256-bit operands on an 8-bit device. (Diagram labels: λ0, ..., λ31; × xG0, ..., × xG31; R0, R1; ACCUMULATE; c0, ..., c63; the parts of the computation targeted by SASCA and by divide-and-conquer are marked.)

Given this abstract view of multiplication, we identified two profiled single-trace side-channel attacks that can be used to defeat the point randomization countermeasure. Our aim is to compare them in order to identify the best security evaluation strategy. We emphasize that both methods can be used independently of the point randomization algorithm.

3.1 Horizontal Divide-and-Conquer Attack with Enumeration

The Horizontal Divide-and-Conquer attack (HD&C) is applied independently and similarly to each word of λ. Following Fig. 1, $\lambda_0$ is multiplied by 32 known values $\{x_{G_0}, x_{G_1}, \ldots, x_{G_{31}}\}$, resulting in the lower parts and the higher parts $(\lambda_0 \times x_{G_j}) \,\%\, 256$ and $(\lambda_0 \times x_{G_j}) \backslash 256$ (where $\%$ and $\backslash$ denote the modulus and the integer division) for $j \in \{0, 1, \ldots, 31\}$. The attacker exploits the direct leakage on the value of $\lambda_0$ but additionally observes the leakage of these 64 intermediates. The side-channel information is extracted by characterizing the joint conditional distribution:

$$\Pr[\lambda_0 \mid L(\lambda_0), L((\lambda_0 \times x_{G_0}) \,\%\, 256), \ldots, L((\lambda_0 \times x_{G_{31}}) \backslash 256)],$$

where $L(v)$ denotes the side-channel leakage of a value $v$. Assuming the algorithmic independence of the targeted intermediates, which is verified in this case, and additionally the independence of the noise (i.e. that the leakage of each intermediate is independent of the leakage of other intermediates)1, the joint distribution can be factorized into:

$$\Pr[\lambda_0 \mid L(\lambda_0)] \times \prod_{j=0}^{31} \Pr[\lambda_0 \mid L((\lambda_0 \times x_{G_j}) \,\%\, 256)] \times \Pr[\lambda_0 \mid L((\lambda_0 \times x_{G_j}) \backslash 256)].$$

1 This is referred to as Independent Operations' Leakages (IOL); it is a commonly used assumption in SCA and was shown to be reasonable [13]. For our case study it can be easily verified by plotting a covariance matrix.
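The factorization above can be illustrated on simulated data. The sketch below is a minimal example, not the profiled attack used in the experiments: it assumes a perfectly known Hamming weight leakage model with independent Gaussian noise of standard deviation `sigma`, a uniform prior on the target word (so that multiplying per-leakage posteriors amounts to multiplying likelihoods), and randomly drawn known words standing in for those of $x_G$. All names are hypothetical.

```python
import math
import random

def hw(v):
    """Hamming weight of a non-negative integer."""
    return bin(v).count("1")

def intermediates(lam_word, x_words):
    """The word itself plus low/high bytes of its 32 products: 65 values."""
    values = [lam_word]
    for xj in x_words:
        prod = lam_word * xj
        values.append(prod % 256)   # lower part  (lam * x) % 256
        values.append(prod // 256)  # higher part (lam * x) \ 256
    return values

def simulate_leakage(lam_word, x_words, sigma, rng):
    """One noisy Hamming weight sample per intermediate (assumed model)."""
    return [hw(v) + rng.gauss(0.0, sigma) for v in intermediates(lam_word, x_words)]

def hdc_posterior(leakage, x_words, sigma):
    """Posterior over the 256 candidates under independent Gaussian noise."""
    log_scores = []
    for cand in range(256):
        preds = intermediates(cand, x_words)
        sq_err = sum((l - hw(p)) ** 2 for l, p in zip(leakage, preds))
        log_scores.append(-sq_err / (2.0 * sigma ** 2))
    top = max(log_scores)                       # subtract max for stability
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]

if __name__ == "__main__":
    rng = random.Random(0)
    x_words = [rng.randrange(256) for _ in range(32)]  # stand-in for words of x_G
    leak = simulate_leakage(0x3C, x_words, sigma=1.0, rng=rng)
    post = hdc_posterior(leak, x_words, sigma=1.0)
    print(max(range(256), key=post.__getitem__))
```

Working with sums of log-likelihoods rather than products of probabilities avoids numerical underflow when 65 terms are combined.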


For a given word of λ, the exploitation of this information gives the attacker a list of probabilities for each value this word can take. If the correct word is ranked first for all the $\lambda_i$, the attack trivially succeeds. However, recovering λ directly is not necessary. Indeed, a D&C attack allows the use of enumeration as a post-processing technique [32]. As a result, reducing the entropy of λ until the value of the randomized point can be reached through enumeration is enough. We note that there is no straightforward way to verify whether the correct point has been found. Indeed, only the result of its multiplication by the full value of the secret scalar k is revealed at the end of the ECSM. Yet, the attacker can feed the points given by enumeration as inputs to (e.g.) an HDPA attack. If the right point has been recovered, it is very likely that the HDPA will be able to easily distinguish the scalar bits. Otherwise, if the hypothesis on the input point is wrong, it will assign equal probabilities to the possible values of the scalar bits.

3.2 Soft Analytical Side-Channel Attack

SASCA was introduced in [34] by Veyrat-Charvillon et al. as a new approach that combines the noise tolerance of standard DPA with the optimal data complexity of algebraic attacks. SASCA works in three steps: first, it builds a large graphical model, called a factor graph, containing the intermediate variables linked by constraints corresponding to the operations executed during the target algorithm. Then, it extracts the posterior distributions of the intermediate values from the leakage traces. Finally, it propagates the side-channel information throughout the factor graph using the Belief Propagation (BP) algorithm [20], to find the marginal distribution of the secret key given the distributions of all intermediate variables. We refer interested readers to [34] for a detailed description of SASCA.

SASCA exploits a larger number of intermediates in comparison to D&C attacks, as illustrated in Fig. 1. For instance, it can use the leakage coming from the addition operations in the accumulator, which combines intermediates that depend on multiple words of λ. The optimized information combination approach of SASCA implies that it is an appropriate tool to approximate the worst-case security of cryptographic implementations.

Running the BP algorithm on a factor graph of a cryptographic implementation is time-consuming. Its time complexity is dominated by $(2^{v_s})^{deg}$ (corresponding to the factor-to-variable message update), with $v_s$ the variable size (e.g. 8 bits or 32 bits) and $deg$ the degree of the largest factors (the number of variables connected to them). This message passing is then repeated for each factor and for the number of iterations required for the BP algorithm to converge. For that reason, Guo et al. introduced the Local Random Probing Model (LRPM) [15], which bounds the information that can be obtained by decoding the factor graph using BP without running the BP algorithm.

Concretely, the LRPM propagates the amount of information collected throughout the graph using approximations employed in coding theory. Assuming that the variables' distributions are not too correlated, information coming from neighboring factors is summed at variable nodes. At factor nodes, the information coming from neighboring variables is multiplied. The information is not expressed in bits; instead, the logarithm is taken in a base equal to the number of values a variable can take (such that the information is always between 0 and 1). We refer to [34] for more details on the LRPM. The LRPM provides an upper bound on the information propagated through a generic factor function, but generally the actual information propagated by a specific logic or arithmetic operation is significantly lower. For instance, Guo et al. noted that information can be diluted when propagated through a XOR operation: when XORing two values with partial information, we may end up with even less information on the result than their product. Based on this observation, they refine the model by introducing the XOR loss: a coefficient 0 < α < 1 that is multiplied by the information and reduces the model's approximation for the XOR operation to a value of information that is closer to the one that is observed. In practice, α is estimated as the ratio between the information estimated from actually running BP on the XOR factor and the upper bound on the information evaluated using the model.
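The cost figure and the two LRPM propagation rules described above can be written down in a few lines. The following Python sketch is only illustrative: the function names are hypothetical, the leakage of a variable is passed as a separate argument for convenience, and clipping the sum at 1 is an assumption of this sketch (motivated by the information being normalized to lie between 0 and 1), not a rule stated in the text.

```python
def bp_factor_cost(variable_bits, degree):
    """Dominant cost (2^vs)^deg of one factor-to-variable message update."""
    return (2 ** variable_bits) ** degree

def lrpm_variable_node(leakage_mi, incoming_from_factors):
    """Sum rule at a variable node; clipping at 1 is an assumption here."""
    return min(1.0, leakage_mi + sum(incoming_from_factors))

def lrpm_factor_node(incoming_from_variables, loss=1.0):
    """Product rule at a factor node, optionally scaled by a loss coefficient."""
    info = loss
    for mi in incoming_from_variables:
        info *= mi
    return info

if __name__ == "__main__":
    # Degree-3 factor over 8-bit variables versus 32-bit variables.
    print(bp_factor_cost(8, 3), bp_factor_cost(32, 3))
    # Two neighbours carrying 0.5 and 0.4 (normalized), with and without loss.
    print(lrpm_factor_node([0.5, 0.4]), lrpm_factor_node([0.5, 0.4], loss=0.5))
```

The cost function makes the gap explicit: a degree-3 factor costs $2^{24}$ for 8-bit variables but $2^{96}$ for 32-bit ones, which is why a model that avoids running BP is attractive.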

4 Analysis of the Field Multiplication Factor Graph

Prior to comparing the attacks identified in Sect. 3, we investigate the application of SASCA to a multi-precision multiplication factor graph. The nodes included in the graph and its structure can impact the performance of the BP algorithm. In the following, we investigate the impact of these characteristics. For this purpose, we build the factor graph of the first block from the assembly description of the operand-caching multiplication2 of Hutter and Wenger [18, Appendix A] on an 8-bit micro-controller. This graph G is presented in Fig. 2. Given the size of this graph, it is possible to run multiple experiments of the BP algorithm to get meaningful averaged results. The conclusions drawn from our analysis can be generalized to the full graph and to any multiplication that follows the abstract architecture in Fig. 1 because of its very regular structure.

2 The operand-caching multiplication is an optimized schoolbook-like multiplication that minimizes the number of operand word loads. It is specifically designed for small embedded devices in order to improve efficiency by minimizing memory operations.

Fig. 2. Factor graph G: first block of the operand caching multiplication. (Diagram labels: multiplication results R00, R10, R01, R11, R02, R12; accumulator variables ACC00, ACC01, ACC02, ACC11, ACC12, ACC13; carry bits carry0, ..., carry4; result words c30, ..., c33; addition factors +.)

There are two aspects to consider before running BP on this graph, which we detail and investigate hereafter. First, G is a cyclic graph and BP convergence to the correct marginals is not guaranteed [26]. Second, while for previous applications of SASCA to the AES [34] and lattice-based encryption [30] the factor graphs contained factors of degree at most 3, G contains factors of degree 4 and 5 due to the additions with carry propagation. Although carry bits are small variables, we note that their values, and possibly errors on their values, may ripple through all the following steps of the computation.

To investigate the effect of cycles and carry bits, we construct three other factor graphs. G_no_cycles is an acyclic version of G built by following the strategy from Green et al. [12]: removing factors causing the cycles and severing edges. G_no_carry is constructed by deleting carry bit variable nodes and integrating both possible values of a carry into the factors, as described by the diagram below, where the carry (resp. carry') variable node corresponds to the input (resp. output) carry. The diagram is given for the addition with carry operation, but the same is done for all carry operations. Finally, G_no_carry_no_cycles is G_no_carry where the remaining cycles have been removed.

(Diagram: the factor adc connects the variables x, y, carry, r and carry'; the factor adc_no_carry connects only x, y and r.)

with $f_{\mathrm{adc}}$ and $f_{\mathrm{adc\_no\_carry}}$ defined as:

$$f_{\mathrm{adc}}(x, y, \mathit{carry}, r, \mathit{carry}') = \begin{cases} 1 & \text{if } r = (x + y + \mathit{carry}) \,\%\, 256 \text{ and } \mathit{carry}' = \lfloor (x + y + \mathit{carry})/256 \rfloor, \\ 0 & \text{otherwise.} \end{cases}$$

$$f_{\mathrm{adc\_no\_carry}}(x, y, r) = \begin{cases} 1 & \text{if } r = (x + y) \,\%\, 256 \text{ or } r = (x + y + 1) \,\%\, 256, \\ 0 & \text{otherwise.} \end{cases}$$
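The two indicator functions can be transcribed directly. The short Python sketch below (function names are hypothetical) also makes the relation between them explicit: the carry-free factor accepts exactly those triples (x, y, r) that the full factor accepts for at least one pair of input and output carries.

```python
def f_adc(x, y, carry, r, carry_out):
    """Indicator of a consistent 8-bit addition with input and output carry."""
    total = x + y + carry
    return int(r == total % 256 and carry_out == total // 256)

def f_adc_no_carry(x, y, r):
    """Indicator with the carry variables folded into the factor."""
    return int(r == (x + y) % 256 or r == (x + y + 1) % 256)

def folded_from_f_adc(x, y, r):
    """Maximum of f_adc over both carry bits, for comparison."""
    return max(f_adc(x, y, c, r, c_out) for c in (0, 1) for c_out in (0, 1))

if __name__ == "__main__":
    print(f_adc(200, 100, 1, 45, 1))       # 301 = 1 * 256 + 45
    print(f_adc_no_carry(200, 100, 44))    # carry-in 0
    print(f_adc_no_carry(200, 100, 46))    # not reachable with a single carry bit
```

Folding the carries lowers the factor degree from 5 to 3 at the price of a less selective constraint.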

To compare the different graphs, we use simulated leakages for all intermediate variables (Hamming weight leakage with Gaussian noise). We estimate the single-trace attack success rate (SR), averaged over λ0 and λ1, for different noise levels. The results of these experiments are plotted in Fig. 3. First, we notice that for high SNR values, the cyclic graphs (G and G_no_carry) perform better than the acyclic ones (G_no_cycles and G_no_carry_no_cycles). This can be explained by the fact that cycles typically exacerbate side-channel errors but in this case are less detrimental, since errors from side-channel observations are less likely to occur for high SNR. Additionally, the acyclic graphs are constructed by removing the factors which contribute the most to the connectedness of the graph and as a result lose some information. When moving to the (more realistic) low SNR range (below 2), G_no_carry yields marginally better results, as it still benefits from the additional information provided by factors in cycles, and is also not prone to errors on carry bits. For even lower SNR (below 0.05), the experiments indicate that the best graph option is G_no_carry_no_cycles, which reflects previous observations on the impact of cycles and carry bits.

Fig. 3. Average single-trace success rate. Right: close-up for low SNR values.
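For readers who want to reproduce this kind of simulation, the sketch below shows one way to draw Hamming weight leakages at a prescribed SNR. It assumes the common definition of the SNR as the variance of the deterministic part of the leakage divided by the noise variance, with uniformly distributed 8-bit values; this definition and the function names are assumptions of the sketch rather than details taken from the paper.

```python
import math
import random

def hamming_weight(v):
    return bin(v).count("1")

def signal_variance(bits=8):
    """Variance of the Hamming weight of a uniform `bits`-bit value."""
    weights = [hamming_weight(v) for v in range(2 ** bits)]
    mean = sum(weights) / len(weights)
    return sum((w - mean) ** 2 for w in weights) / len(weights)

def sigma_for_snr(snr, bits=8):
    """Noise standard deviation such that Var(signal) / sigma^2 equals snr."""
    return math.sqrt(signal_variance(bits) / snr)

def simulate_leakage(value, snr, rng, bits=8):
    """Hamming weight of `value` plus Gaussian noise at the requested SNR."""
    return hamming_weight(value) + rng.gauss(0.0, sigma_for_snr(snr, bits))

if __name__ == "__main__":
    rng = random.Random(0)
    for snr in (2.0, 0.4, 0.05):
        print(snr, sigma_for_snr(snr), simulate_leakage(0xA7, snr, rng))
```

Under these assumptions the signal variance of an 8-bit Hamming weight is 2, so an SNR of 2 corresponds to a noise standard deviation of 1.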

5 Comparison of SASCA and HD&C Attack

In this section, we begin our investigation of the security level that can be achieved by the point randomization countermeasure by comparing SASCA and an HD&C attack. We still consider the 8-bit implementation of the operand-caching multiplication. Since SASCA comes at a higher computational cost than an HD&C attack, it is worth evaluating the advantage of SASCA over an HD&C attack. For this purpose, we make use of the LRPM's efficiency and introduce new extensions of it such that it is applicable to the target factor graph.

5.1 Applying the LRPM to Multi-precision Multiplication

To investigate the relevance of the LRPM, Guo et al. [15] consider the AES as a target, for which all atomic operations have one output only and LRPM rules apply straightforwardly. To apply the LRPM to the full factor graph of the multi-precision multiplication, we extend the LRPM rules to operations with multiple outputs. This new rule is described below and is based on the factorization principle of BP. An example factorization for an addition factor is given in Appendix A.

(Diagram: a factor F with inputs in1, in2 and outputs out1, out2.)

$$\begin{aligned}
\mathrm{MI}_{F \to out1} &= \mathrm{MI}_{in1 \to F} \times \mathrm{MI}_{in2 \to F} \\
\mathrm{MI}_{F \to out2} &= \mathrm{MI}_{in1 \to F} \times \mathrm{MI}_{in2 \to F} \\
\mathrm{MI}_{F \to in1} &= \mathrm{MI}_{in2 \to F} \times (\mathrm{MI}_{out1 \to F} + \mathrm{MI}_{out2 \to F}) \\
\mathrm{MI}_{F \to in2} &= \mathrm{MI}_{in1 \to F} \times (\mathrm{MI}_{out1 \to F} + \mathrm{MI}_{out2 \to F})
\end{aligned}$$
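The four update equations translate directly into code. The Python sketch below is a plain transcription for a factor with two inputs and two outputs; the function name and the dictionary layout are hypothetical, and no loss coefficient or clipping is applied since the rule above does not include any.

```python
def lrpm_two_output_factor(mi_in1, mi_in2, mi_out1, mi_out2):
    """Outgoing (normalized) information of a 2-input, 2-output factor F."""
    to_outputs = mi_in1 * mi_in2        # same message towards out1 and out2
    from_outputs = mi_out1 + mi_out2    # both outputs inform the inputs
    return {
        "out1": to_outputs,
        "out2": to_outputs,
        "in1": mi_in2 * from_outputs,
        "in2": mi_in1 * from_outputs,
    }

if __name__ == "__main__":
    print(lrpm_two_output_factor(0.5, 0.25, 0.2, 0.1))
```

The message towards an input multiplies the information on the other input by the summed information of both outputs, which is where the multi-output extension differs from the single-output case.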

Additionally, based on the work of Guo et al. and as explained in Sect. 3.2, in order to avoid overly pessimistic upper bounds on the information, we estimate loss coefficients for all variables for every kind of operation in the graph. We estimated the loss coefficients, assuming a Hamming weight leakage function, as the ratio between the information computed after performing BP and the one predicted by the LRPM rules. Prior to using the LRPM to compare SASCA and HD&C attacks, we confirm in Fig. 4 that the LRPM's MI predictions fit the experimental SR on G and its variants. We ran the LRPM assuming MI values that correspond to the simulated SR experiments, and we focus on reasonable and realistic noise levels for software implementations. The left part of Fig. 4 corresponds to the MI on one word of λ of each graph as a function of the SNR, and the right part to the SR. The efficiency orderings of the different graphs in terms of MI and SR are similar. Slight discrepancies might be due to the experimental estimation of the SR. Most importantly, the LRPM and the SR estimations agree on the fact that the best graph option is G_no_carry for a reasonable SNR < 2, while for even lower SNR all graphs seem to perform similarly.

Fig. 4. Left: MI evaluated with the LRPM with loss coefficients for the different graphs. Right: the SR of SASCA for different graphs.

5.2 SASCA vs. HD&C

Based on our previous results, we build the full graph of the multi-precision multiplication with carry bits integrated in factors. Notably, this graph option is not only the best based on the previously shown experiments and LRPM predictions, but also the most pragmatic one. Indeed, the full graph of the multi-precision multiplication is highly connected. Attempting to remove all cycles would render the graph very close to the one exploited by the HD&C strategy. Accordingly, we build the full graph, which consists of 1024 multiplication factors and 3216 addition factors, then the factor graph corresponding to the HD&C attack, which only contains the multiplication factors and related variables. We additionally build the full graph of the randomization procedure including the modular reduction implemented as suggested in [31]. In comparison to the efficient HD&C attack, running the BP algorithm for a single iteration on the factor graph of the multi-precision multiplication would naively require more than $2^{37}$ operations, or more than $2^{29}$ operations with specific factor message passing optimizations. For all three graphs, we use the LRPM to upper bound the information extracted in bytes on a word of λ. Results are given in Fig. 5.

Fig. 5. Comparison of SASCA on the full graph of the field multiplication, including the modular reduction, and the HD&C attack on register multiplications.

We observe that both BP and D&C attacks reach the maximal information of one byte across all words of λ when the SNR exceeds 0.4. The horizontal attack succeeds right after SASCA and, for lower SNR values, the gain is negligible considering the running time and the effort required to mount a BP-based attack. This is in line with the results from [15] (e.g. for the AES): when the number of attack traces is too low (e.g. in a DPA), SASCA does not provide any gain over a D&C template attack. Things naturally get worse for lower SNR values. Our results therefore indicate that classical D&C attacks (on our particular target) come very close to the worst-case attack for low SNR cases (which are essentially the most interesting ones for side-channel investigations). This fact is further highlighted by the limited information gained by adding the modular reduction to the factor graph, compared to the multiplication alone.
An analogy can be made with the AES MixColumns: since the operations contained in these examples (MixColumns, modular reduction by a Mersenne prime) diffuse already limited information (by combining noisy leakages of multiple intermediates), they do not contribute significantly to the overall information on the targeted secret for low and reasonable SNR values.


Overall, this result shows that SASCA, which exploits all the available information, barely improves the results compared to a D&C attack. Moreover, since SASCA does not allow optimal enumeration, adding computational power (through enumeration) to HD&C narrows the already small gap between them even further. As a result, the rest of this paper considers the HD&C strategy in order to evaluate the security of the point randomization countermeasure. This naturally raises the question of how much noise is necessary to make the attack fail, which we investigate in the following section.

6 Security Graphs and Necessary Noise Levels

In this section, we investigate the nearly worst-case security of the point randomization countermeasure. For this purpose, we use the HD&C strategy, identified in the previous section as achieving an efficiency comparable to the worst-case BP-based attack. Since the HD&C attack strategy is extremely efficient computationally (in comparison to SASCA), it allows us to investigate different implementation cases. For our experiments, we choose a homogeneous projective coordinates system to represent the points and later discuss the case of Jacobian projective coordinates. A homogeneous coordinates system allows for a very efficient point randomization procedure, as it requires at most two field multiplications. We consider the case where both affine coordinates of a point are randomized, and additionally the fast parallel point addition and doubling Montgomery ladder from Fischer et al. [10], where only the x coordinate is required and thus randomized. The field multiplication is as described previously: a multi-precision multiplication followed by a modular reduction. We also take into account a 256-bit randomizing parameter λ and a 128-bit alternative. The goal of this section is to provide a characterization of the security level expected as a function of the measurements' SNR. For this purpose, we plot security graphs based on [33] for the different cases of the point randomization procedure. These graphs are produced by performing 100 independent attacks and rank estimations for different SNR values. These experiments provide sampled ranks for each SNR value. Using these samples, the rank distributions can be easily estimated using kernel density estimation. The most interesting conclusions can be deduced from the cumulative distribution function (CDF) of the rank. The CDF of the rank for a specific SNR gives the probability that the rank of the secret lies below a certain enumeration effort, and thus the probability for an adversary with that enumeration power to recover the secret.
This is visually represented by a gray-scale on a security graph: the darker (resp. lighter) the zone is, the higher (resp. lower) is the probability of recovering the secret. For the following results, we performed HD&C attacks and corresponding rank estimations on simulated leakages of every 8-bit word of λ and every result of a register multiplication following the classical side-channel model: Hamming weight (HW) and Gaussian noise. This leads to 65 leakages per target byte (one for the byte itself and 64 for the 2 parts of the 32 multiplication results). We produce in Fig. 6 the security graph for a 128-bit λ, when randomizing only the x coordinate of the base point (Fig. 6a) and when randomizing both coordinates (Fig. 6b). Figure 7 gives the corresponding security graphs for a 256-bit λ. The red line corresponds to the $\log_2$ of the guessing entropy.

Fig. 6. Security graph of the point randomization countermeasure for $\lambda \in \mathbb{F}_{2^{128}}$: (a) randomization of x; (b) randomization of x and y. (Color figure online)

Fig. 7. Security graph of the point randomization countermeasure for $\lambda \in \mathbb{F}_{2^{256}}$: (a) randomization of x; (b) randomization of x and y. (Color figure online)

The security graphs shown in Figs. 6 and 7 encompass a great deal of information on the security of the target. For example, to achieve security against an adversary who attacks only the x coordinate randomization using a 256-bit parameter and can enumerate up to $2^{50}$ candidates, the SNR needs to be lower than 0.3 based on the results displayed in Fig. 7a. The comparison of Fig. 6a and b highlights the expected fact that the information is simply doubled when exploiting $\lambda \times x_G$ and $\lambda \times y_G$ compared to only $\lambda \times x_G$. Consequently, to get the same level of security, the SNR has to be halved. The same applies to Fig. 7a and b. Moreover, the security graphs for the 128-bit case and the 256-bit case illustrate the trade-off between the randomness requirements (i.e. the size of λ) and the side-channel noise necessary to make the point randomization robust against nearly worst-case adversaries. Additionally, since the LRPM provides an upper bound on the information that can be extracted from a factor graph (in this case that of the D&C attack), in all security graphs we plot the corresponding remaining entropy as a lower bound of the actual guessing entropy. This bound is particularly helpful in cases where performing multiple attacks and rank estimations is not possible.
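Security graphs rely on estimating the rank of the full secret from per-word attack results. One standard way to do so is to convolve histograms of per-word log-probabilities. The Python sketch below illustrates this idea in a simplified form; it is written in the spirit of histogram-based rank estimation and is not the implementation used for the figures. The bin count, the function names and the bound derivation (each word contributes at most one bin of quantization error) are assumptions of this sketch, and the brute-force `exact_rank` is only usable on very small examples.

```python
import math
import random
from itertools import product

def bin_index(lp, lo, width, nbins):
    """Index of the histogram bin containing log-probability lp."""
    return min(nbins - 1, int((lp - lo) / width))

def histogram(log_probs, lo, width, nbins):
    hist = [0] * nbins
    for lp in log_probs:
        hist[bin_index(lp, lo, width, nbins)] += 1
    return hist

def convolve(h1, h2):
    """Histogram of sums: out[i + j] counts pairs from bins i and j."""
    out = [0] * (len(h1) + len(h2) - 1)
    for i, a in enumerate(h1):
        if a:
            for j, b in enumerate(h2):
                out[i + j] += a * b
    return out

def rank_bounds(log_prob_lists, key, nbins=64):
    """Lower and upper bounds on the rank of `key` (rank 1 = most likely)."""
    n = len(log_prob_lists)
    lo = min(min(lps) for lps in log_prob_lists)
    hi = max(max(lps) for lps in log_prob_lists)
    width = (hi - lo) / nbins or 1.0
    total = [1]
    for lps in log_prob_lists:
        total = convolve(total, histogram(lps, lo, width, nbins))
    s_key = sum(bin_index(lps[k], lo, width, nbins)
                for lps, k in zip(log_prob_lists, key))
    # Candidates at least n bins above the key are certainly more likely;
    # candidates at least n bins below are certainly not.
    lower = 1 + sum(total[s_key + n:])
    upper = sum(total[max(0, s_key - n + 1):])
    return lower, upper

def exact_rank(log_prob_lists, key):
    """Brute-force rank, for checking the bounds on tiny examples only."""
    key_score = sum(lps[k] for lps, k in zip(log_prob_lists, key))
    better = 0
    for cand in product(*(range(len(lps)) for lps in log_prob_lists)):
        if sum(lps[c] for lps, c in zip(log_prob_lists, cand)) > key_score:
            better += 1
    return 1 + better

if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up scores for 3 words with 16 candidates each (not measured data).
    lists = [[math.log(rng.random() + 1e-3) for _ in range(16)] for _ in range(3)]
    print(rank_bounds(lists, (1, 2, 3)), exact_rank(lists, (1, 2, 3)))
```

Increasing `nbins` tightens the gap between the two bounds at the cost of longer convolutions.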

7 Experimental Evaluations

In this section, we analyze two different implementations of a long-integer multiplication, both written in assembly. The first is a naively implemented schoolbook multiplication with two nested unrolled loops and no specific optimizations. The second is the operand-caching multiplication of Hutter and Wenger [18] that we presented above. We performed our experiments on an 8-bit AVR ATmega328p microcontroller mounted on an Arduino UNO board running at 16 MHz. Using a custom probe, the measurements are captured on a PicoScope 5244D oscilloscope synchronized with a trigger at the beginning of each execution, at a sampling rate of 125 MSam/s. The traces are preprocessed using amplitude demodulation.

Hereafter we describe the HD&C attack applied to both implementations. First, we performed a preliminary Points Of Interest (POIs) selection step using a correlation test [9] to find the samples that correspond to leakages of intermediate values depending on the target byte. We then use a dimensionality reduction technique, namely Principal Component Analysis (PCA) [1], to further reduce the number of dimensions. This can be done because the coordinate of the base point is fixed and known, so the leakage only depends on the bytes of λ. Using the compressed traces, we build multivariate Gaussian templates using 49k traces based on the values of the bytes. Note that while the noise is independent for the attack on simulations presented in Sect. 6, this is not perfectly the case for real traces. We thus take the dependency into account in order to improve the results and perform a single-trace template attack on each secret byte. Finally, we combine the results of all the bytes using the histogram-based rank estimation algorithm of Glowacz et al. [11].

7.1 Classical Schoolbook Multiplication

In this section, we show the results of the HD&C attack on the non-optimized schoolbook multiplication. This multiplication loads every byte of the secret 32 times, displaying notable leakages on λ. As a first step, we examine the bytes' guessing entropies as a function of the number of PCA components retained. These results are shown in Fig. 8. Typically, when applying PCA to side-channel leakages of symmetric cryptography implementations, the relevant information is located in a few principal directions (e.g., the recent [4] uses 10). For our specific target, more dimensions are useful, as shown by Fig. 8. This is presumably due to the large number of intermediate values that relate to the secret bytes, and the number of different single-precision operations manipulating these intermediates. From Fig. 8, we observe that the logarithms of the guessing entropies of all bytes decrease similarly and are mostly below 4 bits (fewer than 16 candidates) when the number of components is above 350. The guessing entropies keep decreasing as the number of components increases.

Fig. 8. Logarithm of the guessing entropy of the 32 bytes of λ as a function of the number of PCA components for the non-optimized schoolbook multiplication.

The next step of the evaluation is to assess whether a single-trace recovery is feasible on the full value of λ. For each execution, we combine the results obtained from the independent attacks on the bytes of λ and plot the results of rank estimation in Fig. 9. We show the distribution of the logarithm of the ranks. The red vertical line denotes the mean log rank observed. We also focus on two sizes for λ, namely 128 and 256 bits, in order to evaluate the impact of the HD&C attack on the required randomness for each execution. First, for a 128-bit randomizing parameter, we observe that approximately half of the randomized points can be recovered with a minimal enumeration effort of less than $2^{16}$. The results for a 256-bit randomizing parameter are given in Fig. 9b. In this case, while the rank of the secret is higher than for the 128-bit case, the implementation can still be considered vulnerable, with an average log rank of 50 bits. Moreover, some parameters are easily recovered. For instance, approximately 20% of all λs are fully recovered using an enumeration effort of $2^{16}$.
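To make the per-byte attack step concrete, here is a minimal, self-contained Python sketch of a single-trace Gaussian template attack on one byte. It is not the authors' code: the leakage is simulated (Hamming weight of the low byte of the product with a few hypothetical known operand bytes `KNOWN`, plus Gaussian noise), POI selection and PCA are skipped, and the templates use a pooled diagonal covariance rather than the full multivariate model described above.

```python
import math
import random

KNOWN = [1, 2, 4, 8, 16, 32, 64, 128]  # hypothetical known operand bytes

def hw(x):
    """Hamming weight of a non-negative integer."""
    return bin(x).count("1")

def trace(v, sigma, rng):
    """Simulated leakage: HW of the low byte of v*c for each known c, plus noise."""
    return [hw((v * c) & 0xFF) + rng.gauss(0.0, sigma) for c in KNOWN]

def profile(n_per_value, sigma, rng):
    """Estimate one mean vector per byte value and a pooled per-sample variance."""
    means, sq_err, count = [], [0.0] * len(KNOWN), 0
    for v in range(256):
        traces = [trace(v, sigma, rng) for _ in range(n_per_value)]
        mu = [sum(col) / n_per_value for col in zip(*traces)]
        means.append(mu)
        for t in traces:
            for j, (x, m) in enumerate(zip(t, mu)):
                sq_err[j] += (x - m) ** 2
            count += 1
    var = [s / (count - 256) for s in sq_err]
    return means, var

def log_likelihoods(t, means, var):
    """Gaussian log-likelihood of one trace under each of the 256 templates."""
    return [-0.5 * sum((x - m) ** 2 / s + math.log(2 * math.pi * s)
                       for x, m, s in zip(t, mu, var))
            for mu in means]

def rank_of(secret, scores):
    """Number of candidates scored strictly better than the secret byte."""
    return sum(s > scores[secret] for s in scores)

rng = random.Random(1)
means, var = profile(n_per_value=10, sigma=0.1, rng=rng)
secret = 0xA7
scores = log_likelihoods(trace(secret, 0.1, rng), means, var)
print(rank_of(secret, scores))
```

In an evaluation, this rank would be averaged over many attack traces to estimate the guessing entropy of each byte, and the noise level would come from the measurements rather than from a chosen `sigma`.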

Fig. 9. Distribution of log rank of λ for the schoolbook multiplication. (Color figure online)
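The combination of the per-byte results can be illustrated with a simplified histogram-based rank estimation in the spirit of [11]. The sketch below is an approximation under stated assumptions, not the reference implementation: the function names, the bin count, and the random scores are hypothetical, and the quantization error introduced by the bins is ignored.

```python
import math
import random

def bin_index(score, lo, width, n_bins):
    """Map a score to its histogram bin (higher score, higher bin)."""
    return min(n_bins - 1, int((score - lo) / width))

def histogram(scores, lo, width, n_bins):
    """Count how many candidates of one byte fall into each score bin."""
    hist = [0] * n_bins
    for s in scores:
        hist[bin_index(s, lo, width, n_bins)] += 1
    return hist

def convolve(h1, h2):
    """Histogram of summed scores for all pairs of candidates."""
    out = [0] * (len(h1) + len(h2) - 1)
    for i, a in enumerate(h1):
        if a:
            for j, b in enumerate(h2):
                out[i + j] += a * b
    return out

def estimate_rank(per_byte_scores, key_bytes, n_bins=64):
    """Estimated number of full keys scored strictly above the given key."""
    lo = min(min(row) for row in per_byte_scores)
    hi = max(max(row) for row in per_byte_scores)
    width = (hi - lo) / n_bins or 1.0
    total, key_bin = [1], 0
    for row, k in zip(per_byte_scores, key_bytes):
        total = convolve(total, histogram(row, lo, width, n_bins))
        key_bin += bin_index(row[k], lo, width, n_bins)
    return sum(total[key_bin + 1:])

# Hypothetical per-byte log-likelihood scores for a 16-byte (128-bit) secret.
rng = random.Random(0)
scores = [[rng.gauss(0.0, 1.0) for _ in range(256)] for _ in range(16)]
best = [max(range(256), key=row.__getitem__) for row in scores]
worst = [min(range(256), key=row.__getitem__) for row in scores]
print(estimate_rank(scores, best), math.log2(1 + estimate_rank(scores, worst)))
```

Because only bin counts are convolved, the cost grows with the number of bins and bytes rather than with the size of the key space, which is what makes log ranks far beyond any feasible enumeration measurable.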

7.2

Operand Caching Multiplication

The second part of our experiment consists of applying the same HD&C attack process against an optimized implementation, namely the operand-caching one. This method is a variant of the long-integer multiplication that reduces the number of memory accesses to gain efficiency, thereby reducing the amount of leakage. For each byte of λ, we perform the same systematic steps as for the non-optimized implementation. In Fig. 10, we plot the evolution of the guessing entropies for the different bytes as a function of the number of PCA components. Bytes are color coded according to the number of times they are loaded. First, we observe the same behavior as for the previous implementation when it comes to the number of PCA components: we require a large number of components. Next, we clearly notice the impact of the number of loads on the guessing entropy. That is, fewer loads tend to lead to a higher entropy in comparison with the non-optimized implementation. It is clear that the limited leakages in this case do not allow reaching guessing entropies below 20 candidates, which is significantly more than on the naive schoolbook multiplication.

Fig. 10. Logarithm of the guessing entropy of the 32 bytes of λ as a function of the number of PCA components for the optimized operand-caching multiplication.

Next, we plot the distribution of the log ranks for both 128- and 256-bit λ in Fig. 11. We observe that, due to the optimizations applied for the operand caching, the leakages are minimized, leading to significantly different results compared to the non-optimized implementation. For the 128-bit (resp. 256-bit) parameter, we obtain an average log rank of 100 bits (resp. 200 bits). The entropy of λ is only marginally reduced and its value cannot be recovered with a reasonable enumeration effort. Overall, the comparison of the non-optimized multiplication and the optimized one highlights how point randomization can be made quite robust against nearly worst-case adversaries without much performance overhead: as is clear from [17], the additional cost of point randomization is limited for (already expensive) ECC implementations. That is, a few design considerations and optimizations suffice to reduce the overall leakages that can be exploited by an adversary, so that horizontal attacks become impractical. We note that, as usual in experimental side-channel attacks, these results depend on the measurement setup and the preprocessing applied to the traces, which can possibly be improved. Yet, the conclusion that limited SNR reductions are enough to secure point randomization, and that simple optimizations (that are motivated by general performance concerns) are good for this purpose, should hold in general.

Fig. 11. Distribution of log rank of λ for the operand caching multiplication. (Color figure online)

8

Projective Coordinates System Comparison

In our preliminary analysis, we focused on a homogeneous projective point representation. This section investigates the use of a different coordinate system, namely Jacobian coordinates, and the possible consequences regarding side-channel resistance. For this purpose, using the LRPM and its extensions proposed in this paper, we further extend our analysis by comparing the randomization in homogeneous and Jacobian projective coordinates. In the Jacobian case, the computation of $\lambda^2$ is required. Therefore, we again consider two cases: one where a generic multiplication is used to perform the squaring, and one where the squaring is implemented efficiently (avoiding the re-calculation of equal cross products). The results of our evaluations are depicted in Fig. 12, where MI upper bounds are plotted as a function of the SNR. First, and as previously shown, we confirm that for homogeneous coordinates, since λx and λy are independent operations when targeting λ, the overall information is simply doubled. Secondly, when it comes to Jacobian coordinates, using a squaring instead of a multiplication for $\lambda^2$ leads to improved security. Regarding the comparison, while the coordinate system used is typically dictated by the ECSM's overall performance, the results shown in Fig. 12 lead to interesting conclusions. These results suggest that for low SNR (< […] > 0.1, the x-only homogeneous projective randomization is the best option, followed by the Jacobian coordinates randomization with a dedicated squaring operation.
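The difference between the two squaring strategies can be made concrete by counting single-precision products. The following Python sketch is only illustrative (byte-sized limbs, hypothetical helper names, an arbitrary 256-bit value for λ) and does not model leakage; it merely shows that a dedicated squaring manipulates the bytes of λ in roughly half as many products as a generic multiplication.

```python
def to_limbs(x, n):
    """Split x into n little-endian 8-bit limbs."""
    return [(x >> (8 * i)) & 0xFF for i in range(n)]

def schoolbook_mul(a, b):
    """Generic limb-wise product; returns (value, number of 8x8 products)."""
    acc, count = 0, 0
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            acc += (ai * bj) << (8 * (i + j))
            count += 1
    return acc, count

def schoolbook_sqr(a):
    """Squaring that computes each cross product a[i]*a[j] (i < j) once and doubles it."""
    acc, count = 0, 0
    for i, ai in enumerate(a):
        acc += (ai * ai) << (16 * i)
        count += 1
        for j in range(i + 1, len(a)):
            acc += (2 * ai * a[j]) << (8 * (i + j))
            count += 1
    return acc, count

lam = (1 << 256) - 189          # arbitrary 256-bit example value
limbs = to_limbs(lam, 32)
mul_value, mul_count = schoolbook_mul(limbs, limbs)
sqr_value, sqr_count = schoolbook_sqr(limbs)
print(mul_count, sqr_count)
```

For n limbs the generic product uses n·n partial products while the squaring uses n(n+1)/2, so fewer intermediate values depend on each byte of λ.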

Fig. 12. Comparison of SASCA on homogeneous projective coordinates randomization and Jacobian projective coordinates randomization.

9

Worst-Case Evaluation of 32-bit Implementations

In this section, we translate our previous investigation on 8-bit implementations to the practically relevant case of 32-bit devices. First, we emphasize that implementing an actual attack against a 32-bit implementation is much more challenging. While profiling 32-bit leakage is feasible by using a linear-regression-based approach, exploitation is very demanding, since it would require enumerating $2^{32}$ values, leading to large computation and storage efforts. Some workarounds are possible by trading computational complexity for algorithmic noise. As we focus on worst-case security, we analyze the information obtained by a strong attacker able to perform attacks on 32-bit leakages directly. For that purpose, we use the LRPM. As shown in Sect. 6 in Figs. 6 and 7, the LRPM provides a lower bound on the guessing entropy after the attack. Based on this observation, we bound the guessing entropy when targeting a 32-bit implementation for different attacks (D&C and SASCA) and for different SNR levels. The results of this analysis are plotted in Fig. 13 for high (left) and low (right) SNR cases. First, we note that when the SNR is low, the gain of analytical strategies over D&C strategies is still very limited, as for the 8-bit case. More interestingly, we observe that implementations of the point randomization on 32-bit devices are able to resist worst-case adversaries, as shown by the pessimistic lower bounds on the guessing entropy in Fig. 13. Concretely, for an SNR ≈ 0.9, as measured in a recent attack targeting an STM32F405 [21], it is possible to achieve excellent concrete security even against attackers exploiting all the possible leakage of the point randomization. This is a positive result that suggests that securing 32-bit ECC implementations (e.g., on ARM devices) against very powerful attackers might be feasible in practice.

Fig. 13. Entropy lower bound comparison of SASCA on the full graph of the field multiplication, and the HD&C attack for a 32-bit implementation with $\lambda \in \mathbb{F}_{2^{256}}$: (a) high SNR, (b) low SNR.

10

Conclusion and Future Works

In this work, we investigated the security of the point randomization countermeasure w.r.t. worst-case attackers. First, we showed how to apply SASCA when targeting point randomization and adapted a recent and efficient evaluation methodology for SASCA using the LRPM to this asymmetric use case. Second, using that model, we showed that, for realistic noise levels for 8-bit devices, there is almost no gain in mounting a complex SASCA making use of all the available information compared to a simpler horizontal D&C attack that can be complemented with enumeration. As a result, we estimated the SNR required by implementations in small embedded devices to be secure. Somewhat surprisingly, we observed that the point randomization technique can be implemented quite securely even in 8-bit devices. We then performed practical experiments against basic and optimized implementations to further illustrate the impact of performance optimizations: while the naive implementation is shown to be broken, the optimized one leads to ranks that are not reachable with enumeration. Finally, we provided guessing entropy lower bounds for challenging attacks on 32-bit implementations, which again confirm the resistance of point randomization against side-channel attacks. Interestingly, our results indicate that the point randomization on 32-bit devices provides excellent concrete security against worst-case adversaries. This leads to the positive conclusion that secure 32-bit ECC implementations might be feasible in practice.

While we studied different implementation options of the point randomization, there is still room for analyzing other options to implement elliptic curve based systems and evaluate the security provided by side-channel countermeasures. For instance, […] then this additional leakage that does not depend on the secret scalar bit could potentially be exploited. Besides, different multiplication algorithms have been described in the literature. We focused on multiplications that share the same abstract architecture as a classical schoolbook multiplication, but other long-integer multiplication algorithms such as the Karatsuba algorithm [22] or modular multiplication methods such as the Montgomery multiplication [27] remain to be studied.

Acknowledgement. François-Xavier Standaert is a senior research associate of the Belgian fund for scientific research (FNRS-F.R.S.). This work has been funded in parts by the ERC project 724725 (SWORD) and by the European Commission through the H2020 project 731591 (acronym REASSURE). The authors acknowledge the support from the Singapore National Research Foundation ("SOCure" grant NRF2018NCRNCR002-0001 – www.green-ic.org/socure).

A

Factorization of fadd

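The LRPM rules for information propagation for factors with multiple outputs are deduced from the factorization of a factor with two outputs into two factors with one output each, as shown by the diagram below for the addition operation:

[Diagram: on the left, a single factor $f_{add}$ connected to the inputs in1, in2 and the outputs out1, out2; on the right, two factors $f^1_{add}$ and $f^2_{add}$, both connected to in1 and in2, with $f^1_{add}$ connected to out1 and $f^2_{add}$ connected to out2.]

Here in1 and in2 refer to the two inputs of the addition, out1 to the result of the addition and out2 to the output carry bit. The add factor is then defined as:

$$f_{add}(in1, in2, out1, out2) = \begin{cases} 1 & \text{if } out1 = (in1 + in2) \bmod 256 \text{ and } out2 = \lfloor (in1 + in2)/256 \rfloor \\ 0 & \text{otherwise} \end{cases}$$

The add factor can be factorized into $f^1_{add}$ and $f^2_{add}$, which are defined as:

$$f^1_{add}(in1, in2, out1) = \begin{cases} 1 & \text{if } out1 = (in1 + in2) \bmod 256 \\ 0 & \text{otherwise} \end{cases}$$

$$f^2_{add}(in1, in2, out2) = \begin{cases} 1 & \text{if } out2 = \lfloor (in1 + in2)/256 \rfloor \\ 0 & \text{otherwise} \end{cases}$$

The LRPM propagation rules applied to the factorized factor yield for the variable in1:

$$MI_{f^1_{add} \to in1} = MI_{in2} \times MI_{out1} \quad \text{and} \quad MI_{f^2_{add} \to in1} = MI_{in2} \times MI_{out2}$$

Since information at a variable node is summed, we have:

$$MI_{(f^1_{add}, f^2_{add}) \to in1} = MI_{f_{add} \to in1} = MI_{in2} \times (MI_{out1} + MI_{out2})$$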
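The factorization can be checked numerically. The Python sketch below encodes the three factors as indicator functions and verifies that $f_{add}$ equals the product of $f^1_{add}$ and $f^2_{add}$ for all input pairs and a handful of candidate outputs (an exhaustive check over all outputs is omitted only to keep the run short). The helper `mi_to_in1` and the MI values passed to it are hypothetical and simply restate the summed rule above.

```python
def f_add(in1, in2, out1, out2):
    """Two-output add factor: result byte and carry bit must both match."""
    s = in1 + in2
    return int(out1 == s % 256 and out2 == s // 256)

def f_add1(in1, in2, out1):
    """Single-output factor for the result byte."""
    return int(out1 == (in1 + in2) % 256)

def f_add2(in1, in2, out2):
    """Single-output factor for the carry bit."""
    return int(out2 == (in1 + in2) // 256)

def factorization_holds():
    """Check f_add == f_add1 * f_add2 for all inputs and a few candidate outputs."""
    for in1 in range(256):
        for in2 in range(256):
            s = in1 + in2
            for out1 in (s % 256, (s + 1) % 256, 0, 255):
                for out2 in (0, 1):
                    lhs = f_add(in1, in2, out1, out2)
                    rhs = f_add1(in1, in2, out1) * f_add2(in1, in2, out2)
                    if lhs != rhs:
                        return False
    return True

def mi_to_in1(mi_in2, mi_out1, mi_out2):
    """Information sent to in1: the messages of the two single-output factors are summed."""
    return mi_in2 * mi_out1 + mi_in2 * mi_out2

print(factorization_holds(), mi_to_in1(0.5, 0.2, 0.1))
```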

References

1. Archambeau, C., Peeters, E., Standaert, F.-X., Quisquater, J.-J.: Template attacks in principal subspaces. In: Goubin, L., Matsui, M. (eds.) CHES 2006. LNCS, vol. 4249, pp. 1–14. Springer, Heidelberg (2006). https://doi.org/10.1007/11894063_1
2. Battistello, A., Coron, J.-S., Prouff, E., Zeitoun, R.: Horizontal side-channel attacks and countermeasures on the ISW masking scheme. In: Gierlichs, B., Poschmann, A.Y. (eds.) CHES 2016. LNCS, vol. 9813, pp. 23–39. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-53140-2_2
3. Bauer, A., Jaulmes, E., Prouff, E., Wild, J.: Horizontal collision correlation attack on elliptic curves. In: Lange, T., Lauter, K., Lisoněk, P. (eds.) SAC 2013. LNCS, vol. 8282, pp. 553–570. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-43414-7_28
4. Bronchain, O., Standaert, F.-X.: Side-channel countermeasures' dissection and the limits of closed source security evaluations. IACR Cryptology ePrint Archive 2019:1008 (2019)
5. Cassiers, G., Standaert, F.-X.: Towards globally optimized masking: From low randomness to low noise rate or probe isolating multiplications with reduced randomness and security against horizontal attacks. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2019(2), 162–198 (2019)
6. Clavier, C., Feix, B., Gagnerot, G., Roussellet, M., Verneuil, V.: Horizontal correlation analysis on exponentiation. In: Soriano, M., Qing, S., López, J. (eds.) ICICS 2010. LNCS, vol. 6476, pp. 46–61. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-17650-0_5
7. Clavier, C., Joye, M.: Universal exponentiation algorithm. In: Cryptographic Hardware and Embedded Systems - CHES 2001, Third International Workshop, Paris, France, 14–16 May 2001, Proceedings, pp. 300–308 (2001)
8. Coron, J.-S.: Resistance against differential power analysis for elliptic curve cryptosystems. In: Koç, Ç.K., Paar, C. (eds.) CHES 1999. LNCS, vol. 1717, pp. 292–302. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48059-5_25
9. Durvaux, F., Standaert, F.-X.: From improved leakage detection to the detection of points of interests in leakage traces. In: Fischlin, M., Coron, J.-S. (eds.) EUROCRYPT 2016. LNCS, vol. 9665, pp. 240–262. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49890-3_10
10. Fischer, W., Giraud, C., Knudsen, E.W., Seifert, J.-P.: Parallel scalar multiplication on general elliptic curves over Fp hedged against non-differential side-channel attacks. IACR Cryptology ePrint Archive, 2002:7 (2002)
11. Glowacz, C., Grosso, V., Poussier, R., Schüth, J., Standaert, F.-X.: Simpler and more efficient rank estimation for side-channel security assessment. In: Leander, G. (ed.) FSE 2015. LNCS, vol. 9054, pp. 117–129. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-48116-5_6
12. Green, J., Roy, A., Oswald, E.: A systematic study of the impact of graphical models on inference-based attacks on AES. Cryptology ePrint Archive, Report 2018/671 (2018). https://eprint.iacr.org/2018/671
13. Grosso, V., Standaert, F.-X.: Masking proofs are tight and how to exploit it in security evaluations. In: Nielsen, J.B., Rijmen, V. (eds.) EUROCRYPT 2018. LNCS, vol. 10821, pp. 385–412. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-78375-8_13
14. Grosso, V., Standaert, F.-X.: ASCA, SASCA and DPA with enumeration: which one beats the other and when? Cryptology ePrint Archive, Report 2015/535 (2015). https://eprint.iacr.org/2015/535
15. Guo, Q., Grosso, V., Standaert, F.-X.: Modeling soft analytical side-channel attacks from a coding theory viewpoint. Cryptology ePrint Archive, Report 2018/498 (2018). https://eprint.iacr.org/2018/498
16. Hanley, N., Kim, H.S., Tunstall, M.: Exploiting collisions in addition chain-based exponentiation algorithms using a single trace. In: Nyberg, K. (ed.) CT-RSA 2015. LNCS, vol. 9048, pp. 431–448. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16715-2_23
17. Hutter, M., Schwabe, P.: NaCl on 8-bit AVR microcontrollers. In: Youssef, A., Nitaj, A., Hassanien, A.E. (eds.) AFRICACRYPT 2013. LNCS, vol. 7918, pp. 156–172. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38553-7_9
18. Hutter, M., Wenger, E.: Fast multi-precision multiplication for public-key cryptography on embedded microprocessors. In: Preneel, B., Takagi, T. (eds.) CHES 2011. LNCS, vol. 6917, pp. 459–474. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23951-9_30
19. Joye, M., Yen, S.-M.: The Montgomery powering ladder. In: Kaliski, B.S., Koç, Ç.K., Paar, C. (eds.) CHES 2002. LNCS, vol. 2523, pp. 291–302. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-36400-5_22
20. Pearl, J.: Reverend Bayes on inference engines: a distributed hierarchical approach. In: Proceedings of the Second AAAI Conference on Artificial Intelligence, AAAI 1982, pp. 133–136. AAAI Press (1982)
21. Kannwischer, M.J., Pessl, P., Primas, R.: Single-trace attacks on Keccak. Cryptology ePrint Archive, Report 2020/371 (2020). https://eprint.iacr.org/2020/371
22. Karatsuba, A., Ofman, Yu.: Multiplication of many-digital numbers by automatic computers. Dokl. Akad. Nauk SSSR 145, 293–294 (1962)
23. Kim, T.H., Takagi, T., Han, D.-G., Kim, H.W., Lim, J.: Side channel attacks and countermeasures on pairing based cryptosystems over binary fields. In: Pointcheval, D., Mu, Y., Chen, K. (eds.) CANS 2006. LNCS, vol. 4301, pp. 168–181. Springer, Heidelberg (2006). https://doi.org/10.1007/11935070_11
24. Kocher, P.C.: Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems. In: Koblitz, N. (ed.) CRYPTO 1996. LNCS, vol. 1109, pp. 104–113. Springer, Heidelberg (1996). https://doi.org/10.1007/3-540-68697-5_9
25. Koziel, B., Ackie, A.-B., El Khatib, R., Azarderakhsh, R., Mozaffari-Kermani, M.: Sike'd up: fast and secure hardware architectures for supersingular isogeny key encapsulation. Cryptology ePrint Archive, Report 2019/711 (2019). https://eprint.iacr.org/2019/711
26. MacKay, D.J.C.: Information Theory, Inference & Learning Algorithms. Cambridge University Press, New York (2002)
27. Montgomery, P.: Modular multiplication without trial division. Math. Comput. 44, 519–521 (1985)
28. Nascimento, E., Chmielewski, Ł.: Applying horizontal clustering side-channel attacks on embedded ECC implementations. In: Eisenbarth, T., Teglia, Y. (eds.) CARDIS 2017. LNCS, vol. 10728, pp. 213–231. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-75208-2_13
29. Poussier, R., Zhou, Y., Standaert, F.-X.: A systematic approach to the side-channel analysis of ECC implementations with worst-case horizontal attacks. In: Fischer, W., Homma, N. (eds.) CHES 2017. LNCS, vol. 10529, pp. 534–554. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66787-4_26
30. Primas, R., Pessl, P., Mangard, S.: Single-trace side-channel attacks on masked lattice-based encryption. In: Fischer, W., Homma, N. (eds.) CHES 2017. LNCS, vol. 10529, pp. 513–533. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66787-4_25
31. NIST FIPS PUB. 186–2: Digital signature standard (DSS). National Institute for Standards and Technology (2000)
32. Veyrat-Charvillon, N., Gérard, B., Renauld, M., Standaert, F.-X.: An optimal key enumeration algorithm and its application to side-channel attacks. In: Knudsen, L.R., Wu, H. (eds.) SAC 2012. LNCS, vol. 7707, pp. 390–406. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-35999-6_25
33. Veyrat-Charvillon, N., Gérard, B., Standaert, F.-X.: Security evaluations beyond computing power. In: Johansson, T., Nguyen, P.Q. (eds.) EUROCRYPT 2013. LNCS, vol. 7881, pp. 126–141. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38348-9_8
34. Veyrat-Charvillon, N., Gérard, B., Standaert, F.-X.: Soft analytical side-channel attacks. In: Sarkar, P., Iwata, T. (eds.) ASIACRYPT 2014. LNCS, vol. 8873, pp. 282–296. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-45611-8_15

Efficient Hardware Implementations for Elliptic Curve Cryptography over Curve448

Mojtaba Bisheh Niasar1(B), Reza Azarderakhsh1,2, and Mehran Mozaffari Kermani3

1 Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL, USA {mbishehniasa2019,razarderakhsh}@fau.edu
2 PQSecure Technologies, LLC, Boca Raton, FL, USA
3 Department of Computer Science and Engineering, University of South Florida, Tampa, FL, USA [email protected]

Abstract. In this paper, we present different implementations of point multiplication over Curve448. Curve448 has recently been recommended by NIST to provide 224-bit security over elliptic curve cryptography. Although implementing high-security cryptosystems should be considered due to recent improvements in cryptanalysis, hardware implementation of Curve448 has been investigated in only a few studies. Hence, in this study, we propose three variable-base-point FPGA-based Curve448 implementations, i.e., lightweight, area-time efficient, and high-performance architectures, which are intended for different applications. Synthesized on a Xilinx Zynq 7020 FPGA, our proposed high-performance design increases throughput by 12%, executing 1,219 point multiplications per second, and increases efficiency by 40% in terms of required clock cycles × utilized area compared to the best previous work. Furthermore, the proposed lightweight architecture runs at 250 MHz and saves 96% of resources with the same performance. Additionally, our area-time efficient design considers a trade-off between time and required resources, which shows a 48% efficiency improvement with 52% fewer resources. Finally, effective side-channel countermeasures are added to our proposed designs, which also outperform previous works.

Keywords: Curve448 · Elliptic curve cryptography · FPGA · Hardware security · Implementation · Point multiplication · Side-channel

1

Introduction

Elliptic curve cryptography (ECC) has gained prominent attention among asymmetric cryptographic algorithms due to its short key size. ECC is mostly implemented in Internet-of-Things (IoT) devices considering their limited power resources and processing units. Recently, to address some backdoor issues in ECC constructions due to advances in cryptanalysis and classical attacks, new NIST [1] and IETF [2] recommendations make Curve25519 and Curve448 suitable for higher-level security requirements. Although we are confident in the security of ECC over prime fields, there is always the possibility that algorithmic improvements reduce the computation required to break ECC. Therefore, moving to a higher level of security will help to keep a margin against unknown attack improvements. However, higher security levels come with a performance penalty, and industry often resists them. Hence, we need to provide a level of security that is feasible subject to the performance requirements of the target application, such as high-end servers or constrained devices.

© Springer Nature Switzerland AG 2020. K. Bhargavan et al. (Eds.): INDOCRYPT 2020, LNCS 12578, pp. 228–247, 2020. https://doi.org/10.1007/978-3-030-65277-7_10

According to Shor's algorithm [3], most of the current cryptosystems will be broken by quantum computing. Hence, Post-Quantum Cryptography (PQC) algorithms are going to replace the classic public key cryptography algorithms. PQC based on elliptic curves is available, for example, in [4–6]. However, the transition to PQC includes an emerging field called hybrid systems, which require both classic cryptography and PQC [7]. Hence, ECC is going to be used in the hybrid mode for maintaining accordance with industry or government regulations until PQC updates are applied completely. Therefore, classical cryptosystems cannot be eliminated even if PQC is significantly developed, so designing high-security ECC is crucial.

As a part of the Transport Layer Security (TLS) [8], Curve448, designed by Hamburg in 2015 [9,10], provides 224-bit security. Moreover, Curve448 belongs to [11], and Safe-Curve policies are considered in its design procedures. Although Curve25519 has been widely investigated in recent years for different applications [12–15], there are few FPGA-based Curve448 implementations due to its optimal software-based design.

The closest related works that can be directly compared to ours are the design proposed by Sasdrich and Güneysu in [16], and the protected architecture in [13], which adds a re-randomization countermeasure to make the scheme resistant against horizontal attacks. In these works, a field arithmetic unit based on schoolbook multiplication is designed by cascading an arithmetic core and a reduction core employing 28 and 5 DSP blocks, respectively. These architectures heavily rely on the DSP blocks, which allows them to run at high operating frequencies (i.e., 357 and 341 MHz). Furthermore, Shah et al. in [17] proposed a LUT-based scheme for high-performance point multiplication employing the most significant digit multiplier. Moreover, our FPGA-based design for Curve448 is presented in [18].

Other FPGA implementations of ECC for Montgomery curves in the literature cannot be directly compared to ours, because they target different curves. Ananyi et al. in [19] introduced a flexible hardware ECC processor over several NIST prime fields. Furthermore, Alrimeih et al. designed an enhanced architecture over different NIST prime fields up to 521 bits in [20], including several countermeasures.

Based on the aforementioned discussions, implementation gaps are identified in: (i) the need for exploration of the different trade-offs between resource utilization and performance considering different optimization goals, and (ii) the lack of designs employing the Karatsuba-friendly property of Curve448.

Our Contributions: To the best of our knowledge, there appear to be extremely few hardware implementations that focus only on Curve448 and make the best of all its features. The main contributions of this work are as follows:

– We investigate three design strategies to port Curve448 to various platforms with different design goals (i.e., time-constrained, area-constrained, and area-time trade-off applications) using a precise schedule corresponding to each architecture. Hence, different modular multiplication and addition/subtraction modules are developed, particularly tailored to a Xilinx Zynq 7020 FPGA, to perform variable-base-point multiplications. Furthermore, all schemes are extended by side-channel countermeasures.
– The proposed architectures are combined with interleaved multiplication and reduction employing redundant number representation and refined Karatsuba multiplication to increase efficiency in comparison with those presented in previous work.
– Our proposed architectures outperform the counterparts available in the literature.

The rest of this paper is organized as follows: In Sect. 2, some relevant mathematical background and side-channel considerations are reviewed. In Sect. 3, the proposed architectures are investigated. In Sect. 4, our proposed FPGA implementations are detailed. In Sect. 5, the results and comparison with other works are discussed. Finally, we conclude this paper in Sect. 6.

2

Preliminaries

In this section, the mathematical background of ECC will be covered briefly. Additionally, Curve448 and its specifications will be introduced. Then, the respective side-channel analysis attack protection will be described.

2.1

Field Arithmetic and ECDH Key Exchange

The Galois field GF(p) is the finite field whose elements are {0, 1, ..., p − 1}. Curve448 over GF(p) is defined by $E: y^2 + x^2 \equiv 1 + d x^2 y^2 \bmod p$ where $p = 2^{448} - 2^{224} - 1$ and $d = -39081$. This curve is a Montgomery curve and also an untwisted Edwards curve, which is called Edwards448 [2]. Curve448 specifications can be employed to speed up the elliptic curve Diffie-Hellman (ECDH) key exchange. The fact that 448 = 7 × 64 = 14 × 32 = 28 × 16 = 56 × 8 provides more flexibility to design efficient architectures for different platforms. Additionally, due to its Solinas prime with golden ratio $\phi = 2^{224}$, fast Karatsuba multiplication can be performed as follows:

$$
\begin{aligned}
C = A \cdot B &= (a_1\phi + a_0) \cdot (b_1\phi + b_0) \\
&= a_1 b_1 \phi^2 + a_0 b_0 + \big((a_1 + a_0) \cdot (b_1 + b_0) - a_1 b_1 - a_0 b_0\big)\phi \\
&\equiv (a_1 b_1 + a_0 b_0) + \big((a_1 + a_0) \cdot (b_1 + b_0) - a_0 b_0\big)\phi \pmod{p}
\end{aligned}
\tag{1}
$$

where $A = (a_1\phi + a_0)$, $B = (b_1\phi + b_0)$, and $A, B, C \in GF(p)$. The last step uses $\phi^2 \equiv \phi + 1 \pmod{p}$, which follows directly from the form of p.

To implement modular inversion over GF(p) using Fermat's Little Theorem (FLT), $a^{-1} \equiv a^{p-2} \bmod p$ is computed by consecutive operations including 447 squarings and 15 multiplications. To generate a shared secret key Q between two parties through an insecure channel, e.g., the internet, the ECDH key exchange protocol can be implemented using elliptic curve point multiplication (ECPM) Q = k · P over Curve448, where k and P are a secret scalar and a known base point, respectively. Moreover, public keys of Curve448 are reasonably short and do not require validation as long as the resulting shared secret is not zero.

2.2

Group Arithmetic and Montgomery Ladder

Scalar multiplication is broken down into 448 iterations considering the bit values of k. To perform efficient scalar multiplication over the Montgomery curve, the Montgomery ladder was introduced [21] to perform one point addition (PA) and one point doubling (PD) in each iteration. Furthermore, using this method in projective coordinates increases efficiency. If $P = (x_p, y_p)$ is a base point in affine coordinates, it can be transformed to projective coordinates such that $P = (X, Y, Z)$ where $x_p = X/Z$ and $y_p = Y/Z$. Suppose $P_1 = (X_1, Z_1)$ and $P_2 = (X_2, Z_2)$ are two points in projective coordinates. Then $P_1 + P_2$ and $2P_1$ are computed by the following equations:

$$X_{PD} = (X_1 - Z_1)^2 \cdot (X_1 + Z_1)^2 \tag{2}$$

$$Z_{PD} = 4X_1Z_1 \cdot (X_1^2 + 39081X_1Z_1 + Z_1^2) \tag{3}$$

$$X_{PA} = 4(X_1X_2 - Z_1Z_2)^2 \tag{4}$$

$$Z_{PA} = 4x_p(X_1Z_2 - Z_1X_2)^2 \tag{5}$$

2.3

Side-Channel Protection

To implement a resistant architecture against side-channel analysis (SCA) attacks, different considerations are taken into account. Hence, several countermeasures should be embedded in the cryptographic implementations to prevent information leakage. Two basic protection including (i) constant-time implementation against timing attack, and (ii) secret-independent implementation against simple power analysis (SPA) can be achieved by performing inherently resistant algorithms. Hence, the proposed architecture is resistant against timing attacks and SPA attacks due to performing a constant number of operations in each


M. Bisheh Niasar et al.

iteration of the Montgomery ladder and employs constant-time FLT inversion. Furthermore, countermeasures introduced by Coron [22] are adopted to thwart differential power analysis (DPA) attacks. These countermeasures include point randomization and scalar blinding, which randomize the two terms of the scalar multiplication $Q = k \cdot P$.

Point Randomization: Point randomization is achieved by exploiting the degree of freedom in the projective representation of the base point. In this method, the base point $P = (X, Z)$ is projected from affine coordinates using a random value $\lambda \in \mathbb{Z}_{2^{448}} \setminus \{0\}$ such that $P_r = (\lambda \cdot X, \lambda \cdot Z)$. The scalar multiplication output is not changed, as shown in (6):

$$x_p = \frac{\lambda X}{\lambda Z} = \frac{X}{Z} \tag{6}$$

Point randomization yields a different point representation for each random value $\lambda$, which hinders information extraction by statistical analysis.

Scalar Blinding: The second term of the scalar multiplication is randomized by scalar blinding. In this method, a multiple of the group order $\#E$ is added to $k$ such that $k_r = k + r \times \#E$, where $r$ is a random value. Since multiplying the base point by the group order yields the point at infinity $\mathcal{O}$, the correctness of scalar blinding follows from:

$$k_r \cdot P = (k + r \times \#E) \cdot P = k \cdot P + r \cdot \mathcal{O} = k \cdot P \tag{7}$$

This computation removes the data dependency between the swap function in the Montgomery ladder and the corresponding bit of $k$.
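The ladder and the point-randomization identity (6) can be illustrated with a short functional sketch. It is not the authors' hardware design. It assumes the usual Curve448 parameters (prime $2^{448} - 2^{224} - 1$, Montgomery coefficient 156326, base u-coordinate 5), uses the textbook x-only doubling with the curve coefficient rather than the constant as printed in Eq. (3), and the names `xdbl`, `xadd`, and `ladder` are hypothetical. Python integers and the tuple swap are not constant time, so the sketch only checks functional behaviour.

```python
# Functional sketch only: NOT constant time, not a hardened implementation.
P448 = 2**448 - 2**224 - 1   # assumed field prime of Curve448
A = 156326                   # assumed Montgomery coefficient: v^2 = u^3 + A*u^2 + u


def xdbl(X, Z):
    """x-only doubling: X' = (X^2 - Z^2)^2, Z' = 4XZ(X^2 + A*XZ + Z^2)."""
    XX, ZZ, XZ = X * X % P448, Z * Z % P448, X * Z % P448
    return (XX - ZZ) ** 2 % P448, 4 * XZ * (XX + A * XZ + ZZ) % P448


def xadd(X1, Z1, X2, Z2, Xd, Zd):
    """Differential addition; (Xd : Zd) is the x-coordinate of P1 - P2."""
    X3 = Zd * (X1 * X2 - Z1 * Z2) ** 2 % P448
    Z3 = Xd * (X1 * Z2 - Z1 * X2) ** 2 % P448
    return X3, Z3


def ladder(k, xp, lam=1, bits=448):
    """Return the affine x-coordinate of k*P for a point P with x-coordinate xp.

    lam is the point-randomization factor: the base point is represented
    projectively as (lam*xp : lam) instead of (xp : 1).
    """
    Xd, Zd = lam * xp % P448, lam % P448
    R0, R1 = (1, 0), (Xd, Zd)            # R0 = point at infinity, R1 = P
    for i in reversed(range(bits)):
        bit = (k >> i) & 1
        if bit:                          # conditional swap (leaky in Python)
            R0, R1 = R1, R0
        R0, R1 = xdbl(*R0), xadd(*R0, *R1, Xd, Zd)   # one PD and one PA
        if bit:
            R0, R1 = R1, R0
    X, Z = R0
    return X * pow(Z, P448 - 2, P448) % P448          # FLT inversion of Z
```

Every iteration performs exactly one doubling and one differential addition regardless of the key bit, which is the regularity the text relies on; the final division uses the FLT exponent $p - 2$.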

3 Proposed Algorithm and Architecture

In this section, the proposed algorithms for three performance levels (lightweight, area-time efficient, and high-performance designs) are covered, together with their architectures and modular arithmetic units. The top-level architecture used in our schemes for the different design strategies is illustrated in Fig. 1. It is composed of three stages: (i) the dedicated controller stage, including an FSM and program ROM, (ii) the field arithmetic units in the middle stage, including modular addition/subtraction and modular multiplication, and (iii) a memory stage to store the intermediate results. All stages are customized for the corresponding design strategy to increase efficiency. In the lightweight and area-time efficient designs, a major determinant of efficiency is matching the compute power to the memory bandwidth, so that in most cycles both the compute units and the memory units do useful work. In the high-performance design, however, more resources are used to reduce the computation time.

Efficient Hardware Implementations for Elliptic Curve Cryptography


Implementing the Karatsuba multiplication requires a complex memory access pattern. Therefore, in the lightweight architecture, the product scanning approach is proposed to perform repetitive operations with shared resources. In Design II and III, Karatsuba-based modular multiplication is employed.

[Figure: block diagrams labeled "Design I" and "Design II", with inputs P (base point) and k; only signal labels and bus widths survive extraction.]

[...] 0. Then $(S_u \| S_v \| S_w)_k = (S_u \| F(S_v) \| S_w)_k$.

Proof. By Lemma 2 we have

$$(S_u \| S_v \| S_w)_k = (S_u)_k \cdot k^{\ell_v + \ell_w} + (S_v)_k \cdot k^{\ell_w} + (S_w)_k$$
$$= (S_u)_k \cdot k^{\ell_v + \ell_w} + (F(S_v))_k \cdot k^{\ell_w} + (S_w)_k$$
$$= (S_u \| F(S_v) \| S_w)_k.$$

Theorem 1. Let $S_\ell = (a_{\ell-1}, \ldots, a_1, a_0)$ be the k-ary representation of $N \in \mathbb{Z}^+$ with $k \nmid N$. Let $S_n \| \cdots \| S_1$ be the decomposition of $S_\ell$ into GoodStrings and BadStrings given by Lemma 1. Then $N = (S_n \| \cdots \| S_1)_k = (F(S_n) \| \cdots \| F(S_1))_k$.

Remark 3. For all $1 \le i \le n$, the entries of $F(S_i)$ lie in the set $\{\pm 1, \ldots, \pm(k-1)\}$.

Proof. By definition of the $a_i$ we have $N = (a_{\ell-1}, a_{\ell-2}, \ldots, a_1, a_0)_k = (S_n \| \cdots \| S_1)_k$. We define a new $\ell$-String from the $S_i$ as $S'_\ell := (F(S_n) \| \cdots \| F(S_1)) = (b_{\ell-1}, b_{\ell-2}, \ldots, b_1, b_0)$, so that $b_i \neq 0$ for all $i$ by the definition of $F$. We apply Lemma 3 to each of the $n$ GoodStrings or BadStrings in $S_\ell$, each time replacing $S_i$ by $F(S_i)$. After $n$ steps the recoded string is $(F(S_n) \| \cdots \| F(S_1))$. At each of these steps the k-ary value of the updated string remains the same, by the equation $(S_u \| S_v \| S_w)_k = (S_u \| F(S_v) \| S_w)_k$ of Lemma 3. The applications of Lemma 3 fall into three settings:

Extending the Signed Non-zero Bit and Sign-Aligned Columns


1. We begin by applying the function $F$ of Definition 5 to $S_n$, setting $S_v = S_n$, $S_u$ to be the empty string, and $S_w = S_{n-1} \| \cdots \| S_1$.
2. In the intermediate steps $2 \le i \le n-1$, we set $S_u = F(S_n) \| \cdots \| F(S_{i+1})$, $S_v = S_i$, and $S_w = S_{i-1} \| \cdots \| S_1$.
3. At the final step we set $S_u = F(S_n) \| \cdots \| F(S_2)$, $S_v = S_1$, and $S_w$ to be the empty string.

Therefore, we can conclude $N = (S_n \| \cdots \| S_1)_k = (F(S_n) \| \cdots \| F(S_1))_k$.

2.2 Recoding Algorithm

In the previous section we built a theory for replacing all zero entries within an $\ell$-String with signed non-zero entries so that the k-ary value of the $\ell$-String remains unchanged. To do this we trace all BadStrings from left to right (only BadStrings contain zeroes in an $\ell$-String) and replace each of them with a GoodString. Because the lengths of the GoodStrings and BadStrings are very likely to differ, this tracing process is irregular. An algorithm that follows the tracing process directly would inherit this lack of uniformity and be vulnerable to simple power analysis attacks. Thus, to gain some degree of resistance against such attacks, we need an algorithm that replaces BadStrings by GoodStrings in a uniform manner while keeping the k-ary value unchanged. To give it a regular structure, we use the lookup table given in Table 1, called the k-ary table. While generating the signed non-zero digit representation of a given $\ell$-String $S_\ell = (a_{\ell-1}, \ldots, a_0)$, Algorithm 1 traces two entries $a_i, a_{i-1}$ from left to right and replaces $a_{i-1}$ by the $(a_i, a_{i-1})$-th entry of the k-ary table. Since each step reads two entries and replaces one entry by a table lookup, the algorithm is regular. We call this algorithm Recode. The output of Recode on input $(a_{\ell-1}, \ldots, a_1, a_0)$ is exactly

$$\mathrm{Recode}(a_{\ell-1}, \ldots, a_1, a_0) = (1, M_{(0,a_{\ell-1})}, M_{(a_{\ell-1},a_{\ell-2})}, \ldots, M_{(a_1,a_0)}). \tag{2}$$

Table 1. Lookup table where the entry in row x0 and column x1 represents the value of M(x0, x1).

x0 \ x1 |    0        1        2        3      ···      m      ···       k−1
--------+----------------------------------------------------------------------
   0    | −(k−1)   −(k−1)   −(k−2)   −(k−3)    ···   −(k−m)    ···   −(k−(k−1))
   1    |    1        1        2        3      ···      m      ···     (k−1)
   ⋮    |    ⋮        ⋮        ⋮        ⋮               ⋮                 ⋮
   j    |    1        1        2        3      ···      m      ···     (k−1)
   ⋮    |    ⋮        ⋮        ⋮        ⋮               ⋮                 ⋮
  k−1   |    1        1        2        3      ···      m      ···     (k−1)


A. Dutta et al.

Algorithm 1. Recode
Require: $k \in \mathbb{N}$ with $k \ge 2$, and a string of digits $(a_{\ell-1}, \ldots, a_1, a_0)$ with each $a_i \in \{0, 1, \ldots, k-1\}$ and $a_0 \neq 0$.
Ensure: A string $s = (s_\ell, s_{\ell-1}, \ldots, s_1, s_0)$ with $s_i \in \{\pm 1, \pm 2, \ldots, \pm(k-1)\}$, and $(s_\ell, s_{\ell-1}, \ldots, s_1, s_0)_k = (a_{\ell-1}, \ldots, a_1, a_0)_k$
1: $M \leftarrow \mathrm{GenTable}(k)$
2: for $i = 0$ to $\ell - 2$ do $s_i \leftarrow M_{(a_{i+1},a_i)}$ end for
3: $s_{\ell-1} \leftarrow M_{(0,a_{\ell-1})}$ and $s_\ell \leftarrow 1$
4: return $s$
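As an illustration, Algorithm 1 can be transcribed almost line by line. The sketch below stores digits least-significant first, so list index i holds $a_i$; the helper names `gen_table`, `value`, and `to_digits` are hypothetical, and the plain list indexing is not meant to be side-channel safe.

```python
def gen_table(k):
    """Build the k-ary table of Table 1, indexed as M[x0][x1]."""
    row0 = [-(k - 1)] + [-(k - x1) for x1 in range(1, k)]   # row x0 = 0
    rest = [1] + list(range(1, k))                          # every row x0 >= 1
    return [row0] + [list(rest) for _ in range(k - 1)]


def recode(a, k):
    """Algorithm 1 on a = [a_0, ..., a_{l-1}] (a_0 != 0); returns [s_0, ..., s_l]."""
    M = gen_table(k)
    l = len(a)
    s = [M[a[i + 1]][a[i]] for i in range(l - 1)]   # s_i = M(a_{i+1}, a_i)
    s.append(M[0][a[l - 1]])                        # s_{l-1} = M(0, a_{l-1})
    s.append(1)                                     # s_l = 1
    return s


def value(digits, k):
    """k-ary value of a digit list stored least-significant first."""
    return sum(d * k**i for i, d in enumerate(digits))


def to_digits(n, k):
    """Base-k digits of n > 0, least-significant first."""
    out = []
    while n:
        n, r = divmod(n, k)
        out.append(r)
    return out
```

For example, with k = 3 the input digits (0, 0, 1), written most-significant first, are recoded to (1, −2, −2, −2), whose 3-ary value is 27 − 18 − 6 − 2 = 1.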

Note that all rows of Table 1 after the first are identical. Therefore it can be compressed to a $2 \times k$ table, whose two rows are identical to the first two rows of Table 1. Using the compressed k-ary table in the above context, one can instead replace $a_{i-1}$ with the $(\chi(a_i), a_{i-1})$ entry of the compressed table, where $\chi$ is an indicator function (see Sect. 3.2). Alternatively, one might replace the entire table with an algebraic function which returns the appropriate value.

Before we state and prove the next lemma we introduce the following notation. For a string $S_{\ell+1} = (c_\ell, c_{\ell-1}, c_{\ell-2}, \ldots, c_0)$, we define $S_{\ell+1} \sim (\ell, \ell-1)$ to be the string $S_{\ell+1}$ with the entries of index $\ell$ and $\ell-1$ removed: $S_{\ell+1} \sim (\ell, \ell-1) := (c_{\ell-2}, \ldots, c_0)$.

Lemma 4. For $1 \le i \le n$, let $S_i = (c^i_{\ell_i-1}, \ldots, c^i_0)$ be either a GoodString or a BadString. Then

$$\mathrm{Recode}(S_n \| \cdots \| S_1) = \mathrm{Recode}(S_n) \,\|\, T_{n-1} \,\|\, \cdots \,\|\, T_1,$$

where

$$T_i = M_{(c^{i+1}_0,\, c^i_{\ell_i-1})} \,\|\, (\mathrm{Recode}(S_i) \sim (\ell_i, \ell_i-1)) \quad \text{for all } 1 \le i < n.$$

Note that each $\mathrm{Recode}(S_i)$ is an $(\ell_i+1)$-String, and so the removal operation $\sim$ is well-defined here.

Proof. Running Algorithm 1 with input $S_n \| \cdots \| S_1$ gives exactly the output $\mathrm{Recode}(S_n \| \cdots \| S_1) = \bar{S}_n \| \bar{S}_{n-1} \| \cdots \| \bar{S}_1$, where

$$\bar{S}_n = (1, M_{(0,\, c^n_{\ell_n-1})}, M_{(c^n_{\ell_n-1},\, c^n_{\ell_n-2})}, M_{(c^n_{\ell_n-2},\, c^n_{\ell_n-3})}, \ldots, M_{(c^n_1,\, c^n_0)})$$

and

$$\bar{S}_i = (M_{(c^{i+1}_0,\, c^i_{\ell_i-1})}, M_{(c^i_{\ell_i-1},\, c^i_{\ell_i-2})}, M_{(c^i_{\ell_i-2},\, c^i_{\ell_i-3})}, \ldots, M_{(c^i_1,\, c^i_0)}) \quad \text{for } i < n.$$

For $i < n$, we can directly compute

$$\bar{S}_i = (M_{(c^{i+1}_0,\, c^i_{\ell_i-1})}, M_{(c^i_{\ell_i-1},\, c^i_{\ell_i-2})}, \ldots, M_{(c^i_1,\, c^i_0)})$$
$$= M_{(c^{i+1}_0,\, c^i_{\ell_i-1})} \,\|\, \left[ (1, M_{(0,\, c^i_{\ell_i-1})}, M_{(c^i_{\ell_i-1},\, c^i_{\ell_i-2})}, \ldots, M_{(c^i_1,\, c^i_0)}) \sim (\ell_i, \ell_i-1) \right]$$
$$= M_{(c^{i+1}_0,\, c^i_{\ell_i-1})} \,\|\, \left[ \mathrm{Recode}(S_i) \sim (\ell_i, \ell_i-1) \right] = T_i.$$

Finally, by Eq. 2 we have $\bar{S}_n = \mathrm{Recode}(S_n)$.



Lemma 5. Let an $\ell$-String $S_\ell$ be written as $S_n \| \cdots \| S_1$ where $S_i = (c^i_{\ell_i-1}, \ldots, c^i_0)$ is either a GoodString or a BadString for all $1 \le i \le n$. We define

$$T_i = M_{(c^{i+1}_0,\, c^i_{\ell_i-1})} \,\|\, (\mathrm{Recode}(S_i) \sim (\ell_i, \ell_i-1))$$

for all $1 \le i < n$ as in Lemma 4. Then for all $i \in \{1, 2, \ldots, n-1\}$ we have $T_i = F(S_i)$, where $F$ is given in Definition 5.

Proof. By Eq. 2 the full output of the Recode algorithm on input $S_i$ is $\mathrm{Recode}(S_i) = (1, M_{(0,\, c^i_{\ell_i-1})}, \ldots, M_{(c^i_1,\, c^i_0)})$. We therefore have

$$T_i = M_{(c^{i+1}_0,\, c^i_{\ell_i-1})} \,\|\, (\mathrm{Recode}(S_i) \sim (\ell_i, \ell_i-1))$$
$$= (M_{(c^{i+1}_0,\, c^i_{\ell_i-1})}, M_{(c^i_{\ell_i-1},\, c^i_{\ell_i-2})}, M_{(c^i_{\ell_i-2},\, c^i_{\ell_i-3})}, \ldots, M_{(c^i_1,\, c^i_0)}).$$

Using the definition of $M$ we can directly compute the $M$ values above. Recall that each $c^i_j$ is nonzero when $S_i$ is a GoodString, each $c^i_j$ is zero for $j > 0$ when $S_i$ is a BadString, and every $c^i_0$ is always nonzero. For $1 < j < \ell_i$, we then have

$$M_{(c^{i+1}_0,\, c^i_{\ell_i-1})} = \begin{cases} 1 & \text{if } S_i \text{ is a BadString} \\ c^i_{\ell_i-1} & \text{if } S_i \text{ is a GoodString,} \end{cases}$$

$$M_{(c^i_j,\, c^i_{j-1})} = \begin{cases} -(k-1) & \text{if } S_i \text{ is a BadString} \\ c^i_{j-1} & \text{if } S_i \text{ is a GoodString,} \end{cases}$$

$$M_{(c^i_1,\, c^i_0)} = \begin{cases} -(k - c^i_0) & \text{if } S_i \text{ is a BadString} \\ c^i_0 & \text{if } S_i \text{ is a GoodString.} \end{cases}$$

By replacing each $M$ table lookup with its value, we get

$$T_i = \begin{cases} (1, -(k-1), \ldots, -(k-1), -(k - c^i_0)) & \text{if } S_i \text{ is a BadString} \\ (c^i_{\ell_i-1}, c^i_{\ell_i-2}, \ldots, c^i_1, c^i_0) & \text{if } S_i \text{ is a GoodString.} \end{cases}$$

The right hand side in the above equation is exactly $F(S_i)$.



Lemma 6. If $S_\ell$ is any GoodString or BadString, then $(\mathrm{Recode}(S_\ell))_k = (S_\ell)_k$.

Proof. Suppose first that $S_\ell$ is a BadString of length $\ell$ with $S_\ell = (0, 0, \ldots, 0, a)$ for some $a \in \{1, 2, \ldots, k-1\}$. Since $M_{(0,a)} = -(k-a)$ and $M_{(0,0)} = -(k-1)$, the output of Recode is the string of length $\ell + 1$

$$\mathrm{Recode}(S_\ell) = (1, -(k-1), -(k-1), \ldots, -(k-1), -(k-a)).$$

Computing the k-ary value results in a telescoping sum:

$$(\mathrm{Recode}(S_\ell))_k = [1 \cdot k^\ell] + [-(k-1) \cdot k^{\ell-1}] + \cdots + [-(k-1) \cdot k] + [-(k-a)]$$
$$= k^\ell + [-k^\ell + k^{\ell-1}] + \cdots + [-k^2 + k] + [-k + a] = a = (S_\ell)_k.$$

This proves the statement when $S_\ell$ is a BadString. Suppose now that $S_\ell$ is a GoodString with $S_\ell = (a_{\ell-1}, a_{\ell-2}, \ldots, a_1, a_0)$ for some $a_i \in \{1, 2, \ldots, k-1\}$ for all $0 \le i \le \ell-1$. Since $M_{(a_{i+1},a_i)} = a_i$, the output of Recode is mostly the same as the input, but we must account for the 0 that is initially prepended. We have $M_{(0,a_{\ell-1})} = -(k - a_{\ell-1})$, and so

$$\mathrm{Recode}(S_\ell) = (1, -(k - a_{\ell-1}), a_{\ell-2}, \ldots, a_1, a_0).$$

When taking the k-ary value, the "1" and the "k" in the expression above cancel each other:

$$(\mathrm{Recode}(S_\ell))_k = [1 \cdot k^\ell] + [-(k - a_{\ell-1}) \cdot k^{\ell-1}] + (a_{\ell-2}, \ldots, a_1, a_0)_k$$
$$= a_{\ell-1} \cdot k^{\ell-1} + (a_{\ell-2}, \ldots, a_1, a_0)_k = (S_\ell)_k.$$

This proves the statement when $S_\ell$ is a GoodString and concludes the proof.
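As a small worked instance of both cases of Lemma 6, take $k = 3$ and $\ell = 3$ (the digit strings are chosen here purely for illustration):

```latex
% BadString S_3 = (0, 0, 2): value 2
\mathrm{Recode}(0,0,2) = (1, -2, -2, -1), \qquad
1\cdot 3^3 - 2\cdot 3^2 - 2\cdot 3 - 1 = 27 - 18 - 6 - 1 = 2 .

% GoodString S_3 = (2, 1, 2): value 2\cdot 9 + 1\cdot 3 + 2 = 23
\mathrm{Recode}(2,1,2) = (1, -1, 1, 2), \qquad
1\cdot 3^3 - 1\cdot 3^2 + 1\cdot 3 + 2 = 27 - 9 + 3 + 2 = 23 .
```

In both cases the leading 1 contributes $k^\ell$, which is cancelled by the $-k \cdot k^{\ell-1}$ part of the next digit.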

Theorem 2. Let $S_\ell = (a_{\ell-1}, \ldots, a_0)$ be the k-ary representation of some $N \in \mathbb{N}$ with $k \nmid N$. Write $S_\ell = S_n \| \cdots \| S_1$ with each $S_i$ a GoodString or BadString. Then $\mathrm{Recode}(S_\ell) = \mathrm{Recode}(S_n) \| F(S_{n-1}) \| \cdots \| F(S_1)$, and
1. $(\mathrm{Recode}(S_\ell))_k = N$,
2. $\mathrm{Recode}(S_\ell)$ has entries in $\{\pm 1, \ldots, \pm(k-1)\}$,
3. the length of $\mathrm{Recode}(S_\ell)$ is $\ell + 1$.

Proof. As in Lemma 4 we define

$$T_i = M_{(c^{i+1}_0,\, c^i_{\ell_i-1})} \,\|\, (\mathrm{Recode}(S_i) \sim (\ell_i, \ell_i-1))$$

for each $1 \le i < n$. Applying Recode with Lemmas 4 and 5, we get

$$\mathrm{Recode}(S_\ell) = \mathrm{Recode}(S_n \| \cdots \| S_1) = \mathrm{Recode}(S_n) \| T_{n-1} \| \cdots \| T_1 = \mathrm{Recode}(S_n) \| F(S_{n-1}) \| \cdots \| F(S_1). \tag{3}$$

Taking the k-ary value, we have

$$(\mathrm{Recode}(S_\ell))_k = (\mathrm{Recode}(S_n) \| F(S_{n-1}) \| \cdots \| F(S_1))_k = (\mathrm{Recode}(S_n))_k \cdot k^{\sum_{j=1}^{n-1} \ell_j} + (F(S_{n-1}) \| \cdots \| F(S_1))_k.$$

Now we apply Theorem 1 to $(S_{n-1} \| \cdots \| S_1)_k$ and Lemma 6 to $\mathrm{Recode}(S_n)$ to get

$$(F(S_{n-1}) \| \cdots \| F(S_1))_k = (S_{n-1} \| \cdots \| S_1)_k, \qquad (\mathrm{Recode}(S_n))_k = (S_n)_k$$

and conclude $(\mathrm{Recode}(S_\ell))_k = (S_\ell)_k = N$, proving the first claim. To show claim 2, we examine Eq. (3). Each $F(S_i)$ has entries in $K = \{\pm 1, \ldots, \pm(k-1)\}$ by definition, and the proof of Lemma 6 explicitly computes the value of $\mathrm{Recode}(S_n)$, from which one can see that it also has values in $K$. Finally, Algorithm 1 assigns the entries $s_0, \ldots, s_\ell$ exactly once each, so the length of the output string is exactly $\ell + 1$, which shows claim 3.


Algorithm 2. Align
Require: $k \in \mathbb{N}$ with $k \ge 2$; a sign sequence $s = (s_\ell, s_{\ell-1}, \ldots, s_0)$ with each $s_i \in \{-1, 1\}$ and $s_\ell = 1$; and $a = (a_{\ell-1}, \ldots, a_0)$ with each $a_i \in \{0, 1, \ldots, k-1\}$.
Ensure: A string $b = (b_\ell, b_{\ell-1}, \ldots, b_0)$ with $b_i \in \{0, \pm 1, \pm 2, \ldots, \pm(k-1)\}$, with $(b_\ell, b_{\ell-1}, \ldots, b_0)_k = (a_{\ell-1}, \ldots, a_0)_k$, and either $b_i = 0$ or $\mathrm{Sign}(b_i) = s_i$ for all $i$.
1: $b_i \leftarrow a_i$ for $i = 0, 1, \ldots, \ell-1$.
2: $b_\ell \leftarrow 0$.
3: for $i = 0$ to $\ell - 1$ do
4:   if $b_i = k$ then
5:     $b_i \leftarrow 0$, $b_{i+1} \leftarrow b_{i+1} + 1$
6:   else
7:     if $s_i = 1$ then
8:       $b_i \leftarrow b_i$, $b_{i+1} \leftarrow b_{i+1}$
9:     else
10:      if $b_i = 0$ then
11:        $b_i \leftarrow b_i$, $b_{i+1} \leftarrow b_{i+1}$
12:      else
13:        $b_i \leftarrow -(k - b_i)$, $b_{i+1} \leftarrow b_{i+1} + 1$
14:      end if
15:    end if
16:  end if
17: end for
18: return $b$

3 Generalized Sign Aligned Recoding Algorithm

In Sect. 2 we introduced an algorithm that recodes the k-ary representation of an integer into a signed non-zero digit representation. As discussed in Sect. 1, it is desirable to have a sign-aligned recoding algorithm, which yields a better storage complexity for our scalar multiplication algorithm. This section introduces a sign-alignment algorithm for a general base $k$ and proves its correctness.

3.1 Basic Sign-Aligned Recoding

A sign sequence is any $\ell$-String consisting of 1's and −1's. Suppose $S_\ell = (a_{\ell-1}, \ldots, a_0)$ is the k-ary representation of a positive integer $N$, and that $S_\ell$ is recoded using Recode to a new representation $(b_\ell, \ldots, b_0)$. In this section, we detail Algorithm 2, called Align, and its supporting mathematical proof. For a given sign sequence $(s_\ell, \ldots, s_0)$ of length $\ell + 1$, the Align algorithm recodes an $\ell$-String with entries in $\{0, 1, \ldots, k-1\}$ so that the sign of each of its entries agrees with the sign sequence. In the scalar multiplication algorithm presented in Sect. 4, we will take $(\mathrm{Sign}(b_\ell), \ldots, \mathrm{Sign}(b_0))$ as the sign sequence. The following theorem makes this discussion more precise and serves as a correctness statement for Algorithm 2.

Theorem 3. Let $\xi_{\ell+1} = (s_\ell, \ldots, s_0)$ such that $s_i \in \{1, -1\}$ with $s_\ell = 1$, and let $S_\ell = (a_{\ell-1}, \ldots, a_0)$ with $a_i \in \{0, 1, \ldots, k-1\}$. Let $\mathrm{Align}(\xi_{\ell+1}, S_\ell) = (b_\ell, \ldots, b_0)$. Then:
1. For $0 \le i < \ell$ we have $b_i \in (-k, k) \cap \mathbb{Z}$, and $b_\ell \in \{0, 1\}$,
2. For $0 \le i \le \ell$, either $b_i = 0$ or $\mathrm{Sign}(b_i) = s_i$,
3. $(b_\ell, b_{\ell-1}, \ldots, b_0)_k = (a_{\ell-1}, \ldots, a_0)_k$.

Proof. We first prove claims (1.) and (2.). Notice that during the algorithm every entry of $b$ with the exception of $b_0$ is updated exactly twice (though sometimes to the same value), with $b_i$ being updated on iterations $i - 1$ (by a "$b_{i+1} \leftarrow$" update rule) and $i$ (by a "$b_i \leftarrow$" update rule). We first prove the claims for $b_0$, which is only modified on iteration $i = 0$. At the start of the algorithm $b_0$ is initialized with value $a_0$, which lies in the set $\{0, 1, \ldots, k-1\}$ by assumption, and so the conditional statement "$b_0 = k$" cannot evaluate to true. In the Else branch on line 6, if $s_0 = 1$ then $b_0$ remains unmodified from its value of $a_0$ and the claims are true. Otherwise $s_0 = -1$. Then either $b_0 = a_0 = 0$, its value is left unchanged, and the claims are true, or $b_0 = a_0 \in \{1, \ldots, k-1\}$, in which case the final value becomes $-(k - a_0) \in \{-1, -2, \ldots, -(k-1)\}$ and $\mathrm{Sign}(b_0) = -1 = s_0$. This proves claims (1.) and (2.) for $i = 0$.

Now fix an index $1 \le j \le \ell - 1$ to examine. As previously stated, $b_j$ is modified only on iterations $i = j - 1$ and $i = j$, using separate sets of rules. All of the "$b_{i+1} \leftarrow$" update rules either leave the variable unchanged or increment it by 1, and we handle each case separately. First suppose that on iteration $i = j - 1$ the value $b_j$ remains unchanged, so that at the start of iteration $i = j$ we have $b_j = a_j \in \{0, 1, \ldots, k-1\}$. Here we cannot have $b_i = k$, so we proceed into the Else branch of line 6. There are three separate cases to consider:
1. $s_j = 1$: then no update to $b_i$ is performed and the final value of $b_j$ is $a_j$.
2. $s_j = -1$ and $b_j = 0$: then $b_j$ is assigned final value 0.
3. $s_j = -1$ and $b_j \in \{1, 2, \ldots, k-1\}$: then $b_j$ is assigned final value $-(k - a_j) \in \{-1, -2, \ldots, -(k-1)\}$.

In each case above claims (1.) and (2.) of the theorem are satisfied. Now suppose that iteration $i = j - 1$ takes the action "$b_{i+1} \leftarrow b_{i+1} + 1$", so that at the start of iteration $i = j$ we have $b_j = a_j + 1 \in \{1, 2, \ldots, k\}$. Here it is possible that $b_i = k$, in which event $b_j$ gets final value 0. Otherwise, the cases proceed as in the previous paragraph except that the option $b_j = 0$ is eliminated. In all cases claims (1.) and (2.) are therefore satisfied for $b_0, \ldots, b_{\ell-1}$. The final digit $b_\ell$ is initialized to 0 and is only modified on the final iteration $i = \ell - 1$ by one of the "$b_{i+1} \leftarrow$" update rules. Consequently it can only have value 0 or 1. Note that $s_\ell = 1$ by assumption. This proves claims (1.) and (2.) for all digits.

We prove claim (3.) by showing inductively that it holds true upon completion of each iteration of the main loop, and will therefore hold true at the end of the

algorithm. The first line of the algorithm initializes $b$ with value $a$, serving as the base case of the induction. The branching structure of the algorithm gives four possible cases to consider, two of which leave the value of $b$ unmodified. We consider the two nontrivial cases below. In the following, we let $b^i_j$ denote the value of the variable $b_j$ at the time of completion of the $i$-th iteration of the algorithm. Let $0 \le i \le \ell - 1$ be fixed.

The first nontrivial case is if $b^{i-1}_i = k$. In this case the update rules give $b^i_i = 0$ and $b^i_{i+1} = b^{i-1}_{i+1} + 1$, and all other values of $b$ remain unchanged from the previous iteration, giving $b^i_j = b^{i-1}_j$ for $j \neq i, i+1$. We then have

$$(b^i_\ell, \ldots, b^i_{i+1}, b^i_i, \ldots, b^i_0)_k = (b^{i-1}_\ell, \ldots, b^{i-1}_{i+1} + 1, 0, \ldots, b^{i-1}_0)_k$$
$$= (b^{i-1}_\ell, \ldots, b^{i-1}_{i+1}, k, \ldots, b^{i-1}_0)_k \quad \text{(same k-ary value)}$$
$$= (b^{i-1}_\ell, \ldots, b^{i-1}_{i+1}, b^{i-1}_i, \ldots, b^{i-1}_0)_k$$
$$= (a_{\ell-1}, \ldots, a_0)_k. \quad \text{(by inductive hypothesis)}$$

The remaining case is when $s_i = -1$ and $b^{i-1}_i \neq 0$, where the new values become $b^i_i = -(k - b^{i-1}_i)$ and $b^i_{i+1} = b^{i-1}_{i+1} + 1$. Here we get

$$(b^i_\ell, \ldots, b^i_{i+1}, b^i_i, \ldots, b^i_0)_k = (b^{i-1}_\ell, \ldots, b^{i-1}_{i+1} + 1, -(k - b^{i-1}_i), \ldots, b^{i-1}_0)_k$$
$$= (b^{i-1}_\ell, \ldots, b^{i-1}_{i+1}, b^{i-1}_i, \ldots, b^{i-1}_0)_k \quad \text{(same k-ary value)}$$
$$= (a_{\ell-1}, \ldots, a_0)_k. \quad \text{(by inductive hypothesis)}$$

In all cases claim (3.) is satisfied. This completes the induction and concludes the proof.

3.2 Optimized Regular Sign-Alignment

As written, Algorithm 2 contains numerous branches which depend heavily on the input string and can potentially be exploited in side-channel attacks. In this section, we present Algorithm 3, an alternate form of Algorithm 2, which is more resistant against side-channel attacks. To get started, we notice that each of the conditional statements in Algorithm 2 is a check for equality between two values. All of these checks can be put under the same umbrella by introducing an indicator function $\chi : \mathbb{Z} \to \{0, 1\}$, whose value is defined to be

$$\chi(x) = \begin{cases} 1 & \text{if } x = 0 \\ 0 & \text{if } x \neq 0. \end{cases}$$

The function $\chi$ can be used to transform conditional statements of a certain form into arithmetic evaluations in the following manner:

if $a = b$ then var ← X else var ← Y end if   ⟶   var ← $\chi(a - b) \cdot X + (1 - \chi(a - b)) \cdot Y$,

assuming that X and Y have numeric values. By repeatedly replacing the innermost conditional statements in Algorithm 2 with their equivalent arithmetic expressions as above and simplifying, we arrive at Algorithm 3. This version of Align makes three calls to the indicator function $\chi$ before computing the new value of $b$, and completely eliminates the branching on secret data seen in Algorithm 2. Therefore, if both $\chi$ and integer arithmetic are implemented in a constant-time fashion, we obtain better side-channel resilience with Algorithm 3.

Algorithm 3. OptimizedAlign(s, a)
Require: $k \in \mathbb{N}$ with $k \ge 2$; a sign sequence $s = (s_\ell, s_{\ell-1}, \ldots, s_0)$ with each $s_i \in \{-1, 1\}$ and $s_\ell = 1$; and $a = (a_{\ell-1}, \ldots, a_0)$ with each $a_i \in \{0, 1, \ldots, k-1\}$.
Ensure: A string $b = (b_\ell, b_{\ell-1}, \ldots, b_0)$ with $b_i \in \{0, \pm 1, \pm 2, \ldots, \pm(k-1)\}$, with $(b_\ell, b_{\ell-1}, \ldots, b_0)_k = (a_{\ell-1}, \ldots, a_0)_k$, and either $b_i = 0$ or $\mathrm{Sign}(b_i) = s_i$ for all $i$.
1: $b_\ell \leftarrow 0$ and $b_i \leftarrow a_i$ for $i = 0, 1, \ldots, \ell-1$.
2: for $i = 0$ to $\ell - 1$ do
3:   $u_1 \leftarrow \chi(b_i - k)$, $u_2 \leftarrow \chi(s_i - 1)$, $u_3 \leftarrow \chi(b_i)$
4:   $v \leftarrow (1 - u_1) \cdot (1 - u_2) \cdot (1 - u_3)$
5:   $b_i \leftarrow (1 - u_1) \cdot b_i - v \cdot k$ and $b_{i+1} \leftarrow b_{i+1} + u_1 + v$
6: end for
7: return $b$
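The equivalence of the branching and branch-free forms can be checked with a direct transcription of Algorithms 2 and 3. In this sketch digits and signs are stored least-significant first, and the function names are hypothetical. The Python `chi` below uses a comparison and is therefore only a functional model of the indicator, not a constant-time implementation.

```python
def align(s, a, k):
    """Algorithm 2 with s = [s_0, ..., s_l] and a = [a_0, ..., a_{l-1}]."""
    l = len(a)
    b = list(a) + [0]                  # b_l starts at 0
    for i in range(l):
        if b[i] == k:                  # carry from the previous digit made b_i = k
            b[i] = 0
            b[i + 1] += 1
        elif s[i] == -1 and b[i] != 0:
            b[i] = -(k - b[i])         # borrow k from the next digit
            b[i + 1] += 1
        # in the remaining two cases both digits stay unchanged
    return b


def chi(x):
    """Indicator: 1 if x == 0, else 0 (functional model only)."""
    return 1 if x == 0 else 0


def optimized_align(s, a, k):
    """Algorithm 3: the same recoding with branches replaced by arithmetic."""
    l = len(a)
    b = list(a) + [0]
    for i in range(l):
        u1, u2, u3 = chi(b[i] - k), chi(s[i] - 1), chi(b[i])
        v = (1 - u1) * (1 - u2) * (1 - u3)
        b[i], b[i + 1] = (1 - u1) * b[i] - v * k, b[i + 1] + u1 + v
    return b


def value(digits, k):
    """k-ary value of a digit list stored least-significant first."""
    return sum(d * k**i for i, d in enumerate(digits))
```

For k = 3, the digits (2, 0, 2) aligned to the sign sequence (1, −1, −1, −1), both written most-significant first, become (1, 0, −2, −1), and 27 − 6 − 1 = 20 = 2·9 + 2.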

4 Cryptographic Applications

Algorithm 4 presents our scalar multiplication algorithm, which uses Recode (Algorithm 1) and OptimizedAlign (Algorithm 3) from the previous sections. Any cryptographic protocol which requires a VS-FB scalar multiplication computation is a suitable application for our algorithm; see Sect. 1 for examples of such protocols. If the precomputation stage has low cost (which may be the case on a curve with efficient endomorphisms, for instance), Algorithm 4 would also be suitable for applications in the VS-VB setting. In this section, we analyze Algorithm 4 and its cost in terms of both run time and storage. We also discuss the choice of the parameters $k$ and $d$, and the choice of curve and coordinate system which result in a more efficient algorithm. Throughout this section we use the following notation for the cost of various operations: a, m, s, and i respectively denote the costs of addition/subtraction, multiplication, squaring, and inversion in a fixed field $\mathbb{F}$; A, D, T, and K respectively denote the elliptic curve operations of point addition, doubling, tripling, and general multiplication by $k$ on some fixed elliptic curve $E$.


Algorithm 4. Scalar Multiplication
Require: An integer $k \ge 2$, a point $P$ of order $m$ relatively prime to $k$ in an abelian group $G$, with $\ell = \lceil \log_2(m) \rceil$; a scalar $a \in [1, m)$; an integer $d \ge 2$ determining the number of subscalars, with $w := \lceil \log_k(2^\ell) / d \rceil$.
Ensure: $aP$.

Precomputation Stage
1: $T[uk + v] \leftarrow (u_{d-1} k^{(d-1)w} + \cdots + u_2 k^{2w} + u_1 k^w + v)P$ for all $v \in [1, k)$, $u \in [0, k^{d-1})$, where $(u_{d-1}, \ldots, u_1)$ is the k-ary representation of $u$.

Recoding Stage
2: kMULT ← $a \bmod k$
3: if kMULT = 0 then $a \leftarrow m - a$
4: $(a_{dw-1}, \ldots, a_0)$ ← k-ary representation of $a$, padded with sufficiently many zeroes on the left.
5: $(b^1_w, \ldots, b^1_0) \leftarrow \mathrm{Recode}(a_{w-1}, \ldots, a_0)$
6: $(b^i_w, \ldots, b^i_0) \leftarrow \mathrm{OptimizedAlign}((\mathrm{Sign}(b^1_w), \ldots, \mathrm{Sign}(b^1_0)), (a_{iw-1}, \ldots, a_{(i-1)w}))$ for $2 \le i \le d$.
7: $B_i \leftarrow |b^d_i k^{d-1} + b^{d-1}_i k^{d-2} + \cdots + b^2_i k + b^1_i|$ for $0 \le i \le w$.

Evaluation Stage
8: $Q \leftarrow T[B_w]$
9: for $i = w - 1$ to 0 by −1 do
10:   $Q \leftarrow kQ$
11:   $Q \leftarrow Q + \mathrm{Sign}(b^1_i) \cdot T[B_i]$
12: end for
13: if kMULT = 0 then return $-Q$
14: return $Q$

4.1 Explanation of Algorithm 4

Our scalar multiplication algorithm uses an integer parameter $k \ge 2$, for which the input scalar is written in base $k$ and recoded, and an integer parameter $d \ge 2$, which is used to split the input scalar into $d$ subscalars for sign alignment. The base point $P$ of the algorithm should have order $m$ relatively prime to $k$, where $P$ is a point of an abelian group $G$, with $m$ having $\ell = \lceil \log_2(m) \rceil$ bits. The input scalar $a$ is an integer in $[1, m)$, and the algorithm returns $aP$ as output. We let $w := \lceil \ell \log_k(2) / d \rceil$, which determines the length in digits of each subscalar. We briefly explain the details of Algorithm 4 as follows.

In the Precomputation Stage of Algorithm 4, specific multiples of $P$ are computed which are used in the main loop of the algorithm (on line 11). For $0 \le i < d$, let $P_i = k^{iw} P$. Then line 1 computes all points in the set

$$\{u_{d-1} P_{d-1} + \cdots + u_1 P_1 + v P_0 : 0 \le u_i < k,\ 1 \le v < k\} \tag{4}$$

and arranges them into a table $T$. For a fixed-base algorithm (such as the first round of ECDH) this step is performed offline at no cost, and so we do not go into details on how the computation of the points in this set is carried out.
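To see how the table of Eq. (4) is consumed by the later recoding and evaluation stages, the whole of Algorithm 4 can be run in a toy group. The sketch below assumes the additive group of integers modulo m, so that a "point" is just an integer and aP is `a * P % m`. This stand-in is chosen only to make the result easy to check; it is not an elliptic curve, and all function names are hypothetical.

```python
def digits(n, k, length):
    """Base-k digits [n_0, ..., n_{length-1}] of n, least-significant first."""
    return [(n // k**i) % k for i in range(length)]


def M(x0, x1, k):
    """Entry M(x0, x1) of the k-ary table (Table 1)."""
    if x0 == 0:
        return -(k - 1) if x1 == 0 else -(k - x1)
    return 1 if x1 == 0 else x1


def recode(a, k):
    """Algorithm 1 on least-significant-first digits; returns [s_0, ..., s_l]."""
    l = len(a)
    return [M(a[i + 1], a[i], k) for i in range(l - 1)] + [M(0, a[l - 1], k), 1]


def chi(x):
    return 1 if x == 0 else 0


def optimized_align(s, a, k):
    """Algorithm 3 on least-significant-first digits."""
    b = list(a) + [0]
    for i in range(len(a)):
        u1, u2, u3 = chi(b[i] - k), chi(s[i] - 1), chi(b[i])
        v = (1 - u1) * (1 - u2) * (1 - u3)
        b[i], b[i + 1] = (1 - u1) * b[i] - v * k, b[i + 1] + u1 + v
    return b


def scalar_mult(a, P, m, k, d):
    """Algorithm 4 in the toy group (Z_m, +); needs gcd(k, m) = 1 and 1 <= a < m."""
    ell = (m - 1).bit_length()             # ceil(log2(m))
    L = 0
    while k**L < 2**ell:                   # L = ceil(log_k(2^ell)), integer-only
        L += 1
    w = -(-L // d)                         # ceil(L / d)

    # Precomputation stage (line 1): T[u*k + v] = (sum_j u_j k^(j*w) + v) * P
    T = {}
    for u in range(k ** (d - 1)):
        ud = digits(u, k, d - 1)           # ud[j-1] = u_j
        base = sum(ud[j - 1] * k ** (j * w) for j in range(1, d))
        for v in range(1, k):
            T[u * k + v] = (base + v) * P % m

    # Recoding stage (lines 2-7)
    k_mult = a % k
    if k_mult == 0:
        a = m - a
    ad = digits(a, k, d * w)
    b = [recode(ad[:w], k)]
    signs = [1 if x > 0 else -1 for x in b[0]]
    for i in range(2, d + 1):
        b.append(optimized_align(signs, ad[(i - 1) * w : i * w], k))
    B = [abs(sum(b[j][i] * k**j for j in range(d))) for i in range(w + 1)]

    # Evaluation stage (lines 8-14): one multiplication by k and one addition per step
    Q = T[B[w]]
    for i in range(w - 1, -1, -1):
        Q = k * Q % m
        Q = (Q + signs[i] * T[B[i]]) % m
    return (-Q) % m if k_mult == 0 else Q
```

Because every column of recoded digits shares the sign of the first subscalar's digit, the absolute value $B_i$ indexes the table directly and the sign is applied once to the looked-up entry.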

In the Recoding Stage, the scalar $a$ is recoded using Recode and OptimizedAlign from the previous sections, which gives our scalar multiplication algorithm its regular structure. Note that the input scalar $a$ can be any integer in $[1, m)$, while the input to Recode requires $k$ not to divide $a$. To address this restriction, we assign the variable kMULT to $a \bmod k$, and if this equals 0 we update the scalar $a$ to $m - a$. In this case, the return value is corrected to $-Q$. We have included the restriction that $k$ and $m$ be relatively prime so that $k$ cannot divide both $a$ and $m - a$ simultaneously. Line 3 uses this approach to ensure that $k$ does not divide $a$. Afterwards, $a$ is written in k-ary as $a = (a_{dw-1}, \ldots, a_0)_k$ (padded with zeroes if necessary) and partitioned into $d$ subscalars; let $a^{(i)} := (a_{iw-1}, \ldots, a_{(i-1)w})$ for $1 \le i \le d$. On line 5 the Recode algorithm is invoked on $a^{(1)}$ with base $k$ (note that $a_0 \neq 0$ since by this step $a$ is not divisible by $k$) to get an output $b^{(1)}$ with the properties of Theorem 2. Next, a sign alignment is performed on each of the remaining subscalars $a^{(2)}, \ldots, a^{(d)}$ on line 6 to find $b^{(2)}, \ldots, b^{(d)}$, each of which satisfies the properties of Theorem 3 with respect to the signs of the digits of $b^{(1)}$. Finally, line 7 collects the digits of the $b^{(j)}$ into new scalars $B_i$, which give the entries of the lookup table $T$ to be used in the evaluation stage.

The last lines 8–12 of Algorithm 4 define the Evaluation Stage. This stage proceeds by performing one point multiplication by $k$ and one point addition in $G$ on each iteration of the loop by use of the $B_i$. Due to the sign alignment of the scalars, the sum involved is either an addition or a subtraction based on the sign of $b^{(1)}_i$.

4.2 General Cost Analysis

Here we derive general costs involved in Algorithm 4 for a general group $G$ in terms of storage and computation. The main storage cost is due to line 1, where all points in the set given in Eq. 4 are required to be stored in the table $T$. The number of points in this set is exactly $(k-1)k^{d-1}$. Storage costs involved during runtime are those of storing the digits of $a$, the digits of the recoded scalars $b^{(i)}$, the integers $B_i$, and the point $Q$; we assume these costs are negligible in comparison to the size of the table $T$ and therefore ignore them. The computational costs involved in lines 3–7 deal mainly with small integer arithmetic (including the operations involved in Recode and OptimizedAlign), and we assume the construction of $T$ in line 1 is performed offline at no cost, as in the VS-FB setting. Therefore, we only consider the computational costs involved in the evaluation stage. The loop contains $w$ iterations, each of which performs a single point multiplication by $k$ and a point addition (we assume inversion in $G$ is negligible, such as on an elliptic curve). This gives a total cost of $w(K + A) = \lceil \ell \log_k(2) / d \rceil (K + A)$. We summarize these costs as:

$$\text{Algorithm 4 Storage Cost:} \quad (k-1)k^{d-1} \text{ points of } G, \tag{5}$$
$$\text{Algorithm 4 Computation Cost:} \quad \left\lceil \frac{\ell \log_k(2)}{d} \right\rceil (K + A). \tag{6}$$

The split and comb method is a well-known general-use VS-FB algorithm generally attributed to Yao [28] and Pippenger [24] in 1976. Lim and Lee improved upon this method in [19] to give an efficient special case in 1994. The split and comb method further subdivides each of the $d$ subscalars into $v$ subsubscalars, increasing the storage space linearly in $v$ with the benefit of reducing the number of K operations required. Algorithm 4 can be modified to use the split and comb method by partitioning the $(b^i_w, \ldots, b^i_0)$ strings into $v$ blocks, padding with 0's if necessary. New integers $B_{i,j}$ can be derived in a similar way, and the main loop performs 1K and $v$A per iteration, with fewer iterations in total. See [19] for further details on the split and comb method. The total costs for Algorithm 4 modified with the split and comb method are:

$$\text{Storage Cost:} \quad v(k-1)k^{d-1} \text{ points of } G. \tag{7}$$
$$\text{Computational Cost:} \quad \left\lceil \frac{\ell \log_k(2)}{d} \right\rceil A + \left( \left\lceil \frac{\lceil \ell \log_k(2)/d \rceil + 1}{v} \right\rceil - 1 \right) K. \tag{8}$$

4.3 Concluding Remarks

An interesting question is under what scenario $k = 3$ outperforms $k = 2$ in our algorithm (Algorithm 4 with the split and comb modification) at the same storage level. To investigate this, we compare the computational costs for the $k = 2$ and $k = 3$ settings assuming approximately the same amount of total storage space. We choose a 256-bit scalar and denote the costs of the point doubling, tripling, and addition operations as D, T, and A respectively. We use parameters $d_2, v_2$ for the $k = 2$ setting and $d_3, v_3$ for the $k = 3$ setting. This leads to solving the following inequality, subject to the equal storage constraint:

$$\left\lceil \frac{256 \log_3 2}{d_3} \right\rceil A + \left( \left\lceil \frac{\lceil 256 \log_3 2 / d_3 \rceil + 1}{v_3} \right\rceil - 1 \right) T \;\le\; \left\lceil \frac{256}{d_2} \right\rceil A + \left( \left\lceil \frac{\lceil 256 / d_2 \rceil + 1}{v_2} \right\rceil - 1 \right) D \tag{9}$$

$$2 \cdot 3^{d_3 - 1}\, v_3 \approx 2^{d_2 - 1}\, v_2 \tag{10}$$

To derive a concrete solution, we specialize to the case of Twisted Edwards curves using points in projective coordinates. Here, we assume table elements are stored with Z value 1 and therefore use the cost values A = 9m + 1s, D = 3m + 4s, and T = 9m + 3s (see [1]), and assume 1s = 0.8m. We then solve Approximation 10 to obtain $v_3 \approx \frac{2^{d_2 - 2}}{3^{d_3 - 1}} v_2$, substitute this expression into inequality 9, and consider the two cases $d_2 = d_3 = 2$ and $d_2 = d_3 = 4$. For these cases, we find $v_2 \ge 6$ for $d_2 = d_3 = 2$ and $v_2 \ge 13$ for $d_2 = d_3 = 4$. Taking $d_2 = 2$ and $v_2 = 6$ yields a storage level of $2^1 \cdot 6 = 12$ points, while $d_2 = 4$ and $v_2 = 13$ gives a storage level of $2^3 \cdot 13 = 104$ points; these storage sizes are certainly feasible for implementation, and so $k = 3$ may indeed be more desirable than $k = 2$ in realistic scenarios. Table 2 lists some particular cases where $k = 3$ outperforms $k = 2$ at comparable storage levels when $d = 2, 4$.
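The per-bit figures discussed above can be recomputed from Eqs. (7) and (8) with a few lines of arithmetic. The sketch below assumes costs measured in field multiplications m with 1s = 0.8m, as stated in the text, and a 256-bit scalar; the function names are hypothetical and only k = 2 and k = 3 are handled.

```python
import math

S = 0.8                    # cost of one squaring, in multiplications (1s = 0.8m)
A = 9 + 1 * S              # point addition: 9m + 1s
D = 3 + 4 * S              # point doubling: 3m + 4s
T = 9 + 3 * S              # point tripling: 9m + 3s


def storage(k, d, v):
    """Eq. (7): number of precomputed points."""
    return v * (k - 1) * k ** (d - 1)


def cost_per_bit(k, d, v, bits=256):
    """Eq. (8) divided by the scalar length, for k = 2 (K = D) or k = 3 (K = T)."""
    if k not in (2, 3):
        raise ValueError("only k = 2 and k = 3 are covered by this sketch")
    K = D if k == 2 else T
    w = math.ceil(bits * math.log(2, k) / d)          # digits per subscalar
    total = w * A + (math.ceil((w + 1) / v) - 1) * K
    return total / bits


# Same storage (12 points), d = 2: k = 2 with v = 6 versus k = 3 with v = 2.
print(storage(2, 2, 6), round(cost_per_bit(2, 2, 6), 2))
print(storage(3, 2, 2), round(cost_per_bit(3, 2, 2), 2))
```

With these cost assumptions the two printed lines give 12 points at about 5.41 and 12 points at about 4.88 multiplications per bit, matching the first d = 2 row of Table 2.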

We should note that our comparisons are made over a fixed choice of $d$ (while varying $v$). This is the case in many cryptographic applications. For example, if a protocol is implemented using elliptic curves with endomorphisms of degree 4 [6], then $d$ is naturally fixed to 4. Similarly, $d = 2$ is a common choice in applications where curves with endomorphisms of degree 2 are deployed [10, 11], or when double point multiplication is required, such as in isogeny-based cryptosystems [14].

Table 2. For fixed d = 2 and d = 4, the run time in terms of cost per bit for k = 3 is better than for k = 2 when the storage, i.e., the number of precomputed points, is equal (or approximately equal).

     |             k = 2              |             k = 3
  d  |  v2    Storage   Cost per bit  |  v3    Storage   Cost per bit
-----+--------------------------------+-------------------------------
  2  |    6      12        5.41       |   2      12        4.88
     |    9      18        5.24       |   3      18        4.30
     |   12      24        5.14       |   4      24        3.99
     |   54     108        4.95       |  18     108        3.28
     |  108     216        4.92       |  36     216        3.19
-----+--------------------------------+-------------------------------
  4  |   13     104        2.55       |   2     108        2.46
     |   20     160        2.52       |   3     162        2.15
     |   27     216        2.50       |   4     216        2.01

4.4 Future Work

We considered only Twisted Edwards curves and projective coordinates for representing points, and we focused on the VS-FB setting. Other curve and coordinate choices may be more suitable for certain scenarios, such as using a tripling-oriented Doche-Icart-Kohel curve in the k = 3 setting, or using a mix of projective and extended coordinates on Twisted Edwards curves in the k = 2 setting. A careful analysis of the performance of Algorithm 4 will be required for both scenarios before a clear winner can be determined. A detailed C implementation would also be very insightful to see how our theoretical costs reflect the timings achieved in practice, for both the VS-FB and VS-VB settings. The algorithms presented in this paper provide some degree of resistance against side-channel attacks thanks to their regular nature. However, further analysis and implementation would be required to evaluate their security in practice.

A Example of Algorithm 4

Let P be a point in an abelian group, such as an elliptic curve over a finite field F_p, and suppose that |P| = m with ℓ = ⌈log2(m)⌉ = 15. Then scalars in [1, m) are represented with length ⌈log3(2^ℓ)⌉ = 10, padding with leading 0s if necessary. Suppose we run Algorithm 4 with inputs P, a = 39907, d = 4 and k = 3. Notice k does not divide a, so the exact value of m is irrelevant. The length of each subscalar is determined as w = ⌈ℓ/d⌉ = ⌈10/4⌉ = 3, with ℓ now denoting the base-3 length 10. Algorithm 4 computes 39907P as follows.

Precomputation Stage: According to Algorithm 4 we first precompute a table T of points of the form $T[uk + v] = (u_3, u_2, u_1, v)_{k^w} P = (u_3 k^{3w} + u_2 k^{2w} + u_1 k^{w} + v)P$ for v = 1, 2 and u ∈ [0, 3^3), where (u3, u2, u1) is the k-ary representation of u.

Recoding Stage: The k-ary representation of a = 39907 is (2, 0, 0, 0, 2, 0, 2, 0, 0, 1). We pad dw − ℓ = 4 · 3 − 10 = 2 zeros on its left to get (0, 0, 2, 0, 0, 0, 2, 0, 2, 0, 0, 1). The string is then split into d = 4 strings of length 3, namely A1, A2, A3, A4, shown below. Recode is then applied to A1 = (0, 0, 1) to get the nonzero scalar b1 = (1, −2, −2, −2), and Align (or OptimizedAlign) is applied to A2, A3, A4 with the sign sequence Sign(b1) = (Sign(1), Sign(−2), Sign(−2), Sign(−2)) = (1, −1, −1, −1), to get the sign-aligned scalars b2, b3, and b4. This process can be visualized in matrix form as:

\[
\begin{bmatrix} A_1 \\ A_2 \\ A_3 \\ A_4 \end{bmatrix}
=
\begin{bmatrix} 0 & 0 & 1 \\ 2 & 0 & 2 \\ 0 & 0 & 0 \\ 0 & 0 & 2 \end{bmatrix}
\;\xrightarrow{\ \text{Recode } (A_1),\ \text{Align } (A_2, A_3, A_4)\ }\;
\begin{bmatrix} 1 & -2 & -2 & -2 \\ 1 & 0 & -2 & -1 \\ 0 & 0 & 0 & 0 \\ 1 & -2 & -2 & -1 \end{bmatrix}
=
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{bmatrix}
\]

Note that the above 4×3 matrix is the matrix used in Straus' algorithm. The final step of the recoding stage is to compute B0, B1, B2, B3. These are determined by interpreting the columns of the 4×4 matrix above in base k, with the top row being the least significant digit and B0 corresponding to the rightmost column. Specifically, the Bi are derived as follows. For convenience, the corresponding table entries and the signs si = Sign(b1,i), where b1,i is the ith digit of b1, are also listed.

B0 = |(−1, 0, −1, −2)_3| = 32, T[32] = 19712P, s0 = −1,
B1 = |(−2, 0, −2, −2)_3| = 62, T[62] = 39422P, s1 = −1,
B2 = |(−2, 0, 0, −2)_3| = 56, T[56] = 39368P, s2 = −1,
B3 = |(1, 0, 1, 1)_3| = 31, T[31] = 19711P.
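The recoding output can be re-derived mechanically. The sketch below is illustrative only: it models each point by its integer multiple of P, takes the recoded rows b1, ..., b4 from the matrix as given (it does not implement Recode or Align), and uses hypothetical helper names.

```python
K, D, W = 3, 4, 3  # base k, number of subscalars d, digits per subscalar w

# Recoded, sign-aligned rows b1..b4 (most significant digit first).
ROWS = [
    [1, -2, -2, -2],   # b1 = Recode(A1)
    [1, 0, -2, -1],    # b2
    [0, 0, 0, 0],      # b3
    [1, -2, -2, -1],   # b4
]


def row_value(row, k=K):
    """Integer represented by a signed base-k digit string."""
    value = 0
    for digit in row:
        value = value * k + digit
    return value


def column_index(rows, i, k=K):
    """B_i: column i (0 = rightmost) read in base k, top row least significant."""
    column = [row[len(row) - 1 - i] for row in rows]
    return abs(sum(digit * k ** j for j, digit in enumerate(column)))


def table_multiplier(index, k=K, w=W):
    """Multiplier c with T[index] = c * P, for index = u*k + v and v in 1..k-1."""
    u, v = divmod(index, k)
    if v == 0:
        raise ValueError("no table entry has v = 0")
    total, j = v, 1
    while u:
        u, digit = divmod(u, k)
        total += digit * k ** (j * w)   # u_j contributes u_j * k^(j*w)
        j += 1
    return total


if __name__ == "__main__":
    for i in range(D):
        b = column_index(ROWS, i)
        print("B%d = %d, T[%d] = %dP" % (i, b, b, table_multiplier(b)))
```

The rows also recombine to the scalar itself: weighting row j (counting from 0) by 3^(3j) and summing the row values gives 39907.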

Evaluation Stage: It can be easily verified that the following equality holds:

\[
39907P = T[31] \cdot 3^3 + s_2\, T[56] \cdot 3^2 + s_1\, T[62] \cdot 3^1 + s_0\, T[32].
\]

Table 3. Evaluation stage of Algorithm 4 for input a = 39907, k = 3, and d = 4.

| i | b1 | b2 | b3 | b4 | Bi | T[Bi]  | si | Q ← 3Q | Q ← Q + si T[Bi]         |
|---|----|----|----|----|----|--------|----|--------|--------------------------|
| 3 | 1  | 1  | 0  | 1  | 31 | 19711P | -  | -      | 19711P                   |
| 2 | −2 | 0  | 0  | −2 | 56 | 39368P | −1 | 59133P | 59133P − 39368P = 19765P |
| 1 | −2 | −2 | 0  | −2 | 62 | 39422P | −1 | 59295P | 59295P − 39422P = 19873P |
| 0 | −2 | −1 | 0  | −1 | 32 | 19712P | −1 | 59619P | 59619P − 19712P = 39907P |

In the evaluation stage, Algorithm 4 computes 39907P in a triple-and-add manner using the above equality. We stress that in general the T[Bi] are never zero, so trivial additions never occur. We therefore initialize a point Q to have value Q = T[B3] = T[31] = 19711P, and then proceed in an alternating sequence of tripling Q and adding/subtracting the appropriate T[Bi] to Q. Each of the steps is given in Table 3.
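The triple-and-add loop can be traced with plain integers standing in for multiples of P. This is a minimal sketch under that modelling assumption; the function name and data layout are hypothetical, and the (Bi, si) pairs and table entries are copied from the example.

```python
# (B_i, s_i) for i = 3, 2, 1, 0; the sign of the leading entry is not used.
STEPS = [(31, +1), (56, -1), (62, -1), (32, -1)]

# Table entries needed by this example, as integer multiples of P.
TABLE = {31: 19711, 56: 39368, 62: 39422, 32: 19712}


def triple_and_add(steps, table, k=3):
    """Return the final multiple of P and the value of Q after each addition."""
    first_index, _ = steps[0]
    q = table[first_index]            # Q <- T[B_3]
    trace = [q]
    for index, sign in steps[1:]:
        q = k * q                     # Q <- 3Q
        q = q + sign * table[index]   # Q <- Q + s_i T[B_i]
        trace.append(q)
    return q, trace


if __name__ == "__main__":
    result, trace = triple_and_add(STEPS, TABLE)
    print(result, trace)
```

The intermediate values in `trace` correspond to the last column of Table 3.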

References

1. Explicit Formulas Database. https://hyperelliptic.org/EFD/
2. Avizienis, A.: Signed-digit number representations for fast parallel arithmetic. IRE Transactions on Electronic Computers EC-10, 289–400 (1961)
3. Avanzi, R., et al.: Handbook of Elliptic and Hyperelliptic Curve Cryptography, 2nd edn. Chapman & Hall/CRC (2012)
4. Booth, A.: A signed binary multiplication technique. Q. J. Mech. Appl. Math. 4, 236–240 (1951)
5. Bos, J., Coster, M.: Addition chain heuristics. In: Brassard, G. (ed.) CRYPTO 1989. LNCS, vol. 435, pp. 400–407. Springer, New York (1990). https://doi.org/10.1007/0-387-34805-0_37
6. Costello, C., Longa, P.: FourQ: four-dimensional decompositions on a Q-curve over the Mersenne prime. In: Iwata, T., Cheon, J.H. (eds.) ASIACRYPT 2015. LNCS, vol. 9452, pp. 214–235. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-48797-6_10
7. Diffie, W., Hellman, M.: New directions in cryptography. IEEE Trans. Inf. Theory 22, 644–654 (1976)
8. Faz-Hernández, A., Longa, P., Sánchez, A.H.: Efficient and secure algorithms for GLV-based scalar multiplication and their implementation on GLV-GLS curves. In: Benaloh, J. (ed.) CT-RSA 2014. LNCS, vol. 8366, pp. 1–27. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-04852-9_1
9. Feng, M., Zhu, B.B., Zhao, C., Li, S.: Signed MSB-set comb method for elliptic curve point multiplication. In: Chen, K., Deng, R., Lai, X., Zhou, J. (eds.) ISPEC 2006. LNCS, vol. 3903, pp. 13–24. Springer, Heidelberg (2006). https://doi.org/10.1007/11689522_2
10. Galbraith, S.D., Lin, X., Scott, M.: Endomorphisms for faster elliptic curve cryptography on a large class of curves. In: Joux, A. (ed.) EUROCRYPT 2009. LNCS, vol. 5479, pp. 518–535. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-01001-9_30
11. Gallant, R.P., Lambert, R.J., Vanstone, S.A.: Faster point multiplication on elliptic curves with efficient endomorphisms. In: Kilian, J. (ed.) CRYPTO 2001. LNCS, vol. 2139, pp. 190–200. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-44647-8_11
12. Hedabou, M., Pinel, P., Bénéteau, L.: Countermeasures for preventing comb method against SCA attacks. In: Deng, R.H., Bao, F., Pang, H.H., Zhou, J. (eds.) ISPEC 2005. LNCS, vol. 3439, pp. 85–96. Springer, Heidelberg (2005). https://doi.org/10.1007/978-3-540-31979-5_8
13. Hisil, H., Hutchinson, A., Karabina, K.: d-MUL: optimizing and implementing a multidimensional scalar multiplication algorithm over elliptic curves. In: Chattopadhyay, A., Rebeiro, C., Yarom, Y. (eds.) SPACE 2018. LNCS, vol. 11348, pp. 198–217. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-05072-6_12
14. Jao, D., De Feo, L.: Towards quantum-resistant cryptosystems from supersingular elliptic curve isogenies. In: Yang, B.-Y. (ed.) PQCrypto 2011. LNCS, vol. 7071, pp. 19–34. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-25405-5_2
15. Jedwab, J., Mitchell, C.: Minimum weight modified signed-digit representations and fast exponentiation. Electron. Lett. 25, 1171–1172 (1989)
16. Johnson, D., Menezes, A., Vanstone, S.: The Elliptic Curve Digital Signature Algorithm (ECDSA). Int. J. Inf. Secur. 1(1), 36–63 (2001). https://doi.org/10.1007/s102070100002
17. Joye, M., Tunstall, M.: Exponent recoding and regular exponentiation algorithms. In: Preneel, B. (ed.) AFRICACRYPT 2009. LNCS, vol. 5580, pp. 334–349. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-02384-2_21
18. Karabina, K.: A survey of scalar multiplication algorithms. In: Chung, F., Graham, R., Hoffman, F., Mullin, R.C., Hogben, L., West, D.B. (eds.) 50 Years of Combinatorics, Graph Theory, and Computing, chapter 20, pp. 359–386. Chapman and Hall/CRC (2019)
19. Lim, C.H., Lee, P.J.: More flexible exponentiation with precomputation. In: Desmedt, Y.G. (ed.) CRYPTO 1994. LNCS, vol. 839, pp. 95–107. Springer, Heidelberg (1994). https://doi.org/10.1007/3-540-48658-5_11
20. Brickell, E.F., Gordon, D.M., McCurley, K.S., Wilson, D.B.: Fast exponentiation with precomputation. In: Rueppel, R.A. (ed.) EUROCRYPT 1992. LNCS, vol. 658, pp. 200–207. Springer, Heidelberg (1993). https://doi.org/10.1007/3-540-47555-9_18
21. Möller, B.: Securing elliptic curve point multiplication against side-channel attacks. In: Davida, G.I., Frankel, Y. (eds.) ISC 2001. LNCS, vol. 2200, pp. 324–334. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-45439-X_22
22. Morain, F., Olivos, J.: Speeding up the computations on an elliptic curve using addition-subtraction chains. Theoret. Informat. Appl. 24, 531–543 (1990)
23. Okeya, K., Takagi, T.: The width-w NAF method provides small memory and fast elliptic scalar multiplications secure against side channel attacks. In: Joye, M. (ed.) CT-RSA 2003. LNCS, vol. 2612, pp. 328–343. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-36563-X_23
24. Pippenger, N.: On the evaluation of powers and related problems. In: 17th Annual IEEE Symposium on Foundations of Computer Science, pp. 258–263 (1976)
25. Solinas, J.: Efficient arithmetic on Koblitz curves. Des. Codes Crypt. 19, 195–249 (2000)
26. Straus, E.: Addition chains of vectors (problem 5125). Am. Math. Monthly 70, 806–808 (1964)
27. Xu, M., Feng, M., Zhu, B., Liu, S.: Efficient comb methods for elliptic curve point multiplication resistant to power analysis. Cryptology ePrint Archive, Report 2005/22 (2005)
28. Yao, A.: On the evaluation of powers. SIAM J. Comput. 5, 281–307 (1976)

Ciphers and Cryptanalysis

Cryptanalysis of the Permutation Based Algorithm SpoC

Liliya Kraleva¹, Raluca Posteuca¹, and Vincent Rijmen¹,²

¹ imec-COSIC, KU Leuven, Leuven, Belgium
{liliya.kraleva,raluca.posteuca,vincent.rijmen}@esat.kuleuven.be
² Department of Informatics, University of Bergen, Bergen, Norway

Abstract. In this paper we present an analysis of the SpoC cipher, a second round candidate of the NIST Lightweight Crypto Standardization process. First we present a differential analysis on the sLiSCP-light permutation, a core element of SpoC. Then we propose a series of attacks on both versions of SpoC, namely round-reduced differential tag forgery and message recovery attacks in the related-key, related-nonce scenario, as well as a time-memory trade-off key-recovery attack on the full round version of SpoC-64. Finally, we present an observation regarding the constants used in the sLiSCP-light permutation.

Keywords: SpoC · sLiSCP permutation · Lightweight · Differential cryptanalysis · TMTO attack · NIST Lightweight competition · LWC

1 Introduction

The majority of the current standards in symmetric cryptography were initially designed for desktop and server environments. The increasing development of technology in the area of constrained environments (RFID tags, industrial controllers, sensor nodes, smart cards) requires the design of new, lightweight primitives. For this reason, NIST organised a competition aiming at standardizing a portfolio of lightweight algorithms, targeting authenticated encryption with associated data (AEAD) ciphers and hash functions. Currently the competition is in its second round, with 32 out of 56 candidates left. The candidates of the NIST Lightweight Competition [NIS19] need to satisfy certain criteria for performance and have a level of security of at least 112 bits.

In order to contribute to the public research efforts in analysing the candidates of the ongoing second round, we focused on the SpoC cipher, a permutation-based AEAD. In this paper we present the results of a security analysis of both versions of SpoC, namely SpoC-64 and SpoC-128. We analyse the sLiSCP-light permutation used in the algorithm, as well as the structural behaviour of SpoC-64.

1.1 Related Work

In [LSSW18] the authors describe a differential analysis of the reduced-step sLiSCP permutation, a previous version of sLiSCP-light, in the AE and hashing modes. That paper presents forgery attacks on 6 (out of 18) steps of the permutation. Their approach is similar to the one presented in this paper, even though we aim at analysing the newer version of the permutation, namely the sLiSCP-light permutation and its application in the SpoC cipher. In parallel with our research, an open source paper analyzing the sLiSCP and sLiSCP-light permutations was published. In [HNS20] the authors use an approach based on limited birthday distinguishers (LBDs), improving the attacks against the sLiSCP permutation and proving the lower bound of the complexity to solve LBDs for a random permutation.

© Springer Nature Switzerland AG 2020. K. Bhargavan et al. (Eds.): INDOCRYPT 2020, LNCS 12578, pp. 273–293, 2020. https://doi.org/10.1007/978-3-030-65277-7_12

1.2 Our Contribution

In this paper we present differential characteristics for round-reduced versions of both sLiSCP-light-[256] and sLiSCP-light-[192]. For the former, a 6 (out of 18) round characteristic is given with probability $2^{-106.14}$; for the latter, a 7-round and a 9 (out of 18) round characteristic are presented, with probabilities $2^{-108.2}$ and $2^{-110.84}$, respectively. Based on the described characteristics, we introduce selective tag forgery attacks for both versions of round-reduced SpoC and a message recovery attack on 9-round SpoC-64. Additionally, a key-recovery attack is introduced for the full round version of SpoC-64, based on a time-memory trade-off approach. Table 1 summarizes our attacks and their complexities. To the best of our knowledge, this is the first research that analyses the security of the sLiSCP-light permutation and the first published results on SpoC.

Table 1. All attacks on SpoC. We underline that the data complexity of the key-recovery attack corresponds to the success probability $2^{-15}$.

| Attack                      | Steps of π | Data         | Time           | Memory        | Section |
|-----------------------------|------------|--------------|----------------|---------------|---------|
| Tag forgery on SpoC-128     | 6          | 2^{106.14}   | 2^{107.14} ∗   | –             | 4.2     |
| Tag forgery on SpoC-64      | 7          | 2^{108.2}    | 2^{109.2}      | –             | 4.3     |
| Message recovery on SpoC-64 | 9          | 2^{110.84}   | 2^{109.84} ∗∗  | –             | 4.4     |
| Key recovery on SpoC-64     | All        | 2^{67}       | 2^{110}        | 2^{110} ∗∗∗   | 5       |

∗ SBox computations, ∗∗ table look ups, ∗∗∗ table entries

1.3 Structure

The paper is structured as follows: In Sect. 2 we briefly introduce the tools used in our research, the SpoC cipher and the sLiSCP-light permutation. In Sect. 3 we present round-reduced differential characteristics of both versions of the sLiSCP-light permutation. Section 4 introduces tag-forgery and message recovery attacks on the SpoC cipher parameterized with round-reduced sLiSCP-light permutations, while Sect. 5 introduces a TMTO key-recovery attack on full SpoC-64. Section 6 presents an observation regarding the generation algorithm of the sLiSCP-light constants. Finally, the last section concludes the paper.

2 Preliminaries

This section gives some essential aspects of differential cryptanalysis and how to estimate the differential probability. Additionally, the algorithms of SpoC and sLiSCP-light are presented.

2.1 Differential Cryptanalysis

Differential cryptanalysis is one of the most powerful and widely used cryptanalysis techniques against symmetric primitives. Proposed by Biham and Shamir [BS90], this approach was introduced as the first attack that fully breaks the DES algorithm [NIS79]. This attack has been extensively analysed and further developed, leading to attacks like truncated differentials [Knu94], the boomerang attack [Wag99] or impossible differentials [BBS99]. A differential characteristic over n rounds of an iterated cipher is defined as the sequence of intermediate differences (a = a1, a2, ..., an = b), where each (ai, ai+1) represents the input and output difference of one round. Assuming that the rounds are independent and the keys are uniformly distributed, the Expected Differential Probability (EDP) of a characteristic is computed by multiplying the single rounds' probabilities:

\[
EDP(a_1, a_2, \ldots, a_n) = \prod_{i=2}^{n} DP(a_{i-1}, a_i),
\]

where DP represents the differential probability for one round. For simplicity, in this paper we also use the notion of weight of a differential characteristic, instead of probability, computed as the absolute value of the log2 of the probability. In practice, only the input and output differences can be observed, without considering the intermediate differences. We define the set of all differential characteristics with input difference a and output difference b as a differential (a, b). The probability of a differential (a, b) is computed as the sum of the probabilities of all differential characteristics contained in it, and computing it exactly is thus a hard problem. Therefore, the probability of the optimal characteristic serves as a lower bound for the probability of the differential.

2.2 SAT Solvers

Nowadays, the security of ARX ciphers against differential cryptanalysis is analysed using automated tools such as SAT (Boolean Satisfiability) or MILP (Mixed-Integer Linear Programming) solvers. A SAT solver determines whether a boolean formula is satisfiable. Many operations like modular addition, rotation and XOR can be written as simple equations and easily translated to boolean expressions. For ARX ciphers several automatic search tools have been developed, see for example [Ste14,MP13,Ran17]. We chose to use the ARXpy tool [Ran17,RAS+20], because of its easy-to-use implementation, complete documentation and open-source code.
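To illustrate what "translated to boolean expressions" means for the simplest of these operations, the sketch below writes the one-bit relation c = a XOR b as four CNF clauses and checks them by brute force. It is a toy illustration with hypothetical names; it does not reproduce the encoding used internally by ARXpy or by any particular solver.

```python
from itertools import product

# Clauses for c = a XOR b in conjunctive normal form.
# Each literal is (variable name, polarity): polarity True means the variable
# itself, False means its negation.
XOR_CNF = [
    [("a", False), ("b", False), ("c", False)],
    [("a", True), ("b", True), ("c", False)],
    [("a", True), ("b", False), ("c", True)],
    [("a", False), ("b", True), ("c", True)],
]


def satisfies(cnf, assignment):
    """True if every clause contains at least one literal that holds."""
    return all(any(assignment[var] == pol for var, pol in clause) for clause in cnf)


def models(cnf, variables=("a", "b", "c")):
    """Enumerate all satisfying assignments by brute force (fine for 3 variables)."""
    found = []
    for bits in product([False, True], repeat=len(variables)):
        if satisfies(cnf, dict(zip(variables, bits))):
            found.append(bits)
    return found


if __name__ == "__main__":
    for a, b, c in models(XOR_CNF):
        print(int(a), int(b), int(c))
```

Each clause rules out exactly one of the four assignments that violate c = a XOR b, so the satisfying assignments are precisely the rows of the XOR truth table. Word-level operations are encoded bit by bit in the same spirit, with modular addition requiring additional carry variables.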

2.3 Specifications of SpoC

SpoC, or Sponge with masked Capacity [AGH+19], is a permutation-based mode of operation for authenticated encryption. Since it is a sponge construction, its b-bit state is divided into rate and capacity bits, with r the size of the rate and c = b − r the size of the capacity. The authors introduced a masked capacity part with size r, representing the blocks in which the message and AD are added. Two main versions are defined, namely SpoC-64 sLiSCP-light-[192] and SpoC-128 sLiSCP-light-[256], where 64 and 128 represent the corresponding size of the rate bits and sLiSCP-light-[b] is the permutation used. Throughout this paper we refer to each of the SpoC versions as SpoC-r, while the permutation is also denoted by π. The key and nonce sizes of both versions are 128 bits. The following table describes the bit sizes of the parameters of both versions:

| Instance                    | State b | Rate r | Tag t | SB size | Rounds u | Steps s |
|-----------------------------|---------|--------|-------|---------|----------|---------|
| SpoC-64 sLiSCP-light-[192]  | 192     | 64     | 64    | 48      | 6        | 18      |
| SpoC-128 sLiSCP-light-[256] | 256     | 128    | 128   | 64      | 8        | 18      |

The state of SpoC is divided into 4 subblocks (S0||S1||S2||S3) of equal length, where || denotes concatenation. The encryption process of SpoC consists of the Initialization, Associated Data Processing, Plaintext Processing and Tag Processing phases and is shown in Fig. 1.

Initialization. This is the only step that differs between the two versions of SpoC. For SpoC-128, this process consists only of loading the key K = K0||K1 and the nonce N = N0||N1 into the odd- and even-numbered subblocks, respectively. That is, S1[j] ← K0[j], S3[j] ← K1[j], S0[j] ← N0[j], S2[j] ← N1[j], where 0 ≤ j ≤ 7 and X[j] represents the jth byte of the block X. All capacity bits are considered as masked. In SpoC-64 the state is smaller, while the sizes of the key and nonce are unchanged. First, K and N0 are loaded to the state and then the permutation sLiSCP-light-[192] is applied. Finally, N1 is added to the masked capacity bits. The rate part is represented by the first 4 bytes of subblocks S0 and S2 and is where N0 is loaded, while the key is loaded to the rest of the state.

AD and Message Processing. These two phases are very similar. The blocks of AD, respectively message, are absorbed into the state and after each block a control signal (also referred to as a constant) is added. Finally, π is applied to the state. The added constant depends on both the phase and the length of the block. For the AD processing phase, the added constant is 0010 for a full block and 0011 for a partial (last) block. Respectively, for the message processing phase, 0100 is added for a full block and 0101 for a partial block. In both the AD and message processing phases, in the case of an incomplete last block, a padding function is applied. We denote this by $padding(M) = M \| 1 \| 0^{m_c - n - 1}$ for a block M of length n < mc, where mc is the length of a full block (which is, in fact, the length of the masked capacity part).

Tag Processing. The control signal in this phase is set to 1000 and is added right after the control signal of the previous phase. Then π is applied and the tag is extracted from S1 and S3 of the output state. Note that if the AD is empty (respectively, the message is empty), the corresponding phase is omitted entirely. We denote by (AD, M) an associated data and message pair and by "" an empty instance of either of them. For example, ("", M) denotes an input pair having empty associated data.

Fig. 1. Schematic diagram of the different phases used in the SpoC cipher. The shown initialization is for SpoC-64. Note that the state is divided in two parts, the capacity (the upper part) and the rate (the lower part).

2.4 Specifications of sLiSCP-Light

sLiSCP-light-[b] is a permutation that operates on a b-bit state, where b equals 192 or 256, and is defined by repeating a step function s times. The state is divided into four 2m-bit subblocks $(S_0^i \| S_1^i \| S_2^i \| S_3^i)$, with 0 ≤ i ≤ s − 1 the step number and m equal to 24 or 32. We denote a zero subblock by $0^{2m}$ or simply 0, where the number of zero bits is clear from the context. The following 3 transformations are performed: SubstituteSubblocks (SSb), AddStepconstants (ASc), and MixSubblocks (MSb). An illustration is shown in Fig. 2a.

Fig. 2. (a) The sLiSCP-light step function. On top, the blue blocks represent the masked capacity bits and the green blocks the rate bits for SpoC-64. (b) One round of the Simeck cipher, used as SB. (Color figure online)

The SSb transformation is a partial substitution layer, in which a non-linear operation is applied to half of the state, namely to subblocks S1 and S3. The non-linear operation, or the SBox, is represented by a u-round iterated unkeyed Simeck-2m block cipher [YZS+15]. The Simeck SBox used in the description of the permutation is depicted in Fig. 2b. For an input (xi, yi) of the ith round, the output is $(x_{i+1}, y_{i+1}) = R(x_i, y_i) = (y_i \oplus f(x_i) \oplus rc,\ x_i)$, where $f(x) = (x \odot (x \lll 5)) \oplus (x \lll 1)$, with ⊙ denoting bitwise AND and ⋘ a left rotation, and rc represents a constant computed using the sLiSCP-light round constants $rc^i$. The ASc layer is also applied to half of the state, as the step constants $sc_0^i$ and $sc_1^i$ are applied to subblocks $S_0^i$ and $S_2^i$, respectively. Table 2 lists all round and step constants used in sLiSCP-light. The reader can refer to [AGH+19] or [ARH+18] for more details regarding the description of SpoC and the sLiSCP-light permutation.

2.5 Security Claims and the Impact of Our Attacks

In the original paper, the authors of SpoC introduced the security claims in different manners. In this paper, we refer to the security claims described in Table 3.1 of [AGH+19], since we consider this to be the most restrictive. Our interpretation of this table is that the best attack on either version of SpoC uses at most $2^{50}$ data encrypted under the same key and has a time complexity of at most $2^{112}$. Furthermore, the attack, aiming at breaking either the confidentiality or the integrity of the cipher, has a success probability of at least $2^{-16}$.

Table 2. (a) Round and step constants for sLiSCP-light-[192] (b) Round and step constants for sLiSCP-light-[256]

(a) Round and step constants for sLiSCP-light-[192]:

| Step i | (rc_0^i, rc_1^i)                                         | (sc_0^i, sc_1^i)                                          |
|--------|----------------------------------------------------------|-----------------------------------------------------------|
| 0-5    | (7,27), (4,34), (6,2e), (25,19), (17,35), (1c,f)         | (8, 29), (c, 1d), (a, 33), (2f, 2a), (38, 1f), (24, 10)   |
| 6-11   | (12, 8), (3b, c), (26, a), (15, 2f), (3f, 38), (20, 24)  | (36, 18), (d, 14), (2b, 1e), (3e, 31), (1, 9), (21, 2d)   |
| 12-17  | (30, 36), (28, d), (3c, 2b), (22, 3e), (13, 1), (1a, 21) | (11, 1b), (39, 16), (5, 3d), (27, 3), (34, 2), (2e, 23)   |

(b) Round and step constants for sLiSCP-light-[256]:

| Step i | (rc_0^i, rc_1^i)                                          | (sc_0^i, sc_1^i)                                           |
|--------|-----------------------------------------------------------|------------------------------------------------------------|
| 0-5    | (f, 47), (4, b2), (43, b5), (f1, 37), (44, 96), (73, ee)  | (8, 64), (86, 6b), (e2, 6f), (89, 2c), (e6, dd), (ca, 99)  |
| 6-11   | (e5, 4c), (b, f5), (47, 7), (b2, 82), (b5, a1), (37, 78)  | (17, ea), (8e, 0f), (64, 04), (6b, 43), (6f, f1), (2c, 44) |
| 12-17  | (96, a2), (ee, b9), (4c, f2), (f5, 85), (7, 23), (82, d9) | (dd, 73), (99, e5), (ea, 0b), (0f, 47), (04, b2), (43, b5) |

In all of our differential-based attacks, each encryption is performed under different key-nonce pairs, whereas in the online phase of the key-recovery attack the adversary intercepts messages encrypted by an honest user.

The Impact of Our Attacks. Since all of our differential-based attacks are applied to round-reduced versions of SpoC, we consider that they do not have an impact on the security of the SpoC cipher. Nonetheless, we consider this work relevant because it presents an analysis of the SpoC cipher and of the sLiSCP-light permutation that improves the knowledge about the security of both. Regarding our key-recovery attack, since we were able to apply a generic Time-Memory Trade-Off to the full SpoC-64, we consider that there might be some undesirable properties inducing vulnerabilities in the mode of operation of SpoC-64. Since the initialization function of SpoC-64 is not bijective, different key-nonce pairs might lead to a collision on the internal state after initialization. This fact can be compared to the nonce misuse scenario since, in both cases, the same initialized state is used more than once. Therefore, there is no distinction, with respect to the ciphertext and tag, between encrypting twice with the same (key, nonce) pair and encrypting with two (key, nonce) pairs that collide after the initialization phase.

3 Differential Cryptanalysis on sLiSCP-Light

In this section we present several characteristics for both versions of the sLiSCP-light permutation. Our characteristics are constructed by imposing specific constraints on the input and output differences in order to be further used in a series of attacks on SpoC. First, the details of our 6-round characteristic on sLiSCP-light-[256] are presented, then 2 different characteristics over 7 and 9 rounds of sLiSCP-light-[192] are shown.

Fig. 3. The active SBoxes for 6 rounds of sLiSCP-[256] with the difference propagation

3.1 Characteristics on sLiSCP-Light-[256]

In order to construct this 6-round characteristic of sLiSCP-light-[256], we impose a constraint only on the output difference and thus construct it backwards. We fix the output difference in the first and third subblocks to $\Delta S_0^6 = \delta_1$ and $\Delta S_2^6 = 0$. In fact, those are the positions of the rate bits when applied to SpoC. The purpose and value of δ1 is discussed in Sect. 4. The output difference in subblocks $S_1^6$ and $S_3^6$ is irrelevant for our attacks and thus can take any value. However, in order to decrease the number of active Sboxes, we choose $\Delta S_1^6 = \delta_2$ and $\Delta S_3^6 = \delta_1$, where δ2 is a possible input difference for the Sbox that leads to δ1. In this case, the characteristic has only one active Sbox in each of the last two rounds and none in the 4th one. As can be seen in Fig. 3, for 6 rounds of the permutation we have 6 active Sboxes with the following transitions:

\[
\delta_5 \xrightarrow{SB} \delta_6 \xrightarrow{SB} \delta_3 \xrightarrow{SB} \delta_2 \xrightarrow{SB} \delta_1, \qquad \delta_4 \xrightarrow{SB} \delta_3.
\]

The resulting characteristic has the input difference $\delta_3 \| \delta_4 \| 0^{64} \| \delta_5$ and the output difference $\delta_1 \| \delta_2 \| 0^{64} \| \delta_1$; therefore, by fixing the input and the output of the characteristic we actually fix the differences δ1 to δ5. However, the difference δ6 can take multiple values, as long as it is a valid output difference for the input difference δ5 and a valid input difference for the output difference δ3. In the case of our characteristic we take δ6 = δ4.

To choose the differences δi we used the automatic search tool ArxPy [Ran17]. By fixing δ1 to one of the possible values described in Sect. 4, we constructed a tree of differences. The nodes of this tree represent all possible input differences returned by ArxPy having weight less than the optimal one plus 3. After choosing the appropriate differences, the weight of each transition was empirically verified using $2^{30}$ data. In order to obtain the final weight of the characteristic, all weights of the SBox transitions were added up. Our best differential characteristics and their corresponding weights are listed in Table 3. The best weight that we found for 6 rounds of the permutation is 106.14. Note that the optimal characteristic over one SBox (8 rounds of Simeck) has weight 18. This is proven in [LLW17] and was verified by us with the ArxPy tool. The empirical differential probability of 8 rounds of Simeck has slightly lower or higher weight than the optimal characteristic, which is expected due to the differential effect and the independence assumptions.

3.2 sLiSCP-Light-[192]

In this section we present 2 characteristics, one over 7 rounds and one over 9 rounds of sLiSCP-light-[192]. They are constructed in a similar way as the characteristic described in the previous subsection. The requirements on the desired differences in the input and output bit positions are different, since the constraints imposed by our attack scenarios are different.

7-Round Characteristic. The constraints of our 7-round characteristic fix the nonzero input difference to $S_1^0$ and $S_3^0$, while the output difference is $S_0^7 = \delta_1$ and $S_2^7 = 0^{48}$. The difference in $S_1^7$ and $S_3^7$ is chosen for convenience, to reduce the number of active SBoxes. Therefore, our characteristic has 7 active Sboxes, as shown in Fig. 4a. The described characteristic has input and output differences $(0^{48} \| \delta_4 \| 0^{48} \| 0^{48} \rightarrow \delta_1 \| 0^{48} \| 0^{48} \| \delta_1)$, with

\[
\delta_4 \xrightarrow{SB} \delta_3 \xrightarrow{SB} \delta_2 \xrightarrow{SB} \delta_1, \qquad \delta_2 \xrightarrow{SB} \delta_3.
\]

Table 3. The best characteristics we found for sLiSCP-[256]. For simplicity, the 64-bit differences are presented in hexadecimal and separated into two halves, and ".." denotes 3 zero bytes. The value listed between two differences is the weight of the differential of that transition; w is the weight of the full characteristic.

| δ5         | weight | δ4 = δ6    | weight | δ3         | weight | δ2         | weight | δ1         | w      |
|------------|--------|------------|--------|------------|--------|------------|--------|------------|--------|
| 1..0, 1..0 | 17.69  | 1..1, 0..0 | 17.69  | 1..0, 0..0 | 17.69  | 1..1, 0..0 | 17.69  | 1..0, 0..0 | 106.14 |
| 1..0, 0..0 | 17.69  | 1..1, 0..0 | 17.69  | 1..0, 0..0 | 17.69  | 1..1, 0..0 | 17.69  | 1..0, 0..0 | 106.14 |
| 1..0, 0..2 | 18.69  | 1..1, 0..0 | 17.69  | 1..0, 0..0 | 17.69  | 1..1, 0..0 | 17.69  | 1..0, 0..0 | 107.14 |
| 0..0, 2..0 | 18.30  | 0..0, 2..2 | 18.30  | 0..0, 2..0 | 18.30  | 0..0, 2..2 | 18.90  | 6..0, 0..0 | 110.4  |

Fig. 4. Differential characteristics of (a) 7 rounds and (b) 9 rounds of sLiSCP-light-[192]

From Fig. 4a we can see that the iterative transition $\delta_3 \xrightarrow{SB} \delta_2$ happens 4 times, while $\delta_2 \xrightarrow{SB} \delta_3$ appears only once. The exact values are chosen to minimize the weight of $\delta_3 \xrightarrow{SB} \delta_2$. Our best probability of $2^{-108.2}$ happens for

$\delta_1$ = 0x700000000000, $\delta_2$ = 0x500001800000, $\delta_3$ = 0x100000200000, $\delta_4$ = 0x100001100000,

with weights

$$\delta_4 \xrightarrow{17} \delta_3 \xrightarrow{11.3} \delta_2 \xrightarrow{23.1} \delta_1, \qquad \delta_2 \xrightarrow{22.9} \delta_3.$$
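The total weight of the 7-round characteristic can be re-derived from the transition weights and the counts read off Fig. 4a. The short check below is only a sketch: the text states that $\delta_3 \to \delta_2$ occurs 4 times and $\delta_2 \to \delta_3$ once, and it is assumed here that the two remaining active SBoxes are one $\delta_4 \to \delta_3$ and one $\delta_2 \to \delta_1$ transition. The dictionary names are illustrative.

```python
# Transition weights quoted in the text (weight = -log2 of the probability).
weights = {
    ("d4", "d3"): 17.0,
    ("d3", "d2"): 11.3,
    ("d2", "d1"): 23.1,
    ("d2", "d3"): 22.9,
}

# How often each transition is used in the 7-round characteristic.
# The counts for (d4, d3) and (d2, d1) are an assumption of this sketch.
counts = {
    ("d4", "d3"): 1,
    ("d3", "d2"): 4,
    ("d2", "d1"): 1,
    ("d2", "d3"): 1,
}

active_sboxes = sum(counts.values())
total_weight = sum(weights[t] * counts[t] for t in weights)

print(active_sboxes, round(total_weight, 2))
```

Under these assumptions the sum agrees with the exponent of the probability $2^{-108.2}$ stated above.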

Note that the optimal characteristic over one SBox (6 rounds of Simeck) has weight 12, as verified with the ArxPy tool.

9-Round Characteristic. For this characteristic we fix the non-zero output difference to $S_1^9$ and $S_3^9$, more precisely to the first 4 bytes of the subblocks, which correspond to the masked capacity bits in SpoC. In order to design this characteristic we used an iterative transition $\delta_3 \xrightarrow{SB} \delta_4 \xrightarrow{SB} \delta_3$. The input and output differences of our characteristic are $(\delta_3 \| \delta_4 \| \delta_4 \| 0^{48})$ and $(0^{48} \| 0^{48} \| 0^{48} \| \delta_3)$, respectively. As seen in Fig. 4b, the characteristic has 8 active SBoxes, with $\delta_3 \xrightarrow{SB} \delta_4$ appearing 6 times and $\delta_4 \xrightarrow{SB} \delta_3$ two times. Our best probability of $2^{-108.5}$ holds for the differences $\delta_3$ = 0x000a00000400 and $\delta_4$ = 0x000a00001000 with weights

$$\delta_3 \xrightarrow{22.34} \delta_4 \xrightarrow{10.65} \delta_3.$$

4 Differential Attacks on SpoC-128 and SpoC-64

In this section we present a series of attacks based on the differential characteristics introduced in the previous section. More precisely, we design tag-forgery attacks based on the 6-round characteristic of sLiSCP-light-[256] and the 7-round characteristic of sLiSCP-light-[192]. The 9-round characteristic of sLiSCP-light-[192] is used to design a message-recovery attack.

4.1 Tag Forgery Attacks

As stated in Sect. 2.3, in order to distinguish between the different phases of the encryption process, the authors of SpoC used a 4-bit control signal. The values of these 4 bits depend on the current phase, but also on whether the inputs (associated data or plaintext) are padded or not. Moreover, if the associated data or the plaintext is null, the corresponding phase is skipped. Our approach is based on identifying and exploiting scenarios in which different types of inputs lead to similar internal states. Take, for example, the scenario where one uses SpoC-64 or SpoC-128 to encrypt two one-block plaintexts: an incomplete block $M$ and a complete block $M^* = padding(M)$, using, in both cases, the same associated data and the same (key, nonce) pair. At the end of the plaintext addition phase, just before generating the tag, the difference between the corresponding internal states is given by the difference between the corresponding 4-bit control signals, i.e. $0001 \| 0^{n-4}$. In the plaintext processing phase of the first case, the 0101 control signal is used, while in the second case 0100 is used. We denote the difference between the used constants by $\delta_1$; it represents the convenient difference that we can cancel locally.

The Difference $\delta_1$. Depending on the scenario, we identified three possible values for the control signals' difference $\delta_1$, as follows:

1. $\delta_1 = 0001 \| 0^{n-4} = 0100 \| 0^{n-4} \oplus 0101 \| 0^{n-4}$. This value can be obtained when we encrypt the plaintexts $M$ and $M^*$ described above, using the same (key, nonce) pair and the same AD.
2. $\delta_1 = 0110 \| 0^{n-4} = 0100 \| 0^{n-4} \oplus 0010 \| 0^{n-4}$. This value can be obtained when we encrypt ("", $M$) and ($M$, ""). More precisely, in the first case we use a null AD, while in the second case we use a null plaintext. The former encryption consists of the initialization, the message processing phase and the finalization, whereas the latter has an AD processing phase instead of the message processing phase. It will produce no ciphertext; however, the tags of the two would be the same. Hence we can forge the verification of associated data.
3. $\delta_1 = 0111 \| 0^{n-4} = 0101 \| 0^{n-4} \oplus 0010 \| 0^{n-4}$. This value can be obtained when we encrypt the pairs ("", $M$) and (AD, ""), where the length of $M$ is less than the length of a full block and AD $= padding(M)$.

In order to achieve a tag forgery, we designed differential characteristics such that, after the plaintext processing phase, the difference between the corresponding control signals is cancelled by the output difference of the characteristic. Since this results in the same internal states, the corresponding tags will collide. We underline the fact that the control signal bits influence the difference on the rate part of the internal state. The target characteristic might also have active bits in the capacity part, these being cancelled through a difference between the two plaintexts. Therefore, we aim at finding characteristics having an output difference of the form $(\delta, \lambda, 0, \gamma)$, where $\delta$ is the difference between the constants, while the differences $\lambda$ and $\gamma$ can be cancelled through the plaintext block difference.
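The three values of $\delta_1$ listed above are plain XORs of 4-bit control signals, which the following sketch recomputes. The control-signal values 0100, 0101 and 0010 are the ones quoted in the text; the dictionary keys are hypothetical labels introduced only for this illustration.

```python
# 4-bit control signals quoted in the text; the key names are illustrative.
CTRL = {
    "msg_full": 0b0100,     # plaintext processing, complete block
    "msg_partial": 0b0101,  # plaintext processing, incomplete (padded) block
    "ad_full": 0b0010,      # associated-data processing, complete block
}

def control_difference(first: str, second: str) -> str:
    """Return the XOR of two control signals as a 4-bit string."""
    return format(CTRL[first] ^ CTRL[second], "04b")

case1 = control_difference("msg_full", "msg_partial")
case2 = control_difference("msg_full", "ad_full")
case3 = control_difference("msg_partial", "ad_full")

print(case1, case2, case3)
```

Each result is the leading nibble of $\delta_1$; the remaining $n-4$ bits of the difference are zero in all three scenarios.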
In our experiments, in order to optimize the number of active SBoxes, we imposed the additional constraint that $\delta = \gamma$. In Sect. 3 we presented the best characteristics that we found, suitable for our approach, on 6-round sLiSCP-light-256 and on 7-round sLiSCP-light-192. Using these characteristics and the approach presented above, we designed tag forgery attacks on reduced versions of both SpoC-64 and SpoC-128. Since the complexities of our round-reduced characteristics are close to the security bound, we chose the input parameters such that the difference propagates through only one permutation.

4.2 Tag Forgery Attack on SpoC-128

After we have fixed our characteristic, we can proceed to the attack. Note that, in the case of SpoC-128, the initialization phase is represented only by the loading of the (key, nonce) pair into the internal state. Since there is only one sLiSCP-light application before the ciphertext generation, our tag forgery attack on SpoC-128 follows the related-key related-nonce scenario. According to our 6-round characteristic, we use inputs such that the key difference is $(\delta_4 \| \delta_5)$, while the nonce difference is $(\delta_3 \| 0)$. Moreover, since our best characteristic uses $\delta_1 = \mathtt{0x1} \| 0^{124}$, the setup of this attack assumes the use of null associated data, a plaintext $M = M_1 \| M_2$ of size less than 128 bits and a plaintext $M^* = padding(M) \oplus \delta_2 \| \delta_1 = M_1^* \| M_2^*$, encrypted under related-key related-nonce pairs. As we mentioned before, by injecting a difference in the plaintexts we cancel the capacity difference after the permutation. The encryption processes are described in Fig. 5.

Fig. 5. The encryption of an incomplete and a full block of plaintexts, using related-key related-nonce inputs and null associated data

If our differential characteristic holds, then the following equations also hold with probability 1:

$$M_1^* = M_1 \oplus \delta_2, \qquad S_0^{6*} = S_0^6 \oplus \delta_1, \qquad S_2^{6*} = S_2^6, \qquad C^* = M_1^* \oplus S_0^{6*} \,\|\, M_2^* \oplus S_2^{6*}.$$

Therefore, $C^* = (C_1 \| C_2) \oplus (\delta_2 \oplus \delta_1 \| \delta_1)$. More precisely, the ciphertext-tag pair $(C^*, \tau)$ would then be valid under $(K \oplus \Delta_K, N \oplus \Delta_N) = (K \oplus \delta_4 \| \delta_5, N \oplus \delta_3 \| 0)$ with the probability of the characteristic. The pseudocode of this attack is presented in Algorithm 1. Note that it is not necessary to use the same key for each Encryption step of the algorithm. Since the rate part of the internal state is used for the encryption, by knowing both the plaintext and the ciphertext we can recover the rate part of each internal state used in the plaintext processing phase. In our approach we use this observation to decrease the data complexity of the attack, by filtering the ciphertexts obtained in the first step. The data complexity of this attack, computed as the number of encryptions and decryptions required, is $(P_{R_6} + 1) \cdot P_{R_1 \to R_5} = (2^{17.69} + 1) \cdot 2^{88.45} = 2^{106.14} + 2^{88.45}$, where $R_i$ represents the $i$th sLiSCP step. The time complexity, computed as the number of offline SBox computations, is $2 \cdot P_{R_6} \cdot P_{R_1 \to R_5} = 2^{107.14}$.
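The two complexity figures can be checked numerically from the step weights quoted above. This is only a small arithmetic sketch; the function names are illustrative, and the inputs 17.69 and 88.45 are the weights of step $R_6$ and of steps $R_1$ to $R_5$ as used in the text.

```python
import math

W_LAST_STEP = 17.69    # weight of the last step R6 (from the text)
W_FIRST_STEPS = 88.45  # combined weight of steps R1..R5 (from the text)

def log2_data_complexity(w_last: float, w_first: float) -> float:
    """log2 of (2^w_last + 1) * 2^w_first."""
    return w_first + math.log2(2.0 ** w_last + 1.0)

def log2_time_complexity(w_last: float, w_first: float) -> float:
    """log2 of 2 * 2^w_last * 2^w_first."""
    return 1.0 + w_last + w_first

data_log2 = log2_data_complexity(W_LAST_STEP, W_FIRST_STEPS)
time_log2 = log2_time_complexity(W_LAST_STEP, W_FIRST_STEPS)

print(f"data ~ 2^{data_log2:.2f}, time ~ 2^{time_log2:.2f}")
```

The additive $2^{88.45}$ term changes the data exponent only far beyond the second decimal place, so the data complexity is dominated by $2^{106.14}$.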


Algorithm 1: The tag forgery attack on SpoC-128

Encryption:
    Obtain $(C = C_1 \| C_2, \tau)$, the encryption of ("", $M$) under arbitrary $(K, N)$;
    Compute $S_0^6 \| S_2^6 = padding(M) \oplus C$;
    if $SB^{-1}(S_0^6) \oplus SB^{-1}(S_0^6 \oplus \delta_1) == \delta_2$ then
        Decryption:
            Ask for $P$, the decryption of $(C_1 \oplus \delta_2 \oplus \delta_1 \| C_2 \oplus \delta_1, \tau)$ under $(K \oplus \delta_4 \| \delta_5, N \oplus \delta_3 \| 0)$;
            if $P \neq \perp$ then
                $P$ = ("", $M^*$);
            else
                go to Encryption;
            end
    else
        go to Encryption;
    end

Improved Attack. We can improve the complexity of the attack by using multiple differential characteristics that have the same output difference, i.e. equal values of $\delta_1$ and $\delta_2$. Suppose that we have $d$ such differential characteristics, the $i$-th one having input differences $(\Delta_K^i, \Delta_N^i)$ and probability $p_i$, $i = 1, \ldots, d$. The attack follows the lines of Algorithm 1, where instead of asking for the decryption under a fixed (key, nonce) pair, we use every $(K \oplus \Delta_K^i, N \oplus \Delta_N^i)$ pair. For our improved attack the time and data complexities can improve by a factor of at most $d$, i.e. by at most $\log_2 d$ in the exponent, which is reached when all the characteristics have the same probability. By using 10 different characteristics that we found with $\delta_1$ = 1..0, 0..0 and $\delta_2$ = 1..1, 0..0, the complexity is improved by a factor of $2^{1.82}$, the time complexity being around $2^{105.32}$, while the data complexity decreases to approximately $2^{104.32} + 2^{86.63}$.

Time-Memory Trade-Off. The time complexity of our attack can be improved by using a time-memory trade-off approach. In this case the attack also involves an offline phase, as follows: for all possible values of $S_0^6$ we verify if $SB^{-1}(S_0^6) \oplus SB^{-1}(S_0^6 \oplus \delta_1) == \delta_2$. If the condition holds, we store the corresponding value of $S_0^6$ in the sorted list $list_{S_0^6}$. The complexity of this phase is $2^{64}$ SBox computations. During the attack, instead of verifying the specified condition, we check whether $S_0^6 \in list_{S_0^6}$. The time complexity of each query will be $\log_2(\#list_{S_0^6})$ operations, while the memory complexity will be less than $2^{64}$ (negligible compared to the data complexity).
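The offline phase of the time-memory trade-off can be illustrated on a toy scale. The sketch below is not the Simeck-based SBox of sLiSCP-light: it assumes a randomly shuffled 8-bit permutation as a stand-in, and the names `build_list` and `in_list` are hypothetical. It only shows the structure of the precomputation and of the logarithmic-time membership test.

```python
import random
from bisect import bisect_left

def build_list(sbox_inv, delta1, delta2):
    """Offline phase: collect every x with SB^-1(x) ^ SB^-1(x ^ delta1) == delta2."""
    return sorted(
        x for x in range(len(sbox_inv))
        if sbox_inv[x] ^ sbox_inv[x ^ delta1] == delta2
    )

def in_list(sorted_list, x):
    """Online check by binary search, costing about log2(len(sorted_list)) steps."""
    k = bisect_left(sorted_list, x)
    return k < len(sorted_list) and sorted_list[k] == x

# Toy 8-bit permutation standing in for the 64-bit SBox (assumption of this sketch).
rng = random.Random(2020)
sbox = list(range(256))
rng.shuffle(sbox)
sbox_inv = [0] * 256
for x, y in enumerate(sbox):
    sbox_inv[y] = x

delta1 = 0x01
# Pick a delta2 that is known to occur, so that the toy list is not empty.
delta2 = sbox_inv[0] ^ sbox_inv[delta1]
table = build_list(sbox_inv, delta1, delta2)

print(len(table), in_list(table, 0))
```

Because the stored condition is symmetric in $x$ and $x \oplus \delta_1$, the list always contains values in pairs, which is visible in the toy output as an even list length.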

4.3 Tag Forgery Attack on SpoC-64

The main idea of the attack is similar to the one presented in Subsect. 4.2, some modifications being imposed due to the different loading phase of SpoC-64. This attack is based on the 7-round characteristic presented in Subsect. 3.2. Since our characteristic covers only one permutation, in this scenario we have one more constraint on the input differences. More precisely, the setup of our attack assumes the use of related values of $N_1$, while the key and the nonce $N_0$ are equal. The input difference is given by the difference between the corresponding values of $N_1$, while the output difference respects the constraints from Subsect. 4.2. Since $N_1$ is added to the masked capacity bits, note that the difference needs to have active bits only in the masked capacity part. Moreover, for the 7-round characteristic on sLiSCP-light-192 we used $\delta_1 = 0111 \| 0^{n-4}$, therefore the setup of our attack is the one presented in Fig. 6.

Fig. 6. The two processes of SpoC-64 used by our approach. Note that the second XORed constant is imposed by the beginning of the tag generation phase.

We state that, while the encryption of the message ("", $M$) will return a (ciphertext, tag) pair, the encryption of the pair (AD, "") results in a null ciphertext and a tag. Therefore, assuming the 7-round characteristic holds, the ciphertext-tag pair ("", $\tau$) is valid under $(K, N_0 \| (N_1 \oplus \Delta_N)) = (K, N_0 \| (N_1 \oplus \delta_4 \| 0))$ with probability 1. The tag forgery attack on SpoC-64 is very similar to the one described in Algorithm 1. The distinction is given by the input difference of the characteristic, which impacts the decryption (key, nonce) pair. More precisely, only the nonce $N_1$ is different. The data complexity of this attack, computed as the number of encryptions and decryptions required, is $(P_{R_7} + 1) \cdot P_{R_1 \to R_6} = (2^{23.1} + 1) \cdot 2^{85.1} = 2^{108.2} + 2^{85.1}$. The time complexity, computed as the number of offline SBox computations, is $2 \cdot P_{R_7} \cdot P_{R_1 \to R_6} = 2^{109.2}$. Even though the required amount of data is higher than the size of the tag space, we consider that our attack is meaningful since the authors of SpoC claim security of 112 bits for both confidentiality and integrity.


Time-Memory Trade-Off. By following the same time-memory trade-off approach presented in Subsect. 4.2, we can improve the time complexity of our attack. The complexity of the offline phase is also $2^{64}$ SBox computations. Thus, the time complexity of each query will be $\log_2(\#list_{S_0^7})$ operations, while the memory complexity will be less than $2^{64}$ (considerably smaller than the data complexity).

4.4 Message Recovery Attack on SpoC-64

In this section we present a message recovery attack on SpoC-64 based on a differential cryptanalysis approach. This attack exploits the fact that the initialization phase is not a bijective function, since the input is 256 bits and the internal state is 192 bits. The analysis aims at constructing (key, nonce) pairs that lead to the same internal state after the initialization. To this end, we designed the 9-round differential characteristic on sLiSCP-light-[192] presented in Sect. 3. More precisely, the constraint of our characteristic is that the output difference only affects the capacity part, this difference being cancelled by a difference between the corresponding values of $N_1$. Therefore our approach uses a related-key related-nonce scenario. By using our 9-round characteristic in a round-reduced scenario, the internal states after the initialization collide. Therefore, the encryption of the same plaintext under different (key, nonce) pairs leads to identical ciphertexts and tags. Moreover, if we encrypt two messages with the same first $l$ blocks, the corresponding $l$ ciphertext blocks will also be the same. We used this approach to design a related-key related-nonce attack on SpoC-64. The attack works as follows:

1. With a key-nonce pair $(K, N)$ we ask for the encryption of an arbitrary, unknown plaintext $M$, using the associated data AD; we obtain the ciphertext-tag pair $(C, \tau)$;
2. We ask for the decryption of $(C, \tau)$ under $(K \oplus \Delta_K, N \oplus \Delta_N)$ and using the initial AD;
3. If the tag verification holds, we obtain the plaintext $M'$. If $M'$ is a readable text, then $M' = M$ and the message is recovered.

We require $M'$ to be readable, since there is always a probability that tags collide. As stated in Sect. 3, the probability of our 9-round characteristic is $2^{-109.84}$. Since the data complexity counts the number of encryptions and decryptions, in our case the data complexity is $2^{110.84}$, while the time complexity is bounded by the data complexity.

5 Key-Recovery Attack on SpoC-64

In this section we generalise the approach described in Subsect. 4.4, by defining the notion of class-equivalence over the space of all (key, nonce) pairs. We then present a time-memory trade-off attack based on the class-equivalence that leads to the recovery of the secret key K.


Equivalence in the Set of (Key, Nonce) Pairs

Definition 1. The (key, nonce) pairs $(K^1, N^1)$ and $(K^2, N^2)$ are said to be in the same equivalence class (or simply equivalent) if the corresponding internal states, after the initialization phase, are equal.

The number of equivalence classes is given by the number of all possible internal states of SpoC-64, namely $2^{192}$. For each fixed internal state, one can consider all values of $N_1$ and can compute the associated $(K, N_0)$ pairs by applying the inverse of the permutation. Therefore, each equivalence class is formed by $2^{64}$ (key, nonce) pairs. Note that the encryption of the same message under equivalent (key, nonce) pairs results in equal ciphertexts and tags. Moreover, the decryption and tag verification of a (ciphertext, tag) pair can successfully be performed under any (key, nonce) pair belonging to the same equivalence class.

The Key-Recovery Attack. Our attack consists of two phases: an offline and an online phase. In the offline phase, the adversary generates a table containing $2^{110}$ entries. Each entry contains a $(K, N_0 \| N_1)$ pair and the ciphertext and tag obtained by applying SpoC-64 on a well chosen plaintext $M$, under the $(K, N_0 \| N_1)$ pair and a null AD. The (key, nonce) pairs are generated such that they belong to different equivalence classes. More precisely, $2^{110}$ different internal states are generated by the adversary. For each state an arbitrary $N_1$ is chosen and, by XORing it to the internal state and by applying the inverse of the permutation, the $(K, N_0)$ pair is computed. Using each $(K, N_0 \| N_1)$ pair, the adversary encrypts, using SpoC-64, a common short message $M$. Note that, in practice, depending on the nature of a correspondence, messages usually start with the same words or letters. For example, e-mails normally start with "Dear (*name*)," or "Hello (*name*),". By making this assumption, we choose a plaintext $M$ to be a regularly used word or phrase of length $l$ blocks.
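The class structure of Definition 1 can be made concrete on a toy scale. The sketch below is an assumption-laden miniature, not SpoC-64 itself: an 8-bit state replaces the 192-bit one, a 4-bit $N_1$ replaces the 64-bit one, a shuffled table stands in for the permutation, and `toy_init` is a hypothetical name. It only mirrors the shape of the initialization described in the text (load $(K, N_0)$, apply the permutation, XOR $N_1$ into the state).

```python
import random
from collections import defaultdict

STATE_BITS = 8  # toy stand-in for the 192-bit state holding (K, N0)
N1_BITS = 4     # toy stand-in for the 64-bit N1

rng = random.Random(64)
perm = list(range(1 << STATE_BITS))  # toy permutation replacing sLiSCP-light
rng.shuffle(perm)

def toy_init(k_n0: int, n1: int) -> int:
    """Toy initialization: permute the loaded (K, N0) value, then XOR N1."""
    return perm[k_n0] ^ n1

# Group every (key||nonce0, nonce1) input by the state it initializes to.
classes = defaultdict(list)
for k_n0 in range(1 << STATE_BITS):
    for n1 in range(1 << N1_BITS):
        classes[toy_init(k_n0, n1)].append((k_n0, n1))

class_sizes = {len(members) for members in classes.values()}
print(len(classes), class_sizes)
```

In the toy model every reachable state has exactly $2^4$ preimages, the analogue of the $2^{64}$ pairs per class counted above.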
In our research we make the assumption that the full 18-round sLiSCP-light behaves as a random permutation, thus no particular properties can be observed. Therefore, we claim that $l = 3$ is the number of blocks of ciphertext that uniquely defines the equivalence class of the $(K, N_0 \| N_1)$ pairs. We consider the encryption function of SpoC-64, defined using a fixed plaintext and considering as the input state the result of the initialization phase. On one hand, in order to have uniqueness, this function has to be injective. Therefore, since the length of the internal state is 192 bits and the length of one block of ciphertext is 64 bits, the minimum value of $l$ is 3. On the other hand, by writing the system of bit-level equations of the targeted function, for $l$ blocks of ciphertext we obtain $64 \times l$ equations in 192 variables. If this system does not have a unique solution for $l = 3$, it means that the resulting equations are not independent, thus there are some particular properties of the sLiSCP-light permutation that could be further extended to an attack. The pseudocode of the offline phase is presented in Algorithm 2. Note that $\pi^{-1}$ denotes the inverse of the full sLiSCP-light-[192] permutation. The resulting list is sorted with respect to the ciphertexts, using a hash table.

Algorithm 2: Offline phase

    list = null;
    choose $M$;
    while list.length < $2^{110}$ do
        sample internal state;
        sample $N_1$;
        compute $(K, N_0) = \pi^{-1}$(internal state $\oplus N_1$);
        encrypt $(C, \tau)$ = SpoC-64$(K, N_0 \| N_1$, "", $M)$;
        list.Add$(K, N_0 \| N_1, C)$
    end
    Result: list populated with $2^{110}$ entries

The memory complexity of this phase is $2^{110}$ table entries, while the time complexity is $2^{110}$ SpoC-64 encryptions. Note that the steps of an encryption are not performed sequentially. Since the first step is to sample the internal state, the encryption can be performed without the initialization phase, while the initialization phase is performed backwards, by computing a (key, nonce) pair corresponding to a fixed internal state. Thus, by assuming that the permutation and its inverse take the same time, the time complexity of our offline phase is $2^{110}$ encryptions.

In the online phase the adversary intercepts the (ciphertext, tag) pairs encrypted by a valid user. For simplicity, we assume that the valid user used null associated data. We discuss below the case where the associated data is not null. For every intercepted ciphertext, the adversary verifies if the first $l$ blocks belong to the table computed in the offline phase. Since a string of $l$ blocks uniquely defines the equivalence class, a match means that the valid user encrypted the plaintext under a (key, nonce) pair that is in the same equivalence class as the pair $(K, N_0 \| N_1)$ extracted from the precomputed table. Moreover, the adversary can easily compute the internal state obtained after the initialization phase, using the $(K, N_0 \| N_1)$ pair. Since the nonce $N_1$ is public, the adversary can XOR $N_1$ to the internal state and, by applying the inverse of the permutation, compute the user's key. In the case where the valid user chooses a non-empty value for the associated data, the key-recovery works as follows:

1. The adversary verifies if the first $l$ blocks of the ciphertext belong to the table;
2. When a match is found, the adversary reverses the associated data addition phase; this action is allowed, since AD is a public value;
3. On the obtained internal state, the adversary XORs $N_1$ and applies the inverse of the permutation.
Therefore, the adversary gains full control over the encryption of the valid user, being able to decrypt all the past and future communication in which the valid user used the recovered key. Moreover, the adversary gains the ability to impersonate the valid user, being able to generate (ciphertext, tag) pairs using the secret key of the valid user. Since the adversary can control $2^{110}$ equivalence classes, through the precomputed table, the probability that an intercepted message belongs to the precomputed table is $2^{110-192} = 2^{-82}$. Thus, if the adversary intercepts $2^{67}$ (ciphertext, tag) pairs, the success probability of this attack is $2^{-15}$, twice the probability claimed by the authors of SpoC. By increasing the amount of intercepted data, the success probability of the attack also increases. The data complexity of the online phase is represented by the number of required online encryptions. So, for a success probability of $2^{-15}$, the data complexity of the online phase is $2^{67}$. Since the precomputed table is a sorted hash table, the search of a ciphertext has a time complexity of $O(1)$. Thus, the time complexity of the online phase is $2^{67}$ table lookups. For comparison, an exhaustive search attack with the same success probability of $2^{-15}$ would require $2^{113}$ data. Note that even though the online phase of the attack can be performed many times (e.g. the attack targets two or more valid users), the offline phase of the attack is only performed once. Thus, the time complexity of the offline phase can be overlooked in any application of the attack, except for the first one. Therefore, every other instance of the attack has a total time complexity of $2^{67}$. Note that our attack recovers only one of the secret keys used by the valid user.
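The success-probability figures above follow from simple exponent arithmetic, sketched below. All quantities are base-2 logarithms; the 128-bit key length is an assumption of this sketch (inferred from the 256-bit input and 192-bit state mentioned earlier), and the success probability is taken as the number of intercepted pairs times the per-message hit probability, which is a first-order approximation.

```python
# All values are base-2 logarithms.
TABLE_LOG2 = 110        # precomputed equivalence classes
STATE_BITS = 192        # number of classes is 2^192
INTERCEPTED_LOG2 = 67   # intercepted (ciphertext, tag) pairs
KEY_BITS = 128          # assumed SpoC-64 key length

hit_log2 = TABLE_LOG2 - STATE_BITS          # one message falls in the table
success_log2 = INTERCEPTED_LOG2 + hit_log2  # first-order estimate over all messages
exhaustive_log2 = KEY_BITS + success_log2   # key guesses for the same success rate

print(hit_log2, success_log2, exhaustive_log2)
```

The same arithmetic shows how the success probability scales: each doubling of the intercepted data adds one to `success_log2` as long as the estimate stays well below 0.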

6 Other Observations

While analyzing the sLiSCP-light-256 permutation we noticed a particular property of both the round and the step constants. More precisely, using the notations introduced in [AGH+19], we noticed that

$$rc_0^i = rc_1^{i+8}, \qquad sc_0^i = sc_1^{i+8}, \qquad \forall i \in \{0, \ldots, 10\}.$$

The design rationale behind the generation of these constants is described in [ARH+17]. The constants are computed using an LFSR of length 7 with the primitive polynomial $x^7 + x + 1$. The initial state of the LFSR is filled with seven 1 bits. For the computation of the round constants, the LFSR runs continuously for $18 \times 2 \times 8$ steps. The first 16 bits of the resulting string are: 1111111000000100. The constants $rc_0^0$ and $rc_1^0$ are computed by 2-decimation. More precisely, the bits of $rc_0^0$ are the bits in odd positions of the string above, while the bits of $rc_1^0$ are the bits from the even positions, both of them being read in a little-endian manner. Thus, $rc_0^0$ = 00001111 = 0xF and $rc_1^0$ = 01000111 = 0x47. Since the primitive polynomial has degree 7, its period is $2^7 - 1 = 127$. Therefore, the $(127 + n)$th bit will be equal to the $n$th generated bit. In particular, the bits of $rc_1^{8+n}$ are equal to the bits of $rc_0^n$. A similar approach is used for the computation of the step constants. In this case, after loading the initial state of the LFSR with seven 1 bits, 14 steps are performed (discarding the output bits). Then the same procedure is applied; thus, the same observation is also valid for the step constants. The round and step constants of the sLiSCP-light-192 permutation are computed in a similar manner. But since both the constants and the LFSR have length 6, the 2-decimation does not influence the distribution of the bits through the constants. Note that the authors of the sLiSCP permutation claim that each 8-bit constant is different.
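The constant generation described above can be reproduced with a few lines of code. The sketch below makes two assumptions that are not spelled out in the text: the feedback rule $s_{n+7} = s_{n+1} \oplus s_n$ is the reading of $x^7 + x + 1$ that reproduces the quoted 16-bit prefix, and round $i$ is assumed to consume bits $16i$ to $16i + 15$ of the stream. The function names are illustrative.

```python
def lfsr_bits(n: int, length: int = 7) -> list:
    """Output n bits of the LFSR s[t+7] = s[t+1] ^ s[t], all-ones initial state."""
    state = [1] * length
    out = []
    for _ in range(n):
        out.append(state[0])
        feedback = state[0] ^ state[1]
        state = state[1:] + [feedback]
    return out

def round_constants(i: int, bits: list) -> tuple:
    """2-decimate the 16 bits of round i; each constant is read little-endian."""
    chunk = bits[16 * i: 16 * i + 16]
    rc0 = sum(b << k for k, b in enumerate(chunk[0::2]))  # 1st, 3rd, 5th, ... bit
    rc1 = sum(b << k for k, b in enumerate(chunk[1::2]))  # 2nd, 4th, 6th, ... bit
    return rc0, rc1

bits = lfsr_bits(18 * 2 * 8)
first16 = "".join(str(b) for b in bits[:16])
rc0_0, rc1_0 = round_constants(0, bits)

print(first16, hex(rc0_0), hex(rc1_0))
```

With this indexing the period-127 repetition shows up as the second constant of round $i$ recurring as the first constant of round $i + 8$; the index convention is an assumption of the sketch and may differ from the notation of [AGH+19].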

7 Conclusion and Future Work

Our work analyzes the SpoC cipher, a second round candidate of the NIST Lightweight competition, and the permutation sLiSCP-light, which represents one core component of the SpoC cipher. For both versions of SpoC, namely SpoC-128 and SpoC-64, we propose characteristics covering round-reduced versions of the permutation. We then use these characteristics to design tag-forgery and message-recovery attacks on SpoC parameterized with round-reduced versions of the sLiSCP-light permutation. Furthermore, by using a TMTO approach, we designed a key-recovery attack on SpoC-64. A summary of our results is depicted in Table 1. The work we presented can be extended in several directions. For example, it would be interesting to analyse both the SpoC cipher and the sLiSCP-light permutation using other techniques. It also remains to be investigated if or how our observations regarding the round and step constants can be exploited. Further research should also consider investigating the impact of our characteristics on other ciphers based on the sLiSCP-light permutation.

Acknowledgements. The authors would like to thank Adrián Ranea for all the fruitful discussions regarding the ArxPy tool. This work was supported by CyberSecurity Research Flanders with reference number VR20192203 and partially supported by the Research Council KU Leuven, C16/18/004, through the EIT Health RAMSES project, through the IF/C1 on New Block Cipher Structures and by the Flemish Government through FWO fellowship and FWO Project Locklock G0D3819N.

References

[AGH+19] AlTawy, R., et al.: SpoC: an authenticated cipher submission to the NIST LWC competition (2019). https://csrc.nist.gov/CSRC/media/Projects/lightweight-cryptography/documents/round-2/spec-doc-rnd2/spoc-spec-round2.pdf
[ARH+17] AlTawy, R., Rohit, R., He, M., Mandal, K., Yang, G., Gong, G.: sLiSCP: simeck-based permutations for lightweight sponge cryptographic primitives. In: Adams, C., Camenisch, J. (eds.) SAC 2017. LNCS, vol. 10719, pp. 129–150. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-72565-9_7
[ARH+18] AlTawy, R., Rohit, R., He, M., Mandal, K., Yang, G., Gong, G.: SLISCP-light: towards hardware optimized sponge-specific cryptographic permutations. ACM Trans. Embed. Comput. Syst. 17(4), 81:1–81:26 (2018)
[BBS99] Biham, E., Biryukov, A., Shamir, A.: Cryptanalysis of skipjack reduced to 31 rounds using impossible differentials. In: Stern, J. (ed.) EUROCRYPT 1999. LNCS, vol. 1592, pp. 12–23. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48910-X_2
[BS90] Biham, E., Shamir, A.: Differential cryptanalysis of DES-like cryptosystems. In: Menezes, A.J., Vanstone, S.A. (eds.) CRYPTO 1990. LNCS, vol. 537, pp. 2–21. Springer, Heidelberg (1991). https://doi.org/10.1007/3-540-38424-3_1
[HNS20] Hosoyamada, A., Naya-Plasencia, M., Sasaki, Y.: Improved attacks on sliscp permutation and tight bound of limited birthday distinguishers. IACR Cryptology ePrint Archive 2020/1089 (2020)
[Knu94] Knudsen, L.R.: Truncated and higher order differentials. In: Preneel, B. (ed.) FSE 1994. LNCS, vol. 1008, pp. 196–211. Springer, Heidelberg (1995). https://doi.org/10.1007/3-540-60590-8_16
[LLW17] Liu, Z., Li, Y., Wang, M.: Optimal differential trails in SIMON-like ciphers. IACR Trans. Symmetric Cryptol. 2017(1), 358–379 (2017)
[LSSW18] Liu, Y., Sasaki, Y., Song, L., Wang, G.: Cryptanalysis of reduced sliscp permutation in sponge-hash and duplex-ae modes. In: Cid, C., Jacobson Jr., M. (eds.) SAC 2018. LNCS, vol. 11349, pp. 92–114. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-10970-7_5
[MP13] Mouha, N., Preneel, B.: Towards finding optimal differential characteristics for ARX: application to Salsa20. Cryptology ePrint Archive, report 2013/328 (2013). https://eprint.iacr.org/2013/328
[NIS79] NIST: FIPS-46: Data Encryption Standard (DES) (1979). http://csrc.nist.gov/publications/fips/fips46-3/fips46-3.pdf
[NIS19] NIST: Lightweight Cryptography Competition (2019). https://csrc.nist.gov/projects/lightweight-cryptography
[Ran17] Ranea, A.: An easy to use tool for rotational-XOR cryptanalysis of ARX block ciphers (2017). https://github.com/ranea/ArxPy
[RAS+20] Ranea, A., Azimi, S.A., Salmasizadeh, M., Mohajeri, J., Aref, M.R., Rijmen, V.: A bit-vector differential model for the modular addition by a constant (2020). https://eprint.iacr.org/2020/1025
[Ste14] Kölbl, S.: CryptoSMT: an easy to use tool for cryptanalysis of symmetric primitives (2014). https://github.com/kste/cryptosmt
[Wag99] Wagner, D.: The boomerang attack. In: Knudsen, L. (ed.) FSE 1999. LNCS, vol. 1636, pp. 156–170. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48519-8_12
[YZS+15] Yang, G., Zhu, B., Suder, V., Aagaard, M.D., Gong, G.: The simeck family of lightweight block ciphers. IACR Cryptology ePrint Archive 2015/612 (2015)

More Glimpses of the RC4 Internal State Array

Pranab Chakraborty (1) and Subhamoy Maitra (2)

(1) Learning and Development, Human Resources, Wipro Limited, Doddakannelli, Sarjapur Road, Bangalore 560035, India [email protected]
(2) Applied Statistics Unit, Indian Statistical Institute, 203 B T Road, Kolkata 700108, India [email protected]

Abstract. We present three categories of new non-randomness results on RC4 here. First, using Jenkins' Glimpse result (1996), we show that for every round, the key-stream value comes from a distant permutation array (S-Box) byte position with a probability higher than the fair chance. Moreover, we show how the Jenkins' Glimpse gets modified when the two consecutive output keystream bytes are equal to a certain value. Second, we show that corresponding to every Fluhrer-McGrew bias (2000), the key-stream value for each of the two rounds comes from certain array positions with a higher (or a lower) probability than the uniform one, depending upon whether the double-byte is positively or negatively biased. Also, there are four "lag-one digraph" (or alternate pair double-byte) biases that have been summarized in a recent paper. We show that for each of these cases, there are preferred positions of the permutation array byte corresponding to the key-stream value in that round. Third, we show that in each of the Fluhrer-McGrew or lag-one cases, the Jenkins' correlation results must be refined. Surprisingly, in one particular configuration, the Glimpse correlation value becomes almost $\frac{4}{N}$ instead of Jenkins' $\frac{2}{N}$.

Keywords: RC4 · Non-randomness · Glimpse or Jenkins' correlation · Fluhrer-McGrew Digraph repetition bias · Sequence · Stream Cipher

1 Introduction

© Springer Nature Switzerland AG 2020. K. Bhargavan et al. (Eds.): INDOCRYPT 2020, LNCS 12578, pp. 294–311, 2020. https://doi.org/10.1007/978-3-030-65277-7_13

RC4 was the most popular stream cipher in the commercial domain until 2015; since then it has been withdrawn from many applications [6] owing to certain very serious cryptanalytic results [1,12]. However, it is quite challenging to replace RC4 in the many applications running around the world, and it is still in use in certain places, as is evident from the security advisory [9] that has been updated very recently. From the research perspective, it is well known that RC4 is one of the most attractive cryptosystems to analyse because it is so simple to understand. At the same time, there are a lot of unexplored
non-randomness results. It is always surprising that such results were not identified earlier, even after so many research results (see [2,3,10,12] and the references therein).

RC4 has a Key Scheduling Algorithm (KSA) and a Pseudo Random Generator Algorithm (PRGA). In this paper we mainly concentrate on the PRGA part and thus we present the algorithm below. Note that all the additions here are modulo $N$, and we use the same convention whenever we discuss operations related to RC4. The array $S$ contains a permutation of 0 to $N-1$.

– Initialization: i = j = 0;
– As long as keystream z is required:
  • i = i + 1;
  • j = j + S[i];
  • swap(S[i], S[j]);
  • z = S[S[i] + S[j]];

In RC4, we have an array of length $N = 256$ holding the 8-bit integers 0 to $N-1$, and after the KSA the array $S$ is supposed to be randomly permuted. On the contrary, several results show that the KSA cannot produce a sufficiently random permutation, and thus the initialization part is not completely secure [11, Chapter 3]. However, in this work we concentrate on the PRGA part only, and thus we assume that the initial permutation (after KSA and before PRGA) is completely random. This can be achieved in practice to some extent if around 2048 initial keystream bytes are thrown away during the PRGA before we start looking at the permutation.

In a stream cipher, one important requirement is that no information about the state should be revealed by the key-stream. It was first pointed out in [5] that RC4 cannot satisfy this. Here we have complete information about the public index $i$ and the key-stream byte $z$ at any round of RC4. Thus, the question is: from $i$ and $z$, is it possible to obtain additional information (a ‘Glimpse’, as it is generally called in the literature) regarding the hidden variables $j$, $S[i]$ or $S[j]$, beyond the random association? To explain the idea of information leakage clearly, we need to introduce the subscript $r$.
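
The PRGA steps listed above can be sketched in Python as follows. This is only an illustrative sketch: the function name `rc4_prga` and the choice to pass the state array in directly (instead of deriving it from a key through the KSA) are our assumptions for the example, not something prescribed by the algorithm description.

```python
from typing import Iterator, List, Tuple


def rc4_prga(S: List[int], rounds: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (i, j, z) for each PRGA round, updating S in place.

    S must hold a permutation of 0..N-1 (hypothetical helper; no KSA here).
    All index arithmetic is modulo N = len(S).
    """
    N = len(S)
    i = j = 0                            # Initialization: i = j = 0
    for _ in range(rounds):
        i = (i + 1) % N                  # i = i + 1
        j = (j + S[i]) % N               # j = j + S[i]
        S[i], S[j] = S[j], S[i]          # swap(S[i], S[j])
        z = S[(S[i] + S[j]) % N]         # z = S[S[i] + S[j]]
        yield i, j, z


# Example: two rounds starting from the identity permutation.
first_rounds = list(rc4_prga(list(range(256)), 2))
```

The generator exposes the hidden index j only so that relations between i, j, z and the array can be inspected in experiments; an attacker observes i and z alone.
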
The subscript identifies the round $r$, and $S_r[x]$ denotes the value at the $x$-th location after the swap operation in that round. The index $j_r$ is the value of $j$ after the update $j + S[i]$ in round $r$. Likewise, $z_r$ is the key-stream output byte in round $r$. Given these, the result of [5] is as follows. Note that the fair chance corresponds to the probability $\frac{1}{N}$.

Theorem 1 (Jenkins’ Glimpse theorem [5]). During RC4 PRGA, for any arbitrary round $r$, $\Pr(S_r[i_r] = j_r - z_r) = \frac{2}{N} - \frac{1}{N^2}$ and $\Pr(S_r[j_r] = i_r - z_r) = \frac{2}{N} - \frac{1}{N^2}$.

This result has been studied in great detail, and certain refinements of the probability calculation were suggested in one of our recent works [2] (which shares two authors with this paper). At the same time, the work of [2] considered certain results related to Glimpse and the proofs of a family of biases by Fluhrer and McGrew [4]. On the other hand, in this paper, we note that the family of biases presented in [4] can provide a completely new set of Glimpses in RC4 that has not been described in [2]. Further, our newly identified results here, following [4], can be studied in conjunction with the Jenkins’ Glimpse theorem [5] to improve the existing works. Note that these were not covered in the revised results related to Glimpse in [2, Theorem 2]. We also investigate how the lag-one biases [2,12] (biases between two alternate keystream bytes) provide additional Glimpses. In summary, we present several new families of Glimpses that have not been covered earlier [2,5,7,11,12]. All our results are supported by detailed experimental evidence.

Let us outline the statistical method to detect a bias experimentally with strong confidence, i.e., so that the experimental bias observed is not a false alarm. Consider two keystream sequences: one is generated from an ideally random source (say A) and the other is the key-stream from an actual stream cipher (say B), which is a pseudo-random source. Let the probability of occurrence of a certain event in A be $p$ and the same from B be $p(1+q)$. Then it is possible to distinguish the sources A, B in $O\left(\frac{1}{pq^2}\right)$ samples with some confidence. Generally this estimate works quite well for q  p. We need to carefully study what constant $c$ is to be taken in $\frac{c}{pq^2}$ and what the corresponding confidence values are, following [8, Section 3.3] and [11, Chapter 6.1]. The confidence of success for $c = 2^6 = 64$ is greater than 99.99%. Thus, for each experiment we choose the sample size as $\frac{64}{pq^2}$.

1.1 Organization and Contribution

The results are organized as follows. In Sect. 2, we warm up with a corollary of Jenkins’ Glimpse theorem that relates to locations of the internal state array $S$ other than the ones indexed by $i$, $j$. However, this bias is less significant. Then we show how the Jenkins’ Glimpse value gets modified for two consecutive rounds having the same keystream byte output related to the round number. In Sect. 3, we consider the Glimpses discovered through the biases identified by Fluhrer and McGrew [4]. Note that [2, Section 2] studies “Glimpses corresponding to Fluhrer-McGrew biases”, but the results we present here are different from (though related to) the ones in [2]. We explain the differences clearly in Sect. 3. Further, we also study the Glimpses arising out of lag-one biases [2,12] in this section. Then, in Sect. 4, we show how the Jenkins’ Glimpses [5] should be refined in conjunction with Fluhrer-McGrew [4] and lag-one biases [2,12]. We conclude the paper in Sect. 5. As there are several assumptions in the theoretical proofs in this paper, we checked the results with experiments having sufficient sample size.
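
The sample-size rule $\frac{c}{pq^2}$ with $c = 64$ from the introduction can be evaluated with a few lines of Python. The helper name `required_samples` is hypothetical and serves only as an illustration; it is not taken from the experimental code behind the paper.

```python
import math


def required_samples(p: float, q: float, c: float = 64.0) -> int:
    """Return ceil(c / (p * q^2)) for base probability p and relative bias q.

    Hypothetical helper: the event has probability p in the random source
    and p * (1 + q) in the cipher; c = 64 is the constant quoted in the text.
    """
    if p <= 0 or q == 0:
        raise ValueError("p must be positive and q must be non-zero")
    return math.ceil(c / (p * q * q))


# A glimpse of about 2/N against the fair chance 1/N (N = 256) means q = 1.
samples_for_2_over_n = required_samples(1 / 256, 1.0)
# A glimpse of about 3/N against 1/N means q = 2.
samples_for_3_over_n = required_samples(1 / 256, 2.0)
```

Stronger relative biases need fewer samples, since the requirement falls off with the square of $q$.
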

2 Jenkins’ Glimpse [5] and Revised Results Using Two Consecutive Rounds

Jenkins’ Glimpse [5] arises from the way the output key-stream byte, $z_r = S_r[S_r[i_r] + S_r[j_r]]$, is generated from the two indices. As pointed out in [2], the source of this bias does not depend on the way the RC4 permutation array (we will sometimes simply call it the array) or S-box $S$ evolves. It depends only on the way the key-stream byte is derived in the last step of the PRGA algorithm.

For a better understanding, let us now explain the proof of Theorem 1. The result follows from the observation that if $S_r[j_r]$ (respectively $S_r[i_r]$) equals $z_r$, then $S_r[i_r] + S_r[j_r]$, which is the location of the output key-stream byte, must be equal to $j_r$ (respectively $i_r$). Hence, $S_r[j_r] = z_r$ (respectively $S_r[i_r] = z_r$) forces the desired condition $S_r[i_r] = j_r - z_r$ (respectively $S_r[j_r] = i_r - z_r$). Under the assumption of a uniform random distribution, if one assumes $\Pr(S_r[j_r] = z_r) = \frac{1}{N}$ and, for the complementary condition $S_r[j_r] \neq z_r$, accepts a fair chance of $\frac{1}{N}$ for the desired event $S_r[i_r] = j_r - z_r$, the overall probability comes out to be $\Pr(S_r[i_r] = j_r - z_r) = \frac{1}{N} \cdot 1 + \left(1 - \frac{1}{N}\right) \cdot \frac{1}{N} = \frac{2}{N} - \frac{1}{N^2}$. In the same way, the other result, i.e., $\Pr(S_r[j_r] = i_r - z_r) = \frac{2}{N} - \frac{1}{N^2}$, has been proved in [5]. Note that this calculation has been revisited in greater detail in [2, Section 4.1] and refined further.

As we note from Theorem 1, Jenkins’ Glimpse [5] relates to the locations in $S$ indexed by $i$, $j$ and their values. Thus, the immediate question is whether it is possible to obtain any further information from the same result. This we show in the following result.

Corollary 1.
$$\Pr(z_r = S_r[z_r + z_{r+1}]) = \begin{cases} \frac{1}{N} + \frac{1}{N^2}, & \text{when } i_r \neq z_r + z_{r+1} - 1, \\ \frac{2}{N}, & \text{when } i_r = z_r + z_{r+1} - 1. \end{cases}$$

Proof. Suppose $q = z_r + z_{r+1}$. First we consider the case $i_r \neq q - 1$. Under the usual randomness assumption, $\Pr(j_{r+1} = q) = \frac{1}{N}$. Now, as per Theorem 1, $\Pr(j_{r+1} - z_{r+1} = S_{r+1}[i_{r+1}]) = \frac{2}{N}$. Now, $j_{r+1} - z_{r+1} = S_{r+1}[i_{r+1}]$ implies $q - z_{r+1} = S_{r+1}[i_{r+1}] = S_r[q]$, i.e., $z_r = S_r[q]$. This follows as $j_{r+1} = q$ and $S_{r+1}[i_{r+1}] = S_r[j_{r+1}]$, considering the same situation before the swap in the $(r+1)$-th round and after the swap in the $r$-th round. Thus, in this case, $\Pr(z_r = S_r[q]) = \frac{1}{N} \cdot \frac{2}{N} = \frac{2}{N^2}$. For the rest of the cases, $j_{r+1} \neq q$ happens with probability $\left(1 - \frac{1}{N}\right)$ and we consider the random association of $\frac{1}{N}$ in this case. Thus, finally, $\Pr(z_r = S_r[z_r + z_{r+1}]) = \frac{2}{N^2} + \left(1 - \frac{1}{N}\right) \cdot \frac{1}{N} = \frac{1}{N} + \frac{1}{N^2}$.

Next, let us consider $i_r = q - 1$. As per Theorem 1, $i_{r+1} - z_{r+1} = S_{r+1}[j_{r+1}] = S_r[i_{r+1}]$ happens with probability $\frac{2}{N}$. Now, $i_{r+1} - z_{r+1} = S_r[i_{r+1}]$ implies $q - z_{r+1} = S_r[q]$, i.e., $z_r = S_r[q]$. Thus, in this case the probability is exactly the same as Jenkins’ Glimpse. □

This result shows that knowing the values of two consecutive bytes always provides more information regarding the secret array location $S[z_r + z_{r+1}]$. The bias is smaller than Jenkins’ Glimpse in all cases, except that it equals Jenkins’ for $i_{r+1} = i_r + 1 = z_r + z_{r+1}$. The experimental results match the theoretical ones. We would like to point out that one may again refine such proofs for higher-order terms involving $O\left(\frac{1}{N^3}\right)$. However, we are interested in more prominent biases, such as $\frac{2}{N}$ or more, instead of the random association $\frac{1}{N}$. In this regard, let us present the following results, which show behaviour different from Jenkins’ Glimpse when two consecutive keystream output bytes at rounds $r$, $r+1$ are equal to $r+1$.

Theorem 2. Consider two successive rounds $r$, $r+1$ in RC4 PRGA such that $z_r = z_{r+1} = r+1$. Then
1. $\Pr(S_r[i_r] = j_r - z_r \mid z_r = z_{r+1} = r+1) = \frac{3}{N}$,
2. $\Pr(S_r[j_r] = i_r - z_r \mid z_r = z_{r+1} = r+1) = \frac{3}{N}$,
3. $\Pr(S_{r+1}[i_{r+1}] = j_{r+1} - z_{r+1} \mid z_r = z_{r+1} = r+1) = \frac{2}{N}$,
4. $\Pr(S_{r+1}[j_{r+1}] = i_{r+1} - z_{r+1} \mid z_r = z_{r+1} = r+1) = \frac{1}{N}$,
5. $\Pr(S_r[j_r] = z_r \mid z_r = z_{r+1} = r+1) = \frac{2}{N}$,
where terms of $O\left(\frac{1}{N^2}\right)$ are ignored.

Proof. Let us first prove
1. $\Pr(S_r[i_r] = j_r - z_r \mid z_r = z_{r+1} = r+1) = \frac{3}{N}$.

We investigate the following three mutually exclusive configurations.
Configuration 1: $S_r[i_r + 1] = 0$,
Configuration 2: $S_r[i_r + 1] \neq 0$ and $S_r[j_r] = r+1$,
Configuration 3: Rest of the cases.

Configuration 1: The probability associated with this configuration is $\frac{1}{N}$. Since $S_r[i_r + 1] = 0$, we have $j_r = j_{r+1}$. Let us assume that $S_r[i_r] = p$ and $S_r[j_r] = q$, where $p$ and $q$ are two arbitrary byte values. As $z_r = r+1$ is given, it is evident that $S_r[p+q] = r+1$. In the next round, $S_{r+1}[j_{r+1}] = 0$ and $S_{r+1}[i_{r+1}] = q$ after the swap operation. As $z_{r+1} = r+1$ is also given, $S_{r+1}[0+q] = S_{r+1}[q] = r+1$. So we have a situation where $S_r[p+q] = S_{r+1}[q] = r+1$. This is not possible if the byte value $r+1$ remains in the same position in the permutation array $S$ across the two rounds. In most of the cases the value of $r+1$ is non-zero. So the only possible option is to have $q = r+1$. In round $r$, the value $r+1$ is located at the position indexed by $j_r$, and in round $r+1$ it moves to the position indexed by $i_{r+1}$, which is the same as $r+1$. If in round $r$, $S_r[j_r] = r+1 = z_r$, we must have $S_r[i_r] = j_r - z_r$, as only then $S_r[i_r] + S_r[j_r]$ becomes equal to $j_r$, resulting in $z_r = r+1$. Here $S_r[i_r] = j_r - z_r$ is the desired condition.

Configuration 2: The probability associated with this configuration is around $\left(1 - \frac{1}{N}\right) \frac{1}{N} = \frac{1}{N} - \frac{1}{N^2}$. Since $S_r[j_r] = r+1$, the value of $S_r[i_r]$ must be such that $S_r[i_r] + S_r[j_r]$ becomes equal to $j_r$, resulting in $z_r = r+1$. To have $S_r[i_r] + S_r[j_r] = j_r$, we must have $S_r[i_r] = j_r - z_r$, which is the desired condition.

Configuration 3: The probability associated with this configuration is around $\left(1 - \frac{1}{N} - \frac{1}{N} + \frac{1}{N^2}\right) = \left(1 - \frac{2}{N} + \frac{1}{N^2}\right)$. Assuming that in this configuration there is a fair chance ($\frac{1}{N}$) of the desired condition $S_r[i_r] = j_r - z_r$, the composite probability is $\left(1 - \frac{2}{N} + \frac{1}{N^2}\right) \cdot \frac{1}{N}$.

We now add the probabilities of all three mutually exclusive configurations to get the value $\left(\frac{1}{N} + \frac{1}{N} - \frac{1}{N^2} + \frac{1}{N} - \frac{2}{N^2} + \frac{1}{N^3}\right)$. Ignoring the terms of order $\frac{1}{N^2}$ and smaller, we arrive at the result
1. $\Pr(S_r[i_r] = j_r - z_r \mid z_r = z_{r+1} = r+1) = \frac{3}{N}$.

We now prove
2. $\Pr(S_r[j_r] = i_r - z_r \mid z_r = z_{r+1} = r+1) = \frac{3}{N}$.

Similar to the proof of the previous case (1), we identify the following three mutually exclusive configurations:
Configuration 1: $S_r[i_r] = r+1$,
Configuration 2: $S_r[i_r] \neq r+1$ and $j_r = i_r + 1$,
Configuration 3: Rest of the cases.

Configuration 1: The probability associated with this configuration is $\frac{1}{N}$. Since $S_r[i_r] = r+1$, the value of $S_r[j_r]$ must be such that $S_r[i_r] + S_r[j_r]$ becomes equal to $i_r$, resulting in $z_r = r+1$. To have $S_r[i_r] + S_r[j_r] = i_r$, we must have $S_r[j_r] = i_r - z_r$, which is the desired condition.

Configuration 2: The probability associated with this configuration is around $\left(1 - \frac{1}{N}\right) \frac{1}{N} = \frac{1}{N} - \frac{1}{N^2}$. Let us assume that $S_r[i_r] = p$ and $S_r[j_r] = q$, where $p$ and $q$ are two arbitrary byte values. As $z_r = r+1$ is given, it is evident that $S_r[p+q] = r+1$. If in round $r+1$, $j_{r+1}$ moves to a new position that contains a value $k$, then the position of the output key-stream byte changes to $q + k$ instead of $p + q$. In order to have $z_r = z_{r+1}$, one must then require that the permutation array byte value $r+1$ changes its position in round $r+1$. However, it can be shown that as $j_r = r+1$, this condition cannot be satisfied. The only way to obtain $z_r = z_{r+1}$ is to have the positions $i_r$ and $j_r$ swapped in round $r+1$, resulting in the output key-stream byte position $p + q$ in both rounds. This implies $S_r[j_r] = N - 1$, which satisfies the desired condition $S_r[j_r] = i_r - z_r$, since $i_r - z_r = r - (r+1) = N - 1$ modulo $N$.

Configuration 3: The probability associated with this configuration is around $\left(1 - \frac{1}{N} - \frac{1}{N} + \frac{1}{N^2}\right) = \left(1 - \frac{2}{N} + \frac{1}{N^2}\right)$. Assuming that in this configuration there is a fair chance ($\frac{1}{N}$) of the desired condition $S_r[j_r] = i_r - z_r$, the composite probability is $\left(1 - \frac{2}{N} + \frac{1}{N^2}\right) \cdot \frac{1}{N}$.

We now add the probabilities of all three mutually exclusive configurations to get the value $\left(\frac{1}{N} + \frac{1}{N} - \frac{1}{N^2} + \frac{1}{N} - \frac{2}{N^2} + \frac{1}{N^3}\right)$. Ignoring the terms of order $\frac{1}{N^2}$ and smaller, we arrive at the result:
2. $\Pr(S_r[j_r] = i_r - z_r \mid z_r = z_{r+1} = r+1) = \frac{3}{N}$.

Next, let us consider
3. $\Pr(S_{r+1}[i_{r+1}] = j_{r+1} - z_{r+1} \mid z_r = z_{r+1} = r+1) = \frac{2}{N}$.

This result coincides with the standard Jenkins’ Glimpse correlation (ignoring the term of order $\frac{1}{N^2}$), and we do not observe any additional configuration that positively or negatively influences the usual result.

We now prove
4. $\Pr(S_{r+1}[j_{r+1}] = i_{r+1} - z_{r+1} \mid z_r = z_{r+1} = r+1) = \frac{1}{N}$.

We observe that in this case $i_{r+1} - z_{r+1} = (r+1) - (r+1) = 0$. Hence, to satisfy the desired condition, we must have $S_{r+1}[j_{r+1}] = i_{r+1} - z_{r+1} = 0$. In this situation, there could be two possible configurations:
Configuration 1: $S_{r+1}[i_{r+1}] = r+1$,
Configuration 2: $S_{r+1}[i_{r+1}] = p$ where $S_{r+1}[p] = r+1$.

We first argue that Configuration 2 cannot lead to the desired condition for round $r$. Since $S_{r+1}[j_{r+1}] = 0$, we must have $j_r = j_{r+1}$. This also implies $S_r[j_r] = p$. So assuming $S_r[i_r] = q$ for an arbitrary non-zero byte value $q$, $z_r$ must come from the location $p + q$, which is different from the location $p$. This means $z_r$ cannot be the same as $z_{r+1}$, leading to a contradiction. Hence, in this case the special configuration used to prove the Jenkins’ Glimpse correlation is the only configuration possible (Configuration 1). There cannot be a situation where $S_{r+1}[i_{r+1}] \neq z_{r+1}$ and still the desired condition is met. This leads to the result that
4. $\Pr(S_{r+1}[j_{r+1}] = i_{r+1} - z_{r+1} \mid z_r = z_{r+1} = r+1) = \frac{1}{N}$.

Finally, we prove
5. $\Pr(S_r[j_r] = z_r \mid z_r = z_{r+1} = r+1) = \frac{2}{N}$.

We investigate the following two mutually exclusive configurations taken from the first case:
Configuration 1: $S_r[i_r + 1] = 0$,
Configuration 2: $S_r[i_r + 1] \neq 0$ and $S_r[j_r] = r+1$.

Configuration 1: The probability associated with this configuration is $\frac{1}{N}$. Using an argument identical to that of the first case, we get that in round $r$, $S_r[j_r] = r+1 = z_r$, which is our desired condition in this case.

Configuration 2: The probability associated with this configuration is around $\left(1 - \frac{1}{N}\right) \frac{1}{N} = \frac{1}{N} - \frac{1}{N^2}$. Since $S_r[j_r] = r+1 = z_r$, the desired condition is already satisfied.

We now add the probabilities of the two mutually exclusive configurations to get the value $\left(\frac{1}{N} + \frac{1}{N} - \frac{1}{N^2}\right)$. Ignoring the terms of order $\frac{1}{N^2}$, we arrive at the result
5. $\Pr(S_r[j_r] = z_r \mid z_r = z_{r+1} = r+1) = \frac{2}{N}$. □

We have checked and verified that the theoretical expressions match the experimental results.
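
An experiment of this kind can be sketched as below. This is an assumption-laden illustration rather than the setup used for the paper: the function name `estimate_glimpses`, the toy array size $N = 16$ (chosen so that the conditioning event, whose probability is about $\frac{1}{N^2}$, occurs often enough in pure Python), the use of one long run instead of many keys, and the seeded shuffle standing in for a uniformly random post-KSA state are all our choices. For such a small $N$, the ignored $O\left(\frac{1}{N^2}\right)$ terms are not negligible, so the estimates should only be read qualitatively.

```python
import random
from typing import Dict


def estimate_glimpses(N: int = 16, rounds: int = 400_000, seed: int = 1) -> Dict[str, float]:
    """Estimate Jenkins' J1 glimpse and items 1 and 5 of Theorem 2 (toy N)."""
    rng = random.Random(seed)
    S = list(range(N))
    rng.shuffle(S)                  # assumed stand-in for a random post-KSA state
    i = j = 0
    prev = None                     # (j_r, z_r, S_r[i_r], S_r[j_r]) of the previous round
    j1_hits = 0                     # rounds with S_r[i_r] == j_r - z_r
    cond = item1 = item5 = 0
    for _ in range(rounds):
        i = (i + 1) % N             # i equals the round number modulo N
        j = (j + S[i]) % N
        S[i], S[j] = S[j], S[i]
        z = S[(S[i] + S[j]) % N]
        if S[i] == (j - z) % N:
            j1_hits += 1
        # For the previous round r, the current i is (r + 1) mod N, so the
        # conditioning event z_r = z_{r+1} = r + 1 reads prev z == i and z == i.
        if prev is not None and prev[1] == i and z == i:
            cond += 1
            pj, pz, psi, psj = prev
            item1 += int(psi == (pj - pz) % N)   # S_r[i_r] == j_r - z_r
            item5 += int(psj == pz)              # S_r[j_r] == z_r
        prev = (j, z, S[i], S[j])
    return {
        "jenkins_j1": j1_hits / rounds,
        "conditioning_samples": cond,
        "thm2_item1": item1 / cond if cond else float("nan"),
        "thm2_item5": item5 / cond if cond else float("nan"),
    }
```

Calling `estimate_glimpses()` returns the unconditional estimate next to the two conditional ones, together with the number of rounds in which the conditioning event was observed, which indicates how much weight the conditional estimates can bear.
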

3 New Glimpses Through Fluhrer-McGrew [4] and Lag-One [2, 12] Biases

We start this section by referring to Table 1 to link the biases in [4] to the Glimpse results that we identify here. While there are some overlaps between the results of Table 1 and the results described in [2, Section 2], the following are the distinguishing contributions of this paper.

– We identify the biased S-Box (permutation array) values corresponding to the output key-stream bytes for both the rounds $r$ and $r+1$.
– We also identify the source permutation array values and their associated biases corresponding to the four biased lag-one digraphs (as mentioned in [2]) for the successive two rounds.
– We demonstrate that there can be multiple biased permutation array values in the same round. For example, in the case of the Fluhrer-McGrew bias [4] of $(z_r = 0, z_{r+1} = 0)$ where $i_r = 1$, we show in Table 1 (see 1(a), 1(b)) that $z_{r+1}$ can be independently associated with $S_{r+1}[i_r]$ as well as $S_{r+1}[j_r]$, with $\frac{2}{N}$ and $\frac{3}{N}$ biases respectively.
– In some Fluhrer-McGrew cases [4], our identified sources of $z_r$ and $z_{r+1}$ do not have any direct mapping to the source configurations mentioned in [2]. This becomes evident, for example, if we compare row 2 of Table 1 with [2, Scenario 2, Table 1]. While in [2] the source configurations are meant to be used together to arrive at the biased key-stream byte pairs and to prove the Fluhrer-McGrew biases [4], in this paper we focus on finding the biased permutation array values that directly correspond to the output keystream bytes. Row 11 of Table 1, corresponding to the case $(z_r = N-1, z_{r+1} = N-1)$ for $i_r \neq N-2$, is another such example: the source configuration mentioned in [2, Table 1] does not automatically lead to the source of $z_r$ or $z_{r+1}$, and we have also not been able to identify any array value that corresponds to $N-1$ with a probability different from the fair chance ($\frac{1}{N}$).
– It is interesting to observe (from Table 2) that even though the lag-one digraph $(z_r = 0, z_{r+2} = 0)$ at $i_r = N-2$ is negatively biased, there exist array positions in both rounds that are positively biased towards the output key-stream byte values.

The findings captured in this section have the following implications. First, in the long-term evolution of RC4 PRGA, whenever we observe consecutive or interleaved key-stream pairs corresponding to Fluhrer-McGrew biases [4] or lag-one digraph biases [2] respectively, we can directly infer the permutation array byte positions that can be deemed the biased sources of the key-stream bytes. Second, unlike Jenkins’ Glimpse correlations, these results demonstrate that, subject to certain conditions (such as the occurrence of specific key-stream byte pairs), a direct correlation exists between a key-stream byte value ($z_r$) and some permutation array byte value(s) of the same round ($r$) without bringing any index variable ($i_r$ or $j_r$) into the relationship. In addition, it is interesting to

Table 1. Glimpses related to Fluhrer-McGrew (FM) biases [4].

FM Biases

(zr , zr+1 )

Source of zr A

Pr(zr = A)

Source of zr+1 B

Pr(zr+1 = B)

1(a)

(0, 0)

Sr [ir + 1]

2 N

Sr+1 [ir ]

2 N 3 N 2 N 2 N

1(b) 2

(0, 0)

Sr [jr−1 ]

3(a)

(0, 1)

Sr [ir + 1]

3(b)

Sr [jr−1 ]

Sr+1 [ir+1 ]

Sr [jr ]

Sr+1 [jr ]

Sr [jr−1 ]

2 N

Sr+1 [ir+1 ]

0

Sr [ir ]

(N − 1, i + 1)

Sr [ir ]

(N − 1, i + 2)

Sr [ir + 1]

5(b)

7(a)

Sr [jr ] (N − 1, 0)

7(b) 8(a)

Sr [ir + 1] Sr [jr ]

(N − 1, 1)

8(b)

Sr [ir + 1] Sr [jr ]

9

(N − 1, 2)

Sr [ir + 1]

10(a)

(N + 1, N + 1) 2 2

Sr [ir ]

10(b) 11

(N − 1, N − 1)

12

(0, i + 1)

Sr+1 [ir ]

3 N 2 N 3 N 3 N 3 N 2 N 2 N 2 N 3 N

(i + 1, N − 1)

5(a)

6(b)

Sr+1 [jr ]

2 N 2 N 2 N 2 N 2 N 2 N 3 N 2 N 2 N 2 N 2 N 3 N

4

6(a)

2 N 2 N 2 N 2 N 2 N

Sr+1 [jr ]

Sr+1 [ir ] Sr+1 [ir+1 ] Sr+1 [jr ] Sr+1 [jr ] Sr+1 [ir+1 ] Sr+1 [jr ] Sr+1 [ir+1 ] Sr+1 [jr ] Sr+1 [ir+1 ] Sr+1 [ir ]

observe that in almost all the cases (except Row 11 of Table 1), there exist array byte positions that are biased towards the key-stream byte values of that round ($r$ or $r+1$), where many of these bias values are around $\frac{3}{N}$ and in one particular case (Row 4 of Table 2) it is as high as $\frac{4}{N}$.

We now explain the mechanism that is responsible for the biases shown in Table 1 and Table 2. Let us first consider the Fluhrer-McGrew scenario [4] where the consecutive key-stream byte-pair is $(0, 0)$ at $i_r = 1$. This corresponds to the sub-scenarios 1(a) and 1(b) as depicted in Table 1. That is, we are going to prove:
1. $\Pr(z_r = S_r[i_r + 1]) = \frac{2}{N}$,
2. $\Pr(z_{r+1} = S_{r+1}[i_r]) = \frac{2}{N}$,
3. $\Pr(z_{r+1} = S_{r+1}[j_r]) = \frac{3}{N}$.

The generic combinatorial approach that we are going to use is as follows. First, we count the possible number of ways through which one can achieve the desired key-stream byte pair $(0, 0)$ at $i_r = 1$. We refer to this as $T$. Next, we identify certain mutually exclusive configurations in which the value of $z_r$ or $z_{r+1}$ may come from a particular permutation array byte position, and we count the number of possible ways this can be achieved (let us call this $C$). The probability is derived by computing the ratio $\frac{C}{T}$.

We know that $i_r = 1$ is fixed. However, $j_r$ can assume any value between 0 and $N-1$. So there are $N$ possibilities. The $S$ array contains $N$ integers in $N$ locations, which again can be chosen in $N!$ ways. So if we do not have any restriction on the values of $z_r$ and $z_{r+1}$, the total number of possibilities would have been $N \cdot (N!)$. However, if we have to ensure that $z_r = 0$ where we start


with an arbitrary random configuration of $S$, we must fix the position of $j_r$. This can be viewed as losing a degree of freedom. For example, if in the starting configuration $S_r[i_r]$ happens to be 5 and the array value 0 is located at index 20, we must have $j_r$ at the array index position holding the value 15, so that we get $S_r[i_r] + S_r[j_r] = 20$. So far we have not put any restriction on the number of permutations possible. However, if we have to ensure that $z_{r+1} = 0$ as well, then depending upon the value of $S_r[i_r + 1]$, the value of $S_r[j_r + S_r[i_r + 1]]$ gets fixed, because we need to ensure that $S_{r+1}[i_{r+1}] + S_{r+1}[j_{r+1}]$ also has the value 20 (assuming the array value 0 has not changed its position in round $r+1$) or the new index of 0 (assuming it has changed its position). This implies that we need to lose one more degree of freedom, resulting in $T = (N-1)!$. For all the scenarios (with double key-stream bytes) this value remains unchanged. Now we compute the value of $C$ for the following glimpse value of $z_r$.
1. $\Pr(z_r = S_r[i_r + 1]) = \frac{2}{N}$.

There can be two different configurations that lead to the above-mentioned glimpse.
Configuration 1: $j_r = 1$, $S_r[1] = 1$ and $S_r[2] = 0$,
Configuration 2: $j_r \neq 1$, $S_r[j_r] = j_r$, $S_r[1] = 2 - j_r$ and $S_r[2] = 0$.
It is apparent that for both configurations we get $z_r = 0$ and $z_{r+1} = 0$, implying that both are valid for our current analysis. Moreover, in both configurations the glimpse value $z_r = S_r[i_r + 1]$ is satisfied. In fact, these are the only two configurations that lead to the desired glimpse result for the Fluhrer-McGrew key-stream byte-pair $(0, 0)$ at $i_r = 1$, which we have cross-checked experimentally as well. We now calculate the number of permutations for the two mutually exclusive configurations to arrive at the value $C$. In Configuration 1, $j_r$ is fixed and the two permutation array byte values at the index positions 1 and 2 are also decided. Thus, the total number of permutations for Configuration 1 is $(N-2)!$. In Configuration 2, $j_r$ may have $N-1$ possible values out of $N$ and three array values are fixed. So the number of possible alternatives in Configuration 2 is $(N-1) \cdot (N-3)!$. Combining these two values we get $C = (N-2)! + (N-1) \cdot (N-3)! \approx 2 \cdot (N-2)!$. By taking the ratio of $C$ to $T$, i.e., $\frac{C}{T} \approx \frac{2}{N-1} \approx \frac{2}{N}$, we get the desired result.

Next, we look at the result
2. $\Pr(z_{r+1} = S_{r+1}[i_r]) = \frac{2}{N}$.

There can be two different mutually exclusive configurations that lead to the above-mentioned glimpse.
Configuration 1: $j_r = 1$, $S_r[1] = 1$ and $S_r[2] = 0$,
Configuration 2: $S_r[1] = 0$ and $S_r[j_r] = 1$.
Since Configuration 1 is identical to that of the previous result, the computation remains unchanged for it too. In Configuration 2, it is easy to verify that $z_r = 0$ is obtained by sacrificing two degrees of freedom, and by fixing one more array value we can ensure that $z_{r+1} = 0$ as well. So the total number of possible ways in Configuration 2 is $N \cdot (N-3)! \approx (N-2)!$, resulting in $C \approx 2 \cdot (N-2)!$. By taking the ratio of $C$ to $T$ we get the desired result. We next prove the following result.
3. $\Pr(z_{r+1} = S_{r+1}[j_r]) = \frac{3}{N}$.

There can be three different mutually exclusive configurations that lead to the above-mentioned glimpse.
Configuration 1: $j_r = 1$, $S_r[1] = 1$ and $S_r[2] = 0$,
Configuration 2: $j_r \neq 1$, $S_r[j_r] = j_r$, $S_r[1] = 2 - j_r$ and $S_r[i_r + 1] = 0$,
Configuration 3: $S_r[j_r] = 0$ and $S_r[i_r] = j_r$.
Since Configuration 1 and Configuration 2 are identical to the ones described in the first result (corresponding to $z_r = S_r[i_r + 1]$), the computations remain unchanged for those. It is easy to verify that all three configurations satisfy the desired glimpse $z_{r+1} = S_{r+1}[j_r]$. For Configuration 3, we have already fixed two array values. To achieve $z_{r+1} = 0$, we must fix one more value in $S$. Hence, the total number of possible variations in Configuration 3 is $N \cdot (N-3)! \approx (N-2)!$. This implies $C \approx 3 \cdot (N-2)!$. Therefore, the desired result follows by taking the ratio of $C$ to $T$.

Let us also look at the Fluhrer-McGrew scenario where the consecutive keystream byte-pair is $(0, 0)$ for $i_r \notin \{1, N-1\}$. This corresponds to scenario 2 from Table 1. For this scenario, the glimpse of $z_{r+1}$ is associated with $S_{r+1}[j_r]$ and the probability is given by
$\Pr(z_{r+1} = S_{r+1}[j_r]) = \frac{2}{N}$.

If we compare this result with sub-scenario 1(b), we find that the bias here ($\frac{2}{N}$) is smaller than the bias of 1(b), which is $\frac{3}{N}$. This can be explained by observing that the applicable configurations here are:
Configuration 1: $S_r[j_r] = j_r$, $S_r[1] = 2 - j_r$ and $S_r[i_r + 1] = 0$,
Configuration 2: $S_r[j_r] = 0$ and $S_r[i_r] = j_r$.
So the configuration that corresponds to $j_r = 1$, $S_r[1] = 1$ and $S_r[2] = 0$ is no longer applicable here, and that accounts for the reduction of the bias to $\frac{2}{N}$.

To provide another illustration of the mechanism behind the glimpses, let us now consider scenario 4 of the lag-one biases (from Table 2), corresponding to $(z_r = N-2, z_{r+2} = 0)$ at $i_r = N-2$. It is interesting to observe that this scenario includes a bias value of $\frac{4}{N}$ in round $r+1$. We now prove
1. $\Pr(z_r = S_r[i_r]) = \frac{2}{N}$,
2. $\Pr(z_{r+2} = S_{r+2}[j_{r+2}]) = \frac{3}{N}$,
3. $\Pr(z_{r+2} = S_{r+2}[j_{r+1}]) = \frac{4}{N}$.

Although here we consider lag-one digraphs instead of consecutive key-stream byte-pairs, the value of $T$ remains unchanged at $(N-1)!$, since the approach to arrive at the number of possible ways remains the same. To prove the first result of this scenario, i.e., $\Pr(z_r = S_r[i_r]) = \frac{2}{N}$, we consider the following two mutually exclusive configurations.

Table 2. Glimpses related to lag-one biases [2].

Lag-one Biases

(zr , zr+2 )

Source of zr A

Pr(zr = A)

Source of zr+2 B

Pr(zr+2 = B)

1

(0, 0)

Sr [jr−1 ]

Sr+2 [jr+2 ]

At i = 0

Sr [2]

2 N 3 N

(N , 0) 2

Sr [ir ]

3 N 2 N 2 N 2 N 3 N 2 N 2 N 2 N

Sr [jr ] 2

At i = 0

3 4

Sr [jr ]

(0, 0)

Sr [ir + 1]

At i = N − 2

Sr [jr−1 ]

(N − 2, 0)

Sr [ir ]

At i = N − 2

Sr+2 [jr+1 ] Sr+2 [jr+2 ] Sr+2 [jr+1 ] Sr+2 [jr+1 ] Sr+2 [jr ] Sr+2 [jr+2 ] Sr+2 [jr+1 ]

2 N 3 N 2 N 3 N 3 N 4 N

Configuration 1: $S_r[i_r] = N-2$, $j_r = 0$ and $S_r[j_r] = 0$,
Configuration 2: $S_r[i_r] = p$ where $S_r[p] = N-2$, $j_r = 0$ and $S_r[j_r] = 0$.
It is apparent that both configurations lead to the $(z_r = N-2, z_{r+2} = 0)$ lag-one digraph in the output and satisfy the desired glimpse for $z_r$. In both configurations we have fixed the value of $j_r$ and two other array values. Hence, $C = 2 \cdot (N-2)!$ and therefore the desired probability result follows as expected. We now move to the second result
2. $\Pr(z_{r+2} = S_{r+2}[j_{r+2}]) = \frac{3}{N}$.

To prove the second result in this scenario we consider the following three mutually exclusive configurations.
Configuration 1: $S_r[i_r] = N-2$, $j_r = 0$ and $S_r[j_r] = 0$,
Configuration 2: $S_r[i_r] = p$ where $S_r[p] = N-2$, $j_r = 0$ and $S_r[j_r] = 0$,
Configuration 3: $S_r[i_r + 1] = N - p$ where $j_r = p$ and $S_r[0] = 0$.
In the third configuration we also choose the value of $S_r[i_r]$ in such a way that $S_r[i_r] + S_r[j_r]$ points to the array index corresponding to $N-2$. It is apparent that all three configurations lead to the $(z_r = N-2, z_{r+2} = 0)$ lag-one digraph in the output and satisfy the desired glimpse for $z_{r+2}$. Since Configuration 1 and Configuration 2 are identical to the corresponding ones from the first result of this scenario, we now count only the number of possibilities for Configuration 3. In Configuration 3, we have fixed one value of $S$ in round $r$ ($S_r[i_r]$). Similarly we have also fixed the values of $S_r[i_r + 1]$ and $S_r[0]$. Hence, the total number of possibilities is $N \cdot (N-3)! \approx (N-2)!$. So we get $C = 3 \cdot (N-2)!$ in this case, as expected, yielding the desired probability result. We now prove the final result of this scenario:
3. $\Pr(z_{r+2} = S_{r+2}[j_{r+1}]) = \frac{4}{N}$.

For this case, three possible configurations have already been shown in the previous result, i.e., $\Pr(z_{r+2} = S_{r+2}[j_{r+2}]) = \frac{3}{N}$. One may check that in each of those three configurations, the desired glimpse $z_{r+2} = S_{r+2}[j_{r+1}]$ is also met, as $j_{r+2} = j_{r+1}$ in those configurations. We now present the fourth configuration:
Configuration 4: $S_r[i_r + 1] = 0$.
In this configuration we also choose the value of $S_r[i_r]$ in such a way that $S_r[i_r] + S_r[j_r]$ points to the array index corresponding to $N-2$. In addition, we require that in round $r+2$ the permutation array values $S_{r+2}[i_{r+2}]$ and $S_{r+2}[j_{r+2}]$ are such that their sum points to the new index position of 0 after the swap in round $r+1$. Please note that at the end of round $r+1$, the value of $S_{r+1}[j_{r+1}]$ is 0. This ensures the desired glimpse value $z_{r+2} = S_{r+2}[j_{r+1}]$. So the possible number of permutations is $N \cdot (N-3)! \approx (N-2)!$. Together with the other three configurations, for this result we get $C = 4 \cdot (N-2)!$, which means the desired probability value is given by $\Pr(z_{r+2} = S_{r+2}[j_{r+1}]) = \frac{4}{N}$.

As the techniques are similar in nature, we prove some representative results in this section as well as in the next section. Complete proofs of all the results will be presented in the journal version.
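
The counting argument of this section can be checked with exact arithmetic, and the $(0, 0)$, $i_r = 1$ scenario can additionally be enumerated exhaustively for a toy array size. The sketch below is only an illustration under our own assumptions: the function names are hypothetical, the enumeration treats every state $(j_{r-1}, S_{r-1})$ with $i_{r-1} = 0$ as equally likely, and a toy $N$ such as 7 is used because $N \cdot N!$ states are out of reach for $N = 256$. For such a small $N$ the enumerated ratios need not be close to $\frac{2}{N}$ or $\frac{3}{N}$.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial
from typing import Dict


def counting_estimate(k: int, N: int) -> Fraction:
    """C/T with C = k * (N-2)! and T = (N-1)!, which equals k / (N-1)."""
    return Fraction(k * factorial(N - 2), factorial(N - 1))


def exact_ratio_first_result(N: int) -> Fraction:
    """C/T for C = (N-2)! + (N-1) * (N-3)!, the count derived for result 1."""
    C = factorial(N - 2) + (N - 1) * factorial(N - 3)
    return Fraction(C, factorial(N - 1))


def brute_force_fm_00(N: int = 7) -> Dict[str, Fraction]:
    """Enumerate all states and condition on (z_r, z_{r+1}) = (0, 0) at i_r = 1."""
    total = 0
    hits = {"zr_eq_Sr_ir_plus_1": 0, "zr1_eq_Sr1_ir": 0, "zr1_eq_Sr1_jr": 0}
    for perm in permutations(range(N)):
        for j_prev in range(N):
            S = list(perm)
            i_r = 1                                  # round r
            j_r = (j_prev + S[i_r]) % N
            S[i_r], S[j_r] = S[j_r], S[i_r]
            z_r = S[(S[i_r] + S[j_r]) % N]
            s_r_next = S[(i_r + 1) % N]              # S_r[i_r + 1]
            i_n = (i_r + 1) % N                      # round r + 1
            j_n = (j_r + S[i_n]) % N
            S[i_n], S[j_n] = S[j_n], S[i_n]
            z_n = S[(S[i_n] + S[j_n]) % N]
            if z_r != 0 or z_n != 0:
                continue
            total += 1
            hits["zr_eq_Sr_ir_plus_1"] += int(z_r == s_r_next)
            hits["zr1_eq_Sr1_ir"] += int(z_n == S[i_r])
            hits["zr1_eq_Sr1_jr"] += int(z_n == S[j_r])
    return {name: Fraction(count, total) for name, count in hits.items()}
```

For $N = 256$, `counting_estimate(2, 256)` evaluates to $\frac{2}{255}$, which shows how close the ratio $\frac{C}{T}$ is to the quoted $\frac{2}{N}$.
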

4 Merging the New Glimpses with Jenkins’ [5]

In this section we present how the effect of the FM [4] or lag-one [2] biases can be combined with Jenkins’ Glimpse [5] (in both the additive and the conflicting sense). For presenting the tables, we use two notations:
– $(J_1)$: $\Pr(S[i] = j - z)$,
– $(J_2)$: $\Pr(S[j] = i - z)$.
First, in Table 3, we consider the effect of both FM biases [4] and Jenkins’ Glimpse [5].

Table 3. Glimpses related to FM biases [4] and Jenkins’ [5] together.

FM Biases

(zr , zr+1 )

Values of i

J1 (r)

J2 (r)

J1 (r + 1)

J2 (r + 1)

1

(0, 0)

i=1

2

(0, 0)

i = 1, N − 1

3

(0, 1)

i = 0, 1

4

(i + 1, N − 1)

i = N − 2

5

(N − 1, i + 1)

i = 1, N − 2

6

(N − 1, i + 2)

i = 0, N − 1, N − 2, N − 3

3/N 2/N 3/N 3/N 4/N 2/N

7

(N − 1, 0)

i=N −2

8

(N − 1, 1)

i=N −1

3/N 2/N 2/N 2/N 2/N 3/N 3/N 3/N

9

(N − 1, 2)

i = 0, 1

3/N 2/N 2/N 3/N 2/N 2/N 2/N 2/N 2/N 2/N 2/N 2/N

2/N 2/N 2/N 2/N 3/N 3/N 3/N 3/N 3/N 2/N

10

(N/2 + 1, N/2 + 1)

i=2

11

(N − 1, N − 1)

i = N − 2

12

(0, i + 1)

i = 0, N − 1

> 2/N 4/N 2/N 2/N

> 2/N 2/N 3/N 3/N 2/N 2/N

> 2/N 1/N

We now explain the mechanisms that are responsible for altering the Jenkins’ Glimpse values by considering the two most prominent scenarios from Table 3, where the correlations are around $4/N$. Let us first consider the Fluhrer-McGrew

More Glimpses of the RC4 Internal State Array

scenario [4] where the consecutive key-stream byte-pair is $(N - 1, i_r + 1)$ and $i_r \in \{1, N - 2\}$. This corresponds to scenario 5 in Table 3. We are going to prove the following result.

Table 3, scenario 5, $J_2$ in round $r$: $\Pr(S_r[j_r] = i_r - z_r) = 4/N$.

Similar to Section 3, here we use a generic combinatorial method to prove the result. First, we count the number of ways ($T$) in which one can achieve the desired key-stream byte pair $(N - 1, i_r + 1)$ where $i_r \in \{1, N - 2\}$. Next, we identify certain mutually exclusive configurations in which the expected Glimpse relation $S_r[j_r] = i_r - z_r$ is satisfied, and we count the number of ways in which this can be achieved ($C$). The probability is derived by computing the ratio $C/T$. We would like to point out that in this paper two different approaches have been used for proving the altered Jenkins’ Glimpse correlations: one is a purely probabilistic approach (as shown in Theorem 2), while in this section we use counting.

We know that $i_r$ is fixed and $j_r$ can assume any value between 0 and $N - 1$. So if there were no restriction on the values of $z_r$ and $z_{r+1}$, the total number of possibilities would be $N \cdot N!$. However, to ensure that $z_r = N - 1$ when we start with an arbitrary random configuration of $S$, we must fix the position of $j_r$. This can be viewed as losing a degree of freedom, i.e., a factor of $N$. Similarly, we have to ensure that $z_{r+1} = i_r + 1$ as well, which costs one more degree of freedom, resulting in $T = N \cdot N!/N^2 = (N - 1)!$. We now compute the value of $C$ for the following Glimpse correlation.

Table 3, scenario 5, $J_2$ in round $r$: $\Pr(S_r[j_r] = i_r - z_r) = 4/N$.

Since $z_r = N - 1$, the desired condition turns out to be $S_r[j_r] = i_r + 1$. There can be four mutually exclusive configurations that lead to the expected value of $S_r[j_r]$.

Configuration 1: $S_r[i_r] = N - 1$, $S_r[i_r + 1] = 0$ and $S_r[j_r] = i_r + 1$,
Configuration 2: $S_r[i_r] = p$ where $S_r[p + i_r + 1] = N - 1$, $S_r[i_r + 1] = 0$ and $S_r[j_r] = i_r + 1$,
Configuration 3: $S_r[i_r] = N - 1$, $S_r[i_r + 1] \neq 0$ and $S_r[j_r] = i_r + 1$,
Configuration 4: $S_r[i_r] = p$ where $S_r[p + i_r + 1] = N - 1$, $S_r[i_r + 1] \neq 0$ and $S_r[j_r] = i_r + 1$.
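Configuration 1 ($S_r[i_r] = N - 1$, $S_r[i_r + 1] = 0$, $S_r[j_r] = i_r + 1$) can be checked mechanically. The following Python sketch is an illustration only and not part of the original analysis. It assumes the standard RC4 PRGA round (increment $i$, update $j$, swap, output) and treats $S_r$ as the state after the swap of round $r$; the helper names `rc4_round`, `config1_state` and `check_config1` are hypothetical. For $i_r = 1$ it builds random states satisfying Configuration 1 and confirms the key-stream pair $(N - 1, i_r + 1)$ together with the $J_2$ relation in rounds $r$ and $r + 1$.

```python
import random

def rc4_round(S, i, j):
    """One RC4 PRGA round on S (modified in place); returns (i, j, z)."""
    N = len(S)
    i = (i + 1) % N
    j = (j + S[i]) % N
    S[i], S[j] = S[j], S[i]
    return i, j, S[(S[i] + S[j]) % N]

def config1_state(N, i_r, j_r, rng):
    """Random post-swap state S_r with S_r[i_r]=N-1, S_r[i_r+1]=0, S_r[j_r]=i_r+1.

    Assumes the three positions and the three values are pairwise distinct
    (true for i_r = 1 and j_r outside {i_r, i_r + 1}).
    """
    fixed = {i_r: N - 1, (i_r + 1) % N: 0, j_r: (i_r + 1) % N}
    rest = [v for v in range(N) if v not in fixed.values()]
    rng.shuffle(rest)
    fill = iter(rest)
    return [fixed[pos] if pos in fixed else next(fill) for pos in range(N)]

def check_config1(N=256, i_r=1, trials=200, seed=0):
    """True if every sampled Configuration 1 state behaves as claimed."""
    rng = random.Random(seed)
    candidates = [j for j in range(N) if j not in (i_r, i_r + 1)]
    for _ in range(trials):
        j_r = rng.choice(candidates)
        S = config1_state(N, i_r, j_r, rng)
        z_r = S[(S[i_r] + S[j_r]) % N]            # output of round r
        j2_round_r = S[j_r] == (i_r - z_r) % N    # J2 relation in round r
        i1, j1, z1 = rc4_round(S, i_r, j_r)       # round r + 1
        j2_round_r1 = S[j1] == (i1 - z1) % N      # J2 relation in round r + 1
        if (z_r, z1) != (N - 1, i_r + 1) or not (j2_round_r and j2_round_r1):
            return False
    return True
```

Calling `check_config1()` samples 200 such states for $N = 256$. The check covers only Configuration 1; the other configurations need additional constraints, as discussed in the text.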

Clearly, for all four configurations we get $z_r = N - 1$. Also, for Configuration 1 and Configuration 2, $z_{r+1} = i_r + 1$. For Configuration 3 and Configuration 4, we must put a restriction on the new value of $S_{r+1}[j_{r+1}]$ such that $z_{r+1}$ equals $i_r + 1$. With these assumptions, all four configurations become valid for our current analysis, and in all of them the desired condition $S_r[j_r] = i_r + 1$ is satisfied. Moreover, we have cross-checked experimentally that these are the only four configurations that lead to the Glimpse correlation $J_2$ for the Fluhrer-McGrew key-stream byte-pair $(N - 1, i_r + 1)$ where $i_r \in \{1, N - 2\}$.

We now calculate the number of permutations for each of the four mutually exclusive configurations to arrive at the value $C$. In Configuration 1, although $j_r$ is not fixed, three permutation array byte values at the array index positions $i_r$, $i_r + 1$ and $j_r$ are decided. Thus, the total number of permutations for Configuration 1 is $N \cdot (N - 3)! \approx (N - 2)!$. In Configuration 2, three array byte values at the array index positions $p + i_r + 1$, $i_r + 1$ and $j_r$ are decided. So the number of possible alternatives in Configuration 2 is around $(N - 2)!$. In Configuration 3, we have put an additional restriction on the value of $S_{r+1}[j_{r+1}]$ so that $z_{r+1} = i_r + 1$. Hence, two values at the array index positions $i_r$ and $j_r$ in round $r$ and the value of $S_{r+1}[j_{r+1}]$ in round $r + 1$ are decided. Similarly, in Configuration 4, three array byte values are assigned. So for Configuration 3 as well as for Configuration 4, the number of possible alternatives is around $(N - 2)!$ each. Combining these four values we get $C = 4 \cdot (N - 2)!$. By taking the ratio of $C$ to $T$, we obtain the desired result.

Let us also prove another result in the same scenario.

Table 3, scenario 5, $J_2$ in round $r + 1$: $\Pr(S_{r+1}[j_{r+1}] = i_{r+1} - z_{r+1}) = 3/N$.
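The counting ratios in this section can be evaluated exactly. The sketch below is illustrative and uses hypothetical function names. It takes $T = (N - 1)!$ and $C = c \cdot (N - 2)!$ for $c$ configurations, as in the text, and shows that the exact ratio is $c/(N - 1)$, which the text rounds to $c/N$. It also quantifies the approximation $N \cdot (N - 3)! \approx (N - 2)!$.

```python
from fractions import Fraction
from math import factorial

def counting_ratio(num_configs, N=256):
    """Exact C/T with C = num_configs * (N-2)! and T = (N-1)!."""
    return Fraction(num_configs * factorial(N - 2), factorial(N - 1))

def config_count_error(N=256):
    """Exact count N*(N-3)! divided by its approximation (N-2)!."""
    return Fraction(N * factorial(N - 3), factorial(N - 2))
```

For $N = 256$, `counting_ratio(4)` equals $4/255$ and `counting_ratio(3)` equals $3/255$, while `config_count_error()` equals $256/254$, so both approximations are off by a factor close to 1.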

Since $z_{r+1} = i_r + 1$, the desired condition turns out to be $S_{r+1}[j_{r+1}] = 0$. There can be three mutually exclusive configurations that lead to the expected value of $S_{r+1}[j_{r+1}]$, where the first two are identical to the corresponding ones of the previous result.

Configuration 1: $S_r[i_r] = N - 1$, $S_r[i_r + 1] = 0$ and $S_r[j_r] = i_r + 1$,
Configuration 2: $S_r[i_r] = p$ where $S_r[p + i_r + 1] = N - 1$, $S_r[i_r + 1] = 0$ and $S_r[j_r] = i_r + 1$,
Configuration 3: $S_r[i_r + 1] = 0$ and $S_r[j_r] = i_r + 1$.

As we have already seen, for Configuration 1 and Configuration 2 the Fluhrer-McGrew double-byte pattern corresponding to scenario 5 is satisfied. For Configuration 3, we must put additional constraints on one permutation array value in round $r$ and one in round $r + 1$ to ensure $z_r = N - 1$ and $z_{r+1} = i_r + 1$. In all three configurations, we end up with the desired value 0 at position $j_{r+1}$ of $S$. In each of the configurations we have fixed three permutation array values, implying $C = 3 \cdot (N - 2)!$. By taking the ratio of $C$ to $T$, we get the desired probability.

Next, we prove another prominent result from Table 3.

Table 3, scenario 10, $J_1$ in round $r$: $\Pr(S_r[i_r] = j_r - z_r) = 4/N$.

It corresponds to the Fluhrer-McGrew scenario where the consecutive key-stream byte-pair is $(N/2 + 1, N/2 + 1)$ for $i_r = 2$. Since $z_r = N/2 + 1$, the desired condition turns out to be $S_r[i_r] = j_r - N/2 - 1$. There can be four mutually exclusive configurations that lead to the expected value of $S_r[i_r]$.

Configuration 1: $S_r[i_r] = N/2 + 1$, $S_r[i_r + 1] = 1$ and $j_r = i_r$,
Configuration 2: $S_r[i_r] = N/2 + 1$, $S_r[i_r + 1] \neq 1$ and $j_r = i_r$,
Configuration 3: $S_r[i_r] = j_r - N/2 - 1$ and $S_r[j_r] = N/2 + 1$,
Configuration 4: $S_r[i_r] = j_r - N/2 - 1$ and $S_r[j_r] \neq N/2 + 1$.

By inspecting all four configurations and knowing that $i_r = 2$ is given, one can verify that the desired Jenkins’ Glimpse relation ($J_1$) holds for round $r$. Configuration 1 has already assigned definite values to two array bytes, and $j_r$ is also fixed in such a way that the Fluhrer-McGrew digraph pattern $(N/2 + 1, N/2 + 1)$ for $i_r = 2$ is met. The number of possible choices is $(N - 2)!$ in this configuration. For Configuration 2, we need to assign another permutation array byte in round $r + 1$ to ensure $z_{r+1} = N/2 + 1$; the key-stream value for round $r$ is as expected. The number of possible alternatives in this configuration is again approximately $(N - 2)!$. In Configuration 3, two array bytes are already assigned in such a way that $z_r = N/2 + 1$, and we need to assign one more permutation array byte in round $r + 1$ to ensure the desired value of $z_{r+1}$. This gives $(N - 2)!$ possible choices. In Configuration 4, only one constraint has been put so far. We need to put constraints on two additional permutation array values (one in round $r$ and the other in round $r + 1$) to ensure the Fluhrer-McGrew digraph pattern of scenario 10. There are $(N - 2)!$ possible ways to accomplish the desired outcome in this configuration as well. Hence, combining all four configurations, we get $C = 4 \cdot (N - 2)!$, yielding the desired probability of $4/N$.

Table 4. Glimpses related to lag-one biases [2] and Jenkins’ [5] together.

Lag-one Biases   (z_r, z_{r+2})   Values of i   J1(r)   J2(r)    J1(r+2)   J2(r+2)
1                (0, 0)           i = 0         3/N     2/N      3/N       2/N
2                (N/2, 0)         i = 0         3/N     3/N      3/N       2/N
3                (0, 0)           i = N − 2     2/N     2/N      2/N       1/N
4                (N − 2, 0)       i = N − 2     2/N     > 3/N    4/N       3/N

Using the same approach one can also demonstrate the mechanism behind the modified Jenkins’ correlation results for lag-one digraphs [2] (i.e., alternate key-stream byte pairs). To illustrate the point, let us prove the following result.

Table 4, scenario 4, $J_1$ in round $r + 2$: $\Pr(S_{r+2}[i_{r+2}] = j_{r+2} - z_{r+2}) = 4/N$.

It corresponds to the lag-one digraph scenario $(z_r = N - 2, z_{r+2} = 0)$ for $i_r = N - 2$. Here, the desired condition turns out to be $S_{r+2}[i_{r+2}] = j_{r+2}$. There can be four mutually exclusive configurations that lead to the expected value of $S_{r+2}[i_{r+2}]$.

Configuration 1: $S_r[i_r] = N - 2$, $S_r[0] = 0$ and $j_r = 0$,
Configuration 2: $S_r[i_r] = p$ where $S_r[p] = N - 2$, $S_r[0] = 0$ and $j_r = 0$,
Configuration 3: $S_r[i_r + 1] = 0$ and $j_r = 0$,
Configuration 4: $j_r = 0$ and $S_{r+1}[j_{r+1} + S_{r+1}[i_{r+2}]]$ contains the same value as its index.
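The lag-one Configuration 1 ($S_r[i_r] = N - 2$, $S_r[0] = 0$, $j_r = 0$ with $i_r = N - 2$) can be checked in the same mechanical way. The sketch below is illustrative only. It assumes the standard RC4 PRGA round and reads $S_r$ as the state after the swap of round $r$; `rc4_round` and `check_lag_one_config1` are hypothetical names. It confirms $z_r = N - 2$, $z_{r+2} = 0$ and the $J_1$ relation $S_{r+2}[i_{r+2}] = j_{r+2} - z_{r+2}$ on randomly completed states.

```python
import random

def rc4_round(S, i, j):
    """One RC4 PRGA round on S (modified in place); returns (i, j, z)."""
    N = len(S)
    i = (i + 1) % N
    j = (j + S[i]) % N
    S[i], S[j] = S[j], S[i]
    return i, j, S[(S[i] + S[j]) % N]

def check_lag_one_config1(N=256, trials=200, seed=0):
    """True if every sampled state gives (z_r, z_{r+2}) = (N-2, 0) and J1 at r+2."""
    rng = random.Random(seed)
    i_r, j_r = N - 2, 0
    for _ in range(trials):
        rest = [v for v in range(N) if v not in (0, N - 2)]
        rng.shuffle(rest)
        # S[0] = 0, S[N-2] = N-2, all other positions filled at random.
        S = [0] + rest[:N - 3] + [N - 2] + rest[N - 3:]
        z_r = S[(S[i_r] + S[j_r]) % N]          # output of round r
        i, j, _ = rc4_round(S, i_r, j_r)        # round r + 1
        i, j, z_r2 = rc4_round(S, i, j)         # round r + 2
        if z_r != N - 2 or z_r2 != 0 or S[i] != (j - z_r2) % N:
            return False
    return True
```

Only the first configuration is exercised here; Configurations 2 to 4 involve further constraints that this sketch does not model.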

For each of the configurations, the conditions are chosen in such a way that the desired Jenkins’ Glimpse condition $S_{r+2}[i_{r+2}] = j_{r+2}$ is met. For Configuration 1 and Configuration 2, two permutation array bytes along with the pseudorandom index $j_r$ have been fixed. In these configurations the key-stream values in round $r$ and round $r + 2$ assume the desired values as per the lag-one digraph pattern $(N - 2, 0)$. In Configuration 3, $z_{r+2} = 0$ is ensured. To ensure the desired value of $z_r$, one more permutation array byte value needs to be fixed. In Configuration 4, one permutation array byte in round $r$ and one in round $r + 2$ must be fixed to get the desired lag-one digraph pattern. In each of the four configurations, the number of possible choices is $(N - 2)!$, implying $C = 4 \cdot (N - 2)!$. This proves the probability result.

5 Conclusion

In this paper we identify several new cases where the secret state information of RC4 is revealed with higher probability than the random association. First we show how Jenkins’ results [5] related to RC4 Glimpse can be identified in locations other than i, j. We further revise the Jenkins’ Glimpses [5] considering two consecutive rounds r, r + 1 with the same keystream bytes r + 1. Then we show how one can obtain several other Glimpses that are related to the Fluhrer-McGrew digraph repetition biases provided in [4] as well as the lag-one biases [2] (i.e., biases related to alternate keystream output bytes). Finally, we connect these Glimpses extracted through the biases in [2,4] with that of Jenkins’ [5] and provide sharper results. All the theoretical results are checked with experiments. With so much additional information related to non-randomness in the state of RC4, it is important to examine whether one can obtain further cryptanalytic results related to RC4.

Acknowledgments. The authors would like to thank the anonymous reviewers for detailed comments that helped in improving the editorial as well as technical quality of this paper. The second author acknowledges the support from the project “Cryptography & Cryptanalysis: How far can we bridge the gap between Classical and Quantum Paradigm”, awarded by the Scientific Research Council of the Department of Atomic Energy (DAE-SRC), the Board of Research in Nuclear Sciences (BRNS) during 2016–2021.

References

1. AlFardan, N.J., Bernstein, D.J., Paterson, K.G., Poettering, B., Schuldt, J.C.N.: On the security of RC4 in TLS and WPA. In: 22nd USENIX Security Symposium (2013). http://www.isg.rhul.ac.uk/tls/RC4biases.pdf
2. Chakraborty, C., Chakraborty, P., Maitra, S.: Glimpses are forever in RC4 amidst the Spectre of Biases. https://eprint.iacr.org/2020/512
3. Chakraborty, C., Chakraborty, P., Maitra, S.: RC4: non-randomness in the index j and some results on its cycles. In: Hao, F., Ruj, S., Sen Gupta, S. (eds.) INDOCRYPT 2019. LNCS, vol. 11898, pp. 95–114. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-35423-7_5
4. Fluhrer, S.R., McGrew, D.A.: Statistical analysis of the alleged RC4 keystream generator. In: Goos, G., Hartmanis, J., van Leeuwen, J., Schneier, B. (eds.) FSE 2000. LNCS, vol. 1978, pp. 19–30. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-44706-7_2
5. Jenkins, R.J.: ISAAC and RC4 (1996). http://burtleburtle.net/bob/rand/isaac.html. Accessed 1 Oct 2019
6. Langley, A.: Disabling SSLv3 and RC4. Google Security Blog, 17 September 2015. https://security.googleblog.com/2015/09/disabling-sslv3-and-rc4.html
7. Maitra, S., Sen Gupta, S.: New long-term Glimpse of RC4 stream cipher. In: Bagchi, A., Ray, I. (eds.) ICISS 2013. LNCS, vol. 8303, pp. 230–238. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-45204-8_17
8. Mantin, I., Shamir, A.: A practical attack on broadcast RC4. In: Matsui, M. (ed.) FSE 2001. LNCS, vol. 2355, pp. 152–164. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45473-X_13
9. Microsoft security advisory: update for disabling RC4. https://support.microsoft.com/en-in/help/2868725/microsoft-security-advisory-update-for-disabling-rc4. Accessed 16 Apr 2020
10. Paterson, K.G., Poettering, B., Schuldt, J.C.N.: Big bias hunting in amazonia: large-scale computation and exploitation of RC4 biases (Invited Paper). In: Sarkar, P., Iwata, T. (eds.) ASIACRYPT 2014. LNCS, vol. 8873, pp. 398–419. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-45611-8_21
11. Paul, G., Maitra, S.: RC4 Stream Cipher and Its Variants. CRC Press, Taylor & Francis Group, A Chapman & Hall Book, Boca Raton (2012)
12. SenGupta, S., Maitra, S., Paul, G., Sarkar, S.: (Non-)random sequences from (non-)random permutations - analysis of RC4 stream cipher. J. Cryptol. 27(1), 67–108 (2014)

Mixture Integral Attacks on Reduced-Round AES with a Known/Secret S-Box

Lorenzo Grassi1,2 and Markus Schofnegger2(B)

1 Radboud University, Nijmegen, The Netherlands
2 IAIK, Graz University of Technology, Graz, Austria

Abstract. In this work, we present new low-data secret-key distinguishers and key-recovery attacks on reduced-round AES. The starting point of our work is “Mixture Differential Cryptanalysis” recently introduced at FSE/ToSC 2019, a way to turn the “multiple-of-8” 5-round AES secret-key distinguisher presented at Eurocrypt 2017 into a simpler and more convenient one (though, on a smaller number of rounds). By reconsidering this result on a smaller number of rounds, we present as our main contribution a new secret-key distinguisher on 3-round AES with the smallest data complexity in the literature (that does not require adaptive chosen plaintexts/ciphertexts), namely approximately half of the data necessary to set up a 3-round truncated differential distinguisher (which is currently the distinguisher in the literature with the lowest data complexity). For a success probability of 95%, our distinguisher requires just 10 chosen plaintexts versus 20 chosen plaintexts necessary to set up the truncated differential attack. Besides that, we present new competitive low-data key-recovery attacks on 3- and 4-round AES, both in the case in which the S-box is known and in the case in which it is secret.

Keywords: AES · Mixture Differential Cryptanalysis · Secret-key distinguisher · Low-data attack · Secret S-box

1 Introduction

AES (Advanced Encryption Standard) [6] is probably the most used and studied block cipher, and many constructions employ reduced-round AES as part of their design. Determining its security is therefore one of the most important problems in cryptanalysis. Since there is no known attack which can break the full AES significantly faster than exhaustive search, researchers have focused on attacks which can break reduced-round versions of AES. Especially within the last couple of years, new cryptanalysis results on the AES have appeared regularly (e.g., [1,9,12,15]). While those papers do not pose any practical threat to the AES, they do give new insights into the internals of what is arguably the cipher that is responsible for the largest fraction of encrypted data worldwide.

© Springer Nature Switzerland AG 2020. K. Bhargavan et al. (Eds.): INDOCRYPT 2020, LNCS 12578, pp. 312–331, 2020. https://doi.org/10.1007/978-3-030-65277-7_14

Among many others, a new technique called “Mixture Differential Cryptanalysis” [9] has been recently presented at FSE/ToSC 2019, which is a way to translate the (complex) “multiple-of-8” 5-round distinguisher [12] into a simpler and more convenient one (though, on a smaller number of rounds). Given a pair of chosen plaintexts, the idea is to construct new pairs of plaintexts by mixing the generating variables of the initial pair of plaintexts. As proved in [9], for 4-round AES the corresponding ciphertexts of the initial pair of plaintexts lie in a particular subspace if and only if the corresponding pairs of ciphertexts of the new pairs of plaintexts have the same property. Such a secret-key distinguisher, which is also independent of the details of the S-box and of the MixColumns matrix, has been reconsidered in [4], where the authors show that it is an immediate consequence of an equivalence relation on the input pairs, under which the difference at the output of the round function is invariant. Moreover, it is also the starting point for practical and competitive key-recovery attacks on 5-round AES-128 and 7-round AES-192 [1], breaking the record for these attacks which was obtained 18 years ago by the classical Square attack.

In this paper, we reconsider this distinguisher on a smaller number of rounds in order to set up new (competitive) low-data distinguishers and key-recovery attacks on reduced-round AES. A summary of our results can be found in Tables 1, 2 and 3.

Our Contribution and Related Work

Cryptanalysis of block ciphers has focused on maximizing the number of rounds that can be broken without exhausting the full code book or key space. This often leads to attacks marginally close to that of brute force. Even if these attacks are important, e.g., to determine the security margin of a cipher (i.e., the ratio between the number of rounds which can be successfully attacked and the number of rounds in the full cipher), they are not practical. For this reason, low-data distinguishers/attacks on reduced-round ciphers have recently gained renewed interest in the literature. Indeed, it seems desirable to also consider other approaches, such as restricting the attacker’s resources, in order to adhere to “real-life” scenarios. In this case, the time complexity of the attack is not limited (besides the natural bound of exhaustive search), but the data complexity is restricted to only a few known or chosen plaintexts. Attacks in this scenario have been studied in various papers, which include low-data Guess-and-Determine and Meet-in-the-Middle techniques [3], low-data truncated differential cryptanalysis [11], polytopic cryptanalysis [17], and, if adaptive chosen plaintexts/ciphertexts are allowed, yoyo-like attacks [15].

“Mixture Integral” Key-Recovery Attacks. In Sect. 4 we show that “Mixture Differential Cryptanalysis” [9] can be exploited in order to set up low-data attacks on reduced-round AES. Given a set of chosen plaintexts defined as in [9], our attacks are based on the fact that the XOR sum of the corresponding

Table 1. Secret-key distinguishers on 3-round AES which are independent of the secret key. The data complexity corresponds to the minimum number of chosen plaintexts/ciphertexts (CP/CC) and/or adaptive chosen plaintexts/ciphertexts (ACP/ACC) which are needed to distinguish the AES permutation from a random permutation with a success probability denoted by Prob.

Property              Prob     Data            Reference
Imp. Mixt. Integral   ≈ 65%    6 CP            Section 3.2
Trunc. Differential   ≈ 65%    12 CP           [11]
Imp. Mixt. Integral   ≈ 95%    10 CP           Section 3.2
Trunc. Differential   ≈ 95%    20 CP           [11]
Integral              ≈ 100%   256 = 2^8 CP    [5, 14]
Yoyo                  ≈ 100%   2 CP + 2 ACC    [15]

texts after 2-round AES encryptions is equal to zero with prob. 1. Using the same strategy proposed in a classical square/integral attack [5,14], this zero-sum property can be exploited to set up competitive attacks on 3- and 4-round AES, which require only 4 and 6 chosen plaintexts, respectively. A comparison of all known low-data attacks on AES and our attacks is given in Table 2. Since (1) the pairs of plaintexts used to set up the attacks share the same generating variables, which are mixed in the same way proposed by the Mixture Differential Distinguisher, and since (2) such attacks exploit the zero-sum property (instead of a differential one), we call this attack a mixture integral attack.

“Impossible Mixture Integral” Secret-Key Distinguisher. In Sect. 3.2, we show that the previous distinguishers/attacks can also be exploited to set up a new 3-round secret-key distinguisher on AES, which is independent of the key, of the details of the S-box, and of the MixColumns operation. For a success probability of ≈ 95%, such a distinguisher requires only 10 chosen plaintexts (or ciphertexts), i.e., half of the data required by the most competitive distinguisher currently present in the literature (which does not require adaptive chosen texts).

The Property Exploited by this New Distinguisher. Consider a zero-sum key-recovery attack on 3-round AES (based on a 2-round zero-sum distinguisher). An integral attack assumes that the zero-sum property is always satisfied when decrypting under the secret key. Thus, if there is no key for which the zero-sum property is satisfied, the ciphertexts have likely been generated by a random permutation, and not by AES. Such a strategy can be used as a distinguisher, but requires key guessing and is thus not independent of the secret key. In Sect. 3.2, we show how to evaluate this property without guessing any key material by providing a property which is independent of the secret key, and which holds for the ciphertexts only in the case in which the key-recovery (mixture integral) attack just proposed fails. The obtained 3-round distinguisher can also be used to set up new key-recovery attacks on reduced-round AES.

Table 2. Attacks on reduced-round AES-128. The data complexity corresponds to the number of required chosen plaintexts (CP). The time complexity is measured in reduced-round AES encryption equivalents (E), while the memory complexity is measured in plaintexts (16 bytes). Precomputation is given in parentheses. The case in which the final MixColumns operation is omitted is denoted by “r.5 rounds” (r full rounds + the final round). “Key sched.” highlights whether the attack exploits the details of the key schedule of AES.

Attack      Rounds   Data (CP)   Cost       Memory   Key sched.   Reference
TrD         2.5−3    2           2^{31.6}   2^8      No           [11]
G&D-MitM    2.5      2           2^{24}     2^{16}   Yes          [3]
G&D-MitM    3        2           2^{16}     2^8      Yes          [3]
TrD         2.5−3    3           2^{11.2}            No           [11]
G&D-MitM    3        3           2^8        2^8      Yes          [3]
TrD         2.5−3    3           2^{5.7}    2^{12}   No           [11]
MixInt      2.5−3    4           2^{8.1}             No           Section 4.1
MixInt      2.5−3    4

1/2 when E is an ideally random block cipher. Hence we can recover $K$ in polynomial time by using Simon’s algorithm. In other words, the generic attack recovers the secret key in polynomial time in Rötteler and Steinwandt’s model. Thus it seems less interesting to study quantum attacks in this exact same model.

Our Attack Model. Our attack model is similar to Rötteler and Steinwandt’s, except that we impose restrictions on the bit-flip patterns that adversaries can choose. More precisely, let $b = b_1 \| \cdots \| b_k$ be a fixed $k$-bit string, and let $\mathrm{Mask}_b := \{x = x_1 \| \cdots \| x_k \in \{0,1\}^k \mid x_i = 0 \text{ if } b_i = 0\}$. In our attack model, we allow adversaries to access the quantum oracles of $O_K(x, M)$ and $O_K^{-1}(x, M)$, but we assume that adversaries can choose $x$ only from $\mathrm{Mask}_b$.

The Generic Attack. In our model, Rötteler and Steinwandt’s polynomial-time attack no longer works due to the restriction on bit-flip patterns (when $b \neq 11 \cdots 1$). Nevertheless, we can mount a simple quantum key-recovery attack by using the technique of Leander and May, which combines Grover’s and Simon’s algorithms [18].

Lemma 1. Suppose that the Hamming weight of $b$ (the bit-flip pattern) is $w$, and $w$ is in $\Omega(\log n)$. In our quantum related-key attack model, there exists a quantum attack that recovers the secret key in time $\tilde{O}(2^{(k-w)/2})$.

The proof of the lemma is given in Section A.4 of this paper’s full version [6]. It seems that there is no quantum attack that is significantly faster than our attack for ideally random block ciphers. Therefore we pose the following conjecture.

Conjecture 1. In our quantum related-key attack model, when the Hamming weight of $b$ (the bit-flip pattern) is $w$, and $w$ is in $\Omega(\log n)$, there is no key recovery attack¹⁰ that runs in time $o(2^{(k-w)/2})$.

¹⁰ It is desirable to show that the conjecture holds, but proving quantum query lower bounds is quite difficult when quantum queries are made to both $E$ and $E^{-1}$.
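To make the restriction concrete, the following Python sketch enumerates $\mathrm{Mask}_b$ for a small pattern $b$. It is only an illustration of the set defined above; the function name `mask_set` and the bit-string representation are assumptions made for this example. The set has $2^w$ elements, where $w$ is the Hamming weight of $b$, which is the part of the key space the related-key oracle can reach.

```python
from itertools import product

def mask_set(b):
    """Return all k-bit strings x (as text) with x_i = 0 wherever b_i = 0."""
    free = [idx for idx, bit in enumerate(b) if bit == "1"]
    result = []
    for choice in product("01", repeat=len(free)):
        x = ["0"] * len(b)
        for pos, bit in zip(free, choice):
            x[pos] = bit
        result.append("".join(x))
    return result
```

For example, `mask_set("0101")` returns `["0000", "0001", "0100", "0101"]`: only the second and fourth bit positions may be flipped.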

Quantum Cryptanalysis on Contracting Feistel Structures

Under the assumption that this conjecture holds, it is theoretically worth studying dedicated quantum attacks on concrete block ciphers that run in time $o(2^{(k-w)/2})$: if there exists such an attack on a block cipher $E$, it follows that $E$ does not behave like an ideally random block cipher in the quantum setting. In later sections we introduce such dedicated quantum related-key attacks on some Feistel structures. See also Section A.5 of this paper’s full version [6] for details on attack models.

3 Previous Works

This section gives an overview of previous works and results on quantum query attacks against Feistel structures.

3.1 Kuwakado and Morii’s Quantum Distinguisher on the 3-round Feistel Structure

Kuwakado and Morii showed a quantum chosen-plaintext attack that distinguishes the 3-round Feistel-F structure from a $2n$-bit random permutation in polynomial time [16]. Let $O$ denote the quantum encryption oracle of the 3-round Feistel-F structure. In addition, let $O_L(x_L, x_R)$ and $O_R(x_L, x_R)$ be the most and least significant $n$ bits of $O(x_L, x_R)$, respectively. Kuwakado and Morii’s distinguisher works as follows. First, let $\alpha_0$ and $\alpha_1$ be fixed $n$-bit strings such that $\alpha_0 \neq \alpha_1$. Define a function $f : \mathbb{F}_2 \times \mathbb{F}_2^n \to \mathbb{F}_2^n$ by

$$f(b, x) := O_L(x, \alpha_b) \oplus \alpha_b = F^{(2)}_{k_2}\bigl(F^{(1)}_{k_1}(\alpha_b) \oplus x\bigr).$$

Then $f((b, x) \oplus (1, s)) = f(b, x)$ holds for all $(b, x) \in \mathbb{F}_2 \times \mathbb{F}_2^n$, where $s = F^{(1)}_{k_1}(\alpha_0) \oplus F^{(1)}_{k_1}(\alpha_1)$, i.e., $f$ is a periodic function and the period is $(1, s)$. Thus, when the quantum oracle of $O$ is available, we can apply Simon’s algorithm on $f$ and recover the period $(1, s)$.

Remark 1. It is shown in [13] that the quantum oracle of $O_L$ (i.e., the truncation of $O$) can be implemented by making one query to the quantum oracle of $O$, because $O\,|x_L\rangle|x_R\rangle|y\rangle|+^n\rangle = |x_L\rangle|x_R\rangle|y \oplus O_L(x_L, x_R)\rangle|+^n\rangle$ holds for arbitrary $x_L, x_R, y \in \mathbb{F}_2^n$, where $|+^n\rangle := H^{\otimes n}|0^n\rangle$.

On the other hand, when we construct a function $f$ in the same way by using a $2n$-bit random permutation instead of the 3-round Feistel-F structure, the function is not periodic with overwhelming probability. In particular, even if we apply Simon’s algorithm on the function, the algorithm fails to find a period (since the function does not have any period). Therefore, we can distinguish the 3-round Feistel-F structure from a random permutation in polynomial time by using Simon’s algorithm to check whether the function $f(b, x) := O_L(x, \alpha_b) \oplus \alpha_b$, constructed from the given quantum oracle (which is either the 3-round Feistel-F structure or a random permutation), has a period.

Later, the distinguishing attack was extended to a polynomial-time quantum chosen-ciphertext distinguishing attack on the 4-round Feistel-F structure [14].
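The periodicity of $f$ can be checked classically on a toy instance. The Python sketch below is illustrative only: it assumes the round map $(L, R) \mapsto (R, L \oplus F^{(i)}(R))$ applied in all three rounds (a convention chosen here because it reproduces the formula for $f$ above; the paper’s own definition is given in an earlier section), models the round functions as random lookup tables, and replaces Simon’s algorithm by an exhaustive check. The names `make_feistel3` and `check_period` are hypothetical.

```python
import random

def make_feistel3(n, rng):
    """Toy 3-round Feistel on 2n bits with random round-function tables."""
    size = 1 << n
    F = [[rng.randrange(size) for _ in range(size)] for _ in range(3)]

    def encrypt(x_left, x_right):
        L, R = x_left, x_right
        for table in F:
            # Assumed round map: (L, R) -> (R, L xor F_i(R)).
            L, R = R, L ^ table[R]
        return L, R

    return F, encrypt

def check_period(n=4, alpha0=0b0011, alpha1=0b0101, seed=0):
    """True if f(b, x) = O_L(x, alpha_b) xor alpha_b has period (1, s)."""
    rng = random.Random(seed)
    F, O = make_feistel3(n, rng)
    alpha = (alpha0, alpha1)

    def f(b, x):
        return O(x, alpha[b])[0] ^ alpha[b]

    s = F[0][alpha0] ^ F[0][alpha1]  # s = F1(alpha0) xor F1(alpha1)
    return all(f(b ^ 1, x ^ s) == f(b, x)
               for b in (0, 1) for x in range(1 << n))
```

In the quantum attack the period is found by Simon’s algorithm with polynomially many queries; the exhaustive loop here only confirms that the claimed period exists.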

C. Cid et al.

In addition, it has been shown that the keys of a 3-round Feistel-KF structure can be recovered by a polynomial-time qCPA [7].

3.2 Extension of the Distinguishers to Key Recovery Attacks with the Grover Search

Generally, classical distinguishing attacks on block ciphers can be extended to key recovery attacks. Here, we give an overview of how the quantum chosen-plaintext distinguishing attack by Kuwakado and Morii can also be extended to a quantum chosen-plaintext key recovery attack by using Grover’s algorithm, as shown by Hosoyamada and Sasaki [13] and Dong and Wang [10]. The time complexity of their attack on the $r$-round Feistel structure is in $\tilde{O}(2^{(r-3)n/2})$ when the round keys $k_1, \ldots, k_r$ are randomly chosen from $\{0,1\}^n$. The basic strategy is to apply the combination of Grover’s algorithm and Simon’s algorithm shown by Leander and May [18]: guess the partial keys $k_4, \ldots, k_r$ by using Grover’s algorithm, and check whether the guess is correct by applying Kuwakado and Morii’s algorithm on the first three rounds. Suppose that the quantum encryption oracle $O$ of the $r$-round Feistel structure is given ($r \geq 4$), and let $k_1, \ldots, k_r$ be the round keys that we want to recover. Then, we can check whether a guess $k'_4, \ldots, k'_r$ for the 4-th, ..., $r$-th round keys is correct as follows.

1. Implement the quantum oracle of $O' := (R^{(r)}_{k'_r} \circ \cdots \circ R^{(4)}_{k'_4})^{-1} \circ O$. The oracle $O'$ performs the encryption with $O$ and then the partial decryption by using $k'_4, \ldots, k'_r$. If the guess is correct, then $O'$ matches the partial encryption $R^{(3)}_{k_3} \circ R^{(2)}_{k_2} \circ R^{(1)}_{k_1}$ with the first three rounds. If the guess is incorrect, $O'$ is expected to behave like a random permutation.
2. Run Kuwakado and Morii’s quantum distinguisher on $O'$. If we can distinguish the 3-round Feistel structure, with very high probability the key guess is correct. Otherwise, the key guess is incorrect.

Since Simon’s algorithm can be implemented without any intermediate measurements (see Sect. 2.3 for details), we can implement a quantum circuit to calculate the Boolean function

$$G : (k'_4, \ldots, k'_r) \mapsto \begin{cases} 1 & \text{if } (k'_4, \ldots, k'_r) = (k_4, \ldots, k_r), \\ 0 & \text{if } (k'_4, \ldots, k'_r) \neq (k_4, \ldots, k_r) \end{cases}$$

with a small error. By applying Grover’s algorithm on $G$, we can then recover the round keys $k_4, \ldots, k_r$ in time $\tilde{O}(2^{(r-3)n/2})$. The remaining keys $k_1$, $k_2$, and $k_3$ can be easily recovered once $k_4, \ldots, k_r$ are known.

The above key-recovery attack is a quantum chosen-plaintext attack that is based on the 3-round chosen-plaintext distinguisher. If both the quantum encryption and decryption oracles are available, a quantum chosen-ciphertext attack recovers the keys in time $\tilde{O}(2^{(r-4)n/2})$ in the same way by using the 4-round chosen-ciphertext distinguisher [14].

3.3 Quantum Advanced Slide Attack and Nested Simon’s Algorithm

Consider the special case in which there is a public random function $F : \{0,1\}^n \to \{0,1\}^n$ and each round function $F^{(i)}_{k_i}$ of the $r$-round Feistel structure is defined as

$$F^{(i)}_{k_i}(x) := F(x \oplus k_i) \qquad (3)$$

for all $1 \le i \le r$. Assume also that the number of rounds $r$ is divisible by 4 and that a cyclic key schedule with $k_i = k_{i+4}$ for each $i$ is used ($k_1, k_2, k_3, k_4$ are chosen independently and uniformly at random). In the classical setting, Biryukov and Wagner showed a chosen-ciphertext attack that recovers the keys in time $O(2^{n/2})$ in this case [2]. In the quantum setting, Bonnetain et al. showed that the classical attack by Biryukov and Wagner can be exponentially sped up by nesting Simon's algorithm [5], proposing a quantum attack that recovers the keys in polynomial time.

This section gives an overview of how Bonnetain et al.'s quantum chosen-ciphertext key-recovery attack works when $r = 4$. Let $O$ and $O^{-1}$ be the quantum encryption and decryption oracles of the 4-round Feistel structure whose round functions are defined as in (3). In addition, let $O_L(x_L, x_R)$ and $O_R(x_L, x_R)$ (resp., $O^{-1}_L(x_L, x_R)$ and $O^{-1}_R(x_L, x_R)$) denote the left and right $n$ bits of $O(x_L, x_R)$ (resp., $O^{-1}(x_L, x_R)$), respectively. First, suppose that we can simulate the quantum oracle of the function

$$g(x) := F(x \oplus k_1) \oplus k_2 \oplus k_4. \qquad (4)$$

Then, since $F$ is a public function, we can evaluate the function $H : \mathbb{F}_2^n \to \mathbb{F}_2^n$ defined by $H(x) := F(x) \oplus g(x)$ in quantum superposition, and can thus recover the key $k_1$ by Simon's algorithm on $H$, because $H(x) = F(x) \oplus F(x \oplus k_1) \oplus k_2 \oplus k_4$ holds and $k_1$ is a period of $H$. Now, the problem is how to simulate the quantum oracle of such a function $g(x)$ by using the quantum oracles of $O$ and $O^{-1}$. For each fixed $x \in \mathbb{F}_2^n$, define a function $G_x : \mathbb{F}_2 \times \mathbb{F}_2^n \to \mathbb{F}_2^n$ by

$$G_x(b, y) := \begin{cases} O^{-1}_R(y, x) & \text{if } b = 0, \\ O_R(y, x) & \text{if } b = 1. \end{cases}$$

Then, straightforward calculations show that $G_x((b, y) \oplus (1, g(x))) = G_x(b, y)$ holds for all $(b, y) \in \mathbb{F}_2 \times \mathbb{F}_2^n$, i.e., $G_x$ is a periodic function with period $(1, g(x))$ for each fixed $x$. Therefore, by performing Simon's algorithm on $G_x$ without measurement (see Sect. 2.3 for details), we can implement a quantum circuit that evaluates $g(x)$ in quantum superposition with some small error. In summary, we can recover $k_1$ as follows:

1. Implement a quantum circuit $C_g$ that simulates the quantum oracle of $g$ with some small error. This can be done by applying Simon's algorithm on $G_x$ for each $|x\rangle$.


C. Cid et al.

2. Implement a quantum circuit that simulates the quantum oracle of $H(x)$ by using $C_g$, and apply Simon's algorithm on $H$ to recover $k_1$.

Note that Simon's algorithm is nested in the above attack: when we apply Simon's algorithm on $H$, another instance of Simon's algorithm is called to evaluate the function $H$. Once we recover $k_1$, the other subkeys $k_2, k_3, k_4$ can be recovered easily. Eventually, we can recover all the keys in polynomial time.

A Polynomial-Time Key Recovery Attack on the 3-Round Feistel Structure. Later, we use the technique of nested Simon's algorithm to mount various attacks. As another application of nested Simon's algorithm, we explain here that Kuwakado and Morii's distinguishing attack in Sect. 3.1 can easily be extended to a polynomial-time qCPA that recovers the key of the 3-round Feistel-KF structure, so that readers can better grasp the basic idea of the technique.¹¹

When Kuwakado and Morii's attack in Sect. 3.1 is applied to the 3-round Feistel-KF structure, it recovers the value $F^{(1)}_{k_1}(\alpha_0) \oplus F^{(1)}_{k_1}(\alpha_1) = F^{(1)}(\alpha_0 \oplus k_1) \oplus F^{(1)}(\alpha_1 \oplus k_1)$, where $\alpha_0$ and $\alpha_1$ are arbitrarily chosen constants such that $\alpha_0 \neq \alpha_1$. Now, choose $x \in \{0,1\}^n \setminus \{0^n\}$ arbitrarily, and set $\alpha_0 := x$ and $\alpha_1 := 0^n$. Then, given the quantum oracle of the 3-round Feistel-KF structure, Kuwakado and Morii's attack allows us to compute the value $f_{k_1}(x) := F^{(1)}(x \oplus k_1) \oplus F^{(1)}(k_1)$ for each $x \neq 0^n$. In particular, we can evaluate the function $f_{k_1}(x)$ in quantum superposition by using Simon's algorithm without intermediate measurements (note that $f_{k_1}(0^n) = 0^n$ holds).

Next, define a function $G_{k_1}(x)$ by $G_{k_1}(x) := f_{k_1}(x) \oplus F^{(1)}(x)$. Then, since $F^{(1)}$ is a public function and we can evaluate $f_{k_1}(x)$ in quantum superposition, we can also evaluate the function $G_{k_1}(x)$ in quantum superposition. In addition, it is easy to check that $G_{k_1}(x) = G_{k_1}(x \oplus k_1)$ holds for all $x \in \{0,1\}^n$, i.e., $G_{k_1}$ is a periodic function with period $k_1$. Hence we can recover the value $k_1$ by applying Simon's algorithm to $G_{k_1}$. Once we recover $k_1$, the remaining keys $k_2$ and $k_3$ can be recovered easily.
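The periodicity of $G_{k_1}$ can be sanity-checked classically on a toy instance. The sketch below is only an illustration under stated assumptions: the block-half size `N_BITS = 8`, the seed, and the brute-force `periods` helper are hypothetical choices, and the exhaustive search is a classical stand-in for Simon's algorithm, not a quantum implementation.

```python
import random

N_BITS = 8                      # toy half-block size n (assumption; real ciphers use far larger n)
SIZE = 1 << N_BITS

rng = random.Random(2020)       # fixed seed so that the run is reproducible
F1 = [rng.randrange(SIZE) for _ in range(SIZE)]   # public random function F^(1)
k1 = rng.randrange(1, SIZE)     # secret first round key (non-zero)

def f_k1(x):
    """Value obtained from Kuwakado and Morii's attack with alpha_0 = x, alpha_1 = 0."""
    return F1[x ^ k1] ^ F1[k1]

def G_k1(x):
    """G_{k1}(x) = f_{k1}(x) XOR F^(1)(x)."""
    return f_k1(x) ^ F1[x]

def periods(func):
    """All s with func(x ^ s) == func(x) for every x (exhaustive, classical)."""
    table = [func(x) for x in range(SIZE)]
    return [s for s in range(SIZE)
            if all(table[x ^ s] == table[x] for x in range(SIZE))]

found = periods(G_k1)
print("k1 =", k1, "periods of G_k1:", found)
```

The secret `k1` always appears among the reported periods, because $G_{k_1}(x \oplus k_1) = F^{(1)}(x) \oplus F^{(1)}(k_1) \oplus F^{(1)}(x \oplus k_1) = G_{k_1}(x)$. For a random table, other non-trivial periods are very unlikely but not excluded.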

4

Contracting Feistel Structures

First, we present a polynomial-time 7-round quantum distinguisher for the SM4-like structure in the CPA setting. We then extend it to a polynomial-time quantum key-recovery attack on 7-round SM4. Finally, we show that the attacks can be generalised to attacks on (2d − 1)-round d-branch contracting Feistel structures. See Table 1 for a summary of the results in this section. (See also Theorem 1 of this paper's full version [6] for a formalized statement of the attacks in this section.)

¹¹ The previous polynomial-time qCPA on the 3-round Feistel-KF structure [7] recovers the keys without nested Simon's algorithm.


Table 1. Polynomial-time qCPAs on contracting Feistel structures. The key-recovery attacks are applicable only to Feistel-KF structures.

Rounds    Branch      Attack type      Complexity    Ref.
3         2           Distinguisher    poly(n)       [16]
3         2           Key-recovery     poly(n)       [7]
7         4           Distinguisher    poly(n)       Sect. 4.2
7         4           Key-recovery     poly(n)       Sect. 4.3
2d − 1    d (even)    Distinguisher    poly(n)       Sect. 4.4
2d − 1    d (even)    Key-recovery     poly(n)       Sect. 4.5

4.1 Specification

We denote the $i$-th round of the SM4-like structure as follows:

$$X_i = X_{i-4} \oplus F_i(X_{i-3} \oplus X_{i-2} \oplus X_{i-1}), \qquad (5)$$

where the $F_i$ are keyed functions, the input plaintext is $(X_{-3}, X_{-2}, X_{-1}, X_0)$, and the output ciphertext after $r$ rounds is $(X_{r-3}, X_{r-2}, X_{r-1}, X_r)$. Let $O^r$ denote the $r$-round SM4-like quantum oracle, and let $O^r_\Lambda(x_A, x_B, x_C, x_D)$ denote the branch $x_\Lambda$ of $O^r(x_A, x_B, x_C, x_D)$, where $\Lambda \in \{A, B, C, D\}$.

4.2 SM4-like Structure 7-Round Distinguisher Under CPA Setting

Idea of the Attack. The most important point of the quantum distinguishing attack on the 3-round balanced Feistel structure in Sect. 3.1 is that, given the encryption oracle, we can compute $F_2(x \oplus \beta_a)$ for arbitrary $x$ by appropriately choosing plaintexts. Here, $\beta_a$ ($a = 0, 1$) is a constant whose exact value we do not know, and $\beta_0 \neq \beta_1$. Since the function $f : \mathbb{F}_2 \times \mathbb{F}_2^n \to \mathbb{F}_2^n$ defined by $f(a, x) := F_2(x \oplus \beta_a)$ has the period $(1, \beta_0 \oplus \beta_1)$, we can mount the distinguishing attack by applying Simon's algorithm on $f$. The basic strategy of our attack on the SM4-like structure is similar. We try to compute the value $F_i(x \oplus \beta_a)$ for arbitrary $x$ for some $i$. After some consideration we found that, given the encryption oracle of the 7-round SM4-like structure, we can compute $F_4(x \oplus \beta_a)$ by setting $X_{-3} = X_{-2} = X_{-1} = x$ and $X_0 = \alpha_a$, where $\alpha_0$ and $\alpha_1$ are distinct constants.

Details of the Attack. Let $X_{-3} = X_{-2} = X_{-1} = x$ and $X_0 = \alpha_a$, where $\alpha_a$, $a = 0, 1$, are distinct constants. The branch values after each round are listed in Table 2. From Table 2, we see that the 7-round ciphertext is $(X_4, X_5, X_6, X_7)$. Define a function $f^7 : \mathbb{F}_2 \times \mathbb{F}_2^n \to \mathbb{F}_2^n$ by

$$f^7(a, x) := O^7_A(x, x, x, \alpha_a) \oplus \alpha_a = F_4(x \oplus g_{123,a}). \qquad (6)$$

Table 2. Values of branch X_i for 4-branch contracting Feistel-F structure.

Round      X_i                          Notation
−3 ∼ −1    x
0          α_a
1          x ⊕ g_{1,a}                  g_{1,a} = F_1(α_a)
2          x ⊕ g_{2,a}                  g_{2,a} = F_2(α_a ⊕ g_{1,a})
3          x ⊕ g_{3,a}                  g_{3,a} = F_3(α_a ⊕ g_{12,a}), g_{12,a} = g_{1,a} ⊕ g_{2,a}
4          α_a ⊕ g_{4,a}(x)             g_{4,a}(x) = F_4(x ⊕ g_{123,a}), g_{123,a} = g_{12,a} ⊕ g_{3,a}
5          x ⊕ g_{1,a} ⊕ g_{5,a}(x)     g_{5,a}(x) = F_5(α_a ⊕ g_{23,a} ⊕ g_{4,a}(x)), g_{23,a} = g_{2,a} ⊕ g_{3,a}
6          x ⊕ g_{2,a} ⊕ g_{6,a}(x)     g_{6,a}(x) = F_6(α_a ⊕ g_{13,a} ⊕ g_{4,a}(x) ⊕ g_{5,a}(x)), g_{13,a} = g_{1,a} ⊕ g_{3,a}
7          x ⊕ g_{3,a} ⊕ g_{7,a}(x)     g_{7,a}(x) = F_7(α_a ⊕ g_{12,a} ⊕ g_{4,a}(x) ⊕ g_{5,a}(x) ⊕ g_{6,a}(x))

Then $f^7((a, x) \oplus (1, s)) = f^7(a, x)$ holds for all $(a, x) \in \mathbb{F}_2 \times \mathbb{F}_2^n$, where $s = g_{123,0} \oplus g_{123,1}$. Indeed, $f^7(a \oplus 1, x \oplus g_{123,0} \oplus g_{123,1}) = F_4(x \oplus g_{123,0} \oplus g_{123,1} \oplus g_{123,a \oplus 1}) = F_4(x \oplus g_{123,a}) = f^7(a, x)$. Thus, when the quantum oracle of $O^7$ is available, we can apply Simon's algorithm on $f^7$ and recover the period $(1, s)$.¹²

4.3 7-Round SM4 Key-Recovery Under CPA Setting

Recall that SM4 is a Feistel-KF structure. In other words, it uses the round function $F_i(x) = F(x \oplus k_i)$, where $F$ is a public function¹³ and $k_i$ is the round key. The key-recovery attack is similar to the distinguisher described in the previous section, except that we introduce three more variables and additional constraints on these variables. Let $X_{-3} = x \oplus \beta_a$, $X_{-2} = x \oplus \gamma_a$, $X_{-1} = x \oplus \delta_a$, and $X_0 = \alpha_a$, where $a \in \{0, 1\}$. For all symbols $\Lambda \in \{\alpha, \beta, \gamma, \delta\}$, we set $\Lambda_0 = \Lambda \in \mathbb{F}_2^n \setminus \{0^n\}$ and $\Lambda_1 = 0^n$. Table 3 shows the value $X_i$ at various rounds. As before, although we consider the 7-round SM4-like Feistel structure, we only need to know the value of $X_4$. Define a function $f^7 : \mathbb{F}_2 \times \mathbb{F}_2^n \to \mathbb{F}_2^n$ by

$$f^7(a, x) := O^7_A(x \oplus \beta_a, x \oplus \gamma_a, x \oplus \delta_a, \alpha_a) \oplus \alpha_a = F(x \oplus h_4(a)). \qquad (7)$$

¹² To be more precise, we have to simulate $O^7_A$ (truncation) by using $O^7$. This can be done by using the technique explained in Remark 1.

¹³ Here, we assume it to be the same public function for all rounds. In fact, every round can use an arbitrary public function and the attack still works.


Table 3. Values of branch X_i for 4-branch contracting Feistel-KF structure.

Round    X_i                      Notation
−3       x ⊕ β_a
−2       x ⊕ γ_a
−1       x ⊕ δ_a
0        α_a
1        x ⊕ β_a ⊕ g_{1,a}        g_{1,a} = F(α_a ⊕ γ_a ⊕ δ_a ⊕ k_1)
2        x ⊕ γ_a ⊕ g_{2,a}        g_{2,a} = F(α_a ⊕ β_a ⊕ δ_a ⊕ g_{1,a} ⊕ k_2)
3        x ⊕ δ_a ⊕ g_{3,a}        g_{3,a} = F(α_a ⊕ β_a ⊕ γ_a ⊕ g_{1,a} ⊕ g_{2,a} ⊕ k_3)
4        α_a ⊕ F(x ⊕ h_4(a))      h_4(a) = β_a ⊕ γ_a ⊕ δ_a ⊕ g_{1,a} ⊕ g_{2,a} ⊕ g_{3,a} ⊕ k_4

Then $f^7((a, x) \oplus (1, s)) = f^7(a, x)$ holds for all $(a, x) \in \mathbb{F}_2 \times \mathbb{F}_2^n$, where $s = h_4(0) \oplus h_4(1)$. Indeed, $f^7(a \oplus 1, x \oplus h_4(0) \oplus h_4(1)) = F(x \oplus h_4(0) \oplus h_4(1) \oplus h_4(a \oplus 1)) = F(x \oplus h_4(a)) = f^7(a, x)$. Thus, when the quantum oracle of $O^7$ is available, we can apply Simon's algorithm on $f^7$ and recover the period $(1, h_4(0) \oplus h_4(1))$. In particular, this allows us to compute the value $h_4(0) \oplus h_4(1)$.

Let $\Lambda^4 := (\alpha, \beta, \gamma, \delta)$ and $T(\Lambda^4) := \beta \oplus \gamma \oplus \delta \oplus g_1(\Lambda^4) \oplus g_2(\Lambda^4) \oplus g_3(\Lambda^4)$, where $g_1(\Lambda^4) := F(\alpha \oplus \gamma \oplus \delta)$, $g_2(\Lambda^4) := F(\alpha \oplus \beta \oplus \delta \oplus g_1(\Lambda^4))$, and $g_3(\Lambda^4) := F(\alpha \oplus \beta \oplus \gamma \oplus g_1(\Lambda^4) \oplus g_2(\Lambda^4))$. Then $h_4(0) = T(\Lambda^4 \oplus \mathrm{key})$ and $h_4(1) = T(\mathrm{key})$ hold, where $\mathrm{key} = (k_1 \oplus k_2 \oplus k_3,\ k_2 \oplus k_3 \oplus k_4,\ k_1 \oplus k_3 \oplus k_4,\ k_1 \oplus k_2 \oplus k_4)$. In addition, let $H(\Lambda^4) := T(\Lambda^4 \oplus \mathrm{key}) \oplus T(\mathrm{key}) \oplus T(\Lambda^4)$. Then $H$ can be computed in quantum superposition, since $T((\alpha, \beta, \gamma, \delta) \oplus \mathrm{key}) \oplus T(\mathrm{key}) = h_4(0) \oplus h_4(1)$ can be computed by using Simon's algorithm on $f^7$ as above, and $T(\Lambda^4)$ does not depend on the keys.

Now, it is straightforward to check that the following conditions for $s' \in (\mathbb{F}_2^n)^4$ are equivalent:¹⁴

1. $s'$ is in the vector space $V \subset (\mathbb{F}_2^n)^4$ that is spanned by $\mathrm{key}$ and $(0, \eta, \eta, \eta)$ for $\eta \in \mathbb{F}_2^n$.
2. $H(\Lambda^4 \oplus s') = H(\Lambda^4)$ holds for all $\Lambda^4 = (\alpha, \beta, \gamma, \delta) \in (\mathbb{F}_2^n)^4$.

Thus, by applying Simon's algorithm on $H$, we can compute the vector space $V$. Once we determine the space $V$, we can recover the keys $k_1$, $k_2$, and $k_3$, since $k_1 = \alpha_s \oplus \gamma_s \oplus \delta_s$, $k_2 = \alpha_s \oplus \beta_s \oplus \delta_s$, and $k_3 = \alpha_s \oplus \beta_s \oplus \gamma_s$ hold for arbitrary $(\alpha_s, \beta_s, \gamma_s, \delta_s) \in V$ with $\alpha_s \neq 0^n$. Once we have these round keys, the remaining round keys $k_4, k_5, k_6, k_7$ can be recovered easily.

¹⁴ To be more precise, the conditions become equivalent with an overwhelming probability when the round function $F$ is a random function.
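The algebra of this key-recovery attack can be checked classically on a small instance. The following sketch is an illustration under stated assumptions, not the attack itself: the branch size `N_BITS = 6`, the seed, and the constants `lam` and `eta` are hypothetical, and the quantity $h_4(0) \oplus h_4(1)$, which the attack obtains through Simon's algorithm on $f^7$, is computed directly from the secret keys as a stand-in.

```python
import random

N_BITS = 6                       # toy branch size n (assumption; SM4 uses 32-bit branches)
SIZE = 1 << N_BITS
rng = random.Random(7)           # fixed seed for reproducibility
F = [rng.randrange(SIZE) for _ in range(SIZE)]   # public function F (random table)
keys = [rng.randrange(SIZE) for _ in range(7)]   # secret round keys k1..k7

def encrypt(state):
    """7-round SM4-like Feistel-KF: X_i = X_{i-4} ^ F(X_{i-3} ^ X_{i-2} ^ X_{i-1} ^ k_i)."""
    x = list(state)              # (X_{-3}, X_{-2}, X_{-1}, X_0)
    for k in keys:
        x = x[1:] + [x[0] ^ F[x[1] ^ x[2] ^ x[3] ^ k]]
    return tuple(x)              # (X_4, X_5, X_6, X_7)

def xor4(u, v):
    return tuple(a ^ b for a, b in zip(u, v))

def f7(a, x, lam):
    """f^7(a, x) of Eq. (7), with Lambda_0 = lam and Lambda_1 = 0."""
    al, be, ga, de = lam if a == 0 else (0, 0, 0, 0)
    return encrypt((x ^ be, x ^ ga, x ^ de, al))[0] ^ al

def T(lam):
    """Key-independent function T(alpha, beta, gamma, delta)."""
    al, be, ga, de = lam
    g1 = F[al ^ ga ^ de]
    g2 = F[al ^ be ^ de ^ g1]
    g3 = F[al ^ be ^ ga ^ g1 ^ g2]
    return be ^ ga ^ de ^ g1 ^ g2 ^ g3

k1, k2, k3, k4 = keys[:4]
key = (k1 ^ k2 ^ k3, k2 ^ k3 ^ k4, k1 ^ k3 ^ k4, k1 ^ k2 ^ k4)

def H(lam):
    # Stand-in: the attack learns T(lam ^ key) ^ T(key) from Simon's algorithm on f^7.
    return T(xor4(lam, key)) ^ T(key) ^ T(lam)

lam = (1, 2, 3, 4)                                # arbitrary non-zero constants
s = T(xor4(lam, key)) ^ T(key)                    # h4(0) ^ h4(1)
f7_periodic = all(f7(1, x ^ s, lam) == f7(0, x, lam) for x in range(SIZE))

eta = 0b101
v = xor4(key, (0, eta, eta, eta))                 # an element of the space V
samples = [tuple(rng.randrange(SIZE) for _ in range(4)) for _ in range(200)]
H_periodic = all(H(xor4(p, key)) == H(p) and H(xor4(p, v)) == H(p) for p in samples)

recovered = (v[0] ^ v[2] ^ v[3], v[0] ^ v[1] ^ v[3], v[0] ^ v[1] ^ v[2])
print(f7_periodic, H_periodic, recovered == (k1, k2, k3))
```

Both periodicity checks succeed and `recovered` equals `(k1, k2, k3)`: the $\eta$ terms cancel in each three-term XOR, so any element of $V$ whose first component is that of $\mathrm{key}$ yields the first three round keys.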

4.4 Generic (2d − 1)-round Distinguisher on Even Branches Contracting Feistel Structures Under CPA Setting

As mentioned earlier, the authors of [25] proved that when $d$ is even, a $d$-branch $r$-round contracting Feistel-F structure is PRP-secure when $r \ge 2d - 1$. Here, we show that it is not a qPRP for $r = 2d - 1$ rounds for any even $d$. Let $O^r$ denote the $r$-round $d$-branch contracting Feistel-F quantum oracle. In addition, let $O^r_\Lambda(b_1, \dots, b_d)$ denote the branch $b_\Lambda$ of $O^r(b_1, \dots, b_d)$, where $\Lambda \in \{1, 2, \dots, d\}$. We denote the value of each branch by the recursion

$$X_i = X_{i-d} \oplus F_i(X_{i-(d-1)} \oplus \cdots \oplus X_{i-1}), \qquad (8)$$

where the input is $(X_{1-d}, \dots, X_0)$ and the output after $r$ rounds is $(X_{r-(d-1)}, \dots, X_r)$. The distinguisher is a generalization of the one described in Sect. 4.2. In a nutshell, we show that if we let $X_{1-d} = \cdots = X_{-1} = x$ and $X_0 = \alpha_a$, where $\alpha_0$ and $\alpha_1$ are distinct constants, and define a function $f^{2d-1} : \mathbb{F}_2 \times \mathbb{F}_2^n \to \mathbb{F}_2^n$ by

$$f^{2d-1}(a, x) := O^{2d-1}_1(x, \dots, x, \alpha_a) \oplus \alpha_a = X_d \oplus \alpha_a, \qquad (9)$$

then we can apply Simon's algorithm on $f^{2d-1}$ and find a period, which yields a distinguisher. The following theorem is the key observation for showing the periodicity of $f^{2d-1}$.

Theorem 1. Consider a $d$-branch contracting Feistel structure, where $d$ is even. Let $X_{1-d} = \cdots = X_{-1} = x$ and $X_0 = \alpha_a$, where $\alpha_a$ is a constant. Then for $1 \le i \le d - 1$, we have $X_i = x \oplus T_i(\alpha_a)$. Here, $T_i(\alpha_a)$ denotes a constant value that is determined by the fixed keyed functions $F_i$ and $\alpha_a$, and does not contain the variable $x$.

See Section B of this paper's full version [6] for a proof. From Theorem 1, we can compute

$$X_d = X_0 \oplus F_d(X_1 \oplus \cdots \oplus X_{d-1}) = \alpha_a \oplus F_d(\underbrace{x \oplus \cdots \oplus x}_{d-1 \text{ times}} \oplus\, T_{\$}(\alpha_a)) = \alpha_a \oplus F_d(x \oplus T_{\$}(\alpha_a)),$$

where $T_{\$}(\alpha_a) = T_1(\alpha_a) \oplus \cdots \oplus T_{d-1}(\alpha_a)$; the $d - 1$ copies of $x$ reduce to a single $x$ because $d - 1$ is odd. Going back to Eq. (9), we have $f^{2d-1}(a, x) = F_d(x \oplus T_{\$}(\alpha_a))$. Therefore, $f^{2d-1}((a, x) \oplus (1, s)) = f^{2d-1}(a, x)$ holds for all $(a, x) \in \mathbb{F}_2 \times \mathbb{F}_2^n$, where $s = T_{\$}(\alpha_0) \oplus T_{\$}(\alpha_1)$.

4.5 Generic (2d − 1)-round Key-Recovery on Even Branches Contracting Feistel-KF Structures Under CPA Setting

The key-recovery attack described in Sect. 4.3 can be easily extended to contracting Feistel-KF structures with any even number of branches. The number of introduced variables, the analysis, and the number of recovered round keys simply scale linearly with the number of branches. For the sake of brevity, we omit the details of the key-recovery attack for the generic case.
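The generic distinguisher of Sect. 4.4 can also be checked classically for a small even d. The sketch below is a minimal illustration under stated assumptions: `make_trace`, the parameters `D = 6` and `n_bits = 5`, the seed, and the constants in `ALPHA` are hypothetical choices, and independent random tables stand in for the keyed round functions $F_i$. It verifies Theorem 1 and the period $(1, s)$ of $f^{2d-1}$ by exhaustive evaluation rather than with Simon's algorithm.

```python
import random

def make_trace(d, n_bits, seed=1):
    """Return (trace, size) for a (2d-1)-round d-branch contracting Feistel-F."""
    rng = random.Random(seed)
    size = 1 << n_bits
    # One independent random table per round stands in for the keyed F_i.
    tables = [[rng.randrange(size) for _ in range(size)] for _ in range(2 * d - 1)]

    def trace(state):
        """Map (X_{1-d}, ..., X_0) to the list [X_1, ..., X_{2d-1}] via Eq. (8)."""
        x, out = list(state), []
        for table in tables:
            acc = 0
            for value in x[1:]:          # X_{i-(d-1)} ^ ... ^ X_{i-1}
                acc ^= value
            new = x[0] ^ table[acc]      # X_i = X_{i-d} ^ F_i(...)
            x = x[1:] + [new]
            out.append(new)
        return out

    return trace, size

D = 6                                    # any even number of branches
trace, SIZE = make_trace(D, n_bits=5)
ALPHA = (3, 17)                          # distinct constants alpha_0, alpha_1

def f(a, x):
    """f^{2d-1}(a, x) = O_1^{2d-1}(x, ..., x, alpha_a) ^ alpha_a = X_d ^ alpha_a."""
    return trace([x] * (D - 1) + [ALPHA[a]])[D - 1] ^ ALPHA[a]

def T(i, a):
    """T_i(alpha_a), read off at x = 0 (Theorem 1: X_i = x ^ T_i(alpha_a))."""
    return trace([0] * (D - 1) + [ALPHA[a]])[i - 1]

theorem_holds = all(
    trace([x] * (D - 1) + [ALPHA[a]])[i - 1] == x ^ T(i, a)
    for a in (0, 1) for i in range(1, D) for x in range(SIZE)
)

s = 0
for i in range(1, D):
    s ^= T(i, 0) ^ T(i, 1)               # s = T_$(alpha_0) ^ T_$(alpha_1)

period_holds = all(f(a ^ 1, x ^ s) == f(a, x) for a in (0, 1) for x in range(SIZE))
print(theorem_holds, period_holds)
```

Changing `D` to an odd value makes the check of Theorem 1 fail in general, which matches the restriction to even d in this section.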

5 Related-Key Attacks

We first show related-key attacks on the balanced Feistel-KF structures in Sect. 5.1, and then extend some of them to contracting Feistel-KF structures in Sect. 5.2. The related-key attacks on contracting Feistel-KF structures (especially, the distinguishing attack) are based on the single-key attacks in Sect. 4.

5.1 Related-Key Attacks on the Balanced Feistel-KF Structures

Recall that a bit-mask pattern is specified in our related-key setting. Adversaries do not know any information about the key itself, but can add an arbitrary value that is consistent with the bit-mask pattern when they make encryption queries (in quantum superposition). The focus of this subsection is related-key attacks on $r$-round Feistel-KF structures, which have $rn$-bit keys. Each key is denoted as $(k_1, \dots, k_r) \in \mathbb{F}_2^{rn}$, where $k_i$ is the $i$-th round key. In our attacks, we fix a set of indices $1 \le i_1 < i_2 < \cdots < i_u \le r$, and assume that adversaries can add any value in $\{0,1\}^n$ to $k_{i_j}$ for all $1 \le j \le u$ (in quantum superposition). For instance, we will consider an attack on the 4-round Feistel-KF structure where adversaries can add $n$-bit values to the first round key $k_1$. For ease of notation, we denote this related-key setting by (Δ, 0, 0, 0). Other related-key settings are denoted in the same way: the related-key setting on the 5-round Feistel-KF structure where adversaries can add values to $k_1$ and $k_3$ is denoted by (Δ, 0, Δ, 0, 0). Note that the pattern (0, . . . , 0) corresponds to the single-key setting, and (Δ, Δ, . . . , Δ) corresponds to the previous related-key setting by Rötteler and Steinwandt. See Table 4 for a summary of the results in this subsection and a comparison with other attacks. (See also this paper's full version [6] for statements of the results in this subsection as a sequence of formal propositions.) We denote the most and least significant $n$ bits of a $2n$-bit string $X$ by $X_L$ and $X_R$, respectively.

Polynomial-time 5-round qCPA Distinguisher for Pattern (0, Δ, 0, 0, 0). Here we show a polynomial-time related-key qCPA that distinguishes the 5-round Feistel-KF structure from a random permutation for the related-key pattern (0, Δ, 0, 0, 0). Assume that the quantum oracle of

$$O^{5,(0,\Delta,0,0,0)} : \mathbb{F}_2^n \times \mathbb{F}_2^n \times \mathbb{F}_2^n \to \mathbb{F}_2^n \times \mathbb{F}_2^n, \quad (L, R, \Delta) \mapsto \mathrm{Enc}_{k_1, k_2 \oplus \Delta, k_3, k_4, k_5}(L, R)$$

is available to adversaries. Here, Enc is the encryption of the 5-round Feistel-KF structure or an ideally random cipher such that $\mathrm{Enc}_{k_1, \dots, k_5}$ is a random permutation for each key $(k_1, \dots, k_5)$. The goal of an adversary is to distinguish whether Enc is the 5-round Feistel-KF structure or an ideally random cipher.

Table 4. Attack complexity of qCPAs on balanced Feistel-KF structures.

Rounds    Key model                  Attack type                                Complexity       Ref.
3         (0, 0, 0)                  Distinguisher                              poly(n)          [16]
3         (0, 0, 0)                  Key-recovery                               poly(n)          [7]
4         (Δ, Δ, Δ, Δ)               Key-recovery                               poly(n)          [23]
4         (Δ, 0, 0, 0)               Key-recovery                               poly(n)          This section
5         (0, Δ, 0, 0, 0)            Distinguisher                              poly(n)          This section
5         (Δ, 0, Δ, 0, 0)            Key-recovery                               poly(n)          This section
r         (0, . . . , 0)             Key-recovery                               2^{(r−3)n/2}     [10, 13]
r         (Δ, . . . , Δ)             Key-recovery                               poly(n)          [23]
r         (Δ, 0, . . . , 0, 0, 0)    Key-recovery (only the first round key)    poly(n)          This section
r         (Δ, . . . , Δ, 0, 0, 0)    Key-recovery                               poly(n)          This section
r         (0, . . . , 0, Δ, 0, 0, 0) Key-recovery                               2^{(r−5)n/2}     This section
2ℓ + 3    ((0, Δ)^ℓ, 0, 0, 0)        Distinguisher                              poly(n)          This section
2ℓ + 2    ({0,}(Δ, 0)^ℓ, 0{, 0})     Distinguisher                              poly(n)          This section

Define $G : \mathbb{F}_2^n \to \mathbb{F}_2^n$ by $G(x) := O_L^{5,(0,\Delta,0,0,0)}(x, \alpha_0, x) \oplus O_L^{5,(0,\Delta,0,0,0)}(x, \alpha_1, x)$. If Enc is the 5-round Feistel-KF structure, we have

$$\begin{aligned} G(x) = {} & \alpha_0 \oplus \alpha_1 \oplus F^{(2)}(k_2 \oplus F^{(1)}(k_1 \oplus \alpha_0)) \oplus F^{(2)}(k_2 \oplus F^{(1)}(k_1 \oplus \alpha_1)) \\ & \oplus F^{(4)}\big(k_4 \oplus x \oplus F^{(1)}(k_1 \oplus \alpha_0) \oplus F^{(3)}(k_3 \oplus \alpha_0 \oplus F^{(2)}(k_2 \oplus F^{(1)}(k_1 \oplus \alpha_0)))\big) \\ & \oplus F^{(4)}\big(k_4 \oplus x \oplus F^{(1)}(k_1 \oplus \alpha_1) \oplus F^{(3)}(k_3 \oplus \alpha_1 \oplus F^{(2)}(k_2 \oplus F^{(1)}(k_1 \oplus \alpha_1)))\big). \end{aligned}$$

In particular, $G$ is a periodic function and

$$s := F^{(3)}(k_3 \oplus \alpha_0 \oplus F^{(2)}(k_2 \oplus F^{(1)}(k_1 \oplus \alpha_0))) \oplus F^{(3)}(k_3 \oplus \alpha_1 \oplus F^{(2)}(k_2 \oplus F^{(1)}(k_1 \oplus \alpha_1))) \oplus F^{(1)}(k_1 \oplus \alpha_0) \oplus F^{(1)}(k_1 \oplus \alpha_1)$$

is the period. On the other hand, when Enc is an ideally random cipher, $G$ is not periodic with overwhelming probability. Hence the 5-round Feistel-KF structure can be distinguished from an ideally random cipher in polynomial time by checking whether $G(x)$ is periodic, if the quantum related-key oracle of the pattern (0, Δ, 0, 0, 0) is given to adversaries.

Remark 2. The above attack can also be used to mount a 4-round polynomial-time qCPA related-key distinguisher for the pattern (Δ, 0, 0, 0).

Polynomial-time qCPA Distinguisher for the Pattern ((0, Δ)^ℓ, 0, 0, 0). The polynomial-time 5-round distinguisher for the related-key pattern (0, Δ, 0, 0, 0) can easily be extended to a polynomial-time $r$-round qCPA distinguisher for the pattern ((0, Δ)^ℓ, 0, 0, 0), where $r = 2\ell + 3$ for some $\ell \ge 2$. See Section C.1 of this paper's full version [6] for details.

Remark 3. The attack can also be used to mount a $(2\ell + 2)$-round polynomial-time qCPA related-key distinguisher for the pattern ({0,}(Δ, 0)^ℓ, 0{, 0}), which omits either the first or the last round (the optional rounds are enclosed in braces).
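The periodicity of $G$ for the pattern (0, Δ, 0, 0, 0) can be sanity-checked classically. The sketch below is illustrative only. It assumes the round convention $(a, b) \mapsto (b,\ a \oplus F^{(i)}(b \oplus k_i))$, which is inferred from the expression for $G(x)$ above rather than quoted from the paper's specification, and it uses hypothetical toy parameters (`N_BITS = 8`, the seed, and the constants `A0`, `A1`).

```python
import random

N_BITS = 8                                  # toy half-block size n (assumption)
SIZE = 1 << N_BITS
rng = random.Random(5)                      # fixed seed for reproducibility
F = [[rng.randrange(SIZE) for _ in range(SIZE)] for _ in range(5)]  # public F^(1)..F^(5)
k = [rng.randrange(SIZE) for _ in range(5)]                         # secret k1..k5

def oracle_left(left, right, delta):
    """Left half of the 5-round Feistel-KF output under keys (k1, k2 ^ delta, k3, k4, k5).

    Assumed round convention: (a, b) -> (b, a ^ F_i(b ^ k_i)).
    """
    round_keys = list(k)
    round_keys[1] ^= delta                  # related-key difference on the second round key
    a, b = left, right
    for i in range(5):
        a, b = b, a ^ F[i][b ^ round_keys[i]]
    return a

A0, A1 = 1, 2                               # distinct constants alpha_0, alpha_1

def G(x):
    return oracle_left(x, A0, x) ^ oracle_left(x, A1, x)

def inner(alpha):
    """F1(k1 ^ alpha) ^ F3(k3 ^ alpha ^ F2(k2 ^ F1(k1 ^ alpha)))."""
    u = F[0][k[0] ^ alpha]
    return u ^ F[2][k[2] ^ alpha ^ F[1][k[1] ^ u]]

s = inner(A0) ^ inner(A1)                   # claimed period of G
is_periodic = all(G(x ^ s) == G(x) for x in range(SIZE))
print("s =", s, "periodic:", is_periodic)
```

Querying with Δ = x cancels the x-dependence in the second round, so x first enters a round function at round 4. This is why shifting x by `s` merely swaps the two $F^{(4)}$ terms.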


Polynomial-time r-round qCPA Key-recovery Attack for Related-key Pattern (Δ, 0, . . . , 0). Here we show a polynomial-time related-key qCPA that recovers the key of the $r$-round Feistel-KF structure for the related-key pattern (Δ, 0, . . . , 0). Our attack recovers all the keys when $r = 4$, and only the first round key when $r > 4$. Let $\mathrm{Enc}^{(r)}_{k_1, \dots, k_r}$ denote the encryption function of the $r$-round Feistel-KF structure with the key $(k_1, \dots, k_r) \in \mathbb{F}_2^{rn}$. Assume that the quantum oracle of

$$O^{r,(\Delta,0,\dots,0)} : \mathbb{F}_2^n \times \mathbb{F}_2^n \times \mathbb{F}_2^n \to \mathbb{F}_2^n \times \mathbb{F}_2^n, \quad (L, R, \Delta) \mapsto \mathrm{Enc}^{(r)}_{k_1 \oplus \Delta, k_2, \dots, k_r}(L, R)$$

is available to adversaries. Define a function $G : \mathbb{F}_2^n \to \mathbb{F}_2^n \times \mathbb{F}_2^n$ by $G(x) := O^{r,(\Delta,0,\dots,0)}(F^{(1)}(x), 0, x)$. Then $G(x) = \mathrm{Enc}^{(r)}_{k_1 \oplus x, k_2, \dots, k_r}(F^{(1)}(x), 0) = \mathrm{Enc}^{(r-1)}_{k_2, \dots, k_r}(0, F^{(1)}(x) \oplus F^{(1)}(x \oplus k_1))$ holds. In particular, $G(x)$ is a periodic function with the period $k_1$. Thus we can recover the key $k_1$ in polynomial time by applying Simon's algorithm on $G(x)$. When $r = 4$, once we recover $k_1$, the other subkeys $k_2, k_3, k_4$ can be recovered easily by using the 3-round key-recovery attack in Sect. 3.3. Since the generic key-recovery attack in this related-key setting requires time $\tilde{O}(2^{3n/2})$ (under the assumption that Conjecture 1 holds), our attack is exponentially faster than the generic attack.

Remark 4. By iteratively applying the above attack, we can also recover all the keys of the $r$-round Feistel-KF structure in polynomial time when the related-key oracle of the pattern (Δ, . . . , Δ, 0, 0, 0) is available.

Polynomial-time 5-round qCPA Key-recovery Attack for the Pattern (Δ, 0, Δ, 0, 0). There also exists a polynomial-time related-key qCPA that recovers the key of the 5-round Feistel-KF structure for the related-key pattern (Δ, 0, Δ, 0, 0), by using nested Simon's algorithm. See Section C.2 of this paper's full version [6] for details.

Attack on More Rounds with Grover's Algorithm. The polynomial-time attacks introduced above can be used to mount key-recovery attacks on more rounds with the technique in Sect. 3.2, which converts key-recovery attacks and distinguishers into key-recovery attacks on more rounds by using the Grover search. For instance, assume that we are given the $r$-round related-key oracle of the pattern (0, . . . , 0, Δ, 0, 0, 0). Then, by guessing the first $(r - 5)$ round keys $k_1, \dots, k_{r-5}$ with the Grover search, and applying the polynomial-time 5-round distinguisher for the pattern (0, Δ, 0, 0, 0) on the remaining 5 rounds, we can recover $k_1, \dots, k_{r-5}$ in time $\tilde{O}(2^{(r-5)n/2})$. Once we recover $k_1, \dots, k_{r-5}$, we can recover $k_{r-4}$ in time $\tilde{O}(2^{n/2})$ by applying the Grover search to guess $k_{r-4}$ and the 4-round distinguisher for the pattern (Δ, 0, 0, 0) on the last 4 rounds (see Remark 2). The remaining keys $k_{r-3}, \dots, k_r$ can be recovered in polynomial time by using the 4-round key-recovery attack for the pattern (Δ, 0, 0, 0) on the last 4 rounds. Eventually, all the keys can be recovered in time $\tilde{O}(2^{(r-5)n/2})$.
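The three phases of this attack can be tallied in one line. The following is only a bookkeeping sketch of the costs quoted above; the symbol $T_{\text{total}}$ and the explicit condition $r \ge 6$ (under which the first term is at least as large as the second) are our reading and are not stated in the source.

```latex
\begin{align*}
T_{\text{total}}
  &= \underbrace{\tilde{O}\!\left(2^{(r-5)n/2}\right)}_{\text{Grover search for } k_1,\dots,k_{r-5}}
   + \underbrace{\tilde{O}\!\left(2^{n/2}\right)}_{\text{Grover search for } k_{r-4}}
   + \underbrace{\mathrm{poly}(n)}_{k_{r-3},\dots,k_r} \\
  &= \tilde{O}\!\left(2^{(r-5)n/2}\right)
   \qquad \text{for } r \ge 6, \text{ since then } (r-5)\,n/2 \ge n/2 .
\end{align*}
```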

5.2 Related-Key Attacks on the Contracting Feistel Structures

For contracting Feistel-KF structures, we use the same notation as in Sect. 5.1, such as (0, Δ, 0, 0, 0), to denote the related-key patterns under consideration. Since contracting Feistel structures are much more complex than balanced Feistel structures, we focus on the setting in which a difference may be inserted into only a single round key (that is, we do not consider patterns such as (Δ, 0, Δ, 0, 0)). To avoid awkwardly long notation for related-key patterns, we use the following abbreviation: we denote the pattern (0, Δ, 0, 0, 0) by $\Delta_{d,2/5}$, where "d" means that we are considering a d-branch contracting Feistel structure, "5" means that we are considering a 5-round construction, and "2" means that differences are inserted into the second round key. Then, we can show that:

1. The polynomial-time qCPA related-key distinguisher on the 5-round balanced Feistel structure for the related-key pattern (0, Δ, 0, 0, 0) can be extended to a polynomial-time qCPA related-key distinguisher on the d-branch (3d − 1)-round contracting Feistel structure for the pattern $\Delta_{d,d/(3d-1)}$.
2. The polynomial-time related-key qCPA that recovers the first round key $k_1$ of the r-round balanced Feistel structure for the related-key pattern (Δ, 0, . . . , 0) can be extended to a polynomial-time related-key qCPA that recovers the first round key $k_1$ of the d-branch (3d − 1)-round contracting Feistel structure for the related-key pattern $\Delta_{d,1/r}$.

Since it is straightforward to show the above extensions, we refrain from writing the full proofs here. See Section C.3 of this paper's full version [6] for details. As in Sect. 4, it is also assumed that the number of branches d is even.

Attack on More Rounds with Grover's Algorithm. The polynomial-time attacks introduced above can be used to mount key-recovery attacks on more rounds with the technique in Sect. 3.2, in the same way as the attacks on balanced Feistel structures in Sect. 5.1.

6 Future Work

Over and over again, we have seen polynomial-time quantum distinguishers breaking the classical PRP bound [9,16] (as well as the classical SPRP bound [14]) for Feistel structures. Our intuition is that the classical PRP or SPRP bounds represent the minimum number of rounds necessary for the last distinguishable branch to be masked by some pseudorandom value through a direct XOR. This is sufficient for the classical setting. However, in the quantum setting and with the right configuration and control over the input branches, we see that Simon's algorithm is often able to find this masking value in polynomial time. Therefore we feel inclined to conjecture that if r is the minimum number of rounds for a Feistel structure to be classical PRP (resp. SPRP) secure, then there exists an r-round polynomial-time qCPA (resp. qCCA) distinguisher for that Feistel structure. Having said that, it would be interesting to see whether we can find a (3d − 2)-round polynomial-time qCCA distinguisher for the d-branch contracting Feistel structure.


Acknowledgement. This work was initiated during the group sessions of the 9th Asian Workshop on Symmetric Key Cryptography (ASK 2019) held in Kobe, Japan. Yunwen Liu is supported by National Natural Science Foundation of China (No.61902414) and Natural Science Foundation of Hunan Province (No. 2020JJ5667).

References

1. Aoki, K., et al.: Camellia: A 128-bit block cipher suitable for multiple platforms – design and analysis. In: Stinson, D.R., Tavares, S. (eds.) SAC 2000. LNCS, vol. 2012, pp. 39–56. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-44983-3_4
2. Biryukov, A., Wagner, D.: Advanced slide attacks. In: Preneel, B. (ed.) EUROCRYPT 2000. LNCS, vol. 1807, pp. 589–606. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-45539-6_41
3. Bonnetain, X.: Collisions on Feistel-MiMC and univariate GMiMC. IACR Cryptol. ePrint Arch. 2019, 951 (2019)
4. Bonnetain, X., Hosoyamada, A., Naya-Plasencia, M., Sasaki, Yu., Schrottenloher, A.: Quantum attacks without superposition queries: the offline Simon's algorithm. In: Galbraith, S.D., Moriai, S. (eds.) ASIACRYPT 2019. LNCS, vol. 11921, pp. 552–583. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-34578-5_20
5. Bonnetain, X., Naya-Plasencia, M., Schrottenloher, A.: On quantum slide attacks. In: Paterson, K.G., Stebila, D. (eds.) SAC 2019. LNCS, vol. 11959, pp. 492–519. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-38471-5_20
6. Cid, C., Hosoyamada, A., Liu, Y., Sim, S.M.: Quantum cryptanalysis on contracting Feistel structures and observation on related-key settings. IACR Cryptol. ePrint Arch. 2020, 959 (2020)
7. Daiza, T., Kurosawa, K.: Quantum/classical key recovery attack on 3-round Feistel-KF structure (2020), unpublished manuscript
8. Diffie, W., Ledin, G.: SMS4 encryption algorithm for wireless networks. IACR Cryptology ePrint Archive 2008, vol. 329 (2008). http://eprint.iacr.org/2008/329
9. Dong, X., Li, Z., Wang, X.: Quantum cryptanalysis on some generalized Feistel schemes. Sci. China Inf. Sci. 62(2), 22501:1–22501:12 (2019)
10. Dong, X., Wang, X.: Quantum key-recovery attack on Feistel structures. Sci. China Inf. Sci. 61(10), 102501:1–102501:7 (2018)
11. Grover, L.K.: A fast quantum mechanical algorithm for database search. In: Proceedings of the Twenty-Eighth Annual ACM Symposium on the Theory of Computing, Philadelphia, Pennsylvania, USA, May 22–24, 1996, pp. 212–219 (1996)
12. Hodžić, S., Knudsen Ramkilde, L., Brasen Kidmose, A.: On quantum distinguishers for type-3 generalized Feistel network based on separability. In: Ding, J., Tillich, J.P. (eds.) PQCrypto 2020. LNCS, vol. 12100, pp. 461–480. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-44223-1_25
13. Hosoyamada, A., Sasaki, Yu.: Quantum Demiric-Selçuk meet-in-the-middle attacks: applications to 6-round generic Feistel constructions. In: Catalano, D., De Prisco, R. (eds.) SCN 2018. LNCS, vol. 11035, pp. 386–403. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-98113-0_21
14. Ito, G., Hosoyamada, A., Matsumoto, R., Sasaki, Yu., Iwata, T.: Quantum chosen-ciphertext attacks against Feistel ciphers. In: Matsui, M. (ed.) CT-RSA 2019. LNCS, vol. 11405, pp. 391–411. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-12612-4_20
15. Kaplan, M., Leurent, G., Leverrier, A., Naya-Plasencia, M.: Breaking symmetric cryptosystems using quantum period finding. In: Robshaw, M., Katz, J. (eds.) CRYPTO 2016. LNCS, vol. 9815, pp. 207–237. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-53008-5_8
16. Kuwakado, H., Morii, M.: Quantum distinguisher between the 3-round Feistel cipher and the random permutation. In: ISIT 2010, Proceedings, pp. 2682–2685. IEEE (2010)
17. Kuwakado, H., Morii, M.: Security on the quantum-type Even-Mansour cipher. In: ISITA 2012, pp. 312–316. IEEE (2012)
18. Leander, G., May, A.: Grover meets Simon – quantumly attacking the FX-construction. In: Takagi, T., Peyrin, T. (eds.) ASIACRYPT 2017. LNCS, vol. 10625, pp. 161–178. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-70697-9_6
19. Luby, M., Rackoff, C.: How to construct pseudo-random permutations from pseudorandom functions. In: Williams, H.C. (ed.) CRYPTO 1985. LNCS, vol. 218, pp. 447–447. Springer, Heidelberg (1986). https://doi.org/10.1007/3-540-39799-X_34
20. National Bureau of Standards: Data encryption standard. FIPS 46, January 1977
21. Ni, B., Ito, G., Dong, X., Iwata, T.: Quantum attacks against type-1 generalized Feistel ciphers and applications to CAST-256. In: Hao, F., Ruj, S., Sen Gupta, S. (eds.) INDOCRYPT 2019. LNCS, vol. 11898, pp. 433–455. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-35423-7_22
22. NIST: Announcing request for nominations for public-key post-quantum cryptographic algorithms. National Institute of Standards and Technology (2016)
23. Rötteler, M., Steinwandt, R.: A note on quantum related-key attacks. Inf. Process. Lett. 115(1), 40–44 (2015)
24. Simon, D.R.: On the power of quantum computation. In: 35th Annual Symposium on Foundations of Computer Science, Santa Fe, New Mexico, USA, 20–22 November 1994, pp. 116–123 (1994)
25. Zhang, L.T., Wu, W.L.: Pseudorandomness and super pseudorandomness on the unbalanced Feistel networks with contracting functions. China J. Comput. 32(7), 1320–1330 (2009). http://cjc.ict.ac.cn/eng/qwjse/view.asp?id=2909

Evaluation of Quantum Cryptanalysis on SPECK

Ravi Anand¹(✉), Arpita Maitra², and Sourav Mukhopadhyay¹

¹ Department of Mathematics, Indian Institute of Technology Kharagpur, Kharagpur 721302, West Bengal, India
² TCG Centres for Research and Education in Science and Technology, Kolkata 700091, West Bengal, India

Abstract. In this work, all versions of SPECK are evaluated against a quantum adversary using Grover's algorithm. We extensively study the resource requirements for quantum key search under the known-plaintext attack model and show that our estimates improve on the existing efforts. Further, for the first time, we explore differential cryptanalysis of SPECK in the quantum framework, which provides encouraging results. In both cases, the quantum resources are evaluated in terms of several parameters, i.e., the T-depth of the circuits and the number of qubits required for the attacks. Experiments are performed in the IBM-Q environment to support our claims.

Keywords: Differential cryptanalysis · Quantum reversible circuits · Grover's algorithm · Qiskit · IBM-Q

1 Introduction

In symmetric key cryptography, it is well known that an $n$-bit secret key can be recovered in time $O(2^{n/2})$ by exploiting Grover's search algorithm [8]. So, to guarantee the required security, a useful heuristic is to double the key length. However, to implement Grover's algorithm, one needs to design a quantum version of the cipher. In this direction, subsequent work has been done on AES and some other block ciphers [3,7,12,20,26]. Recently, many symmetric constructions have been evaluated in quantum settings. In this connection, we can mention the key-recovery attack against Even-Mansour constructions and distinguishers against 3-round Feistel constructions [17]. Key-recovery attacks against multiple encryptions [13] and forgery attacks against CBC-like MACs [14] have also been studied. Quantum meet-in-the-middle attacks on Feistel constructions [9] and key-recovery attacks against FX constructions [21] have been explored too. The present trend is to study the feasibility of mapping the existing classical attacks to quantum settings [10,14,15,22]. Very recently, Bonnetain et al. [5] proposed a novel methodology for quantum cryptanalysis of symmetric ciphers.

© Springer Nature Switzerland AG 2020. K. Bhargavan et al. (Eds.): INDOCRYPT 2020, LNCS 12578, pp. 395–413, 2020. https://doi.org/10.1007/978-3-030-65277-7_18

396

R. Anand et al.

They exploited the algebraic structure of certain classical cryptosystems to bypass the standard quantum superposition queries. On the other hand, lightweight ciphers like SIMON and SPECK have been left unexplored. Very recently, in [4], Grover’s search algorithm has been evaluated on SIMON. In the current initiative, we are concentrating on SPECK. In [11], K. Jang, S. Choi, H. Kwon and H. Seo analyzed Grover search on SPECK. They followed the same approach which has been followed in [4]. For addition modulo 2n , they exploited Cuccaro et al.’s ripple-carry addition circuit [6]. We follow the addition circuit presented in [23]. This is because of the fact that the circuit presented in [23] for adding two n-bit numbers exploits (2n − 2) Toffoli and (5n − 6) CNOT gates and requires no ancilla whereas the circuit described in [6] requires (2n − 2) Toffoli , (5n − 4) CNOT , and (2n − 4) NOT gates and one ancilla. Hence, while the number of Toffoli gates remain the same, the number of Clifford, i.e., (CNOT +NOT ), gates are relatively low in our circuit. Moreover, we need one less qubit compared to [11]. K Jang et al. designed the circuit and estimated the resources only for two variants of SPECK, SPECK32/64 and SPECK64/128 whereas we provide the detailed circuits and resource counts for all the variants of SPECK. In addition to this, we analyzed quantum differential cryptanalysis on SPECK and implemented it in IBM-Q interface [25]. To the best of our knowledge, this is the first initiative to implement a quantum differential attack on lightweight ciphers. The rest of the article has been organized as follows. A brief description of the cipher SPECK and some quantum gates used later is described in Sect. 2. The quantum circuit for the cipher is illustrated in Sect. 3. Section 5 provides a detailed analysis of the resources required for applying Grover’s key search on SPECK. Implementation of quantum differential cryptanalysis on the cipher is described in Sect. 6. 
Finally we conclude in Sect. 7. The code for all the test circuits implemented in QISKIT are provided in [27].

2 Preliminaries

In this section, we briefly describe the construction of SPECK and some quantum gates which will be used for designing the quantum reversible circuit for the cipher.

2.1 Brief Summary of SPECK

SPECK is a family of lightweight block ciphers with 10 variants (Table 1). The state update function consists of circular shifts, addition modulo $2^n$, and bitwise XOR operations. It is defined as

$$F(x, y) = \big((S^{-\alpha} x \boxplus y) \oplus k,\; S^{\beta} y \oplus (S^{-\alpha} x \boxplus y) \oplus k\big). \tag{1}$$

The structure of one round of SPECK encryption is depicted in Fig. 1, where $S^{j}$ and $S^{-j}$ denote a left and a right circular shift by $j$ bits, respectively, $\boxplus$ denotes addition modulo $2^n$, $L_i$ and $R_i$ are the n-bit words which constitute the state of SPECK at the i-th round, and $k_i$ is the round key, which is generated by the key scheduling algorithm described below.

Fig. 1. SPECK round function

The different variants of SPECK are denoted by SPECK2n/mn, where 2n denotes the block size of the variant and mn is the size of the secret key. Here n can take the values 16, 24, 32, 48 or 64, and m the values 2, 3 or 4. For each combination of (m, n), the corresponding number of rounds T is given in Table 1.

Table 1. SPECK parameters

Block size (2n)   Key size (k = mn)   Word size (n)   Key words (m)   rot α   rot β   Rounds (T)
32                64                  16              4               7       2       22
48                72, 96              24              3, 4            8       3       22, 23
64                96, 128             32              3, 4            8       3       26, 27
96                96, 144             48              2, 3            8       3       28, 29
128               128, 192, 256       64              2, 3, 4         8       3       32, 33, 34

The round keys of SPECK are generated using the state update function. Let $K = (l_{m-2}, \ldots, l_0, k_0)$ for $m \in \{2, 3, 4\}$. Then the sequences $k_i$ and $l_i$ are defined as

$$l_{i+m-1} = (k_i \boxplus S^{-\alpha} l_i) \oplus i, \qquad k_{i+1} = S^{\beta} k_i \oplus l_{i+m-1}. \tag{2}$$

2.2 Quantum Implementation of XOR, Rotation and Addition Modulo $2^n$

The Add-Rotate-XOR (ARX) ciphers make use of the following operations:

– bitwise XOR, $\oplus$;
– left and right circular shifts (rotations), $S^{j}$ and $S^{-j}$, respectively, by j bits; and
– addition modulo $2^n$, $\boxplus$.

The XOR of two numbers a and b, a ⊕ b, can be implemented with CNOT gates in a quantum circuit, i.e., {b = a ⊕ b} = CNOT{a, b}. A rotation can be implemented by swapping values across wires, where each SWAP requires 3 CNOT gates. However, it can also be done by simply keeping track of the necessary rewiring, which is free. So, in the estimates, the SWAP gates are treated as free, which makes the circuit more compact. We have manually checked that this does not make the circuit incompatible. For addition, we use the circuit described in [23], as it uses no ancillas. To implement $b = a \boxplus b$, where $a = a_0 a_1 \ldots a_{n-1}$ and $b = b_0 b_1 \ldots b_{n-1}$ are n-bit numbers, the circuit is constructed as described in Algorithm 1.

Algorithm 1. Implementing addition modulo n

procedure Addition modulo n        (returns b = b ⊞ a)
    for i ← 1, n − 1 do
        CNOT{a_i, b_i}
    end for
    for i ← n − 1, 1 do
        CNOT{a_i, a_{i+1}}
    end for
    for i ← 0, n − 1 do
        Toffoli{b_i, a_i, a_{i+1}}
    end for
    for i ← n − 1, 1 do
        CNOT{a_i, b_i}
        Toffoli{b_{i−1}, a_{i−1}, a_i}
    end for
    for i ← 1, n − 2 do
        CNOT{a_i, a_{i+1}}
    end for
    for i ← 0, n − 1 do
        CNOT{a_i, b_i}
    end for
end procedure

Here $a_n$ stores the carry. To implement addition modulo $2^n$, we do not need to store the carry, so the one CNOT gate and the one Toffoli gate acting on $a_n$ are removed. The numbers of CNOT and Toffoli gates are then 5n − 6 and 2n − 2, respectively. The circuit for addition modulo $2^n$ with n = 3 is implemented in IBM-Q (Fig. 2). Here, a = [1, 1, 0] and b = [0, 1, 0]. In Fig. 3, we present schematic diagrams for the different operations which will be used later.
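The gate sequence of Algorithm 1, with the two gates on $a_n$ dropped, can be checked classically, since CNOT and Toffoli gates act as XOR and AND-XOR on basis states. The following is a minimal sketch under the assumption that index 0 is the least significant bit; the function name `add_mod_2n` and the helper `to_bits` are illustrative and are not taken from the authors' code.

```python
def to_bits(value, n):
    """Little-endian bit list of an n-bit integer (index 0 = LSB, an assumption)."""
    return [(value >> i) & 1 for i in range(n)]


def add_mod_2n(a, b):
    """Simulate Algorithm 1 without the carry qubit a_n on classical bits.

    Returns (a, b, counts): a is restored, b holds (a + b) mod 2^n, and
    counts records how many CNOT and Toffoli gates were applied.
    """
    a, b = list(a), list(b)
    n = len(a)
    counts = {"CNOT": 0, "Toffoli": 0}

    def cnot(ctrl, i_ctrl, tgt, i_tgt):
        tgt[i_tgt] ^= ctrl[i_ctrl]
        counts["CNOT"] += 1

    def toffoli(i):
        # Controls b_i and a_i, target a_{i+1}.
        a[i + 1] ^= a[i] & b[i]
        counts["Toffoli"] += 1

    for i in range(1, n):              # b_i ^= a_i
        cnot(a, i, b, i)
    for i in range(n - 2, 0, -1):      # a_{i+1} ^= a_i (gate on a_n dropped)
        cnot(a, i, a, i + 1)
    for i in range(0, n - 1):          # compute carries (gate on a_n dropped)
        toffoli(i)
    for i in range(n - 1, 0, -1):      # write sum bits, uncompute carries
        cnot(a, i, b, i)
        toffoli(i - 1)
    for i in range(1, n - 1):          # undo the second loop
        cnot(a, i, a, i + 1)
    for i in range(0, n):              # final b_i ^= a_i
        cnot(a, i, b, i)
    return a, b, counts
```

For every pair of 3-bit inputs the sketch returns the sum modulo 8 in `b`, leaves `a` unchanged, and reports 9 CNOT and 4 Toffoli gates, in line with the counts 5n − 6 and 2n − 2.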

Fig. 2. (a) Circuit for implementing $b = a \boxplus b$. (b) Result obtained after measurement when the inputs were a = [1, 1, 0], b = [0, 1, 0]

Fig. 3. Subroutines which will be used later. ADD implements $b = b \boxplus a$, $RR_\alpha$ rotates a right by α bits, $RL_\alpha$ rotates a left by α bits, and $RX_\beta$ rotates b left by β bits followed by XOR with a, i.e. $b_i = b_{(i+\beta)\,\%\,n} \oplus a$.

3 A Quantum Circuit for SPECK

In this section we develop a reversible quantum circuit for SPECK and analyze our circuits in terms of the number of qubits, NOT gates, CNOT gates, and Toffoli gates. We first construct a circuit for the key expansion, then the circuit for the round update, and use them to construct the circuit for the full cipher. We also provide the implementation of a SPECK-like toy cipher in IBM-Q.

3.1 Circuit for Key Expansion

The key expansion routine is defined in Eq. 2. We can implement an in-place construction of the key expansion. The number of NOT gates depends on the values of the round constants. Here we show the construction for all three cases m = 2, 3, and 4. For m = 2 we have two key words $k_0$, $l_0$, where $k_0$ is used for the first round of encryption. $k_0$ and $l_0$ are states of size n, so we need 2n qubits to store them and one ancilla to store the carry after every round. The second round key $k_1$ can be computed on the same qubits which store $k_0$. It is computed from $(k_0, l_0)$ as shown in Fig. 4. For m = 3 we have three key words $k_0$, $l_0$, $l_1$, each of size n. For m = 4 we have four key words $k_0$, $l_0$, $l_1$, $l_2$, each of size n. The round keys for further rounds can be computed as explained above for m = 2, and the details are shown in Figs. 5 and 6.

Fig. 4. Circuit for key expansion with 2 keywords. $rc_i \in \{0, 1, 2, \ldots, T − 2\}$ represents the round-dependent constants

Fig. 5. Circuit for key expansion with 3 keywords. $rc_i \in \{0, 1, 2, \ldots, T − 2\}$ represents the round-dependent constants

Fig. 6. Circuit for key expansion with 4 keywords. $rc_i \in \{0, 1, 2, \ldots, T − 2\}$ represents the round-dependent constants

Fig. 7. Key expansion for m = 2.

It can easily be calculated that one round of key expansion requires 6n − 6 CNOT gates, 2n − 2 Toffoli gates, and NOT gates for the round constant i (one per set bit of i, which is consistent with the counts in Table 2). Consider the circuit in Fig. 7. It represents a reduced version of the key expansion of SPECK with 2 keywords (refer to Eq. 2) in the IBM-Q interface. The values of α and β are assumed to be 1. The measurement of $k_0$ gives the value of the next round key.

3.2 Circuit for Round Update Function

The round function F is defined as

$$F(x, y) = \big((S^{-\alpha} x + y) \oplus k,\; S^{\beta} y \oplus (S^{-\alpha} x + y) \oplus k\big), \tag{3}$$

with + denoting addition modulo $2^n$, as in Eq. 1.

Fig. 8. Circuit for one round state update

Fig. 9. Circuit for a round of reduced version of SPECK

The circuit for the state update is shown in Fig. 8, where $K_j$, $L_j$, $R_j$ represent quantum states of size n for round j. So, we need 2n qubits to store the values of $L_0$, $R_0$. Here, KE is the key expansion routine described in Sect. 3.1. It can easily be calculated that one round of state update requires 7n − 6 CNOT gates and 2n − 2 Toffoli gates. Now, consider the circuit in Fig. 9. This is the QISKIT implementation of the circuit for one round of a SPECK-like toy cipher. The assumed state size is 6. The state is split into L and R, each of size 3. The round update is defined as in Eq. 1, with the values of α and β assumed to be 1. The circuit describes one round of the cipher, i.e. if we measure the L and R states, we get the values of the state after one round.

3.3 Circuit for Full SPECK

We implement a reversible circuit for SPECK, as reversibility is necessary for the cipher to be useful as a subroutine in Grover's search. Using the circuits developed for the round function and the key expansion, we can now construct the circuit for full-round SPECK. The input to the circuit is the key K and the plaintext, which is split into two halves $L_0$, $R_0$. The output of the circuit is the ciphertext $L_{T-1}$, $R_{T-1}$, where T is the number of rounds. The sizes of K, L, R are mn, n, n respectively, where m is 2, 3 or 4. In Fig. 10, we draw the circuit for m = 2. Similar constructions can be made for the variants with m = 3 and m = 4. U is the round update function and KE is the key expansion routine, described in Sects. 3.2 and 3.1 respectively.

Fig. 10. The circuit for implementing SPECK with two key words

Figure 11 gives an implementation of the toy cipher with two keywords in IBM-Q. We consider the key size and state size to be 6 and the number of rounds to be 2. The values of α and β are assumed to be 1. Let the plaintext be $L_0$ = [0, 1, 1], $R_0$ = [1, 1, 1] and the key words be $k_0$ = [1, 1, 0], $l_0$ = [0, 1, 0]. Then after two rounds the ciphertext will be $L_2$ = [1, 0, 1], $R_2$ = [1, 0, 1]. In Fig. 12 the output is [1, 0, 1, 1, 0, 1] and it should be read as [$R_2$(2), $R_2$(1), $R_2$(0), $L_2$(2), $L_2$(1), $L_2$(0)]. This difference in convention arises because $L_2$(0) is measured first, so its value is closest to the figure, and $R_2$(2) is measured last, so its value is farthest from the figure. The same convention is followed in all the histogram outputs obtained in the IBM-Q interface.

Fig. 11. Circuit for 2 rounds reduced SPECK with two key words

Fig. 12. Measurement of circuit in Fig. 11

Implementing two rounds of SPECK requires two round updates and one key expansion. So T rounds of SPECK require T round updates and T − 1 key expansions. T round updates require T(7n − 6) CNOT gates and T(2n − 2) Toffoli gates. T − 1 key expansions require (T − 1)(6n − 6) CNOT gates and (T − 1)(2n − 2) Toffoli gates. The number of NOT gates depends on the number of rounds. We would also require NOT gates to initialize the plaintexts and the keys, but we omit those here, as their number depends on the plaintext and they add only a depth of 1 to the full circuit. We also require 2n + mn qubits to represent the plaintext and the key.
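As a classical cross-check of this layout (T round updates interleaved with T − 1 in-place key expansions), the following sketch encrypts with Eqs. 1 and 2 and counts both kinds of step. It is an illustration only, not the authors' Qiskit code: the function names are hypothetical, the parameters come from Table 1, and the test vector is the SPECK32/64 vector published by the SPECK designers.

```python
# (word size n, key words m) -> (alpha, beta, rounds T), as listed in Table 1.
PARAMS = {
    (16, 4): (7, 2, 22),
    (24, 3): (8, 3, 22), (24, 4): (8, 3, 23),
    (32, 3): (8, 3, 26), (32, 4): (8, 3, 27),
    (48, 2): (8, 3, 28), (48, 3): (8, 3, 29),
    (64, 2): (8, 3, 32), (64, 3): (8, 3, 33), (64, 4): (8, 3, 34),
}


def rol(x, r, n):
    """Left circular shift of an n-bit word by r bits."""
    mask = (1 << n) - 1
    return ((x << r) | (x >> (n - r))) & mask


def ror(x, r, n):
    """Right circular shift of an n-bit word by r bits."""
    return rol(x, n - r, n)


def round_update(x, y, k, n, alpha, beta):
    """Eq. (1): x <- (ror(x, alpha) + y) xor k, then y <- rol(y, beta) xor x."""
    mask = (1 << n) - 1
    x = ((ror(x, alpha, n) + y) & mask) ^ k
    y = rol(y, beta, n) ^ x
    return x, y


def speck_encrypt(plaintext, key_words, n):
    """Encrypt (x, y) under key_words = (l_{m-2}, ..., l_0, k_0).

    Returns the ciphertext and the number of round updates and key
    expansions that were performed.
    """
    alpha, beta, rounds = PARAMS[(n, len(key_words))]
    k = key_words[-1]
    l = list(reversed(key_words[:-1]))   # l[0] = l_0, l[1] = l_1, ...
    x, y = plaintext
    updates = expansions = 0
    for i in range(rounds):
        x, y = round_update(x, y, k, n, alpha, beta)
        updates += 1
        if i < rounds - 1:
            # Eq. (2) is the round map applied to (l_i, k_i) with constant i.
            new_l, k = round_update(l[i], k, i, n, alpha, beta)
            l.append(new_l)
            expansions += 1
    return (x, y), updates, expansions
```

Calling `speck_encrypt((0x6574, 0x694C), (0x1918, 0x1110, 0x0908, 0x0100), 16)` exercises SPECK32/64 and performs 22 round updates and 21 key expansions, matching the T and T − 1 counts used in the gate estimates.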

4 Grover's Oracle

In this section, we discuss the implementation of Grover's search on a block cipher under a known-plaintext attack. Let r plaintext-ciphertext pairs be sufficient to extract a unique solution. We then have to design an oracle that encrypts the given r plaintexts under the same key and computes a Boolean value which determines whether all the resulting ciphertexts are equal to the given classical ciphertexts. This can be done by running the block cipher circuit r times in parallel. The resulting ciphertexts are then compared with the classical ciphertexts, and the target qubit is flipped if they match. This is called the Grover oracle. In Fig. 13, the construction of such an oracle is given for two plaintext-ciphertext pairs for the SPECK block cipher.

Fig. 13. Grover's oracle using two plaintext-ciphertext pairs. Here, ENC represents the encryption function and DEC the decryption function. The (=) operator compares the output of ENC with the given ciphertexts and flips the target qubit if they are equal.

We implement the above idea, i.e., the Grover oracle for our toy version of SPECK (Fig. 11), in the IBM-Q simulator. The functions Grover oracle and Grover Diffusion used in our code are taken from [16] by importing the file Our Qiskit Functions. In this case, the plaintext M is [0, 1, 1, 1, 1, 1] and the key K is [1, 1, 0, 0, 1, 0]. After two rounds the ciphertext C will be [1, 0, 1, 1, 0, 1]. In theory, this key, i.e., K = [1, 1, 0, 0, 1, 0], will be obtained as the output of Grover's search. Fig. 14a shows the outcome of Grover's algorithm applied to this circuit. We then run Grover's algorithm for another plaintext-ciphertext pair, where M = [0, 1, 1, 1, 1, 0] and C = [0, 1, 1, 1, 0, 1], under the same key. The output is K = [1, 1, 0, 0, 1, 0], as shown in Fig. 14b.

Fig. 14. Histogram obtained after running Grover's on the reduced SPECK described in Fig. 11.
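The key search can also be followed without a quantum backend by tracking the 64 real amplitudes of the key register directly. The sketch below is an illustration, not the authors' Qiskit circuit: it assumes a toy cipher with 3-bit words, α = β = 1 and two key words, packs the key as an integer with $k_0$ in the upper three bits, and replaces the reversible oracle by a classical predicate. Because the bit ordering of the original experiment is not specified here, its values are not expected to match Fig. 14.

```python
import math

N_BITS = 3                       # toy word size
MASK = (1 << N_BITS) - 1


def rol(v):
    """Left rotation of a 3-bit word by one position."""
    return ((v << 1) | (v >> (N_BITS - 1))) & MASK


def ror(v):
    """Right rotation of a 3-bit word by one position."""
    return ((v >> 1) | (v << (N_BITS - 1))) & MASK


def toy_encrypt(x, y, k, l, rounds=2):
    """SPECK-like toy cipher with in-place key expansion (Eqs. 1 and 2)."""
    for i in range(rounds):
        x = ((ror(x) + y) & MASK) ^ k
        y = rol(y) ^ x
        l = ((k + ror(l)) & MASK) ^ i
        k = rol(k) ^ l
    return x, y


def grover_key_search(pairs):
    """Simulate Grover's search over all 6-bit keys.

    pairs is a list of ((x, y), (cx, cy)) known plaintext-ciphertext pairs
    produced under one key, so at least one key is always marked.
    """
    size = 1 << (2 * N_BITS)

    def consistent(key):
        k, l = key >> N_BITS, key & MASK
        return all(toy_encrypt(x, y, k, l) == c for (x, y), c in pairs)

    marked = [key for key in range(size) if consistent(key)]
    iterations = math.floor(math.pi / 4 * math.sqrt(size / len(marked)))
    amp = [1 / math.sqrt(size)] * size          # uniform superposition
    for _ in range(iterations):
        for key in marked:                      # oracle: phase flip
            amp[key] = -amp[key]
        mean = sum(amp) / size                  # diffusion: invert about mean
        amp = [2 * mean - a for a in amp]
    return marked, iterations, [a * a for a in amp]
```

For a pair generated under a chosen key, `grover_key_search` concentrates almost all probability on that key after six iterations. In this two-round toy cipher a single known pair already determines the key uniquely, which is why one pair suffices in the example.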

5 Resource Estimation

5.1 Cost of Implementing SPECK

We estimate the cost of implementing SPECK as a circuit, including the key expansion as well as the round function. As discussed above, the round constants are applied in the key expansion using an adequate number of NOT gates. We have not included the NOT gates required to initialize the plaintext in our estimates, as their number depends on the given plaintext. Table 2 gives the cost estimates for implementing all SPECK variants.

Table 2. Cost of implementing SPECK variants

SPECK2n/mn      # NOT   # CNOT   # Toffoli   # qubits   depth
SPECK32/64      42      4222     1290        96         1694
SPECK48/72      42      6462     1978        120        2574
SPECK48/96      45      6762     2070        144        2691
SPECK64/96      54      10318    3162        160        4082
SPECK64/128     57      10722    3286        192        4239
SPECK96/96      60      16854    5170        192        6636
SPECK96/144     64      17466    5358        240        6873
SPECK128/128    75      25862    7938        256        10144
SPECK128/192    80      26682    8190        320        10461
SPECK128/256    81      27502    8442        384        10778
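The NOT, CNOT, Toffoli and qubit columns of Table 2 follow from the per-round counts of Sect. 3 and can be recomputed in a few lines. The sketch below makes one assumption that is not spelled out in the text: the NOT count is taken as the total Hamming weight of the round constants 0, ..., T − 2, which reproduces the tabulated values. The depth column is not derived here, and the function name is illustrative.

```python
# name: (word size n, key words m, rounds T), from Table 1.
VARIANTS = {
    "SPECK32/64": (16, 4, 22),
    "SPECK48/72": (24, 3, 22),
    "SPECK48/96": (24, 4, 23),
    "SPECK64/96": (32, 3, 26),
    "SPECK64/128": (32, 4, 27),
    "SPECK96/96": (48, 2, 28),
    "SPECK96/144": (48, 3, 29),
    "SPECK128/128": (64, 2, 32),
    "SPECK128/192": (64, 3, 33),
    "SPECK128/256": (64, 4, 34),
}


def speck_cost(n, m, rounds):
    """Gate and qubit counts for T round updates and T - 1 key expansions."""
    return {
        # One NOT per set bit of each round constant 0 .. T-2 (assumption).
        "NOT": sum(bin(i).count("1") for i in range(rounds - 1)),
        # Round update: 7n - 6 CNOT; key expansion: 6n - 6 CNOT.
        "CNOT": rounds * (7 * n - 6) + (rounds - 1) * (6 * n - 6),
        # Each adder contributes 2n - 2 Toffoli gates.
        "Toffoli": (2 * rounds - 1) * (2 * n - 2),
        # 2n qubits for the plaintext and mn qubits for the key.
        "qubits": 2 * n + m * n,
    }


if __name__ == "__main__":
    for name, (n, m, rounds) in VARIANTS.items():
        print(name, speck_cost(n, m, rounds))
```

Running the script prints one row per variant, which can be compared with Table 2 line by line.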

5.2 Cost of Grover Oracle

Following [12], we assume that $r = \lceil k/(2n) \rceil$ known plaintext-ciphertext pairs are sufficient to give a unique solution, where 2n is the block size and k = mn is the key size of the cipher. Note that if k/(2n) is an integer, the next integer is taken. Grover's oracle compares the 2n-bit outputs of the r SPECK instances with the r given ciphertexts. This can be done using a (2n·r)-fold controlled NOT gate (we neglect some NOT gates which depend on the given ciphertexts). Following [19], we estimate the number of T gates required to implement a t-fold controlled NOT gate as (32·t − 84). We use the decomposition of a Toffoli gate into 7 T gates plus 8 Clifford gates, with a T-depth of 4 and a total depth of 8, as in [2]. To estimate the full depth and the T-depth, we only consider the depths of the SPECK instances and ignore the multi-controlled NOT gate used in comparing the ciphertexts. We also need (2·(r − 1)·k) CNOT gates to make the input key available to all the SPECK instances in the oracle. The total number of Clifford gates is the sum of the Clifford gates used in the implementation of SPECK and the (2·(r − 1)·k) CNOT gates needed for the input key. The cost estimates for all SPECK variants are presented in Table 3.

Table 3. Cost of Grover oracle for SPECK

SPECK2n/k       r   # Clifford gates   # T gates   T-depth   Full depth   # qubits
SPECK32/64      3   2^14.85            2^14.28     2^13.33   2^13.52      161
SPECK48/72      2   2^15.45            2^14.76     2^13.94   2^14.16      169
SPECK48/96      3   2^15.53            2^14.96     2^14.01   2^14.22      241
SPECK64/96      2   2^16.13            2^15.55     2^14.62   2^14.83      224
SPECK64/128     3   2^16.18            2^15.60     2^14.68   2^14.89      321
SPECK96/96      2   2^16.83            2^16.25     2^15.33   2^15.53      289
SPECK96/144     2   2^16.85            2^16.30     2^15.38   2^15.58      337
SPECK128/128    2   2^17.45            2^16.86     2^15.95   2^16.15      385
SPECK128/192    2   2^17.49            2^16.90     2^16      2^16.19      449
SPECK128/256    3   2^17.52            2^16.94     2^16.04   2^16.23      641
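The oracle parameters stated in the text before Table 3 can be written out as a small helper. This is a sketch of the stated formulas only: it derives r, the number of controls of the comparison gate, its T-gate estimate 32·t − 84, and the CNOT gates used to copy the key, and it does not attempt to reproduce the Clifford, T-gate or depth columns. The function name and the dictionary keys are illustrative.

```python
def oracle_parameters(block_size, key_size):
    """Parameters of the Grover oracle for a cipher with the given sizes in bits."""
    # Rounding k/(2n) up, and taking the next integer when k/(2n) is already
    # an integer, is the same as floor(k/(2n)) + 1 in both cases.
    r = key_size // block_size + 1
    controls = block_size * r                # t = 2n * r controls
    return {
        "r": r,
        "controls": controls,
        "comparison_T_gates": 32 * controls - 84,
        "key_copy_CNOTs": 2 * (r - 1) * key_size,
        "T_gates_per_Toffoli": 7,
    }


if __name__ == "__main__":
    for block, key in [(32, 64), (48, 72), (96, 96), (128, 256)]:
        print(f"SPECK{block}/{key}", oracle_parameters(block, key))
```

The values of r produced this way agree with the r column of Table 3 for all ten variants.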

5.3 Cost of Exhaustive Key Search

Using the estimates in Table 3, we provide the cost estimates for the full exhaustive key search on all variants of SPECK in Table 4. We consider $\lfloor \frac{\pi}{4} 2^{k/2} \rfloor$ iterations of Grover's operator. As in [20], we do not consider the depth of implementing the two multi-controlled NOT gates while calculating the T-depth and overall depth.

Table 4. Cost estimates of Grover's algorithm with $\lfloor \frac{\pi}{4} 2^{k/2} \rfloor$ oracle iterations for SPECK

SPECK2n/k       # Clifford gates   # T gates   T-depth    Full depth   # qubits
SPECK32/64      2^46.85            2^46.24     2^45.33    2^45.52      161
SPECK48/72      2^51.45            2^50.76     2^49.94    2^50.16      169
SPECK48/96      2^63.53            2^62.96     2^62.01    2^62.22      241
SPECK64/96      2^64.13            2^63.55     2^62.62    2^62.83      224
SPECK64/128     2^80.18            2^79.60     2^78.68    2^78.89      321
SPECK96/96      2^64.83            2^64.25     2^63.33    2^63.53      289
SPECK96/144     2^88.85            2^88.30     2^87.38    2^87.58      337
SPECK128/128    2^81.45            2^80.86     2^79.95    2^80.15      385
SPECK128/192    2^113.49           2^112.90    2^112      2^112.19     449
SPECK128/256    2^145.72           2^144.94    2^144.04   2^144.23     641

6 Quantum Differential Attack on SPECK

In this section, we study the quantum resources required to implement the differential attack on SPECK. The complexity of differential cryptanalysis for quantum adversaries has been studied in [13,18,24]. Here we present circuits for two main types of differential attack, the differential distinguisher and the last round attack, and then analyze the resources required to implement them on a quantum computer for SPECK. A full implementation of the attacks on SPECK is not possible because of the limited number of qubits available in state-of-the-art quantum processors and simulators. So, we construct the circuits for a reduced cipher and show that such attacks can be implemented. We then compute the resources required for mounting the attack on all variants of SPECK.

6.1 Differential Cryptanalysis

Differential attacks exploit the fact that there exists a pair $(\alpha', \beta')$ for a cipher E such that

$$p = -\log_2 \Pr\big(E(P) \oplus E(P' = P \oplus \alpha') = \beta'\big) < N, \tag{4}$$

where N = 2n. The pair $(\alpha', \beta')$ is known as the differential. Differential cryptanalysis is a chosen-plaintext attack. The attacker selects pairs of inputs $(P, P')$ such that $P \oplus P' = \alpha'$, knowing that their corresponding ciphertexts $(C, C')$ will satisfy $C \oplus C' = \beta'$ with high probability, and then attempts to recover the key faster than exhaustive search.

6.2 Differential Distinguisher

In this attack model, we try to find a pair of plaintexts $(P, P')$, $P' = P \oplus \alpha'$, which satisfies $E(P) \oplus E(P') = \beta'$. If the cipher E is ideally random, obtaining such a pair would require $2^N$ trials. On the other hand, if E satisfies Eq. 4, the attacker collects $2^p$ plaintext pairs such that $P \oplus P' = \alpha'$ and, with very high probability, finds at least one pair which satisfies $E(P) \oplus E(P') = \beta'$. In the quantum setting, the adversary applies Grover's search over the set of all possible messages of size N. The algorithm attempts to find a message x such that $E(x) \oplus E(x \oplus \alpha') = \beta'$. Since the fraction of messages which satisfy the given differential is $2^{-p}$, it is sufficient to make $2^{p/2}$ Grover iterations to obtain a correct message. Hence, the time complexity of this attack is $\lfloor \frac{\pi}{4} 2^{p/2} \rfloor$.

6.2.1 Grover's Oracle

Here we present the circuit for the Grover oracle which finds a plaintext pair satisfying the distinguisher.

Fig. 15. Grover's oracle to find the distinguisher. Here, ENC represents encryption and DEC decryption. The (= β′) operator compares the value of P′ with β′ and flips the target qubit if they are equal.

We implement Grover's oracle for our toy cipher to obtain a plaintext pair which satisfies the distinguisher

$$\alpha' = [1, 0, 0, 1, 1, 1] \xrightarrow{\;4\ \text{rounds}\;} \beta' = [1, 0, 1, 1, 1, 1]$$

with p ≈ 2. The round update function of the toy cipher is as in Eq. 1 and the key schedule is as in Eq. 2, with the values of α and β being 1. We constructed a circuit for the four-round cipher and then applied Grover's algorithm. The output of Grover's algorithm is [0, 0, 1, 1, 1, 1], as shown in Fig. 16. Let the output be P, i.e. P = [0, 0, 1, 1, 1, 1], and let $P' = P \oplus \alpha'$ = [1, 0, 1, 0, 0, 0]. Then the corresponding ciphertexts after 4 rounds are C = [0, 0, 1, 0, 0, 1] and C′ = [1, 0, 0, 1, 1, 0]. We can easily verify that $C \oplus C' = \beta'$ = [1, 0, 1, 1, 1, 1].

Fig. 16. The plaintext which satisfies the given distinguisher.
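For a cipher as small as the toy cipher, the quantity p of Eq. 4 and the resulting number of Grover iterations can be obtained by exhaustive enumeration. The sketch below assumes the same kind of toy cipher (3-bit words, α = β = 1, two key words, 4 rounds) with blocks and keys packed as 6-bit integers. The packing, the key value and the function names are assumptions of this illustration, so the output differences it finds need not coincide with the differential quoted above.

```python
import math

N = 3                            # toy word size
MASK = (1 << N) - 1
SPACE = 1 << (2 * N)             # number of 6-bit blocks


def rol(v):
    """Left rotation of a 3-bit word by one position."""
    return ((v << 1) | (v >> (N - 1))) & MASK


def ror(v):
    """Right rotation of a 3-bit word by one position."""
    return ((v >> 1) | (v << (N - 1))) & MASK


def toy_encrypt(block, key, rounds=4):
    """Toy cipher on 6-bit blocks; x and k are the upper 3 bits."""
    x, y = block >> N, block & MASK
    k, l = key >> N, key & MASK
    for i in range(rounds):
        x = ((ror(x) + y) & MASK) ^ k
        y = rol(y) ^ x
        l = ((k + ror(l)) & MASK) ^ i
        k = rol(k) ^ l
    return (x << N) | y


def differential_row(alpha, key, rounds=4):
    """counts[beta] = number of blocks P with E(P) xor E(P xor alpha) = beta."""
    counts = [0] * SPACE
    for block in range(SPACE):
        diff = toy_encrypt(block, key, rounds) ^ toy_encrypt(block ^ alpha, key, rounds)
        counts[diff] += 1
    return counts


def grover_iterations(count):
    """floor(pi/4 * 2^(p/2)) with 2^(-p) = count / SPACE."""
    return math.floor(math.pi / 4 * math.sqrt(SPACE / count))


if __name__ == "__main__":
    row = differential_row(alpha=0b100111, key=0b110010)
    best = max(range(SPACE), key=lambda beta: row[beta])
    p = -math.log2(row[best] / SPACE)
    print(f"best beta = {best:06b}, p = {p:.2f}, iterations = {grover_iterations(row[best])}")
```

Every count in the row is even for a non-zero input difference, because P and P ⊕ α′ always satisfy the condition together. This is the same symmetry that lets the search return either member of the pair.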

6.3 Last Round Attack

In this subsection, we analyze the last round attack on SPECK. For SPECK it is sufficient to recover any m consecutive round keys to retrieve the secret key. We discuss the classical attack followed by the quantum attack.

6.3.1 Classical Adversary

Consider that we have an r-round differential $\alpha' \to \beta'$ with probability $2^{-p}$,

$$\alpha' = (\Delta x_0, \Delta y_0) \to (\Delta x_1, \Delta y_1) \to \cdots \to (\Delta x_r, \Delta y_r) = \beta',$$

where r is the number of rounds. Let d be a constant that depends on the signal-to-noise ratio of the trail used (in [1], d = 16). Then the process of extracting the secret key can be divided into the following three phases:

1. Collecting and Filtering Phase: Choose $d \cdot 2^p$ plaintext pairs $(P, P')$ such that $P \oplus P' = \alpha'$ and request their corresponding ciphertexts after (r + 1) rounds. Let the ciphertexts be $(C, C')$, with $C = (c_x, c_y)$ and $C' = (c'_x, c'_y)$. Then partially decrypt $c_y$ and $c'_y$ by one round and check whether they form the desired difference $\Delta y_r$. This reduces the number of ciphertext pairs that actually need to be checked in the next phase. Other filters may also be applied, for example, in the case of SPECK32/64 described in [1], $c_x[0, 1, 2, 3] \oplus c'_x[0, 1, 2, 3] = [0, 0, 0, 1]$. We store only those ciphertext pairs which pass this filter. If an e-bit filter is applied at this phase, then only $d \cdot 2^{(p-e)}$ pairs are expected to pass it.
2. Sub-key Guessing Phase: In this phase we examine the pairs that survived the filtering phase to derive the last round key. This is done by keeping a counter for each of the $2^n$ possible round keys and decrypting the last round with each possible key. If a particular key leads to the expected difference at round r, the counter for that key is increased. The sub-key with the highest counter is stored as a potential candidate.
3. Brute Force: In this phase, the remaining (m − 1) round keys required to retrieve the complete secret key are recovered. For this, we proceed as in phase 2 and partially decrypt the correct pairs round by round to recover the corresponding round keys.

6.3.2 Classical Complexity

Phase 1 has a computational complexity of $(2d \cdot 2^p) + 2d \cdot 2^p \cdot \frac{1}{(r+1)}$ SPECK encryptions. Since we expect $d \cdot 2^{(p-e)}$ pairs to pass the filter of Phase 1, the second phase has a complexity of $d \cdot 2^{(p-e)} \cdot 2^n \cdot \frac{1}{(r+1)}$. In the last phase we proceed as in phase 2, so we expect roughly the same computational effort for each of the (m − 1) remaining round keys. So, in total we obtain

$$(2d \cdot 2^p) + 2d \cdot 2^p \cdot \frac{1}{(r+1)} + m\Big(d \cdot 2^{(p-e)} \cdot 2^n \cdot \frac{1}{(r+1)}\Big) \tag{5}$$

SPECK encryptions as the computational complexity.

6.3.3 Quantum Adversary

All the above phases can be described as procedures that search for an element satisfying given conditions. So the quantum attack can be described as a Grover search over each of these phases.

1. Apply Grover's algorithm over the set of all pairs of plaintexts $(P, P')$ such that $P \oplus P' = \alpha'$, and search for a pair of plaintexts whose corresponding ciphertexts $(C, C')$ satisfy the given condition. This step outputs a message, say M.
2. Assuming that the pair $(M, M \oplus \alpha')$ is a good pair, apply Grover's algorithm again over all $2^n$ sub-keys to obtain the sub-key which decrypts the ciphertexts corresponding to $(M, M \oplus \alpha')$ to the required difference.
3. Recover the remaining (m − 1) round keys by proceeding as in step 2, thereby completely retrieving the secret key.

6.3.4 Quantum Complexity

In the first step, Grover's algorithm finds M after $2^{p/2}$ iterations. Each iteration requires encrypting the plaintexts P and $P' = P \oplus \alpha'$ and then partially decrypting them by one round. So, the complexity of the first step is $\frac{\pi}{4}\big(2^{p/2} + 2^{p/2} \frac{1}{(r+1)}\big)$ SPECK encryptions. The second step finds the round key after $2^{n/2}$ iterations, so the computation time of this step is $\frac{\pi}{4} \frac{1}{(r+1)} 2^{n/2}$ SPECK encryptions. The computation time required for each round key in the last phase is expected to be the same as in the second step. So, in total we obtain

$$c\Big(2^{p/2} + 2^{p/2} \frac{1}{(r+1)}\Big) + m c \frac{2^{n/2}}{(r+1)} \tag{6}$$

SPECK encryptions as the computational complexity, with $c = \frac{\pi}{4}$.
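To compare Eqs. 5 and 6 numerically, both expressions can be evaluated directly. The sketch below is an illustration only: the parameter values in the example call (d = 16, p = 30, e = 4, n = 16, r = 9, m = 4) are hypothetical placeholders chosen to show the scale of the gap and are not taken from a specific trail in the paper.

```python
import math


def classical_cost(d, p, e, n, r, m):
    """Eq. (5): classical cost of the last round attack in SPECK encryptions."""
    data = 2 * d * 2 ** p                       # encrypting d * 2^p pairs
    filtering = data / (r + 1)                  # one-round partial decryption
    guessing = d * 2 ** (p - e) * 2 ** n / (r + 1)
    return data + filtering + m * guessing


def quantum_cost(p, n, r, m):
    """Eq. (6): cost of the Grover-based attack in SPECK encryptions."""
    c = math.pi / 4
    pair_search = c * (2 ** (p / 2) + 2 ** (p / 2) / (r + 1))
    key_search = c * 2 ** (n / 2) / (r + 1)
    return pair_search + m * key_search


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    classical = classical_cost(d=16, p=30, e=4, n=16, r=9, m=4)
    quantum = quantum_cost(p=30, n=16, r=9, m=4)
    print(f"classical: 2^{math.log2(classical):.2f} encryptions")
    print(f"quantum:   2^{math.log2(quantum):.2f} encryptions")
```

In both formulas the pair search is governed by p and the sub-key search by n. The quantum expression replaces $2^p$ by $2^{p/2}$ and $2^n$ by $2^{n/2}$, and it contains neither the constant d nor the filter strength e.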

6.3.5 Implementation

We now implement the quantum attack described above on our toy cipher with m = 2. We make the following assumptions:

– We have a five-round differential $\alpha' = (\Delta x_0, \Delta y_0) = ([1, 0, 0], [1, 1, 1]) \to ([0, 0, 0], [1, 1, 1]) = (\Delta x_5, \Delta y_5) = \beta'$ with p = 4.
– We implement the key recovery on 6 rounds. Let the difference in the ciphertexts after 6 rounds be denoted by $(\Delta x_6, \Delta y_6)$.
– The secret key is K = [$k_0$ = [1, 1, 1], $k_1$ = [1, 0, 1]], and so the round keys are as follows:
  0th sub-key: [1, 1, 1]    3rd sub-key: [0, 1, 1]
  1st sub-key: [1, 0, 1]    4th sub-key: [1, 1, 0]
  2nd sub-key: [1, 1, 1]    5th sub-key: [0, 1, 1]
  We will try to retrieve the 4th and 5th sub-keys, i.e. in Phase 2 the output should be [0, 1, 1] and the output of Phase 3 should be [1, 1, 0].
– Phase 1: We search for plaintext pairs whose corresponding ciphertexts satisfy the following two properties:
  • $\Delta x_6[2] = [1]$, i.e. bit 2 (the msb) of $\Delta x_6$ is 1;
  • one round of decryption leads to the difference $\Delta y_5[0, 1, 2] = [1, 1, 1]$.
  The output of applying Grover's algorithm is shown in Fig. 17a. The plaintext obtained is [0, 1, 1, 0, 1, 0]. Since α′ = [1, 0, 0, 1, 1, 1], the plaintext pair for the second phase is ([0, 1, 1, 0, 1, 0], [1, 1, 1, 1, 0, 1]). The output obtained is only P, because of the limited number of qubits (32) available in the IBM-Q simulator. The code can easily be modified to obtain both P and P′ simultaneously.
– Phase 2: The plaintext pair is ([0, 1, 1, 0, 1, 0], [1, 1, 1, 1, 0, 1]). We run 6 rounds of SPECK encryption to obtain the corresponding ciphertext pair ([1, 1, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1]). Now we apply Grover's algorithm to search for the key which decrypts this pair by one round to form the difference $(\Delta x_5, \Delta y_5)$ = ([0, 0, 0], [1, 1, 1]). The output of Grover's search is shown in Fig. 17b. As expected, the output is [0, 1, 1].
– Phase 3: We need to recover the 4th round key to completely retrieve the secret key. In this phase the ciphertext pair ([1, 1, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1]) is decrypted by one round using the key found in the previous phase. The decrypted pair is ([0, 1, 0, 1, 0, 0], [0, 1, 0, 0, 1, 1]). Then, as above, Grover's search is applied to find the key for which this new pair leads to the difference $(\Delta x_4, \Delta y_4)$ = ([1, 0, 1], [1, 1, 1]). The output of Grover's search is shown in Fig. 17c. As expected, the output is [1, 1, 0].

Fig. 17. Output of (a) Phase 1. (b) Phase 2. (c) Phase 3

6.4 Resource Estimates

Phase 1 makes $2^{p/2}$ iterations of (r + 1) rounds of SPECK and one round of decryption, i.e. $2^{p/2}(r + 2)$ SPECK rounds. This step also has 2n measurements and requires (mn + 2n + 2n + 1) = (4n + mn + 1) qubits and 2n classical bits. Phase 2 partially decrypts one round and measures the round key, so this step has $2^{n/2}$ iterations of one round of SPECK. This step has n measurements and requires (n + 2n + 2n + 1) = (5n + 1) qubits and n classical bits. Phase 3 is a repetition of Phase 2 for (m − 1) times, and the resources required can be computed similarly. Hence, for the entire circuit, the quantum resources required are as follows:

– qubits: (mn + 4n + 1) + m(5n + 1) = (6mn + 4n + m + 1),
– classical bits: (2n + mn),
– measurements: (2n + mn),
– the resources required to implement $(2^{p/2}(r + 2) + 2^{n/2} m)$ round updates and $2^{p/2}(r)$ key expansions.
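The qubit, classical-bit and measurement counts in the list above depend only on n and m, so they can be tabulated directly. The following sketch (with an illustrative function name) evaluates them phase by phase.

```python
def differential_attack_resources(n, m):
    """Qubits, classical bits and measurements of the last round attack."""
    phase1_qubits = m * n + 4 * n + 1        # key, two blocks of 2n bits, target
    phase2_qubits = 5 * n + 1                # one round key, two blocks, target
    qubits = phase1_qubits + m * phase2_qubits
    # Algebraically the same as the closed form 6mn + 4n + m + 1.
    assert qubits == 6 * m * n + 4 * n + m + 1
    return {
        "qubits": qubits,
        "classical_bits": 2 * n + m * n,     # 2n in Phase 1, n per round key
        "measurements": 2 * n + m * n,
    }


if __name__ == "__main__":
    for name, n, m in [("SPECK32/64", 16, 4), ("SPECK96/96", 48, 2), ("SPECK128/256", 64, 4)]:
        print(name, differential_attack_resources(n, m))
```

For SPECK32/64 (n = 16, m = 4) this gives 453 qubits and 96 measurements, and for SPECK128/256 (n = 64, m = 4) it gives 1797 qubits and 384 measurements.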

The resources required to implement the differential attack on all the variants of SPECK are summarized in Table 5.

Table 5. Estimate of resources required to implement differential attack on SPECK

SPECK2n/k       # Clifford gates   # T gates   T-depth    Full depth   # Measurement   # qubits
SPECK32/64      2^25.995           2^24.904    2^24.097   2^25.244     96              453
SPECK48/72      2^35.252           2^34.091    2^33.284   2^34.41      120             532
SPECK48/96      2^35.252           2^34.091    2^33.284   2^34.41      144             677
SPECK64/96      2^45.179           2^44.019    2^43.212   2^44.322     160             708
SPECK64/128     2^45.179           2^44.019    2^43.212   2^44.322     192             901
SPECK96/96      2^58.329           2^57.169    2^56.362   2^57.461     192             771
SPECK96/144     2^58.329           2^57.169    2^56.362   2^57.461     240             1060
SPECK128/128    2^72.401           2^71.242    2^70.434   2^71.527     256             1027
SPECK128/192    2^72.401           2^71.242    2^70.434   2^71.527     320             1412
SPECK128/256    2^72.401           2^71.242    2^70.434   2^71.527     384             1797

7 Conclusion

In the present manuscript, we perform quantum cryptanalysis of the lightweight cipher SPECK. We analyze exhaustive key search and differential cryptanalysis in the quantum framework using Grover's algorithm. In both cases, we estimate the quantum resources required for implementing the attacks. Our resource estimates for quantum key search under the known-plaintext attack model improve on the existing effort [11]. To the best of our knowledge, this is the first study of differential cryptanalysis of SPECK in the quantum paradigm. Experiments are performed in the IBM-Q environment to support our claims. The code for all the experiments is provided in [27] for independent verification.

References

1. Abed, F., List, E., Lucks, S., Wenzel, J.: Differential cryptanalysis of round-reduced Simon and Speck. In: Cid, C., Rechberger, C. (eds.) FSE 2014. LNCS, vol. 8540, pp. 525–545. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46706-0_27
2. Amy, M., Maslov, D., Mosca, M., Roetteler, M.: A meet-in-the-middle algorithm for fast synthesis of depth-optimal quantum circuits. IEEE Trans. Comput.-Aided Des. Integr. Circ. Syst. 32(6), 818–830 (2013)
3. Amy, M., Di Matteo, O., Gheorghiu, V., Mosca, M., Parent, A., Schanck, J.: Estimating the cost of generic quantum pre-image attacks on SHA-2 and SHA-3. In: Avanzi, R., Heys, H. (eds.) SAC 2016. LNCS, vol. 10532, pp. 317–337. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-69453-5_18
4. Anand, R., Maitra, A., Mukhopadhyay, S.: Grover on SIMON. Quantum Inf. Process. 19, 340 (2020). https://doi.org/10.1007/s11128-020-02844-w
5. Bonnetain, X., Hosoyamada, A., Naya-Plasencia, M., Sasaki, Yu., Schrottenloher, A.: Quantum attacks without superposition queries: the offline Simon's algorithm. In: Galbraith, S.D., Moriai, S. (eds.) ASIACRYPT 2019. LNCS, vol. 11921, pp. 552–583. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-34578-5_20
6. Cuccaro, S.A., Draper, T.G., Kutin, S.A., Moulton, D.P.: A new quantum ripple-carry addition circuit. arXiv preprint quant-ph/0410184 (2004)
7. Grassl, M., Langenberg, B., Roetteler, M., Steinwandt, R.: Applying Grover's algorithm to AES: quantum resource estimates. In: Takagi, T. (ed.) PQCrypto 2016. LNCS, vol. 9606, pp. 29–43. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-29360-8_3
8. Grover, L.K.: A fast quantum mechanical algorithm for database search. In: Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing, pp. 212–219, July 1996
9. Hosoyamada, A., Sasaki, Yu.: Quantum Demiric-Selçuk meet-in-the-middle attacks: applications to 6-round generic Feistel constructions. In: Catalano, D., De Prisco, R. (eds.) SCN 2018. LNCS, vol. 11035, pp. 386–403. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-98113-0_21
10. Hosoyamada, A., Sasaki, Yu.: Cryptanalysis against symmetric-key schemes with online classical queries and offline quantum computations. In: Smart, N.P. (ed.) CT-RSA 2018. LNCS, vol. 10808, pp. 198–218. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-76953-0_11
11. Jang, K., Choi, S., Kwon, H., Seo, H.: Grover on SPECK: quantum resource estimates. Cryptology ePrint Archive, Report 2020/640. https://eprint.iacr.org/2020/640


12. Jaques, S., Naehrig, M., Roetteler, M., Virdia, F.: Implementing Grover oracles for quantum key search on AES and LowMC. In: Canteaut, A., Ishai, Y. (eds.) EUROCRYPT 2020. LNCS, vol. 12106, pp. 280–310. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-45724-2_10
13. Kaplan, M.: Quantum attacks against iterated block ciphers. arXiv preprint arXiv:1410.1434 (2014)
14. Kaplan, M., Leurent, G., Leverrier, A., Naya-Plasencia, M.: Breaking symmetric cryptosystems using quantum period finding. In: Robshaw, M., Katz, J. (eds.) CRYPTO 2016. LNCS, vol. 9815, pp. 207–237. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-53008-5_8
15. Kaplan, M., Leurent, G., Leverrier, A., Naya-Plasencia, M.: Quantum differential and linear cryptanalysis. IACR Trans. Symmetric Cryptol. 2016, 71–94 (2015)
16. Koch, D., Wessing, L., Alsing, P.M.: Introduction to coding quantum algorithms: a tutorial series using Qiskit. arXiv preprint arXiv:1903.04359 (2019)
17. Kuwakado, H., Morii, M.: Security on the quantum-type Even-Mansour cipher. In: 2012 International Symposium on Information Theory and its Applications, pp. 312–316. IEEE, October 2012
18. Li, H., Yang, L.: Quantum differential cryptanalysis to the block ciphers. In: Niu, W., Li, G., Liu, J., Tan, J., Guo, L., Han, Z., Batten, L. (eds.) ATIS 2015. CCIS, vol. 557, pp. 44–51. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-48683-2_5
19. Wiebe, N., Roetteler, M.: Quantum arithmetic and numerical analysis using repeat-until-success circuits. arXiv preprint arXiv:1406.2040 (2014)
20. Langenberg, B., Pham, H., Steinwandt, R.: Reducing the cost of implementing the advanced encryption standard as a quantum circuit. IEEE Trans. Quantum Eng. 1, 1–12 (2020). Article no. 2500112. https://doi.org/10.1109/TQE.2020.2965697
21. Leander, G., May, A.: Grover meets Simon – quantumly attacking the FX-construction. In: Takagi, T., Peyrin, T. (eds.) ASIACRYPT 2017. LNCS, vol. 10625, pp. 161–178. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-70697-9_6
22. Santoli, T., Schaffner, C.: Using Simon's algorithm to attack symmetric-key cryptographic primitives. arXiv preprint arXiv:1603.07856 (2016)
23. Takahashi, Y., Tani, S., Kunihiro, N.: Quantum addition circuits and unbounded fan-out. Quantum Inf. Comput. 10(9), 872–890 (2010)
24. Zhou, Q., Lu, S., Zhang, Z., Sun, J.: Quantum differential cryptanalysis. Quantum Inf. Process. 14(6), 2101–2109 (2015). https://doi.org/10.1007/s11128-015-0983-3
25. https://quantum-computing.ibm.com
26. https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.197.pdf
27. https://github.com/raviro/speckquant

Learning with Errors

Making the BKW Algorithm Practical for LWE

Alessandro Budroni² (✉), Qian Guo¹,², Thomas Johansson¹, Erik Mårtensson¹, and Paul Stankovski Wagner¹

¹ Department of Electrical and Information Technology, Lund University, Lund, Sweden
{qian.guo,thomas.johansson,erik.martensson,paul.stankovski_wagner}@eit.lth.se
² Selmer Center, Department of Informatics, University of Bergen, Bergen, Norway
[email protected]

Abstract. The Learning with Errors (LWE) problem is one of the main mathematical foundations of post-quantum cryptography. One of the main families of algorithms for solving LWE is based on the Blum-Kalai-Wasserman (BKW) algorithm. This paper presents new improvements for BKW-style algorithms for solving LWE instances. We target minimum concrete complexity and we introduce a new reduction step where we partially reduce the last position in an iteration and finish the reduction in the next iteration, allowing non-integer step sizes. We also introduce a new procedure in the secret recovery by mapping the problem to binary problems and applying the Fast Walsh Hadamard Transform. The complexity of the resulting algorithm compares favourably to all other previous approaches, including lattice sieving. We additionally show the steps of implementing the approach for large LWE problem instances. The core idea here is to overcome RAM limitations by using large file-based memory.

Keywords: BKW · LWE · Lattice-based cryptography · FWHT · Post-quantum cryptography

1 Introduction

Since a large-scale quantum computer can efficiently solve both the integer factoring problem and the discrete logarithm problem [34], public-key cryptography needs to be based on other underlying mathematical problems. In post-quantum cryptography, the research area studying such replacements, lattice-based problems are the most promising candidates. In the NIST post-quantum standardization competition, 5 out of 7 finalists and 2 out of 8 alternates are lattice-based [1]. The Learning with Errors problem (LWE), introduced by Regev in [33], is the main problem in lattice-based cryptography. It has a theoretically very interesting average-case to worst-case reduction to standard lattice-based problems. It has many cryptographic applications, including but not limited to the design of Fully Homomorphic Encryption (FHE) schemes. An interesting special case of LWE is the Learning Parity with Noise problem (LPN), introduced in [12], which has interesting applications in light-weight cryptography.

© Springer Nature Switzerland AG 2020. K. Bhargavan et al. (Eds.): INDOCRYPT 2020, LNCS 12578, pp. 417–439, 2020. https://doi.org/10.1007/978-3-030-65277-7_19

Considerable cryptanalytic effort has been spent on algorithms for solving LWE. These can be divided into three categories: lattice-reduction, algebraic methods and combinatorial methods. The algebraic methods were introduced by Arora and Ge in [9] and further considered in [3]. For very small noise these methods perform very well, but otherwise the approach is inefficient. The methods based on lattice-reduction are currently the most efficient ones in practice. One way of comparing the different approaches is through the Darmstadt LWE Challenges [2], where the lattice-based approach called General Sieve Kernel (G6K) is currently the most successful algorithm in breaking challenges [5]. The combinatorial algorithms are all based on the Blum-Kalai-Wasserman (BKW) algorithm, and algorithms in this direction will be the focus of this paper. For surveys on the concrete and asymptotic complexity of solving LWE, see [7] and [22,24], respectively. In essence, BKW-style algorithms have a better asymptotic performance than lattice-based approaches for parameter choices with large noise. Unlike lattice-based approaches, BKW-style algorithms pay a penalty when the number of samples is limited (as in the Darmstadt challenges).

1.1 Related Work

The BKW algorithm was originally developed as the first subexponential algorithm for solving the LPN problem [13]. In [27] the algorithm was improved, introducing new concepts like LF2 and the use of the fast Walsh-Hadamard transform (FWHT) for the distinguishing phase. A new distinguisher using subspace hypothesis testing was introduced in [19,20]. The BKW algorithm was first applied to the LWE problem in [4]. This idea was improved in [6], where the idea of Lazy Modulus Switching (LMS) was introduced. The idea was improved in [23,26], where [23] introduced so-called coded-BKW steps. The idea of combining coded-BKW or LMS with techniques from lattice sieving [11] led to the next improvement [21]. This combined approach was slightly improved in [22,30]. The distinguishing part of the BKW algorithm for solving LWE was improved by using the Fast Fourier Transform (FFT) in [16]. One drawback of BKW is its high memory usage. To remedy this, time-memory trade-offs for the BKW algorithm were recently studied in [15,17,18].

1.2 Contributions

In this paper we introduce a new BKW-style algorithm including the following.
– A generalized reduction step that we refer to as smooth-LMS, allowing us to use non-integer step sizes. These steps allow us to use the same time, space and sample complexity in each reduction step of the algorithm, which improves performance compared to previous work.
– A binary-oriented method for the guessing phase, transforming the LWE problem into an LPN problem. While the previous FFT method guesses a few positions of the secret vector and finds the correct one, this approach instead finds the least significant bits of a large number of positions using the FWHT. This method allows us to correctly distinguish the secret with a larger noise level, generally leading to an improved performance compared to the FFT-based method. In addition, the FWHT is much faster in implementation.
– Concrete complexity calculations for the proposed algorithm showing the lowest known complexity for some parameter choices selected as in the Darmstadt LWE Challenge instances, but with an unrestricted number of samples.
– An implementation approach for the algorithm that allows larger instances to be solved. The implementation is file-based and stores huge tables on disk rather than in RAM only. The number of file reads and writes is minimized by a careful organization of the algorithm. Simulation results on solving larger instances are presented and verify the previous theoretical arguments.

1.3 Organization

We organize the rest of the paper as follows. We introduce some necessary background in Sect. 2. In Sect. 3 we cover previous work on applying the BKW algorithm to the LWE problem. Then in Sect. 4 we introduce our new smooth-LMS reduction method. Next, in Sect. 5 we go over our new binary-oriented guessing procedure. Sections 6 and 7 cover the complexity analysis and implementation of our algorithm, respectively. Section 8 describes our experimental results using the implementation. Finally, the paper is concluded in Sect. 9.

2 Background

2.1 Notation

Throughout the paper we use the following notation.
– We write $\log(\cdot)$ for the base-2 logarithm.
– In the $n$-dimensional Euclidean space $\mathbb{R}^n$, by the norm of a vector $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ we mean its $L_2$-norm, defined as $\|\mathbf{x}\| = \sqrt{x_1^2 + \cdots + x_n^2}$. The Euclidean distance between vectors $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$ is defined as $\|\mathbf{x} - \mathbf{y}\|$.
– Elements in $\mathbb{Z}_q$ are represented by the set of integers in $[-\frac{q-1}{2}, \frac{q-1}{2}]$.
– For an $[N, k]$ linear code, $N$ denotes the code length and $k$ denotes the dimension.
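The centered representation of $\mathbb{Z}_q$ and the $L_2$-norm are used throughout the later sections, so a small illustration may help. The following is a minimal sketch; the helper names `centered` and `l2_norm` are ours and do not appear in the paper.

```python
def centered(x: int, q: int) -> int:
    """Representative of x mod q in [-(q-1)/2, (q-1)/2], for odd q."""
    r = x % q
    return r - q if r > (q - 1) // 2 else r


def l2_norm(v) -> float:
    """Euclidean (L2) norm of a vector given as a sequence of numbers."""
    return sum(c * c for c in v) ** 0.5


q = 1601
print(centered(1600, q))  # -1
print(centered(801, q))   # -800
print(l2_norm([3, 4]))    # 5.0
```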

2.2 The LWE and LPN Problems

The LWE problem [33] is defined as follows.

Definition 1. Let $n$ be a positive integer, $q$ a prime, and let $\mathcal{X}$ be an error distribution selected as the discrete Gaussian distribution on $\mathbb{Z}_q$ with variance $\sigma^2$. Fix $\mathbf{s}$ to be a secret vector in $\mathbb{Z}_q^n$, chosen from some distribution (usually the uniform distribution). Denote by $L_{\mathbf{s},\mathcal{X}}$ the probability distribution on $\mathbb{Z}_q^n \times \mathbb{Z}_q$ obtained by choosing $\mathbf{a} \in \mathbb{Z}_q^n$ uniformly at random, choosing an error $e \in \mathbb{Z}_q$ from $\mathcal{X}$ and returning $(\mathbf{a}, z) = (\mathbf{a}, \langle \mathbf{a}, \mathbf{s} \rangle + e)$ in $\mathbb{Z}_q^n \times \mathbb{Z}_q$. The (search) LWE problem is to find the secret vector $\mathbf{s}$ given a fixed number of samples from $L_{\mathbf{s},\mathcal{X}}$.

The definition above gives the search LWE problem, as the problem description asks for the recovery of the secret vector $\mathbf{s}$. Another version is the decision LWE problem, in which case the problem is to distinguish between samples drawn from $L_{\mathbf{s},\mathcal{X}}$ and a uniform distribution on $\mathbb{Z}_q^n \times \mathbb{Z}_q$. Let us also define the LPN problem, which is a binary special case of LWE.

Definition 2. Let $k$ be a positive integer, let $\mathbf{x}$ be a secret binary vector of length $k$ and let $X \sim \mathrm{Ber}_\eta$ be a Bernoulli distributed error with parameter $\eta > 0$. Let $L_{\mathbf{x},X}$ denote the probability distribution on $\mathbb{F}_2^k \times \mathbb{F}_2$ obtained by choosing $\mathbf{g} \in \mathbb{F}_2^k$ uniformly at random, choosing $e \in \mathbb{F}_2$ from $X$ and returning $(\mathbf{g}, z) = (\mathbf{g}, \langle \mathbf{g}, \mathbf{x} \rangle + e)$. The (search) LPN problem is to find the secret vector $\mathbf{x}$ given a fixed number of samples from $L_{\mathbf{x},X}$.

Just like for LWE, we can analogously define decision LPN.

Previous analyses of algorithms solving the LWE problem have used two different approaches. One is to calculate the number of operations needed to solve a certain instance with a particular algorithm, and then to compare the different complexity results. The other is asymptotic analysis. Solvers for the LWE problem with suitable parameters are expected to have fully exponential complexity, bounded by $2^{cn}$ as $n$ tends to infinity, where the value of $c$ depends on the algorithms and the parameters of the involved distributions. In this paper, we focus on the complexity computed as the number of arithmetic operations in $\mathbb{Z}_q$ for solving particular LWE instances (and we do not consider the asymptotics).
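To make Definition 1 concrete, the following minimal sketch draws a few LWE samples and recovers their error terms using the known secret. It is an illustration only: the function name `lwe_samples` and the toy parameters are our assumptions, and the error is drawn from a rounded continuous Gaussian as a stand-in for the error distribution.

```python
import random


def centered(x, q):
    """Representative of x mod q in [-(q-1)/2, (q-1)/2], for odd q."""
    r = x % q
    return r - q if r > (q - 1) // 2 else r


def lwe_samples(n, q, sigma, m, rng):
    """Return (s, samples) where each sample is (a, z) with z = <a, s> + e mod q."""
    s = [rng.randrange(q) for _ in range(n)]           # uniform secret
    samples = []
    for _ in range(m):
        a = [rng.randrange(q) for _ in range(n)]       # uniform a in Z_q^n
        e = round(rng.gauss(0.0, sigma))               # assumption: rounded Gaussian error
        z = (sum(ai * si for ai, si in zip(a, s)) + e) % q
        samples.append((a, z))
    return s, samples


rng = random.Random(1)
q = 1601
s, samples = lwe_samples(n=10, q=q, sigma=8.0, m=5, rng=rng)
# With s known, the error of each sample is z - <a, s> in centered form.
errors = [centered(z - sum(ai * si for ai, si in zip(a, s)), q) for a, z in samples]
print(errors)
```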

2.3 Discrete Gaussian Distributions

We define the discrete Gaussian distribution over $\mathbb{Z}$ with mean 0 and variance $\sigma^2$, denoted $D_{\mathbb{Z},\sigma}$, as the probability distribution obtained by assigning a probability proportional to $\exp(-x^2/(2\sigma^2))$ to each $x \in \mathbb{Z}$. Then, the discrete Gaussian distribution $\mathcal{X}$ over $\mathbb{Z}_q$ with variance $\sigma^2$ (also denoted $\mathcal{X}_\sigma$) can be defined by folding $D_{\mathbb{Z},\sigma}$ and accumulating the value of the probability mass function over all integers in each residue class modulo $q$. It makes sense to express the noise level as $\alpha$, where $\sigma = \alpha q$.

We also define the rounded Gaussian distribution on $\mathbb{Z}_q$. This distribution samples values by sampling from the continuous Gaussian distribution with mean 0 and variance $\sigma^2$, rounding to the closest integer and then folding the result to the corresponding value in $\mathbb{Z}_q$. We denote it by $\bar{\Psi}_{\sigma,q}$.

If two independent $X_1$ and $X_2$ are drawn from $\mathcal{X}_{\sigma_1}$ and $\mathcal{X}_{\sigma_2}$ respectively, we make the heuristic assumption that their sum is drawn from $\mathcal{X}_{\sqrt{\sigma_1^2 + \sigma_2^2}}$. We make the corresponding assumption for the rounded Gaussian distribution.
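The folding construction can be sketched numerically as follows. This is a minimal illustration, assuming that the infinite support of $D_{\mathbb{Z},\sigma}$ is truncated far in the tail (the `tail` parameter is our choice); the function name is hypothetical.

```python
import math


def folded_discrete_gaussian(q, sigma, tail=12):
    """pmf of X_sigma on the centered residues of Z_q, by folding D_{Z,sigma} mod q."""
    half = (q - 1) // 2
    bound = int(math.ceil(tail * sigma)) + q      # assumption: truncate the support here
    pmf = {r: 0.0 for r in range(-half, half + 1)}
    total = 0.0
    for x in range(-bound, bound + 1):
        w = math.exp(-x * x / (2.0 * sigma * sigma))
        r = x % q
        if r > half:
            r -= q                                # centered residue class of x
        pmf[r] += w                               # accumulate mass per residue class
        total += w
    return {r: p / total for r, p in pmf.items()}


q, alpha = 1601, 0.005
sigma = alpha * q
pmf = folded_discrete_gaussian(q, sigma)
variance = sum(r * r * p for r, p in pmf.items())
print(round(sigma ** 2, 3), round(variance, 3))
```

For noise this small relative to $q$, the folding has a negligible effect and the variance of the folded distribution stays very close to $\sigma^2$.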

3 A Review of BKW-Style Algorithms

3.1 The LWE Problem Reformulated

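Assume that $m$ samples $(\mathbf{a}_1, z_1), (\mathbf{a}_2, z_2), \ldots, (\mathbf{a}_m, z_m)$ are collected from the LWE distribution $L_{\mathbf{s},\mathcal{X}}$, where $\mathbf{a}_i \in \mathbb{Z}_q^n$, $z_i \in \mathbb{Z}_q$. Let $\mathbf{z} = (z_1, z_2, \ldots, z_m)$ and $\mathbf{y} = (y_1, y_2, \ldots, y_m) = \mathbf{s}A$. We have

$$\mathbf{z} = \mathbf{s}A + \mathbf{e},$$

where $A = \begin{bmatrix} \mathbf{a}_1^T & \mathbf{a}_2^T & \cdots & \mathbf{a}_m^T \end{bmatrix}$, $z_i = y_i + e_i = \langle \mathbf{s}, \mathbf{a}_i \rangle + e_i$ and $e_i \xleftarrow{\$} \mathcal{X}$. The search LWE problem is a decoding problem, where $A$ serves as the generator matrix for a linear code over $\mathbb{Z}_q$ and $\mathbf{z}$ is a received word. Finding the secret vector $\mathbf{s}$ is equivalent to finding the codeword $\mathbf{y} = \mathbf{s}A$ for which the Euclidean distance $\|\mathbf{y} - \mathbf{z}\|$ is minimal. In the sequel, we adopt the notation $\mathbf{a}_i = (a_{i1}, a_{i2}, \ldots, a_{in})$.

3.2 Transforming the Secret Distribution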

A transformation [8,25] can be applied to ensure that the secret vector follows the same distribution $\mathcal{X}$ as the noise. It is done as follows. We write $A$ in systematic form via Gaussian elimination. Assume that the first $n$ columns are linearly independent and form the matrix $A_0$. Define $D = A_0^{-1}$ and write $\hat{\mathbf{s}} = \mathbf{s}D^{-1} - (z_1, z_2, \ldots, z_n)$. Hence, we can derive an equivalent problem described by $\hat{A} = (I, \hat{\mathbf{a}}_{n+1}^T, \hat{\mathbf{a}}_{n+2}^T, \ldots, \hat{\mathbf{a}}_m^T)$, where $\hat{A} = DA$. We compute

$$\hat{\mathbf{z}} = \mathbf{z} - (z_1, z_2, \ldots, z_n)\hat{A} = (\mathbf{0}, \hat{z}_{n+1}, \hat{z}_{n+2}, \ldots, \hat{z}_m).$$

Using this transformation, each entry in the new secret vector $\hat{\mathbf{s}}$ is distributed according to $\mathcal{X}$. The fact that the entries of the secret are small is a very useful property in several of the known reduction algorithms for solving LWE. The noise distribution $\mathcal{X}$ is usually chosen as the discrete Gaussian distribution or the rounded Gaussian distribution from Sect. 2.3.
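The transformation can be checked on a toy instance. The sketch below is an illustration under our own assumptions (tiny parameters, rounded Gaussian noise, hypothetical helper names, and Python 3.8+ for the modular inverse via `pow`); it verifies that $\hat{\mathbf{s}} = -(e_1, \ldots, e_n)$, that $\hat{\mathbf{z}} = \hat{\mathbf{s}}\hat{A} + \mathbf{e}$, and that the first $n$ entries of $\hat{\mathbf{z}}$ vanish.

```python
import random


def mat_inv_mod(M, q):
    """Inverse of a square matrix modulo a prime q (Gauss-Jordan); raises if singular."""
    n = len(M)
    aug = [[x % q for x in row] + [int(i == j) for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        piv = next((r for r in range(c, n) if aug[r][c]), None)
        if piv is None:
            raise ValueError("matrix is singular modulo q")
        aug[c], aug[piv] = aug[piv], aug[c]
        inv = pow(aug[c][c], -1, q)
        aug[c] = [x * inv % q for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c]:
                f = aug[r][c]
                aug[r] = [(x - f * y) % q for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def mat_mul_mod(X, Y, q):
    return [[sum(x * y for x, y in zip(row, col)) % q for col in zip(*Y)] for row in X]


def vec_mat_mod(v, M, q):
    return [sum(vi * M[i][j] for i, vi in enumerate(v)) % q for j in range(len(M[0]))]


rng = random.Random(7)
n, m, q, sigma = 4, 10, 1601, 3.0                      # toy parameters (assumption)
s = [rng.randrange(q) for _ in range(n)]
while True:                                            # retry until the first n columns are invertible
    A = [[rng.randrange(q) for _ in range(m)] for _ in range(n)]
    A0 = [row[:n] for row in A]
    try:
        D = mat_inv_mod(A0, q)
        break
    except ValueError:
        continue
e = [round(rng.gauss(0.0, sigma)) for _ in range(m)]
z = [(y + ei) % q for y, ei in zip(vec_mat_mod(s, A, q), e)]

A_hat = mat_mul_mod(D, A, q)                                            # A_hat = D A = (I, ...)
z_hat = [(zj - cj) % q for zj, cj in zip(z, vec_mat_mod(z[:n], A_hat, q))]
s_hat = [(y - zj) % q for y, zj in zip(vec_mat_mod(s, A0, q), z[:n])]   # s D^{-1} - (z_1..z_n)
print(s_hat, z_hat[:n])
```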

3.3 Sample Amplification

In some versions of the LWE problem, such as the Darmstadt Challenges [2], the number of available samples is limited. To get more samples, sample amplification can be used. For example, assume that we have $M$ samples $(\mathbf{a}_1, b_1), (\mathbf{a}_2, b_2), \ldots, (\mathbf{a}_M, b_M)$. Then we can form new samples, using an index set $I$ of size $k$, as

$$\left( \sum_{j \in I} \pm \mathbf{a}_j, \; \sum_{j \in I} \pm b_j \right). \qquad (1)$$

Given an initial number of samples $M$ we can produce up to $2^{k-1}\binom{M}{k}$ samples. This comes at a cost of increasing the noise level (standard deviation) to $\sqrt{k} \cdot \sigma$. This also increases the sample dependency.
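The trade-off in sample amplification can be tabulated directly from the two expressions above. The following minimal sketch uses hypothetical function names and an arbitrary initial sample count; the factor $2^{k-1}$ is simply taken from the bound stated in the text.

```python
import math


def amplified_sample_count(M, k):
    """Upper bound 2^(k-1) * C(M, k) on the number of samples built from index sets of size k."""
    return 2 ** (k - 1) * math.comb(M, k)


def amplified_sigma(sigma, k):
    """Standard deviation of the noise after combining k samples."""
    return math.sqrt(k) * sigma


M, sigma = 1600, 8.0          # illustrative values (assumption)
for k in (2, 3, 4):
    count = amplified_sample_count(M, k)
    print(k, round(math.log2(count), 1), round(amplified_sigma(sigma, k), 2))
```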

3.4 Iterating and Guessing

BKW-style algorithms work by combining samples in many steps in such a way that we reach a system of equations over $\mathbb{Z}_q$ of the form $\mathbf{z} = \mathbf{s}A + \mathbf{E}$, where $\mathbf{E} = (E_1, E_2, \ldots, E_m)$ and the entries $E_i$, $i = 1, 2, \ldots, m$, are sums of not too many original noise terms, say $E_i = \sum_{j=1}^{2^t} e_{i_j}$, and where $t$ is the number of iterations. The process also reduces the norm of column vectors in $A$ to be small. Let $n_i$, $i = 1, 2, \ldots, t$, denote the number of reduced positions in step $i$ and let $N_i = \sum_{j=1}^{i} n_j$. If $n = N_t$, then every reduced equation is of the form

$$z_i = \langle \mathbf{a}_i, \mathbf{s} \rangle + E_i, \qquad (2)$$

for $i = 1, 2, \ldots, m$. The right hand side can be approximated as a sample drawn from a discrete Gaussian and if the standard deviation is not too large, then the sequence of samples $z_1, z_2, \ldots$ can be distinguished from a uniform distribution. We will then need to determine the number of required samples to distinguish between the uniform distribution on $\mathbb{Z}_q$ and $\mathcal{X}_\sigma$. Relying on standard theory from statistics, using either previous work [28] or Bleichenbacher's definition of bias [32], we can find that the required number of samples is roughly

$$C \cdot e^{2\pi \left( \frac{\sigma\sqrt{2\pi}}{q} \right)^2}, \qquad (3)$$

where $C$ is a small positive constant. Initially, an optimal but exhaustive distinguisher was used [10]. While minimizing the sample complexity, it was slow and limited the number of positions that could be guessed. This basic approach was improved in [16], using the FFT. This was in turn a generalization of the corresponding distinguisher for LPN, which used the FWHT [27].
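The estimate (3) is easy to evaluate numerically. The sketch below is illustrative only: the constant $C = 1$, the parameter values, and the function name are our assumptions, and only the accumulated error term is modelled (the contribution of the scalar product in (2) is ignored).

```python
import math


def required_samples(sigma, q, C=1.0):
    """Evaluate C * exp(2*pi * (sigma * sqrt(2*pi) / q)^2), the estimate in (3)."""
    return C * math.exp(2 * math.pi * (sigma * math.sqrt(2 * math.pi) / q) ** 2)


q, sigma0, t = 1601, 8.0, 14
sigma_final = math.sqrt(2 ** t) * sigma0     # standard deviation of a sum of 2^t original errors
log2_samples = math.log2(required_samples(sigma_final, q))
print(round(log2_samples, 1))
```

The estimate grows exponentially in $(\sigma/q)^2$, which is why the number of iterations $t$ has to be limited.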

3.5 Plain BKW

The basic BKW algorithm was originally developed for solving LPN in [13]. It was first applied to LWE in [4]. The reduction part of this approach means that we reduce a fixed number $b$ of positions in the column vectors of $A$ to zero in each step. In each iteration, the dimension of $A$ is decreased by $b$ and after $t$ iterations the dimension has decreased by $bt$.
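A single plain BKW step can be sketched as follows. This is a minimal illustration and not the implementation of [4] or [13]: it keeps one representative per category and combines every other sample with it (the LF1 strategy reviewed in Sect. 3.7), merges a category with its negation, and uses toy parameters with errors in $\{-1, 0, 1\}$; all names are hypothetical.

```python
import random


def plain_bkw_step(samples, start, b, q):
    """One LF1-style plain BKW step: zero positions start..start+b-1 of every output sample."""
    reps = {}                                   # block value -> representative sample
    out = []
    for a, z in samples:
        block = tuple(a[start:start + b])
        neg = tuple((-x) % q for x in block)
        if not any(block):
            out.append((a, z))                  # block is already zero
        elif block in reps:                     # same block: subtract the representative
            ra, rz = reps[block]
            out.append(([(x - y) % q for x, y in zip(a, ra)], (z - rz) % q))
        elif neg in reps:                       # negated block: add the representative
            ra, rz = reps[neg]
            out.append(([(x + y) % q for x, y in zip(a, ra)], (z + rz) % q))
        else:
            reps[block] = (a, z)                # first sample of a new category
    return out


def residual(a, z, s, q):
    """Centered value of z - <a, s> mod q, i.e. the accumulated noise."""
    r = (z - sum(x * y for x, y in zip(a, s))) % q
    return r - q if r > q // 2 else r


rng = random.Random(5)
q, n, b = 11, 4, 2                              # toy parameters (assumption)
s = [rng.randrange(q) for _ in range(n)]
samples = []
for _ in range(500):
    a = [rng.randrange(q) for _ in range(n)]
    e = rng.choice((-1, 0, 1))
    samples.append((a, (sum(x * y for x, y in zip(a, s)) + e) % q))

reduced = plain_bkw_step(samples, 0, b, q)
print(len(reduced), max(abs(residual(a, z, s, q)) for a, z in reduced))
```

Each output sample is a sum or difference of at most two inputs, so its noise is the sum of at most two original errors, while the first $b$ positions are zero.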

3.6 Coded-BKW and LMS

LMS was introduced in [6] and improved in [26]. Coded-BKW was introduced in [23]. Both methods reduce positions in the columns of $A$ to a small magnitude, but not to zero, allowing reduction of more positions per step. In LMS this is achieved by mapping samples to the same category if the $n_i$ considered positions give the same result when integer divided by a suitable parameter $p$. In coded-BKW this is instead achieved by mapping samples to the same category if they are close to the same codeword in an $[n_i, k_i]$ linear code, for a suitable value $k_i$. Samples mapped to the same category give rise to new samples by subtracting them. The main idea [23,26] is that positions in later iterations do not need to be reduced as much as the first ones, giving different $n_i$ values in different steps.

3.7 LF1, LF2, Unnatural Selection

Each step of the reduction part of the BKW algorithm consists of two parts. First, samples are mapped to categories depending on their position values on the currently relevant $n_i$ positions. Next, pairs of samples within the categories are added/subtracted to reduce the current $n_i$ positions and form a new generation of samples. This can be done in a couple of different ways. Originally this was done using what is called LF1. Here we pick a representative from each category and form new samples by adding/subtracting samples to/from this sample. This approach makes the final samples independent, but also gradually decreases the sample size. In [27] the approach called LF2 was introduced. Here we add/subtract every possible pair within each category to form new samples. This approach requires only 3 samples within each category to form a new generation of the same size. The final samples are no longer independent, but experiments have shown that this effect is negligible. In [6] unnatural selection was introduced. The idea is to produce more samples than needed from each category, but only keep the best samples, typically the ones with minimum norm on the current $N_i$ positions in the columns of $A$.

3.8 Coded-BKW with Sieving

When using coded-BKW or LMS, the previously reduced $N_{i-1}$ positions of the columns of $A$ increase in magnitude by an average factor of $\sqrt{2}$ in each reduction step. This problem was addressed in [21] by using unnatural selection to only produce samples that kept the magnitude of the previous $N_{i-1}$ positions small. Instead of testing all possible pairs of samples within the categories, this procedure was sped up using the lattice sieving techniques of [11]. This approach was slightly improved in [22,30].

4 BKW-Style Reduction Using Smooth-LMS

In this section we introduce a new reduction algorithm that addresses the problem of achieving the same complexity and memory usage in each iteration of a BKW-style reduction. The novel idea is to use simple LMS to reduce a certain number of positions and then partially reduce one extra position. This allows for balancing the complexity among the steps and hence for reducing more positions in total.

4.1 A New BKW-Style Step

Assume that we have a large set of samples written as before in the form $\mathbf{z} = \mathbf{s}A + \mathbf{e} \bmod q$. Assume also that the entries of the secret vector $\mathbf{s}$ are drawn from some restricted distribution with small standard deviation (compared to the alphabet size $q$). If this is not the case, the transformation from Sect. 3.2 should be applied. Moreover, in case the later distinguishing process involves some positions to be guessed or transformed, we assume that this has already been considered and that all positions in our coming description should be reduced.

The goal of this BKW-type procedure is to make the norms of the column vectors of $A$ small by adding and subtracting equations together in a number of steps. Having expressions of the form $z_i = \langle \mathbf{s}, \mathbf{a}_i \rangle + E_i \bmod q$, if we can reach a case where $\|\mathbf{a}_i\|$ is not too large, then $\langle \mathbf{s}, \mathbf{a}_i \rangle + E_i$ can be considered as a random variable drawn from a discrete Gaussian distribution $\mathcal{X}_\sigma$. Furthermore, $\mathcal{X}_\sigma \bmod q$ can be distinguished from a uniform distribution over $\mathbb{Z}_q$ if $\sigma$ is not too large.

Now let us describe the new reduction procedure. Fix the number of reduction steps to be $t$. We will also fix a maximum list size to be $2^v$, meaning that $A$ can have at most $2^v$ columns. In each iteration $i$, we are going to reduce some positions to be upper limited in magnitude by $C_i$, for $i = 1, \ldots, t$. Namely, the positions that are fully treated in iteration $i$ will only have values in the set $\{-C_i + 1, \ldots, 0, 1, \ldots, C_i - 1\}$ of size $2C_i - 1$. We do this by dividing up the $q$ possible values into intervals of length $C_i$. We also adopt the notation $\beta_i = q/C_i$, which describes the number of intervals we divide up the positions into. We assume that $\beta_i > 2$.

First Step. In the first iteration, assume that we have stored $A$. We first compute the required compression starting in iteration 1 by computing $C_1$ (we will explain how later). We then evaluate how many positions $n_1$ can be fully reduced by computing $n_1 = \lfloor v / \log \beta_1 \rfloor$. The position $n_1 + 1$ can be partially reduced to be in an interval of size $C_1'$ fulfilling $\beta_1' \cdot \beta_1^{n_1} \cdot 3/2 \le 2^v$, where $\beta_1' = q/C_1'$. Now we do an LMS step that "transfers between iterations" in the following way. We run through all the columns of $A$. For column $i$, we simply denote it as $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ and we compute:

$$k_j = \begin{cases} x_j \ \mathrm{div}\ C_1, & x_1 \ge 0 \\ -x_j \ \mathrm{div}\ C_1, & x_1 < 0 \end{cases}, \quad \text{for } j = 1, \ldots, n_1,$$

$$k_{n_1+1} = \begin{cases} x_{n_1+1} \ \mathrm{div}\ C_1', & x_1 \ge 0 \\ -x_{n_1+1} \ \mathrm{div}\ C_1', & x_1 < 0 \end{cases}.$$

The vector $K_i = (k_1, k_2, \ldots, k_{n_1+1})$ is now an index to a sorted list $L$, storing these vectors¹. Except for the inverting of values if $x_1 < 0$, samples should have the same index if and only if all position values are the same when integer divided by $C_1$ ($C_1'$ for the last position). So we assign $L(K_i) = L(K_i) \cup \{i\}$. After we have inserted all columns into the list $L$, we go to the combining part.

We build a new matrix $A$ in the following way. Run through all indices $K$ and if $|L(K)| \ge 2$ combine every pair of vectors in $L(K)$ by subtracting/adding² them to form a new column in the new matrix $A$. Stop when the number of new columns has reached $2^v$. For each column in $A$ we have that:
– the absolute value of each position $j \in \{1, \ldots, n_1\}$ is $< C_1$,
– the absolute value of position $n_1 + 1$ is $< C_1'$.

Next Steps. We now describe all the next iterations, numbered as $l = 2, 3, \ldots, t$. Iteration $l$ will involve positions from $N_{l-1} + 1 = \sum_{i=1}^{l-1} n_i + 1$ to $N_l + 1$. The very first position has possibly already been partially reduced and its absolute value is $< C_{l-1}'$, so the interval for possible values is of size $2C_{l-1}' - 1$. Assume that the desired interval size in iteration $l$ is $C_l$. In order to achieve the corresponding reduction factor $\beta_l$, we split this interval in $\beta_l'' = (2C_{l-1}' - 1)/C_l$ subintervals. We then compute how many positions $n_l$ can be fully reduced by computing $n_l = \lfloor (v - \log \beta_l'') / \log \beta_l \rfloor$. The position $N_l + 1$ can finally be partially reduced to be in an interval of size $C_l'$ fulfilling $\beta_l' \cdot \beta_l^{n_l - 1} \cdot \beta_l'' \cdot 3/2 \le 2^v$, where $\beta_l' = q/C_l'$. Similar to iteration 1, we run through all the columns of $A$. For each column $i$ in the matrix $A$, denoted as $\mathbf{x}$, we do the following. For each vector position in $\{N_{l-1} + 1, \ldots, N_l + 1\}$, we compute (here div means integer division)

$$k_j = \begin{cases} x_{N_{l-1}+j} \ \mathrm{div}\ C_l, & x_{N_{l-1}+1} \ge 0 \\ -x_{N_{l-1}+j} \ \mathrm{div}\ C_l, & x_{N_{l-1}+1} < 0 \end{cases}, \quad \text{for } j = 1, \ldots, n_l,$$

$$k_{n_l+1} = \begin{cases} x_{N_l+1} \ \mathrm{div}\ C_l', & x_{N_{l-1}+1} \ge 0 \\ -x_{N_l+1} \ \mathrm{div}\ C_l', & x_{N_{l-1}+1} < 0 \end{cases}. \qquad (4)$$

The vector $K = (k_1, k_2, \ldots, k_{n_l+1})$ is again an index to a sorted list $L$, keeping track of columns³. So again we assign $L(K) = L(K) \cup \{i\}$. After we have inserted all column indices into the list $L$, we go to the combining part.

As in the first step, we build a new $A$ as follows. Run through all indices $K$ and if $|L(K)| \ge 2$ combine every pair of vectors by adding/subtracting them to form a column in the new matrix $A$. Stop when the number of new columns has reached $2^v$.

For the last iteration, since $N_t$ is the last row of $A$, one applies the same step as above but without reducing the extra position. After $t$ iterations, one gets equations of the form (2), where the $\mathbf{a}_i$ vectors in $A$ have reduced norm.

¹ The point of inverting all position values if $x_1 < 0$ is to make sure that samples that get reduced when added should be given the same index. For example $(x_1, x_2, \ldots, x_{n_1+1})$ and $(-x_1, -x_2, \ldots, -x_{n_1+1})$ are mapped to the same category.
² Depending on what reduces the sample the most.
³ Also here the point of inverting all position values if $x_{N_{l-1}+1} < 0$ is to make sure that samples that get reduced when added should be given the same index. For example $(x_{N_{l-1}+1}, x_{N_{l-1}+2}, \ldots, x_{N_l+1})$ and $(-x_{N_{l-1}+1}, -x_{N_{l-1}+2}, \ldots, -x_{N_l+1})$ are mapped to the same category.
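The first smooth-LMS step can be sketched in a few lines. The code below is an illustration under our own assumptions rather than the authors' implementation: "div" is taken to be floor division on centered representatives, pairs with equal signs of $x_1$ are subtracted and pairs with opposite signs are added, the parameters are toy values, and the samples are noise-free so that linearity can be checked directly. All names are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations


def centered(x, q):
    r = x % q
    return r - q if r > (q - 1) // 2 else r


def smooth_lms_index(x, n1, C, C_partial):
    """Category index (k_1, ..., k_{n1+1}) of a column x with centered entries."""
    sign = 1 if x[0] >= 0 else -1                      # invert all values if x_1 < 0
    key = [(sign * x[j]) // C for j in range(n1)]      # fully reduced positions
    key.append((sign * x[n1]) // C_partial)            # partially reduced extra position
    return tuple(key), sign


def smooth_lms_first_step(columns, q, n1, C, C_partial, max_out):
    buckets = defaultdict(list)
    for a, z in columns:
        key, sign = smooth_lms_index(a, n1, C, C_partial)
        buckets[key].append((sign, a, z))
    out = []
    for members in buckets.values():
        for (s1, a1, z1), (s2, a2, z2) in combinations(members, 2):
            f = -1 if s1 == s2 else 1                  # subtract equal-sign pairs, add the others
            a = [centered(u + f * w, q) for u, w in zip(a1, a2)]
            out.append((a, centered(z1 + f * z2, q)))
            if len(out) >= max_out:                    # stop at the maximum list size
                return out
    return out


rng = random.Random(11)
q, n, n1, C, C_partial = 1601, 6, 2, 200, 400          # toy parameters (assumption)
s = [rng.randrange(-2, 3) for _ in range(n)]           # small secret
cols = []
for _ in range(2000):
    a = [centered(rng.randrange(q), q) for _ in range(n)]
    cols.append((a, centered(sum(x * y for x, y in zip(a, s)), q)))

new_cols = smooth_lms_first_step(cols, q, n1, C, C_partial, max_out=2000)
print(len(new_cols), max(abs(a[0]) for a, _ in new_cols), max(abs(a[n1]) for a, _ in new_cols))
```

Two columns in the same category have each indexed position in the same interval (after the sign inversion), so the combined column has magnitude below $C_1$ on the first $n_1$ positions and below $C_1'$ on position $n_1 + 1$.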

4.2 Smooth-Plain BKW

The procedure described above also applies to plain BKW steps. For example, if in the first iteration one sets $C_1 = 1$ and $C_1' > 1$, then each column vector $\mathbf{x}$ of $A$ will be reduced such that $x_1 = \ldots = x_{n_1} = 0$ and $x_{n_1+1} \in \{-C_1' + 1, \ldots, C_1' - 1\}$. Thus, one can either continue with another smooth-Plain BKW step by also setting $C_2 = 1$ in the second iteration, or switch to smooth-LMS. In both cases, we have the advantage of having $x_{n_1+1}$ already partially reduced. Using these smooth-Plain steps we can reduce a couple of extra positions in the plain pre-processing steps of the BKW algorithm.

4.3 How to Choose the Interval Sizes $C_i$

To achieve as small a norm of the vectors as possible, we would like the variance of all positions to be equally large after completing all iterations. Assume that a position $x$ takes values uniformly in the set $\{-(C-1)/2, \ldots, 0, 1, \ldots, (C-1)/2\}$, for $C > 0$. Then, we have that $\mathrm{Var}[x] = (C-1)(C+1)/12$. Assuming $C$ is somewhat large, we approximately get $\mathrm{Var}[x] = C^2/12$. When subtracting/adding two such values, the variance increases to $2\mathrm{Var}[x]$ in each iteration. Therefore, a reduced position will have an expected growth of $\sqrt{2}$. For this reason, we choose a relation for the interval sizes of the form

$$C_i = 2^{-(t-i)/2} C_t, \quad i = 1, \ldots, t-1.$$

This makes the variance of each position roughly the same after completing all iterations. In particular, the vectors $\mathbf{a}_i$ in $A$ are expected to have norm at most $\sqrt{n}\, C_t / \sqrt{12}$, and $C_t$ is determined according to the final noise allowed in the guessing phase. Ignoring the pre-processing step with smooth-Plain BKW steps, the maximum dimension $n$ that can be reduced is then $n = N_t = \sum_{i=1}^{t} n_i$.

Example 1. Let $q = 1601$ and $\alpha = 0.005$, so $\sigma = \alpha q \approx 8$. Let us compute how many positions can be reduced using $2^v = 2^{28}$ list entries. The idea is that the variance of the right hand side in (2) should be minimized by making the variance of the two terms roughly equal. The error part $E_i$ is the sum of $2^t$ initial errors, so its variance is $\mathrm{Var}[E_i] = 2^t \sigma^2$. In order to be able to distinguish the samples according to (3), we set $\mathrm{Var}[E_i] < q^2/2$. This will give us the number of iterations possible as $2^t \sigma^2 \approx q^2/2$ or $2^t \approx 1601^2/(2 \cdot 8^2)$, leading to $t = 14$. Now we bound the variance of the scalar product part of (2) also to be $< q^2/2$, so $n\sigma^2 C_t^2/12 \approx q^2/2$, leading to $C_t^2 \approx 12q^2/(2n\sigma^2)$ and $C_t \approx \sqrt{12 \cdot 1601^2/(2n \cdot 8^2)}$, or $C_t \approx 80$ if $n < 38$. Then one chooses $C_{t-1} = C_t/\sqrt{2} = 57$ and so on.
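The numbers in Example 1 can be reproduced with a short calculation. The following sketch is an illustration with hypothetical names; it follows the example in rounding $\sigma$ to 8 and $C_t$ to 80, and it evaluates the bound on $C_t$ at $n = 38$.

```python
import math


def interval_schedule(C_t, t):
    """Interval sizes C_i = 2^{-(t-i)/2} * C_t for i = 1, ..., t."""
    return [C_t * 2 ** (-(t - i) / 2) for i in range(1, t + 1)]


q, n = 1601, 38
sigma = 8                                                # alpha * q = 8.005, rounded as in the example
t = math.floor(math.log2(q ** 2 / (2 * sigma ** 2)))     # from 2^t * sigma^2 ~ q^2 / 2
C_t = round(math.sqrt(12 * q ** 2 / (2 * n * sigma ** 2)))
schedule = interval_schedule(C_t, t)
print(t, C_t, round(schedule[-2]))  # 14 80 57
```

Entries near the start of the schedule fall below 1 and cannot be realized as interval sizes; the sketch does not model the plain pre-processing steps that would take their place.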

4.4 Unnatural Selection

We can improve performance by using the unnatural selection discussed in Sect. 3.7. Let us make some basic observations. Combining $n_l$ positions using interval size $C$ gives, as previously described, a value in the set $\{-(C-1)/2, \ldots, 0, 1, \ldots, (C-1)/2\}$, and results in $\mathrm{Var}[x] = (C-1)(C+1)/12$. Combining two vectors from the same category, a position value $y = x_1 + x_2$, where $x_1, x_2$ are as above, results in a value in the interval $\{-(C-1), \ldots, 0, 1, \ldots, (C-1)\}$ with variance $\mathrm{Var}[y] = (C-1)(C+1)/6$. Now observe that for the resulting reduced positions, smaller values are much more probable than larger ones.
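Both variance expressions, and the observation that small values dominate, can be verified exactly by enumeration. The sketch below is a minimal check for one odd interval size; the choice $C = 57$ and the helper name are ours.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def variance(values):
    """Exact variance of the uniform distribution over a finite list of integers."""
    vals = list(values)
    mean = Fraction(sum(vals), len(vals))
    return sum((Fraction(v) - mean) ** 2 for v in vals) / len(vals)


C = 57                                            # odd, so that (C-1)/2 is an integer
half = (C - 1) // 2
support = range(-half, half + 1)
var_x = variance(support)
sums = [x1 + x2 for x1, x2 in product(support, repeat=2)]
var_y = variance(sums)
counts = Counter(sums)
print(var_x, var_y, counts[0], counts[C - 1])     # y = 0 is C times as likely as y = C - 1
```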

5 A Binary Partial Guessing Approach

In this section we propose a new way of reducing the guessing step to a binary version. This way, we are able to efficiently use the FWHT to guess many entries in a small number of operations. In Sect. 6 we do the theoretical analysis and show that this indeed leads to a more efficient procedure than all previous ones.

5.1 From LWE to LPN

First, we need to introduce a slight modification to the original system of equations before the reduction part. Assume that we have turned the distribution of s to be the noise distribution, through the standard transformation described in Sect. 3.2. The result after this is written as before z = sA + e.

(5)

Now we perform a multiplication by 2 to each equation, resulting in z = sA + 2e, since when multiplied with a known value, we can compute the result modulo q. Next, we apply the reduction steps and make the values in A as small as possible by performing BKW-like steps. In our case we apply the smooth-LMS step from the previous section, but any other reduction  method like coded-BKW  with sieving would be possible. If A = aT1 aT2 · · · aTm the output of this step is a matrix where the Euclidean norm of each ai is small. The result is written as (6) z = sA + 2E, 2t where E = (E1 , E2 , . . . , Em ) and Ei = j=1 eij as before. Finally, we transform the entire system to the binary case by considering z0 = s0 A0 + e mod 2,

(7)

where z0 is the vector of least significant bits in z , s0 the vector of least significant bits in s, A0 = (A mod 2) and e denotes the binary error introduced.


We can now examine the error $e_j$ in position $j$ of $e$. In (6) we have equations of the form $z_j = \sum_i s_i a_{ij} + 2E_j$ in $\mathbb{Z}_q$, which can be written in integer form as

$$z_j = \sum_i s_i a_{ij} + 2E_j + k_j \cdot q. \tag{8}$$

Now if $|\sum_i s_i a_{ij} + 2E_j| < q/2$ then $k_j = 0$. In this case (8) can be reduced mod 2 without error and $e_j = 0$. In general, the error is computed as $e_j = k_j \bmod 2$. So one can compute a distribution for $e_j = k_j \bmod 2$ by computing $P(k_j = x)$. It is possible to compute such a distribution either by making a general approximation or precisely for each specific position $j$ using the known values $a_j$ and $z_j$. Note that the distribution of $e_j$ depends on $z_j$. Also note that if $a_j$ is reduced to a small norm and the number of steps $t$ is not too large, then it is quite likely that $|\sum_i s_i a_{ij} + 2E_j| < q/2$, leading to $P(e_j = 0)$ being large.

For the binary system, we finally need to find the secret value $s_0$. Either

1. there are no errors (or almost no errors), corresponding to $P(e_j = 0) \approx 1$. Then one can solve for $s_0$ directly using Gaussian elimination (or possibly some information set decoding algorithm in the case of a few possible errors).
2. or the noise is larger. The binary system of equations corresponds to the situation of a fast correlation attack [31], or secret recovery in an LPN problem [13]. Thus, one may apply an FWHT to recover the binary secret values.

5.2 Guessing $s_0$ Using the FWHT

The approach of using the FWHT to find the most likely $s_0$ in the binary system in (7) comes directly from previous literature on Fast Correlation Attacks [14]. Let $k$ denote an $n$-bit vector $(k_0, k_1, \ldots, k_{n-1})$ (also considered as an integer) and consider a sequence $X_k$, $k = 0, 1, \ldots, N-1$, $N = 2^n$. It can for example be a sequence of occurrence values in the time domain, e.g. $X_k$ = the number of occurrences of $X = k$. The Walsh-Hadamard Transform is defined as

$$\hat{X}_w = \sum_{k=0}^{N-1} X_k \cdot (-1)^{w \cdot k},$$

where $w \cdot k$ denotes the bitwise dot product of the binary representation of the $n$-bit indices $w$ and $k$. There exists an efficient method (FWHT) to compute the WHT in time $O(N \log N)$. Given the matrix $A_0$, we define $X_k = \sum_{j \in J} (-1)^{z_j}$, where $J$ is the set of indices of all columns of the matrix $A_0$ that equal $k$. Then, one computes $\max_w |\hat{X}_w|$, and we have that $s_0$ corresponds to $\bar{w}$ such that $|\hat{X}_{\bar{w}}| = \max_w |\hat{X}_w|$. In addition, $\hat{X}_w$ is simply the (biased) sum of the noise terms.

Soft Received Information. The bias of $\hat{X}_w$ actually depends on the value of $z_j$. So a slightly better approach is to use "soft received information" by defining $X_k = \sum_{j \in J} (-1)^{z_j} \cdot \epsilon_{z_j}$, where $\epsilon_{z_j}$ is the bias corresponding to $z_j$. For each $x \in \{-(q-1)/2, \ldots, (q-1)/2\}$, the bias $\epsilon_x$ can be efficiently pre-computed so that its evaluation does not affect the overall complexity of the guessing procedure.

Algorithm 1. BKW-FWHT with smooth reduction (main framework)
Input: Matrix $A$ with $n$ rows and $m$ columns, received vector $z$ of length $m$ and algorithm parameters $t_1, t_2, t_3, n_{limit}, \sigma_{set}$
Step 0: Use Gaussian elimination to change the distribution of the secret vector;
Step 1: Use $t_1$ smooth-plain BKW steps to remove the bottom $n_{pbkw}$ entries;
Step 2: Use $t_2$ smooth-LMS steps to reduce $n_{cod1}$ more entries;
Step 3: Perform the multiplying-2 operations;
Step 4: Use $t_3$ smooth-LMS steps to reduce the remaining $n_t \le n_{limit}$ entries;
Step 5: Transform all the samples to the binary field and recover the partial secret key by the FWHT. We can exhaustively guess some positions.

Hybrid Guessing. One can use a hybrid approach to balance the overall complexity among the reduction and guessing phases. Indeed, it is possible to leave some rows of the matrix $A$ unreduced and apply an exhaustive search over the corresponding positions in combination with the previously described guessing step.

5.3 Retrieving the Original Secret

Once $s_0$ is correctly guessed, it is possible to obtain a new LWE problem instance with the secret half as big as follows. Write $s = 2s' + s_0$. Define $\hat{A} = 2A$ and $\hat{z} = z - s_0 A$. Then we have that

$$\hat{z} = s'\hat{A} + e. \tag{9}$$

The entries of $s'$ have a bit-size half as large as the entries of $s$, therefore (9) is an easier problem than (5). One can apply the procedure described above to (9) and guess the new binary secret $s_1$, i.e. the least significant bits of $s'$. The cost of doing this will be significantly smaller, as a shorter secret translates to computationally easier reduction steps. Thus, computationally speaking, the LWE problem can be considered solved once we manage to guess the least significant bits of $s$. Given the list of binary vectors $s_0, s_1, s_2, \ldots$, it is easy to retrieve the original secret $s$.
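The identity behind (9) can be verified on a toy instance. The sketch below is an illustration only: the helper name `peel_lsb` and all parameter values are assumptions, and the secret is known here solely so that the result can be checked. It builds $z = sA + e$, removes the (correctly guessed) least significant bits $s_0$, and checks that $\hat{z} = s'\hat{A} + e \pmod q$ with $\hat{A} = 2A$.

```python
import random


def peel_lsb(s0, A, z, q):
    """Given guessed bits s0, return (A_hat, z_hat) = (2A, z - s0*A) modulo q."""
    n, m = len(A), len(z)
    A_hat = [[(2 * A[i][j]) % q for j in range(m)] for i in range(n)]
    z_hat = [(z[j] - sum(s0[i] * A[i][j] for i in range(n))) % q for j in range(m)]
    return A_hat, z_hat


random.seed(0)
q, n, m = 1601, 6, 10                      # toy parameters
s = [random.randint(-3, 3) for _ in range(n)]
A = [[random.randrange(q) for _ in range(m)] for _ in range(n)]
e = [random.randint(-2, 2) for _ in range(m)]
z = [(sum(s[i] * A[i][j] for i in range(n)) + e[j]) % q for j in range(m)]

s0 = [si % 2 for si in s]                            # bits assumed to be guessed correctly
s_prime = [(si - b) // 2 for si, b in zip(s, s0)]    # s = 2*s' + s0
A_hat, z_hat = peel_lsb(s0, A, z, q)

check = [(sum(s_prime[i] * A_hat[i][j] for i in range(n)) + e[j]) % q for j in range(m)]
print(check == z_hat)  # True
```

Note that the noise vector `e` is unchanged by this step, while the entries of `s_prime` are roughly half the size of those of `s`.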

6 Analysis of the Algorithm and Its Complexity

In this section, we describe in detail the newly-proposed algorithm called BKW-FWHT with smooth reduction (BKW-FWHT-SR).


6.1 The Algorithm

The main steps of the new BKW-FWHT-SR algorithm are described in Algorithm 1. We start by changing the distribution of the secret vector with the secret-noise transformation [8], if necessary. The general framework is similar to the coded-BKW with sieving procedure proposed in [21]. In our implementation, we instantiated the coded-BKW with sieving steps with the smooth-LMS steps discussed before, for ease of implementation. What differs in the new algorithm is that after certain reduction steps, we perform a multiplication by 2 to each reduced sample as described in Sect. 5. We then continue reducing the remaining positions and perform the mod 2 operations to transform the entire system to the binary case. Now we obtain a list of LPN samples and solve the corresponding LPN instance via known techniques such as FWHT or partial guessing.

At a high level, we aim to input an LWE instance to the LWE-to-LPN transform developed in Sect. 5, and solve the instance by using a solver for LPN. To optimize the performance, we first perform some reduction steps to have a new LWE instance with reduced dimension but larger noise. We then feed the obtained instance to the LWE-to-LPN transform.

6.2 The Complexity of Each Step

From now on we assume that the secret is already distributed as the noise distribution or that the secret-noise transform is performed. We use the LF2 heuristics and assume that the sample size is unchanged before and after each reduction step. We now start with smooth-plain BKW steps and let $l_{red}$ be the number of positions already reduced.

Smooth-Plain BKW Steps. Given $m$ initial samples, we could on average have $2m/3$ categories (see footnote 4) for one plain BKW step in the LF2 setting. Instead we could assume $2^{b_0}$ categories, and thus the number of samples $m$ is $1.5 \cdot 2^{b_0}$. Let $C_{pbkw}$ be the cost of all smooth-plain BKW steps, whose initial value is set to be 0. If a step starts with a position never being reduced before, we can reduce $l_p$ positions, where

$$l_p = \left\lfloor \frac{b}{\log_2(q)} \right\rfloor.$$

Otherwise, when the first position is partially reduced in the previous step and we need $\beta'$ categories to further reduce this position, we can in total fully reduce $l_p$ positions, where

$$l_p = 1 + \left\lfloor \frac{b - \log_2(\beta')}{\log_2(q)} \right\rfloor.$$

For this smooth-plain BKW step, we compute

$$C_{pbkw} \mathrel{+}= (n + 1 - l_{red}) \cdot m + C_{d,pbkw},$$

where $C_{d,pbkw} = m$ is the cost of modulus switching for the last partially reduced position in this step. We then update the number of the reduced positions, $l_{red} \mathrel{+}= l_p$.

Footnote 4. The number of categories is doubled compared with the LF2 setting for LPN. The difference is that we could add and subtract samples for LWE.

After iterating $t_1$ times, we have computed $C_{pbkw}$ and $l_{red}$. We will continue updating $l_{red}$ and denote by $n_{pbkw}$ the length reduced by the smooth-plain BKW steps.

Smooth-LMS Steps Before the Multiplication of 2. We assume that the final noise contribution from each position reduced by LMS is similar, bounded by a preset value $\sigma_{set}$. Since the noise variable generated in the $i$-th ($0 \le i \le t_2 - 1$) smooth-LMS step will be added $2^{t_2+t_3-i}$ times and also be multiplied by 2, we compute

$$\sigma_{set}^2 = \frac{2^{t_2+t_3-i} \times 4\,C_{i,LMS1}^2}{12},$$

where $C_{i,LMS1}$ is the length of the interval after the LMS reduction in this step. We use $\beta_{i,LMS1}$ categories for one position, where $\beta_{i,LMS1} = \lceil q / C_{i,LMS1} \rceil$. Similar to smooth-plain BKW steps, if this step starts with a new position, we can reduce $l_p$ positions, where

$$l_p = \left\lfloor \frac{b}{\log_2(\beta_{i,LMS1})} \right\rfloor.$$

Otherwise, when the first position is partially reduced in the previous step and we need $\beta'_{p,i,LMS1}$ categories to further reduce this position, we can in total fully reduce $l_p$ positions, where

$$l_p = 1 + \left\lfloor \frac{b - \log_2(\beta'_{p,i,LMS1})}{\log_2(\beta_{i,LMS1})} \right\rfloor.$$

Let $C_{LMS1}$ be the cost of smooth-LMS steps before the multiplication of 2, which is initialized to 0. For this step, we compute

$$C_{LMS1} \mathrel{+}= (n + 1 - l_{red}) \cdot m,$$

and then update the number of the reduced positions, $l_{red} \mathrel{+}= l_p$. After iterating $t_2$ times, we have computed $C_{LMS1}$ and $l_{red}$. We expect that $l_{red} = n - n_t$ ($n_t \le n_{limit}$) positions have been fully reduced and will continue updating $l_{red}$.

Smooth-LMS Steps After the Multiplication of 2. The formulas are similar to those for smooth-LMS steps before the multiplication of 2. The difference is that the noise term is no longer multiplied by 2, so we have

$$\sigma_{set}^2 = \frac{2^{t_3-i}\,C_{i,LMS2}^2}{12},$$

for $0 \le i \le t_3 - 1$. Also, we need to track the $a$ vector of length $n_t$ for the later distinguisher. The cost is

$$C_{LMS2} = t_3 \cdot (n_t + 1) \cdot m.$$

We also need to count the cost for multiplying samples by 2 and the mod 2 operations, and the LMS decoding cost, which are

$$C_{mulMod} = 2 \cdot (n_t + 1) \cdot m, \qquad C_{dec} = (n - n_{pbkw} + t_2 + t_3) \cdot m.$$

FWHT Distinguisher and Partial Guessing. After the LWE-to-LPN transformation, we have an LPN problem with dimension $n_t$ and $m$ instances. We perform partial guessing on $n_{guess}$ positions, and use the FWHT to recover the remaining $n_{FWHT} = n_t - n_{guess}$ positions. The cost is

$$C_{distin} = 2^{n_{guess}} \cdot \left((n_{guess} + 1) \cdot m + n_{FWHT} \cdot 2^{n_{FWHT}}\right).$$
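The combination of partial guessing and the FWHT can be sketched on a toy LPN instance. The code below is a minimal illustration, not the authors' C implementation: the function names, the hard-decision weights $(-1)^{z_j}$ (no soft information) and all parameter values are assumptions. For each of the $2^{n_{guess}}$ guesses it corrects the $m$ samples, fills the table $X_k$, runs one FWHT of size $2^{n_{FWHT}}$ and keeps the guess with the largest $|\hat{X}_w|$, which mirrors the structure of the cost expression for the distinguisher.

```python
import random
from itertools import product


def fwht(x):
    """Unnormalized Walsh-Hadamard transform of a list of length 2^n."""
    x = list(x)
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            for j in range(i, i + h):
                x[j], x[j + h] = x[j] + x[j + h], x[j] - x[j + h]
        h *= 2
    return x


def bits_to_int(bits):
    """Interpret a list of bits (least significant first) as an integer."""
    return sum(b << i for i, b in enumerate(bits))


def hybrid_guess(samples, n_guess, n_fwht):
    """Brute-force the first n_guess secret bits and use the FWHT for the rest.

    samples is a list of (a, z) with a a list of n_guess + n_fwht bits and z a bit.
    """
    best = (-1, None, None)
    for g in product((0, 1), repeat=n_guess):
        X = [0] * (1 << n_fwht)
        for a, z in samples:
            # remove the contribution of the guessed positions
            zg = (z + sum(gi * ai for gi, ai in zip(g, a[:n_guess]))) % 2
            X[bits_to_int(a[n_guess:])] += 1 - 2 * zg  # adds (-1)^zg
        Xh = fwht(X)
        w = max(range(len(Xh)), key=lambda i: abs(Xh[i]))
        if abs(Xh[w]) > best[0]:
            best = (abs(Xh[w]), g, w)
    _, g, w = best
    return list(g) + [(w >> i) & 1 for i in range(n_fwht)]


random.seed(3)
n_guess, n_fwht, m, p = 2, 6, 2000, 0.1   # toy sizes; p is the bit error rate
secret = [random.randint(0, 1) for _ in range(n_guess + n_fwht)]
samples = []
for _ in range(m):
    a = [random.randint(0, 1) for _ in range(n_guess + n_fwht)]
    noise = 1 if random.random() < p else 0
    z = (sum(si * ai for si, ai in zip(secret, a)) + noise) % 2
    samples.append((a, z))

print(hybrid_guess(samples, n_guess, n_fwht) == secret)
```

Using soft received information would only change the weight added to `X`, from $\pm 1$ to $\pm\epsilon_{z_j}$ taken from a pre-computed table.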

6.3 The Data Complexity

We now discuss the data complexity of the new FWHT distinguisher. In the integer form, we have the following equation,

$$z_j = \sum_{i=0}^{n_t-1} s_i a_{ij} + 2E_j + k_j \cdot q.$$

If $|\sum_i s_i a_{ij} + 2E_j| < q/2$ then $k_j = 0$. Then the equation can be reduced mod 2 without error. In general, the error is $e_j = k_j \bmod 2$. We employ a smart FWHT distinguisher with soft received information, as described in Sect. 5. From [29], we know the sample complexity can be approximated as

$$m \approx \frac{4 \ln n_t}{E_{z=t}[D(e_{z=t} \| U_b)]}.$$

Table 1. The comparison between $D(X_{\sigma_f,2q} \| U_{2q})$ and $E_{z=t}[D(e_{z=t} \| U_b)]$

σ_f     q      D(X_{σ_f,2q} || U_{2q})   E_{z=t}[D(e_{z=t} || U_b)]
0.5q    1601   −2.974149                 −2.974995
0.6q    1601   −4.577082                 −4.577116
0.7q    1601   −6.442575                 −6.442576
0.8q    1601   −8.582783                 −8.582783

For different values of $z_j$, the distribution of $e_j$ is different. The maximum bias is achieved when $z_j = 0$. In this sense, we could compute the divergence as

$$E_{z=t}[D(e_{z=t} \| U_b)] = \sum_{t \in \mathbb{Z}_q} \Pr[z = t]\, D(e_{z=t} \| U_b) = \sum_{t \in \mathbb{Z}_q} \Pr[z = t] \left( \sum_{i=0}^{1} \Pr[e_{z=t} = i] \log(2 \cdot \Pr[e_{z=t} = i]) \right),$$

where $e_z$ is the Bernoulli variable conditioned on the value of $z$ and $U_b$ the uniform distribution over the binary field. Following the previous research [4], we approximate the noise $\sum_i s_i a_{ij} + 2E_j$ as discrete Gaussian with standard deviation $\sigma_f$. If $\sigma_f$ is large, the probability $\Pr[z = t]$ is very close to $1/q$. Then, the expectation $E_{z=t,\,t \in \mathbb{Z}_q}[D(e_{z=t} \| U_b)]$ can be approximated as

$$\sum_{t \in \mathbb{Z}_q} \sum_{i=0}^{1} \Pr[z = t] \Pr[e_{z=t} = i] \log(2q \cdot \Pr[e_{z=t} = i, z = t]),$$

i.e., the divergence between a discrete Gaussian with the same standard deviation and a uniform distribution over $2q$, $D(X_{\sigma_f,2q} \| U_{2q})$. We numerically computed that the approximation is rather accurate when the noise is sufficiently large (see Table 1). In conclusion, we use the formula

$$m \approx \frac{4 \ln n_t}{D(X_{\sigma_f,2q} \| U_{2q})}$$

to estimate the data complexity of the new distinguisher. It remains to control the overall variance $\sigma_f^2$. Since we assume that the noise contribution from each position reduced by LMS is the same and the multiplication by 2 will double the standard deviation, we can derive

$$\sigma_f^2 = 4 \cdot 2^{t_1+t_2+t_3} \sigma^2 + \sigma^2 \sigma_{set}^2 (n - n_{pbkw}).$$

Note: The final noise is a combination of three parts: the noise from the LWE problem, the LMS steps before the multiplication by 2, and the LMS steps after the multiplication by 2. The final partial key recovery problem is equivalent to distinguishing a discrete Gaussian from uniform with the alphabet size doubled. We see that with the multiplication by 2, the variances of the first and the second noise parts are increased by a factor of 4, but the last noise part does not expand. This intuitively explains the gain of the new binary distinguisher.

6.4 In Summary

We have the following theorem to estimate the complexity of the attack.

Theorem 1. The time complexity of the new algorithm is

$$C = C_{pbkw} + C_{LMS1} + C_{LMS2} + C_{dec} + C_{distin} + C_{mulMod},$$

under the condition that

$$m \ge \frac{4 \ln n_t}{D(X_{\sigma_f,2q} \| U_{2q})},$$

where $\sigma_f^2 = 4 \cdot 2^{t_1+t_2+t_3} \sigma^2 + \sigma^2 \sigma_{set}^2 (n - n_{pbkw})$.
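The sample condition of Theorem 1 can be evaluated numerically. The sketch below is an illustration under several assumptions that the text does not spell out: the discrete Gaussian is modelled by weights $\exp(-x^2/(2\sigma_f^2))$ on the integers folded into $\mathbb{Z}_{2q}$, the divergence is the Kullback-Leibler divergence measured in bits, and all function names are hypothetical. Under this reading, the base-2 logarithm of the computed divergence for $\sigma_f = 0.5q$ and $q = 1601$ comes out close to the first entry of Table 1.

```python
import math


def wrapped_gaussian(sigma, N, wraps=6):
    """Discrete Gaussian with standard deviation sigma folded into Z_N."""
    p = []
    for r in range(N):
        p.append(sum(math.exp(-((r + w * N) ** 2) / (2 * sigma ** 2))
                     for w in range(-wraps, wraps + 1)))
    total = sum(p)
    return [x / total for x in p]


def divergence_from_uniform(p, base=2.0):
    """Kullback-Leibler divergence D(p || U_N) for a distribution p on N points."""
    N = len(p)
    return sum(x * math.log(x * N, base) for x in p if x > 0)


def required_samples(n_t, sigma_f, q):
    """Right-hand side of the condition m >= 4 ln(n_t) / D(X || U_2q)."""
    D = divergence_from_uniform(wrapped_gaussian(sigma_f, 2 * q))
    return 4 * math.log(n_t) / D


q = 1601
for c in (0.5, 0.6, 0.7, 0.8):
    D = divergence_from_uniform(wrapped_gaussian(c * q, 2 * q))
    print(c, round(math.log2(D), 3))

# hypothetical example: n_t = 30 remaining positions and sigma_f = 0.6 q
print(math.log2(required_samples(30, 0.6 * q, q)))
```

The rapid decrease of the divergence with growing $\sigma_f$ is what makes the control of the final noise variance the central constraint on the parameters.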

6.5 Numerical Estimation

We numerically estimate the complexity of the new algorithm BKW-FWHT-SR (shown in Table 2). It improves on the known approaches when the noise rate (represented by α) becomes larger. We should note that, compared with the previous BKW-type algorithms, the implementation is much easier, though the complexity gain might be mild.
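Part of such a numerical estimate consists of closed-form terms that can be evaluated directly. The sketch below covers only the terms given in closed form in Sect. 6.2 and the final noise variance from Sect. 6.3; the iterative terms $C_{pbkw}$ and $C_{LMS1}$ depend on how $l_{red}$ evolves and are left out. Function names and the example parameter values are hypothetical and are not the parameters behind Table 2.

```python
import math


def c_lms2(t3, n_t, m):
    """Cost of the smooth-LMS steps after the multiplication by 2."""
    return t3 * (n_t + 1) * m


def c_mul_mod(n_t, m):
    """Cost of multiplying the samples by 2 and of the mod 2 operations."""
    return 2 * (n_t + 1) * m


def c_dec(n, n_pbkw, t2, t3, m):
    """LMS decoding cost."""
    return (n - n_pbkw + t2 + t3) * m


def c_distin(n_guess, n_fwht, m):
    """Cost of partial guessing combined with the FWHT."""
    return 2 ** n_guess * ((n_guess + 1) * m + n_fwht * 2 ** n_fwht)


def sigma_f_squared(t1, t2, t3, sigma, sigma_set, n, n_pbkw):
    """Final noise variance: 4 * 2^(t1+t2+t3) * sigma^2 + sigma^2 * sigma_set^2 * (n - n_pbkw)."""
    return 4 * 2 ** (t1 + t2 + t3) * sigma ** 2 + sigma ** 2 * sigma_set ** 2 * (n - n_pbkw)


# hypothetical parameter choice, for illustration only
n, n_pbkw, n_t, n_guess = 40, 18, 22, 1
t2, t3, m = 2, 3, 40_000_000
partial = (c_lms2(t3, n_t, m) + c_mul_mod(n_t, m) + c_dec(n, n_pbkw, t2, t3, m)
           + c_distin(n_guess, n_t - n_guess, m))
print(math.log2(partial))  # log2 of the closed-form part of the cost
```

A full estimate would add the accumulated $C_{pbkw}$ and $C_{LMS1}$ and check the sample condition of Theorem 1 for the resulting $\sigma_f$.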

7 A New BKW Algorithm Implementation for Large LWE Problem Instances

We have a new implementation of the BKW algorithm that is able to handle very large LWE problem instances. The code is written in C, and much care has been taken to be able to handle instances with a large number of samples.

Table 2. Estimated time complexity comparison (in log2(·)) for solving LWE instances in the TU Darmstadt LWE challenge [2]. Here an unlimited number of samples is assumed. The last columns show the complexity estimation from the LWE estimator [7]. "ENU" means that the enumeration cost model is employed and "Sieve" that the sieving cost model is used. In the original table, bold-faced numbers are the smallest among the estimations with these different approaches (bold face is not reproduced here).

                                              LWE estimator [7]
                      BKW-      Coded-     usvp            dec             dual
n     q       α       FWHT-SR   BKW        ENU     Sieve   ENU     Sieve   ENU     Sieve
40    1601    0.005   34.4      42.6       31.4    41.5    34.7    44.6    39.1    47.5
              0.010   39.3      43.7       34.0    44.8    36.3    44.9    51.1    57.9
              0.015   42.4      52.6       42.5    54.2    43.1    50.6    61.5    64.4
              0.020   46.2      52.6       ∞       ∞       51.9    58.2    73.1    75.9
              0.025   48.3      52.7       ∞       ∞       59.2    66.1    84.7    85.4
              0.030   50.0      52.7       ∞       ∞       67.1    68.9    96.3    92.5
45    2027    0.005   37.7      55.2       31.8    41.9    35.0    44.8    41.5    51.6
              0.010   43.5      55.2       39.5    51.2    41.2    48.2    57.0    64.6
              0.015   48.3      55.2       50.4    61.3    51.2    58.3    74.3    74.9
              0.020   51.2      55.2       ∞       ∞       61.1    65.0    86.8    86.1
              0.025   54.1      55.3       ∞       ∞       71.0    71.4    100.7   95.0
              0.030   56.3      64.1       ∞       ∞       80.2    78.7    116.2   104.1
50    2503    0.005   41.8      46.4       32.4    42.6    35.5    45.1    46.7    58.0
              0.010   48.7      56.0       46.0    57.5    47.6    54.1    66.8    65.4
              0.015   52.5      56.8       ∞       ∞       60.8    63.6    84.9    83.5
              0.020   56.4      61.9       ∞       ∞       72.1    72.1    101.9   96.5
              0.025   59.3      66.1       ∞       ∞       83.5    80.8    120.0   105.7
              0.030   63.3      66.3       ∞       ∞       94.2    89.1    134.0   115.6
70    4903    0.005   58.3      62.3       52.3    54.2    55.2    63.3    76.2    75.9
              0.010   67.1      73.7       ∞       ∞       80.4    77.1    111.3   98.9
              0.015   73.3      75.6       ∞       ∞       102.5   93.2    146.0   118.0
120   14401   0.005   100.1     110.5      133.0   93.2    135.5   111.4   181.9   133.2
              0.010   115.1     124.0      ∞       ∞       195.0   150.4   266.2   165.7
              0.015   127.0     136.8      ∞       ∞       246.4   183.2   334.0   209.8

A key success factor in the software design was to avoid unnecessary reliance on RAM, so we have employed file-based storage where necessary and practically possible. The implementation includes most known BKW reduction steps, FFT and FWHT-based guessing methods, and hybrid guessing approaches. For our experiments, presented in Sect. 8, we assembled a machine with an ASUS PRIME X399-A motherboard, a 4.0 GHz Ryzen Threadripper 1950X processor and 128 GiB of 2666 MHz DDR4 RAM. While the machine was built from standard parts with a limited budget, we have primarily attempted to maximize the amount of RAM and the size and read/write speeds of the fast SSDs for overall ability to solve large LWE problem instances. We will make the implementation available as an open source repository. We describe below how we dealt with some interesting performance issues.

File-Based Sample Storage. The implementation does not assume that all samples can be stored in RAM, so instead they are stored on file in a special way. Samples are stored sorted into their respective categories. For simplicity, we have opted for a fixed maximum number of samples per category. The categories are stored sequentially on file, each containing its respective samples (possibly leaving some space if the categories are not full). A category mapping, unique for each reduction type, defines what category index a given sample belongs to (see footnote 5).

Optional Sample Amplification. We support optional sample amplification. That is, if a problem instance has a limited number of initial samples (e.g., the Darmstadt LWE challenge), then it is possible to combine several of these to produce new samples (more, but with higher noise). While this is very straightforward in theory, we have noticed considerable performance effects when this recombination is performed naïvely. For example, combining triplets of initial samples using a nested loop is problematic in practice for some instances, since some initial samples become over-represented: some samples are used more often than others when implemented this way. We have solved this by using a Linear Feedback Shift Register to efficiently and pseudo-randomly distribute the selection of initial samples more evenly.

Employing Meta-Categories. For some LWE problem instances, using a very high number of categories with few samples in each is a good option. This can be problematic to handle in an implementation, but we have used meta-categories to handle this situation. For example, using plain BKW reduction steps with modulus $q$ and three positions, we end up with $q^3$ different categories. With $q$ large, an option is to use only two out of the three position values in a vector to first map it into one out of $q^2$ different meta-categories. When processing the (meta-)categories, one then needs an additional pre-processing in the form of a sorting step in order to divide the samples into their respective (non-meta) categories (based on all three position values), before proceeding as per usual. We have used this implementation trick to, for example, implement plain BKW reduction for three positions. One may think of the process as brute-forcing one out of three positions in the reduction step.
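The meta-category idea for three positions can be sketched in a few lines. The sketch is a simplified in-memory illustration, not the file-based C implementation: the function names are hypothetical, samples are represented only by their three position values, and the merging of a category with its adjacent category (see footnote 5) is ignored.

```python
from collections import defaultdict


def meta_category(a, q):
    """Index in [0, q^2) built from the first two of the three position values."""
    return (a[0] % q) * q + (a[1] % q)


def group_by_meta_category(samples, q):
    """First pass: distribute samples over the q^2 meta-categories."""
    groups = defaultdict(list)
    for a in samples:
        groups[meta_category(a, q)].append(a)
    return groups


def split_into_categories(meta_group, q):
    """Second pass: sort one meta-category on the third position and cut it
    into the full categories defined by all three position values."""
    ordered = sorted(meta_group, key=lambda a: a[2] % q)
    categories, current, last = [], [], None
    for a in ordered:
        key = a[2] % q
        if key != last and current:
            categories.append(current)
            current = []
        current.append(a)
        last = key
    if current:
        categories.append(current)
    return categories


q = 5  # toy modulus
samples = [(1, 2, 3), (1, 2, 4), (6, 7, 8), (0, 0, 1), (1, 2, 3), (5, 5, 6)]
for index, group in sorted(group_by_meta_category(samples, q).items()):
    print(index, split_into_categories(group, q))
```

Only $q^2$ buckets have to be addressed at any time, and the sort inside each bucket plays the role of brute-forcing the third position.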

Footnote 5. In this section a category is defined slightly differently from the rest of the paper. A category together with its adjacent category are together what we simply refer to as a category in the rest of the paper.


Secret Guessing with FFT and FWHT. The same brute-forcing techniques are also useful for speeding up the guessing part of the solver. We have used this to improve the FFT and FWHT solvers in the corresponding way. For the FWHT case, if the number of positions to guess is too large for the RAM to handle, we leave some of them to brute-force. This case differs from the above by the fact that binary positions are brute-forced (so more positions can be handled) and that the corresponding entries in the samples must be reduced.

8 Experimental Results

In this section we report the experimental results obtained in solving some LWE problems. Our main goal was to confirm our theory and to prove that BKW algorithms can be used in practice to solve relatively large instances. Therefore, there is still room to run a more optimized code (for example, we did not use any parallelization in our experiments) and to make more optimal parameter choices (we generally used more samples than required and no brute-force guessing techniques were used). We considered two different scenarios. In the first case, we assumed for each LWE instance to have access to an arbitrarily large number of samples. Here we create the desired amount of samples ourselves (see footnote 6). In the second case, we considered instances with a limited number of samples. An LWE problem is "solved" when the binary secret is correctly guessed, for the reasons explained in Sect. 5.3.

Footnote 6. We used rounded Gaussian noise for simplicity of implementation.

Unlimited Number of Samples. We targeted the parameter choices of the TU Darmstadt challenges [2]. For each instance, we generated as many initial samples as needed according to our estimations. In Table 3 we report the details of the largest solved instances. Moreover, in Example 2 we present our parameter choices for one of these.

Table 3. Experimental results on target parameters.

n     q      α       Number of samples   Running time
40    1601   0.005   45 million          12 min
40    1601   0.01    1.6 billion         12 h
45    2027   0.005   1.1 billion         13 h

Example 2. Let us consider an LWE instance with $n = 40$, $q = 1601$ and $\sigma = 0.005 \cdot q$. To successfully guess the secret, we first performed 8 smooth-plain BKW steps reducing 18 positions to zero. We used the following parameters:

$$n_i = 2, \quad C_i = 1, \quad \text{for } i = 1, \ldots, 8,$$
$$(C'_1, C'_2, C'_3, C'_4, C'_5, C'_6, C'_7, C'_8) = (165, 30, 6, 1, 165, 30, 6, 1).$$

Note that $C'_4 = C'_8 = 1$. In this way, we exploited the smoothness to zero 9 positions every 4 steps. For this reason, we start steps 5 and 9 by skipping one position. Finally, we did 5 smooth-LMS steps using the following parameters:

$$(n_9, n_{10}, n_{11}, n_{12}, n_{13}) = (3, 4, 4, 5, 6),$$
$$(C_9, C_{10}, C_{11}, C_{12}, C_{13}) = (17, 24, 34, 46, 66),$$
$$(C'_9, C'_{10}, C'_{11}, C'_{12}) = (46, 66, 23, 81).$$

These parameters are chosen in such a way that the number of categories within each step is ≈13M and $C_i \approx \sqrt{2}\, C_{i-1}$. We used ≈40M samples in each step so that each category contained 3 samples on average. This way we are guaranteed to have enough samples in each step.

Limited Number of Samples. As a proof-of-concept, we solved the original TU Darmstadt LWE challenge instance [2] with parameters $n = 40$, $\alpha = 0.005$ and the number of samples limited to $m = 1600$. We did this by sample amplifying with triples of samples, taking 7 steps of smooth-plain BKW on 17 entries and 5 steps of smooth-LMS on 22 entries, while 1 position was left to brute-force. The overall running time was 3 h and 39 min.

9 Conclusions and Future Work

We introduced a novel and easy approach to implement the BKW reduction step which allows balancing the complexity among the iterations, and an FWHT-based guessing procedure able to correctly guess the secret with relatively large noise level. Together with a file-based approach of storing samples, the above define a new BKW algorithm specifically designed to solve practical LWE instances. We leave optimization of the implementation, including parallelization, for future work.

Acknowledgements. This work was supported in part by the Swedish Research Council (Grants No. 2015–04528 and 2019–04166), the Norwegian Research Council (Grant No. 247742/070), and the Swedish Foundation for Strategic Research (Grant No. RIT17-0005 and strategic mobility grant No. SM17-0062). This work was also partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation.

References

1. NIST Post-Quantum Cryptography Standardization. https://csrc.nist.gov/Projects/Post-Quantum-Cryptography/Post-Quantum-Cryptography-Standardization. Accessed 24 Sep 2018
2. TU Darmstadt Learning with Errors Challenge. https://www.latticechallenge.org/lwe_challenge/challenge.php. Accessed 01 May 2020
3. Albrecht, M., Cid, C., Faugère, J.-C., Fitzpatrick, R., Perret, L.: On the complexity of the Arora-Ge algorithm against LWE (2012)
4. Albrecht, M.R., Cid, C., Faugère, J.-C., Fitzpatrick, R., Perret, L.: On the complexity of the BKW algorithm on LWE. Des. Codes Crypt. 74(2), 325–354 (2013). https://doi.org/10.1007/s10623-013-9864-x
5. Albrecht, M.R., Ducas, L., Herold, G., Kirshanova, E., Postlethwaite, E.W., Stevens, M.: The general sieve kernel and new records in lattice reduction. In: Ishai, Y., Rijmen, V. (eds.) EUROCRYPT 2019. LNCS, vol. 11477, pp. 717–746. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-17656-3_25
6. Albrecht, M.R., Faugère, J.-C., Fitzpatrick, R., Perret, L.: Lazy modulus switching for the BKW algorithm on LWE. In: Krawczyk, H. (ed.) PKC 2014. LNCS, vol. 8383, pp. 429–445. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54631-0_25
7. Albrecht, M.R., Player, R., Scott, S.: On the concrete hardness of learning with errors. J. Math. Crypt. 9(3), 169–203 (2015)
8. Applebaum, B., Cash, D., Peikert, C., Sahai, A.: Fast cryptographic primitives and circular-secure encryption based on hard learning problems. In: Halevi, S. (ed.) CRYPTO 2009. LNCS, vol. 5677, pp. 595–618. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03356-8_35
9. Arora, S., Ge, R.: New algorithms for learning in presence of errors. In: Aceto, L., Henzinger, M., Sgall, J. (eds.) ICALP 2011. LNCS, vol. 6755, pp. 403–415. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22006-7_34
10. Baignères, T., Junod, P., Vaudenay, S.: How far can we go beyond linear cryptanalysis? In: Lee, P.J. (ed.) ASIACRYPT 2004. LNCS, vol. 3329, pp. 432–450. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-30539-2_31
11. Becker, A., Ducas, L., Gama, N., Laarhoven, T.: New directions in nearest neighbor searching with applications to lattice sieving. In: Krauthgamer, R. (ed.) 27th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 10–24. ACM-SIAM, Arlington, VA, USA, 10–12 January 2016
12. Blum, A., Furst, M., Kearns, M., Lipton, R.J.: Cryptographic primitives based on hard learning problems. In: Stinson, D.R. (ed.) CRYPTO 1993. LNCS, vol. 773, pp. 278–291. Springer, Heidelberg (1994). https://doi.org/10.1007/3-540-48329-2_24
13. Blum, A., Kalai, A., Wasserman, H.: Noise-tolerant learning, the parity problem, and the statistical query model. In: 32nd Annual ACM Symposium on Theory of Computing, pp. 435–440. ACM Press, Portland, OR, USA, 21–23 May 2000
14. Chose, P., Joux, A., Mitton, M.: Fast correlation attacks: an algorithmic point of view. In: Knudsen, L.R. (ed.) EUROCRYPT 2002. LNCS, vol. 2332, pp. 209–221. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-46035-7_14
15. Delaplace, C., Esser, A., May, A.: Improved low-memory subset sum and LPN algorithms via multiple collisions. In: Albrecht, M. (ed.) IMACC 2019. LNCS, vol. 11929, pp. 178–199. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-35199-1_9
16. Duc, A., Tramèr, F., Vaudenay, S.: Better algorithms for LWE and LWR. In: Oswald, E., Fischlin, M. (eds.) EUROCRYPT 2015. LNCS, vol. 9056, pp. 173–202. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46800-5_8
17. Esser, A., Heuer, F., Kübler, R., May, A., Sohler, C.: Dissection-BKW. In: Shacham, H., Boldyreva, A. (eds.) CRYPTO 2018. LNCS, vol. 10992, pp. 638–666. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-96881-0_22
18. Esser, A., Kübler, R., May, A.: LPN decoded. In: Katz, J., Shacham, H. (eds.) CRYPTO 2017. LNCS, vol. 10402, pp. 486–514. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63715-0_17
19. Guo, Q., Johansson, T., Löndahl, C.: Solving LPN using covering codes. In: Sarkar, P., Iwata, T. (eds.) ASIACRYPT 2014. LNCS, vol. 8873, pp. 1–20. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-45611-8_1
20. Guo, Q., Johansson, T., Löndahl, C.: Solving LPN using covering codes. J. Cryptol. 33(1), 1–33 (2020)
21. Guo, Q., Johansson, T., Mårtensson, E., Stankovski, P.: Coded-BKW with sieving. In: Takagi, T., Peyrin, T. (eds.) ASIACRYPT 2017. LNCS, vol. 10624, pp. 323–346. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-70694-8_12
22. Guo, Q., Johansson, T., Mårtensson, E., Stankovski Wagner, P.: On the asymptotics of solving the LWE problem using coded-BKW with sieving. IEEE Trans. Inf. Theory 65(8), 5243–5259 (2019)
23. Guo, Q., Johansson, T., Stankovski, P.: Coded-BKW: solving LWE using lattice codes. In: Gennaro, R., Robshaw, M. (eds.) CRYPTO 2015. LNCS, vol. 9215, pp. 23–42. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-47989-6_2
24. Herold, G., Kirshanova, E., May, A.: On the asymptotic complexity of solving LWE. Des. Codes Crypt. 86(1), 55–83 (2017). https://doi.org/10.1007/s10623-016-0326-0
25. Kirchner, P.: Improved generalized birthday attack. Cryptology ePrint Archive, Report 2011/377 (2011). http://eprint.iacr.org/2011/377
26. Kirchner, P., Fouque, P.-A.: An improved BKW algorithm for LWE with applications to cryptography and lattices. In: Gennaro, R., Robshaw, M. (eds.) CRYPTO 2015. LNCS, vol. 9215, pp. 43–62. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-47989-6_3
27. Levieil, É., Fouque, P.-A.: An improved LPN algorithm. In: De Prisco, R., Yung, M. (eds.) SCN 2006. LNCS, vol. 4116, pp. 348–359. Springer, Heidelberg (2006). https://doi.org/10.1007/11832072_24
28. Lindner, R., Peikert, C.: Better key sizes (and attacks) for LWE-based encryption. In: Kiayias, A. (ed.) CT-RSA 2011. LNCS, vol. 6558, pp. 319–339. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-19074-2_21
29. Lu, Y., Meier, W., Vaudenay, S.: The conditional correlation attack: a practical attack on Bluetooth encryption. In: Shoup, V. (ed.) CRYPTO 2005. LNCS, vol. 3621, pp. 97–117. Springer, Heidelberg (2005). https://doi.org/10.1007/11535218_7
30. Mårtensson, E.: The asymptotic complexity of coded-BKW with sieving using increasing reduction factors. In: 2019 IEEE International Symposium on Information Theory (ISIT), pp. 2579–2583 (2019)
31. Meier, W., Staffelbach, O.: Fast correlation attacks on certain stream ciphers. J. Cryptol. 1(3), 159–176 (1988). https://doi.org/10.1007/BF02252874
32. Mulder, E.D., Hutter, M., Marson, M.E., Pearson, P.: Using Bleichenbacher's solution to the hidden number problem to attack nonce leaks in 384-bit ECDSA: extended version. J. Cryptographic Eng. 4(1), 33–45 (2014)
33. Regev, O.: On lattices, learning with errors, random linear codes, and cryptography. In: Gabow, H.N., Fagin, R. (eds.) 37th Annual ACM Symposium on Theory of Computing, pp. 84–93. ACM Press, Baltimore, MA, USA, 22–24 May 2005
34. Shor, P.W.: Algorithms for quantum computation: discrete logarithms and factoring. In: 35th Annual Symposium on Foundations of Computer Science, pp. 124–134. IEEE Computer Society Press, Santa Fe, New Mexico, 20–22 November 1994

On a Dual/Hybrid Approach to Small Secret LWE
A Dual/Enumeration Technique for Learning with Errors and Application to Security Estimates of FHE Schemes

Thomas Espitau (1), Antoine Joux (2,3), and Natalia Kharchenko (4)

1 NTT Corporation, Tokyo, Japan
2 Institut de Mathématiques de Jussieu–Paris Rive Gauche, CNRS, INRIA, Univ Paris Diderot, Paris, France
3 CISPA Helmholtz Center for Information Security, Saarbrücken, Germany
4 Sorbonne Université, LIP 6, CNRS UMR 7606, Paris, France

Abstract. In this paper, we investigate the security of the Learning With Errors (LWE) problem with small secrets by refining and improving the so-called dual lattice attack. More precisely, we use the dual attack on a projected sublattice, which allows generating instances of the LWE problem with a slightly bigger noise that correspond to a fraction of the secret key. Then, we search for the fraction of the secret key by computing the corresponding noise for each candidate using the newly constructed LWE samples. As secrets are small, we can perform the search step very efficiently by exploiting the recursive structure of the search space. This approach offers a trade-off between the cost of lattice reduction and the complexity of the search part, which allows speeding up the attack. Besides, we aim at providing a sound and non-asymptotic analysis of the techniques to enable its use for practical selection of security parameters. As an application, we revisit the security estimates of some fully homomorphic encryption schemes, including the Fast Fully Homomorphic Encryption scheme over the Torus (TFHE), which is one of the fastest homomorphic encryption schemes based on the (Ring-)LWE problem. We provide an estimate of the complexity of our method for various parameters under three different cost models for lattice reduction and show that the security level of the TFHE scheme should be re-evaluated according to the proposed improvement (for at least 7 bits for the most recent update of the parameters that are used in the implementation).

1 Introduction

The Learning With Errors (LWE) problem was introduced by Regev [Reg05] in 2005. A key advantage of LWE is that it is provably as hard as certain lattice approximation problems in the worst case [BLP+13], which are believed to be hard even on a quantum computer. The LWE problem has been a rich source of cryptographic constructions. As a first construction, Regev proposed an encryption scheme, but the flexibility of this security assumption proved to be extremely appealing for constructing feature-rich cryptography [GPV08,BV11]. Among these constructions, fully homomorphic encryption (FHE) is a very interesting primitive, as it allows performing arbitrary operations on encrypted data without decrypting it. A first FHE scheme relying on so-called ideal lattices was proposed in a breakthrough work of Gentry [G+09]. After several tweaks and improvements through the years, today's popular approaches to FHE rely on the LWE problem or its variants (e.g. [FV12,GSW13,BGV14,CS15,DM15]).

Informally, when given several samples of the form $(a, \langle a, s \rangle + e \bmod q)$ where $s$ is secret, $a \in \mathbb{Z}_q^n$ is uniform and $e$ is some noise term, the LWE problem is to recover $s$. In its original formulation, the secret vector is sampled uniformly at random from $\mathbb{Z}_q^n$, but more recent LWE-based constructions choose distributions with small entropy for the secret key to increase efficiency. For example, some FHE schemes use binary [DM15,CGGI16], ternary [CLP17], or even ternary sparse secrets [HS15]. Theoretical results support these choices, showing that LWE remains hard even with small secrets [BLP+13]. In practice, however, such distributions can lead to more efficient attacks [BG14,SC19,CHHS19].

The security of a cryptosystem, of course, depends on the complexity of the most efficient known attack against it. In particular, to estimate the security of an LWE-based construction, it is important to know which attack is the best for the parameters used in the construction. This can be a difficult issue; indeed, the survey of existing attacks against LWE given in [APS15] shows that no known attack is the best for all sets of LWE parameters. In this article, we are interested in evaluating the practical security of the LWE problem with such small secrets.

© Springer Nature Switzerland AG 2020. K. Bhargavan et al. (Eds.): INDOCRYPT 2020, LNCS 12578, pp. 440–462, 2020. https://doi.org/10.1007/978-3-030-65277-7_20
As an application, we consider the bit security of several very competitive FHE proposals, such as the Fast Fully Homomorphic Encryption scheme over the Torus [CGGI16,CGGI17,CGGI20], FHEW [DM15], SEAL [LP16], and HElib [HS15]. The security of these constructions relies on the hardness of variants of the LWE problem which can all be encompassed in a scale-invariant version named Torus-LWE. This “learning a character” problem captures both the celebrated LWE and Ring-LWE problems. In the case of TFHE, in [CGGI17], the authors adapted and used the dual distinguishing lattice attack from [Alb17] to evaluate the security of their scheme. Recently, in [CGGI20, Remark 9], the authors proposed an updated set of parameters for their scheme and estimated the security of the new parameters using the LWE estimator from [ACD+18]. It turns out that this approach runs into the caveat described above: the estimator relies on attacks that are not tailored to the peculiar choice of distributions. According to the LWE estimator, the best attack against the current TFHE parameters is the unique-SVP attack [BG14].

1.1 Contributions and Overview of the Techniques

We present our work in the generic context of the so-called scale-invariant LWE problem, or Torus-LWE, which appears to give a more flexible mathematical framework for analyzing the attacks. We aim at extending the use case of the dual lattice attack, which is currently one of the two main techniques used to tackle the LWE problem. Given Torus-LWE samples collected in matrix form $A^t s + e = b \bmod 1$, the vanilla dual attack consists in finding a vector $v$ such that $v^t A^t = 0 \bmod 1$, yielding the equation $v^t e = v^t b \bmod 1$. Since $v^t e$ should be small, we can then distinguish it from a uniformly random element of the torus.

A Refined Analysis of the Dual Attack. First, we introduce a complete non-asymptotic analysis of the standard dual lattice attack (footnote 1) on LWE. In particular, we prove non-asymptotic bounds on the number of samples and the corresponding bit complexity of the method, allowing precise instantiations for parameter selection (footnote 2). To do so, we introduce an unbiased estimator of the Levy transform for real random variables that allows us to sharpen the analysis of the attack. The intuition behind these techniques is the following. The crux of the dual attack is distinguishing the Gaussian distribution modulo 1 from the uniform distribution. Since we are working modulo 1, a natural approach is to tackle the problem by harmonic analysis techniques. Moving to the Fourier space is done by the so-called Levy transform (which is essentially given by $x \mapsto e^{2i\pi x}$). In this space, the Levy transforms of the Gaussian distribution mod 1 and of the full Gaussian distribution coincide, so we effectively get rid of the action of the modulo. Then we use the Berry-Esseen inequality to derive sharper bounds. We hope these techniques may be of independent interest to the community.

A Hybrid Enumeration/Dual Attack. In the second step, we show that applying the dual attack to a projected sublattice and combining it with the search for a fraction of the key can yield a more efficient attack.
It can be considered as a dimension reduction technique, trading the enumeration time against the dimension of the lattice attack. More precisely, we obtain a trade-off between the quality of lattice reduction in the dual attack part and the time subsequently spent in the exhaustive search. Additionally, for lattice reduction algorithms using sieving as an SVP oracle, we demonstrate that the pool of short vectors obtained by the sieving process can be used to amortize the cost of the reduction part. We also discuss possible improvements based on so-called “combinatorial” techniques, where we perform a random guess of the zero positions of the secret if it is sparse enough.

In a word, our attack starts by applying lattice reduction to a projected sublattice in the same way as it is applied to the whole lattice in the dual attack. This way, we generate LWE instances with bigger noise but in smaller dimension, corresponding to a fraction of the secret key. Then, the freshly obtained instances are used to recover the remaining fraction of the secret key. For each candidate for this missing fraction, we compute the noise vector corresponding to the LWE instances obtained at the previous step. This allows us to perform a majority voting procedure to detect the most likely candidates. For small secrets, this step boils down to computing the product of a matrix of LWE samples with the matrix composed of all the possible parts of the secret key that we are searching for. We show that this computation can be performed efficiently thanks to the recursive structure of the corresponding search space.

Footnote 1. We point out that this attack is slightly more subtle than the vanilla dual technique, as it encompasses a continuous relaxation of the lazy modulus switching technique of [Alb17].

Footnote 2. To our knowledge, previous analyses rely on instantiations of asymptotic inequalities and overlook practical applicability.

Applications. In the last part, we estimate the complexity of our attack under three different models of lattice reduction and compare it with the standard dual attack and with the primal unique-SVP attack for a wide range of LWE parameters in the case of small non-sparse secrets. Concerning the comparison with the primal unique-SVP attack, both attacks give quite close results. Our attack is better than uSVP when the dimension and the noise parameter are big; the uSVP attack is better when the dimension is big and the noise parameter is small (see [?]). We also provide experiments in small dimensions, supporting the whole analysis. To evaluate the practicality of our approach, we apply our attack to the security analysis of competitive FHE schemes, namely TFHE [CGGI20], FHEW [DM15], SEAL [LP16], and HElib [HS15]. We show that our hybrid dual attack improves on the unique-SVP and dual techniques of [ACD+18] for the latest parameters of TFHE, FHEW and SEAL. In the case of sparse secrets in the HElib scheme, our attack does not improve on the dual attack from [Alb17], but gives comparable results. The results of our comparison for the TFHE scheme are presented in Table 1.

Table 1.
Security estimates of the parameters of TFHE from [CGGI20, Table 3, Table 4] and from the public implementation [G+16]. n denotes the dimension, α is the parameter of the modular Gaussian distribution. The bold numbers denote the overall security of the scheme for a given set of parameters. The “uSVP” column corresponds to the estimates obtained using the LWE Estimator [ACD+18] for the primal uSVP attack. For the lattice reduction algorithm, in all the cases, the sieving BKZ cost model is used.

Parameters (n, α)                                      Dual [ACD+18,Alb17]   This work   uSVP [ACD+18]

Old param.
  Switching key      n = 500,  α = 2.43 · 10^-5        113                   94          101
  Bootstrapping key  n = 1024, α = 3.73 · 10^-9        125                   112         116

New param.
  Switching key      n = 612,  α = 2^-15               140                   118         123
  Bootstrapping key  n = 1024, α = 2^-26               134                   120         124

Implem. param.
  Switching key      n = 630,  α = 2^-15               144                   121         127
  Bootstrapping key  n = 1024, α = 2^-25               140                   125         129

1.2 Related Work

The survey [APS15] outlines three strategies for attacks against LWE: exhaustive search, the BKW algorithm [BKW03,ACF+15], and lattice reduction. Lattice attacks against LWE can be separated into three categories depending on the lattice used: distinguishing dual attacks [Alb17], decoding (primal) attacks [LP11,LN13], and solving LWE by reducing it to the unique Shortest Vector Problem (uSVP) [AFG13].

The idea of a hybrid lattice reduction attack was introduced by Howgrave-Graham in [HG07]. He proposed to combine a meet-in-the-middle attack with lattice reduction to attack NTRUEncrypt. Then, Buchmann et al. adapted Howgrave-Graham's attack to the setting of LWE with binary error [BGPW16] and showed that the hybrid attack outperforms existing algorithms for some sets of parameters. This attack uses the decoding (primal) strategy for the lattice reduction part. Following these two works, Wunderer provided an improved analysis of the hybrid decoding lattice attack and meet-in-the-middle attack and re-estimated the security of several LWE- and NTRU-based cryptosystems in [Wun16]. Also, very recently, a similar combination of a primal lattice attack and a meet-in-the-middle attack was applied to LWE with ternary and sparse secrets [SC19]. This last reference shows that the hybrid attack can also outperform other attacks in the case of ternary and sparse secrets for parameters typical of FHE schemes.

A combination of the dual lattice attack with guessing a part of the secret key was considered in [Alb17, Section 5], in the context of sparse secret keys. Also, recently, a similar approach was adapted to the case of ternary and sparse keys in [CHHS19]. Both of these articles can be seen as dimension reduction techniques, as they both rely on a guess of a part of the secret to perform the attack in smaller dimension.
They gain in this trade-off by exploiting the sparsity of the secret: guessing the positions of zero coefficients trades positively against the dimension reduction as soon as the secret is sparse enough. However, the main difference between this work and [CHHS19,Alb17] is that the secret is not required to be sparse, so our approach can be considered slightly more general. We trade positively against the dimension gain by exploiting the recursive structure of the small secret space. These techniques are, however, not incompatible. In Sect. 4.4, we propose a combination of the guessing technique with our approach, allowing us to leverage at the same time the sparsity and the structure of small secrets. Overall, this work can be seen as providing a proper dual analog of the enumeration-hybrid technique existing for primal attacks.

Outline. This paper is organized as follows. In Sect. 2, we provide the necessary background on lattice reduction and probability. In Sect. 3, we revisit the dual lattice attack and provide a novel and sharper analysis of this method. In Sect. 4,


we describe our hybrid dual lattice attack and discuss its extension to sparse secrets. In Sect. 5, we compare the complexities of different attacks, revisit the security estimates of TFHE and several other FHE schemes, and provide some experimental evidence supporting our analysis.

2 Background

We use column notation for vectors and denote them using bold lower-case letters (e.g. x). Matrices are denoted using bold upper-case letters (e.g. A). For a vector $x$, $x^t$ denotes the transpose of $x$, i.e., the corresponding row vector. The base-2 logarithm is denoted by log, and the natural logarithm by ln. We denote the set of real numbers modulo 1 as the torus $\mathbb{T}$. For a finite set $S$, we denote by $|S|$ its cardinality and by $U(S)$ the discrete uniform distribution on its elements. For any compact set $S \subset \mathbb{R}^n$, the uniform distribution over $S$ is also denoted by $U(S)$. When $S$ is not specified, $U$ denotes the uniform distribution over $(-0.5; 0.5)$.

2.1 The LWE Problem

Abstractly, all operations of the TFHE scheme are defined on the real torus $\mathbb{T}$, and to estimate the security of the scheme it is convenient to consider a scale-invariant version of the LWE problem.

Definition (Learning with Errors, [BLP+13, Definition 2.11]). Let $n \geq 1$, $s \in \mathbb{Z}^n$, $\xi$ be a distribution over $\mathbb{R}$ and $\mathcal{S}$ be a distribution over $\mathbb{Z}^n$. We define the $\mathrm{LWE}_{s,\xi}$ distribution as the distribution over $\mathbb{T}^n \times \mathbb{T}$ obtained by sampling $a$ from $U(\mathbb{T}^n)$, sampling $e$ from $\xi$ and returning $(a, a^t s + e)$. Given access to outputs from this distribution, we can consider the two following problems:
• Decision-LWE. Distinguish, given arbitrarily many samples, between $U(\mathbb{T}^n \times \mathbb{T})$ and the $\mathrm{LWE}_{s,\xi}$ distribution for a fixed $s$ sampled from $\mathcal{S}$.
• Search-LWE. Given arbitrarily many samples from the $\mathrm{LWE}_{s,\xi}$ distribution with fixed $s \leftarrow \mathcal{S}$, recover the vector $s$.

To complete the description of the LWE problem we need to choose the error distribution $\xi$ and the distribution of the secret key $\mathcal{S}$. Given a finite set of integers $B$, we define $\mathcal{S}$ to be $U(B^n)$ and $\xi$ to be a centered continuous Gaussian distribution, i.e., we consider the LWE problem with small secrets. This definition captures the binary and ternary variants of LWE by choosing $B$ to be respectively $\{0, 1\}$ and $\{-1, 0, 1\}$. In [BLP+13], it is shown that this variation of LWE with small secrets remains hard. Finally, we use the notation $\mathrm{LWE}_{s,\sigma}$ as a shorthand for $\mathrm{LWE}_{s,\xi}$ when $\xi$ is the Gaussian distribution centered at 0 with standard deviation $\sigma$.
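The definition above can be illustrated by a minimal sampling sketch. The function names `centered_mod1` and `sample_torus_lwe`, the toy dimensions and the choice of NumPy are assumptions made for illustration only; torus elements are represented by their centered representative in [-0.5, 0.5].

```python
import numpy as np

def centered_mod1(x):
    """Reduce real numbers modulo 1 to the centered representative in [-0.5, 0.5]."""
    return x - np.round(x)

def sample_torus_lwe(s, sigma, m, rng):
    """Draw m samples from LWE_{s,sigma}; columns of A are the vectors a_i (hypothetical helper)."""
    n = len(s)
    A = rng.uniform(-0.5, 0.5, size=(n, m))   # a_i uniform on the torus T^n
    e = rng.normal(0.0, sigma, size=m)        # centered continuous Gaussian noise
    b = centered_mod1(A.T @ s + e)            # b_i = a_i^t s + e_i mod 1
    return A, b

# Toy parameters (illustrative only, far below cryptographic sizes).
rng = np.random.default_rng(0)
n, m, sigma = 16, 50, 0.01
s = rng.choice([0, 1], size=n)                # binary secret, B = {0, 1}
A, b = sample_torus_lwe(s, sigma, m, rng)

# Knowing s, the noise can be recovered as b - A^t s mod 1.
residual = centered_mod1(b - A.T @ s)
```

With the secret known, `residual` is exactly the noise modulo 1, which is the quantity the attacks below try to make visible without knowing `s`.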

2.2 Lattices

A lattice $\Lambda$ is a discrete subgroup of $\mathbb{R}^d$. As such, a lattice $\Lambda$ of rank $n$ can be described as the set of all integer linear combinations of $n \leq d$ linearly independent vectors $B = \{b_1, \dots, b_n\} \subset \mathbb{R}^d$, called a basis: $\Lambda = \mathcal{L}(B) := \mathbb{Z} b_1 \oplus \cdots \oplus \mathbb{Z} b_n$. Bases are not unique: one lattice basis may be transformed into another one by applying an arbitrary unimodular transformation. The volume of the lattice $\mathrm{vol}(\Lambda)$ is equal to the square root of the determinant of the Gram matrix $B^t B$: $\mathrm{vol}(\Lambda) = \sqrt{\det(B^t B)}$. For every lattice $\Lambda$ we denote the length of its shortest non-zero vector by $\lambda_1(\Lambda)$. Minkowski's theorem states that $\lambda_1(\Lambda) \leq \sqrt{\gamma_d} \cdot \mathrm{vol}(\Lambda)^{1/d}$ for any $d$-dimensional lattice $\Lambda$, where $\gamma_d < d$ is the $d$-dimensional Hermite constant. The problem of finding the shortest non-zero lattice vector is called the Shortest Vector Problem (SVP). It is known to be NP-hard under randomized reductions [Ajt98].

2.3 Lattice Reduction

A lattice reduction algorithm is an algorithm which, given as input some basis of the lattice, finds a basis that consists of relatively short and relatively pairwise-orthogonal vectors. The quality of the basis produced by lattice reduction algorithms is often measured by the Hermite factor $\delta = \frac{\|b_1\|}{\det(\Lambda)^{1/d}}$, where $b_1$ is the first vector of the output basis. Hermite factors bigger than $(4/3)^{n/4}$ can be reached in polynomial time using the LLL algorithm [LLL82]. In order to obtain smaller Hermite factors, blockwise lattice reduction algorithms, like BKZ-2.0 [CN11] or S-DBKZ [MW16], can be used. The BKZ algorithm takes as input a basis of dimension $d$ and proceeds by solving SVP on lattices of dimension $\beta < d$ using sieving [BDGL16] or enumeration [GNR10]. The quality of the output of BKZ depends on the blocksize $\beta$. In [HPS11] it is shown that after a polynomial number of calls to the SVP oracle, the BKZ algorithm with blocksize $\beta$ produces a basis $B$ that achieves the following bound:

$$\|b_1\| \leq 2 \gamma_\beta^{\frac{d-1}{2(\beta-1)} + \frac{3}{2}} \cdot \mathrm{vol}(B)^{1/d}.$$

However, to our knowledge, there is no closed formula that tightly connects the quality and complexity of the BKZ algorithm. In this work, we use the experimental models proposed in [ACF+15,ACD+18] in order to estimate the running time and quality of the output of lattice reduction. They are based on the following two assumptions on the quality and shape of the output of BKZ. The first assumption states that the BKZ algorithm outputs vectors with balanced coordinates, while the second assumption connects the Hermite factor $\delta$ with the chosen blocksize $\beta$.


Assumption 2.1. Given as input a basis $B$ of a $d$-dimensional lattice $\Lambda$, BKZ outputs a vector of norm close to $\delta^d \cdot \det(\Lambda)^{1/d}$ with balanced coordinates. Each coordinate of this vector follows a distribution that can be approximated by a Gaussian with mean 0 and standard deviation $\delta^d \det(\Lambda)^{1/d} / \sqrt{d}$.

Assumption 2.2. BKZ with blocksize $\beta$ achieves Hermite factor

$$\delta = \left( \frac{\beta}{2\pi e} (\pi\beta)^{\frac{1}{\beta}} \right)^{\frac{1}{2(\beta-1)}}.$$

This assumption is experimentally verified in [Che13].

BKZ Cost Models. To estimate the running time of BKZ, we use three different models. The first model is an extrapolation by Albrecht et al. [ACF+15] of the Liu–Nguyen datasets [LN13]. According to that model, the logarithm of the running time of BKZ-2.0 (expressed in bit operations) is a quadratic function of $\log(\delta)^{-1}$:

$$\log(T(\mathrm{BKZ}_\delta)) = \frac{0.009}{\log(\delta)^2} - 27.$$

We further refer to this model as the delta-squared model. The model was used in [CGGI17] to estimate the security of TFHE. Another cost model [ACD+18] assumes that the running time of BKZ with blocksize $\beta$ for a $d$-dimensional basis is $T(\mathrm{BKZ}_{\beta,d}) = 8d \cdot T(\mathrm{SVP}_\beta)$, where $T(\mathrm{SVP}_\beta)$ is the running time of an SVP oracle in dimension $\beta$. For the SVP oracle, we use the following two widely used models:

Sieving model: $T(\mathrm{SVP}_\beta) \approx 2^{0.292\beta + 16.4}$,

Enumeration model: $T(\mathrm{SVP}_\beta) \approx 2^{0.187\beta\log(\beta) - 1.019\beta + 16.1}$.
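The three cost models and Assumption 2.2 translate directly into a few lines of code. The following is a minimal sketch; the function names are hypothetical, and all costs are returned as base-2 logarithms, following the convention that log denotes the base-2 logarithm.

```python
import math

def hermite_delta(beta):
    """Hermite factor delta reached by BKZ with blocksize beta (Assumption 2.2)."""
    base = (beta / (2 * math.pi * math.e)) * (math.pi * beta) ** (1 / beta)
    return base ** (1 / (2 * (beta - 1)))

def log2_cost_delta_squared(delta):
    """Delta-squared model: log T(BKZ_delta) = 0.009 / log(delta)^2 - 27."""
    return 0.009 / math.log2(delta) ** 2 - 27

def log2_cost_sieving(beta, d):
    """T(BKZ_{beta,d}) = 8d * T(SVP_beta) with the sieving SVP oracle."""
    return math.log2(8 * d) + 0.292 * beta + 16.4

def log2_cost_enumeration(beta, d):
    """T(BKZ_{beta,d}) = 8d * T(SVP_beta) with the enumeration SVP oracle."""
    return math.log2(8 * d) + 0.187 * beta * math.log2(beta) - 1.019 * beta + 16.1

# Example evaluation for an illustrative blocksize and dimension.
delta_100 = hermite_delta(100)
sieving_100 = log2_cost_sieving(100, 1000)
```

Working with logarithms avoids overflow for large blocksizes and makes the three models directly comparable.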

Analysing the proof of the sieving algorithm [BDGL16] reveals that around $(4/3)^{n/2}$ short vectors are produced while solving SVP on an $n$-dimensional lattice. Therefore, when using the sieving model, we shall assume that one run of the BKZ routine produces $(4/3)^{\beta/2}$ short lattice vectors, where $\beta$ is the chosen blocksize. As such, we shall rely on the following heuristic, which generalizes the distribution given in Assumption 2.1 when the number of output vectors is small with regard to the number of possible vectors of the desired length:

Assumption 2.3. Let $R \ll \delta^{d^2} V_d$ and $R \leq (4/3)^{\beta/2}$, where $V_d$ is the volume of the $\ell_2$ unit ball in dimension $d$. Given as input a basis $B$ of a $d$-dimensional lattice $\Lambda$, $\mathrm{BKZ}_\beta$ with a sieving oracle as SVP oracle outputs a set of $R$ vectors of norm close to $\delta^d \cdot \det(\Lambda)^{1/d}$ with balanced coordinates. Each coordinate of these vectors follows a distribution that can be approximated by a Gaussian with mean 0 and standard deviation $\delta^d \det(\Lambda)^{1/d} / \sqrt{d}$.

In practice, for the dimensions involved in cryptography, this assumption can be experimentally verified. In particular, for the parameters tackled in this


work, the number of vectors used by the attack is far lower than the number of potential candidates. An experimental verification of this fact is conducted in [?]. In general settings (when we need to look at all the vectors of the sieving pool), one might see this exploitation as a slight underestimate of the resulting security parameters. An interesting open problem, which we leave for future work as it is unrelated to the attacks mounted here, would be to quantify precisely the distribution of the sieved vectors. A related idea seems to be folklore in the lattice reduction community. In particular, in [Alb17], the output basis of the reduction is randomized, for instance with slight enumeration and a pass of LLL. This approach is slightly more costly than just extracting the sieved vectors, but is comparable in its effect as an amortization technique when a batch reduction is needed.

3 Dual Distinguishing Attack Against LWE

In this section, we revisit the distinguishing dual attack against LWE (more precisely, against the generic scale-invariant problem described in [BLP+13,CGGI20]), providing complete proofs and introducing finer tools, such as a novel distinguisher for the uniform distribution and the modular Gaussian. In particular, all the results are non-asymptotic and can be used for practical instantiations of the parameters. Note that the attack also encompasses a continuous relaxation of the lazy modulus switching technique of [Alb17], which arises naturally from the mathematical framework used in the proofs.

Settings. In all of the following, we denote by $B$ a finite set of integers (typically $\{0, 1\}$ or $\{-1, 0, 1\}$). Let $s \in B^n$ be a secret vector and let $\alpha > 0$ be a fixed constant. The attack takes as input $m$ samples $(a_1, b_1), \dots, (a_m, b_m) \in \mathbb{T}^n \times \mathbb{T}$ which are either all from the $\mathrm{LWE}_{s,\alpha}$ distribution or all from $U(\mathbb{T}^n \times \mathbb{T})$, and guesses the input distribution. We can write the input samples in matrix form:

$$A := (a_1, \dots, a_m) \in \mathbb{T}^{n \times m}, \qquad b = (b_1, \dots, b_m)^t \in \mathbb{T}^m.$$

If the input samples are from the $\mathrm{LWE}_{s,\alpha}$ distribution, then $b = A^t s + e \bmod 1$.

Distinguisher Reduction Using a Small Trapdoor. To distinguish between the two distributions, the attack searches for a short vector $v = (v_1, \dots, v_m)^t \in \mathbb{Z}^m$ such that the linear combination of the left parts of the input samples defined by $v$, i.e.

$$x := \sum_{i=1}^{m} v_i a_i = A v \bmod 1,$$

is also a short vector. If the input was from the LWE distribution, then the corresponding linear combination of the right parts of the input samples is also small, as a sum of two relatively small numbers:

$$v^t b = v^t (A^t s + e) = x^t s + v^t e \bmod 1. \qquad (1)$$
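Identity (1) holds modulo 1 for any integer vector v, whether or not it is short. The following minimal sketch checks it numerically on toy data; the helper name `mod1`, the dimensions and the random choice of v are assumptions for illustration, and no lattice reduction is performed, so x and v here are not actually short.

```python
import numpy as np

def mod1(x):
    """Centered reduction modulo 1."""
    return x - np.round(x)

rng = np.random.default_rng(1)
n, m, alpha = 8, 12, 1e-3                 # toy sizes, illustrative only
s = rng.choice([0, 1], size=n)            # binary secret
A = rng.uniform(-0.5, 0.5, size=(n, m))   # columns are the a_i
e = rng.normal(0.0, alpha, size=m)
b = mod1(A.T @ s + e)

v = rng.integers(-3, 4, size=m)           # arbitrary integer combination (not reduced)
x = mod1(A @ v)                           # x = A v mod 1

lhs = v @ b                               # v^t b
rhs = x @ s + v @ e                       # x^t s + v^t e
gap = float(abs(mod1(lhs - rhs)))         # distance on the torus
```

The gap vanishes up to floating-point error because v and s are integer vectors, so reducing b or x modulo 1 only changes both sides by integers.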


On the other hand, if the input is uniformly distributed, then independently of the choice of the non-zero vector $v$, the product $v^t b \bmod 1$ has uniform distribution on $(-1/2; 1/2)$. Recovering a suitable $v$ thus turns the decisional-LWE problem into an easier problem of distinguishing two distributions on $\mathbb{T}$.

The remaining part of this section is organized as follows. First, in Sect. 3.1 we describe how such a suitable vector $v$ can be found by lattice reduction and analyze the distribution of $v^t b$. Then, in Sect. 3.2, we estimate the complexity of distinguishing the two distributions on $\mathbb{T}$ that we obtain after this first part. Finally, Sect. 3.3 estimates the time complexity of the whole attack.

3.1 Trapdoor Construction by Lattice Reduction

Finding a vector $v$ such that both parts of the sum (1) are small when the input has LWE distribution is equivalent to finding a short vector in the following $(m+n)$-dimensional lattice:

$$\mathcal{L}(A) = \left\{ \begin{pmatrix} A v \bmod 1 \\ v \end{pmatrix} \in \mathbb{R}^{m+n}, \ \forall v \in \mathbb{Z}^m \right\}.$$

The lattice $\mathcal{L}(A)$ can be generated by the columns of the following matrix:

$$B = \begin{pmatrix} I_n & A \\ 0_{m \times n} & I_m \end{pmatrix} \in \mathbb{R}^{(m+n) \times (m+n)}.$$

A short vector in $\mathcal{L}(A)$ can be found by applying a lattice reduction algorithm to the basis $B$. Using Assumption 2.1, we expect that the lattice reduction process produces a vector $w = (x \| v)^t \in \mathbb{R}^{n+m}$ with equidistributed coordinates. Our goal is to minimize the product $v^t b = x^t s + v^t e$. The vectors $e$ and $s$ come from different distributions and have different expected norms. For practical schemes, the variance of $e$ is much smaller than the variance of $s$. To take this imbalance into account, one introduces an additional rescaling parameter $q \in \mathbb{R}_{>0}$. The first $n$ rows of the matrix $B$ are multiplied by $q$, the last $m$ rows are multiplied by $q^{-n/m}$. This transformation does not change the determinant of the matrix, since $q^n \cdot (q^{-n/m})^m = 1$. A basis $B_q$ of the transformed lattice is given by

$$B_q = \begin{pmatrix} q I_n & q A \\ 0_{m \times n} & q^{-n/m} I_m \end{pmatrix} \in \mathbb{R}^{(m+n) \times (m+n)}.$$

We apply a lattice reduction algorithm to $B_q$. Denote the first vector of the reduced basis by $w_q$. By taking the last $m$ coordinates of $w_q$ and multiplying them by $q^{n/m}$ we recover the desired vector $v$. This technique can be thought of as a continuous relaxation of the modulus switching technique. That part of the attack is summarized in [?]. The following lemma describes the distribution of the output of [?] under Assumption 2.1 that BKZ outputs vectors with balanced coordinates.
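The construction of the rescaled basis and the recovery of v from a lattice vector can be sketched as follows. The function name `scaled_dual_basis`, the toy sizes and the value of q are assumptions for illustration; the lattice vector is built from a random integer coefficient vector rather than by an actual lattice reduction, which is out of scope for this sketch.

```python
import numpy as np

def scaled_dual_basis(A, q):
    """Columns generate the rescaled lattice: [[q I_n, q A], [0, q^(-n/m) I_m]]."""
    n, m = A.shape
    top = np.hstack([q * np.eye(n), q * A])
    bottom = np.hstack([np.zeros((m, n)), q ** (-n / m) * np.eye(m)])
    return np.vstack([top, bottom])

rng = np.random.default_rng(3)
n, m, q = 4, 6, 3.0                         # toy sizes; q is the rescaling parameter
A = rng.uniform(-0.5, 0.5, size=(n, m))
Bq = scaled_dual_basis(A, q)
det_Bq = float(np.linalg.det(Bq))           # rescaling keeps the determinant equal to 1

# Any lattice vector is Bq @ c for an integer vector c = (k, v).
c = rng.integers(-5, 6, size=n + m)
w = Bq @ c
v_recovered = w[n:] * q ** (n / m)          # undo the scaling of the last m coordinates
x_part = w[:n] / q                          # equals k + A v, i.e. A v modulo 1
x_gap = x_part - A @ c[n:]
x_gap = float(np.max(np.abs(x_gap - np.round(x_gap))))
```

In an actual attack the vector `w` would be the first vector of a BKZ-reduced basis of `Bq`; the unscaling step is the same.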

Lemma. Let $\alpha > 0$ be a fixed constant, $B$ a finite set of integers of variance $S^2 = \frac{1}{|B|} \sum_{b \in B} b^2$ and $n \in \mathbb{Z}_{>0}$. Let $s$ be a vector such that all of its coefficients are sampled independently and uniformly in $B$. Suppose that Assumption 2.1 holds and let $\delta > 0$ be the quality of the output of the BKZ algorithm. Then, given as input

$$m = \sqrt{\frac{n \cdot \ln(S/\alpha)}{\ln(\delta)}} - n$$

samples from the $\mathrm{LWE}_{s,\alpha}$ distribution, lattice reduction outputs a random variable $x$ with a distribution that can be approximated by a Gaussian distribution with mean 0 and standard deviation

$$\sigma = \alpha \cdot \exp\left( 2 \sqrt{n \ln(S/\alpha) \ln(\delta)} \right).$$

Denote by $F_x$ the cumulative distribution function of $x$ and by $\Phi_\sigma$ the cumulative distribution function of the Gaussian distribution with mean 0 and standard deviation $\sigma$. Then, the distance between the two distributions can be bounded:

$$\sup_{t \in \mathbb{R}} |F_x(t) - \Phi_\sigma(t)| = O\left( \frac{1}{\sqrt{S^2 (m+n)}} \right), \quad \text{as } n \to \infty.$$

The crux of the proof of this lemma relies on the Berry-Esseen theorem. We provide the complete details in [?].

3.2 Exponential Kernel Distinguisher for the Uniform and the Modular Gaussian Distributions

We now describe a novel distinguisher for the uniform and the modular Gaussian distributions. Formally, we construct a procedure that takes as input $N$ samples which are all sampled independently from one of the two distributions and guesses this distribution. The crux of our method relies on the use of an empirical estimator of the Levy transform of the distributions, to essentially cancel the effect of the modulus 1 on the Gaussian. Namely, from the $N$ samples $X_1, \dots, X_N$, we construct the estimator

$$\bar{Y} = \frac{1}{N} \cdot \sum_{i=1}^{N} e^{2\pi i X_i}.$$

As $N$ grows to infinity, this estimator converges to the Levy transform at 0 of the underlying distribution, that is to say:
• to 0 for the uniform distribution,
• to $e^{-2\pi^2\sigma^2}$ for the modular Gaussian.

Hence, to distinguish the distribution used to draw the samples, we now only need to determine whether the empirical estimator $\bar{Y}$ is closer to 0 or to $e^{-2\pi^2\sigma^2}$. The corresponding algorithm is described in [?].

Let $\sigma > 0$ be a fixed constant. Assume that [?] is given as input $N$ points that are sampled independently from the uniform distribution $U$ or from the modular Gaussian distribution $G_\sigma$. Then, [?] guesses the distribution of the input points correctly with probability at least

$$p_\sigma = 1 - \exp\left( -\frac{e^{-4\pi^2\sigma^2}}{8} \cdot N \right).$$

The time complexity of the algorithm is polynomial in the size of the input.

Proof. See the full version.

This result implies that to distinguish the uniform distribution and the modular Gaussian distribution with parameter $\sigma$ with non-negligible probability, we need to take a sample of size $N = O(e^{4\pi^2\sigma^2})$.
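The decision rule described above can be sketched in a few lines. This is a minimal illustration, not the referenced algorithm itself: the function names `distinguish_gu` and `sample_size` are hypothetical, and the tie-breaking rule (comparing the distance of the estimator to 0 and to the Gaussian target) is the assumption made here; the sample size formula inverts the success probability bound stated above.

```python
import math
import numpy as np

def distinguish_gu(samples, sigma):
    """Return "G" (modular Gaussian) or "U" (uniform) from samples on the torus."""
    x = np.asarray(samples, dtype=float)
    y_bar = np.mean(np.exp(2j * np.pi * x))              # empirical Levy transform
    target = math.exp(-2 * math.pi ** 2 * sigma ** 2)    # limit for the modular Gaussian
    return "G" if abs(y_bar - target) < abs(y_bar) else "U"

def sample_size(sigma, p):
    """Smallest N with 1 - exp(-exp(-4 pi^2 sigma^2) N / 8) >= p."""
    return math.ceil(8 * math.log(1 / (1 - p)) * math.exp(4 * math.pi ** 2 * sigma ** 2))

rng = np.random.default_rng(0)
sigma, N = 0.2, 2000                                     # toy values, illustrative only
gaussian_samples = rng.normal(0.0, sigma, size=N)
gaussian_samples = gaussian_samples - np.round(gaussian_samples)   # reduce modulo 1
uniform_samples = rng.uniform(-0.5, 0.5, size=N)

guess_gaussian = distinguish_gu(gaussian_samples, sigma)
guess_uniform = distinguish_gu(uniform_samples, sigma)
needed = sample_size(sigma, 0.99)
```

Because the complex exponential is 1-periodic, reducing the Gaussian samples modulo 1 does not change the estimator, which is the reason the modulus disappears from the analysis.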


Remark 1. The dual attack proposed in [CGGI20] does not specify which algorithm is used for distinguishing the uniform and the modular Gaussian distributions. Instead, to estimate the size of the sample needed to distinguish the distributions, the authors estimate the statistical distance $\varepsilon$ (see [CGGI20, Section 7, Equation (6)]) and use $O(1/\varepsilon^2)$ as an estimate for the required size of the sample. However, such an estimate does not allow a practical instantiation in the security analysis, since it hides the constant inside the $O$. It turns out that the exponential kernel distinguisher described in [?] has the same asymptotic complexity as the statistical distance estimate from [CGGI20] suggests, while enjoying a sufficiently precise analysis to provide non-asymptotic parameter estimation.

3.3 Complexity of the Dual Attack

The distinguishing attack is summarized in Algorithm 1. It takes as input $m \times N$ samples from an unknown distribution, then transforms them into $N$ samples which have the uniform distribution if the input of the attack was uniform and the modular Gaussian distribution if the input was from the LWE distribution. Then, the attack guesses the distribution of the $N$ samples using [?] and outputs the corresponding answer.

Algorithm 1: Dual distinguishing attack

input : $\{(A_i, b_i)\}_{i=1}^{N}$, where $\forall i$: $A_i \in \mathbb{T}^{n \times m}$, $b_i \in \mathbb{T}^m$; $\alpha > 0$, $S > 0$, $\delta \in (1; 1.1)$
output: guess for the distribution of the input: Uniform or LWE distribution

DistinguishingAttack($\{A_i, b_i\}_{i=1}^{N}$, $\alpha$, $S$, $\delta$):
    $X := \emptyset$
    $\sigma := \alpha \cdot \exp\left(2\sqrt{n \ln(S/\alpha) \ln(\delta)}\right)$
    for $i \in \{1, \dots, N\}$ do
        $x \leftarrow$ LWEtoModGaussian($A_i$, $b_i$, $S$, $\alpha$, $\delta$)
        $X \leftarrow X \cup x$
    if (DistinguishGU($X$, $\sigma$) = G) then
        return LWE distribution
    else
        return Uniform

The following theorem states that the cost of the distinguishing attack can be estimated by solving a minimization problem. The proof is deferred to [?].

Theorem. Let $\alpha > 0$ be a fixed constant, $B$ a finite set of integers of variance $S^2 = \frac{1}{|B|} \sum_{b \in B} b^2$ and $n \in \mathbb{Z}_{>0}$. Let $s$ be a vector with all coefficients sampled independently and uniformly in $B$. Suppose that Assumption 2.1 holds. Then, the time complexity of solving Decision-$\mathrm{LWE}_{s,\alpha}$ with probability of success $p$ by the distinguishing attack described in Algorithm 1 is

$$T_{\mathrm{DualAttack}} = \min_{\delta} \left( N(\sigma, p) \cdot T(\mathrm{BKZ}_\delta) \right), \qquad (2)$$

where $\sigma = \alpha \cdot \exp\left( 2\sqrt{n \ln(S/\alpha) \ln(\delta)} \right)$ and $N(\sigma, p) = 8 \ln\left(\frac{1}{1-p}\right) \cdot e^{4\pi^2\sigma^2}$.
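The minimization in Eq. (2) can be explored numerically. The sketch below is an illustration under several assumptions that are not taken from the theorem itself: the minimum over δ is taken over integer blocksizes β via Assumption 2.2, the sieving cost model is used for T(BKZ), the lattice dimension is set to m + n with m as in the lemma of Sect. 3.1, and the function names and default success probability are hypothetical. It evaluates Eq. (2) literally, with one lattice reduction per sample and no amortization, so it should not be expected to reproduce the figures of Table 1.

```python
import math

def hermite_delta(beta):
    """Hermite factor delta for blocksize beta (Assumption 2.2)."""
    base = (beta / (2 * math.pi * math.e)) * (math.pi * beta) ** (1 / beta)
    return base ** (1 / (2 * (beta - 1)))

def dual_attack_log2_cost(n, alpha, S, p=0.99, betas=range(50, 1001)):
    """Return (log2 of N * T(BKZ), best blocksize) following Eq. (2)."""
    log_ratio = math.log(S / alpha)
    best_cost, best_beta = math.inf, None
    for beta in betas:
        ln_delta = math.log(hermite_delta(beta))
        d = math.sqrt(n * log_ratio / ln_delta)           # d = m + n
        if d <= n:                                        # would need m <= 0 samples
            continue
        sigma = alpha * math.exp(2 * math.sqrt(n * log_ratio * ln_delta))
        # log2 N(sigma, p), computed in the log domain to avoid overflow
        log2_n = math.log2(8 * math.log(1 / (1 - p)))
        log2_n += 4 * math.pi ** 2 * sigma ** 2 / math.log(2)
        log2_bkz = math.log2(8 * d) + 0.292 * beta + 16.4  # sieving model
        cost = log2_n + log2_bkz
        if cost < best_cost:
            best_cost, best_beta = cost, beta
    return best_cost, best_beta

# Binary secrets: S^2 = (0^2 + 1^2) / 2.
S_binary = math.sqrt(0.5)
cost_630, beta_630 = dual_attack_log2_cost(630, 2.0 ** -15, S_binary)
cost_500, beta_500 = dual_attack_log2_cost(500, 2.0 ** -15, S_binary)
```

The trade-off is visible in the two terms: a larger blocksize lowers σ and therefore the number of samples N, but raises the cost of each lattice reduction.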

4 Towards a Hybrid Dual Key Recovery Attack

In this section, we show how the dual distinguishing attack recalled in Sect. 3 can be hybridized with exhaustive search on a fraction of the secret vector to obtain a continuum of more efficient key recovery attacks on the underlying LWE problem. Recall that $B$ is a finite set of integers from which the coefficients of the secret are drawn. Let then $s \in B^n$ be a secret vector and let $\alpha > 0$ be a fixed constant. Our approach takes as input samples from the LWE distribution of the form

$$(A, b = A^t s + e \bmod 1) \in (\mathbb{T}^{n \times m}, \mathbb{T}^m), \qquad (3)$$

where $e \in \mathbb{R}^m$ has centered Gaussian distribution with standard deviation $\alpha$. The attack divides the secret vector into two fractions:

$$s = (s_1 \| s_2)^t, \qquad s_1 \in B^{n_1}, \qquad s_2 \in B^{n_2}, \qquad n = n_1 + n_2.$$

The matrix $A$ is divided into two parts corresponding to the separation of the secret $s$:

$$A = \begin{pmatrix} a_{1,1} & \dots & a_{1,m} \\ \vdots & & \vdots \\ a_{n_1,1} & \dots & a_{n_1,m} \\ a_{n_1+1,1} & \dots & a_{n_1+1,m} \\ \vdots & & \vdots \\ a_{n,1} & \dots & a_{n,m} \end{pmatrix} = \begin{pmatrix} A_1 \\ A_2 \end{pmatrix}. \qquad (4)$$

Then, Eq. 3 can be rewritten as $A_1^t s_1 + A_2^t s_2 + e = b \bmod 1$. By applying lattice reduction to matrix $A_1$ as described in [?], we recover a vector $v$ such that $v^t (A_1^t s_1 + e)$ is small. This allows us to transform $m$ input LWE samples $(A, b) \in (\mathbb{T}^{n \times m}, \mathbb{T}^m)$ into one new LWE sample $(\hat{a}, \hat{b}) \in (\mathbb{T}^{n_2}, \mathbb{T})$ of smaller dimension and bigger noise:

\[ \underbrace{v^t A_2^t}_{\hat{a}^t} s_2 + \underbrace{v^t (A_1^t s_1 + e)}_{\hat{e}} = \underbrace{v^t b}_{\hat{b}} \mod 1. \tag{5} \]

The resulting LWE sample in smaller dimension can be used to find $s_2$. Let $x \in B^{n_2}$ be a guess for $s_2$. If the guess is correct, then the difference

\[ \hat{b} - \hat{a}^t x = \hat{b} - \hat{a}^t s_2 = (\hat{e} \bmod 1) \sim G_\sigma \tag{6} \]

is small.

If the guess is not correct and $x \neq s_2$, then there exists some $y \neq 0$ such that $x = s_2 - y$. Then, we rewrite $\hat{b} - \hat{a}^t x$ in the following way:

\[ \hat{b} - \hat{a}^t x = (\hat{b} - \hat{a}^t s_2) + \hat{a}^t y = \hat{a}^t y + \hat{e}. \]

We can consider $(\hat{a}, \hat{a}^t y + \hat{e})$ as a sample from the LWE distribution that corresponds to the secret $y$. Therefore, we may assume that if $x \neq s_2$, the distribution of $\hat{b} - \hat{a}^t x \bmod 1$ is close to uniform, unless the decision-LWE is easy to solve.

In order to recover $s_2$, the attack generates many LWE samples with reduced dimension. Denote by $R$ the number of generated samples and put them into matrix form as $(\hat{A}, \hat{b}) \in \mathbb{T}^{n_2 \times R} \times \mathbb{T}^R$. There are $|B|^{n_2}$ possible candidates for $s_2$. For each candidate $x \in B^{n_2}$, the attack computes an $R$-dimensional vector $e_x = \hat{b} - \hat{A}^t x$. The complexity of this computation for all the candidates is essentially the complexity of multiplying the matrices $\hat{A}$ and $S_2$, where $S_2$ is a matrix whose columns are all vectors of (the projection of) the secret space in dimension $n_2$. Naively, the matrix multiplication requires $O(n \cdot |B|^{n_2} \cdot R)$ operations. However, by exploiting the recursive structure of $S_2$, it can be done in time $O(R \cdot |B|^{n_2})$. Then, for each candidate $x$ for $s_2$ the attack checks whether the corresponding vector $e_x$ is uniformly distributed or concentrated around zero. The attack returns the only candidate $x$ whose corresponding vector $e_x$ is concentrated around zero.

The rest of this section is organized as follows. First, we describe the auxiliary algorithm for multiplying a matrix by the matrix of all vectors of the secret space, which lets us speed up the search for the second fraction of the secret key. Then, we evaluate the complexity of our attack.

4.1 Algorithm for Computing the Product of a Matrix with the Matrix of All Vectors in a Product of Finite Set

Let $B = \{b_1, \dots, b_k\} \subset \mathbb{Z}$ be a finite set of integers such that $b_i < b_{i+1}$ for all $i \in \{1, \dots, k-1\}$. For any positive integer $d$, denote by $S_{(d)}$ the matrix whose columns are all vectors from $\{b_1, \dots, b_k\}^d$ written in the lexicographical order. These matrices can be constructed recursively. For $d = 1$ the matrix is a single row, i.e., $S_{(1)} = (b_1 \dots b_k)$, and for any $d > 1$ the matrix $S_{(d)} \in \mathbb{Z}^{d \times k^d}$ can be constructed by concatenating $k$ copies of the matrix $S_{(d-1)}$ and adding a row which consists of $k^{d-1}$ copies of $b_1$ followed by $k^{d-1}$ copies of $b_2$ and so on:

\[ S_{(d)} = \begin{pmatrix} b_1 \dots b_1 & b_2 \dots b_2 & \dots & b_k \dots b_k \\ S_{(d-1)} & S_{(d-1)} & \dots & S_{(d-1)} \end{pmatrix}. \tag{7} \]

Let $a = (a_1, \dots, a_d)^t$ be a $d$-dimensional vector. Our goal is to compute the scalar products of $a$ with each column of $S_{(d)}$. We can do it by using the recursive structure of $S_{(d)}$. Assume that we know the desired scalar products for $a_{(d-1)} = (a_2, \dots, a_d)^t$ and $S_{(d-1)}$. Then, using Eq. (7), we get

\[ a^t S_{(d)} = \begin{pmatrix} a_1 & a_{(d-1)}^t \end{pmatrix} \cdot \begin{pmatrix} b_1 \dots b_1 & \dots & b_k \dots b_k \\ S_{(d-1)} & \dots & S_{(d-1)} \end{pmatrix} = \Bigl( (a_1 b_1, \dots, a_1 b_1)^t + a_{(d-1)}^t S_{(d-1)} \;\Big|\; \dots \;\Big|\; (a_1 b_k, \dots, a_1 b_k)^t + a_{(d-1)}^t S_{(d-1)} \Bigr), \tag{8} \]

that is, the resulting vector is the sum of the vector $a_{(d-1)}^t S_{(d-1)}$ concatenated with itself $k$ times with the vector consisting of $k^{d-1}$ copies of $a_1 \cdot b_1$ concatenated with $k^{d-1}$ copies of $a_1 \cdot b_2$ and so on. The approach is summarized in Algorithm 2.

Algorithm 2: Compute the scalar products of a vector with the matrix of all vectors from $\{b_1, \dots, b_k\}^d$.

input : $a = (a_1, \dots, a_d)^t$, $B = \{b_1, \dots, b_k\} \subset \mathbb{Z}$ such that $b_1 < b_2 < \dots < b_k$.
output: $a^t S_{(d)}$, where $S_{(d)} \in \{b_1, \dots, b_k\}^{d \times k^d}$ is the matrix whose columns are all the vectors from the set $\{b_1, \dots, b_k\}^d$ written in the lexicographical order

1  computeScalarProductWithAllVectors(a, B):
2      x ← (a_d · b_1, ..., a_d · b_k)^t ;  y1 ← ∅, y2 ← ∅
3      for i ∈ {d − 1, ..., 1} do
4          for j ∈ {1, ..., k} do
5              y1 ← y1 ∪ x
6              y2 ← y2 ∪ (a_i · b_j, ..., a_i · b_j)^t
7          x ← y1 + y2
8          y1 ← ∅, y2 ← ∅
9      return x
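A direct transcription of Algorithm 2 into Python may help in checking the recursion of Eq. (8). This is a minimal sketch rather than the authors' implementation: the function names are ours, and `brute_force` is an added reference routine that computes one explicit dot product per column of $S_{(d)}$.

```python
from itertools import product


def scalar_products_with_all_vectors(a, B):
    """Return [<a, v> for v in B^d], with v in lexicographic order."""
    B = sorted(B)
    # Base case d = 1: products of the last coordinate with each b_j.
    x = [a[-1] * b for b in B]
    # Prepend the coordinates a_{d-1}, ..., a_1 one at a time, as in Eq. (8).
    for a_i in reversed(a[:-1]):
        # k copies of the previous vector; the j-th copy is shifted by a_i * b_j.
        x = [a_i * b + t for b in B for t in x]
    return x


def brute_force(a, B):
    """Reference: an explicit dot product for each of the k^d columns."""
    return [
        sum(ai * vi for ai, vi in zip(a, v))
        for v in product(sorted(B), repeat=len(a))
    ]
```

The loop performs $k^2 + k^3 + \dots + k^d$ additions in total, which is $O(k^d)$ for $k \ge 2$, whereas the reference routine needs about $d \cdot k^d$ multiplications.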

Let $d$ be a positive integer and $B = \{b_1, \dots, b_k\}$ be a set of $k$ integers. Algorithm 2, given as input a $d$-dimensional vector $a$, outputs the vector $x$ of dimension $k^d$ such that $x = a^t S_{(d)}$. The time complexity of the algorithm is $O(k^d)$.

Proof. See Sect. 3.1

Let $A$ be a matrix with $R$ rows and $d$ columns. The product of $A$ and $S_{(d)}$ can be computed in time $O(R \cdot k^d)$.

Proof. In order to compute $A \cdot S_{(d)}$ we need to compute the product of each of the $R$ rows of $A$ with $S_{(d)}$. By Line 9 it can be done in time $O(k^d)$. Then the overall complexity of multiplying the matrices is $O(R \cdot k^d)$.
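The row-by-row product can be combined with the guessing step described at the start of this section. The toy Python sketch below makes several assumptions that are not taken from the paper: the lattice-reduction phase is replaced by a hypothetical simulator that directly draws reduced samples with Gaussian noise, and candidates are ranked by the average of $\cos(2\pi e)$ over the residuals (close to 0 for uniform residuals, close to $e^{-2\pi^2\sigma^2}$ for a modular Gaussian) instead of a threshold-based distinguisher.

```python
import math
import random
from itertools import product


def scalar_products_with_all_vectors(a, B):
    """<a, v> for every v in B^d in lexicographic order (B sorted)."""
    x = [a[-1] * b for b in B]
    for a_i in reversed(a[:-1]):
        x = [a_i * b + t for b in B for t in x]
    return x


def simulate_reduced_samples(s2, sigma, R, seed=0):
    """Hypothetical stand-in for lattice reduction: R samples
    (a_hat, b_hat = <a_hat, s2> + e_hat mod 1), e_hat ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    rows, b_hat = [], []
    for _ in range(R):
        a = [rng.random() for _ in s2]
        e = rng.gauss(0.0, sigma)
        rows.append(a)
        b_hat.append((sum(ai * si for ai, si in zip(a, s2)) + e) % 1.0)
    return rows, b_hat


def recover_s2(rows, b_hat, B):
    """Score all |B|^{n2} candidates and return the best one."""
    B = sorted(B)
    n2 = len(rows[0])
    scores = [0.0] * (len(B) ** n2)
    for a, b in zip(rows, b_hat):
        # One row of A_hat^t times S_(n2): O(|B|^{n2}) work per sample.
        for i, t in enumerate(scalar_products_with_all_vectors(a, B)):
            scores[i] += math.cos(2.0 * math.pi * (b - t))
    best = max(range(len(scores)), key=scores.__getitem__)
    # Candidate i is the i-th vector of B^{n2} in lexicographic order.
    return list(list(product(B, repeat=n2))[best])
```

For example, `recover_s2(*simulate_reduced_samples([1, 0, 1, 1], 0.05, 200), [0, 1])` scores all 16 binary candidates using 200 simulated samples. Taking the maximum score is a simplification chosen for the sketch; the attack as described tests each candidate with a Gaussian-versus-uniform distinguisher.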

4.2 Complexity of the Attack

The pseudo-code corresponding to the full attack is given in Algorithm 3.

Let $\alpha > 0$, $p \in (0; 1)$, $S \in (0; 1)$, and $n \in \mathbb{Z}_{>0}$ be fixed constants. Let $s \in B^n$ and $\sigma > 0$. Suppose that Assumption 2.1 holds. Then, the time complexity of solving the Search-LWE$_{s,\alpha}$ problem with the probability of success $p$ by the attack described in Algorithm 3 is

\[ T_{\mathrm{dual\ hybrid}} = \min_{\delta, n_2} \Bigl( \bigl( |B|^{n_2} + T(\mathrm{BKZ}_\delta) \bigr) \cdot R(n_2, \sigma, p) \Bigr), \tag{9} \]

where $R(n_2, \sigma, p) = 8 \cdot e^{4\pi^2\sigma^2} \bigl( n_2 \ln(2) - \ln(\ln(1/p)) \bigr)$.

Proof. See full version.
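The minimization in Eq. (9) is easy to explore numerically once a BKZ cost model is fixed. The following Python sketch evaluates the expression on a finite grid of $(\delta, n_2)$ values, working in $\log_2$ to avoid overflow. It is an illustration under stated assumptions: the function names are ours, $\sigma$ is computed from $n_1 = n - n_2$ as in Algorithm 3, the BKZ cost is passed in as a callback, and `toy_log2_bkz_cost` uses made-up constants that do not correspond to any of the cost models considered in this paper.

```python
import math


def log2_add(x, y):
    """log2(2**x + 2**y) without forming the huge numbers."""
    hi, lo = max(x, y), min(x, y)
    return hi + math.log2(1.0 + 2.0 ** (lo - hi))


def log2_R(n2, sigma, p):
    """log2 of R(n2, sigma, p) = 8 e^{4 pi^2 sigma^2} (n2 ln 2 - ln ln(1/p)).

    Assumes the bracketed factor is positive (e.g. p close to 1).
    """
    factor = n2 * math.log(2.0) - math.log(math.log(1.0 / p))
    return math.log2(8.0 * factor) + 4.0 * math.pi ** 2 * sigma ** 2 / math.log(2.0)


def log2_hybrid_cost(n, alpha, S, p, size_B, log2_bkz_cost, deltas, n2_values):
    """Grid evaluation of Eq. (9); returns (log2 cost, delta, n2)."""
    best = None
    for n2 in n2_values:
        n1 = n - n2
        for delta in deltas:
            sigma = alpha * math.exp(
                2.0 * math.sqrt(n1 * math.log(S / alpha) * math.log(delta))
            )
            cost = log2_add(n2 * math.log2(size_B), log2_bkz_cost(delta))
            cost += log2_R(n2, sigma, p)
            if best is None or cost < best[0]:
                best = (cost, delta, n2)
    return best


def toy_log2_bkz_cost(delta):
    """Placeholder model with invented constants, for illustration only."""
    return 0.009 / math.log2(delta) ** 2
```

A grid search only bounds the true minimum from above; a finer grid or a proper optimizer over $\delta$ and $n_2$ would be needed for actual estimates.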

 

Algorithm 3: Hybrid key recovery attack

input : $\{(A_i, b_i)\}_{i=1}^R$, where $\forall i$ $A_i \in \mathbb{T}^{n \times m}$, $b_i \in \mathbb{T}^m$, $\alpha > 0$, $S > 0$, $\delta > 1$, $n_1 \in \{2, \dots, n-1\}$
output: $s_2 \in B^{n - n_1}$

1   recoverS({(A_i, b_i)}_{i=1}^R, α, S, δ, n1):
2       n2 ← (n − n1)
3       σ ← α · exp(2 · sqrt(n1 · ln(S/α) · ln(δ)))
4       Â ← ∅, b̂ ← ∅
        /* lattice reduction part */
5       for i ∈ {1, ..., R} do
6           A ← A_i, b ← b_i
7           (A1, A2) ← splitMatrix(A, n1)          ▷ see Equation (4)
8           v ← computeV(A1, S, α, δ)              ▷ [?]
9           Â ← Â ∪ {A2 v}, b̂ ← b̂ ∪ {v^t b}
        /* search for s2 */
10      S(n2) ← matrix of all vectors in the secret space of dimension n2 in lexicographical order
11      B̂ ← (b̂, ..., b̂) ∈ T^{R × |B|^{n2}}
12      Ê ← B̂ − Â^t S(n2) mod 1                   ▷ see Line 9 and Algorithm 2
13      for i ∈ {1, ..., |B|^{n2}} do
14          ê ← Ê[i]
            /* guess the distribution of ê (see [?]) */
15          if (distinguishGU(ê, σ) = G) then
16              return S(n2)[i]

4.3 Using Sieving in the Hybrid Attack

Assume that BKZ uses a sieving algorithm (for instance [BDGL16]) as an SVP oracle. At its penultimate step, the sieving algorithm produces many short vectors. Thus we may suppose that BKZ with the sieving SVP oracle produces many short vectors in one run. Then, if we need $N$ short lattice vectors, we need to run the lattice reduction only $\lceil N/m \rceil$ times, where $m$ is the number of short vectors returned by the lattice reduction. In the following corollary from Line 16, we use this property to revisit the time complexity of our attack under the sieving BKZ cost model.

Let $\alpha$, $p$, $n$, $\sigma$ and $s \in \{0; 1\}^n$ be as in Line 16. Assume that the lattice reduction algorithm, used by Algorithm 1, uses the sieving algorithm from [BDGL16] as an oracle for solving SVP. Suppose that Algorithm 2.3 holds. Then, the time complexity of solving the Search-LWE$_{s,\alpha}$ problem with probability of success $p$ by the attack described in Algorithm 3 is

\[ T_{\mathrm{hybrid+sieving}} = \min_{\delta, n_2} \left( |B|^{n_2} \cdot R(n_1, \sigma, p) + T(\mathrm{BKZ}_\delta) \cdot \left\lceil \frac{R(n_2, \sigma, p)}{(4/3)^{\beta/2}} \right\rceil \right), \tag{10} \]

where $\beta$ is the smallest blocksize such that the lattice reduction with the blocksize $\beta$ achieves the Hermite factor $\delta$; $R(n_2, \sigma, p)$ is as defined in Line 16.

Proof. See full version.

4.4 The Sparse Case: Size Estimation and Guessing a Few Bits

When the secret is sparse we can use so-called combinatorial techniques [Alb17] to leverage this sparsity. Assume that only $h$ components of the secret are non-zero. Then, we guess $k$ zero components of the secret $s$ and then run the full attack in dimension $(n - k)$. If the guess was incorrect, we restart with a new and independent guess for the positions of zeroes. For sparse enough secrets, the reduced running time of the attack in smaller dimension outweighs the failure probability of the guess. Also, the variance of the scalar product $v^t s$ is smaller in the sparse case because the key contains many zeros. Combining these observations, we obtain the following result for sparse secrets:

Let $\alpha > 0$, $n > 0$ and fix $s \in B^n$. Suppose that $s$ has exactly $0 \le h < n$ non-zero components. Suppose that Assumption 2.1 holds. Assume that the lattice reduction algorithm, used by Algorithm 1, uses the sieving algorithm from [BDGL16] as an oracle for solving SVP. Then, the time complexity of solving Decision-LWE$_{s,\alpha}$ with probability of success $p$ by the distinguishing attack described in Algorithm 1 is given by

\[ T_{\mathrm{sparse}} = \min_{0 \le k \le h} \binom{n-h}{k}^{-1} \binom{n}{k} \cdot T_{\mathrm{hybrid}}(n - k, \alpha) \tag{11} \]

where $\beta$ is the smallest blocksize such that the lattice reduction with the blocksize $\beta$ achieves the Hermite factor $\delta$; $\sigma$ and $N(\sigma, p)$ are as defined in Line 10.

Proof. Please refer to the full version of the paper.
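The combinatorial factor in Eq. (11) is the expected number of guesses: $k$ positions chosen uniformly among $n$ all hit zero entries with probability $\binom{n-h}{k} / \binom{n}{k}$. The Python sketch below evaluates this trade-off for a user-supplied cost function. The names are ours, and `log2_hybrid_cost` is a hypothetical callback returning $\log_2 T_{\mathrm{hybrid}}$ for a given dimension; it is not an interface defined in the paper.

```python
from math import comb, log2


def guess_success_probability(n, h, k):
    """Probability that k uniformly chosen positions are all zero entries
    of a length-n secret with exactly h non-zero entries."""
    return comb(n - h, k) / comb(n, k)


def log2_sparse_cost(n, h, log2_hybrid_cost):
    """Grid evaluation of Eq. (11) over 0 <= k <= h.

    Returns (log2 of the total cost, best k).
    """
    best = None
    for k in range(0, h + 1):
        good = comb(n - h, k)
        if good == 0:  # fewer than k zero positions exist
            continue
        # Expected number of independent guesses, in bits.
        log2_trials = log2(comb(n, k)) - log2(good)
        cost = log2_trials + log2_hybrid_cost(n - k)
        if best is None or cost < best[0]:
            best = (cost, k)
    return best
```

With a toy cost of $2^{\text{dim}}$ (that is, `lambda dim: float(dim)`), $n = 10$ and $h = 2$, guessing pays off: each guessed position removes one bit of cost while the expected number of repetitions grows much more slowly.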

 

5 Bit-Security Estimation and Experimental Verification

We implement an estimator script for the attack that, given parameters of an LWE problem and a BKZ cost model as an input, finds optimal parameters for the dual attack (see Sect. 3) and our hybrid attack (see Sect. 4). Using this script, we evaluate the computational costs for a wide range of small-secret LWE parameters. In this section, we report the results of our numerical estimation and show that the security level of the TFHE scheme should be updated with regard to the hybrid attack. We also apply our attack to the parameters of FHEW, SEAL, and HElib. For completeness, in the full version of this paper we also provide an in-depth comparison with the primal unique-SVP technique. Finally, we support our argument by an implementation working on a small example.

5.1 Bit-Security of LWE Parameters

We numerically estimate the cost of solving the LWE problem by the dual attack (as described in Sect. 3) and by our attacks for all pairs of parameters $(n, \alpha)$ from the following set: $(n, -\log(\alpha)) \in \{100, 125, \dots, 1050\} \times \{5, 6.25, \dots, 38.5\}$. We create a heatmap representing the cost of our attack as a function of parameters $n$ and $\alpha$. In Fig. 1 we present an estimation of the bit-security of the LWE parameters according to the combination of our attack and the collision attack, with time complexity $2^{n/2}$. Thus, Fig. 1 represents the function $\min(T_{\mathrm{ourAttack}}(n, \alpha), 2^{n/2})$, where $T_{\mathrm{ourAttack}}(n, \alpha)$ is the cost of our attack for the parameters $n$ and $\alpha$. Figure 1 is obtained under the sieving BKZ cost model. We also created similar heatmaps for our hybrid dual attack and the dual attack described in Sect. 3 under three BKZ cost models: enumeration, sieving, and delta-squared. For completeness, these heatmaps are presented in [?].

5.2 Application to FHE Schemes

Non-sparse small secrets. TFHE. The TFHE scheme uses two sets of parameters: for the switching key and for the bootstrapping key. The security of the scheme is defined by the security of the switching key, which is the weaker link. The parameters of the TFHE scheme were updated several times. In Table 2, we present the results of our estimates for the recently updated parameters from the public implementation [G+16, v1.1]. For completeness, we also reevaluate the security of all the previous sets of TFHE parameters. The results for the previous parameters of TFHE can be found in [?].

FHEW. The fully homomorphic encryption scheme FHEW [DM15], like TFHE, uses binary secrets. Its parameters are given as $n = 500$, $\sigma = 2^{17}$, $q = 2^{32}$. The bit-security of these parameters under our hybrid dual attack in the sieving model is 96 bits, a slightly lower estimate than those obtained for the primal and dual attacks with [ACD+18], which give 101 and 115 bits of security respectively.


Fig. 1. Bit-security as a function of the LWE parameters $n$ and $\alpha$ assuming the sieving BKZ cost model. Here, $n$ denotes the dimension, $\alpha$ denotes the standard deviation of the noise, the secret key is chosen from the uniform distribution on $\{0, 1\}^n$. The picture represents the security level $\lambda$ of LWE samples, $\lambda = \log(\min(T_{\mathrm{ourAttack}}(n, \alpha), 2^{n/2}))$. The numbered lines on the picture represent security levels. The star symbol denotes the old TFHE key switching parameters from [CGGI17], the diamond symbol denotes the key switching parameters recommended in [CGGI20, Table 4].

Table 2. Security of the parameters of the TFHE scheme from the public implementation [G+16] (parameter's update of February 21, 2020) against dual attack (as described in Sect. 3) and hybrid dual attack (as described in Sect. 4). $\lambda$ denotes security in bits, $\delta$ and $n_1$ are the optimal parameters for the attacks. "-" means that the distinguishing attack doesn't have the parameter $n_1$.

BKZ model       Switching key (n = 630, α = 2^-15)      Bootstrapping key (n = 1024, α = 2^-25)
                Attack       λ     δ        n1          Attack       λ     δ        n1
Delta-squared   Dual         270   1.0042   -           Dual         256   1.0042   -
                New attack   176   1.005    485         New attack   190   1.0048   862
Sieving         Dual         131   1.0044   -           Dual         131   1.0044   -
                New attack   121   1.0047   576         New attack   125   1.0046   967
Enumeration     Dual         292   1.0042   -           Dual         280   1.0041   -
                New attack   192   1.0052   469         New attack   209   1.0049   842

SEAL. The SEAL v2.0 homomorphic library [LP16] uses ternary non-sparse secrets. We target these parameters directly with our hybrid approach and compare the (best) results with the dual attack of [Alb17]. The results are compiled in [?]. Our techniques give very slightly better results, although the two remain very comparable.


Sparse Secrets: HElib. The HElib homomorphic library [HS15] uses ternary sparse secrets which have exactly 64 non-zero components. We can then target these parameters using the combination of our hybrid attack with guessing. The results are compiled in [?]. Our techniques give very slightly worse results, although the two are still very comparable. A reason might be that the exploitation of the sparsity in our case is more naive than the range of techniques used in [Alb17]. An interesting open question would be to merge the best of these two worlds to get even stronger attacks. We leave this question for future work as it is slightly out of the scope of the present paper.

5.3 Experimental Verification

In order to verify the correctness of our attack, we have implemented it on small examples. Our implementation recovers 5 bits of a secret key for LWE problems with the following two sets of parameters: $(n, \alpha) = (30, 2^{-8})$ and $(n, \alpha) = (50, 2^{-8})$. For implementation purposes, we rescaled all the elements defined over torus $\mathbb{T}$ to integers modulo $2^{32}$. For both examples, we use BKZ with blocksize 20, which yields a lattice reduction quality of around $\delta \approx 1.013$. We computed the values of the attack parameters required to guess correctly 5 bits of the key with probability 0.99, assuming that quality of the output of BKZ. The required parameters for both experiments are summarized in [?]. The first experiment was repeated 20 times, the second was repeated 10 times. For both experiments, the last five bits of the key were successfully recovered at all attempts.

The correctness of both attacks relies on assumptions made in Sect. 3.1 for approximating the distribution of $v^t (A^t s + e) \bmod 1$ by the modular Gaussian distribution $G_\sigma$. In order to verify these assumptions, while running both experiments we have collected samples to check the distribution: each time the attack correctly found the last bits of the secret key $s_2$, we collected the corresponding $\tilde{e} = \tilde{b} - \tilde{a}^t s_2 = v^t (A^t s_1 + e)$. For the first experiment, the size of the collected sample is $20 \times R_1 = 640$; for the second experiment, it is $10 \times R_2 = 740$. The collected data is presented in the full version. In the full version, we compare theoretical predictions and estimations obtained from the experiments for the parameters of the modular Gaussian distribution $G_\sigma$. Experimental estimations of mean and variance in both cases closely match theoretical predictions.

For the analysis of our attack under the sieving BKZ cost model we assume that the sieving oracle produces at least $(4/3)^{\beta/2}$ short lattice vectors, where $\beta$ is the blocksize of BKZ (see Sect. 2.3). In order to check the correctness of the assumption, we studied the distribution of vectors returned by the sieving algorithm on random lattices in several small dimensions. All the experiments (details in [?]) reveal essentially the same results: the size of the list was always a small constant around the expected $(4/3)^{n/2}$ and the norms of the vectors in the list were concentrated around a value of order $\sqrt{n} \cdot \mathrm{vol}(\Lambda)^{1/n}$, validating the heuristic used in practice.

Acknowledgments. We thank the anonymous reviewers for valuable comments on this work, as well as Alexandre Wallet and Paul Kirchner for interesting discussions.


References

[ACD+18] Albrecht, M.R., et al.: Estimate all the LWE, NTRU schemes!. In: Catalano, D., De Prisco, R. (eds.) SCN 2018. LNCS, vol. 11035, pp. 351–367. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-98113-0_19
[ACF+15] Albrecht, M.R., Cid, C., Faugere, J.C., Fitzpatrick, R., Perret, L.: On the complexity of the BKW algorithm on LWE. Des. Codes Cryptogr. 74(2), 325–354 (2015)
[AFG13] Albrecht, M.R., Fitzpatrick, R., Göpfert, F.: On the efficacy of solving LWE by reduction to unique-SVP. In: Lee, H.-S., Han, D.-G. (eds.) ICISC 2013. LNCS, vol. 8565, pp. 293–310. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-12160-4_18
[Ajt98] Ajtai, M.: The shortest vector problem in L2 is NP-hard for randomized reductions. In: Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, pp. 10–19. ACM (1998)
[Alb17] Albrecht, M.R.: On dual lattice attacks against small-secret LWE and parameter choices in HElib and SEAL. In: Coron, J.-S., Nielsen, J.B. (eds.) EUROCRYPT 2017. LNCS, vol. 10211, pp. 103–129. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-56614-6_4
[APS15] Albrecht, M.R., Player, R., Scott, S.: On the concrete hardness of learning with errors. J. Math. Cryptol. 9(3), 169–203 (2015)
[BDGL16] Becker, A., Ducas, L., Gama, N., Laarhoven, T.: New directions in nearest neighbor searching with applications to lattice sieving. In: Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 10–24. Society for Industrial and Applied Mathematics (2016)
[BG14] Bai, S., Galbraith, S.D.: Lattice decoding attacks on binary LWE. In: Susilo, W., Mu, Y. (eds.) ACISP 2014. LNCS, vol. 8544, pp. 322–337. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08344-5_21
[BGPW16] Buchmann, J., Göpfert, F., Player, R., Wunderer, T.: On the hardness of LWE with binary error: revisiting the hybrid lattice-reduction and meet-in-the-middle attack. In: Pointcheval, D., Nitaj, A., Rachidi, T. (eds.) AFRICACRYPT 2016. LNCS, vol. 9646, pp. 24–43. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-31517-1_2
[BGV14] Brakerski, Z., Gentry, C., Vaikuntanathan, V.: (Leveled) fully homomorphic encryption without bootstrapping. ACM Trans. Comput. Theory (TOCT) 6(3), 13 (2014)
[BKW03] Blum, A., Kalai, A., Wasserman, H.: Noise-tolerant learning, the parity problem, and the statistical query model. J. ACM (JACM) 50(4), 506–519 (2003)
[BLP+13] Brakerski, Z., Langlois, A., Peikert, C., Regev, O., Stehlé, D.: Classical hardness of learning with errors. In: Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing, pp. 575–584. ACM (2013)
[BV11] Brakerski, Z., Vaikuntanathan, V.: Fully homomorphic encryption from ring-LWE and security for key dependent messages. In: Rogaway, P. (ed.) CRYPTO 2011. LNCS, vol. 6841, pp. 505–524. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22792-9_29
[CGGI16] Chillotti, I., Gama, N., Georgieva, M., Izabachène, M.: Faster fully homomorphic encryption: bootstrapping in less than 0.1 seconds. In: Cheon, J.H., Takagi, T. (eds.) ASIACRYPT 2016. LNCS, vol. 10031, pp. 3–33. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-53887-6_1


[CGGI17] Chillotti, I., Gama, N., Georgieva, M., Izabachène, M.: Faster packed homomorphic operations and efficient circuit bootstrapping for TFHE. In: Takagi, T., Peyrin, T. (eds.) ASIACRYPT 2017. LNCS, vol. 10624, pp. 377–408. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-70694-8_14
[CGGI20] Chillotti, I., Gama, N., Georgieva, M., Izabachène, M.: TFHE: fast fully homomorphic encryption over the torus. J. Cryptol. 33(1), 34–91 (2020)
[Che13] Chen, Y.: Réduction de réseau et sécurité concrete du chiffrement completement homomorphe. Ph.D. thesis, Paris 7 (2013)
[CHHS19] Cheon, J.H., Hhan, M., Hong, S., Son, Y.: A hybrid of dual and meet-in-the-middle attack on sparse and ternary secret LWE. IEEE Access 7, 89497–89506 (2019)
[CLP17] Chen, H., Laine, K., Player, R.: Simple encrypted arithmetic library SEAL v2.1. In: Brenner, M., et al. (eds.) FC 2017. LNCS, vol. 10323, pp. 3–18. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-70278-0_1
[CN11] Chen, Y., Nguyen, P.Q.: BKZ 2.0: better lattice security estimates. In: Lee, D.H., Wang, X. (eds.) ASIACRYPT 2011. LNCS, vol. 7073, pp. 1–20. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-25385-0_1
[CS15] Cheon, J.H., Stehlé, D.: Fully homomophic encryption over the integers revisited. In: Oswald, E., Fischlin, M. (eds.) EUROCRYPT 2015. LNCS, vol. 9056, pp. 513–536. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46800-5_20
[DM15] Ducas, L., Micciancio, D.: FHEW: bootstrapping homomorphic encryption in less than a second. In: Oswald, E., Fischlin, M. (eds.) EUROCRYPT 2015. LNCS, vol. 9056, pp. 617–640. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46800-5_24
[FV12] Fan, J., Vercauteren, F.: Somewhat practical fully homomorphic encryption. IACR Cryptology ePrint Archive, 2012:144 (2012)
[G+09] Gentry, C., et al.: Fully homomorphic encryption using ideal lattices. In: STOC, vol. 9, pp. 169–178 (2009)
[G+16] Gama, N., et al.: Github repository. TFHE: Fast fully homomorphic encryption library over the torus (2016). https://github.com/tfhe/tfhe
[GNR10] Gama, N., Nguyen, P.Q., Regev, O.: Lattice enumeration using extreme pruning. In: Gilbert, H. (ed.) EUROCRYPT 2010. LNCS, vol. 6110, pp. 257–278. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-13190-5_13
[GPV08] Gentry, C., Peikert, C., Vaikuntanathan, V.: Trapdoors for hard lattices and new cryptographic constructions. In: Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, pp. 197–206. ACM (2008)
[GSW13] Gentry, C., Sahai, A., Waters, B.: Homomorphic encryption from learning with errors: conceptually-simpler, asymptotically-faster, attribute-based. In: Canetti, R., Garay, J.A. (eds.) CRYPTO 2013. LNCS, vol. 8042, pp. 75–92. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40041-4_5
[HG07] Howgrave-Graham, N.: A hybrid lattice-reduction and meet-in-the-middle attack against NTRU. In: Menezes, A. (ed.) CRYPTO 2007. LNCS, vol. 4622, pp. 150–169. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-74143-5_9


[HPS11] Hanrot, G., Pujol, X., Stehlé, D.: Analyzing blockwise lattice algorithms using dynamical systems. In: Rogaway, P. (ed.) CRYPTO 2011. LNCS, vol. 6841, pp. 447–464. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22792-9_25
[HS15] Halevi, S., Shoup, V.: Bootstrapping for HElib. In: Oswald, E., Fischlin, M. (eds.) EUROCRYPT 2015. LNCS, vol. 9056, pp. 641–670. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46800-5_25
[LLL82] Lenstra, A.K., Lenstra, H.W., Lovász, L.: Factoring polynomials with rational coefficients. Math. Ann. 261(4), 515–534 (1982)
[LN13] Liu, M., Nguyen, P.Q.: Solving BDD by enumeration: an update. In: Dawson, E. (ed.) CT-RSA 2013. LNCS, vol. 7779, pp. 293–309. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-36095-4_19
[LP11] Lindner, R., Peikert, C.: Better key sizes (and attacks) for LWE-based encryption. In: Kiayias, A. (ed.) CT-RSA 2011. LNCS, vol. 6558, pp. 319–339. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-19074-2_21
[LP16] Laine, K., Player, R.: Simple encrypted arithmetic library-seal (v2.0). Technical report (2016)
[MW16] Micciancio, D., Walter, M.: Practical, predictable lattice basis reduction. In: Fischlin, M., Coron, J.-S. (eds.) EUROCRYPT 2016. LNCS, vol. 9665, pp. 820–849. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49890-3_31
[Reg05] Regev, O.: On lattices, learning with errors, random linear codes, and cryptography. In: STOC, pp. 84–93. ACM (2005)
[SC19] Son, Y., Cheon, J.H.: Revisiting the hybrid attack on sparse and ternary secret LWE. IACR Cryptology ePrint Archive, 2019:1019 (2019)
[Wun16] Wunderer, T.: Revisiting the hybrid attack: Improved analysis and refined security estimates. IACR Cryptology ePrint Archive, 2016:733 (2016)

Encryption and Signatures

Adaptively Secure Threshold Symmetric-Key Encryption

Pratyay Mukherjee
Visa Research, Palo Alto, USA
[email protected]

Abstract. In a threshold symmetric-key encryption (TSE) scheme, encryption/decryption is performed by interacting with any threshold number of parties who hold parts of the secret-keys. Security holds as long as the number of corrupt (possibly colluding) parties stays below the threshold. Recently, Agrawal et al. [CCS 2018] (alternatively called DiSE) initiated the study of TSE. They proposed a generic TSE construction based on any distributed pseudorandom function (DPRF). Instantiating with DPRF constructions by Naor, Pinkas and Reingold [Eurocrypt 1999] (also called NPR) they obtained several efficient TSE schemes with various merits. However, their security models and corresponding analyses consider only static (and malicious) corruption, in that the adversary fixes the set of corrupt parties in the beginning of the execution before acquiring any information (except the public parameters) and is not allowed to change that later.

In this work we augment the DiSE TSE definitions to the fully adaptive (and malicious) setting, in that the adversary is allowed to corrupt parties dynamically at any time during the execution. The adversary may choose to corrupt a party depending on the information acquired thus far, as long as the total number of corrupt parties stays below the threshold. We also augment DiSE's DPRF definitions to support adaptive corruption. We show that their generic TSE construction, when plugged-in with an adaptive DPRF (satisfying our definition), meets our adaptive TSE definitions.

We provide an efficient instantiation of the adaptive DPRF, proven secure under the decisional Diffie-Hellman (DDH) assumption in the random oracle model. Our construction borrows ideas from Naor, Pinkas and Reingold's [Eurocrypt 1999] statically secure DDH-based DPRF (used in DiSE) and Libert, Joye and Yung's [PODC 2014] adaptively secure threshold signature. Similar to DiSE, we also give an extension satisfying a strengthened adaptive DPRF definition, which in turn yields a stronger adaptive TSE scheme. For that, we construct a simple and efficient adaptive NIZK protocol for proving a specific commit-and-prove style statement in the random oracle model assuming DDH.

1 Introduction

Symmetric-key encryption is an extremely useful cryptographic technique to securely store sensitive data. However, to ensure the purported security it is of utmost importance to store the key securely. Usually this is handled by using secure hardware like HSM, SGX, etc. This approach suffers from several drawbacks, including lack of flexibility (it supports only a fixed operation), high cost, and susceptibility to side-channel attacks (e.g. [22,26]). Splitting the key among multiple parties, for example by using Shamir's Secret Sharing [31], provides a software-only solution. While this avoids a single point of failure, many enterprise solutions, like Vault [4], simply reconstruct the key each time an encryption/decryption is performed. This leaves the key exposed in memory in reconstructed form. Threshold cryptography, on the other hand, ensures that the key is distributed at all times. A few recent enterprise solutions, such as [1–3], use threshold cryptographic techniques to provide software-only solutions without reconstruction for different applications.

While studied extensively in the public-key setting [5,8–10,12,14–16,18,19,32], threshold symmetric-key encryption has been an understudied topic in the literature. This changed recently with the work by Agrawal et al. [6], which initiated the formal study of threshold symmetric-key encryption (TSE). Their work, alternatively called DiSE, defines, designs and implements threshold (authenticated) symmetric-key encryption. They put forward a generic TSE scheme based on any distributed pseudorandom function (DPRF). They provided two main instantiations of TSE based on two DPRF constructions from the work of Naor, Pinkas and Reingold [28] (henceforth called NPR): (i) one (DP-PRF) based on any pseudorandom function, which is asymptotically efficient as long as $O(\binom{n}{t})$ is polynomial in the security parameter (where $n$ is the total number of parties and $t$ is the threshold); (ii) another one (DP-DDH) based on the DDH assumption (proven secure in the random oracle model), which is efficient for any $n, t$ ($t \le n$). By adding a specialized NIZK to it, they constructed a stronger variant of DPRF; when plugged into the generic construction, it yields a stronger TSE scheme.

A shortcoming of the DiSE TSE schemes is that they are secure only in a static (albeit malicious) corruption model, in that the set of corrupt parties is decided in the beginning and remains unchanged throughout the execution.¹ This is unrealistic in many scenarios where the adversary may corrupt different parties dynamically throughout the execution. For instance, an attacker may corrupt just one party, actively take part in a few executions and, based on the knowledge acquired through the corrupt party, decide to corrupt the next party, and so on. Of course, the total number of corruptions must stay below the threshold $t$, as otherwise all bets are off.

© Springer Nature Switzerland AG 2020. K. Bhargavan et al. (Eds.): INDOCRYPT 2020, LNCS 12578, pp. 465–487, 2020. https://doi.org/10.1007/978-3-030-65277-7_21

¹ We note that DiSE's definition has a weak form of adaptivity, in that the corrupt set can be decided depending on the public parameters (but nothing else). Moreover, we do not have an adaptive attack against their scheme; instead their proof seems to rely crucially on the non-adaptivity.


Our Contribution. In this paper we augment DiSE to support adaptive corruption.² Our contributions can be summarized as follows:

1. We augment the security definitions of TSE, as presented in DiSE [6], to support adaptive corruption. Like DiSE, our definitions are game-based. While some definitions (e.g. adaptive correctness, Definition 7) extend straightforwardly, others (e.g. adaptive authenticity, Definition 9) require more work. Intuitively, subtleties arise due to the dynamic nature of the corruption, which makes the task of tracking "the exact information" acquired by the attacker harder.
2. Similarly, we also augment the DPRF security definitions (following DiSE) to support adaptive corruption. We show that the generic DiSE construction satisfies our adaptive TSE definitions when instantiated with any such adaptively secure DPRF scheme. The proof of the generic transformation for adaptive corruption works in a fairly straightforward manner.
3. We provide new instantiations of efficient and simple DPRF constructions. Our main construction is based on the NPR's DDH-based DPRF (DP-DDH, as mentioned above) and the adaptive threshold signature construction by Libert, Joye and Yung [23] (henceforth called LJY). It is proven secure assuming DDH in the random oracle model. The proof diverges significantly from DiSE's proof of DP-DDH. Rather, it closely follows the proof of the LJY's adaptive threshold signature. Similar to DiSE, we also provide a stronger DPRF construction by adding NIZK proofs; this yields a stronger version of TSE via the same transformation. However, in contrast to DiSE, we need the NIZK to support adaptive corruption. We provide a construction of adaptive NIZK for a special commit-and-prove style statement from DDH in the random oracle model, which may be of independent interest. Finally, we observe that the adaptively secure construction obtained via a standard complexity leveraging argument on DP-PRF is asymptotically optimal. So, essentially the same construction can be used to ensure security against adaptive corruption without any additional machinery.

Efficiency Compared to DiSE. Though we do not provide implementation benchmarks, it is not too hard to estimate the performance relative to DiSE. Our DDH-based DPRF scheme is structurally very similar to DP-DDH; the only difference is in the evaluation, which requires two exponentiation operations instead of the one used in DiSE. So, we expect that the overall throughput of the TSE scheme may degrade by about a factor of 2. However, the communication remains the same because in our construction, like DP-DDH, only one group element is sent by each party.

² We remark that adaptive security is generically achieved from statically secure schemes using a standard complexity leveraging argument, in particular, just by guessing the corrupt set ahead of time. When $\binom{n}{t}$ is super-polynomial (in the security parameter), this technique naturally incurs a super-polynomial blow-up, for which super-polynomially hard assumptions are required. In contrast, all our constructions are based on polynomially hard assumptions.


For the stronger (NIZK-added) version, we again expect a slowdown by a factor of 2 when instantiated with our adaptive NIZK construction (cf. Appendix A): we have 8 exponentiations as compared to DiSE’s 4. In this case, however, there is also some degradation (around a factor of 2) in the communication due to the presence of two additional elements of $\mathbb{Z}_p$ in the NIZK proof, compared to the (publicly verifiable) DiSE instantiation (Figure 5 of [7]; see footnote 3).

Roadmap. We discuss related work in Sect. 2 and give a technical overview in Sect. 3. After providing some basic notation and preliminaries in Sect. 4, we put forward our adaptive DPRF definitions in Sect. 5 and adaptive TSE definitions in Sect. 6. We give our DPRF constructions in Sect. 7. The full proof of our main construction is deferred to the full version [27]. In Appendix A we provide our adaptive NIZK construction. Our TSE construction, which extends fairly straightforwardly from DiSE, is deferred to the full version [27].

2 Related Work

Threshold cryptography may be thought of as a special case of general-purpose secure multiparty computation (MPC). Specific MPC protocols for symmetric encryption have been proposed in the literature [1,13,20,29] that essentially serve the same purpose. While they adhere to standardized schemes such as AES, their performance often falls short of the desired level. Theoretical solutions, such as universal thresholdizers [11], resolve the problem of threshold (authenticated) symmetric-key encryption generically, but are far from being practical. For a detailed discussion of the threshold cryptography literature, and a comparison with other approaches such as general MPC, we refer to DiSE.

Adaptive DPRF has been constructed from lattice-based assumptions by Libert et al. [25]. Their construction has the advantage of being in the standard model, apart from being post-quantum secure. However, their construction is quite theoretical and is therefore not suitable for constructing practical TSE schemes. Our construction can be implemented within the same framework as DiSE with minor adjustments and will be similarly efficient (with a degradation of up to a factor of 2 in the throughput, which is a reasonable price to pay for ensuring adaptive security).

Our approach to constructing adaptive DPRF is similar to and inspired by the adaptive threshold signature scheme of LJY [24]. Our proof strategy follows their footsteps closely. In fact, the LJY construction was mentioned as a plausible approach to constructing adaptively secure DPRF in [25]. However, the threshold signature construction, as presented in LJY, becomes more cumbersome due to the requirement of public verification. In particular, it additionally requires bilinear maps and is therefore not suitable for use in the adaptive TSE setting. In this paper we provide a clean and simple exposition of the DPRF (that largely resembles DiSE’s), formalize the arguments and apply it to construct efficient adaptive TSE. Furthermore, we go one step further and combine it with adaptive NIZK to obtain a stronger version of adaptive DPRF and subsequently a stronger adaptive TSE.

Footnote 3: A privately verifiable version, similar to Figure 6 of [7], can be constructed analogously with similar efficiency. We do not elaborate on that.

3 Technical Overview

Our main technical contribution is twofold: (i) definitions and (ii) constructions. DiSE definitions are game-based and hence they crucially rely on “tracking the exact amount of information” acquired by an attacker via playing security games with the challenger. In other words, the attacker is “allowed” to receive a certain amount of information and is only considered a “winner” if it is able to produce more than that. For standard (non-interactive) schemes, like CCA-secure encryption, this is captured simply by requiring the adversary to submit challenge ciphertexts that are not queried to the decryption oracle (in other words, the attacker is allowed to receive the decryption of any ciphertexts that are not challenge ciphertexts). In the interactive setting this becomes challenging, as already elaborated in DiSE. In the adaptive case this becomes even trickier as the corrupt set is not known until the very end (see footnote 4). Our general strategy is to “remember everything” (in several lists) the adversary has queried so far. Later, in the final phase, when the entire corrupt set is known, the challenger “extracts” the exact amount of information learned by the adversary from the lists and decides whether the winning condition is satisfied based on that.

Footnote 4: Note that we assume a stronger erasure-free adaptive model, in that each party keeps its entire history from the beginning of execution in its internal state. Therefore, when the adversary corrupts a party, it gets access to the entire history. This compels the reduction to “explain” its earlier simulation of that party, before it was corrupt. In a weaker model that assumes erasure, parties periodically remove their history.

Our DDH-based adaptive DPRF construction, inspired by the adaptive threshold signature of LJY, is based on a small but crucial tweak to NPR’s DP-DDH construction. Recall that NPR’s DP-DDH construction simply outputs $y := \mathsf{DP}_k(x) = H(x)^k$ on input $x$, where $k$ is the secret key and $H : \{0,1\}^* \to G$ is a hash function (modeled as a random oracle) mapping to a cyclic group $G$; $k$ is shared among $n$ parties by a $t$-out-of-$n$ Shamir secret sharing scheme. Each party $i$ outputs $y_i := H(x)^{k_i}$, where $k_i$ is the $i$-th share of $k$. The reconstruction is a public procedure that takes the $y_i$’s from a set $S$ (of size at least $t$) and computes

$$\prod_{i \in S} y_i^{\lambda_{i,S}} = H(x)^{\sum_{i \in S} \lambda_{i,S} k_i} = y,$$

where $\lambda_{i,S}$ is the Lagrange coefficient for party $i$ for set $S$. The DPRF pseudorandomness requires that, despite learning multiple real outputs $\mathsf{DP}_k(x_1), \mathsf{DP}_k(x_2), \ldots$ (the $x_i$’s are called non-challenge inputs), the value $H(x^\star)^k$ on a fresh $x^\star \neq x_i$ remains pseudorandom. To prove security, the key $k$ must not be known to the reduction (as it would be replaced by a DDH challenge) except in the exponent (that is, $g^k$); the reduction may sample at most $(t-1)$ random $k_i$’s, which leaves $k$ completely undetermined.

Let us assume a relatively simpler case when the attacker corrupts exactly $(t-1)$ parties. In this case, a static adversary declares the corrupt set $C$ ahead of time and the reduction may pick only those $k_i$ such that $i \in C$, which are given to the adversary to simulate corruption. The evaluation queries on non-challenge inputs are easily simulated using DDH challenges and extrapolating adequately in the exponent (those $(t-1)$ keys and the DDH challenge fully determine the key in the exponent). However, in the adaptive case the reduction cannot do that because it does not know $C$ until the very end; in the meantime it has to simulate evaluation queries on non-challenge inputs. If the reduction tries to simulate them by sampling any $(t-1)$ keys, it may encounter a problem. For example, the reduction may end up simulating a party $i$’s response $y_i$ on a non-challenge input using an extrapolated key $k_i$, which is only known to the reduction in the exponent as $g^{k_i}$. Later, an adaptive adversary may corrupt party $i$, at which point the reduction has to return $k_i$, which is not known in the “clear”.

The tweak we employ here is very similar to the one used in the adaptive threshold signature scheme of LJY [23]. The DPRF construction will now have two keys $u$ and $v$, and the output on $x$ will be $\mathsf{DP}_{u,v}(x) := w_1^u w_2^v$ where $H(x) = (w_1, w_2) \in G \times G$. The main intuition is that the value $w_1^u w_2^v$ does not reveal enough information on $(u, v)$. In our proof we program the random oracle on non-challenge inputs such that they are answered by $H(x) = (g^{s_j}, g^{w s_j})$ for the same $w$. This change is indistinguishable to the attacker as long as DDH is hard. Once we are in this hybrid, it is possible to argue information-theoretically that the attacker only gets the information $\{s_1(u + vw), s_2(u + vw), \ldots\}$; this leaves $(u, v)$ undetermined except for the constraint $u + vw = k$ for given $k$ and $w$.
Hence, it is possible to handle adaptive queries as the original keys $u, v$ are always known to the reduction.

For the case when the attacker corrupts $\ell < t - 1$ parties, the DiSE DP-DDH proof becomes significantly more complex. This is due to the fact that, in that case, the attacker may ask evaluation queries on the challenge $x^\star$ too, albeit up to $g = (t - 1 - \ell)$ many of them (otherwise the attacker would have enough information to compute $\mathsf{DP}_k(x^\star)$). Now, it becomes hard to simulate the challenge and non-challenge evaluation queries together from the DDH challenge if it is not known ahead of time which $g$ evaluation queries would be made on $x^\star$. To handle that, the DiSE proof first makes the non-challenge evaluation queries independent of the challenge evaluation queries by going through $q$ hybrids, where $q$ is the total number of distinct evaluation queries. Successive hybrids are proven indistinguishable assuming DDH; in each of these reductions the knowledge of $C$ plays a crucial role. Our construction, on the other hand, already takes care of this setting due to its adaptive nature. Specifically, when we switch all the random oracle queries on any non-challenge input $x$ to $H(x) = (g^{s_j}, g^{w s_j})$ for the same $w$, the answers to the evaluation queries on $x^\star$ (which are still programmed with $H(x^\star) = (g^{s_j}, g^{t_j})$ for uniform $t_j$) become statistically independent of them. So, no additional effort is needed to handle the case $\ell < t - 1$.

Similar to DiSE, we provide a stronger version of DPRF with a stronger adaptive correctness property; plugging it into the generic DiSE construction, we obtain a stronger variant of the TSE scheme that also achieves adaptive correctness. Our stronger DPRF is obtained by adding a commit-and-prove technique similar to DiSE. However, in contrast to DiSE, which relies on trapdoor commitments, we only require a statistically hiding commitment scheme. DiSE needs trapdoors because its DPRF definition supports a weak form of adaptivity that allows the adversary to choose the corrupt set based on the public parameters. The public parameters contain the commitments. Therefore, the reduction needs to produce the commitments in an “equivocal manner” before knowing $C$; when $C$ is known, the reduction picks the shares for the corrupt parties and uses the trapdoor to open the commitments to those values. Our construction, on the other hand, tackles adaptivity in a different way and hence trapdoors are not required. However, we need adaptive NIZKs for this strengthened construction. We provide a simple and efficient adaptive NIZK construction for the specific commit-and-prove statement in Appendix A, based on Schnorr’s protocol and Fiat-Shamir, which may be of independent interest.
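The information-theoretic step of the overview can be checked numerically. The following is a minimal sketch, not the paper's construction: the toy group (p = 2039, q = 1019, g = 4), the function names and the concrete numbers are all assumptions chosen for illustration. It shows that two different key pairs (u, v) with the same value u + vw = k give identical outputs on every oracle answer of the form (g^s, g^{ws}), while they generally differ on an oracle answer with an independent second exponent.

```python
# Toy parameters (assumptions for illustration only): p = 2q + 1 with q prime,
# and g = 4 generates the subgroup of prime order q in Z_p^*.
P, Q, G = 2039, 1019, 4

def dprf(u, v, w1, w2):
    """Two-key output DP_{u,v}(x) = w1^u * w2^v, where H(x) = (w1, w2)."""
    return (pow(w1, u, P) * pow(w2, v, P)) % P

def programmed_oracle(s, w):
    """Hybrid-style oracle answer H(x) = (g^s, g^{w*s}) with a shared w."""
    return pow(G, s, P), pow(G, (w * s) % Q, P)

def keys_consistent_with(k, w, v):
    """Return a pair (u, v) satisfying u + v*w = k (mod q) for a chosen v."""
    return (k - v * w) % Q, v % Q

w, k = 77, 500
key_a = keys_consistent_with(k, w, v=3)
key_b = keys_consistent_with(k, w, v=900)

# On programmed (non-challenge) inputs both keys behave identically.
outputs_a = [dprf(*key_a, *programmed_oracle(s, w)) for s in (5, 11, 600)]
outputs_b = [dprf(*key_b, *programmed_oracle(s, w)) for s in (5, 11, 600)]

# On an input whose second exponent is independent of w, they can differ.
fresh = (pow(G, 5, P), pow(G, 9, P))
fresh_a = dprf(*key_a, *fresh)
fresh_b = dprf(*key_b, *fresh)
```

In this sketch the programmed outputs equal g^{s·k}, so they carry no information about which of the consistent key pairs is in use.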

4 Preliminaries

In this paper, unless mentioned otherwise, we focus on specific interactive protocols that consist of only two rounds of non-simultaneous interaction: an initiating party (often called an initiator) sends messages to a number of other parties and gets a response from each one of them. In particular, the parties contacted do not communicate with each other. Our security model considers adaptive and malicious corruption.

Common Notation. We use notation similar to DiSE. Let $\mathbb{N}$ denote the set of positive integers. We use $[n]$ for $n \in \mathbb{N}$ to denote the set $\{1, 2, \ldots, n\}$. A function $f : \mathbb{N} \to \mathbb{N}$ is negligible, denoted by negl, if for every polynomial $p$, $f(n) < 1/p(n)$ for all large enough values of $n$. We use $D(x) =: y$ or $y := D(x)$ to denote that $y$ is the output of the deterministic algorithm $D$ on input $x$. Also, $R(x) \to y$ or $y \leftarrow R(x)$ denotes that $y$ is the output of the randomized algorithm $R$ on input $x$. $R$ can be derandomized as $R(x; r) =: y$, where $r$ is the explicit random tape used by the algorithm. For two random variables $X$ and $Y$ we write $X \approx_{comp} Y$ to denote that they are computationally indistinguishable and $X \approx_{stat} Y$ to denote that they are statistically close. Concatenation of two strings $a$ and $b$ is denoted by either $(a \| b)$ or $(a, b)$. Throughout the paper, we use $n$ to denote the total number of parties, $t$ to denote the threshold, and $\kappa$ to denote the security parameter. We make the natural identification between players and elements of $\{1, \ldots, n\}$. We will use Lagrange interpolation for evaluating a polynomial. For any polynomial $P$, the $i$-th Lagrange coefficient for a set $S$ to compute $P(j)$ is denoted by $\lambda_{j,i,S}$. Matching the threshold, we will mostly consider polynomials of degree $(t-1)$, unless otherwise mentioned. In this case, at least $t$ points on $P$ are needed to compute any $P(j)$.

Inputs and Outputs. We write $[j : x]$ to denote that the value $x$ is private to party $j$. For a protocol $\pi$, we write $[j : z'] \leftarrow \pi([i : (x, y)], [j : z], c)$ to denote that party $i$ has two private inputs $x$ and $y$; party $j$ has one private input $z$; all the other parties have no private input; $c$ is a common public input; and, after the execution, only $j$ receives an output $z'$. We write $[i : x_i]_{\forall i \in S}$ or more compactly $x_S$ to denote that each party $i \in S$ has a private value $x_i$.

Network Model. We assume that all the parties are connected by point-to-point secure and authenticated channels. We also assume that there is a known upper bound on the time it takes to deliver a message over these channels.

Cryptographic Primitives. We need some standard cryptographic primitives to design our protocols, such as commitments, secret sharing and adaptive non-interactive zero-knowledge proofs. For completeness we provide the formal definitions, mostly taken verbatim from DiSE, in the full version [27].
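The Lagrange notation above can be made concrete with a small sketch. The field size, the helper names `share`, `lagrange_coeff` and `interpolate`, and the fixed polynomial coefficients are assumptions for illustration; the sketch computes $\lambda_{j,i,S}$ over a toy prime field and uses it to recover $P(j)$ from $t$ points of a degree-$(t-1)$ polynomial.

```python
Q = 1019  # toy prime field size (an assumption for illustration)

def share(secret, t, n, coeffs):
    """Evaluate P(1..n) for a degree-(t-1) polynomial P with P(0) = secret."""
    poly = [secret] + list(coeffs)          # exactly t coefficients in total
    assert len(poly) == t
    return {i: sum(c * pow(i, e, Q) for e, c in enumerate(poly)) % Q
            for i in range(1, n + 1)}

def lagrange_coeff(j, i, S):
    """lambda_{j,i,S}: weight of P(i) when computing P(j) from the points in S."""
    num, den = 1, 1
    for m in S:
        if m != i:
            num = num * (j - m) % Q
            den = den * (i - m) % Q
    return num * pow(den, -1, Q) % Q        # modular inverse (Python 3.8+)

def interpolate(j, shares, S):
    """Compute P(j) as the sum over i in S of lambda_{j,i,S} * P(i)."""
    return sum(lagrange_coeff(j, i, S) * shares[i] for i in S) % Q

# P(x) = 123 + 7x + 11x^2, so t = 3 points determine it.
shares = share(123, t=3, n=5, coeffs=[7, 11])
secret_from_123 = interpolate(0, shares, [1, 2, 3])
secret_from_245 = interpolate(0, shares, [2, 4, 5])
p4_from_123 = interpolate(4, shares, [1, 2, 3])
```

Any set of at least $t$ indices gives the same result, which is the property the reconstruction procedures in later sections rely on.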

5 Distributed Pseudo-Random Functions: Definitions

We now present a formal treatment of adaptively secure DPRF. First we present the DPRF consistency, which is exactly the same as in DiSE [6] and is taken verbatim from there as it does not consider any corruption.

Definition 1 (Distributed Pseudo-random Function). A distributed pseudo-random function (DPRF) DP is a tuple of three algorithms (Setup, Eval, Combine) satisfying a consistency property as defined below.

– $\mathsf{Setup}(1^\kappa, n, t) \to ((sk_1, \ldots, sk_n), pp)$. The setup algorithm generates $n$ secret keys $(sk_1, sk_2, \ldots, sk_n)$ and public parameters $pp$. The $i$-th secret key $sk_i$ is given to party $i$.
– $\mathsf{Eval}(sk_i, x, pp) \to z_i$. The Eval algorithm generates pseudo-random shares for a given value. Party $i$ computes the $i$-th share $z_i$ for a value $x$ by running Eval with $sk_i$, $x$ and $pp$.
– $\mathsf{Combine}(\{(i, z_i)\}_{i \in S}, pp) =: z/\bot$. The Combine algorithm combines the partial shares $\{z_i\}_{i \in S}$ from parties in the set $S$ to generate a value $z$. If the algorithm fails, its output is denoted by $\bot$.

Consistency. For any $n, t \in \mathbb{N}$ such that $t \le n$, all $((sk_1, \ldots, sk_n), pp)$ generated by $\mathsf{Setup}(1^\kappa, n, t)$, any input $x$, and any two sets $S, S' \subset [n]$ of size at least $t$, there exists a negligible function negl such that

$$\Pr[\mathsf{Combine}(\{(i, z_i)\}_{i \in S}, pp) = \mathsf{Combine}(\{(j, z'_j)\}_{j \in S'}, pp) \neq \bot] \ge 1 - \mathsf{negl}(\kappa),$$

where $z_i \leftarrow \mathsf{Eval}(sk_i, x, pp)$ for $i \in S$, $z'_j \leftarrow \mathsf{Eval}(sk_j, x, pp)$ for $j \in S'$, and the probability is over the randomness used by Eval.

Next we define the adaptive security of DPRF. This differs from the definition provided in DiSE in that adaptive corruption is considered for both correctness and pseudorandomness.


Definition 2 ((Strong)-adaptive security of DPRF). Let DP be a distributed pseudo-random function. We say that DP is adaptively secure against malicious adversaries if it satisfies the adaptive pseudorandomness requirement (Definition 3). Also, we say that DP is strongly-adaptively-secure against malicious adversaries if it satisfies both the adaptive pseudorandomness and adaptive correctness (Definition 4) requirements.

A DPRF is adaptively pseudorandom if no adaptive adversary can guess the PRF value on an input for which it has not obtained shares from at least $t$ parties. It is adaptively correct if no adaptive adversary can generate shares which lead to an incorrect PRF value. We define these properties formally below.

Definition 3 (Adaptive pseudorandomness). A DPRF DP := (Setup, Eval, Combine) is adaptively pseudorandom if for all PPT adversaries $A$, there exists a negligible function negl such that

$$|\Pr[\mathsf{PseudoRand}_{DP,A}(1^\kappa, 0) = 1] - \Pr[\mathsf{PseudoRand}_{DP,A}(1^\kappa, 1) = 1]| \le \mathsf{negl}(\kappa),$$

where PseudoRand is defined below.

$\mathsf{PseudoRand}_{DP,A}(1^\kappa, b)$:

– Initialization. Run $\mathsf{Setup}(1^\kappa, n, t)$ to get $((sk_1, \ldots, sk_n), pp)$. Give $pp$ to $A$. Initialize the state of party $i$ as $st_i := \{sk_i\}$. The state of each honest party is updated accordingly; we leave it implicit below. Initialize a list $L := \emptyset$ to record the set of values for which $A$ may know the PRF outputs. Initialize the set of corrupt parties $C := \emptyset$.
– Adaptive corruption. At any point receive a new set of corrupt parties $\tilde{C}$ from $A$. Give the states $\{st_i\}_{i \in \tilde{C}}$ of these parties to $A$ and update $C := C \cup \tilde{C}$. Repeat this step as many times as $A$ desires.
– Pre-challenge evaluation. In response to $A$’s evaluation query $(\mathsf{Eval}, x, i)$ return $\mathsf{Eval}(sk_i, x, pp)$ to $A$. Repeat this step as many times as $A$ desires. Record all these queries.
– Build lists. For each evaluation query on an input $x$, build a list $L_x$ containing all parties contacted at any time in the game.
– Challenge. $A$ outputs $(\mathsf{Challenge}, x^\star, S^\star, \{(i, z_i^\star)\}_{i \in U^\star})$ such that $|S^\star| \ge t$ and $U^\star \subseteq S^\star \cap C$. Let $z_i \leftarrow \mathsf{Eval}(sk_i, x^\star, pp)$ for $i \in S^\star \setminus U^\star$ and $z^\star := \mathsf{Combine}(\{(i, z_i)\}_{i \in S^\star \setminus U^\star} \cup \{(i, z_i^\star)\}_{i \in U^\star}, pp)$. If $z^\star = \bot$, return $\bot$. Else, if $b = 0$, return $z^\star$; otherwise, return a uniformly random value.
– Post-challenge evaluation and corruption. Exactly the same as the pre-challenge corruption and evaluation queries.
– Guess. When $A$ returns a guess $b'$, do as follows:
  – if the total number of corrupt parties satisfies $|C| \ge t$, then output 0 and stop;
  – if the challenge $x^\star$ has been queried for evaluation to at least $g := t - |C|$ honest parties, that is, if $|L_{x^\star} \cap ([n] \setminus C)| \ge g$, then output 0 and stop;
  – otherwise output $b'$.
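The Guess phase is where adaptive corruption changes the bookkeeping, so a small sketch may help. The function name `pseudorand_output` and the representation of the recorded queries as `(x, i)` pairs are assumptions for illustration; the two checks mirror the conditions stated in the game.

```python
def pseudorand_output(t, n, corrupt, eval_log, x_star, b_guess):
    """Final step of the game: 0 on a trivial win, else the adversary's guess.

    corrupt  -- the final corrupt set C, known only at the end of the game
    eval_log -- all recorded evaluation queries, as (input, party) pairs
    """
    C = set(corrupt)
    if len(C) >= t:                       # too many corruptions: trivial win
        return 0
    L_x_star = {i for (x, i) in eval_log if x == x_star}
    honest = set(range(1, n + 1)) - C     # [n] \ C with the final C
    g = t - len(C)
    if len(L_x_star & honest) >= g:       # enough honest shares on x*: trivial win
        return 0
    return b_guess

log = [("x*", 2), ("other", 3), ("other", 4)]
ok = pseudorand_output(3, 5, {1}, log, "x*", 1)              # 1 honest share, g = 2
trivial = pseudorand_output(3, 5, {1}, log + [("x*", 3)], "x*", 1)
late = pseudorand_output(3, 5, {1, 2}, log, "x*", 1)         # party 2 corrupted later
```

In the last call party 2 answered a query on the challenge input and was corrupted afterwards; since the check uses the final set $C$, that response is not counted as an honest one.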


Remark 1 (Difference with static security [6]). The main differences from the static version, given in DiSE, are in the “Corruption” and the “Guess” phases. Corruption takes place at any time in the security game and the set of all corrupt parties is updated correspondingly. Now, we need to prevent the adversary from winning trivially. For that, we maintain lists corresponding to each evaluation input (in the DiSE definition only one list suffices) and in the end check whether the adversary has sufficient information to compute the DPRF output itself. This becomes slightly trickier than the static case due to the constant updating of the list of corrupt parties.

Remark 2 (Comparing with the definition of [25]). Our pseudorandomness definition is stronger than the definition of Libert et al. [25], in that their definition does not allow a malicious adversary to supply malformed partial evaluations during the challenge phase, whereas ours does. We handle this by attaching NIZK proofs.

Definition 4 (Adaptive correctness). A DPRF DP := (Setup, Eval, Combine) is adaptively correct if for all PPT adversaries $A$, there exists a negligible function negl such that the following game outputs 1 with probability at least $1 - \mathsf{negl}(\kappa)$.

– Initialization. Run $\mathsf{Setup}(1^\kappa, n, t)$ to get $((sk_1, \ldots, sk_n), pp)$. Give $pp$ to $A$. Initialize the state of party $i$ as $st_i := \{sk_i\}$. The state of each honest party is updated accordingly; we leave it implicit below. Initialize the set of corrupt parties $C := \emptyset$.
– Adaptive Corruption. At any time, receive a new set of corrupt parties $\tilde{C}$ from $A$, where $|C \cup \tilde{C}| < t$. Give the secret states $\{st_i\}_{i \in \tilde{C}}$ of these parties to $A$ and update $C := C \cup \tilde{C}$. Repeat this step as many times as $A$ desires.
– Evaluation. In response to $A$’s evaluation query $(\mathsf{Eval}, x, i)$ for some $i \in [n] \setminus C$, return $\mathsf{Eval}(sk_i, x, pp)$ to $A$. Repeat this step as many times as $A$ desires.
– Guess. Receive a set $S$ of size at least $t$, an input $x^\star$, and shares $\{(i, z_i^\star)\}_{i \in S \cap C}$ from $A$. Let $z_j \leftarrow \mathsf{Eval}(sk_j, x^\star, pp)$ for $j \in S$ and $z'_i \leftarrow \mathsf{Eval}(sk_i, x^\star, pp)$ for $i \in S \setminus C$. Also, let $z := \mathsf{Combine}(\{(j, z_j)\}_{j \in S}, pp)$ and $z^\star := \mathsf{Combine}(\{(i, z'_i)\}_{i \in S \setminus C} \cup \{(i, z_i^\star)\}_{i \in S \cap C}, pp)$. Output 1 if $z^\star \in \{z, \bot\}$; else, output 0.

6 Threshold Symmetric-Key Encryption: Definitions

In this section, we provide the formal definitions of threshold symmetric-key encryption (TSE). We start by specifying the algorithms that constitute a TSE scheme, which is taken verbatim from DiSE [6].

Definition 5 (Threshold Symmetric-key Encryption). A threshold symmetric-key encryption scheme TSE is given by a tuple (Setup, DistEnc, DistDec) that satisfies the consistency property below.

– $\mathsf{Setup}(1^\kappa, n, t) \to (sk_{[n]}, pp)$: Setup is a randomized algorithm that takes the security parameter as input, and outputs $n$ secret keys $sk_1, \ldots, sk_n$ and public parameters $pp$. The $i$-th secret key $sk_i$ is given to party $i$.
– $\mathsf{DistEnc}(sk_{[n]}, [j : m, S], pp) \to [j : c/\bot]$: DistEnc is a distributed protocol through which a party $j$ encrypts a message $m$ with the help of parties in a set $S$. At the end of the protocol, $j$ outputs a ciphertext $c$ (or $\bot$ to denote failure). All the other parties have no output.
– $\mathsf{DistDec}(sk_{[n]}, [j : c, S], pp) \to [j : m/\bot]$: DistDec is a distributed protocol through which a party $j$ decrypts a ciphertext $c$ with the help of parties in a set $S$. At the end of the protocol, $j$ outputs a message $m$ (or $\bot$ to denote failure). All the other parties have no output.

Consistency. For any $n, t \in \mathbb{N}$ such that $t \le n$, all $(sk_{[n]}, pp)$ output by $\mathsf{Setup}(1^\kappa, n, t)$, any message $m$, any two sets $S, S' \subset [n]$ such that $|S|, |S'| \ge t$, and any two parties $j \in S$, $j' \in S'$, if all the parties behave honestly, then there exists a negligible function negl such that

$$\Pr\big[\, [j' : m] \leftarrow \mathsf{DistDec}(sk_{[n]}, [j' : c, S'], pp) \;\big|\; [j : c] \leftarrow \mathsf{DistEnc}(sk_{[n]}, [j : m, S], pp) \,\big] \ge 1 - \mathsf{negl}(\kappa),$$

where the probability is over the random coin tosses of the parties involved in DistEnc and DistDec.

Next we define the security of a TSE scheme in the presence of an adaptive and malicious adversary.

Definition 6 ((Strong)-Adaptive Security of TSE). Let TSE be a threshold symmetric-key encryption scheme. We say that TSE is (strongly)-adaptively secure against malicious adversaries if it satisfies the (strong)-adaptive correctness (Definition 7), adaptive message privacy (Definition 8) and (strong)-adaptive authenticity (Definition 9) requirements.

6.1 Adaptive Correctness

The adaptive correctness definition barely changes from the static version in DiSE, except for the required adjustment in the corruption phase.

Definition 7 (Adaptive Correctness). A TSE scheme TSE := (Setup, DistEnc, DistDec) is adaptively correct if for all PPT adversaries $A$, there exists a negligible function negl such that the following game outputs 1 with probability at least $1 - \mathsf{negl}(\kappa)$.

– Initialization. Run $\mathsf{Setup}(1^\kappa, n, t)$ to get $(sk_{[n]}, pp)$. Give $pp$ to $A$. Initialize the state of party $i$ as $st_i := \{sk_i\}$. The state of each honest party is updated accordingly; we leave it implicit below. Initialize the set of corrupt parties to $C := \emptyset$.
– Adaptive Corruption. At any time receive a new set of corrupt parties $\tilde{C}$ from $A$, where $|C \cup \tilde{C}| < t$. Give the secret states $\{st_i\}_{i \in \tilde{C}}$ to $A$. Repeat this as many times as $A$ desires.
– Encryption. Receive $(\mathsf{Encrypt}, j, m, S)$ from $A$ where $j \in S \setminus C$ and $|S| \ge t$. Initiate the protocol DistEnc from party $j$ with inputs $m$ and $S$. If $j$ outputs $\bot$ at the end, then output 1 and stop. Else, let $c$ be the output ciphertext.
– Decryption. Receive $(\mathsf{Decrypt}, j', S')$ from $A$ where $j' \in S' \setminus C$ and $|S'| \ge t$. Initiate the protocol DistDec from party $j'$ with inputs $c$, $S'$ and $pp$.
– Output. Output 1 if and only if $j'$ outputs $m$ or $\bot$.

A strongly-adaptively-correct TSE scheme is a correct TSE scheme but with a different output step. Specifically, output 1 if and only if:

– If all parties in $S'$ behave honestly, then $j'$ outputs $m$; or,
– If corrupt parties in $S'$ deviate from the protocol, then $j'$ outputs $m$ or $\bot$.

Remark 3. Note that an adaptive adversary may corrupt a party $j$ right after the encryption phase, such that the condition $j \in S \setminus C$ does not hold anymore. However, this does not affect the winning condition, because we just need the party $j$ who makes the encryption query to output a legitimate ciphertext immediately, for which we need that party to be honest only within the encryption phase.

6.2 Adaptive Message Privacy

Similar to DiSE, our definition is a CPA-security style definition additionally accompanied by indirect decryption access for the attacker. However, due to adaptive corruption, handling indirect decryption queries becomes more subtle. We provide the formal definition below.

Definition 8 (Adaptive message privacy). A TSE scheme TSE := (Setup, DistEnc, DistDec) satisfies message privacy if for all PPT adversaries $A$, there exists a negligible function negl such that

$$\big|\Pr[\mathsf{MsgPriv}_{TSE,A}(1^\kappa, 0) = 1] - \Pr[\mathsf{MsgPriv}_{TSE,A}(1^\kappa, 1) = 1]\big| \le \mathsf{negl}(\kappa),$$

where MsgPriv is defined below.


$\mathsf{MsgPriv}_{TSE,A}(1^\kappa, b)$:

– Initialization. Run $\mathsf{Setup}(1^\kappa, n, t)$ to get $(sk_{[n]}, pp)$. Give $pp$ to $A$. Initialize the state of party $i$ as $st_i := \{sk_i\}$. The state of each honest party is updated accordingly; we leave it implicit below. Initialize a list $L_{dec} := \emptyset$.
– Adaptive Corruption. Initialize $C := \emptyset$. At any time receive a new set of corrupt parties $\tilde{C}$ from $A$, where $|C \cup \tilde{C}| < t$. Give the secret states $\{st_i\}_{i \in \tilde{C}}$ to $A$. Repeat this as many times as $A$ desires.
– Pre-challenge encryption queries. In response to $A$’s encryption query $(\mathsf{Encrypt}, j, m, S)$, where $j \in S$ and $|S| \ge t$, run an instance of the protocol DistEnc with $A$ (see footnote 5). If $j \notin C$, then party $j$ initiates the protocol with inputs $m$ and $S$, and the output of $j$ is given to $A$. Repeat this step as many times as $A$ desires.
– Pre-challenge indirect decryption queries. In response to $A$’s decryption query $(\mathsf{Decrypt}, j, c, S)$, where $j \in S \setminus C$ and $|S| \ge t$, party $j$ initiates DistDec with inputs $c$ and $S$. Record $j$ in the list $L_{dec}$. Repeat this step as many times as $A$ desires.
– Challenge. $A$ outputs $(\mathsf{Challenge}, j^\star, m_0, m_1, S^\star)$ where $|m_0| = |m_1|$, $j^\star \in S^\star \setminus C$ and $|S^\star| \ge t$. Initiate the protocol DistEnc from party $j^\star$ with inputs $m_b$ and $S^\star$. Give $c^\star$ (or $\bot$) output by $j^\star$ as the challenge to $A$.
– Post-challenge encryption queries. Repeat the pre-challenge encryption phase.
– Post-challenge indirect decryption queries. Repeat the pre-challenge decryption phase.
– Guess. Finally, $A$ returns a guess $b'$. Output $b'$ if and only if (i) $j^\star \notin C$ and (ii) $L_{dec} \cap C = \emptyset$; otherwise return a random bit.

Remark 4. The main difference from the non-adaptive setting comes in the Guess phase, in that the winning condition requires that neither (i) the initiator of the challenge query nor (ii) any initiator of an indirect decryption query is corrupt. To handle the latter we introduce a list $L_{dec}$ which records the identities of all parties who made an indirect decryption query.

6.3 Adaptive Authenticity

Similar to DiSE, our authenticity definition follows a one-more type notion. To adapt the authenticity definition to the adaptive setting, we need to track exactly the information gained by the adversary that is sufficient to produce valid ciphertexts. This leads to some subtleties in the adaptive case. We incorporate that below by “delaying” the counting of the number of honest responses per query.

Definition 9 (Adaptive authenticity). A TSE scheme TSE := (Setup, DistEnc, DistDec) satisfies authenticity if for all PPT adversaries $A$, there exists a negligible function negl such that

$$\Pr[\mathsf{AUTH}_{TSE,A}(1^\kappa) = 1] \le \mathsf{negl}(\kappa),$$

where AUTH is defined below.

Footnote 5 (to the encryption queries of MsgPriv): Note that $j$ can be either honest or corrupt here. So both types of encryption queries are captured.

$\mathsf{AUTH}_{TSE,A}(1^\kappa)$:

– Initialization. Run $\mathsf{Setup}(1^\kappa, n, t)$ to get $(sk_{[n]}, pp)$. Give $pp$ to $A$. Initialize the state of party $i$ as $st_i := \{sk_i\}$. The state of each honest party is updated accordingly; we leave it implicit below. Initialize a counter $ct := 0$ and ordered lists $L_{act}, L_{ctxt} := \emptyset$. Below, we assume that for every query, the $(j, S)$ output by $A$ are such that $j \in S$ and $|S| \ge t$. Initialize $C := \emptyset$.
– Adaptive Corruption. At any time receive a new set of corrupt parties $\tilde{C}$ from $A$, where $|C \cup \tilde{C}| < t$. Give the secret states $\{st_i\}_{i \in \tilde{C}}$ to $A$. Repeat this as many times as $A$ desires.
– Encryption queries. On receiving $(\mathsf{Encrypt}, j, m, S)$ from $A$, run the protocol DistEnc with $m, S$ as the inputs of $j$. Append $(j, S)$ to the list $L_{act}$. If $j \in C$, then also append the ciphertext to the list $L_{ctxt}$.
– Decryption queries. On receiving $(\mathsf{Decrypt}, j, c, S)$ from $A$, run the protocol DistDec with $c, S$ as the inputs of $j$. Append $(j, S)$ to the list $L_{act}$.
– Targeted decryption queries. On receiving $(\mathsf{TargetDecrypt}, j, \ell, S)$ from $A$ for some $j \in S \setminus C$, run DistDec with $c, S$ as the inputs of $j$, where $c$ is the $\ell$-th ciphertext in $L_{ctxt}$. Append $(j, S)$ to $L_{act}$.
– Forgery. For each $j \in C$ and for each entry $(j, S) \in L_{act}$ (there can be multiple identical entries, each of which is counted), increment $ct$ by $|S \setminus C|$. Define $g := t - |C|$ and $k := \lfloor ct/g \rfloor$. $A$ outputs $((j_1, S_1, c_1), (j_2, S_2, c_2), \ldots, (j_{k+1}, S_{k+1}, c_{k+1}))$ such that $j_1, \ldots, j_{k+1} \notin C$ and $c_u \neq c_v$ for any $u \neq v \in [k+1]$ (ciphertexts are not repeated). For every $i \in [k+1]$, run an instance of DistDec with $c_i, S_i$ as the input of party $j_i$. In that instance, all parties in $S_i$ behave honestly. Output 0 if any $j_i$ outputs $\bot$; else output 1.

A TSE scheme satisfies strong-authenticity if it satisfies authenticity but with a slightly modified AUTH: in the forgery phase, the restriction on corrupt parties in $S_i$ to behave honestly is removed (for all $i \in [k+1]$).

Remark 5. The above definition has some important differences from the static case [6], because tracking the exact amount of information acquired by the adversary throughout the game for producing legitimate ciphertexts becomes tricky in the presence of adaptive corruption. To enable this we keep track of pairs $(j, S)$ for any encryption or decryption query made at any time, irrespective of whether $j$ is honest or corrupt at that time. This is because an adaptive adversary may corrupt $j$ at a later time. In the forgery phase, when the whole corrupt set $C$ is known, the maximum possible number of honest responses $|S \setminus C|$ is computed for each such pair. For static corruption this issue does not come up as the corrupt set $C$ is known in the beginning.
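The delayed counting described in Remark 5 can be sketched in a few lines. The function name `forgery_budget` and the list-of-pairs representation of $L_{act}$ are assumptions for illustration; the arithmetic follows the Forgery step, assuming $|C| < t$ so that $g \ge 1$.

```python
def forgery_budget(t, corrupt, L_act):
    """Compute (ct, g, k) in the forgery phase, once the final corrupt set is known.

    L_act -- ordered list of (j, S) pairs recorded for every query in the game
    """
    C = set(corrupt)
    assert len(C) < t, "the game only allows fewer than t corruptions"
    g = t - len(C)
    # Only queries initiated by (eventually) corrupt parties count, and each
    # contributes the number of parties in S that are honest at the end.
    ct = sum(len(set(S) - C) for (j, S) in L_act if j in C)
    return ct, g, ct // g

L_act = [(1, {1, 2, 3}), (2, {2, 3, 4}), (1, {1, 4, 5})]
early = forgery_budget(3, {1}, L_act)      # only party 1 is corrupt at the end
late = forgery_budget(3, {1, 2}, L_act)    # party 2 is corrupted after its query
```

To win, the adversary has to output $k + 1$ distinct valid ciphertexts. Comparing `early` and `late` shows why the pairs are recorded regardless of the initiator's status at query time: the query made by party 2 only counts once party 2 turns out to be corrupt.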

7 Our DPRF Constructions

In this section we provide several DPRF constructions against adaptive attackers. In Sect. 7.1 we provide our main DPRF construction. In Sect. 7.2, we provide an extension which is strongly adaptively secure. Finally in Sect. 7.3 we briefly argue the adaptive security naturally achieved by the DP-PRF construction.

7.1 Adaptively-Secure DPRF

Our DDH-based DPRF is provided in Fig. 1. We prove the following theorem formally.

Fig. 1. An adaptively secure DPRF protocol Πadap based on DDH.

Theorem 1. Protocol $\Pi_{adap}$ in Fig. 1 is an adaptively secure DPRF under the DDH assumption in the programmable random oracle model.

Proof sketch. We need to show that the construction given in Fig. 1 satisfies consistency and adaptive pseudorandomness. Consistency is straightforward from the construction, so below we focus only on adaptive pseudorandomness. In particular, we show that if there exists a PPT adversary $\mathcal{A}$ that breaks the adaptive pseudorandomness game, then we can build a polynomial-time reduction that breaks the DDH assumption. The formal proof is provided in the full version [27]. We give a sketch below.

Somewhat surprisingly, our proof is significantly simpler than that of DiSE. Because our construction is purposefully designed to protect against adaptive corruption, we can easily switch to a hybrid where the information obtained by the adversary through all non-challenge evaluation queries is simulated using a key that is statistically independent of the actual key. In contrast, DiSE's proof needs to carefully make the non-challenge evaluation queries independent to reach a similar hybrid, crucially relying on knowledge of the corrupt set from the beginning. For any PPT adversary $\mathcal{A}$ and a bit $b \in \{0, 1\}$ we briefly describe the hybrids below.

$\mathsf{PseudoRand}_{\mathcal{A}}(b)$. This is the real game, in which the challenger chooses random $(s_i, t_i) \leftarrow_{\$} \mathbb{Z}_p^2$ for simulating the $i$-th random oracle query on any $x_i$ as $H(x_i) := (g^{s_i}, g^{t_i})$.

$\mathsf{Hyb}^1_{\mathcal{A}}(b)$. In this hybrid experiment the only change we make is: the challenger guesses the challenge input $x^*$ randomly (incurring a $1/q_H$ loss for $q_H = \mathrm{poly}(\kappa)$ distinct random oracle queries) and simulates the random oracle query on all $x_i \neq x^*$ as $H(x_i) := (g^{s_i}, g^{\omega s_i})$, where $\omega, s_1, s_2, \ldots$ are each sampled uniformly at random from $\mathbb{Z}_p$. This implicitly sets $t_i := \omega s_i$. Note that each query has the same $\omega$ but a different $s_i$; this way the challenger ensures that the attacker does not learn any new information by making more queries. However, for $x^*$ the random oracle is programmed as usual by sampling random values $s^*, t^*$ and setting $H(x^*) := (g^{s^*}, g^{t^*})$.

Claim. Assuming DDH is hard in group $G$, we have that $\mathsf{PseudoRand}_{\mathcal{A}}(b) \approx_{comp} \mathsf{Hyb}^1_{\mathcal{A}}(b)$.

Given a DDH challenge $g^\alpha, g^\beta, g^\gamma$, where $\gamma$ is either equal to $\alpha\beta$ or uniformly random in $\mathbb{Z}_p$, the reduction sets $g^{s_i} := g^{\mu_i} \cdot g^{\alpha\sigma_i}$ and $g^{t_i} := g^{\mu_i\beta} \cdot g^{\gamma\sigma_i}$ for uniformly random $(\mu_i, \sigma_i) \in \mathbb{Z}_p^2$. Now, if $\gamma = \alpha\beta$, then implicitly (in the exponent) the challenger sets $s_i := \mu_i + \sigma_i\alpha$ and $t_i := \beta(\mu_i + \sigma_i\alpha)$, which implies that $\omega := \beta$ and $t_i := s_i\omega$. So $\mathsf{Hyb}^1$ is perfectly simulated. On the other hand, when $\gamma$ is uniformly random, $t_i := \mu_i\beta + \sigma_i\gamma$ is uniformly random in $\mathbb{Z}_p$; this perfectly simulates $\mathsf{Hyb}^0$, that is, the real game $\mathsf{PseudoRand}_{\mathcal{A}}(b)$.

$\mathsf{Hyb}^2_{\mathcal{A}}(b)$. In this hybrid we do not make any change from $\mathsf{Hyb}^1_{\mathcal{A}}(b)$ except that all non-challenge evaluation queries are answered using a key $k \leftarrow_{\$} \mathbb{Z}_p$ sampled uniformly at random, whereas the corruption query for party $j$ is answered using randomly sampled $u_j, v_j$ subject to $u_j + \omega v_j = k_j$, where $k_j$ is the $j$-th Shamir share of $k$. In particular, for a non-challenge evaluation query $\mathsf{Eval}(x_i, j)$, the challenger returns $g^{s_i k_j}$, where $H(x_i) = (g^{s_i}, g^{\omega s_i})$. The challenge query and the evaluation queries on $x^*$ are answered as in $\mathsf{Hyb}^1_{\mathcal{A}}(b)$.

Claim. We have that $\mathsf{Hyb}^1_{\mathcal{A}}(b) \approx_{stat} \mathsf{Hyb}^2_{\mathcal{A}}(b)$.

Note that this statement is information-theoretic, and hence we assume that the adversary here can be unbounded. First, we notice that in both hybrids an unbounded adversary learns the values $\{s_1, s_2, \ldots\}, \omega$ from the random oracle responses.
Furthermore, it learns at most $t - 1$ pairs $\{u_j, v_j\}_{j \in C}$ where $|C| < t$, given which $(u, v)$ remains statistically hidden. Now, the only difference comes in the non-challenge evaluation queries. In $\mathsf{Hyb}^1_{\mathcal{A}}(b)$, the adversary learns $\{u + \omega v\}$ from them, whereas in $\mathsf{Hyb}^2_{\mathcal{A}}(b)$ it learns a random $k$. We note that, conditioned on the common values, $u + v\omega$ is uniformly random for randomly chosen $u, v$, because it is essentially a universal hash function where $\omega$ is the input and $(u, v)$ are uniformly random keys. Hence the two distributions are statistically close.

Once we are in $\mathsf{Hyb}^2_{\mathcal{A}}(b)$, it is easy to observe that the response to the challenge query is uniformly random irrespective of $b$, which in turn implies that $\mathsf{Hyb}^2_{\mathcal{A}}(0) \approx_{stat} \mathsf{Hyb}^2_{\mathcal{A}}(1)$, which concludes the proof of the theorem. We provide the full proof in detail in the full version.
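The two algebraic facts used in the sketch above can be checked numerically. The following Python snippet is only an illustration under loudly stated assumptions: it uses a toy group (the order-11 subgroup of $\mathbb{Z}_{23}^*$ generated by 2), which is far too small for any security, and the helper names `program_oracle` and `masked_key_histogram` are hypothetical. It first embeds a real DDH tuple into programmed random-oracle answers and checks that every answer has the form $(g^{s_i}, g^{\omega s_i})$ with $\omega = \beta$; it then counts how often each value of $u + \omega v$ occurs over all $(u, v) \in \mathbb{Z}_p^2$, the simplest (unconditioned) form of the uniformity argument.

```python
import secrets
from collections import Counter

# Toy parameters (assumption, insecure): 2 has prime order P = 11 in Z_23^*.
Q, P, G = 23, 11, 2


def program_oracle(g_alpha, g_beta, g_gamma):
    """Embed a DDH challenge into one programmed random-oracle answer."""
    mu, sigma = secrets.randbelow(P), secrets.randbelow(P)
    first = pow(G, mu, Q) * pow(g_alpha, sigma, Q) % Q        # g^(mu + alpha*sigma)
    second = pow(g_beta, mu, Q) * pow(g_gamma, sigma, Q) % Q  # g^(beta*mu + gamma*sigma)
    return first, second


# Real DDH tuple: gamma = alpha * beta (exponents live modulo P).
alpha, beta = secrets.randbelow(P), secrets.randbelow(P)
gamma = alpha * beta % P
answers = [
    program_oracle(pow(G, alpha, Q), pow(G, beta, Q), pow(G, gamma, Q))
    for _ in range(50)
]
# Every answer should satisfy second == first^beta, i.e. t_i = omega * s_i with omega = beta.
consistent = all(pow(first, beta, Q) == second for first, second in answers)


def masked_key_histogram(omega):
    """Count occurrences of u + omega*v (mod P) over all pairs (u, v)."""
    return Counter((u + omega * v) % P for u in range(P) for v in range(P))


print(consistent)
print(masked_key_histogram(7))
```

For a fixed $\omega$ every value in $\mathbb{Z}_p$ is hit by exactly $p$ pairs $(u, v)$, so $u + \omega v$ is uniform when $(u, v)$ is; the conditioning on corrupted shares that the proof requires is not modelled here.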

7.2 Strongly-Adaptively Secure DPRF

By adding adequate NIZK proofs and commitments we obtain an adaptively secure DPRF that satisfies adaptive correctness too. However, we need to rely on statistically hiding commitments and a specific adaptive NIZK⁶ argument [21] for a commit-and-prove style statement. We also provide a simple and efficient construction of an adaptive NIZK argument system for the particular commit-and-prove statement in Appendix A, based on Schnorr's protocol and the Fiat-Shamir transformation, which may be of independent interest. The protocol is proven secure assuming DDH in the random oracle model. The construction is provided in Fig. 2. Formally, we prove the following theorem. We skip the full proof, as it is quite similar to the proof of Theorem 1 with adequate changes. Instead, we provide a sketch below, remarking on the changes needed.

Theorem 2. Protocol $\Pi_{str\text{-}adap}$ in Fig. 2 is a strongly adaptively secure DPRF under the DDH assumption in the programmable random oracle model.

Proof sketch. For strong adaptive security we need to show (i) adaptive pseudorandomness and (ii) adaptive correctness. First we discuss adaptive correctness. The proof of adaptive correctness is very similar to the one provided in DiSE for the non-adaptive case, except for adequate changes in the statement of the NIZK proof. Recall that the main idea of the construction (Fig. 2) is that the Setup procedure publishes commitments to everyone's secret key. Later, when queried for an evaluation, each party sends, in addition to the evaluation, a NIZK proof stating that the evaluation is correctly computed using the actual keys (those committed to earlier). So, to violate adaptive correctness the attacker must break either the simulation-soundness of the NIZK or the binding of the commitment scheme, rendering its task infeasible. Note that the adversary does not gain anything by being adaptive in this case.

The adaptive pseudorandomness proof follows in the footsteps of the proof of Theorem 1.
However, due to the presence of the commitments and NIZKs, some adjustments are required. In particular, we need to ensure that neither the commitments nor the proofs give away more information, for which their statistical hiding and adaptive zero-knowledge properties, respectively, will be used. We describe the hybrids below and highlight the changes in red from the proof of Theorem 1. For any PPT adversary $\mathcal{A}$ and a bit $b \in \{0, 1\}$ we briefly describe the hybrids below.

⁶ Groth et al. [21] alternatively call them zero-knowledge in the erasure-free model.

$\mathsf{PseudoRand}_{\mathcal{A}}(b)$. This is the real game, in which the challenger chooses random $(s_i, t_i) \leftarrow_{\$} \mathbb{Z}_p^2$ for simulating the $i$-th random oracle query on any $x_i$ as $H(x_i) := (g^{s_i}, g^{t_i})$.

$\mathsf{Hyb}^1_{\mathcal{A}}(b)$. In this hybrid experiment the only change we make is: the challenger guesses the challenge input $x^*$ randomly (incurring a $1/q_H$ loss for $q_H = \mathrm{poly}(\kappa)$ distinct random oracle queries) and simulates the random oracle query on all $x_i \neq x^*$ as $H(x_i) := (g^{s_i}, g^{\omega s_i})$, where $\omega, s_1, s_2, \ldots$ are each sampled uniformly at random from $\mathbb{Z}_p$. This implicitly sets $t_i := \omega s_i$. Note that each query has the same $\omega$ but a different $s_i$; this way the challenger ensures that the attacker does not learn any new information by making more queries. However, for $x^*$ the random oracle is programmed as usual by sampling random values $s^*, t^*$ and setting $H(x^*) := (g^{s^*}, g^{t^*})$.

Claim. Assuming DDH is hard in group $G$, we have that $\mathsf{PseudoRand}_{\mathcal{A}}(b) \approx_{comp} \mathsf{Hyb}^1_{\mathcal{A}}(b)$.

This claim follows analogously to Theorem 1.

$\mathsf{Hyb}^{1.5}_{\mathcal{A}}(b)$. In this hybrid the only changes made are in the NIZK proofs: the challenger sends simulated NIZK proofs instead of the actual NIZK proofs to the attacker for all honest evaluation/challenge requests. Hence, the NIZK proofs are made independent of the witnesses $(u_i, v_i, \rho_{1i}, \rho_{2i})$. From the adaptive zero-knowledge property of the NIZK we conclude that $\mathsf{Hyb}^1_{\mathcal{A}}(b) \approx_{comp} \mathsf{Hyb}^{1.5}_{\mathcal{A}}(b)$. Note that adaptive zero-knowledge is crucial here, as the attacker may corrupt a party on whose behalf a simulated proof has already been sent earlier. The corruption request is simulated by providing randomness along with the witnesses ($(u_i, v_i, \rho_{1i}, \rho_{2i})$ for party $i$ in this case) to explain the simulated proof.

$\mathsf{Hyb}^2_{\mathcal{A}}(b)$. In this hybrid we do not make any change from $\mathsf{Hyb}^1_{\mathcal{A}}(b)$ except that all non-challenge evaluation queries are answered using an independent key $k \leftarrow_{\$} \mathbb{Z}_p$ sampled uniformly at random, whereas the corruption query for party $j$ is answered using randomly sampled $u_j, v_j$ subject to $u_j + \omega v_j = k_j$, where $k_j$ is the $j$-th Shamir share of $k$. In particular, for a non-challenge evaluation query $\mathsf{Eval}(x_i, j)$, the challenger returns $g^{s_i k_j}$, where $H(x_i) = (g^{s_i}, g^{\omega s_i})$. Challenge q