RISC-V ISA Extension for Control Flow Integrity


Author / Uploaded
Leander Seidlitz



Department of Informatics, Technical University of Munich

Bachelor's Thesis in Informatics

RISC-V ISA Extension for Control Flow Integrity
(German title: Adaption von Control Flow Integrity in der RISC-V ISA)

Author: Leander Seidlitz
Supervisor: Prof. Dr. Claudia Eckert
Advisor: Lukas Auer, M.Sc.
Submission Date: 12 April 2019

I confirm that this Bachelor's Thesis is my own work and I have documented all sources and material used.

Place, Date

Leander Seidlitz

Abstract

Low-level programming languages such as C and C++ delegate memory management to the programmer. Incorrect memory handling may cause memory errors, which present a prime target for attackers. Currently widely deployed defense mechanisms provide good protection against certain classes of attacks. However, many of these mechanisms rely on secrets stored in memory and are therefore defeated by powerful attackers with arbitrary memory access. We recognize the need for defense measures that can cope with such attackers. With ARMv8.3-A, ARM has introduced ARM PAC, hardware support for pointer authentication. A PAC is a Message Authentication Code bound to the pointer value, a context, and a secret key. The PAC is stored in the unused bits of the pointer and allows reliable tamper detection. It can be used to enforce Control Flow Integrity, providing strong hardware-based protection against code-reuse attacks. In this work we present an adaptation of ARM PAC to the RISC-V architecture. We develop an extension to the Instruction Set Architecture for hardware-based pointer authentication. We modify GCC to support return address protection using pointer authentication instructions. Our approach allows for protection against strong attackers with arbitrary memory access.

Contents

1 Introduction
  1.1 Contributions
2 Background
  2.1 RISC-V
  2.2 Code Reuse Attacks
  2.3 Control Flow Integrity
  2.4 Current Defense Mechanisms
3 ARM Pointer Authentication
  3.1 Pointer Authentication
    3.1.1 Authenticating and Verifying Pointers
    3.1.2 Key Management
    3.1.3 QARMA
  3.2 Instructions
  3.3 Security Properties
4 Design
  4.1 Pointer Authentication on RISC-V
    4.1.1 Attacker Model
    4.1.2 Authentication and Verification of Pointers
    4.1.3 Key Management
    4.1.4 Cryptographic Primitives
  4.2 Instructions
  4.3 Use Cases of Pointer Authentication
    4.3.1 Return Address Protection
    4.3.2 Control Flow Integrity
    4.3.3 Protection of Generic Data Structures
5 Implementation
  5.1 Hardware
    5.1.1 Instructions
    5.1.2 Control and Status Registers
    5.1.3 Cryptographic Primitives
  5.2 Software
    5.2.1 GCC
    5.2.2 Linux Binutils and GDB
6 Evaluation
  6.1 Security Properties and Attack Resistance
    6.1.1 Code Reuse Attacks
    6.1.2 Signing Keys
    6.1.3 PAC Entropy
    6.1.4 Signing Gadgets
    6.1.5 Leaking Keys
    6.1.6 RIPE Testsuite
  6.2 Overhead
    6.2.1 Instructions
    6.2.2 Return address signing
  6.3 Compatibility
7 Discussion and Conclusion
  7.1 Related Work
  7.2 Conclusion
  7.3 Future Work
    7.3.1 Combined Instructions
    7.3.2 Creating PACs for other Privilege Levels
    7.3.3 Multiple Keys
    7.3.4 Extending Software Support
Acronyms
Appendix
  auth Instruction Implementation
  vrfy Instruction Implementation
  authg Instruction Implementation
  strp Instruction Implementation

1 Introduction

Low-level programming languages such as C and C++ delegate memory management to the programmer. Compilers for such languages are unable to check for invalid memory accesses or undefined behavior. We call these languages memory-unsafe. The programmer is left with access to raw memory and is expected to manage it, and the programmer is prone to human error. A lot of system software that runs with high privileges, e.g. the Linux kernel or device drivers, is written in such memory-unsafe languages. This makes memory errors a prime target for attackers, who leverage them to gain control over a system by strategically corrupting control data stored in writable memory.

The arms race between attackers and countermeasures has gone on for some time. Prior to Data Execution Prevention (DEP) [4], attackers could inject executable code into process memory and then trigger its execution. The hardware-assisted DEP present in modern CPUs forces writable memory to be non-executable and therefore prevents the direct execution of injected code.

As DEP prevents this exploit, attackers have moved on to other vectors. Code Reuse Attacks do not rely on attacker-crafted code but rather corrupt control data and reuse existing program code. These attacks leverage existing instruction sequences in the executable in order to bypass non-executable memory. As the writable memory contains control data and pointers, attackers strategically corrupt these in order to hijack the program control flow. Chaining together program parts in a way not intended by the programmer allows for attacker-defined behavior.

Many currently widely deployed protection mechanisms are defeated by attackers with arbitrary memory access. Therefore we need approaches for better protection. One class of such approaches is based on Control Flow Integrity (CFI). CFI monitors program execution and detects when a program deviates from normal behavior. Attacks cause unexpected behavior, which CFI detects. CFI approaches often trade off precision for performance. Solutions offering very precise CFI are able to reliably detect attacks but come with a large runtime overhead or require source code modification, properties unacceptable for protection mechanisms that need to be widely compatible. In the case of legacy systems, source code modification is not an option. To be accepted, attack mitigation mechanisms need to have a low overhead and require little to no effort from the programmer.

Many protection mechanisms are implemented in software, yet the underlying hardware poses limits. Adding new mechanisms to the hardware itself allows for approaches not limited by existing structures. Furthermore, hardware-based implementations have the advantage that an attacker cannot influence them: an attacker with memory access can influence software, but the underlying hardware is out of reach.

With Pointer Authentication Codes (PACs) [16, 17, 5], ARM has implemented an addition to its Instruction Set Architecture (ISA) allowing for integrity protection and authentication of pointers. These additions can also be used to enforce hardware-assisted CFI. ARM PAC remains effective against strong attackers with arbitrary memory access. ARM's hardware-based approach offers a lower overhead than software solutions. Protection schemes making use of ARM's additions, such as PARTS [14], offer very good runtime protection. PARTS protects all data and instruction pointers at a runtime overhead of 19.5%. Protection of the return address alone comes with a low overhead of 0.5%. As PARTS is compiler-based, it does not require source code modification.

1.1 Contributions

In this thesis we present an approach to pointer authentication for the RISC-V ISA, based on ARM's approach to pointer authentication [16, 17, 5]. We present a design and an extension of the RISC-V ISA for pointer integrity protection and authentication, and we report the hardware modifications and instructions added. Our approach allows reliable detection of pointer tampering and can withstand strong attackers with arbitrary memory access. As a use case, we add compiler support for protecting the return address stored on the stack. We show that our solution is effective against Code Reuse Attacks. Our hardware additions allow for strong protection mechanisms that utilize the added instructions.

The remainder of this thesis is organized as follows: In Chapter 2 we give an overview of the RISC-V ISA, the concept of CFI, and relevant attacks and countermeasures. Chapter 3 gives an overview of ARM PAC, ARM's approach to pointer authentication. In Chapter 4 we present our design for pointer authentication for the RISC-V ISA. Chapter 5 presents the implementation details of our design as well as a proof of concept: protection of the return address stored on the stack. In Chapter 6 we evaluate our approach. In Chapter 7 we give an overview of related work, a conclusion, and future extensions.

2 Background

Our approach to pointer authentication extends the RISC-V ISA. This chapter gives the relevant background: we introduce the RISC-V ISA, code reuse attacks, CFI, and common protection mechanisms.

2.1 RISC-V

The RISC-V architecture [23, 22] is a free and open load/store Reduced Instruction Set Computer (RISC) ISA, providing support for 32-bit and 64-bit address space variants. In contrast to other architectures, the RISC-V ISA takes a modular approach. Splitting up the ISA in this way allows for custom, non-standard extensions that extend the ISA for special purposes.

Instruction Set Architecture. There are four base integer instruction sets: the two primary base ISAs RV32I and RV64I, the 128-bit variant RV128I, and RV32E, which is targeted at small microcontrollers. Each base ISA provides a minimal target for toolchains and computation. The base integer instruction set is complemented by extensions (such as floating point or atomics) to provide a target for general-purpose software. In the following we focus on the 64-bit variant RV64I (RV64 denoting the 64-bit variant and I denoting the integer base instruction set). The RISC-V ISA string consists of the base architecture (RV64I) followed by the extensions present. The general-purpose instruction set is denoted RV64G, which is equal to RV64IMAFD. It includes
• the integer base instruction set I
• the standard extension M for Integer Multiplication and Division
• the standard extension A for Atomic Instructions
• the standard extension F for Single-Precision Floating-Point
• the standard extension D for Double-Precision Floating-Point.

By splitting the ISA into a base and extensions, RISC-V avoids bloating the standard instruction set. RISC-V aims to have only the minimal necessary set of instructions present.
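The composition of the ISA string can be illustrated with a small sketch. It is a simplification under stated assumptions: the function name parse_isa_string is hypothetical, only single-letter extensions are handled, and the shorthand G is expanded to IMAFD as described above.

```python
import re

# Expansion of the general-purpose shorthand "G", as listed in the text above.
G_EXPANSION = "IMAFD"


def parse_isa_string(isa):
    """Split an ISA string such as 'RV64G' into (XLEN, list of letters).

    Hypothetical helper: only single-letter parts are supported.
    """
    match = re.fullmatch(r"RV(32|64|128)([A-Z]+)", isa.upper())
    if match is None:
        raise ValueError(f"not a recognised ISA string: {isa!r}")
    xlen = int(match.group(1))
    # Replace the shorthand G by the base ISA and extensions it stands for.
    letters = match.group(2).replace("G", G_EXPANSION)
    return xlen, list(letters)


if __name__ == "__main__":
    print(parse_isa_string("RV64G"))
    print(parse_isa_string("RV64IMAFDC"))
```

Under these assumptions RV64G and RV64IMAFD parse to the same result, which mirrors the equivalence stated in the text.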

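Format   Fields (bit positions, most significant first)
R-type   funct7 [31:25] | rs2 [24:20] | rs1 [19:15] | funct3 [14:12] | rd [11:7] | opcode [6:0]
I-type   imm[11:0] [31:20] | rs1 [19:15] | funct3 [14:12] | rd [11:7] | opcode [6:0]
S-type   imm[11:5] [31:25] | rs2 [24:20] | rs1 [19:15] | funct3 [14:12] | imm[4:0] [11:7] | opcode [6:0]
U-type   imm[31:12] [31:12] | rd [11:7] | opcode [6:0]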

Table 1: RISC-V instruction formats. imm[11:0] denotes immediate bit positions 11 to 0. Taken from the RISC-V ISA manual [23].
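To make the field layout of Table 1 concrete, the following sketch extracts the R-type fields from a 32-bit instruction word and reassembles the split immediate of an S-type instruction. The helper names (bits, decode_r_type, s_type_imm) are ours and not part of any toolchain; the two instruction words used in the example are the standard encodings of add x3,x1,x2 and sd ra,8(sp).

```python
def bits(word, hi, lo):
    """Return bits hi..lo (inclusive) of a 32-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_r_type(word):
    """Extract the R-type fields at the bit positions given in Table 1."""
    return {
        "funct7": bits(word, 31, 25),
        "rs2": bits(word, 24, 20),
        "rs1": bits(word, 19, 15),
        "funct3": bits(word, 14, 12),
        "rd": bits(word, 11, 7),
        "opcode": bits(word, 6, 0),
    }


def s_type_imm(word):
    """Reassemble the S-type immediate from imm[11:5] and imm[4:0]."""
    imm = (bits(word, 31, 25) << 5) | bits(word, 11, 7)
    # The 12-bit immediate is sign-extended.
    return imm - 0x1000 if imm & 0x800 else imm


if __name__ == "__main__":
    print(decode_r_type(0x002081B3))  # add x3,x1,x2
    print(s_type_imm(0x00113423))     # sd ra,8(sp)
```

Note that rs1, rs2 and rd are read from the same bit positions in every format that contains them, which is the decoding simplification the text refers to.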

Registers. The RV64I base RISC-V ISA specifies 31 general-purpose registers (x1-x31), a dedicated zero register (x0) and a program counter (pc). The register width is denoted by XLEN. For the 64-bit variant RV64, XLEN is 64.

Instruction Formats. Instructions of the general-purpose ISA are 32 bits wide, with the exception of the compressed instructions of the C extension, which are 16 bits wide. The C extension offers compressed 16-bit instructions for common operations. Instructions must be four-byte aligned in memory unless the C extension is present, in which case they must be aligned on a two-byte boundary. The base instruction set specifies four core instruction formats: R, I, S and U. The instruction formats are listed in Table 1. There are two additional immediate encoding variants (B, J), used for branch and jump instructions. The rs1 and rs2 fields encode the source registers. The result of the instruction is written to the register rd. Immediate values are encoded by the imm fields and may be split into multiple fields. If present, funct7, funct3 and opcode encode the instruction. opcode specifies the instruction's major opcode. Keeping the source and destination register fields at the same position in every format simplifies decoding.

Privilege Levels. The ISA specifies three privilege levels: User mode (U-mode), Supervisor mode (S-mode) and Machine mode (M-mode). The lowest privilege level is U-mode. The levels and their encoding are listed in Table 2. An implementation has to support at least M-mode. In a Linux environment the bootloader and firmware run in M-mode, the kernel in S-mode and user software in U-mode.

Control and Status Registers. Control and Status Registers (CSRs) are special registers that contain state information of a processor, such as timers and counters. The RISC-V ISA specifies 12-bit wide CSR addresses. A CSR's address implicitly encodes its access permissions. The highest two bits (bits [11:10]) encode whether the CSR is read/write accessible (values 00, 01, 10) or read-only (11). The next two bits (bits [9:8]) encode the lowest privilege level that can access the CSR. Privilege level encodings are listed in Table 2. The CSRs are accessed directly by the CSR instructions, which atomically read-modify-write a single CSR. Accessing a CSR without the appropriate privilege level raises an illegal instruction exception.

Level   Encoding   Name
0       00         User / Application
1       01         Supervisor
2       10         Reserved
3       11         Machine

Table 2: Encoding of RISC-V privilege levels.

Virtual Memory. RV64I U-mode and S-mode support a mode for direct addressing as well as 39-bit (Sv39) and 48-bit (Sv48) wide Virtual Addresses (VAs). M-mode does not support address translation. The MODE field in the satp (Supervisor Address Translation and Protection) CSR encodes the currently active addressing mode. Both Sv39 and Sv48 use 4 KiB pages, so the lowest 12 bits of the VA represent the page offset. The offset remains untranslated. Memory permissions (r/w/x for read/write/execute) are enforced on a per-page basis. Sv39 utilizes a three-level page table for address translation, using 9 bits of the VA to index each level. Sv48 adds another level to the translation process, resulting in four levels (each using 9 bits). An entry of any level can be a leaf entry, which adds hardware support for large pages. In this case, the remaining bits of the VA are treated as offset. In the case of Sv39, a first-level leaf entry addresses a 1 GiB gigapage, and a second-level leaf entry addresses a 2 MiB megapage.
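The two bit-level encodings described above, CSR address permissions and the Sv39 address split, can be sketched as follows. The function names are hypothetical; the CSR addresses used in the example (satp 0x180, cycle 0xC00, mhartid 0xF14) are taken from the RISC-V privileged specification rather than from this thesis.

```python
# Privilege level names keyed by their 2-bit encoding (Table 2).
PRIVILEGE_NAMES = {0b00: "User", 0b01: "Supervisor", 0b10: "Reserved", 0b11: "Machine"}


def decode_csr_address(addr):
    """Return (read_only, lowest_privilege) encoded in a 12-bit CSR address."""
    if not 0 <= addr < (1 << 12):
        raise ValueError("CSR addresses are 12 bits wide")
    read_only = ((addr >> 10) & 0b11) == 0b11   # bits [11:10]
    lowest = PRIVILEGE_NAMES[(addr >> 8) & 0b11]  # bits [9:8]
    return read_only, lowest


def split_sv39(va):
    """Split a 39-bit VA into ([VPN0, VPN1, VPN2], page offset)."""
    offset = va & 0xFFF  # lowest 12 bits, not translated
    vpn = [(va >> (12 + 9 * level)) & 0x1FF for level in range(3)]
    return vpn, offset


def leaf_page_size(vpn_index):
    """Bytes covered by a leaf entry indexed by VPN[vpn_index]."""
    return 1 << (12 + 9 * vpn_index)


if __name__ == "__main__":
    print(decode_csr_address(0x180))  # satp
    print(split_sv39((1 << 30) | (2 << 21) | (3 << 12) | 0x45))
    print(leaf_page_size(2) // 2**30, "GiB;", leaf_page_size(1) // 2**20, "MiB")
```

A leaf entry found at the root level (indexed by VPN2) leaves 30 address bits as offset, which yields the 1 GiB gigapage; one level further down 21 bits remain, which yields the 2 MiB megapage.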

2.2 Code Reuse Attacks

Code Reuse Attacks (CRAs) leverage existing program code in order to bypass non-executable memory. Existing instruction sequences are chained together and executed in an attacker-defined order, leading to attacker-controlled behavior. In the context of this work, so-called Return Oriented Programming (ROP) attacks are especially relevant. ROP [18, 21] reuses short instruction sequences that end with a return instruction. Such a sequence is called a ROP-Gadget. Each gadget fulfills a sub-task in the exploit. When chained together, the gadgets lead to attacker-defined behavior. A series of memory addresses of such ROP-Gadgets is called a ROP-Chain.

The ROP-Chain is placed in writable memory, such as the stack. The attacker then leverages a memory error to overwrite the return address of a function stored on the stack with the start of the ROP-Chain. When this function returns, the program counter is loaded with the content of the location that previously held the return address, which now contains the address of the first gadget of the ROP-Chain. When this gadget returns, the stack pointer is increased. Execution then works its way down the ROP-Chain. Additional control of the stack pointer allows the attacker to choose the memory location from which the return address is restored on return. By using a gadget that modifies the stack pointer, the attacker is able to utilize ROP-Chains that are not placed directly behind the return address, for example on the heap.

Figure 1 shows a simplified ROP scenario on the RISC-V architecture. The left side shows the stack layout, the right side shows selected disassembled pieces of the code segment. These represent the ROP-Gadgets. The attacker leverages a contiguous buffer overflow, starting at the address of buffer, to overwrite the return address. The saved return address is overwritten with the address of the first gadget (0x9170), which enables the attacker to set register a0. On the return of the current function, control flow is diverted to this gadget as the return restores its address into the program counter. The value that a0 will take is the next member on the stack; the gadget loads this value into the register.

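[Figure: left panel "Stack" showing the overflowed buffer holding '/bin/bash\0', the stack frame members, the saved registers, the return address overwritten with 0x9170, and the chain entries 0x1234 and &buffer; right panel "Code Segment" showing gadgets at 0x1000 (ld sp,0(sp); ld ra,8(sp); ret), at 0x1234 (jump to system) and at 0x9170 (ld ra,0(sp); ld a0,8(sp); addi sp,sp,0x10; ret).]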

Figure 1: Memory layout of a ROP-Chain.

Before the return of the a0-gadget, the return address register is loaded with 0x1234, which is the address of a gadget that calls system. On return of the a0-gadget, control flow is transferred to the system-gadget as the program counter is set to the value of the ra register. At this point the attacker has set a0 to the address of the buffer and is about to call system(). a0 contains the first and only argument of system(). The attacker has placed the null-terminated string '/bin/bash' in the buffer to which a0 now points and therefore calls system("/bin/bash"), spawning a shell. The effective instruction sequence executed is ret; ld ra,0(sp); ld a0,8(sp); addi sp,sp,0x10; ret; j system;.
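The walk through the chain can be replayed with a toy model. This is a sketch, not an emulator: the gadget addresses are those of Figure 1, while the buffer address, the stack pointer value and the function name run_rop_chain are assumptions made for illustration.

```python
BUFFER_ADDR = 0x7FF0    # assumed address of the overflowed buffer
GADGET_SET_A0 = 0x9170  # ld ra,0(sp); ld a0,8(sp); addi sp,sp,0x10; ret
GADGET_SYSTEM = 0x1234  # j system


def run_rop_chain():
    """Replay the chain of Figure 1 and return (instruction trace, registers)."""
    sp = 0x8000  # assumed stack pointer after the victim's epilogue
    # Attacker-controlled stack contents directly behind the return address.
    stack = {sp: GADGET_SYSTEM, sp + 8: BUFFER_ADDR}
    # The victim's epilogue has restored the overwritten return address into ra.
    regs = {"ra": GADGET_SET_A0, "a0": 0, "sp": sp}

    trace = ["ret"]  # the victim returns into the first gadget
    pc = regs["ra"]
    while pc != GADGET_SYSTEM:
        if pc != GADGET_SET_A0:
            raise RuntimeError(f"no gadget at {pc:#x}")
        regs["ra"] = stack[regs["sp"]]
        trace.append("ld ra,0(sp)")
        regs["a0"] = stack[regs["sp"] + 8]
        trace.append("ld a0,8(sp)")
        regs["sp"] += 0x10
        trace.append("addi sp,sp,0x10")
        trace.append("ret")
        pc = regs["ra"]
    trace.append("j system")
    return trace, regs


if __name__ == "__main__":
    trace, regs = run_rop_chain()
    print("; ".join(trace))
    print(f"a0 = {regs['a0']:#x}")
```

The trace produced by the model matches the effective instruction sequence given above, and a0 ends up pointing at the attacker's buffer.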

2.3 Control Flow Integrity

CFI [1] mechanisms aim to be effective against CRAs, which use existing program code to craft exploits. The principle of CFI is to restrict program execution to valid control flows. The control flow of a program is the order in which individual basic blocks are executed. A basic block is a sequence of instructions without any jumps or jump targets. The Control Flow Graph (CFG) is the directed graph of all possible program flows. In this graph, nodes represent individual basic blocks and edges represent jumps between them. A valid control flow stays within the CFG. So-called forward edges are caused by jumps as well as calls, backward edges are caused by returns. Leaving the CFG equals unintended program behavior, for example caused by an attacker influencing the control flow.

CFI monitors the program execution flow (control flow) and reacts when the control flow deviates from normal behavior. CFI-enforcing mechanisms do not prevent the attack vectors themselves but intervene when abnormal control flows occur. Many exploits will cause the control flow to deviate from normal, e.g. when executing an unexpected jump. CFI approaches differentiate between forward-edge and backward-edge security, which enforce the respective edges in the graph. Forward-edge security is achieved by restricting call and jump targets to valid locations.


Backward-edge security protects returns and is often accomplished by a shadow stack, a stack that saves a reference return address in an attacker-inaccessible location to prevent tampering.

Various CFI approaches work on the basis of the CFG of a program. They limit jump and return targets to those found in the CFG. The CFG can be constructed through static and dynamic analysis of the binary. The precision of the CFG limits the effectiveness of the CFI-enforcing mechanism building on it. Effective CFI needs a precise CFG, the computation of which introduces significant overhead. Solutions therefore often trade off precision for performance. Protection on the basis of a low-precision CFG may cause the CFI mechanism to fail.

Software-based CFI approaches instrument the binary to enforce the CFG at runtime. This includes checking indirect jump targets before executing the jump or validating return addresses. Software CFI assumes that the code itself is immutable and that direct branches cannot be influenced. This makes such CFI mechanisms unsuitable for self-modifying or just-in-time compiled code.

Hardware-based solutions integrate CFI measures into the hardware implementation. Such approaches might integrate a monitor into the processor's instruction pipeline or work with the processor's debug interface in order to observe the control flow. Recent advancements built on cryptography take a different path. ARM PAC (Chapter 3) adds hardware support for pointer and general data protection. It can be used to enforce CFI.
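The two kinds of checks can be sketched in a few lines. The model below is illustrative only and does not correspond to any particular CFI implementation: the class CfiMonitor, its methods and the example addresses are hypothetical, the forward-edge policy is a per-call-site set of allowed targets, and the backward edge is enforced with a shadow stack.

```python
class ControlFlowViolation(Exception):
    """Raised when an observed transfer leaves the allowed control flow."""


class CfiMonitor:
    def __init__(self, valid_targets):
        # Forward edges of the CFG: call site -> set of allowed targets.
        self.valid_targets = valid_targets
        # Reference return addresses, assumed inaccessible to the attacker.
        self.shadow_stack = []

    def call(self, site, target, return_address):
        """Forward-edge check, then record the reference return address."""
        if target not in self.valid_targets.get(site, set()):
            raise ControlFlowViolation(f"invalid target {target:#x} at {site:#x}")
        self.shadow_stack.append(return_address)

    def ret(self, return_address):
        """Backward-edge check against the shadow stack."""
        if not self.shadow_stack or self.shadow_stack.pop() != return_address:
            raise ControlFlowViolation(f"tampered return address {return_address:#x}")


if __name__ == "__main__":
    monitor = CfiMonitor({0x1000: {0x2000}})
    monitor.call(0x1000, 0x2000, return_address=0x1004)
    try:
        monitor.ret(0x9170)  # return address overwritten with a gadget address
    except ControlFlowViolation as err:
        print("blocked:", err)
```

The precision trade-off mentioned above shows up in valid_targets: the coarser the sets of allowed targets, the more attacker-chosen transfers pass the forward-edge check.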

2.4 Current Defense Mechanisms

Currently widely deployed defense mechanisms provide protection against limited classes of attacks. DEP offers protection against the execution of attacker-crafted and injected code by marking writable memory pages as non-executable. Because DEP prevents execution of injected code, attackers have moved on to code reuse attacks. Still, DEP is difficult to enforce in some scenarios, such as for self-modifying or just-in-time compiled code.

Address Space Layout Randomization (ASLR) [19] and canaries [10] are aimed at code reuse attacks. ASLR makes reuse of existing code harder by randomizing the layout of the process address space at page granularity. The code segment of binaries compiled as Position Independent Executables (PIEs) can be placed at randomized base addresses. Layout randomization prevents an attacker from reusing existing code, as the attacker knows neither the segment base addresses nor, consequently, the addresses of the ROP-Gadgets.

Stack canaries protect against contiguous buffer overflows. A random check value is placed between the local variables of a function and the saved registers on the stack (including the return address). A reference value is saved in an attacker-inaccessible location. An attacker who overflows into the return address will modify the canary. Before the function returns, the canary is checked, and in the case of tampering, execution is aborted. While canaries offer good protection against contiguous overflows, vulnerabilities that allow an attacker to write to arbitrary memory locations can bypass the protection. Such vulnerabilities enable modifying the return address without touching the canary.

Both ASLR and canaries are defeated by information leaks. Leaking memory contents allows an attacker to determine the base addresses of the randomized segments, as pointers to these segments are saved in memory (e.g. on the stack). Knowing the base address of a segment allows the attacker to use ROP-Gadgets. Leaking the canary enables an attacker to overflow into the return address while writing back the correct, leaked canary. The attacker-written canary then passes the check on function return.
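The three canary scenarios described above (contiguous overflow, arbitrary write, leaked canary) can be reproduced with a toy stack frame. The layout, the sizes and the class name Frame are assumptions chosen for illustration and do not correspond to a real ABI.

```python
import secrets

WORD = 8  # bytes per stack slot on a 64-bit target


class Frame:
    """Toy frame, low to high addresses: buffer | canary | return address."""

    def __init__(self, buffer_size, return_address, canary):
        self.buffer_size = buffer_size
        self.reference_canary = canary  # reference copy, out of the attacker's reach
        self.memory = (bytearray(buffer_size) + canary
                       + return_address.to_bytes(WORD, "little"))

    def write(self, offset, data):
        """Unchecked write, as performed through a memory error."""
        if offset + len(data) > len(self.memory):
            raise ValueError("write leaves the modelled frame")
        self.memory[offset:offset + len(data)] = data

    def read(self, offset, length):
        """Unchecked read, as performed through an information leak."""
        return bytes(self.memory[offset:offset + length])

    def canary_intact(self):
        return self.read(self.buffer_size, WORD) == self.reference_canary

    def return_address(self):
        return int.from_bytes(self.read(self.buffer_size + WORD, WORD), "little")


if __name__ == "__main__":
    gadget = (0x9170).to_bytes(WORD, "little")

    frame = Frame(16, 0x10000, secrets.token_bytes(WORD))
    frame.write(0, b"A" * 24 + gadget)       # contiguous overflow
    print("overflow detected:", not frame.canary_intact())

    frame = Frame(16, 0x10000, secrets.token_bytes(WORD))
    frame.write(16 + WORD, gadget)           # arbitrary write skips the canary
    print("arbitrary write detected:", not frame.canary_intact())
```

In the second scenario the check passes although the return address has been replaced, which is the bypass described in the text.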

3 ARM Pointer Authentication

ARM's approach to pointer authentication [16, 17, 5] allows for integrity protection and authentication of pointers and normal data. The security of pointer authentication is based on a secret key, saved in special registers in hardware, inaccessible to an attacker. A PAC is a Message Authentication Code (MAC) calculated using the value of the pointer, a context value (ARM refers to this as "modifier"), and the key. Architectures often leave bits in the pointer unused, as their virtual address space is not 64 bits wide. The PAC is stored in these bits of the pointer, protecting the pointer. If an attacker modifies the pointer, PAC and pointer will no longer match. As the attacker does not know the key, PACs cannot be forged for arbitrary pointers.

A pointer protected with a PAC is called an authenticated pointer. An authenticated pointer no longer represents a valid VA; it will fail during address translation if used. Because an authenticated pointer cannot be used directly, validation is enforced. An authenticated pointer has to be verified or stripped before it can be used. Both actions remove the PAC from the pointer. The address translation therefore indirectly enforces the validation of PACs, as the attacker cannot remove the PAC.

Verification checks whether the PAC matches the pointer it is stored in. A reference PAC is calculated based on the current pointer value. If the reference PAC and the PAC currently stored in the pointer match, the PAC is removed and the pointer value restored. This results in a valid pointer that can be used. If PAC and pointer do not match, the pointer is invalidated to indicate the failed validation. This pointer will cause a fault on next use, which can be recognized by an exception handler.

ARM's pointer authentication primitives allow for protection of pointers and general-purpose data, the main use case being pointers. Pointer handling is influenced by external factors such as the width of VAs. The authentication of general-purpose data is independent of pointer handling. It calculates a PAC over two registers, the result being a 32-bit PAC. As this result can be taken as an input to the next calculation, PACs can be calculated over longer sequences of data through chaining.
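The sign, verify and strip operations described above can be modelled in software. The sketch below rests on several assumptions that are not taken from ARM's specification: a truncated HMAC-SHA256 stands in for the hardware MAC, virtual addresses are taken to be 39 bits wide with all upper bits zero, and a failed verification is signalled by setting a hypothetical error bit that makes the pointer invalid. All function names are ours.

```python
import hashlib
import hmac

VA_BITS = 39                  # assumed virtual address width
PAC_BITS = 64 - VA_BITS       # unused upper pointer bits available for the PAC
VA_MASK = (1 << VA_BITS) - 1
ERROR_BIT = 1 << 62           # hypothetical marker for a failed verification


def compute_pac(ptr, ctx, key):
    """MAC over pointer and context; HMAC-SHA256 is a stand-in, not the real primitive."""
    msg = ptr.to_bytes(8, "little") + ctx.to_bytes(8, "little")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "little") & ((1 << PAC_BITS) - 1)


def add_pac(ptr, ctx, key):
    """Store the PAC in the unused upper bits of the pointer."""
    if ptr & ~VA_MASK:
        raise ValueError("pointer uses bits reserved for the PAC")
    return (compute_pac(ptr, ctx, key) << VA_BITS) | ptr


def strip_pac(signed):
    """Remove the PAC without checking it."""
    return signed & VA_MASK


def verify_pac(signed, ctx, key):
    """Return the plain pointer if the PAC matches, otherwise an invalid pointer."""
    ptr = strip_pac(signed)
    if (signed >> VA_BITS) == compute_pac(ptr, ctx, key):
        return ptr
    return ptr | ERROR_BIT


if __name__ == "__main__":
    key = bytes(16)  # placeholder key; in hardware the key lives in special registers
    signed = add_pac(0x10400, ctx=0x7FF0, key=key)
    print(hex(verify_pac(signed, 0x7FF0, key)))         # original pointer
    print(hex(verify_pac(signed ^ 0x8, 0x7FF0, key)))   # tampered pointer
```

Binding the PAC to a context value means that a correctly signed pointer cannot simply be replayed in a different context: verification with another context value fails in the same way as verification of a modified pointer.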

def addPAC(ptr, ctx, key):  # given: pointer, context, key
    mac [...]

[...] machine->frame.mask, RETURN_ADDR_REGNUM))))) && TARGET_64BIT)
  {
    emit_insn (gen_vrfyra ());
  }

Listing 8: Emission of the vrfy ra,ra,sp instruction during function epilogue generation in the RISC-V GCC. The relevant function is riscv_expand_epilogue() in /gcc/config/riscv/riscv.c. After the stack frame is removed and the registers are restored, we verify the return address. If return address signing is enabled for non-leaf functions (line 3) and the function is non-leaf (line 4), the return address is verified. Leaf functions may also have saved the ra register to the stack (lines 5 and 6). Pointer authentication is only supported on 64-bit architectures (line 7). Return address verification is enforced for all functions by passing all as the parameter of -msign-return-address (line 2). After the above code, the return is performed, passing control to the calling function.
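The conditions listed in the caption can be restated as a small predicate. The following Python sketch is one reading of the caption, not a transcription of the GCC code; the enum and parameter names are hypothetical labels and do not claim to match the spelling of the option values.

```python
from enum import Enum


class SignMode(Enum):
    """Hypothetical labels for the -msign-return-address settings."""
    NONE = 0
    NON_LEAF = 1
    ALL = 2


def should_verify_return_address(mode: SignMode, is_leaf: bool,
                                 ra_saved_on_stack: bool,
                                 target_64bit: bool) -> bool:
    """Decide whether the epilogue emits vrfy ra,ra,sp."""
    if not target_64bit:
        # Pointer authentication is only supported on 64-bit targets.
        return False
    if mode is SignMode.ALL:
        # Verification is enforced for every function.
        return True
    if mode is SignMode.NON_LEAF:
        # Non-leaf functions, or leaf functions that spilled ra to the stack.
        return (not is_leaf) or ra_saved_on_stack
    return False
```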

(define_insn "authra"
  [(set (reg:DI RETURN_ADDR_REGNUM)
        (unspec:DI [(reg:DI RETURN_ADDR_REGNUM)
                    (reg:DI SP_REGNUM)]
                   UNSPECV_AUTH))]
  ""
  "auth ra,ra,sp")

(define_insn "vrfyra"
  [(set (reg:DI RETURN_ADDR_REGNUM)
        (unspec:DI [(reg:DI RETURN_ADDR_REGNUM)
                    (reg:DI SP_REGNUM)]
                   UNSPECV_VRFY))]
  ""
  "vrfy ra,ra,sp")

Listing 9: Definition of the authra and vrfyra instructions in GCC's RISC-V machine description. The relevant file is /gcc/config/riscv/riscv.md. These definitions add instruction support for return address signing. They are used to emit the auth ra,ra,sp instruction in the function prologue and the vrfy ra,ra,sp instruction in the function epilogue.


6

Evaluation

Our RISC-V ISA extension offers hardware-based pointer authentication. We target control-flow related attacks, especially ROP. The hardware modifications we made serve as a base for software-assisted defense mechanisms and also allow for the authentication of generic data. In Section 5.2 we presented a PoC protecting the return address on the stack. This addition is solely compiler-based: return address protection requires no effort from the programmer, as the source code does not have to be modified. Pointer authentication itself is handled in hardware. By implementing our approach in hardware, we separate the pointer authentication primitives and keys from the software itself. As the attacker cannot influence the hardware, the primitives and keys are inaccessible to him. Fault handling can be done by the existing exception handler and therefore integrates easily into existing structures.

6.1

Security Properties and Attack Resistance

In this section we evaluate our approach from a security perspective and present selected attack vectors. As our solution to pointer authentication is similar to ARM PAC, the security properties presented in Section 3.3 mostly apply to our approach as well. We solve a few key aspects differently than ARM.

6.1.1

Code Reuse Attacks

Code reuse attacks leverage existing instruction sequences in the executable to chain together exploits. ROP attacks are often based on overwriting the return address saved on the stack in order to divert the control flow. We presented ROP attacks in Section 2.2. Return address signing (as presented in Section 5.2) makes such ROP attacks significantly harder. Since we sign return addresses pushed to the stack, tampering with them is detected when the address is verified on return. Return address signing thus removes tampering with the return address as an attack vector. ROP relies on the currently executing gadget of the ROP chain transferring control to the next gadget on return. As returns verify the return address, this fails for an unauthenticated attacker-supplied address, and ROP gadgets can therefore no longer be chained together.
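The effect on an overwritten return address can be modelled in a few lines of Python. This is a sketch under stated assumptions: HMAC-SHA256 stands in for the hardware MAC, KEY is a placeholder for the key held in hardware, a 39-bit VA is assumed, and a failed verification is modelled as an immediate exception. The helper names are hypothetical.

```python
import hashlib
import hmac

VA_BITS = 39
KEY = b"per-process secret"   # placeholder for the key held in hardware


class VerificationFault(Exception):
    """Models the fault raised when a return address fails verification."""


def _pac(addr: int, ctx: int) -> int:
    msg = addr.to_bytes(8, "little") + ctx.to_bytes(8, "little")
    mac = hmac.new(KEY, msg, hashlib.sha256).digest()
    return int.from_bytes(mac[:4], "little") % (1 << (64 - VA_BITS))


def auth(ra: int, sp: int) -> int:
    """Prologue: sign the return address, bound to the stack pointer."""
    return (_pac(ra, sp) << VA_BITS) | ra


def vrfy(signed_ra: int, sp: int) -> int:
    """Epilogue: check the PAC and return the plain return address."""
    ra = signed_ra & ((1 << VA_BITS) - 1)
    if signed_ra >> VA_BITS != _pac(ra, sp):
        raise VerificationFault(hex(signed_ra))
    return ra


def call_and_return(ra: int, sp: int, overwrite=None) -> int:
    stack = {sp: auth(ra, sp)}      # prologue saves the signed address
    if overwrite is not None:
        stack[sp] = overwrite       # simulated buffer overflow
    return vrfy(stack[sp], sp)      # epilogue verifies before returning
```

Calling call_and_return(0x10000, 0x3FFFF000) returns the original address, while passing overwrite=0x20000 (an unauthenticated gadget address) raises VerificationFault unless the attacker happens to guess the PAC.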


As our solution to return address signing only signs non-leaf functions by default, leaf functions remain unprotected and do not verify their return address on return. While this does not help an attacker in breaking out of the control flow, these functions offer ROP gadgets, as their return will accept an unauthenticated attacker-supplied address. This weakness can be closed by also signing the return addresses of leaf functions, at the cost of a larger overhead.

6.1.2

Signing Keys

We use only one key per privilege level for all signing operations. While this reduces the implementation and hardware complexity, it causes the PACs of all pointer types and of non-pointer data to be created with the same key. Different pointer types can be distinguished using different context values, but the context value may be under attacker control. Using different keys for different purposes makes it possible to distinguish pointer types independently of the context value.

We can combine multiple context values using the authg instruction in order to make an attack more difficult, but this produces a large overhead, as multiple pointer authentication operations are needed for a single pointer. Combining multiple context values has the advantage of further distinguishing pointer types (such as data and instruction pointers). Some of these context values may be stored in read-only memory that the attacker cannot influence, which ensures that an attacker cannot control all values influencing the PAC.

Using different keys achieves the same with no additional overhead. Different keys can be used to distinguish data from instruction pointers, removing the need for multiple context values. As the keys are inaccessible to an attacker, he cannot influence the distinguishing value. In order to support different keys, the instructions would have to be implemented for each key, e.g. authA and authB using key A or B.

6.1.3

PAC Entropy

When child processes are forked, they inherit the key of the parent. As the child contains pointers authenticated by the parent process, changing the key would result in verification failures for PACs created by the parent. In a scenario in which an attacker can spawn a large number of child processes (such as a server spawning a child for every connection), the attacker can use this to his advantage. As all children have the same key, he has access to an oracle that tells him whether a PAC is correct or incorrect (as child processes die on verification failure). He may use this to reduce the complexity of a brute-force attack.

Even without this scenario, an attacker may simply try to guess the PAC for a given pointer in a plain brute-force attack. The Sv39 addressing mode supports 25-bit wide PACs. Guessing a b-bit PAC correctly with a likelihood of 50% requires 2^(b−1) guesses. A brute-force attack against a 25-bit PAC therefore needs 2^24 guesses on average. Assuming an attacker can achieve 10000 attempts per second (which is reasonable for an offline attack), it would take approximately 28 minutes on average to succeed (2^24 / 10000 ≈ 1678 seconds). This is the complexity of forging a single PAC. As an attack will likely need to forge multiple addresses, we expect 25-bit PACs to be reasonably robust.

The Sv48 addressing mode reduces the PAC width to 16 bits. Trying all 2^16 = 65536 possible PACs takes around 6.6 seconds at the above rate, and about half of that on average. Still, an attack on 16-bit PACs needs tens of thousands of tries to succeed once. This many failed attempts are detectable by the OS, which can then counteract the attack, for example by delaying fork() calls if too many child processes fail.

6.1.4

Signing Gadgets

A signing gadget is an instruction sequence in existing code that an attacker can leverage to sign an arbitrary pointer with an attacker-defined context value. It acts as an oracle, allowing the attacker to forge arbitrary authenticated pointers without knowing the signing key.

Our approach is resistant against the vulnerability of ARM PAC we presented in Section 3.3. This flaw is based on ARM checking the validity of the VA to be signed only after calculating and adding the PAC, not before. ARM corrupts the pointer in case of an invalid VA, but this corruption can easily be reversed by the attacker (Section 3.3). As we check for validity before the PAC calculation and raise an exception in case of an invalid VA, our approach is not vulnerable to this flaw. This behavior also reduces the number of available signing gadgets.

In contrast to ARM, we use only a single key for all signing operations, independent of the pointer type. We distinguish different pointer classes only by the context value used. A single signing gadget therefore suffices to forge authenticated pointers of any type. We propose to distinguish pointer types by a context value stored in read-only memory (e.g. the hash of the pointer type), which cannot be influenced by an attacker. By making the compiler aware of instruction sequences that lead to signing gadgets, we can avoid their existence in the binary. Pointers can also be bound to multiple contexts (e.g. a hash of the type and the pointer location) by first combining multiple context values into a single one using the authg instruction and then using this value as the context for the pointer.

Listing 10 and Listing 11 show an example of a possible signing gadget. The vulnerability is the unprotected function epilogue shown in Listing 10. An attacker could use a buffer overflow to modify the frame pointer and return address saved on the stack while the function runs. In the epilogue shown in Listing 10, those values are then restored into the respective registers. The attacker would set the frame pointer to an arbitrary value he wants to sign and the return address to the address of the function prologue shown in Listing 11. This function prologue is the actual signing gadget. It signs the frame pointer (which is now an attacker-defined pointer) and stores it on the stack. By our attacker model, the attacker has access to the stack and therefore now has access to the signed pointer. Still, this pointer is bound to a stack pointer value the attacker cannot influence, as the stack pointer is not restored from memory. The existence of the presented exploit chain is unlikely, since it relies on an unprotected return, which the compiler can easily prevent (using -msign-return-address=all in our implementation).

    ...
    ld   ra, 32(sp)
    ld   fp, 16(sp)
    addi sp, sp, 48
    ret

Listing 10: A vulnerable function epilogue.

    auth ra, ra, sp
    auth fp, fp, sp
    addi sp, sp, -32
    sd   ra, 16(sp)
    sd   fp, 8(sp)
    add  fp, sp, 32
    ...

Listing 11: A function prologue signing the frame pointer and return address before saving them to the stack.

6.1.5

Leaking Keys

Since the keys are stored in hardware, we can enforce hardware-based access control. A privilege level cannot access its own key. An attacker therefore cannot leak the key of the process he hijacked. He cannot access the keys of higher privilege levels either, as the hardware prevents him from doing so. Assuming a Linux kernel environment, each process has a unique key. The keys have to be stored in kernel memory, as the hardware can only hold one key at a time. The kernel could store each process key in the process control block it uses to represent a process. A kernel-level attacker is able to leak all user process keys, as he can access the kernel memory space. In contrast to ARM PAC, a kernel-level attacker cannot leak the kernel key, as the hardware prevents access to it. Lacking the possibility of leaking the key of the hijacked process, an attacker cannot forge PACs directly. The largest remaining attack vectors are signing gadgets and brute-force attacks.

6.1.6

RIPE Testsuite

We tested our approach against a fork of the RIPE Buffer Overflow Testsuite⁸ executed on our modified Spike ISA simulator. As expected, return address protection prevents all attacks that use the return address as an attack vector.

6.2

Overhead

We implemented our approach as an extension of the Spike ISA simulator and GCC. As Spike is only a functional simulator and not cycle-accurate, we can only estimate the overhead of our solution. In the following we approximate the cost of the individual instructions as well as the overhead of return address signing.

6.2.1

Instructions

The auth, authg, and vrfy instructions perform a one-block AES-128-ECB encryption. strp does not perform any cryptographic operations. The encryption is the largest factor in the overall overhead. AES-128 encryption can be achieved in 10 cycles per block [20]. We encrypt exactly one block for PAC generation or verification. Apart from the cryptographic operations, the instructions only perform logic operations, which we assume to complete without delay. We therefore approximate the cost of the instructions as the cost of the AES-128 encryption. Table 6 reports the approximated overhead of the instructions.

While AES is suitable for hardware implementations, the QARMA cipher used by ARM has a latency and area advantage over AES [7, Sec. 5]. According to R. Avanzi, a QARMA-64 encryption has a latency of between 2.2 ns and 3.6 ns when optimized for minimum delay, while AES-128 has a latency of 15.67 ns [7, Sec. 5]. Using QARMA instead of AES would therefore reduce the cryptographic overhead by a factor of 4.3 to 7.1. QARMA can be used as a drop-in replacement, as only the cryptographic primitive and the calls to it have to be replaced.

Instruction   Cycles   AES-128 operation
auth          10       encrypt
vrfy          10       encrypt
authg         10       encrypt
strp          1        none

Table 6: Approximated cost of the pointer authentication instructions.

⁸ https://github.com/draperlaboratory/hope-RIPE

6.2.2

Return address signing

We extended GCC to support return address signing. The return address is signed as it is saved to the stack during subroutine calls and verified on retrieval. We therefore execute an auth instruction in the function prologue and a vrfy instruction in the epilogue. This results in two pointer authentication instructions per function call and thus in an estimated overhead of 20 cycles per function call. The overhead scales linearly with the number of functions called. An overhead of 20 cycles is typically small compared to the full execution time of a function.

For the performance overhead evaluation, we added two counters to Spike that count the number of auth and vrfy instructions executed during program runtime. We then evaluated different scenarios in order to estimate the overhead. Since Spike is not cycle-accurate and the number of cycles needed for a specific task depends on the hardware implementation, we cannot report a percentage overhead. Our test cases are a simple echo program that reads input through gets() and prints it using printf(), as well as quicksort and bubblesort, each sorting the same sets of 10 and 100 elements. Figure 5 reports the results for the tests, each compiled with -msign-return-address=sign or -msign-return-address=all. The number of auth instructions executed is equal to the number of vrfy instructions.

As the echo test calls library functions, the number of auth instructions executed is comparatively high. In contrast to bubblesort, quicksort follows a recursive definition. With a growing number of elements, the overhead grows as the recursion gains depth. When signing all functions, the swap() function that both algorithms use (a leaf function) becomes protected, which increases the overhead. For bubblesort, the number of swap operations grows quadratically with the number of elements. This is reflected in the steep increase in auth instructions executed when signing of the swap function is forced. Between 3.6% and 5.4% of all instructions in our tests were pointer authentication instructions.

Liljestrand et al. have implemented return address protection as part of their instrumentation framework PARTS [14] based on ARM PAC. The overall overhead of their approach to return address protection is less than 0.5%, at an estimated cost of 12-16 cycles per function. As our approach is comparable to theirs, we assume that the overhead of return address protection on the RISC-V architecture is similar.
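The per-call estimate and the growth of protected swap() calls in bubblesort can be reproduced with a short Python sketch. It assumes the 10-cycle cost per auth or vrfy from Table 6 and uses reversed input as a worst case; the actual input sets of the tests are not known, so the numbers below are illustrative and are not the measured values of Figure 5. For bubblesort the number of swaps is bounded by n(n−1)/2.

```python
CYCLES_PER_PAC_OP = 10   # estimate for one auth or vrfy (Table 6)


def signing_overhead_cycles(protected_calls: int) -> int:
    """One auth in the prologue plus one vrfy in the epilogue per call."""
    return protected_calls * 2 * CYCLES_PER_PAC_OP


def bubblesort_swap_calls(values) -> int:
    """Count the swap() calls a plain bubblesort performs on the input."""
    data = list(values)
    swaps = 0
    for end in range(len(data) - 1, 0, -1):
        for i in range(end):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
                swaps += 1
    return swaps


# Worst case (reversed input): the swap count reaches n * (n - 1) / 2.
for n in (10, 100):
    calls = bubblesort_swap_calls(range(n, 0, -1))
    print(n, calls, signing_overhead_cycles(calls))
```

For reversed input this yields 45 swaps for n=10 and 4950 swaps for n=100, i.e. 900 and 99000 estimated cycles spent on signing and verifying the swap() return address.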


[Figure 5: bar chart. Vertical axis: number of auth instructions. Tests: echo, quicksort n=10, quicksort n=100, bubblesort n=10, bubblesort n=100. Series: sign non-leaf functions, sign all functions. Bar labels as extracted, assignment to bars not recoverable: 2,770, 540, 229, 158, 40, 63, 22, 40, 22.]

Figure 5: The number of times the auth instruction was executed per test. The tests are compiled with -msign-return-address=sign and -msign-return-address=all. The number of vrfy instructions is equal to the number of auth instructions.

6.3

Compatibility

Our approach to pointer authentication uses the unused bits of a pointer to store a PAC. As these bits may be used by other extensions, this may result in incompatibilities. Disabling the use of problematic bits in the pointer by decreasing the PAC size is possible. The hypervisor extension [22], for example, uses 41- or 50-bit wide VAs, so PACs would have to spare the two extra bits used in comparison to normal VAs.

Return address protection is not expected to cause compatibility problems, as it only affects the protected function. Protecting other pointers will cause compatibility problems with other software such as dynamic libraries. These might not expect authenticated pointers and cannot handle them correctly if compiled without support for pointer authentication.

We compiled buildroot⁹ based on a Linux 4.19 kernel with our modified GCC and the default return address signing flag (protection of non-leaf functions). The upstream 4.19 kernel has native support for the RISC-V architecture. We did not have to modify the kernel source in order to boot successfully. We can protect the kernel space by compiling the kernel with return address protection enabled. User-space protection will require support by the kernel and therefore source modification.
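The relation between VA width and available PAC bits mentioned at the start of this section is simple arithmetic on 64-bit pointers. The helper below is a hypothetical illustration and not part of our implementation.

```python
XLEN = 64  # pointer width in bits


def pac_width(va_bits: int) -> int:
    """Number of unused upper pointer bits available for the PAC."""
    if not 0 < va_bits < XLEN:
        raise ValueError("VA width must be between 1 and 63 bits")
    return XLEN - va_bits


# VA widths mentioned in the text: Sv39, Sv48, and the 41/50-bit VAs
# of the hypervisor extension, which cost two PAC bits each.
for va_bits in (39, 41, 48, 50):
    print(va_bits, pac_width(va_bits))
```

The two extra VA bits of the hypervisor extension reduce the PAC from 25 to 23 bits and from 16 to 14 bits respectively.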

⁹ https://buildroot.org/

7

Discussion and Conclusion

We presented an approach to pointer authentication on the RISC-V architecture. We extended the RISC-V ISA and added compiler support for return address protection. Our approach is based on ARM PAC. Prior to our work and ARM PAC, other approaches to pointer protection were proposed.

7.1

Related Work

We presented ARM PAC in Chapter 3. Recently, it has gained more attention as Apple added support for pointer authentication in their A12 and S4 SoCs [2]. Further, solutions for pointer protection based on ARM PAC, such as PARTS [14], have been developed.

The idea of protecting pointers using cryptography is not new. PointGuard [9] instruments a program to encrypt pointers stored in memory. The encryption scheme is a secret mask XORed onto the pointer. Without knowledge of the mask, an attacker cannot reliably forge pointers. Pointers are only decrypted as they are loaded into CPU registers. Nevertheless, PointGuard cannot reliably detect pointer tampering. A tampered pointer will point to a random location after decryption, but the modification is not detected during decryption, because PointGuard does not save a reference value of the pointer. Access through the decrypted tampered pointer will likely fail, as the address space is only sparsely populated, but may have side effects if it points to accessible memory. The secret used for encryption is stored in the process memory space. PointGuard is therefore ineffective against an attacker with arbitrary memory access. In contrast, pointer authentication can reliably detect tampering, as a MAC of the pointer is saved, and it can withstand attackers with arbitrary memory access.

Code-Pointer Integrity (CPI) [13] follows a different approach. Rather than detecting pointer tampering after the fact, CPI prevents an attacker from accessing the pointer by storing sensitive pointers (e.g. function pointers) in a separate, attacker-inaccessible memory region (the safe region). Additionally, backward-edge CFI is handled by a safe stack mechanism for performance reasons. The attacker-accessible memory region does not contain pointers into the safe region. Accesses to the safe region are handled by special primitives. CPI guarantees full memory safety of all sensitive pointers. By isolating pointers from the attacker instead of protecting them using cryptography, CPI achieves an overall average overhead of 8.4%, which is smaller than that of comparable cryptographic solutions.


Cryptographic CFI (CCFI) [15] uses MACs to protect pointers, similar to ARM PAC or our approach. CCFI is a software-based solution providing pointer authentication. The MACs are calculated using AES. Besides the pointer value itself, the pointer class (e.g. a function or a return pointer) influences the value of the MAC. CCFI uses a 128-bit key, which it stores in general-purpose CPU registers that the compiler reserves for this purpose. CCFI therefore remains effective against adversaries with arbitrary memory access. Although CCFI uses hardware-accelerated AES for MAC computation, its average overhead of 52% in the SPEC benchmark is larger than that of entirely hardware-based solutions such as PARTS [14] (19.5% overall average overhead), which is based on ARM PAC.

RAGuard [24] is a hardware-based approach to backward-edge CFI. It protects every return address on the stack with a so-called RAMAC, an authentication code. RAGuard computes the RAMAC using a hash function over the stack pointer, the return address, and secret information. The secret input is provided by a Physically Unclonable Function (PUF) with the process ID as input. At function calls, the return address and the RAMAC are stored on the stack. On function return, the RAMAC is verified. While the integration of a PUF simplifies the handling of the secret information, an attacker may be able to learn the challenge-response pairs of the PUF for a given process ID and may therefore know the secret input. Our approach avoids this problem by using a unique key for every process.

Liljestrand et al. have proposed PARTS [14], an instrumentation framework building on ARM PAC. PARTS integrates pointer-authentication-based defenses into the LLVM compiler. PARTS can provide forward-edge as well as backward-edge CFI and data pointer integrity. While return address signing has a negligible overhead of less than 0.5%, enforcing pointer integrity for data pointers introduces a large overhead of up to 40% for data-pointer-intensive programs. The overall average overhead is 19.5%. Approaches similar to PARTS but building on our instructions are possible on RISC-V.

7.2

Conclusion

We presented an approach to hardware-assisted pointer authentication for RISC-V. Our scheme uses the unused bits of a pointer to store a PAC, allowing us to protect the pointer. The PAC is calculated over the pointer value, a context value, and a secret key. Binding the PAC to a context value limits an attacker's ability to swap one authenticated pointer for another. Our approach is similar to ARM PAC. We added instructions to the RISC-V ISA, enabling hardware support for pointer authentication. Hardware-enforced access control protects the signing keys of the different privilege levels. We added support for return address protection to GCC using our instructions, providing backward-edge CFI and eliminating the return address as an attack vector. Additionally, we made GDB and the GNU binutils aware of the new instructions and registers.

Our approach to pointer authentication offers protection against strong attackers with arbitrary memory access. We enable hardware-based defense against code reuse attacks as well as the protection of generic data. Our approach to return address signing does not require source code modification. Our solution provides a foundation for future protection schemes that use our instructions to provide strong, hardware-enforced defense. Solutions building on pointer authentication can prevent control-flow related attacks such as ROP or return-to-libc. As attackers may move on to data-oriented attacks, these will become a challenge to be solved by additional protection mechanisms other than pointer authentication.

7.3

Future Work

We presented a solution to pointer authentication on the RISC-V architecture as well as software support for return address protection. Our approach can be extended. In the following we present starting points for further development.

7.3.1

Combined Instructions

In general, RISC-V aims to include only the minimal set of required instructions. Our RISC-V ISA extension provides a basis for pointer protection solutions that use the new instructions, and we provide only the instructions necessary to perform pointer authentication. In contrast to ARM PAC, we do not provide any combined instructions. These could include "verify and return", which verifies the return address and then returns, or "verify and load", which loads data from memory using an authenticated pointer. Combined instructions lower register pressure. In our approach, a load from an authenticated pointer first has to verify the pointer and save it to another register before using it. A combined instruction could work directly with the authenticated pointer.

In the case of loads, combined instructions would only result in minor improvements, because data pointers may be used more than once after being loaded from memory. It is more efficient to verify the pointer once on load than on every access, and a combined instruction would verify on every access. Combined return instructions, on the other hand, could provide a benefit, as the return address is only used once. As a single combined instruction replaces multiple instructions, the overall code size overhead is reduced.

Multiple instructions may also be fused by the decoding stage of the processor. The decoder then combines multiple operations, such as a verify and a return, into a single one. Fused instructions offer a performance benefit without having to change the ISA.

7.3.2

Creating PACs for other Privilege Levels

Our instruction implementation selects the key based on the current privilege level. This may result in problems in some scenarios, such as the loader creating authenticated pointers for the process being loaded. The loader is part of the OS and prepares a process for execution. As the loader executes in a different privilege level than the loaded process, the pointer authentication instructions will use the loader's key. The loaded process will run in user mode, using a different key than the loader. Verification of authenticated pointers created by the loader will therefore fail in the process unless the loader and the process share the same key.

Including a way to select the key to be used (e.g. supervisor mode using the key currently stored in the user mode CSR) would solve this problem. Higher privilege levels would be able to use the keys of lower levels, but lower levels could not use the keys of higher ones, as we already enforce for the CSRs. On process creation, the loader would set the key of the process being created and authenticate pointers using the process key. This allows for protection starting at process creation.

7.3.3

Multiple Keys

We use only one key per privilege level. Providing a separate key for each pointer type makes it possible to distinguish the types by a secret value that is not under attacker control, in addition to the context value. Providing different keys for data and code pointers would prevent swapping pointers of different types. Adding a key specific to the general-purpose authentication instruction authg would be another extension. Currently, authg uses a zero key when executed in M-mode. To prevent M-mode from accessing its key directly, the hardware could lock access after the key is set. Key access can be restored by resetting the processor.

7.3.4

Extending Software Support

We provide software support for return address protection with our extension of GCC. This only provides backward-edge CFI. Adding compiler support for forward-edge CFI and data pointer integrity is a possible future extension, but not a trivial task. As the key and some of the context values of numerous pointers are unknown at compile and link time, authentication has to take place at load time. The program loader then adds PACs to the pointers once the key and all context values are known. This introduces overhead to the loading process. Return address protection only affects the current function and therefore causes no compatibility problems. Adding PACs to pointers that are used by different program parts that might not be compiled with support for pointer authentication (such as pre-compiled shared libraries) might introduce compatibility problems. Pointer authentication will have to solve such problems in a way that does not introduce additional overhead.


References

References [1] Mart´ın Abadi et al. “Control-flow integrity principles, implementations, and applications”. In: ACM Transactions on Information and System Security 13.1 (Oct. 2009), pp. 1–40. ISSN: 10949224. DOI: 10.1145/1609956.1609960. URL: http://portal.acm. org/citation.cfm?doid=1609956.1609960 (visited on 02/11/2019). [2] Apple Inc. iOS Security iOS 12.1 November 2018. 2018. URL: https://www.apple.com/ business/site/docs/iOS_Security_Guide.pdf (visited on 04/11/2019). [3] Apple Inc. Preparing Your App to Work with Pointer Authentication. URL: https:// developer.apple.com/documentation/security/preparing_your_app_to_work_with_ pointer_authentication (visited on 03/01/2019).

[4] Arjan van de Ven. New Security Enhancements in Red Hat Enterprise Linux v.3, update 3. Aug. 2004. URL: https://static.redhat.com/legacy/f/pdf/rhel/WHP0006US_ Execshield.pdf (visited on 02/28/2019). [5]

ARM LTD. ARM A64 Instruction Set Architecture.

[6]

ARM LTD. ARM Cortex-A Series Programmer’s Guide for ARMv8-A. 2015.

[7]

Roberto Avanzi. The QARMA Block Cipher Family. Mar. 2017.

[8] Azad Brandon. Examining Pointer Authentication on the iPhone XS. Feb. 2019. URL : https : / / googleprojectzero . blogspot . com / 2019 / 02 / examining - pointer authentication-on.html (visited on 03/01/2019). [9] Crispin Cowan et al. “PointGuardTM: Protecting Pointers From Buffer Overflow Vulnerabilities”. In: Proceedings of the 12th USENIX Security Symposium. Vol. 12. Washington, D.C., USA: USENIX Association, Aug. 2003. [10]

Crispin Cowan et al. “StackGuard: Automatic Adaptive Detection and Prevention of Buffer-Overflow Attacks”. In: Proceedings of the 7th USENIX Security Symposium. San Antonio, Texas: USENIX Association, Jan. 1998.

[11]

Joan Daemen and Rijmen, Vincent. The design of Rijndael: AES — the Advanced Encryption Standard. 2002.

[12]

Jonathan Ganz and Sean Peisert. “ASLR: How Robust Is the Randomness?” In: 2017 IEEE Cybersecurity Development (SecDev). Cambridge, MA, USA: IEEE, Sept. 2017, pp. 34–41. ISBN: 978-1-5386-3467-7. DOI: 10.1109/SecDev.2017.19. URL: http: //ieeexplore.ieee.org/document/8077804/ (visited on 03/03/2019).

[13]

Volodymyr Kuznetsov et al. “Code-Pointer Integrity”. In: Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation. USENIX Association, 2014.

[14]

Hans Liljestrand et al. PAC it up: Towards Pointer Integrity using ARM Pointer Authentication. arXiv: 1811.09189. Nov. 2018. URL: http://arxiv.org/abs/1811.09189 (visited on 02/05/2019).

[15]

Ali Jose Mashtizadeh et al. “CCFI: Cryptographically Enforced Control Flow Integrity”. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security - CCS ’15. Denver, Colorado, USA: ACM Press, 2015, pp. 941–951. ISBN : 978-1-4503-3832-5. DOI : 10.1145/2810103.2813676. URL : http://dl.acm.org/ citation.cfm?doid=2810103.2813676 (visited on 02/05/2019). 38

[16]

Qualcomm Technologies, Inc. Pointer Authentication on ARMv8.3: Design and Analysis of the New Software Security Instructions. Jan. 2017. URL: https://www.qualcomm.com/media/documents/files/whitepaper-pointer-authentication-on-armv8-3.pdf (visited on 04/11/2019).

[17]

Mark Rutland. ARMv8.3 Pointer Authentication. 2017. URL: https://events.static.linuxfound.org/sites/events/files/slides/slides_23.pdf (visited on 04/11/2019).

[18]

Hovav Shacham. “The Geometry of Innocent Flesh on the Bone: Return-into-libc without Function Calls (on the x86)”. In: Proceedings of the 14th ACM Conference on Computer and Communications Security. CCS ’07 (2007), pp. 552–561. DOI: 10.1145/1315245.1315313.

[19]

Hovav Shacham et al. “On the effectiveness of address-space randomization”. In: Proceedings of the 11th ACM conference on Computer and communications security - CCS ’04. Washington DC, USA: ACM Press, 2004. ISBN: 978-1-58113-961-7. DOI: 10.1145/1030083.1030124. URL: http://portal.acm.org/citation.cfm?doid=1030083.1030124 (visited on 03/10/2019).

[20]

P.V. Sriniwas Shastry, Amruta Kulkarni, and Mukul S. Sutaone. “ASIC implementation of AES”. In: 2012 Annual IEEE India Conference (INDICON). Kochi, India: IEEE, Dec. 2012, pp. 1255–1259. ISBN: 978-1-4673-2272-0 978-1-4673-2270-6 978-1-4673-2271-3. DOI: 10.1109/INDCON.2012.6420811. URL: http://ieeexplore.ieee.org/document/6420811/ (visited on 03/19/2019).

[21]

L. Szekeres et al. “SoK: Eternal War in Memory”. In: 2013 IEEE Symposium on Security and Privacy. Berkeley, CA: IEEE, May 2013, pp. 48–62. ISBN: 978-0-7695-4977-4 978-1-4673-6166-8. DOI: 10.1109/SP.2013.13. URL: http://ieeexplore.ieee.org/document/6547101/ (visited on 02/11/2019).

[22]

Andrew Waterman and Krste Asanovic. Privileged ISA. Mar. 2019. URL: https://riscv.org/specifications/privileged-isa/ (visited on 03/09/2019).

[23]

Andrew Waterman and Krste Asanovic. Unprivileged ISA. Mar. 2019. URL: https://riscv.org/specifications/ (visited on 03/09/2019).

[24]

Jun Zhang et al. “RAGuard: A Hardware Based Mechanism for Backward-Edge Control-Flow Integrity”. In: Proceedings of the Computing Frontiers Conference - CF ’17. Siena, Italy: ACM Press, 2017, pp. 27–34. ISBN: 978-1-4503-4487-6. DOI: 10.1145/3075564.3075570. URL: http://dl.acm.org/citation.cfm?doid=3075564.3075570 (visited on 03/10/2019).

Acronyms

ASLR Address Space Layout Randomization.
CFG Control Flow Graph.
CFI Control Flow Integrity.
CRA Code Reuse Attack.
CSR Control and Status Register.
DEP Data Execution Prevention.
GDB GNU Debugger.
GOT Global Offset Table.
ISA Instruction Set Architecture.
M-mode Machine mode.
MAC Message Authentication Code.
MSB Most Significant Bit.
OS Operating System.
PAC Pointer Authentication Code.
PIE Position Independent Executable.
PoC Proof of Concept.
PUF Physically Unclonable Function.
RISC Reduced Instruction Set Computer.
ROP Return Oriented Programming.
S-mode Supervisor mode.
U-mode User mode.
VA Virtual Address.

Appendix

auth Instruction Implementation

    require_rv64;

    // get the current addressing mode
    reg_t satp = STATE.satp;
    reg_t mode = get_field(satp, SATP64_MODE);

    /* do not write a pac if not in U/S-mode or when address translation is disabled */
    if (!(p->get_state()->prv == PRV_U || p->get_state()->prv == PRV_S) || mode == 0) {
      WRITE_RD(RS1);
    } else {
      reg_t mask = 0;
      int vabits = 0;
      if (mode == 8) { // Sv39
        vabits = 39;
      } else if (mode == 9) { // Sv48
        vabits = 48;
      } else { // unsupported constellation
        throw trap_illegal_instruction(0);
      }

      /* Build the pac bitmask based on the VA mode we are in.
         111...1100...000   (RS1 & ~mask == ptr)
         |      ||      |   (RS1 & mask == pac)
         64...39||38....0   for Sv39
      */
      // NOTE: the source text between "1ull" and the PRV_U test is missing;
      // the next four lines are reconstructed from the mask comment above
      // and from the matching lines of the vrfy listing.
      mask = ~((1ull << vabits) - 1);

      // Get key from csr (depending on mode)
      uint64_t key[2] = {0};
      if (p->get_state()->prv == PRV_U) { // user mode
        key[0] = p->get_csr(CSR_UCFIKEY1);
        key[1] = p->get_csr(CSR_UCFIKEY2);
      } else if (p->get_state()->prv == PRV_S) { // supervisor mode
        key[0] = p->get_csr(CSR_SCFIKEY1);
        key[1] = p->get_csr(CSR_SCFIKEY2);
      }

      /* Zero the bits the PAC will use. */
      reg_t ptr = RS1 & ~mask;

      uint8_t encr[16] = {0};
      memcpy(encr, &ptr, 8 * sizeof(uint8_t)); // regs are 64 bit (= 8 B)
      memcpy(encr + 8, &RS2, 8 * sizeof(uint8_t));
      struct AES_ctx ctx;
      AES_init_ctx(&ctx, (uint8_t*)key);
      AES_ECB_encrypt(&ctx, encr);

      reg_t auth = 0;
      memcpy(&auth, encr, sizeof(auth));
      reg_t pac = auth & mask;
      reg_t rs1_authd = ptr | pac;

      WRITE_RD(rs1_authd);
    }
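The mask construction and PAC placement used in the auth listing can be isolated into a small standalone sketch. The helper names (`pac_mask`, `embed_pac`, `strip_pac`, `check_roundtrip`) are hypothetical and are not part of Spike or of the implementation above. The MAC is passed in as an opaque 64-bit value, so no AES is performed here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: bits [63:vabits] are set, bits [vabits-1:0] are clear.
// vabits is 39 for Sv39 and 48 for Sv48, as in the listing.
constexpr std::uint64_t pac_mask(int vabits) {
    return ~((std::uint64_t{1} << vabits) - 1);
}

// Hypothetical helper: keep the low vabits of ptr and place the upper
// bits of mac (an already computed 64-bit MAC value) above them.
constexpr std::uint64_t embed_pac(std::uint64_t ptr, std::uint64_t mac, int vabits) {
    return (ptr & ~pac_mask(vabits)) | (mac & pac_mask(vabits));
}

// Hypothetical helper: clear the PAC bits again. This sketch only zeroes
// them; it does not sign-extend the address.
constexpr std::uint64_t strip_pac(std::uint64_t signed_ptr, int vabits) {
    return signed_ptr & ~pac_mask(vabits);
}

// Compile-time checks of the two mask shapes.
static_assert(pac_mask(39) == 0xFFFFFF8000000000ull, "Sv39 leaves 25 PAC bits");
static_assert(pac_mask(48) == 0xFFFF000000000000ull, "Sv48 leaves 16 PAC bits");

// Run-time self-check: embedding and stripping returns the low vabits of ptr.
inline void check_roundtrip(std::uint64_t ptr, std::uint64_t mac, int vabits) {
    assert(strip_pac(embed_pac(ptr, mac, vabits), vabits) == (ptr & ~pac_mask(vabits)));
}
```

Under these assumptions Sv39 leaves 25 bits for the PAC and Sv48 leaves 16, which is why the mask depends on the `satp` mode.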

vrfy Instruction Implementation

    require_rv64;

    // get the current addressing mode
    reg_t satp = STATE.satp;
    reg_t mode = get_field(satp, SATP64_MODE);

    /* do not write a pac if not in U/S-mode or when address translation is disabled */
    if (!(p->get_state()->prv == PRV_U || p->get_state()->prv == PRV_S) || mode == 0) {
      WRITE_RD(RS1);
    } else {
      reg_t mask = 0;
      int vabits = 0;
      if (mode == 8) { // Sv39
        vabits = 39;
      } else if (mode == 9) { // Sv48
        vabits = 48;
      } else { // unsupported constellation
        throw trap_illegal_instruction(0);
      }

      // Get key from csr (depending on mode)
      uint64_t key[2] = {0};
      if (p->get_state()->prv == PRV_U) { // user mode
        key[0] = p->get_csr(CSR_UCFIKEY1);
        key[1] = p->get_csr(CSR_UCFIKEY2);
      } else if (p->get_state()->prv == PRV_S) { // supervisor mode
        key[0] = p->get_csr(CSR_SCFIKEY1);
        key[1] = p->get_csr(CSR_SCFIKEY2);
      }

      /* Build the pac bitmask based on the VA mode we are in.
         111...1100...000   (RS1 & ~mask == ptr)
         |      ||      |   (RS1 & mask == pac)
         64...39||38....0   for Sv39
      */
      mask = ~((1ull [...]

    [...] get_csr(CSR_UCFIKEY1);
      key[1] = p->get_csr(CSR_UCFIKEY2);
    } else if (p->get_state()->prv == PRV_S) { // supervisor mode
      key[0] = p->get_csr(CSR_SCFIKEY1);
      key[1] = p->get_csr(CSR_SCFIKEY2);
    }

    uint8_t encr[16] = {0};
    memcpy(encr, &RS1, 8 * sizeof(uint8_t));
    memcpy(encr + 8, &RS2, 8 * sizeof(uint8_t));
    struct AES_ctx ctx;
    AES_init_ctx(&ctx, (uint8_t*)key);
    AES_ECB_encrypt(&ctx, encr);

    reg_t auth = 0;
    memcpy(&auth, encr, sizeof(auth));

    WRITE_RD(auth);
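The listing above packs two 64-bit register values into one 16-byte block, runs a block cipher over it, and returns the first eight bytes. The following is a minimal sketch of that data flow only. `toy_permute` is a placeholder that stands in for the AES call; it is not a cipher and offers no security. All helper names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Block = std::array<std::uint8_t, 16>;

// Hypothetical helper: bytes 0..7 hold the first value, bytes 8..15 the
// second, in host byte order (the same layout memcpy produces in the listing).
inline Block pack_block(std::uint64_t first, std::uint64_t second) {
    Block b{};
    std::memcpy(b.data(), &first, sizeof first);
    std::memcpy(b.data() + 8, &second, sizeof second);
    return b;
}

// Placeholder for the block cipher call. NOT AES and NOT secure: it XORs
// the 128-bit key into the block and folds the upper half into the lower
// half so that both inputs influence the result.
inline void toy_permute(Block& b, const std::array<std::uint64_t, 2>& key) {
    std::uint8_t k[16];
    std::memcpy(k, key.data(), sizeof k);
    for (std::size_t i = 0; i < b.size(); ++i) b[i] ^= k[i];
    for (std::size_t i = 0; i < 8; ++i) b[i] ^= b[i + 8];
}

// Take the first eight bytes of the block as the 64-bit result.
inline std::uint64_t truncate_block(const Block& b) {
    std::uint64_t out = 0;
    std::memcpy(&out, b.data(), sizeof out);
    return out;
}

// Full data flow: pack, transform in place, truncate.
inline std::uint64_t toy_mac(std::uint64_t first, std::uint64_t second,
                             const std::array<std::uint64_t, 2>& key) {
    Block b = pack_block(first, second);
    toy_permute(b, key);
    return truncate_block(b);
}
```

Because the packing uses `memcpy`, the byte layout of the block follows the host byte order; a hardware implementation would have to fix that order explicitly.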

strp Instruction Implementation

    require_rv64;

    reg_t satp = STATE.satp;
    reg_t mode = get_field(satp, SATP64_MODE);

    /* We only strip the pac if we are in a supported mode (Sv39 or Sv48) */
    if (mode != 8 && mode != 9) {
      WRITE_RD(RS1); // do nothing
    } else {
      reg_t mask = 0;
      int vabits = 0;
      if (mode == 8) { // Sv39
        vabits = 39;
      } else if (mode == 9) { // Sv48
        vabits = 48;
      }

      /* Build the pac bitmask based on the VA mode we are in.
         111...1100...000   (RS1 & ~mask == ptr)
         |      ||      |   (RS1 & mask == pac)
         64...39||38....0   for Sv39
      */
      mask = ~((1ull