Further Improvements in the Boolean Domain [1 ed.] 1527503712, 9781527503717

The number of digital systems supporting our daily life is increasing continuously.


English · 536 [537] pages · 2018



Table of contents:
Contents
List of Figures
List of Tables
Foreword • Mitchell A. Thornton
Preface • Bernd Steinbach
Acknowledgments
List of Abbreviations
I Extensions in Theory and Computations
1 Models, Methods, and Techniques • Christian Posthoff and Bernd Steinbach
2 Accelerated Computations • Stuart W. Schneider and Jon T. Butler
II Digital Circuits
3 Synthesis, Visualization, and Benchmarks • Bernd Steinbach and Christian Posthoff
4 Reliability and Linearity of Circuits • Hila Rabii, Yaara Neumeier, and Osnat Keren
III Towards Future Technologies
5 Reversible and Quantum Logic • Suzana Stojković, Milena Stanković, Claudio Moraga, and Radomir S. Stanković
Bibliography
List of Authors
Index of Authors
Index


Further Improvements in the Boolean Domain

Edited by Bernd Steinbach

This book first published 2018

Cambridge Scholars Publishing
Lady Stephenson Library, Newcastle upon Tyne, NE6 2PA, UK

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

Copyright © 2018 by Bernd Steinbach and contributors

All rights for this book reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the copyright owner.

ISBN (10): 1-5275-0371-2
ISBN (13): 978-1-5275-0371-7

Contents

List of Figures
List of Tables
Foreword
Preface
Acknowledgments
List of Abbreviations

I Extensions in Theory and Computations

1 Models, Methods, and Techniques
  1.1 NP-Problems and Boolean Equations
    1.1.1 Classes of Hard Problems
    1.1.2 Boolean Functions and Equations
    1.1.3 Boolean Equations and Ternary Vectors
    1.1.4 NP-Complete Problems
    1.1.5 Boolean Equations - a Unifying Instrument
  1.2 Number of Variables of Index Generation Functions
    1.2.1 Background
    1.2.2 An Index Generation Unit
    1.2.3 Notation
    1.2.4 Expected Number of Variables
    1.2.5 Distribution of the Expected Number
    1.2.6 Expected Number of Balanced Columns
    1.2.7 Found Results
  1.3 Computational Complexity of Error Metrics
    1.3.1 Approximate Computing
    1.3.2 Preliminaries
    1.3.3 Error Metrics
    1.3.4 Complexity of Computing Error Metrics
  1.4 Spectral Techniques - Origins and Applications
    1.4.1 Origins and Evolution of Spectral Techniques
    1.4.2 Digital System Design
    1.4.3 Signal Processing
    1.4.4 Towards FFT
    1.4.5 Towards Alternative Spectral Techniques
    1.4.6 Applications of Spectral Techniques
  1.5 A Relational Approach to Finite Topologies
    1.5.1 Experimentation as Motivation
    1.5.2 Relation Algebra
    1.5.3 Modeling Sets and Finite Topologies
    1.5.4 Closures, Interiors and Boundaries
    1.5.5 Topological Relations and Random Topologies
    1.5.6 Implementation and Related Work
  1.6 A Real-World Model of Partially Defined Logic
    1.6.1 Real-World Asynchronous Feedback
    1.6.2 Related Topics
    1.6.3 Use Case: Low-Active RS-Latch
    1.6.4 Stabilized Dual-Rail Implementation
    1.6.5 Results

2 Accelerated Computations
  2.1 Bent Function Enumeration Using an FPGA
    2.1.1 Background
    2.1.2 Properties of Bent Functions
    2.1.3 Architecture for Bent Function Discovery
    2.1.4 Circular Pipeline Architecture
    2.1.5 Circuit of the Circular Pipeline
    2.1.6 Experimental Results
    2.1.7 Analytical Results
    2.1.8 Practical Aspects
  2.2 Efficient Generation of Bent Functions Using a GPU
    2.2.1 Discovery of Bent Functions
    2.2.2 Bent Functions in Two Domains
    2.2.3 Random Generation of Bent Functions
    2.2.4 Implementation on a GPU Platform
    2.2.5 Comparison Between CPU and GPU Platforms
  2.3 Multi-GPU Approximation Methods
    2.3.1 Error Detection and Correction
    2.3.2 Computing Distance Distribution of AN Codes
    2.3.3 Results
    2.3.4 Summary
  2.4 Orthogonalization of a TVL in Disjunctive or Conjunctive Form
    2.4.1 Orthogonality
    2.4.2 Ternary Vector List (TVL)
    2.4.3 Orthogonal Operations for Ternary Vectors
    2.4.4 Orthogonalization of a TVL in DF or CF
    2.4.5 Experimental Results

II Digital Circuits

3 Synthesis, Visualization, and Benchmarks
  3.1 Vectorial Bi-Decompositions for Lattices
    3.1.1 Synthesis of Combinational Circuits
    3.1.2 Vectorial and Single Derivative Operations
    3.1.3 Generalized Lattices of Boolean Functions
    3.1.4 Vectorial Bi-Decompositions
    3.1.5 Application of the Vectorial Bi-Decomposition
    3.1.6 Comparison with Other Synthesis Approaches
  3.2 Hardware/Software Co-Visualization: The Lost World
    3.2.1 The Lost World
    3.2.2 Visualization Techniques
    3.2.3 Core Issues
    3.2.4 Enabling the Fiddlers and Tinkerers
    3.2.5 A Brave New World
  3.3 Synthesis of Complemented Circuits
    3.3.1 Decomposition with Two-Input Operators
    3.3.2 Previous Work
    3.3.3 Boolean Relations
    3.3.4 Complemented Circuits
    3.3.5 Minimization of Complemented Circuits
    3.3.6 Structure of Complemented Circuits
    3.3.7 Experimental Results
  3.4 Design of Multipliers Using Fourier Transformations
    3.4.1 Approaches for Multiplication on Hardware
    3.4.2 Design of Monolithic Multipliers
    3.4.3 The Adder Tree in Multiplication
    3.4.4 Discussion of the Results
  3.5 Low Power Race-Free State Assignment
    3.5.1 Synthesis for Low Power Consumption
    3.5.2 A Behavioral Model
    3.5.3 The Condition for Absence of Critical Races
    3.5.4 Minimizing the Length of State Codes
    3.5.5 Minimizing the Switching Activity
    3.5.6 A Heuristic Method
  3.6 Boolean Discrete Event Modeling of Circuit Structures
    3.6.1 Different Abstraction Levels for Modeling
    3.6.2 Theoretical Foundation
    3.6.3 Syntax for Discrete Event Modeling
    3.6.4 Use Case: CMOS Inverter
    3.6.5 Results
  3.7 Collection of Logic Synthesis Examples
    3.7.1 The Reasons to be Prudent
    3.7.2 Benchmarks to Compare Design Tools
    3.7.3 Origins of Benchmarks
    3.7.4 Improvements of the Usability of Benchmarks
    3.7.5 A Provided Collection of Benchmarks
    3.7.6 Examples and Statistical Properties

4 Reliability and Linearity of Circuits
  4.1 Low Complexity High Rate Robust Codes
    4.1.1 The Hardware Security Problem
    4.1.2 Security Versus Reliability
    4.1.3 Robust Codes
    4.1.4 The Shortened QS Code
    4.1.5 The Triple QS and Triple Sum Codes
  4.2 Synthesis for Reliability Using Bi-Decompositions
    4.2.1 Methods to Improve the Reliability of Circuits
    4.2.2 Synthesis for Reliability
    4.2.3 Experimental Results
    4.2.4 Reliability: Challenges and Approaches
  4.3 Linearization of Partially Defined Boolean Functions
    4.3.1 Partially Defined Boolean Functions
    4.3.2 The Linearization Method
    4.3.3 Efficient Finding of a Linear Transform
    4.3.4 Existence of an Injective Linear Transformation
    4.3.5 Connections to Linear Error-Correcting Codes

III Towards Future Technologies

5 Reversible and Quantum Logic
  5.1 FDD-Based Reversible Synthesis by Levels
    5.1.1 Methods to Synthesize Reversible Circuits
    5.1.2 Post-Order FDD-Based Reversible Synthesis
    5.1.3 FDD-Based Reversible Synthesis by Levels
    5.1.4 Experimental Results
  5.2 Distributed Evolutionary Design of Reversible Circuits
    5.2.1 Extending RIMEP2 to DRIMEP2
    5.2.2 Background
    5.2.3 The Hierarchical Model Based on RIMEP2
    5.2.4 The Islands Model Based on RIMEP2
    5.2.5 Hybrid Models
    5.2.6 Experiments and Interpretation of the Results
    5.2.7 Relevant Features
  5.3 Towards Classification of Reversible Functions
    5.3.1 Reversible Boolean Functions
    5.3.2 Preliminaries
    5.3.3 Homogeneous Component Functions
    5.3.4 Motivation for Future Work
  5.4 The CnF Logic Functions Derived from CnNOT Gates
    5.4.1 Reconfigurable Reversible Processors
    5.4.2 Background
    5.4.3 The C3NOT Gate and C3F Functions
    5.4.4 Analysis of C4NOT and C4F
    5.4.5 Generalizations and Remarks

Bibliography
List of Authors
Index of Authors
Index

List of Figures

1.1 Ten queens and two pawns located at c3 and e5
1.2 Example of a vertex cover
1.3 Birkhoff's Diamond: uncolored and colored
1.4 Graph with a clique of three vertices
1.5 The adjacency matrix of a graph
1.6 Graph with a red colored clique
1.7 The incidence matrix of the graph of Figure 1.6
1.8 Edge cover of a graph
1.9 Hamiltonian path
1.10 Graph for exploring Eulerian paths
1.11 Index generation unit
1.12 Fraction of functions realized using 4 to 32 vectors
1.13 Adder circuit approximated from a ripple carry adder
1.14 A relation and two vector-models of sets of sets
1.15 Models of a topology and the set of closed sets
1.16 Sets and their closures, interiors and boundaries
1.17 Graphical visualization of a closure operation
1.18 Specialization, distinction and separation relation
1.19 RelView-programs: closures, interiors, boundaries
1.20 Low-active RS-latch and metastability
1.21 Race condition resulting in non-determined signals
1.22 Behavior of the programmable JK-/RS-buffer
1.23 Structure of the programmable JK-/RS-buffer
1.24 RS-buffer
1.25 Parallel composed partial low-active RS-latch

2.1 Original bent function enumeration circuit
2.2 Distribution of functions to the number of bent weights for n = 4
2.3 Distribution of functions to persistence in a circular pipeline for n = 4
2.4 Architecture of a single circular pipeline
2.5 Speed-up and number of LUTs versus the number of circular pipelines
2.6 The Xilinx ZedBoard system
2.7 The complete system
2.8 Butterfly operations for the Reed-Muller and Walsh transform matrices
2.9 Example of a data-flow graph of the fast Reed-Muller transform algorithm of the Cooley-Tukey class
2.10 High-level architecture diagram of the GPU
2.11 Host program to launch a GPU program
2.12 Encoding of data words and decoding of code words
2.13 CUDA implementation using local memory
2.14 CUDA implementation using shared memory
2.15 Shared memory access pattern
2.16 p2^A and p3^A for odd A's for k ∈ {8, ..., 16}
2.17 Convergence of the maximum relative error Δ
2.18 Influence of the number of iterations M
2.19 How value A controls the probability for k = {8, 16}
2.20 How value A controls the probability for k = 24
2.21 Structure of a Ternary Vector List (TVL)
2.22 Preprocessing steps to speed up the orthogonalization
2.23 Comparison of computation time for N = 20
2.24 Average number of Ternary Vectors (TVs) in the solution TVL for N = 20

3.1 General structure of the bi-decomposition
3.2 Adding several directions of change to the independence matrix IDM(f)
3.3 Vectorial bi-decompositions: structure and conditions
3.4 Karnaugh-maps of (a) the given lattice and (b) the realized Boolean function
3.5 Circuit structure with two vectorial OR-bi-decompositions
3.6 Circuit structure synthesized by weak and strong bi-decompositions
3.7 The Voronoi treemap of a software system
3.8 Software visualization through metaballs
3.9 The Pharo system visualizes software evolution
3.10 The code_swarm system animates software evolution
3.11 The state of current hardware visualizations
3.12 Waveform viewer illustrating signal assignments over time
3.13 Multi view hardware visualization
3.14 SystemC visualization of an adder module
3.15 SystemC visualization of a RISC CPU
3.16 Alternative realizations of a function f(x) with complemented circuits
3.17 Maximal frequencies of monolithic multipliers and multipliers synthesized by Synopsys
3.18 Adder tree for a multiplication in regular arithmetic
3.19 Adder tree for a multiplication in saturation arithmetic: version 1
3.20 Adder tree for a multiplication in saturation arithmetic: version 2
3.21 Maximal frequencies of multipliers with adder trees
3.22 Module M_i with input vector x^i and output vector y^i
3.23 Pull-up transistor circuit of a CMOS inverter
3.24 Pull-down transistor circuit of a CMOS inverter
3.25 Parallelly composed CMOS inverter
3.26 Impact of transforming circuit examples into SOPs
3.27 Impact of transforming circuit examples into SOPs for 20 iterations
3.28 Comparison of IWLS'91 and IWLS'93
3.29 Comparison of IWLS'91 and IWLS'93 after synthesis
3.30 Development of the most popular benchmark sets
3.31 Structure of the circuit collection
3.32 Circuit sizes by literals in the original description
3.33 Circuit sizes by LUTs after synthesis

4.1 Schematic illustration of a communication model
4.2 Schematic of a hardware system
4.3 Mathematical model of a protected hardware system
4.4 Architecture for circuits with a high reliability
4.5 Separation of a variable using XOR gates
4.6 Structure of strong and vectorial bi-decompositions
4.7 Vectorial and strong XOR bi-decomposition of the carry function
4.8 Gate equivalents for the decomposed adder
4.9 Power for the decomposed adder
4.10 FPGA synthesis results

5.1 Types of Toffoli gates
5.2 Realization of the 4mod5 benchmark
5.3 Optimal realization of the 4mod5 benchmark
5.4 Utilization of a shared successor
5.5 Zero-polarity FDD for rd53
5.6 Circuit rd53 synthesized by the post-order algorithm
5.7 Circuit rd53 synthesized by mapping of levels
5.8 Topologies used to design reversible circuits
5.9 C2NOT gate implemented in CNOT/CV/CV†
5.10 C3NOT using the Barenco model
5.11 Karnaugh-map of the function C4F
5.12 Function fg(5) · fs(5)

List of Tables

1.1 Negation
1.2 Boolean functions of two arguments
1.3 Binary code of ternary values
1.4 Two pawns and n + 2 queens on an n × n chessboard
1.5 Set cover: all characteristic functions and all minimal covers
1.6 Exact set cover
1.7 All solutions to color Birkhoff's Diamond using 3, 4 or 5 colors
1.8 Ramsey numbers R(r, s)
1.9 Example of a registered vector table of weight 8
1.10 Fraction of k × k binary matrices versus the number of columns needed to distinguish all rows for 2 ≤ k ≤ 7
1.11 Fraction of k × k binary matrices versus the number of columns needed to distinguish all rows for 8 ≤ k ≤ 64
1.12 Fraction of binary matrices T(n, k) with k ≤ 32 rows and n columns (all rows are distinguished)
1.13 Average number of columns to distinguish rows in random binary matrices
1.14 Fraction of binary matrices such that all rows are distinguished
1.15 Truth table and bitflips for the exact adder + and the approximated adder +̂
1.16 Low-active RS-latch: logical vs. digital

2.1 Speed and resources used in a circular pipeline compared to the resources available in a Xilinx FPGA
2.2 Simultaneous bent functions found per clock period for n = 4
2.3 Functions with persistence 8 and their contributions
2.4 Functions with persistence 7 and their contributions
2.5 Clocks used by functions of various persistence values
2.6 Limitation of the number of non-zero PPRM coefficients of bent functions
2.7 Comparison of random generation of one bent function on a CPU and two GPUs
2.8 Calculation of the distance distribution of AN codes
2.9 Solution time for 1D grid point sampling with A = 61
2.10 Profiler results
2.11 Super A's and minimum Hamming distances
2.12 Orthogonal conjunctions
2.13 Orthogonal disjunctions
2.14 Relationship between a ternary element and a Boolean variable
2.15 Complement of ternary elements
2.16 Intersection of two ternary elements

3.1 Alternative circuits for the same function of a lattice
3.2 Example of a Boolean relation
3.3 A three-output function specified by a relation
3.4 Binary operations depending on both x and y
3.5 Relations corresponding to the operation AND
3.6 Relations corresponding to the operation OR
3.7 Relations corresponding to the operation XOR
3.8 Boolean relations for functions with two outputs
3.9 Comparison of SOP vs. complemented circuits using the method EXACT
3.10 Comparison of SOP vs. complemented circuits using the method HEURISTIC
3.11 Minimization results using the method EXACT
3.12 Minimization results using the method HEURISTIC
3.13 Gain of complemented circuits w.r.t. corresponding SOP forms
3.14 Average gain of complemented circuits w.r.t. corresponding SOP forms
3.15 Comparison of time, area and delay between Espresso and complemented circuits - EXACT case
3.16 Comparison of time, area and delay between Espresso and complemented circuits - HEURISTIC case
3.17 Number of conjunctions of the DNF and the minimized DF
3.18 Number of adders needed for FTM multiplications
3.19 Transition function
3.20 Conditional probabilities of the explored transitions
3.21 Absolute probabilities of the explored transitions
3.22 Compatible sets
3.23 Definition of the implication and equivalence
3.24 Partial and total specification of proposition C
3.25 Pin specification of the pull-up transistor
3.26 Pin specification of the pull-down transistor
3.27 Circuit origins and filename prefixes
3.28 Basic statistical properties

4.1 Comparison of binary high rate robust separable codes
4.2 Comparison of non-binary high rate robust separable codes
4.3 Degree of linearity of adder functions s0, ..., s8
4.4 Areas and delay for different decomposed adders

5.1 Realization of BDD and FDD nodes by Toffoli gates
5.2 Realization of positive Davio nodes by Toffoli gates
5.3 Realization of negative Davio nodes by Toffoli gates
5.4 Reversible networks using zero-polarity FDDs
5.5 Reversible networks using optimal FPFDDs
5.6 Benchmark specifications from [357, 394]
5.7 Average quantum cost within 100 runs
5.8 The total execution time for 100 independent runs
5.9 Successful runs over 100 runs
5.10 Performance of DRIMEP2 versus [357, 394]
5.11 Establishing values of the 3-variable function f1
5.12 Truth table for the function NPC3×3
5.13 Construction of a 3 × 3 reversible function
5.14 Two components of the C2NOT gate
5.15 Truth table of the C3NOT function
5.16 Truth table of the Majority controlled NOT gate
5.17 Significant control functions of the fg(·) components

Foreword

Further Improvements in the Boolean Domain contains some of the latest innovations with regard to the theory and application of algebraic methods over the Boolean domain. Algebras involving the Boolean domain have been studied and used by philosophers, scientists, mathematicians, and engineers since at least the time of Aristotle's development of the syllogism. In the past century, electrical and electronic artifacts that utilize switching elements have been extensively modeled with switching algebras or binary-valued algebras due to the advent of digital computation and communication. Although many theorists and practitioners have studied and used methods in the Boolean domain, new and useful results continue to emerge as the information age continues to evolve. This useful compilation of further improvements continues this tradition.

The book is organized into three parts titled "Extensions in Theory and Computations", "Digital Circuits", and "Towards Future Technologies". These three parts are further divided into five separate chapters that provide results in areas ranging from theoretical concerns to those that are applicable to modern design and implementation challenges such as automated synthesis and reliability. Emerging computational paradigms based upon reversible functions and quantum mechanical phenomena continue to utilize frameworks in the Boolean domain, further underscoring the need for continued improvements in this area of discrete mathematics.

The first part of the book is devoted to theory and computation. Chapter One contains several new theoretical results, including the relationship of Boolean equations to problems in the class NP. A recent area of interest is the study of the class of functions known as index generation functions; new theoretical characteristics are provided for these functions, which have many useful applications in data networks and memory. Approximate computing encompasses the use of functions that are not precisely equivalent to those they approximate. The use of approximate functions can lead to significant efficiencies, although a corresponding loss in precision accompanies their use; this trade-off is considered here. Spectral methods have been the subject of both practical and theoretical concern for many years; new results continue to emerge, and some of the latest are provided in a survey of applications. Next, the topic of finite topologies is considered with the interesting approach of using a relational algebraic framework provided by the RelView computer algebra system. Chapter One concludes with a section devoted to the application of partially defined logic to the important and timely area of asynchronous circuit design.

The second chapter of the book is concerned with accelerated computations. Performance continues to be a major concern, and new results in the Boolean domain are applied to achieve performance enhancement. Bent functions are those that exhibit maximal nonlinearity and are known to have desirable characteristics when employed in certain classes of cryptographic algorithms. While bent functions are desirable to use in these circumstances, their enumeration and discovery remains a hard problem, which motivates the development of new architectures for that purpose. An approach based upon FPGAs for the purpose of finding such functions is described and its effectiveness is analyzed. A second approach for generating bent functions combines a random method with GPU computational cores. Next, the subject of an arithmetic code known as the AN code is considered. AN codes are nonlinear and find their application in error detection at the hardware level. Once again, a GPU-based analysis and experimentation environment is described that allows for the computation of AN code distance distributions and SDC probabilities.
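The bentness criterion mentioned above can be stated concretely: a Boolean function f of n variables (n even) is bent exactly when every coefficient of its Walsh spectrum has magnitude 2^(n/2). The following Python sketch is only a brute-force illustration of that definition, not the FPGA or GPU architectures surveyed here; the example function x1·x2 ⊕ x3·x4 is a standard bent function for n = 4.

```python
from itertools import product

def walsh_spectrum(f, n):
    """Walsh coefficients W(a) = sum over all x of (-1)^(f(x) XOR a.x)."""
    spectrum = []
    for a in product((0, 1), repeat=n):
        total = 0
        for x in product((0, 1), repeat=n):
            dot = sum(ai & xi for ai, xi in zip(a, x)) % 2   # inner product a.x mod 2
            total += (-1) ** (f(x) ^ dot)
        spectrum.append(total)
    return spectrum

def is_bent(f, n):
    """Bent iff every Walsh coefficient has magnitude 2^(n/2)."""
    return all(abs(w) == 2 ** (n // 2) for w in walsh_spectrum(f, n))

f = lambda x: (x[0] & x[1]) ^ (x[2] & x[3])     # classic bent function for n = 4
print(is_bent(f, 4))                 # True
print(is_bent(lambda x: x[0], 4))    # False: linear functions are never bent
```

Checking one function this way costs 2^(2n) evaluations, and the space of all n-variable functions has size 2^(2^n); this blow-up is what motivates the hardware acceleration described in Chapter Two.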
The final contribution in Chapter Two considers the situation wherein associated forms of Boolean functions are often preferable to normal forms in terms of the literal count; however, the associated forms are not necessarily orthogonal. Ternary vector lists (TVLs) are presented, and a means for using them to find orthogonal associated forms is provided and validated with experimental results.

The next part of the book is devoted to digital circuits and comprises Chapter Three, which is concerned with synthesis, visualization, and benchmarks, and Chapter Four, which is concerned with reliability and linearity. A fundamental operation in digital circuit synthesis is that of decomposition. A particular form of decomposition, namely vectorial bi-decomposition for lattices, is described in detail in the first contribution of Chapter Three. The next contribution takes a somewhat philosophical view and considers the use of visualization as a tool in hardware/software design, with both a survey of present methods and predictions about the future of this area and its corresponding potential impact. The subject of complemented circuits and their role in logic synthesis is described, with emphasis placed upon the minimization problem and experimental results provided to validate the approach. A large percentage of digital circuit data-paths include arithmetic circuitry, with the multiplier being a common element. An approach for the design of such multipliers based upon the use of the Fourier transform is described, and example multiplier designs using both regular and saturated arithmetic are provided. The state assignment problem is considered next with respect to the criterion of minimizing power dissipation. A heuristic approach to the state assignment problem for low power is provided with an accompanying example to illustrate the method. Simulation is a basic need in digital circuit design and analysis and is often used in a stand-alone manner or in support of other digital circuit engineering tasks. Discrete event modeling is considered and a syntax is provided based on both partially and totally specified propositions. The final contribution of Chapter Three is concerned with the use of benchmark circuits for the purpose of evaluating new approaches in digital circuit engineering tasks. A history and analysis of many common benchmark circuit collections is provided, as well as an analysis of their performance characteristics when used in a variety of different digital circuit engineering tasks.
Chapter Four is also included in the digital circuits section of the book and is comprised of three contributions. The first contribution is concerned with security oriented codes that are referred to as low complexity high rate robust codes. The motivation for the use of these types of codes is to overcome the effects of adversaries that may be employing side channel or other types of attacks. The next section is aimed toward increasing reliability through decomposing a circuit into linear and non-linear portions. A degree of linearity is introduced
whereby the measure can be used to guide a bi-decomposition of a candidate circuit. The final contribution of Chapter Four is concerned with partially specified functions and describes how such functions can be linearized. The third and final part of the book is concerned with future technologies and comprises four contributions. The first contribution is concerned with reversible circuit synthesis via the use of functional decision diagrams (FDDs). Reversible circuit design is also considered in the second contribution; however, this time a probabilistic approach in the form of an evolutionary algorithm is used. Although irreversible function classification has a rich history, the classification of reversible functions has not been studied to a similar depth. The next contribution is concerned with the classification of reversible circuits and provides several definitions and theorems. The final contribution moves from reversible logic into the more general realm of quantum operators and considers various decompositions for the C^n F gate as derived from the C^n NOT gate.

Mitchell A. Thornton
Southern Methodist University, Dallas, Texas, USA
June 2017

Preface

Digital systems significantly contribute to the progress in almost all areas of our life. Boolean variables and functions are used to describe such systems. These variables can only carry two different values: 0 and 1. This is the smallest possible number of values and contributes to both a high reliability and a simpler production in comparison to systems with a higher number of different basic values. This book presents further improvements regarding a large number of problems by 36 authors from the international Boolean domain research community. Basic versions of the contributions of this book have been published in the proceedings of the 12th International Workshop on Boolean Problems [320]. Improvements in the Boolean domain require both progress in theory and powerful tools which utilize the new theory. The first part of this book deals with methods, algorithms, and programs for these aims. Solutions of many Boolean problems depend exponentially on the number of variables. Hence, in the Boolean domain we are faced with the most complex problems. In addition to the well-known CD-SAT formulas, which are restricted to conjunctions of disjunctions (clauses), the more compact CDC-SAT formulas are introduced, where the variables of the disjunctions are replaced by conjunctions of Boolean variables. Due to the improved power of SAT-solvers and the high performance of ternary vectors and further concepts implemented in the XBOOLE system, Boolean problems can be solved that have a complexity far beyond any human possibilities. Hence, it remains to find a proper description of the problem using Boolean variables, Boolean functions, and Boolean equations. Using many examples, ranging from combinatorics on the chessboard, over several covering problems, to different graph coloring problems, the creation of models represented by Boolean equations as a unifying instrument has been demonstrated.

An index generation function is a function which maps a binary input pattern to a unique non-zero integer index value. Such a pattern may represent a virus to be detected or a packet to be routed. The number of Boolean variables needed to distinguish between all patterns determines the size and cost of the hardware needed to realize such a function. Assuming that k patterns must be detected, m variables are needed, where ⌈log2 k⌉ ≤ m ≤ k − 1. Using an experimental approach, it has been found that the minimum number of variables needed in the realization of an index generation function can be expected to be closer to the lower bound than to the upper bound, especially when k is large. Hence, most index generation functions can be realized using inexpensive conventional memory. Furthermore, it has been found that balanced columns are of benefit to the search for minimum distinguishing sets, especially when k is small. Significant improvements in terms of performance and energy efficiency can be achieved when, instead of an exact implementation, an approximate one is realized. Approximate computing is a technique that relaxes the requirement for an exact equivalence between the specification and the implementation of circuits. This approach can be used, e.g., when the limited perceptual capabilities of humans do not require an exact numerical computation. The quality of an approximation is measured using an error metric that compares the approximated function with the original one. Several error metrics are explored with the result that synthesis for approximate computing with precise error bounds is a difficult task. The challenges derived call for strong methods for computing precise errors as well as for heuristic methods.
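As a toy illustration of such error metrics, the sketch below compares an exact adder with a hypothetical approximate adder that drops the carry out of the lowest bit; all names and the concrete approximation are illustrative, not taken from the book. The error rate and the worst-case error are computed exhaustively over all inputs:

```python
from itertools import product

def exact_add(a, b):
    return a + b

def approx_add(a, b):
    # Hypothetical approximation: the carry out of the lowest bit is
    # dropped, so the low sum bit costs only a single XOR gate.
    low = (a ^ b) & 1
    high = (a >> 1) + (b >> 1)
    return (high << 1) | low

def error_metrics(width=2):
    # Compare the approximate adder with the exact one over all inputs.
    values = range(1 << width)
    errors = [abs(exact_add(a, b) - approx_add(a, b))
              for a, b in product(values, repeat=2)]
    error_rate = sum(e > 0 for e in errors) / len(errors)
    worst_case = max(errors)
    return error_rate, worst_case
```

For the 2-bit version, exactly the input pairs with two odd operands lose the low carry, so `error_metrics()` yields an error rate of 0.25 and a worst-case error of 2.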
Spectral techniques based on various spectral transforms in different algebraic structures provide the foundations for approaches for classifying Boolean functions, detecting their hidden properties, or reducing the computation effort. A comprehensive review of the origins and evolution of spectral techniques provides the readers with a very useful basis for research in the areas of design of digital systems and signal processing. Many references to books or articles in journals support this research. This review indicates that both serious tasks and restricted resources are incentives for scientific progress and practical applications. It can be expected that spectral techniques will further contribute to improvements in the Boolean domain.

Topology is a fundamental branch of mathematics that explores the properties of mathematical structures. Its foundations are geometry and set theory. Descriptive set theory, which explores operator algebras, computability, mathematical logic, as well as harmonic analysis, belongs to the wide field of applications of topologies. Due to the focus on computational problems, finite topologies are explored. It is shown how objects and concepts from finite topologies can be modeled using relations, how related tasks can be expressed using the language of relation algebra, and how the RelView system can be used to compute and visualize solutions. The efficient implementation of this tool allows for experiments with very large topologies. The clocked synchronization of digital systems ensures their deterministic behavior at the price of a certain inefficiency. Analog systems work efficiently but suffer from ambiguous behavior. It is a challenge to couple the advantages of these types of systems to create efficient deterministic circuits. Key issues in this field of research are partially defined functions, their models, and their utilization within the design process. A new formal methodology is suggested that warrants the match between the partially specified functions and real-world asynchronous feedback structures. This dual-rail approach combines the benefits of both traditional types of systems and can even be used for safety-critical systems. Due to the exponential complexity of almost all Boolean problems, efficient tools for their solution are needed. As an example of a very hard Boolean problem, the computation of the number of bent functions has been selected. Bent functions are the most non-linear functions and can be used in cryptography to resist linear attacks. In a previous work an expensive reconfigurable computer was used to speed up this calculation by about 60,000 times.
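For small numbers of variables, bentness can be checked directly via the Walsh spectrum: a function of an even number n of variables is bent iff every Walsh coefficient has absolute value 2^(n/2). A minimal Python sketch of this definition (not the hardware approach described here; the function names are illustrative):

```python
def walsh_spectrum(truth, n):
    # Walsh coefficient W(a) = sum over x of (-1)^(f(x) XOR parity(a AND x)).
    return [sum((-1) ** (truth[x] ^ (bin(a & x).count("1") & 1))
                for x in range(1 << n))
            for a in range(1 << n)]

def is_bent(truth, n):
    # f is bent iff n is even and every |W(a)| equals 2^(n/2).
    return n % 2 == 0 and all(abs(w) == 1 << (n // 2)
                              for w in walsh_spectrum(truth, n))

# x1 AND x2 is bent; x1 XOR x2 (an affine function) is not.
and2 = [(x & 1) & (x >> 1) for x in range(4)]
xor2 = [(x & 1) ^ (x >> 1) for x in range(4)]
```

The truth vectors are indexed by the integer encoding of the input pattern; `is_bent(and2, 2)` holds, while `is_bent(xor2, 2)` does not.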
The utilization of both a deeper knowledge about bent functions and the application of a circular pipeline on a much cheaper Field Programmable Gate Array (FPGA) result in an additional speed-up of more than two orders of magnitude. The computation of bent functions is also the topic of other research. The key idea of this approach consists of the random generation of a function of a certain even number of variables and the check to see whether it is a bent function. Due to the very small fraction of
bent functions, the generation is executed in the Reed-Muller domain, where the search spaces for bent functions can be restricted by means of several theorems. The fast Reed-Muller transform has been used to compute the truth vector of a Boolean function. Utilizing the GPU, an additional speed-up of up to three orders of magnitude has been reached for bent functions of up to ten variables. An important practical problem is the detection and correction of one or more bit flips (errors) in data words, for which data coding is typically exploited. There is always the risk that bit flips change valid code words into other valid code words, which prohibits both error detection and correction. In order to minimize this risk, non-systematic, non-linear AN-codes are used. The letter A indicates an integer constant used to encode the data word N, which is usually also an integer number. Here, the error detection capability is influenced by both the parameter A and the data type of N. To estimate the risk of undetectable bit flips, we need to compute the distance distribution between the codewords for each possible value of A depending on the width k of the data words. This computation is a big challenge which has a time complexity in the order of 4^k. Efficient multi-GPU implementations have been developed to solve this problem and determine preferable values of A. The representation of a Boolean function as an orthogonal list of ternary vectors allows us to use such a TVL for both a disjunctive form and an antivalence form. The knowledge that each binary vector cannot be covered by more than one ternary vector of such an orthogonal TVL is an additional advantage. A special order in which the ternary vectors are selected from a TVL in disjunctive form to compute the needed orthogonal difference leads to a smaller number of ternary vectors in the resulting TVL.
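The orthogonality property of a TVL is easy to state in code: two ternary vectors (strings over 0, 1, and the dash "-") share a common binary vector iff no position carries 0 in one and 1 in the other. A small sketch with illustrative helper names, assuming this string representation:

```python
def intersect(u, v):
    # Two ternary vectors cover a common binary vector iff they have
    # no position where one carries '0' and the other carries '1'.
    return all({a, b} != {"0", "1"} for a, b in zip(u, v))

def is_orthogonal(tvl):
    # Orthogonal TVL: pairwise non-intersecting ternary vectors.
    return all(not intersect(tvl[i], tvl[j])
               for i in range(len(tvl)) for j in range(i + 1, len(tvl)))

def covered(tvl):
    # In an orthogonal TVL no binary vector is covered twice, so the
    # cover sizes (2 to the number of dashes) simply add up.
    assert is_orthogonal(tvl)
    return sum(2 ** v.count("-") for v in tvl)

# x1 OR x2 over (x1, x2): the orthogonal TVL ["1-", "01"] covers
# exactly the three satisfying assignments.
```

For example, `["1-", "01"]` is orthogonal and covers three binary vectors, while `["1-", "-1"]` is not orthogonal because both ternary vectors cover 11.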
The preprocessing steps of absorption and sorting of the ternary vectors by increasing numbers of dashes additionally contribute to both a shorter time needed to compute an orthogonal TVL and a smaller number of ternary vectors in the result. Improvements in the Boolean domain considerably affect the development and application of digital circuits. Due to the extensive use of such circuits in almost all areas of our daily life, we immediately notice this progress. Digital circuits have been developed and applied over
several decades. Hence, one could think all problems about them have already been solved. The second part of this book explores new insights in this field and confirms the continuous progress in appropriate research and applications. Bi-decomposition is a powerful synthesis method for combinational circuits. This method splits a given function into two simpler functions which control the inputs of an AND-, an OR-, or an XOR-gate such that the given function appears on the output of this gate. In the case of the strong bi-decomposition, the simplification of the decomposition functions is reached by the smaller number of variables on which the decomposition functions depend. Strong and weak bi-decompositions are sufficient for a complete synthesis approach. The decomposition functions of recently suggested vectorial bi-decompositions are simpler than the given function because of their independence of the simultaneous change of several variables. The generalized theory of derivative operations for lattices of the second level has been utilized for vectorial bi-decompositions of such lattices and furthermore reduces the needed chip-area, power, and delay of the synthesized circuits. An interesting analysis of visualization in both the hardware and software domains comes to astounding and alarming results. While software visualization is an active field of research that supports the software designer with many helpful visualization techniques, hardware visualization is in the state of a “lost world”. Almost unchanged over several decades are graph views, which show how parts of the hardware are interconnected, and waveform views, which visualize the signal changes over time. In the context of growing system designs, where both hardware and software contribute to the solution, innovative tools are needed for Hardware/Software Co-Visualization.
The answer to the question of “why” there is such a discrepancy between hardware and software visualization approaches can help to remove obstacles and encourage engineers and scientists to fill the present gap in this field. Traditional aims in circuit design consist of the synthesis of smaller and faster circuits for a given function. A new contribution that improves on the limits reached for these aims is the synthesis of complemented circuits. This approach utilizes the differences in the space and delay
needed that can exist between a circuit that realizes the given function and a circuit for the complement of this function. The benefits of the new complemented circuits result from the common use of both the function and its complement as well as the utilization of given and created don't cares. The theoretical basis of this approach is the utilization of Boolean relations, which are explored for all ten operations that depend on two inputs. Comprehensive experimental results for both the exact and heuristic synthesis of more than 100 benchmark functions show that this new approach improves the known results of three-level circuits in many cases. Next to addition, multiplication is a frequently used arithmetic operation in digital circuits. While several approaches for optimized adders are known, the possibilities to optimize multipliers are not yet completely utilized. There are two types of multipliers. Assuming n bits for each of the two input values, in regular arithmetic the output of the multiplier contains 2n bits, but in saturation arithmetic the output is restricted to n bits. A comprehensive exploration of circuit structures of multipliers in both regular and saturation arithmetic leads to up to 33% faster circuits in comparison to the multipliers synthesized by the commercial tool Synopsys. The sources of this improvement are the use of a monolithic multiplier block of a size of around 4 × 4, the concatenation of fitted intermediate results, and a restricted tree of adders. Unfortunately, the speed-up reached with monolithic multiplier blocks of a size of 4 × 4 or 5 × 5 requires about two to five times more area. The power consumption becomes a more and more important limiting factor for very large scale integrated circuits. The thermal leakage power causes a temperature rise that constrains the circuit behavior and requires additional equipment to transfer heat away from the device.
Low power consumption is also welcome for a long period of use of a mobile device until the next charge of the rechargeable battery. The power consumption is caused by the switching elements. One contribution to reduce the power consumption consists of the state assignment of an automaton such that a minimal number of switching elements must be changed for the needed transitions. A basic model and a heuristic algorithm for this task are explained. This approach can be used for asynchronous finite state machines and leads to race-free circuits of low power consumption.
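The cost function minimized by such a state assignment can be sketched as the frequency-weighted Hamming distance over all transitions; the names and the tiny three-state automaton below are illustrative only, not taken from the contribution:

```python
def switching_cost(encoding, transitions):
    # Weighted number of flip-flop switches: for every transition the
    # Hamming distance of the two state codes times its frequency.
    return sum(freq * bin(encoding[s] ^ encoding[t]).count("1")
               for (s, t), freq in transitions.items())

# Frequent transitions should receive codes of Hamming distance 1.
transitions = {("A", "B"): 10, ("B", "C"): 10, ("C", "A"): 1}
naive = {"A": 0b00, "B": 0b01, "C": 0b10}
tuned = {"A": 0b00, "B": 0b01, "C": 0b11}
```

Here the tuned encoding moves the frequent transition B to C onto adjacent codes, lowering the weighted switching cost from 31 to 22.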
Digital systems are realized by logic gates and flip-flops. Many synthesis approaches are known to find a circuit structure of these building blocks for a given behavior. Transistors are the real switching elements used within the logic gates and flip-flops. A more fine-granular modeling technique has been suggested that directly allows us to use transistors as basic building blocks of digital systems. The theoretical foundation rests on the definition of both partial and total operations of the implication and equivalence. This approach can be uniformly used on several levels of abstraction: the global behavior represented by a directed graph, the more concrete signal flow graph, which can be seen as the most abstract structural view of an arbitrary circuit, the even more concrete signal flow plan, which consists of modules of partially defined behaviors, down to transaction level modeling. The complexity of digital circuits requires the use of design automation tools. Consequently, new synthesis procedures are implemented in such tools to improve the structure of the designed circuits. The only way to compare the properties of several synthesis tools is the synthesis of a set of circuits based on the same descriptions of these benchmark circuits. This general approach has been used over several decades, and different benchmark sets were published and used for this purpose. However, the circuit implementations have changed over the years and influenced the creation of new benchmark sets. A prudent approach led to a comprehensive collection of benchmark circuits for logic synthesis and optimization. The benefit of this collection is a unique description of benchmarks from many sources, presented in a cleaned, flattened form and split into connected components. Secret information stored within a hardware system is the target of side-channel attacks such as the differential fault analysis.
Such attacks try to inject faults into the system that alter the output. Knowing the injected faults and the associated output signals, the wanted secret keys can be calculated. The faults can be injected within the communication channel. Security oriented codes for the transmitted data can be used to detect injected faults and prevent such side-channel attacks. The few known codes have drawbacks regarding the error masking probability as well as the cost of their implementation. The newly suggested low complexity, high rate robust codes are the shortened quadratic-sum, triple-sum, and triple-quadratic-sum codes.
These codes close the mentioned gaps and are able to detect any error in the transmitted data with non-zero probability. The shrinking of circuit structures to a few nanometers has the benefit of both a reduced area and a reduced power consumption but unfortunately increases the occurrence of faults caused by external influences like cosmic radiation. Hence, fault tolerant techniques must be used to improve the reliability of digital circuitry. One of these techniques is the extension of the circuit by redundancy such that an error correction becomes realizable. Low density parity check codes can be used for this purpose and were successfully applied to improve the reliability of XOR-only logic networks. This requires a split of the circuit into a linear and a non-linear part. Methods to synthesize circuits where the linear part is separated from the non-linear part are explored for adders of different sizes. Both the strong and the vectorial bi-decompositions contribute to this aim of synthesis. Another application of a linear circuit is the transformation of incompletely specified Boolean functions of n variables into a Boolean space of m variables where m < n. Such a transformation is possible when the function values are specified only for k input patterns and the value of k is much smaller than 2^n. Applications of this task are, e.g., the design of on-line real-time control systems or built-in self-test equipment. The number of variables m of the target space should be as small as possible. An efficient method of finding a linear transform that is injective for the k specified input patterns is explained. Using knowledge from coding theory and the theory of finite fields, a lower bound has been found which strongly reduces the search space for such a transformation. This lower bound depends on both the number of variables n and the number of specified input patterns k. The provided results have interesting connections to linear error-correcting codes.
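A linear transform over GF(2) that is injective on the k specified patterns can, for small cases, be found by plain random search; the sketch below is illustrative only (it does not use the lower bound described above) and represents the m × n matrix by its m row masks:

```python
import random

def apply_gf2(rows, x):
    # rows: the m rows of a GF(2) matrix, each stored as an n-bit mask;
    # each image bit is the parity of the masked input bits.
    y = 0
    for row in rows:
        y = (y << 1) | (bin(row & x).count("1") & 1)
    return y

def find_injective_map(patterns, n, m, tries=1000, seed=1):
    # Random search for an m x n matrix that keeps the given patterns
    # distinct; this succeeds quickly when 2**m is not too close to k.
    rng = random.Random(seed)
    for _ in range(tries):
        rows = [rng.getrandbits(n) for _ in range(m)]
        if len({apply_gf2(rows, p) for p in patterns}) == len(patterns):
            return rows
    return None
```

For instance, five specified 8-bit patterns can usually be mapped injectively into a 4-bit space, illustrating that m may be far smaller than n.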
The third part of this book deals with problems that the Boolean domain will be faced with in the future. The continued reduction of the size of the switching elements brings us closer (and close) to the level of single atoms and quantum logic. Completely different physical laws must be considered in this field. The exploration of reversible circuits can be seen as a bridge between traditional circuit structures and circuits realized using future quantum technologies.
The embedding of an irreversible function into a reversible function requires ancilla lines. The number of these lines should be as small as possible. Both the number and the types of the used reversible gates determine the total quantum cost, which must also be minimized. One synthesis approach for reversible circuits consists of the mapping of the nodes of a decision diagram to the gates of a reversible circuit. New algorithms utilize the benefits of several types of functional decision diagrams, the method of traversal of these diagrams, as well as the order of the used reversible gates for a successful minimization of both the number of ancilla lines and the quantum cost of the generated reversible circuits. Another approach to find an optimal reversible circuit consists of the application of genetic algorithms. The chromosomes of these algorithms represent sequences of reversible gates. Previous experiments have shown that this approach is able to find solutions and reduce the quantum cost over consecutive populations. However, due to the very large search space, this approach is very time-consuming. Four distributed models were suggested which adapt the genetic algorithm to the new topologies, with the result that similar reversible circuits were found much faster and, in the best case, the quantum cost could be reduced to about one half. A more theoretical contribution to the synthesis of reversible circuits consists of the systematic analysis of reversible functions. The key idea is the classification of reversible functions based on known properties of their component functions, like the equivalence regarding the permutation and/or negation of variables, unateness, non-linearity, self-duality, self-complementarity, and linearity regarding selected variables. An obvious result is that the negation or permutation of both the variables and the component functions cannot transform a reversible into a non-reversible function or vice versa.
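The basic reversibility property underlying this classification is easy to test for small functions: an n-input, n-output function is reversible iff its truth vector is a permutation of all 2^n input patterns. A minimal sketch with illustrative names:

```python
def is_reversible(outputs):
    # An n-input, n-output function is reversible iff its list of output
    # patterns is a permutation of all 2**n input patterns.
    return sorted(outputs) == list(range(len(outputs)))

# CNOT (control x1, target x2), patterns encoded as 2*x1 + x2:
cnot = [0b00, 0b01, 0b11, 0b10]
# Copying x1 to both outputs loses information and is not reversible:
fanout = [0b00, 0b00, 0b11, 0b11]
```

`is_reversible(cnot)` holds because CNOT permutes the four patterns, whereas `fanout` maps distinct inputs to the same output and fails the test.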
It has been proven that reversible functions with certain properties of their component functions cannot exist; these theorems, as well as the relationship between reversible hidden weighted bit functions and their component functions, can be utilized in future synthesis procedures. Reversible circuits are explored with the aim of supporting the synthesis of quantum circuits. However, quantum gates significantly extend the possibilities to create a wanted function. A general approach for
the synthesis of quantum circuits based on the template of the C^n NOT gates will be provided. This approach allows us to generate circuit structures of reversible functions which are cheaper than the original C^n NOT gates. The decomposition of a C^n NOT function into a function that contains only a set of single qubit controlled operators and a function of a V† gate controlled by a symmetric function supports the utilized manipulations. Formal representations of this approach could be simplified by means of the quantum operator form used.

Bernd Steinbach
Department of Computer Science
Technische Universität Bergakademie Freiberg
Freiberg, Saxony, Germany

Acknowledgments

This is the third book of a series that summarizes the best scientific results of contributions to the International Workshop on Boolean Problems (IWSBP). The first two books [323, 324] were entitled Recent Progress in the Boolean Domain and Problems and New Solutions in the Boolean Domain. Attendees of the 12th IWSBP, as well as many people working in the Boolean domain or faced with such problems, encouraged me to once again prepare a book about the best results from this workshop. Cambridge Scholars Publishing accepted my proposal for this third book within this series about problems and solutions in the Boolean domain. Many people contributed to the origins of this book. I would like to thank all of them, starting with the scientists and engineers who have been working hard on Boolean problems. Their results are the basis for all papers submitted to the 12th IWSBP. My next thanks go to the 29 members of the program committee from 12 countries. Based on their reviews, the best papers submitted were selected for presentation at the 12th IWSBP. Furthermore, the hints of the reviewers helped the authors to improve the final versions of their papers. My special thanks go to the invited speakers: Prof. Gerhard W. Dueck from the University of New Brunswick, Canada, and Prof. Rolf Drechsler from the University of Bremen, Germany, as well as all presenters of the papers and attendees for their very interesting presentations and fruitful discussions during the workshop. Besides the technical program, such an international workshop requires a lot of work to organize all the necessary parts. Without the support of Ms. Dr.
Galina Rudolf, Ms. Karin Schüttauf, and Ms. Birgit Steffen, I would not have been able to organize this series of workshops. Hence, I would very much like to thank them for their valuable hard work. As well as the authors, often larger groups contributed to the presented results. In many cases these people are financially supported by grants from many different organizations. Both the authors of the sections of this book and I thank them for this significant support. The list of these organizations, the numbers of the grants, and the titles of the supported projects is so long that I must refer the interested reader to the proceedings of the 12th IWSBP [320] for this information. As examples, we acknowledge some of these supporters here: Sometimes a single person originates very fruitful research, like C. Johnson [159], whose initial work on circular pipelines inspired the authors of Section 2.1 to find their presented results. Research projects often require financial support for their realization. As an example, the German Research Foundation (DFG) supported the research on the computational complexity of error metrics presented in Section 1.3 in the project MANIAC (DR 287/29-1). Even the collaborative research on several topics in the field of reversible circuits, which is presented in Sections 5.1, 5.2, and 5.3, has been partially supported by the EU COST Action IC 1405 “Reversible Computation - Extending Horizons of Computing”. I would like to emphasize that this book is the collective work of many authors. Their names are directly associated with each section and additionally summarized in lexicographical order in the List of Authors starting on page 465 and the Index of Authors on page 473. Many thanks to all of them for their excellent collaboration and high quality contributions. My special thanks go to Prof. Mitchell A. Thornton for his Foreword, Alison Rigg for corrections to the English text, and Dr.
Galina Rudolf for her support in solving several issues regarding the LaTeX tools. The scientific source of this series of books is the International Workshop on Boolean Problems (IWSBP). I initiated this workshop and have contributed for more than two decades as General Chair to its continuing progress and to the growing prestige of this workshop at the Freiberg University of Mining and Technology and all over the world.
It was a pleasure for me to notice the interest of the attendees, the warm atmosphere during the workshops, and the fruitful influence on the progress in the Boolean domain. Unfortunately, due to my retirement, I cannot continue this work; but it is a great pleasure for me to announce that Prof. Rolf Drechsler from the University of Bremen agreed to continue this series of workshops. I would like to thank Prof. Drechsler for continuing the bi-annual series of International Workshops on Boolean Problems and wish him success in this work. I would also like to thank everyone who helped me in my work for the IWSBP over this period and hope that the successful history of this workshop continues in Bremen. Finally, I would like to thank Ms. Victoria Carruthers and Ms. Hannah Fletcher from the Author Liaison department of the publisher Cambridge Scholars Publishing for their acceptance of preparing this scientific book using LaTeX, and for their very kind collaboration. I hope that all readers enjoy reading the book and find helpful suggestions for their own work in the future. It will be my pleasure to talk with many readers at one of the next International Workshops on Boolean Problems or indeed at any other place.

Bernd Steinbach
Department of Computer Science
Technische Universität Bergakademie Freiberg
Freiberg, Saxony, Germany

List of Abbreviations

AES      Advanced Encryption Standard
AIG      AND Inverter Graph
ANF      Antivalence Normal Form
AlgNF    Algebraic Normal Form
AF       Antivalence Form
ASIC     Application-Specific Integrated Circuit
ATPG     Automatic Test Pattern Generation
BCH      Bose Chaudhuri Hocquenghem
BDC      Boolean Differential Calculus
BDD      Binary Decision Diagram
BLIF     Berkeley Logic Interchange Format
CAM      Content Addressable Memory
CSF      Complete Symmetric Function
CLA      Carry Look-Ahead Adder
CMOS     Complementary Metal-Oxide-Semiconductor
CNF      Conjunctive Normal Form
CPE      Code Prediction Encoding
CF       Conjunctive Form
CPU      Central Processing Unit
CUDA     Compute Unified Device Architecture
CUDD     Colorado University Decision Diagram
DD       Decision Diagram
DES      Data Encryption Standard
DF       Disjunctive Form
DFA      Differential Fault Analysis
DFT      Discrete Fourier Transformation
DG       Directed Graph
DNF      Disjunctive Normal Form
DOE      Design Of Experiments
DRAM     Dynamic Random Access Memory
DUT      Device Under Test
ECC      Error Correction Code
EDA      Electronic Design Automation
EDC      Error Detection Code
EDIF     Electronic Design Interchange Format
ENF      Equivalence Normal Form
EF       Equivalence Form
ESL      Electronic System Level
ESOP     Exclusive-OR Sum Of Products
FDD      Functional Decision Diagram
FFT      Fast Fourier Transformation
FPFDD    Fixed-Polarity Functional Decision Diagram
FPGA     Field Programmable Gate Array
FTM      Fourier Transformation Method
FSM      Finite State Machine
GE       Gate Equivalent
GPC      Generalized Punctured-Cubic
GPU      Graphics Processing Unit
HDL      Hardware Description Language
HWB      Hidden Weighted Bit
KFDD     Kronecker Functional Decision Diagram
ISCAS    International Symposium on Circuits and Systems
ISF      Incompletely Specified Function
ITC      International Test Conference
IWLS     International Workshop on Logic Synthesis
IWSBP    International Workshop on Boolean Problems
LB       Lower Bound
LDPC     Low Density Parity Check
LEKO     Logic synthesis Examples with Known Optimal circuit
LEKU     Logic synthesis Examples with Known Upper bounds
LUT      Look-Up Table
LV       Linear Variable
MCNC     Microelectronics Center of North California
MUX      Multiplexer
MVPLA    Multi-Valued Programmable Logic Array
NP       Negation and Permutation of variables
NPC      Negation with Preservation of Constants
NPN      Negation and Permutation of variables, Negation of the function
NPNP     Negation and Permutation of variables, Negation and Permutation of component functions
NPL      National Physical Laboratory
OBDD     Ordered Binary Decision Diagram
P        Permutation of variables
PC       Punctured-Cubic
PLA      Programmable Logic Array
POS      Product Of Sums
PPRM     Positive Polarity Reed-Muller
PS       Punctured-Square
QA       Quality Assurance
QC       Quantum Cost
QOF      Quantum Operator Form
QS       Quadratic-Sum
QUIP     Quartus University Interface Program
RAM      Random Access Memory
RCA      Ripple Carry Adder
RNG      Random Number Generator
RSA      Rivest Shamir and Adleman cryptosystem
RTL      Register Transfer Level
SCA      Side Channel Attack
SDC      Silent Data Corruption
SEU      Single Event Upset
SFP      Signal Flow Plan
SFG      Signal Flow Graph
SIMD     Single Instruction, Multiple Data
SOP      Sum Of Products
SPP      Sum of Pseudo-Products
SQS      Shortened Quadratic-Sum
STG      Signal Transition Graph
TQS      Triple-Quadratic-Sum
TS       Triple-Sum
TV       Ternary Vector
TVL      Ternary Vector List
TLM      Transaction Level Modeling
UCP      Unate Covering Problem
VHDL     Very High Speed Integrated Circuit Hardware Description Language
VLSI     Very-Large-Scale Integration
VR       Virtual Reality
WS       Weighted Sum

Extensions in Theory and Computations

1. Models, Methods, and Techniques

1.1. NP-Problems and Boolean Equations

Christian Posthoff

Bernd Steinbach

1.1.1. Classes of Hard Problems

In this contribution we are not aiming at a deep understanding of complexity theory; that would require a book of its own. Instead, we present effective methods for solving a set of very difficult problems by means of efficient methods for the solution of Boolean equations. Our considerations are based on the satisfiability problem [39]: it asks whether a formula of the propositional calculus can be satisfied, and for which values of the variables. Mainly, the special case of a conjunctive form [331] will be considered. This problem can be solved in a time that depends exponentially on the number of variables: it is possible to construct a table (the truth table) with all possible combinations of values for the variables. In the case of n variables the truth table has 2^n lines (2^n binary vectors). The SAT-problem belongs to the class NP of problems that can be solved non-deterministically in polynomial time. This means that there is a guess for a solution which must be verified. It remains open whether the satisfiability problem can be solved deterministically in polynomial time. Recent results and the achieved solutions show that

this problem may not be so important anymore, because the application of SAT-solvers, or of lists of ternary vectors together with set-theoretic operations, solves problems far beyond any human capabilities. SAT-solvers have improved considerably over the last 20 to 30 years; however, they rely on search algorithms of many different shapes. Algorithms on sets implemented by the concepts of XBOOLE [327, 349] also show a very high performance. We will concentrate on these concepts. This contribution emphasizes another property of the SAT-problem as well: it is NP-complete. This means that each problem of the class NP can be transformed into a SAT-problem. Instead of using many different solution algorithms, a problem only needs to be transformed into a SAT-problem; thereafter we use XBOOLE algorithms for its solution. This section first gives an understanding of the solution of Boolean equations; thereafter we present some examples for the transformation of other NP-problems.

1.1.2. Boolean Functions and Equations

The basic functions are well known, but they are summarized here for completeness. The negation converts the two values 0 and 1 into each other:

Table 1.1. Negation

  x | ¬x
  0 |  1
  1 |  0

It depends on only one argument; the functions shown in Table 1.2 use two arguments. There are several well-known rules for using these functions. It should be noted that the equivalence (x ⊙ y) is the negation of the antivalence (x ⊕ y), and vice versa.

Table 1.2. Boolean functions of two arguments

  x  y | x ∧ y | x ∨ y | x ⊕ y |   x → y
       | (AND) | (OR)  | (XOR) | (IF...THEN)
  0  0 |   0   |   0   |   0   |     1
  0  1 |   0   |   1   |   1   |     1
  1  0 |   0   |   1   |   1   |     0
  1  1 |   1   |   1   |   0   |     1

In order to define the functions of n variables, we consider vectors of n values 0 or 1: x = (x1, . . . , xn). There are 2^n different vectors. If we assign a value of 0 or 1 to each vector, then we get a function f(x) which depends on n variables. It can be written as a vector of length 2^n.

1.1.3. Boolean Equations and Ternary Vectors

Definition 1.1 (Boolean Equation). Let x = (x1, . . . , xn), and let f(x) and g(x) be two Boolean functions; then

f(x) = g(x)    (1.1)

is a Boolean equation of n variables. The vector b = (b1, . . . , bn) is a solution of this equation if f(b) = g(b), i.e., f(b) = g(b) = 0 or f(b) = g(b) = 1.

It can always be assumed that the two functions f and g depend on the same set of variables. If, for instance, the variable xi is not used in a formula for f, then it can be added by

f(x) = (xi ∨ ¬xi) f(x) = xi f(x) ∨ ¬xi f(x) .

In practice, however, it is not necessary to do so. The considerations can easily be reduced to homogeneous equations.

Theorem 1.1 (Homogeneous Equation). The equation f(x) = g(x) is equivalent to each of the two homogeneous equations

f(x) ⊕ g(x) = 0 ,    (1.2)
f(x) ⊙ g(x) = 1 ,    (1.3)

where the symbol ⊕ stands for the antivalence and ⊙ for the equivalence of Boolean values. The original equation and the two homogeneous equations have the same solution set [243].

A SAT-equation [39] is a special equation where the expression on the left-hand side is a conjunction of disjunctions (clauses) of literals, and the right-hand side is equal to 1.

Ternary vectors can be used to store the solution set of a Boolean equation in a compact manner. Let t = (t1, . . . , tn), ti ∈ {0, 1, −}, i = 1, . . . , n; then t is a ternary vector which can be understood as an abbreviation for a set of binary vectors. The symbol − can be replaced by both 0 and 1; therefore two binary vectors are generated by one −. A ternary vector can also express a conjunction of Boolean variables xi using the following assignments:

  xi occurs non-negated:  ti = 1 ,
  xi occurs negated:      ti = 0 ,
  xi missing:             ti = − ,

which then represents the set of binary vectors for which the conjunction is equal to 1. For the implementation of the operations in a program, the coding of Table 1.3 has been used. The intersection of two ternary vectors is empty if and only if

bit1(x) ∧ bit1(y) ∧ (bit2(x) ⊕ bit2(y)) ≠ 0 .

Table 1.3. Binary code of ternary values

  ternary value | bit1 | bit2
        0       |  1   |  0
        1       |  1   |  1
        −       |  0   |  0

If the intersection is not empty, it can be determined by the following bit vector operations:

bit1(x ∩ y) = bit1(x) ∨ bit1(y) ,
bit2(x ∩ y) = bit2(x) ∨ bit2(y) .

Example 1.1 (3-SAT-Problem). Let

f(a, b, c, d, e) = (a ∨ ¬b ∨ ¬c)(b ∨ ¬d ∨ ¬e)(¬a ∨ d ∨ e)(b ∨ c ∨ ¬e) = 1 .

This is an example of a 3-SAT-problem: there are four disjunctions of three literals each, connected by conjunctions. We want to find all vectors (a, b, c, d, e) with f(a, b, c, d, e) = 1 and keep in mind that a conjunction is equal to 1 if each disjunction is equal to 1. The search space has a size of 2^5. Using Ternary Vector Lists (TVLs) [243, 327, 342, 349] as the appropriate data structure, we get the following four TVLs as partial solution sets Si of the given four disjunctions:

  S1:  a b c d e      S2:  a b c d e
       1 − − − −           − 1 − − −
       0 0 − − −           − 0 − 0 −
       0 1 0 − −           − 0 − 1 0

  S3:  a b c d e      S4:  a b c d e
       0 − − − −           − 1 − − −
       1 − − 1 −           − 0 1 − −
       1 − − 0 1           − 0 0 − 0

Each row represents a set of solutions of one disjunction: for the first disjunction (a ∨ ¬b ∨ ¬c), a = 1 is a solution independent of the values of the other variables. If a = 0, then b = 0 yields a solution, and finally, for a = 0 and b = 1, the value of c must be equal to 0. There are no other solutions, and the three ternary vectors are disjoint (orthogonal). The intersection of these four TVLs (of the four solution sets) provides the set

  S:   a b c d e
       − 0 1 0 1
       1 1 − − 1
       0 1 0 − −
       1 − − 1 0
       0 0 − − 0

of all solutions of the equation

(a ∨ ¬b ∨ ¬c)(b ∨ ¬d ∨ ¬e)(¬a ∨ d ∨ e)(b ∨ c ∨ ¬e) = 1 .    (1.4)

This computation can easily be done using the XBOOLE-Monitor [342]. Five ternary vectors represent the 18 solution vectors in an orthogonal manner. The XBOOLE operation OBBC (orthogonal block building and change) was used to reduce the number of ternary vectors of the intermediate solution set. Advantages of this data structure are:

• parallel operations on the machine level (64 bits);
• restricted horizontal evaluation of ternary vectors;
• the approach is not limited to 3-SAT-problems; conjunctions of arbitrary functions can be handled in this way;
• very often there is no need to use the formula; ternary vectors can be used immediately;
• restrictions can be included into the vectors;

• all the possible solutions are captured; there is no need to search for other possibilities;
• using the intersection reduces the number of possible solutions; after each intersection the sets of solution candidates become smaller;
• all solutions are found at the same time.

Formula (1.4) is an example of a SAT-equation. The left-hand side of this equation consists of a conjunction of disjunctions. Single variables (negated or not negated) are connected by disjunctions within each clause (disjunction). We call such problems CD-SAT-problems. Such problems can also be solved by using SAT-solvers [38, 39]. The general method of SAT-solvers is the assignment of values to the variables and a check of whether the chosen assignment satisfies the given equation. However, SAT-solvers find either one solution or all solutions separately; they are not able to merge all solutions into a compact representation such as the one shown in Example 1.1.

The example in Figure 1.1, however, shows that frequently it is not necessary to go back to single variables. Restrictions can be included directly into the equation, so instead of disjunctions of variables we get disjunctions of conjunctions (DC). These disjunctions of conjunctions are then combined by conjunctions, and we receive an extended CDC-SAT-problem that can be efficiently solved using the XBOOLE operation intersection (ISC). However, to our knowledge, SAT-solvers are restricted to CD-SAT-problems and are not able to solve the more compact CDC-SAT-problems.
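The bit1/bit2 coding of Table 1.3 and the intersection formulas above can be sketched in a few lines (an illustrative re-implementation, not XBOOLE itself); applied to the four partial solution sets of Example 1.1, it reproduces the 18 solution vectors:

```python
# Ternary vectors coded as (bit1, bit2) integer masks per Table 1.3:
# '0' -> (1, 0), '1' -> (1, 1), '-' -> (0, 0).

def encode(tv):
    b1 = b2 = 0
    for i, t in enumerate(tv):
        if t != '-':
            b1 |= 1 << i
            if t == '1':
                b2 |= 1 << i
    return b1, b2

def intersect(x, y):
    # empty iff bit1(x) & bit1(y) & (bit2(x) ^ bit2(y)) != 0
    if x[0] & y[0] & (x[1] ^ y[1]):
        return None
    return (x[0] | y[0], x[1] | y[1])

def isc(A, B):
    # intersection of two TVLs: all non-empty pairwise intersections
    return [z for a in A for b in B
            if (z := intersect(a, b)) is not None]

# partial solution sets S1..S4 of Example 1.1 (variable order a, b, c, d, e)
S1 = [encode(t) for t in ('1----', '00---', '010--')]
S2 = [encode(t) for t in ('-1---', '-0-0-', '-0-10')]
S3 = [encode(t) for t in ('0----', '1--1-', '1--01')]
S4 = [encode(t) for t in ('-1---', '-01--', '-00-0')]

S = isc(isc(S1, S2), isc(S3, S4))
# each '-' in a ternary vector stands for two binary vectors
count = sum(1 << (5 - bin(b1).count('1')) for b1, _ in S)
print(count)  # 18
```

The raw pairwise intersection of orthogonal TVLs is again orthogonal, so the simple 2^(number of dashes) count is exact; XBOOLE's OBBC additionally merges the result down to fewer ternary vectors.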

1.1.4. NP-Complete Problems

Here, several NP-complete problems are presented together with the respective Boolean equations and their solutions.

Combinatorics on the Chessboard. This is a nice example of the application of this method. The logical model of this problem results in a Boolean equation that includes the requirements as well as the restrictions. We want to explore the hypothesis that on an 8 × 8 board the addition of one pawn allows 8 + 1 = 9 queens, the addition of 2 pawns allows 8 + 2 = 10 queens, etc., and find the positions of 10 queens not threatening each other when two pawns have been placed. It is easy to understand that the pawns interrupt the action lines of the queens; however, the number of real solutions and the exact placement of the pawns remain to be found.

Figure 1.1. Ten queens and two pawns located at c3 and e5: (a) first solution, (b) second solution.

For the first column of the chessboard we use the requirement

(a1 ∨ a2 ∨ a3 ∨ a4 ∨ a5 ∨ a6 ∨ a7 ∨ a8)

and connect each position with the respective restrictions:

(a1 ¬a2 ¬a3 ¬a4 ¬a5 ¬a6 ¬a7 ¬a8 ∨ ¬a1 a2 ¬a3 ¬a4 ¬a5 ¬a6 ¬a7 ¬a8 ∨ . . . ∨ ¬a1 ¬a2 ¬a3 ¬a4 ¬a5 ¬a6 ¬a7 a8) .

Further restrictions must be added to each conjunction for the fields on the associated row and both diagonals: for the queen on a1, we add the fields of the first row as well as the field b2. The pawn on c3

stops the action of the queen on a1:

a1 ¬a2 ¬a3 ¬a4 ¬a5 ¬a6 ¬a7 ¬a8 ¬b1 ¬c1 ¬d1 ¬e1 ¬f1 ¬g1 ¬h1 ¬b2 .

When this has been done for all the fields (except for those occupied by the pawns), then we directly get a CDC-SAT-formula. The benefit of this approach is that each conjunction specifies a possible assignment and all of its consequences. For line c we have a special situation: the constraints are interrupted by the pawn on c3; therefore we have (c1 ∨ c2)(c4 ∨ c5 ∨ c6 ∨ c7 ∨ c8), together with the respective restrictions. In this way we get a CDC-SAT-problem of 10 disjunctive forms instead of a CD-SAT-problem of 619 clauses. We achieved a complete solution of the problem by arranging all pairs of pawns on the lines from b to g and on the rows from 2 to 7. Figure 1.1 shows the two solutions of ten queens that exist for the positions of the pawns on c3 and e5. There are 44 solutions of ten queens and two pawns on an 8 × 8 chessboard. Table 1.4 summarizes the solutions using XBOOLE.

Table 1.4. Two pawns and n + 2 queens on an n × n chessboard

  rows | columns | queens | solutions | time in milliseconds
   6   |    6    |    8   |      0    |      140
   7   |    7    |    9   |      4    |      640
   8   |    8    |   10   |     44    |    2,421
   9   |    9    |   11   |    280    |    8,515
  10   |   10    |   12   |   1,208   |   26,359
  11   |   11    |   13   |  11,432   |   76,359
  12   |   12    |   14   |  96,476   |  308,375
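The ternary vectors for such requirement-plus-restriction conjunctions can be built directly. A minimal sketch for one free column of eight fields (each vector places the queen on one field and excludes it from all the others; the row and diagonal restrictions would be appended analogously):

```python
def column_tvl(n=8):
    # one ternary vector per conjunction: field i is '1', all other
    # fields of the column are '0' (a one-hot disjunctive form)
    return ['0' * i + '1' + '0' * (n - 1 - i) for i in range(n)]

tvl = column_tvl()
print(tvl[0], tvl[7])  # 10000000 00000001
```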

Set Covers. Given a set of elements U = {1, 2, . . . , m} and a set S = {S1, . . . , Sn} of subsets of U with S1 ∪ S2 ∪ . . . ∪ Sn = U, find the smallest subset C ⊆ S of sets whose union is equal to U.

Example 1.2 (Set cover). Let U = {1, 2, . . . , 5} and

S = {{1, 2, 3}, {2, 4}, {3, 4}, {2, 4, 5}} .

Table 1.5. Set cover: (a) all characteristic functions, (b) all minimal covers

  (a)    χ    | 1  2  3  4  5
       χ(S1)  | 1  1  1  0  0
       χ(S2)  | 0  1  0  1  0
       χ(S3)  | 0  0  1  1  0
       χ(S4)  | 0  1  0  1  1

  (b)    χ    | 1  2  3  4  5
       χ(S1)  | 1  1  1  0  0
       χ(S4)  | 0  1  0  1  1

We can cover all the elements with the following smaller number of sets:

{{1, 2, 3}, {2, 4, 5}}   ⇒   {1, 2, 3} ∪ {2, 4, 5} = U .

The covering condition is expressed by one factor per element 1, . . . , 5 of U:

C = S1 (S1 ∨ S2 ∨ S4)(S1 ∨ S3)(S2 ∨ S3 ∨ S4) S4 = S1 S4 = 1 .

This equation shows, for each element of U, which subsets can be taken for this element. The solution of this equation mainly uses the distributive and absorption laws. This is a minimal, but not an exact cover, because element 2 of U is covered twice (see Table 1.5).

The requirement for an exact cover can generally be specified by an equivalent equation. We find it, as shown in Example 1.3, by requiring that exactly one set is used for each element.

Example 1.3 (Exact set cover). Let U = {1, 2, 3, 4, 5, 6, 7} and

S = { a = {1, 4, 7}, b = {1, 4}, c = {4, 5, 7}, d = {3, 5, 6}, e = {2, 3, 6, 7}, f = {2, 7} } .

With one factor per element 1, . . . , 7 of U:

f(a, b, c, d, e, f) = (a ¬b ∨ ¬a b) (e ¬f ∨ ¬e f) (d ¬e ∨ ¬d e)
                    ∧ (a ¬b ¬c ∨ ¬a b ¬c ∨ ¬a ¬b c) (c ¬d ∨ ¬c d) (d ¬e ∨ ¬d e)
                    ∧ (a ¬c ¬e ¬f ∨ ¬a c ¬e ¬f ∨ ¬a ¬c e ¬f ∨ ¬a ¬c ¬e f) = 1 .

Table 1.6. Exact set cover

    χ   | 1  2  3  4  5  6  7
  χ(b)  | 1  0  0  1  0  0  0
  χ(d)  | 0  0  1  0  1  1  0
  χ(f)  | 0  1  0  0  0  0  1

a=0, b=1, c=0, d=1, e=0, f=1 is a solution of the equation. Table 1.6 confirms that the found solution is an exact cover. The solution set is empty if such an exact cover does not exist.
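Both examples can be checked with a small brute-force sketch (sufficient for sets of this size; the XBOOLE approach scales much further):

```python
from itertools import combinations

def minimal_covers(universe, sets, exact=False):
    """Smallest selections of named sets covering the universe;
    with exact=True, every element must be covered exactly once."""
    names = sorted(sets)
    for m in range(1, len(names) + 1):
        found = []
        for chosen in combinations(names, m):
            picked = [sets[x] for x in chosen]
            ok = set().union(*picked) == universe
            if exact:  # disjoint cover: sizes add up to |universe|
                ok = ok and sum(len(s) for s in picked) == len(universe)
            if ok:
                found.append(chosen)
        if found:
            return found
    return []

# Example 1.2: a minimal (but not exact) cover
print(minimal_covers({1, 2, 3, 4, 5},
                     {'S1': {1, 2, 3}, 'S2': {2, 4},
                      'S3': {3, 4}, 'S4': {2, 4, 5}}))  # [('S1', 'S4')]

# Example 1.3: the exact cover b, d, f
print(minimal_covers({1, 2, 3, 4, 5, 6, 7},
                     {'a': {1, 4, 7}, 'b': {1, 4}, 'c': {4, 5, 7},
                      'd': {3, 5, 6}, 'e': {2, 3, 6, 7}, 'f': {2, 7}},
                     exact=True))  # [('b', 'd', 'f')]
```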

Vertex Covers. A vertex cover (sometimes node cover) of a graph is a set of vertices such that each edge of the graph is incident to at least one vertex of the set. It is easy to find the logical model. For each edge from vertex i to vertex j the vertex i or the vertex j or both of them must be an element of the vertex cover. Figure 1.2 shows an example of a graph, the associated CD-SAT-formula, and the solution.

(b) (x1 ∨ x3)(x1 ∨ x2)(x2 ∨ x3)(x3 ∨ x4)(x3 ∨ x5)(x3 ∨ x6) = 1
(c) x1 x3 ∨ x2 x3 ∨ x1 x2 x4 x5 x6 = 1

Figure 1.2. Example of a vertex cover: (a) graph, (b) CD-SAT-equation, (c) minimal solutions.

In this way we get three vertex covers:

• {1, 3} ;
• {2, 3} ;
• {1, 2, 4, 5, 6} .

The first two sets have the smallest number of elements. The complement of a vertex cover is a set of independent vertices; no vertex of such a set is connected to another vertex of the set:

• the complement of {1, 3} is {2, 4, 5, 6} ;
• the complement of {2, 3} is {1, 4, 5, 6} ;
• the complement of {1, 2, 4, 5, 6} is {3} .

Problems of the vertex cover belong to the class of Unate Covering Problems (UCPs). Another important application of UCPs is the selection of a minimal set of prime conjunctions in circuit design. This class of problems has two special properties:

1. the clauses of the SAT-formula contain only non-negated variables;
2. the wanted solutions are minimal sets; that means no element can be removed from a solution set without losing the satisfiability.

The first property can be seen in Figure 1.2 (b). Due to the second property, SAT-solvers are not well suited for solving such problems. The assignment of the value 1 to all variables satisfies the given equation but does not belong, in most cases, to the wanted solutions. Hence, the utilization of a SAT-solver requires the calculation of all solutions followed by a sieving with regard to minimal solutions.

A basic method for solving a UCP consists of the application of the distributive, the idempotent, and the absorption rules. In [341] we found that the absorption applied to each intermediate result reduces the needed time by a factor of approximately 10^3. We used this improved algorithm as the basis for later comparisons. A more powerful approach for solving a UCP utilizes the XBOOLE operations NDM (negation with regard to De Morgan) and CPL (complement), see [340]. This approach reduces the time needed to solve a UCP by a factor of about 10^5 on a single CPU-core. Instead of calculating the complement as a whole, several difference operations can be executed on the available CPU-cores. Using four CPU-cores, where one of them operates as an intelligent master, an additional speedup factor of more than 350 was measured [343]. By means of special algorithms on a Graphics Processing Unit (GPU), we were able to speed up the solution by the remarkable factor of about 10^11 [345]. Further optimization of this GPU-approach reduced the solution time by an additional factor of 3 [338].
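The sieving for minimal solutions described above can be illustrated with the formula of Figure 1.2 (b) (a brute-force sketch, not the XBOOLE algorithms):

```python
from itertools import combinations

# one unate clause per edge of the graph of Figure 1.2 (a)
edges = [(1, 3), (1, 2), (2, 3), (3, 4), (3, 5), (3, 6)]
vertices = list(range(1, 7))

def is_cover(vs):
    return all(i in vs or j in vs for i, j in edges)

# all satisfying assignments of the unate CD-SAT-equation
covers = [frozenset(c) for r in range(len(vertices) + 1)
          for c in combinations(vertices, r) if is_cover(set(c))]

# sieve: keep only the minimal covers (no proper subset is a cover)
minimal = [c for c in covers if not any(o < c for o in covers)]
print(sorted(sorted(c) for c in minimal))
# [[1, 2, 4, 5, 6], [1, 3], [2, 3]]
```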

Graph Colorings and Chromatic Numbers. There are many different problems dealing with the coloring of the vertices and edges of graphs. All can be solved in the same way, as the following examples show:

• vertex coloring: a labeling of the graph’s vertices with colors such that no two vertices sharing the same edge have the same color;
• edge coloring: a proper coloring of the edges, i.e., an assignment of colors to edges so that no vertex is incident to two edges of the same color;
• an edge coloring with k colors is called a k-edge-coloring. The smallest number of colors needed for an edge coloring of a graph G is the chromatic index, or edge chromatic number, χ′(G).

We start with Birkhoff’s Diamond, an example of a node coloring problem. The vertices of the graph are supposed to be colored by four colors such that connected vertices have different colors. The constraints of this problem can be described using Boolean variables xvc, where the index v indicates the number of the vertex and c the chosen color, with the encoding of Figure 1.3 (c). We use vertex 6 as an example for the requirements of the model:

(x61 ∨ x62 ∨ x63 ∨ x64) .    (1.5)

This expression describes that one of the four colors must be assigned to vertex 6. Now this requirement can be extended by the restrictions:

(x61 ¬x62 ¬x63 ¬x64 ¬x51 ¬x71 ¬x11 ∨ x62 ¬x61 ¬x63 ¬x64 ¬x52 ¬x72 ¬x12
 ∨ x63 ¬x61 ¬x62 ¬x64 ¬x53 ¬x73 ¬x13 ∨ x64 ¬x61 ¬x62 ¬x63 ¬x54 ¬x74 ¬x14) .    (1.6)

Figure 1.3. Birkhoff’s Diamond: (a) uncolored, (b) colored by four colors, (c) encoding used:

  c | color  | pattern
  1 | red    | horizontal lines
  2 | green  | vertical lines
  3 | blue   | dots
  4 | yellow | north east lines

Each vertex must be colored in exactly one of the four colors. If, e.g., vertex 6 is colored in red (x61 = 1), then this vertex cannot be green (x62 = 0), blue (x63 = 0), or yellow (x64 = 0). Further restrictions result from the adjacent vertices. Figure 1.3 (a) shows that vertex 6 is connected with the vertices 1, 5, and 7. The last three literals in the first conjunction of (1.6) describe that the assignment of the color 1 (red) to vertex 6 prohibits the assignment of the same color to the adjacent vertices 1, 5, and 7. The other three conjunctions of (1.6) describe analogous restrictions for the other three colors. There are ten such disjunctions of inner conjunctions connected by the outer conjunctions in this CDC-SAT-model; two of them contain seven, four of them eight, and the remaining four of them nine variables in the inner conjunctions. The ternary vectors for the conjunctions can be built directly; it is not necessary to write down all the disjunctions of inner conjunctions that describe the problem. The size of the search space is equal to 4^10 = 1,048,576, i.e., 4 colors for 10 vertices. The whole CDC-SAT-problem is modeled by only ten disjunctions, and the solution requires their intersection. Figure 1.3 (b) shows one of the 576 colorings of the

Table 1.7. Calculation of all solutions to color Birkhoff’s Diamond using 3, 4, or 5 colors

  number of nodes | colors | variables | solutions | time in seconds
        10        |   3    |    30     |      0    |      0.00
        10        |   4    |    40     |    576    |      0.00
        10        |   5    |    50     |  40800    |      0.02

ten vertices of Birkhoff’s Diamond. Table 1.7 shows our experimental results [245] of solving the CDC-SAT-problem for 3, 4, and 5 colors.

This graph coloring can also be modeled as a CD-SAT-problem that consists of ten clauses for the requirements of the ten vertices, such as (1.5), 10 · (3 + 2 + 1) = 60 clauses for the color restrictions of the ten vertices, e.g., (¬x11 ∨ ¬x12), and 21 · 4 = 84 clauses for the color restrictions of the 21 edges, e.g., (¬x11 ∨ ¬x21). Hence, the CD-SAT-problem consists of 154 clauses. The coloring of the edges according to given constraints follows the same principles. Finding Ramsey numbers can also be achieved in this way; however, in this case the complexity increases very quickly.
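The counting of all proper colorings can be sketched by brute force. Birkhoff’s Diamond’s full edge list is given in Figure 1.3 and is not repeated here; the 5-cycle below is only a stand-in example, and the same loop applies to any edge list:

```python
from itertools import product

def count_colorings(n_vertices, edges, n_colors):
    # count the color assignments in which adjacent vertices differ
    return sum(all(col[i] != col[j] for i, j in edges)
               for col in product(range(n_colors), repeat=n_vertices))

cycle5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(count_colorings(5, cycle5, 4))  # 240 = (4-1)**5 - (4-1)
```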

Complete Subgraphs of a Graph - Cliques. We use the loop-free graph of Figure 1.4 as an example and look for triangles: the three sides of a triangle must be edges of the graph. A schematic solution uses a very simple model: we select any three vertices and check whether the three required edges exist in the given

Figure 1.4. Graph with a clique of three vertices colored in red.

graph. This leads to one equation for each possible clique, where the first three literals require the edges of the candidate triangle and the remaining literals describe the given graph (existing edges as non-negated, missing edges as negated variables):

x12 x13 x23 ¬x13 ¬x14 . . . x56 = 1
x12 x14 x24 ¬x13 ¬x14 . . . x56 = 1
. . .
x45 x46 x56 ¬x13 ¬x14 . . . ¬x36 = 1 .    (1.7)

The number of triangles grows very fast (here (6 choose 3) = 20); however, non-existing edges reduce the size of the problem. In this example only the solution set of the last equation of (1.7) is not empty. Hence, 4 - 5 - 6 is a complete subgraph with three nodes. Figure 1.5 shows the adjacency matrix of the graph, which can also be considered as a Boolean function and used to detect existing cliques of a certain size:

      1  2  3  4  5  6
  1   0  1  0  0  0  0
  2   1  0  1  1  0  0
  3   0  1  0  0  1  0
  4   0  1  0  0  1  1
  5   0  0  1  1  0  1
  6   0  0  0  1  1  0

Figure 1.5. The adjacency matrix of the graph.

In order to find a clique we proceed as follows:

• which columns and rows have to be deleted to get a submatrix with values 1 only (except on the main diagonal)?
• here we delete the rows and columns 1 - 2 - 3.

The incidence matrix of a graph would allow the same procedure, but requires the assignment of labels to the edges. Figure 1.6 shows

Figure 1.6. Graph with a red colored clique specified by three edges.

a possible assignment of labels to the edges of the explored graph. Figure 1.7 depicts the associated incidence matrix:

      e1 e2 e3 e4 e5 e6 e7
  1    1  0  0  0  0  0  0
  2    1  1  0  1  0  0  0
  3    0  0  0  1  1  0  0
  4    0  1  1  0  0  0  1
  5    0  0  0  0  1  1  1
  6    0  0  1  0  0  1  0

Figure 1.7. The incidence matrix of the graph of Figure 1.6.

The selection of the rows 4, 5, and 6 as well as the columns for e6, e3, and e7 shows again the complete subgraph. Larger cliques can be built from triangles: a complete subgraph consisting of the nodes 1 - 2 - 3 - 4 must be built from the four triangles 1 - 2 - 3, 1 - 2 - 4, 1 - 3 - 4, and 2 - 3 - 4. This property (also for more than 4 vertices) can be efficiently included into Boolean models.
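The triangle test on the adjacency matrix of Figure 1.5 can be sketched directly:

```python
from itertools import combinations

# adjacency matrix of Figure 1.5 (vertices 1..6)
A = [[0, 1, 0, 0, 0, 0],
     [1, 0, 1, 1, 0, 0],
     [0, 1, 0, 0, 1, 0],
     [0, 1, 0, 0, 1, 1],
     [0, 0, 1, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]

triangles = [(i + 1, j + 1, k + 1)
             for i, j, k in combinations(range(6), 3)
             if A[i][j] and A[i][k] and A[j][k]]
print(triangles)  # [(4, 5, 6)]
```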

Ramsey Numbers. The existence of monochromatic cliques of a certain size in a completely edge colored graph has been explored by Ramsey [252]. Ramsey’s Theorem states that one will find monochromatic cliques in any edge labeling (with colors) of a sufficiently large complete graph. This theorem can be used to solve the party problem: what is the minimum number (Ramsey number) of guests R(m, n) that must be invited so that at least m will know each other or at

Table 1.8. Ramsey numbers R(r, s)

  r \ s |  3 |  4 |   5   |   6   |   7    |    8    |    9    |   10
    3   |  6 |  9 |  14   |  18   |  23    |   28    |   36    |  40-42
    4   |  9 | 18 |  25   | 36-41 | 49-61  |  58-84  |  73-115 |  92-149
    5   | 14 | 25 | 43-49 | 58-87 | 80-143 | 101-216 | 126-316 | 144-442
   ...  | .. | .. |  ...  |  ...  |  ...   |   ...   |   ...   |   ...

least n will not know each other. This number is called the Ramsey number R(m, n):

• we use two colors and two positive integers r and s; there exists a least positive integer R(r, s) for which every blue-red edge coloring of the complete graph on R(r, s) vertices contains a blue clique on r vertices or a red clique on s vertices;
• it is also possible to use more than two colors: R(3, 3, 3) or R(3, 4, 5), etc.

Only a few Ramsey numbers are known; for other values of r and s only huge intervals can be given (see Table 1.8). It has been found that R(3, 3, 3) = 17. The respective Boolean equations are very large. For R(3, 3, 3), for instance, there are (17 choose 3) = 680 different triangles, and their edges can be colored with one of three colors. For each triangle there are 27 different colorings. The Boolean equations express for how long a monochromatic coloring of a triangle can be avoided. Let us take three vertices 1 - 2 - 3 and three colors a, b, and c. All non-monochromatic colorings of this triangle are the solutions of

x12a x13a x23a ∨ x12b x13b x23b ∨ x12c x13c x23c = 0 ,

or equivalently

(¬x12a ∨ ¬x13a ∨ ¬x23a)(¬x12b ∨ ¬x13b ∨ ¬x23b)(¬x12c ∨ ¬x13c ∨ ¬x23c) = 1 .    (1.8)

r = 4, s = 5, R(4, 5) = 25: a complete graph with 25 nodes has either a red clique of 4 vertices or a blue clique of 5 vertices. For R(3, 3, 3) = 17:

• equation (1.8) does not allow any monochromatic coloring of a single triangle;
• there are (16 choose 3) = 560 triangles for 16 and (17 choose 3) = 680 triangles for 17 nodes;
• combining the equations for 560 triangles by ∧, there are still solutions of the resulting equation; for 680 equations the solution set is empty, and a monochromatic coloring of at least one triangle cannot be avoided;
• these equations also show that it would be extremely difficult to find Ramsey numbers for larger values of r and s.
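The smallest case shows the mechanism: for two colors, R(3, 3) = 6, i.e., K5 still admits an edge 2-coloring without a monochromatic triangle, while K6 does not (a brute-force sketch):

```python
from itertools import combinations, product

def mono_triangle_free_coloring_exists(n):
    # try every 2-coloring of the edges of the complete graph K_n
    edges = list(combinations(range(n), 2))
    tris = list(combinations(range(n), 3))
    for coloring in product((0, 1), repeat=len(edges)):
        col = dict(zip(edges, coloring))
        if all(len({col[(i, j)], col[(i, k)], col[(j, k)]}) > 1
               for i, j, k in tris):
            return True          # no triangle is monochromatic
    return False

print(mono_triangle_free_coloring_exists(5),  # True
      mono_triangle_free_coloring_exists(6))  # False
```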

Edge Covers. An edge cover of a graph is a set of edges such that every vertex of the graph is incident to at least one edge of the set. Find a minimal edge cover! Building the model is again not very difficult: for each vertex we combine the edges that are connected to the selected vertex by ∨. Reusing the graph of Figure 1.4, we get the following CD-SAT-formula:

x12 (x12 ∨ x23 ∨ x24)(x23 ∨ x35)(x24 ∨ x45 ∨ x46) ∧ (x35 ∨ x45 ∨ x56)(x46 ∨ x56) = 1 .

The solution for the edge cover is: solve the CD-SAT-problem and take a solution with a minimum number of values 1. The three red colored edges (not dashed) of Figure 1.8 belong to the edge cover of this graph.

Figure 1.8. Edge cover of a graph.

The calculation of minimal edge covers belongs to the unate covering problems in the same way as the calculation of minimal vertex covers. Hence, very efficient solution methods as introduced for the vertex covers on page 13 can be applied.
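A brute-force check of the minimal edge cover of the graph of Figure 1.4:

```python
from itertools import combinations

# edges of the graph of Figure 1.4
edges = [(1, 2), (2, 3), (2, 4), (3, 5), (4, 5), (4, 6), (5, 6)]
vertices = {v for e in edges for v in e}

def min_edge_covers():
    # smallest edge subsets touching every vertex
    for m in range(1, len(edges) + 1):
        found = [c for c in combinations(edges, m)
                 if {v for e in c for v in e} == vertices]
        if found:
            return found

print(min_edge_covers())  # [((1, 2), (3, 5), (4, 6))]
```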

Hamiltonian Paths. A Hamiltonian path starts from any selected vertex v of a graph, uses each vertex precisely once, and returns to the chosen vertex v. We use the graph of Figure 1.9 (a) to explain the rules for Hamiltonian paths:

• an edge from vertex i to vertex j (xij = 1) prohibits the use of the reverse edge (xji = 0): (x12 → ¬x21) = 1 ;
• an edge from vertex i to vertex j (xij = 1) prohibits all edges to other destination vertices k (xik = 0): (x12 → ¬x13) ∧ (x12 → ¬x14) = 1 ;
• an edge from vertex i to vertex j (xij = 1) prohibits all edges from other source vertices s (xsj = 0): (x12 → ¬x32) ∧ (x12 → ¬x52) = 1 .

Figure 1.9. Hamiltonian path: (a) graph, (b) one solution.

Using the starting edge x12, the following equation summarizes the conditions introduced above:

(¬x12 ∨ ¬x21)(¬x12 ∨ ¬x13)(¬x12 ∨ ¬x14)(¬x12 ∨ ¬x32)(¬x12 ∨ ¬x52) ∧ x12 = 1 .

It can be simplified by means of the absorption law to

x12 ¬x21 ¬x13 ¬x14 ¬x32 ¬x52 = 1 .    (1.9)

Possible edges starting from a vertex can also be described by an equation. We use vertex 1 as an example:

(x12 ∨ x13 ∨ x14) = 1 .    (1.10)

All Hamiltonian paths are the solution of a system of equations consisting of an adjusted equation of the type (1.9) for each directed edge and an adjusted equation of the type (1.10) for each vertex. Figure 1.9 (b) shows the solution 1-3-2-5-4-1. Other solutions are possible.
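Such a system of equations can be checked by a brute-force search. The directed edge list below is an assumption consistent with the constraints quoted above (out-edges of vertex 1: 2, 3, 4; in-edges of vertex 2: from 1, 3, 5); the full edge set of Figure 1.9 is not reproduced here:

```python
# assumed directed edges (illustrative, not the complete Figure 1.9)
edges = {(1, 2), (1, 3), (1, 4), (3, 2), (5, 2), (2, 5), (5, 4), (4, 1)}
n = 5

def hamiltonian_cycles(path):
    # extend the path vertex by vertex; close the cycle at vertex 1
    v = path[-1]
    if len(path) == n:
        return [path + [1]] if (v, 1) in edges else []
    return [c for w in range(1, n + 1)
            if w not in path and (v, w) in edges
            for c in hamiltonian_cycles(path + [w])]

print(hamiltonian_cycles([1]))  # [[1, 3, 2, 5, 4, 1]]
```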

Eulerian Paths and Bridges. An Eulerian path visits every edge exactly once; some vertices may be visited several times. An Eulerian circuit or Eulerian cycle starts and ends on the same vertex. These paths were first discussed by Leonhard Euler while solving the famous Seven Bridges of Königsberg problem in 1736. We use the graph of Figure 1.10 as an example.

Figure 1.10. Graph for exploring Eulerian paths.

It is not difficult to find the respective Boolean equation; for each edge we note that it can be used in one, and only one, direction:

(x12 ¬x21 ∨ x21 ¬x12)(x10 ¬x01 ∨ x01 ¬x10)(x20 ¬x02 ∨ x02 ¬x20)
∧ (x30 ¬x03 ∨ x03 ¬x30)(x34 ¬x43 ∨ x43 ¬x34) = 1 ,

with one factor for each of the edges 1-2, 0-1, 0-2, 0-3, and 3-4.

This equation is equal to 1, e.g., for the Eulerian paths 4-3-0-2-1-0 or 0-2-1-0-3-4.
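The Eulerian paths of this graph can also be enumerated by a small brute-force sketch:

```python
# Eulerian paths in the graph of Figure 1.10
# (undirected edges 1-2, 0-1, 0-2, 0-3, 3-4), starting at vertex 4
edges = [frozenset(e) for e in [(1, 2), (0, 1), (0, 2), (0, 3), (3, 4)]]

def eulerian_paths(v, remaining, path):
    # use every remaining edge exactly once
    if not remaining:
        return [path]
    result = []
    for e in remaining:
        if v in e:
            w = next(iter(e - {v}))              # the other endpoint
            rest = [x for x in remaining if x != e]
            result += eulerian_paths(w, rest, path + [w])
    return result

paths = eulerian_paths(4, edges, [4])
print(paths)  # [[4, 3, 0, 1, 2, 0], [4, 3, 0, 2, 1, 0]]
```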

1.1.5. Boolean Equations - a Unifying Instrument

As could be seen, Boolean equations are a unifying instrument for solving many problems from combinatorics, graph theory, etc. The suggested methods can also be extended to directed graphs, feedback problems, and many more. In extension of the widely used CD-SAT-equations, CDC-SAT-equations were introduced. CDC-SAT-equations need many fewer disjunctions than CD-SAT-equations to express the same information. The equations combine the requirements and the restrictions of a given problem in a very user-friendly way. Even the solution of cryptographic problems can be approached in this way. The respective programs for building the CD-SAT-problems are already available. We also recall the results on grid coloring and some Ramsey numbers which were presented in [244, 339, 344, 346, 390].

1.2. Analysis of the Number of Variables to Represent Index Generation Functions

Jon T. Butler

Tsutomu Sasao

1.2.1. Background

An index generation function is an integer function in which a certain binary input pattern maps to a unique non-zero integer index value if that pattern is stored. If the pattern is not stored, the input pattern maps to 0. For example, a pattern may represent a virus to be detected or a packet to be routed. An index generation function is specified by a k × n binary matrix, where there are k rows (data entries or vectors) and n columns (variables or features). The minimization problem for index generation functions is to find the fewest columns whose bits represent sub-patterns that are all different across the k rows. Since duplicate rows are not allowed, all n columns together distinguish all k rows; however, fewer columns may already distinguish them. A circuit realizing an index generation function is more economical if there are fewer columns. It has been shown [263, 264, 268, 275, 276] that k distinct rows can always be distinguished by m columns, where ⌈log2 k⌉ ≤ m ≤ k − 1. In this section, we show experimentally that the minimum number of distinguishing columns needed in the realization of a random index generation function is likely to be much closer to the lower bound than to the upper bound. This means that inexpensive conventional memory can be efficiently used to realize most index generation functions. In the case of the (rare) index generation functions where many columns are needed, we can make another observation: it is likely that some pair of rows is distinguished by only a single column. We also show that balanced columns, where the numbers of 1’s and 0’s are the same, tend to reduce the number of columns needed.
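The minimization problem can be stated as a brute-force sketch (the registered vectors below are illustrative, not from the experiments of this section):

```python
from itertools import combinations

def min_distinguishing_columns(rows):
    """Smallest set of column indices whose sub-patterns differ
    across all rows (brute force; fine for small k and n)."""
    n = len(rows[0])
    for m in range(1, n + 1):
        for cols in combinations(range(n), m):
            subs = {tuple(r[c] for c in cols) for r in rows}
            if len(subs) == len(rows):
                return cols
    return tuple(range(n))

# k = 4 registered vectors over n = 4 variables; two columns suffice,
# matching the lower bound ceil(log2 4) = 2
vectors = [(0, 0, 1, 0), (0, 1, 1, 1), (1, 0, 0, 0), (1, 1, 0, 0)]
print(min_distinguishing_columns(vectors))  # (0, 1)
```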


Models, Methods, and Techniques

1.2.2. An Index Generation Unit

An index generation function can be realized as a Content Addressable Memory (CAM). Unfortunately, typical CAMs tend to be power-hungry and expensive. A memory-based implementation of a CAM based on the index generation function has been proposed that uses one comparator and conventional Random Access Memories (RAMs) [268]. It consumes relatively little power and is inexpensive. Since it is memory-based, it can be easily modified to adapt the stored data to newly acquired data. Figure 1.11 shows the circuit.

[Figure 1.11. Index Generation Unit (IGU). The registered-vector input is split into x1 and x2; x1 passes through a Linear Circuit (output y) to address the Main Memory, which outputs the index; the index addresses the AUX Memory, whose output is compared with x2; the final output is f.]

This circuit works as follows: The given data x is divided into two parts, x1 and x2 , and compared to the stored data across the two parts. x1 is chosen so that each registered vector has a unique value among the elements of x1 . There may be more than one choice for x1 , and one typically chooses an x1 with the smallest cardinality. This minimizes the size of the Main Memory. x1 is then applied to the address lines of the Main Memory through a Linear Circuit, which we assume, for the moment, is absent; viz., y = x1 . To indicate this, the Linear Circuit in Figure 1.11 is shaded. The data stored at each location is the unique index that accesses the remaining part of the data, as stored in the AUX Memory. Here, the rest of the data is stored at the location specified by the index, and is compared with the incoming data at x2 , using a comparator. Therefore, when a match occurs for both x1 and x2 , x is declared to be contained in the memory, where


x = (x1 , x2 ). In this case, the output f is the corresponding index. The data will not match if either or both x1 or x2 do not match. If x1 fails to match, the index out of the Main Memory is all 0’s, which causes f to be all 0’s, indicating a mismatch. On the other hand, if x1 does match, a value for x2 is extracted from the AUX Memory. If x2 fails to match, the comparator produces 0, and f is all 0’s. The non-zero data is called the index and is specific to the application. For example, if the application is routing, the index specifies a port number to which the current packet is to be directed. Further explanation on how this circuit works follows the notation presented in the next subsections. The Linear Circuit in Figure 1.11, if present, is used to reduce the number of address lines of the Main Memory, and, as a result, the size of memory in the Main Memory. That is, it may be possible to combine individual input variables in x1 using the exclusive OR operation without degrading the circuit’s ability as a CAM. More information on this can be found in [263, 266, 267, 275, 277].
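In software terms, the lookup just described can be sketched as follows. This is a minimal illustration of the data flow only, not the hardware design from [268]: the names main_memory and aux_memory are ours, and the Linear Circuit is omitted (i.e., y = x1).

```python
def igu_lookup(x1, x2, main_memory, aux_memory):
    """Sketch of an IGU lookup (Linear Circuit omitted, i.e., y = x1).

    main_memory maps an x1 sub-pattern to an index (0 means no entry);
    aux_memory maps an index to the stored x2 sub-pattern.
    """
    index = main_memory.get(x1, 0)          # Main Memory access
    if index == 0:
        return 0                            # x1 does not match: f = 0
    if aux_memory[index] != x2:
        return 0                            # comparator mismatch: f = 0
    return index                            # full match: f is the index

# Two registered vectors split as x = (x1, x2), with indices 1 and 2.
main_memory = {(0, 0): 1, (0, 1): 2}
aux_memory = {1: (1, 0), 2: (0, 1)}

print(igu_lookup((0, 0), (1, 0), main_memory, aux_memory))  # 1 (full match)
print(igu_lookup((0, 0), (0, 0), main_memory, aux_memory))  # 0 (x2 mismatch)
```

The dictionary lookups stand in for the two RAM accesses, and the single equality test stands in for the comparator.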

1.2.3. Notation

We formulate the above problem as follows: Definition 1.2. Let x = (x1 , x2 , . . . , xn ) be an n-bit binary vector, where xi ∈ {0, 1}. We say x is a registered vector, and X is a registered vector set if all x ∈ X are stored by an index generation function f . Specifically, f is a mapping f : {0, 1}^n → {0, 1, 2, . . . , k}, such that f is a bijection (one-to-one and onto) between X and {1, 2, . . . , k}, and, for x ∈ {0, 1}^n \ X , f (x) = 0. The elements of {1, 2, . . . , k} are called indices. It follows that a bijection exists between the registered vector set and the indices, and that all vectors not in the registered vector set map to 0 under f . For example, in a virus detection application, a registered vector is a potential (stored) virus, and its index represents an address where that virus is processed. The weight of f is k. We assume the table representing f contains the k registered vectors only; all non-registered vectors map to 0. This yields compact tables since, typically, k is much less than the total number of possible


vectors (2^n ).

Example 1.4. Table 1.9 shows an example of a registered vector table where n = 11 and the weight k is 8. Omitted from Table 1.9 are the 2040 (= 2^11 − 8) non-registered vectors. These all map to 0.

Table 1.9. Example of a registered vector table of weight 8

x1  x2  x3  x4  x5  x6  x7  x8  x9  x10 x11 | f
 0   0   0   0   0   0   0   0   0   0   1 | 1
 0   0   1   0   0   0   0   0   0   1   0 | 2
 0   1   0   0   0   0   0   0   1   0   0 | 3
 0   1   1   0   0   0   0   1   0   0   0 | 4
 1   0   0   0   0   0   1   0   0   0   0 | 5
 1   0   1   0   0   1   0   0   0   0   0 | 6
 1   1   0   0   1   0   0   0   0   0   0 | 7
 1   1   1   1   0   0   0   0   0   0   0 | 8

Definition 1.3. A set S of variables (columns) is said to distinguish the rows (vectors) of a registered vector table if no two rows have identical values across the variables in S. Set S is an irredundant distinguishing set if removing any element of S yields a set that does not distinguish the rows of the registered vector table. Set S is a minimum distinguishing set if no distinguishing set exists with fewer variables.

Example 1.5. In Table 1.9, the variable set S1 = {x1 , x2 , x3 } distinguishes the eight rows. The complement set S2 = {x1 , x2 , . . . , x11 } \ S1 = {x4 , x5 , x6 , x7 , x8 , x9 , x10 , x11 } also distinguishes all rows. However, removing any single variable from S2 yields a smaller set that still distinguishes all rows; thus, S2 is not an irredundant distinguishing set. In contrast, for any xi ∈ S2 , the set S2 \ {xi } is an irredundant distinguishing set. S1 = {x1 , x2 , x3 } is a minimum distinguishing set, since no set of two variables is sufficient to distinguish eight rows.
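These definitions are easy to check mechanically. The following sketch (our own illustration; none of these names appear in the text) verifies the claims of Example 1.5 on the rows of Table 1.9 and finds a minimum distinguishing set by exhaustive search.

```python
from itertools import combinations

# The eight rows of Table 1.9 as (x1, ..., x11) tuples.
ROWS = [
    (0,0,0, 0,0,0,0,0,0,0,1), (0,0,1, 0,0,0,0,0,0,1,0),
    (0,1,0, 0,0,0,0,0,1,0,0), (0,1,1, 0,0,0,0,1,0,0,0),
    (1,0,0, 0,0,0,1,0,0,0,0), (1,0,1, 0,0,1,0,0,0,0,0),
    (1,1,0, 0,1,0,0,0,0,0,0), (1,1,1, 1,0,0,0,0,0,0,0),
]

def distinguishes(cols, rows):
    """True if the sub-patterns over the 0-based columns are pairwise distinct."""
    patterns = [tuple(r[c] for c in cols) for r in rows]
    return len(set(patterns)) == len(rows)

def minimum_distinguishing_set(rows):
    """Exhaustive search for a smallest distinguishing column set."""
    n = len(rows[0])
    for m in range(1, n + 1):
        for cols in combinations(range(n), m):
            if distinguishes(cols, rows):
                return cols
    return None

print(distinguishes((0, 1, 2), ROWS))         # S1 = {x1, x2, x3}: True
print(len(minimum_distinguishing_set(ROWS)))  # 3, matching Example 1.5
```

Two variables can produce at most four distinct sub-patterns, so the search confirms that three columns are indeed the minimum for eight rows.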


This introduces the problem addressed in this section: Problem 1.1. Over the set of index generation functions, find the distribution of the sizes of the minimum distinguishing sets. The goal is to understand the search space for minimum distinguishing sets over which algorithms and heuristics must operate. The significance of this problem's solution is that a minimum or near-minimum distinguishing set can be stored in the Main Memory shown in Figure 1.11 and used to check the data associated with the variables in x1 . If x1 matches a stored entry, the index is used to locate the rest of the registered vector in the AUX Memory, which is used to check whether x2 matches the entry. When a match is obtained for both x1 and x2 , the index generation unit has identified a complete match, in which case the circuit produces the value of the index. Minimizing the distinguishing set minimizes the storage requirement of the Main Memory.

1.2.4. Expected Number of Variables in the Minimal Distinguishing Set

An index generation function is defined by a k × n binary array B, where k is the number of rows and n is the number of columns. We seek a set of m columns, m ≤ n, such that all rows in B restricted to these columns are distinct, and m is minimum. This set is a minimum distinguishing set. From our previous discussion, we know that ⌈log2 k⌉ ≤ m ≤ k − 1. However, this is a wide range, and it will provide insight into the synthesis process if we can derive the distribution. For example, do we expect most binary arrays to reside at the upper or at the lower end of this range? We answer this question experimentally. We devised two programs to enumerate k × k binary arrays and to compute the distribution of the cardinalities of the minimum distinguishing set:

1. a Verilog program running on the SRC-6 reconfigurable computer, which computed the distribution for k × k binary arrays for 2 ≤ k ≤ 7, and

2. a FORTRAN program that provided Monte-Carlo simulation data for 8 ≤ k ≤ 64.


Table 1.10. Fraction of k × k binary matrices versus the number of columns needed to distinguish all rows for 2 ≤ k ≤ 7

          # columns needed to disting. all rows
k × k      1       2       3       4       5       6     samples
2 × 2   1.000      0       0       0       0       0       All
3 × 3      0    1.000      0       0       0       0       All
4 × 4      0    0.5341  0.4659     0       0       0       All
5 × 5      0       0    0.9132  0.0868     0       0       All
6 × 6      0       0    0.6955  0.2978  0.0067     0       All
7 × 7      0       0    0.3819  0.5941  0.0239  0.0002     All

(Columns 7 through 11 are exactly 0 for all of these arrays and are omitted.)

In the case of the Verilog program, the minimum number of columns needed to distinguish all rows was obtained by exhaustive examination of all solutions. In the case of the FORTRAN program, a minimum solution was obtained by the enumeration of a sum-of-products expression, which is equivalent to an exhaustive enumeration of all solutions. Table 1.10 shows the results for 2 ≤ k ≤ 7. For example, for 4 × 4 binary arrays, among the C(16, 4) = 1820 arrays, in 972 (53.4%) all rows were distinguished by 2 columns, while 848 (46.6%) were distinguished by 3 columns (and not by 2 columns). In restricting the binary arrays to C(16, 4) = 1820, we include only those arrays that have distinct rows, where any permutation of the rows is considered as one array. We were able to do an exhaustive enumeration for arrays of a size up to 7 × 7. The right-most column shows this with the entry "All", meaning that one copy of each of the k × k binary arrays contributed 1 to the enumeration. However, for larger arrays, exhaustive enumeration was too time-consuming, so we generated random sample sets. Table 1.11 shows the results for 8 ≤ k ≤ 64. Here, the right-most column shows the number of samples in a Monte-Carlo enumeration. Specifically, k × k binary arrays were generated uniformly at random using sample set sizes from 100 through 10,000, as specified in the right-most column. Table 1.11 shows that, although the range of possible values for the number of columns needed to distinguish all rows can be large, most values are concentrated near the low end of the range. For example, for the largest binary array considered, 64 × 64, 100.0% of the (100) samples required a minimum of 8 columns to


Table 1.11. Fraction of k × k binary matrices versus the number of columns needed to distinguish all rows for 8 ≤ k ≤ 64

           # columns needed to disting. all rows
 k × k       3        4        5        6        7        8      samples
 8 × 8    0.0927   0.8531   0.0535   0.0007     -        0       10,000
 9 × 9       0     0.8995   0.0998   0.0007     -        -       10,000
10 × 10      0     0.8204   0.1787   0.0009     -        -       10,000
11 × 11      0     0.7081   0.2905   0.0014     -        -       10,000
12 × 12      0     0.5126   0.4853   0.0021     -        -       10,000
13 × 13      0     0.2781   0.7196   0.0023     -        -       10,000
14 × 14      0     0.0964   0.9010   0.0026     -        -       10,000
15 × 15      0     0.0180   0.9887   0.0033     -        -       10,000
16 × 16      0     0.0018   0.9925   0.0127     -        -       10,000
17 × 17      0        0     0.9999   0.0001     -        -       10,000
   :
30 × 30      0        0        -     1.0000     -        -        1,000
   :
32 × 32      0        0        -     0.9999   0.0001     -        1,000
   :
64 × 64      0        0        0        -        -     1.0000       100

"-" indicates this entry should be a small non-zero value, but there were no samples because the sample set size is too small. A 0 entry is exactly 0. (Columns 1, 2, and 9 through 11, which contain only 0 or "-" entries, are omitted.)

distinguish all 64 rows. At the same time, the range of possible values extends from 6 to 63. This suggests that, if a set of 8 columns is found that distinguishes all 64 rows of a 64 × 64 binary array, then it is not likely that fewer columns will also distinguish all rows. We divided the table into two parts, in part to illustrate that the binary arrays enumerated on the SRC-6 reconfigurable computer ranged across all possible k × k arrays with no special vector corresponding to a non-registered vector (2 ≤ k ≤ 7), while the binary arrays enumerated by the FORTRAN program were randomly selected arrays which included a special vector corresponding to all non-registered vectors (8 ≤ k ≤ 64).
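The Monte-Carlo experiment can be reproduced in miniature. The sketch below is our own illustration, not the original Verilog or FORTRAN program: it samples random k × k arrays with distinct rows and finds the size of a minimum distinguishing set by exhaustive search, which is practical only for small k.

```python
import random
from itertools import combinations
from collections import Counter

def min_distinguishing_size(rows, n):
    """Smallest number of columns whose sub-patterns are pairwise distinct."""
    for m in range(1, n + 1):
        for cols in combinations(range(n), m):
            if len({tuple(r[c] for c in cols) for r in rows}) == len(rows):
                return m
    return n

def monte_carlo(k, samples=1000, seed=1):
    rng = random.Random(seed)
    hist = Counter()
    for _ in range(samples):
        # sample k distinct k-bit rows uniformly at random
        ints = rng.sample(range(2 ** k), k)
        rows = [[(r >> c) & 1 for c in range(k)] for r in ints]
        hist[min_distinguishing_size(rows, k)] += 1
    return {m: hist[m] / samples for m in sorted(hist)}

# Distribution for 8x8 arrays; expected to concentrate near ceil(log2 8) = 3
# and 4, as in Table 1.11.
print(monte_carlo(8, samples=200))
```

With larger sample sizes the fractions approach the first row of Table 1.11, subject to the different treatment of the all-zero (non-registered) vector noted above.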

1.2.5. Distribution of the Expected Number of Distinguishing Columns

The data from the previous subsection are useful in understanding the behavior of an algorithm that finds the minimum number of columns that


distinguishes all rows in a binary array. It shows that most minimum distinguishing sets are close to the lower bound. Further insight is obtained by a different enumeration that involves index generation functions whose distinguishing set is far above the lower bound. We consider k × n binary arrays that are built up in sequence. We begin by enumerating one column with k rows, then two columns with k rows, then three columns with k rows, etc. At each step, we compute how many of the possible arrays have all rows distinguished. For example, consider k = 4. In the case of binary arrays with one column, none have all rows distinguished; since entries are binary, at most two rows can be distinguished by one column. In the case of binary arrays with two columns, the fraction of binary arrays in which all rows are distinguished is 4!/4^4 = 0.09375. In the case of binary arrays with three columns, the fraction of binary arrays in which all rows are distinguished is (8 × 7 × 6 × 5)/8^4 . However, this count includes 4 × 3 binary arrays in which already the first two columns distinguish all rows. The fraction of this type of binary array is (4! × 2^4)/8^4 . Here, 4! counts the 4 × 2 binary arrays where all 2-element rows are distinct, the factor 2^4 counts the number of ways to choose the third column, and 8^4 counts the total number of 4 × 3 binary arrays. Therefore, the fraction of 4 × 3 binary arrays in which the three columns distinguish all four rows, but the first two columns alone do not, is (8 × 7 × 6 × 5 − 4! × 2^4)/8^4 = 0.31641. This can be extended to the general case, as follows:

Theorem 1.2. The fraction T (n, k) of k × n binary arrays in which the right-most column is the first column such that all rows are distinguished is

    T (n, k) = [ 2^n (2^n − 1) · · · (2^n − (k − 1)) ] / 2^{nk}
             − [ 2^{n−1} (2^{n−1} − 1) · · · (2^{n−1} − (k − 1)) ] / 2^{(n−1)k} .   (1.11)

Table 1.12 shows values of T (n, k) for k ∈ {4, 8, 16, 32}. The data in this table was generated by a MATLAB program. A C program that generated random binary arrays, using sample sets of 10,000,000 uniformly distributed random index generation functions, experimentally confirmed these results.
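Equation (1.11) can be evaluated directly with exact arithmetic. The following sketch (our own illustration) reproduces the hand computation above for k = 4.

```python
from fractions import Fraction

def distinct_rows_fraction(n, k):
    """Fraction of k x n binary arrays whose k rows are pairwise distinct."""
    num = 1
    for j in range(k):
        num *= max(2 ** n - j, 0)   # 0 when fewer than k distinct rows exist
    return Fraction(num, 2 ** (n * k))

def T(n, k):
    """Eq. (1.11): fraction of k x n arrays first distinguished at column n."""
    return distinct_rows_fraction(n, k) - distinct_rows_fraction(n - 1, k)

print(float(T(2, 4)))  # 0.09375
print(float(T(3, 4)))  # 0.31640625 (0.31641 in Table 1.12)
```

Since T (n, k) is the difference of consecutive cumulative fractions, summing it over n telescopes to 1, which matches the observation below that each column of Table 1.12 totals 1.0.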


Table 1.12. Fraction of binary matrices T (n, k) with k ∈ {4, 8, 16, 32} rows and n columns such that all rows are distinguished (analytical results, unrestricted columns)

columns (n)    k = 4      k = 8      k = 16     k = 32
     1         0          0          0          0
     2         0.09375    0          0          0
     3         0.31641    0.00240    0          0
     4         0.25635    0.11842    0.01040    0
     5         0.15656    0.26489    0.11861    0.00008
     6         0.08585    0.24831    0.24712    0.01435
     7         0.04488    0.16597    0.24357    0.11793
     8         0.02294    0.09543    0.16946    0.23950
     9         0.01159    0.05110    0.09972    0.24113
    10         0.00583    0.02644    0.05406    0.17095
    11         0.00292    0.01344    0.02814    0.10175
    12         0.00146    0.00678    0.01436    0.05550
    13         0.00073    0.00340    0.00725    0.02898
    14         0.00037    0.00171    0.00364    0.01481
    15         0.00018    0.00085    0.00183    0.00749
    16         0.00009    0.00043    0.00091    0.00376
    17         0.00005    0.00021    0.00046    0.00189
    18         0.00002    0.00011    0.00023    0.00094
    19         0.00001    0.00005    0.00011    0.00047
    20         0.00001    0.00003    0.00006    0.00024
    21         -          0.00001    0.00003    0.00012
    22         -          0.00001    0.00001    0.00006
    23         -          -          0.00001    0.00003
    24         -          -          -          0.00001
    25         -          -          -          0.00001
    26         -          -          -          -
    27         -          -          -          -
    28         -          -          -          -
    29         -          -          -          -

    LB         2.00000    3.00000    4.00000    5.00000
    avg.       4.19048    6.26937    8.30181    10.31749

"-" indicates this entry should be a small non-zero value, but the precision is insufficient to show a non-zero value. A 0 entry is exactly 0.

Tracing down the columns corresponding to the four values of k, k = 4, k = 8, k = 16, and k = 32, shows how the fraction of arrays whose rows are first distinguished at a given column increases, e.g., to 0.31641, and then decreases to 0.25635, to 0.15656, etc. After peaking, the values taper down at a rate of approximately 50%. That is, as the number of columns increases by 1, the fraction of binary arrays with all rows distinguished decreases by approximately one-half. This is likely due to the fact that, when the number of columns is large, most rows are distinguished, except for two identical rows, one such pair being more likely than two, three, etc. pairs. The addition of one column to the binary array yields four possibilities for the two non-distinguished rows: (0, 0), (0, 1), (1, 0), and (1, 1).

• Half of these possibilities, (0, 1) and (1, 0), distinguish the pair of rows, and

• half, (0, 0) and (1, 1), do not.

Thus, as a column is added, the probability that the two rows remain non-distinguished decreases approximately by half. For each k, the entry at n = k − 1 in Table 1.12 marks the upper bound on the number of columns needed to distinguish all rows of a k × n binary array. That is, from a previous discussion, no more than n = k − 1 variables are needed to distinguish all k rows of a k × n array representation of an index generation function. It follows that the k × n binary arrays represented by entries in Table 1.12 beyond these bounds all have redundant columns. The taper in the column associated with 4 rows also occurs in the columns associated with 8, 16, and 32 rows. Indeed, except for a shift, the distributions are similar. In all distributions, the column total is 1.0, as expected. We observe that, for all rows, the average number of columns that occurs when all rows are distinguished is slightly more than twice the minimum number of columns needed to distinguish all rows. This is shown at the bottom of Table 1.12. Of note is the fact that non-minimal solutions are included. For example, repeated columns are included in the count, even though they add nothing to the ability of a set of columns to distinguish the rows.

[Figure 1.12. Fraction of functions realized using 4, 8, 16, and 32 vectors. (Surface plot of the data in Table 1.12: fraction of functions versus the number of variables and the number of vectors.)]

Figure 1.12 shows a graph representation of the data in Table 1.12. Here, one can clearly see how increasing the number of vectors increases the number of variables needed to distinguish all vectors. Table 1.13 shows more detail on the average number of columns needed to distinguish rows in random binary matrices. As in Table 1.12 and Figure 1.12, this is the result of an experiment in which columns are added until all rows are distinguished. For binary matrices with 4, 8, 16, 32, . . . , 32,768 rows, Table 1.13 shows the average number of columns needed to distinguish all rows. Also shown are the lower bounds on the number of columns needed for specific binary matrices. It is interesting that the average is only slightly greater than 2⌈log2 k⌉, twice the lower bound. Note that the numbers of columns reported in Table 1.13 are greater than the sizes of the minimum distinguishing sets, since the columns obtained in this section often include redundant columns. To show how the values in Table 1.13 are computed, consider the case of four rows (k = 4) shown in Table 1.13. The average value can be

Table 1.13. Average number of columns to distinguish rows in random binary matrices

number of    lower bound on columns    average number of
 rows k      needed to distinguish     columns to distinguish
     4                2                       4.19048
     8                3                       6.26937
    16                4                       8.30181
    32                5                      10.31749
    64                6                      12.32515
   128                7                      14.32885
   256                8                      16.33087
   512                9                      18.33181
 1,024               10                      20.33228
 2,048               11                      22.33251
 4,096               12                      24.33263
 8,192               13                      26.33268
16,384               14                      28.33272
32,768               15                      30.33273

The lower bound is ⌈log2 k⌉.

seen numerically to be

    avg(4) = 2 × 0.09375 + 3 × 0.31641 + 4 × 0.25635 + . . . ,   (1.12)

where the real values in (1.12) are taken from Table 1.12. Substituting expressions for these values yields

    avg(4) = Σ_{i=0}^{∞} i [ 2^i (2^i − 1)(2^i − 2)(2^i − 3) / 2^{4i}
                           − 2^{i−1} (2^{i−1} − 1)(2^{i−1} − 2)(2^{i−1} − 3) / 2^{4(i−1)} ] .   (1.13)

From (1.13), we can write

    avg(4) = Σ_{i=0}^{∞} i (a_i − a_{i−1}) ,   (1.14)

where

    a_i = 2^i (2^i − 1)(2^i − 2)(2^i − 3) / 2^{4i} ,   (1.15)

such that

    a_2 = (4 · 3 · 2 · 1)/(4 · 4 · 4 · 4) ,   a_1 = 0 ,   and   a_0 = 0 .   (1.16)

We can rewrite (1.13) as

    i (a_i − a_{i−1}) = i · ( 2^{4i} − 6 · 2^{3i} + 11 · 2^{2i} − 6 · 2^i ) / 2^{4i}
                      − i · ( 2^{4i−4} − 6 · 2^{3i−3} + 11 · 2^{2i−2} − 6 · 2^{i−1} ) / 2^{4i−4} ,

or

    i (a_i − a_{i−1}) = 6 i 2^{−i} − 33 i 2^{−2i} + 42 i 2^{−3i} .   (1.17)

Summing both sides of (1.17) from i = 0 to ∞ yields

    Σ_{i=0}^{∞} i (a_i − a_{i−1}) = 6 Σ_{i=0}^{∞} i 2^{−i} − 33 Σ_{i=0}^{∞} i 2^{−2i} + 42 Σ_{i=0}^{∞} i 2^{−3i} .

Therefore,

    avg(4) = 6 Σ_{i=0}^{∞} i (1/2)^i − 33 Σ_{i=0}^{∞} i (1/4)^i + 42 Σ_{i=0}^{∞} i (1/8)^i .   (1.18)

The series Σ_{i=0}^{∞} i x^i converges absolutely to x/(1 − x)^2 if |x| < 1, and we can write

    avg(4) = 6 · (1/2)/(1 − 1/2)^2 − 33 · (1/4)/(1 − 1/4)^2 + 42 · (1/8)/(1 − 1/8)^2 = 88/21 = 4.19048 .   (1.19)

The remaining values for the average number of columns shown in Table 1.13 were calculated in the same way.
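The same computation can be checked numerically for any k. The sketch below (our own illustration) generalizes a_i to a_i = 2^i (2^i − 1) · · · (2^i − (k − 1)) / 2^{ik} and sums the series of terms i (a_i − a_{i−1}) until it converges; for k = 4 it recovers 88/21.

```python
def a(i, k):
    """Fraction of k x i binary arrays whose k rows are pairwise distinct."""
    if 2 ** i < k:
        return 0.0                  # fewer than k distinct i-bit rows exist
    p = 1.0
    for j in range(k):
        p *= 1.0 - j / 2 ** i       # (2^i - j) / 2^i, computed in floats
    return p

def avg_columns(k, i_max=200):
    """avg(k) = sum over i of i * (a_i - a_{i-1}); terms vanish quickly."""
    total, prev = 0.0, 0.0
    for i in range(1, i_max + 1):
        cur = a(i, k)
        total += i * (cur - prev)
        prev = cur
    return total

print(round(avg_columns(4), 5))   # 4.19048 (= 88/21, Table 1.13)
print(round(avg_columns(8), 5))   # 6.26937 (Table 1.13)
```

Because a_i approaches 1 at a rate of roughly k(k − 1)/2 · 2^{−i}, truncating the sum at i = 200 is far more than sufficient for double precision.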


1.2.6. Expected Number of Balanced Columns in Random Binary Matrices

In determining the appropriate columns needed to distinguish all rows, we find that columns with balanced weight tend to be used, since they are often necessary in the minimum distinguishing set. In this section, we consider k × n binary arrays that are built up in sequence, just as in the previous section. However, here we use only balanced columns. First, we calculate B(k), the average number of balanced columns in k × k binary arrays over all possible binary arrays.

Theorem 1.3. Let B(k) be the average number of balanced columns across all k × k binary arrays. Then, the fraction of columns that are balanced is, on average,

    B(k)/k ∼ √(2/(kπ)) ,   (1.20)

where a(k) ∼ b(k) means a(k)/b(k) → 1 as k → ∞.

Proof. Expressed as a generating function, we seek the b_i in B(x) = b_0 + b_1 x + b_2 x^2 + . . . + b_k x^k , where b_i is the number of k × k binary arrays with i balanced-weight columns. In closed form, this is

    B(x) = ( (2^k − C(k, k/2)) + C(k, k/2) x )^k .   (1.21)

Here, the left term, 2^k − C(k, k/2), expresses the number of ways not to choose a single balanced column, while the right term expresses the number of ways to choose a single balanced column. Differentiating B(x) with respect to x and setting x to 1 yields the weighted sum 0 b_0 + 1 b_1 + 2 b_2 + . . . + k b_k , which, when divided by 2^{k²}, the number of k × k binary arrays, yields B(k), the average number of balanced columns in k × k binary arrays. Specifically:

    B(k) = k C(k, k/2) / 2^k .

From Stirling's approximation, C(2n, n) ∼ 4^n / √(πn), and we have

    B(k) ∼ √(2k/π) .

The hypothesis follows directly.

We can immediately observe the following:

Corollary 1.1. B(k)/k → 0 as k → ∞, where B(k) is the average number of balanced columns in k × k binary arrays.

Therefore, for large k, the fraction of balanced columns in k × k binary arrays is close to 0. While this may be perceived as a negative result, we gain insight by comparing the average number of balanced columns B(k) with the minimum number of columns in a distinguishing set D(k). We know that, when k is a power of 2, all columns in a minimal distinguishing set are balanced. From the discussion above, the average number of balanced columns in a k × k array is B(k) = √(2k/π), which is √(2 · 2^m / π) = √(2/π) · 2^{m/2} when k = 2^m . The number of columns in a minimal distinguishing set is D(k) = log2 2^m = m when k is a power of 2. Therefore, we can state the following:

Corollary 1.2.

    B(k)/D(k) → ∞ as k → ∞ ,   (1.22)


where B(k) is the average number of balanced columns in k × k binary arrays and D(k) is the number of columns in a minimal distinguishing set for k × k binary arrays. Hence, although the balanced columns form a vanishingly small fraction of all columns, their average number is arbitrarily larger than the minimum number of (balanced) columns needed in a minimum distinguishing set. This is a positive result, but it requires the qualification that the right balanced columns are needed in a minimum distinguishing set. We now consider to what extent the right balanced columns are found in a collection of columns. Specifically, we repeat the experiment of Subsection 1.2.5, in which we build up binary arrays by adding columns until all rows are distinguished. However, in this experiment, we use balanced columns only. The result is shown in Table 1.14. Comparing this data with that of Table 1.13 (corresponding to using all columns) shows that the use of balanced columns significantly reduces the number of columns needed to distinguish all rows. For example, in the case of four rows, the average number of columns is 4.19048 for unrestricted columns and 2.49964 for balanced columns. This suggests that heuristics used to find the columns of a minimum distinguishing set may benefit from using balanced columns. However, the data also suggests that the benefit of balanced columns diminishes as the number of rows increases. For example, in the case of 32 rows, the average number of columns needed to achieve all distinct rows is 10.31749 for unrestricted and 9.88313 for balanced columns; here, the difference is relatively small.

The fact that balanced columns lead to a reduced average number of columns is consistent with the result in [266], which shows that the best choice for the functions in the linear circuit in Figure 1.11 are those with the lowest imbalance measure, which occurs when the numbers of 1's and 0's in a column are the same. In the case of unrestricted columns, the uniform distribution of column patterns assures us that, whilst columns may not be exactly balanced, they will be close, since the binomial distribution tends to produce near-balanced columns.
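Theorem 1.3 and Corollary 1.1 are easy to check numerically. The sketch below (our own illustration) compares the exact value B(k) = k · C(k, k/2)/2^k with the asymptotic estimate √(2k/π).

```python
import math

def balanced_avg(k):
    """Exact average number of balanced columns in a random k x k binary array."""
    return k * math.comb(k, k // 2) / 2 ** k

for k in [4, 16, 64, 256]:
    estimate = math.sqrt(2 * k / math.pi)
    print(k, round(balanced_avg(k), 4), round(estimate, 4))
# The ratio balanced_avg(k) / sqrt(2k/pi) approaches 1 as k grows,
# while balanced_avg(k) / k approaches 0 (Corollary 1.1).
```

Already for k = 64 the exact and asymptotic values agree to within half a percent, while the fraction of balanced columns itself keeps shrinking.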


Table 1.14. Fraction of binary matrices with 4, 8, 16, and 32 rows versus the number of columns such that all rows are distinguished (experimental results, balanced columns only; Monte-Carlo, 10,000,000 samples)

columns (n)    k = 4      k = 8      k = 16     k = 32
     1         0          0          0          0
     2         0.66688    0          0          0
     3         0.22209    0.11750    0          0
     4         0.07405    0.25527    0.00076    0
     5         0.02464    0.29038    0.04711    0.00051
     6         0.00822    0.17731    0.22100    0.01442
     7         0.04488    0.16591    0.24708    0.11795
     8         0.02291    0.09551    0.24366    0.23941
     9         0.01166    0.05113    0.16956    0.24113
    10         0.00582    0.02643    0.09960    0.17085
    11         0.00290    0.01344    0.05399    0.10187
    12         0.00146    0.00679    0.02806    0.05559
    13         0.00071    0.00340    0.01433    0.02895
    14         0.00035    0.00170    0.00726    0.01477
    15         0.00019    0.00086    0.00363    0.00746
    16         0.00010    0.00042    0.00183    0.00376
    17         0.00005    0.00022    0.00091    0.00189
    18         0.00002    0.00011    0.00046    0.00094
    19         0.00001    0.00006    0.00023    0.00047
    20         0.00001    0.00003    0.00011    0.00023
    21         -          0.00001    0.00006    0.00012
    22         -          0.00001    0.00003    0.00006
    23         -          -          0.00002    0.00003
    24         -          -          0.00001    0.00001
    25         -          -          -          0.00001
    26         -          -          -          -
    27         -          -          -          -
    28         -          -          -          -
    29         -          -          -          -

    Min        2.00000    3.00000    4.00000    5.00000
    Avg        2.49964    5.13276    7.59693    9.88313

"-" indicates this entry should be a small non-zero value, but the precision is insufficient to show a non-zero value. A 0 entry is exactly 0.


1.2.7. Found Results

We have shown that the search space for near-minimum distinguishing column sets for binary matrices holds several interesting insights. First, in the case of k × k binary matrices, most random matrices have a minimum distinguishing set that is close to the lower bound, ⌈log2 k⌉, and far from the upper bound, k − 1. This observation holds especially when k is large. We investigated binary matrices constructed so that we could gain insights when the minimum distinguishing set is large. Specifically, starting with a single column of k rows, we added columns until all k rows were distinguished. The resulting histogram shows a decrease by approximately one-half in the number of binary arrays with all k rows distinguished with each additional column. Finally, we showed that balanced columns benefit the search for minimum distinguishing sets, especially when k is small.

Computational Complexity of Error Metrics


1.3. Computational Complexity of Error Metrics in Approximate Computing

Oliver Keszocze, Mathias Soeken, and Rolf Drechsler

1.3.1. Approximate Computing

Approximate computing refers to techniques that relax the requirement of exact equivalence between the specification and the implementation of circuits [377]. There exist several applications, such as media processing (audio, video, graphics, and image), recognition, and data mining, that tolerate acceptable, though not always correct, results [143]. Due to their inherent error resilience, a precise functional behavior is not required. Several factors, such as the limited perceptual capability of humans, allow imprecision in the numerical exactness of the computation in these applications. This freedom can be exploited in the implementation of the applications to achieve significant improvements in performance and energy efficiency in comparison to an exact implementation [54, 71, 296]. Several approaches for the mathematical analysis of imperfect computation have been proposed which are heavily used in the implementation of approximate computing [54]. Research in approximate computing spans the whole range of research activities, from programming languages [105] to transistors [133]. There exist two main strategies for introducing imperfection to a circuit with the aim of improving its performance [377]:

1. timing-induced errors, e.g., by over-clocking or voltage over-scaling, and

2. functional approximation, e.g., by implementing a slightly different function.

Our research targets the latter strategy.


Given a specification f : B^n → B^m that describes the correct functionality, an approximated function fˆ : B^n → B^m is sought that minimizes a given circuit cost metric while respecting a quality threshold. The quality of an approximation fˆ is measured using error metrics that compare fˆ to the original function f . Several error metrics exist, and the choice of which one is used to evaluate the quality depends highly on the application. Several error metrics occur frequently in the literature on approximate computing. The error rate gives the percentage of input patterns for which the functional value changes. The worst bit-flip error expresses how many of the output bits can be wrong at the same time. If we interpret the output bits as the encoding of an integer number, other interesting error metrics are possible, e.g., the worst case difference over all input combinations or the average case difference. Computing the error metric is essential in many synthesis algorithms for approximate computing [305, 377]; however, this computation requires the largest share of the computing resources. In this section, we derive the computational complexity of these error metrics. We show that the error rate, the average case error, and the average bit-flip error are #P-complete and that the worst bit-flip error and the worst case error are FP^NP-complete.
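For a small number of inputs, all of these error metrics can be computed by exhaustive simulation; the complexity results of this section concern the general case, where such enumeration is infeasible. The following sketch is our own illustration (the names are not from the text), with outputs interpreted as unsigned integers:

```python
def error_metrics(f, f_hat, n):
    """Exhaustively compare f and f_hat over all 2^n inputs.

    f and f_hat map an integer input 0..2^n - 1 to an integer output.
    """
    num_inputs = 1 << n
    errors = worst_case = worst_bit_flip = total_diff = 0
    for x in range(num_inputs):
        y, y_hat = f(x), f_hat(x)
        if y != y_hat:
            errors += 1                                         # error rate
        worst_case = max(worst_case, abs(y - y_hat))            # worst case error
        worst_bit_flip = max(worst_bit_flip,
                             bin(y ^ y_hat).count("1"))         # worst bit-flip error
        total_diff += abs(y - y_hat)                            # for average case
    return {
        "error_rate": errors / num_inputs,
        "worst_case": worst_case,
        "worst_bit_flip": worst_bit_flip,
        "average_case": total_diff / num_inputs,
    }

# 3-bit identity approximated by dropping the least significant output bit.
metrics = error_metrics(lambda x: x, lambda x: x & 0b110, n=3)
print(metrics)  # error_rate 0.5, worst_case 1, worst_bit_flip 1, average_case 0.5
```

The loop over 2^n inputs is exactly what the hardness results rule out for large n: each metric condenses exponentially many comparisons into one number.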

1.3.2. Preliminaries

To formally define the error metrics, we need to fix the notation and introduce some basic definitions. In this subsection, we further introduce the problem of Boolean satisfiability (and some of its siblings), which will serve as the canonical problems for the complexity of exactly computing the approximation error of fˆ.

Notations and Basic Definitions.

Notation 1.1. For a given function f , the approximated function is denoted by fˆ.

Definition 1.4. The function int_n : B^n → N returns the integer representation of a bit vector of length n. If the value of n is clear from the context, we omit it from the function name and simply write int. A typical integer encoding is, e.g., the two's complement encoding.

Definition 1.5. We define the sideways sum to be the function

    ν : B^n → N,   x ↦ Σ_{i=1}^{n} [x_i] ,

i.e., the function counting the 1's in a bit vector.

Notation 1.2. The n bit zero-function is written as

    0_n : B^n → B,   x ↦ 0 .

If the context is clear, we drop the n from the index. Note that for all inputs x we have that int(0_n (x)) = 0.

Notation 1.3 (Short-hands). To make the following discussions concise, we use some short-hand notations. Let a, b ∈ B^n , x ∈ B, and let ◦ be any binary relation on the Boolean values. We use the following short-hands:

    a ◦ b := (a_1 ◦ b_1 , . . . , a_n ◦ b_n )
    x ◦ a := (x ◦ a_1 , . . . , x ◦ a_n )
      ◦ a := a_1 ◦ a_2 ◦ · · · ◦ a_n

Boolean Satisfiability. This paragraph introduces several problems and related complexity classes in the context of Boolean satisfiability solving. We review the problem formulations and their complexities. These problems will serve as the problem instances that will be shown to be equivalent to exactly computing error metrics.

Definition 1.6 (SAT). Given a Boolean function f : B^n → B over variables x = (x_1, ..., x_n), the SAT problem asks whether there exists an assignment a = (a_1, ..., a_n) to the variables x such that f(a) = 1. The assignment a is called a satisfying assignment and f is called satisfiable. If no such assignment exists, f is called unsatisfiable. We refer to an instance of the problem as SAT(f).

Theorem 1.4 ([79, 190]). The SAT problem is complete for the complexity class NP.

Definition 1.7 (#SAT). Given a Boolean function f : B^n → B over variables x = (x_1, ..., x_n), the #SAT problem asks how many assignments a = (a_1, ..., a_n) exist such that f(a) = 1. We refer to an instance of the problem as #SAT(f).

Theorem 1.5 ([371]). The #SAT problem is complete for the complexity class #P.

Definition 1.8 (LEXSAT). Given a Boolean function f : B^n → B over variables x = (x_1, ..., x_n), the LEXSAT problem finds an assignment a = (a_1, ..., a_n) to the variables x such that f(a) = 1 and such that there exists no other assignment b = (b_1, ..., b_n) with b > a and f(b) = 1. We say that b > a (in words, b is lexicographically greater than a) if there exists a k ≤ n such that b_k > a_k and b_i = a_i for all i < k. If no satisfying assignment exists, LEXSAT reports that f is unsatisfiable. We refer to an instance of the problem as LEXSAT(f).

Theorem 1.6 ([181]). The LEXSAT problem is complete for the complexity class FP^NP.

Several algorithms for solving LEXSAT have recently been proposed, e.g., [176, 235], and applications in logic synthesis have been demonstrated [307].
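On small functions, all three problems can be solved by exhaustive enumeration; a Python sketch for intuition (the completeness results above concern succinct circuit representations, and these brute-force routines are exponential; the example function is hypothetical):

```python
from itertools import product

def sat(f, n):
    # SAT(f): a satisfying assignment, or None if f is unsatisfiable.
    for a in product((0, 1), repeat=n):
        if f(a):
            return a
    return None

def count_sat(f, n):
    # #SAT(f): the number of satisfying assignments.
    return sum(1 for a in product((0, 1), repeat=n) if f(a))

def lexsat(f, n):
    # LEXSAT(f): the lexicographically greatest satisfying assignment
    # (position 1 is the most significant), or None.
    for a in sorted(product((0, 1), repeat=n), reverse=True):
        if f(a):
            return a
    return None

# A hypothetical example: f(x1, x2, x3) = (x1 OR NOT x2) AND x3.
f = lambda x: int((x[0] or not x[1]) and x[2])
```

Here #SAT(f) = 3 and LEXSAT(f) returns (1, 1, 1), the lexicographically greatest of the three satisfying assignments.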

1.3.3. Error Metrics

In this subsection we review five error metrics that can be applied for different uses. Two metrics interpret the circuit's outputs as integers, while the other metrics work on the uninterpreted bit vectors. In the following we consider functions f : B^n → B^m that are given in a way that easily allows us to create a circuit realizing f. By easily we mean in polynomial time. We explicitly forbid the representation of f as a truth table. While this representation would trivialize computing the error metrics by making the computations linear with respect to the rows of the truth table, it would use exponential space. As an abuse of notation, we will denote both the function as well as the circuit by f.

Definition 1.9 (Error rate). The error rate measures how often the approximated circuit f̂ produces results that differ from the exact result of the error-free circuit f. It is defined as

    er(f, f̂) = Σ_{x ∈ B^n} [f(x) ≠ f̂(x)] .                    (1.23)

Corollary 1.3. For a function f, er(f, 0) computes the number of non-zero outputs, i.e.,

    er(f, 0) = Σ_{x ∈ B^n} [f(x) ≠ 0(x)] = Σ_{x ∈ B^n} [f(x) ≠ 0] = Σ_{x ∈ B^n} [int(f(x)) ≠ 0] .

For single-output functions this further reduces to

    er(f, 0) = Σ_{x ∈ B^n} [f(x)] = #{x | f(x) = 1, x ∈ B^n} .

We introduce the function satisfiability count, defined as sc(f) := er(f, 0), to refer to this.

Definition 1.10 (Average case error). The average case error interprets the outputs of the circuit and its approximation as integers, which are then compared. It is defined as

    ac(f, f̂) = (1/2^n) Σ_{x ∈ B^n} |int(f(x)) − int(f̂(x))| .   (1.24)

Corollary 1.4. For a function f, ac(f, 0) computes the average output value of f, i.e.,

    ac(f, 0) = (1/2^n) Σ_{x ∈ B^n} |int(f(x)) − int(0(x))| = (1/2^n) Σ_{x ∈ B^n} |int(f(x))| .

For single-output functions this further reduces to

    ac(f, 0) = (1/2^n) er(f, 0) = (1/2^n) sc(f) .

Definition 1.11 (Worst case error). The worst case error interprets the outputs of f and f̂ as integers and determines the greatest difference. It is defined as

    wc(f, f̂) = max { |int(f(x)) − int(f̂(x))| : x ∈ B^n } .     (1.25)

Note that in a similar fashion, the best case error can be defined.

Definition 1.12 (Average bit-flip error). The average bit-flip error is defined as

    abf(f, f̂) = (1/2^n) Σ_{x ∈ B^n} ν(f(x) ⊕ f̂(x)) .           (1.26)

Note that ν(a ⊕ b) computes the Hamming distance.

Corollary 1.5. For single-output functions f and f̂, we have that abf(f, f̂) = ac(f, f̂).

Definition 1.13 (Worst bit-flip error). The worst bit-flip error is defined as

    wbf(f, f̂) = max { ν(f(x) ⊕ f̂(x)) : x ∈ B^n } .             (1.27)

Note that in a similar fashion, the best bit-flip error can be defined.

Figure 1.13. Approximating adder circuit derived from a ripple-carry adder (inputs a0, b0, a1, b1; outputs s0, s1, s2).


Table 1.15. Truth table and bitflips for the exact adder + and the approximated adder +̂ from Figure 1.13

    a  b | a + b     | a +̂ b    | bitflips
    -----+-----------+-----------+----------
    0  0 | 000 / 0   | 000 / 0   | 000 / 0
    0  1 | 001 / 1   | 001 / 1   | 000 / 0
    0  2 | 010 / 2   | 010 / 2   | 000 / 0
    0  3 | 011 / 3   | 011 / 3   | 000 / 0
    1  0 | 001 / 1   | 000 / 0   | 001 / 1
    1  1 | 010 / 2   | 011 / 3   | 001 / 1
    1  2 | 011 / 3   | 010 / 2   | 001 / 1
    1  3 | 100 / 4   | 011 / 3   | 111 / 3
    2  0 | 010 / 2   | 010 / 2   | 000 / 0
    2  1 | 011 / 3   | 011 / 3   | 000 / 0
    2  2 | 100 / 4   | 100 / 4   | 000 / 0
    2  3 | 101 / 5   | 101 / 5   | 000 / 0
    3  0 | 011 / 3   | 010 / 2   | 001 / 1
    3  1 | 100 / 4   | 101 / 5   | 001 / 1
    3  2 | 101 / 5   | 100 / 4   | 001 / 1
    3  3 | 110 / 6   | 101 / 5   | 011 / 2

Example 1.6. Consider the circuit approximating an adder depicted in Figure 1.13. The circuit is derived from a ripple-carry adder, using only four gates instead of seven. The truth table is shown in Table 1.15. There, the approximated results are put next to the exact results. The table also shows the bitflips when compared to the correct results. The corresponding error metric values are

    er(+, +̂) = 8,   ac(+, +̂) = 1/2,   wc(+, +̂) = 1,
    abf(+, +̂) = 11/16,   and   wbf(+, +̂) = 3.
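All five metric values are easy to reproduce by exhaustive simulation of this small example; a Python sketch with the approximated outputs hardcoded from Table 1.15 (variable names are our own):

```python
from fractions import Fraction

n = 4   # four input bits: two 2-bit addends
approx_vals = [0, 1, 2, 3,  0, 3, 2, 3,  2, 3, 4, 5,  2, 5, 4, 5]
exact, approx, flips = [], [], []
for a in range(4):
    for b in range(4):
        e, p = a + b, approx_vals[4 * a + b]
        exact.append(e)
        approx.append(p)
        # Hamming distance of the 3-bit output words
        flips.append(bin(e ^ p).count("1"))

er  = sum(e != p for e, p in zip(exact, approx))
wc  = max(abs(e - p) for e, p in zip(exact, approx))
ac  = Fraction(sum(abs(e - p) for e, p in zip(exact, approx)), 2 ** n)
abf = Fraction(sum(flips), 2 ** n)
wbf = max(flips)
```

Running this reproduces er = 8, wc = 1, ac = 1/2, abf = 11/16, and wbf = 3, matching Example 1.6.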

1.3.4. Complexity of Computing Error Metrics In this subsection, we derive the complexity for the error metrics defined in the previous subsection. All proofs are done in the same


manner. We find a suitable problem that is complete for a complexity class and show equivalence by reducing one problem to the other, and vice versa, using polynomial transformations.

Theorem 1.7. Computing er(f, f̂) is #P-complete.

Proof. We will show the #P-completeness by proving that computing er(f, f̂) is equivalent to #SAT. We start by reducing the computation of the error rate to #SAT. We define the function g(x) := ∨(f(x) ⊕ f̂(x)). It is obvious that

    g(x) = 1  ⇔  f(x) ≠ f̂(x) .

Then er(f, f̂) = #SAT(g). To reduce from #SAT to the error rate, we use the results from Corollary 1.3. Given a single-output function g we note that #SAT(g) = sc(g). This shows that er ⇔ #SAT and, therefore, concludes the proof that the error rate is #P-complete.

Theorem 1.8. Computing wc(f, f̂) is FP^NP-complete.

Proof. We will show the FP^NP-completeness by proving that computing wc(f, f̂) is equivalent to LEXSAT. Assume that f and f̂ are represented as circuits. We create a circuit C that computes |int(f) − int(f̂)|. This can be done in polynomial time and space, since subtraction and computing the absolute value are in NC^1 [378]. The resulting circuit is translated into a Conjunctive Normal Form (CNF) D. Its size is linear with respect to the circuit size when using Tseytin's encoding [367]. The CNF represents the characteristic function χ_C of the circuit C. The variables of D are the original inputs x_i as well as variables for the outputs, which are denoted by d_j for 0 ≤ j ≤ n − 1. We fix the variable ordering to start


with the variables d_j which, in turn, are ordered as d_{n−1}, ..., d_0. This means that the output variables are the most significant bits. This variable ordering ensures that the solution to LEXSAT(D) is ordered by the output variables of C. Having this, computing LEXSAT(D) provides the solution to wc(f, f̂) when reading the values of the variables d_{n−1}, ..., d_0 from a satisfiable LEXSAT instance. If no solution for the LEXSAT instance is found, f and f̂ are identical and wc(f, f̂) = 0.

To reduce LEXSAT for a given single-output function g to wc, we construct a new function

    f : B^n → B^{n+1},   x ↦ (g(x), x_1, x_2, ..., x_n) .

Now compute s = (s_1, s_2, ..., s_{n+1}) = wc(f, 0). If g is unsatisfiable, we have that s_1 = 0 and int(s) = 2^n − 1. If, on the other hand, g is satisfiable, the bits s_2 to s_{n+1} of s represent the maximal fulfilling assignment of g. This shows that wc ⇔ LEXSAT and, therefore, concludes the proof that the worst case error is FP^NP-complete.

Theorem 1.9. Computing ac(f, f̂) is #P-complete.

Proof. We will show the #P-completeness by proving that computing ac(f, f̂) is equivalent to #SAT. To reduce from ac to #SAT, we create the same CNF D as was used in Theorem 1.8. Then, ac(f, f̂) can be computed as

    ac(f, f̂) = (1/2^n) Σ_{α ∈ B^m} int(α) · #SAT(D ∧ d = α) .

In other words, we iterate over all 2^m possible differences and count how many input assignments lead to each value. This reduction contains a mapping to a constant number (2^m) of #SAT instances. The overall complexity stays in the #P complexity class.


To reduce from #SAT(f) to ac we use the results from Theorem 1.7 and Corollary 1.4 and note that #SAT(f) = sc(f) = 2^n · ac(f, 0). This holds as int(f(x)) can only take the values 0 and 1. This shows that ac ⇔ #SAT and, therefore, concludes the proof that the average case error is #P-complete.

Theorem 1.10. Computing abf(f, f̂) is #P-complete.

Proof. We will show the #P-completeness by proving that computing abf(f, f̂) is equivalent to #SAT. To reduce from abf to #SAT, we create a CNF D that is similar to the one that was used in Theorem 1.8. But instead of using a circuit computing |int(f) − int(f̂)|, a circuit computing ν(f ⊕ f̂) is used (both the XOR circuit and the adder are in NC^1). This allows us to compute abf(f, f̂) as

    abf(f, f̂) = (1/2^n) Σ_{k=0}^{m} k · #SAT(D ∧ d = k) ,

where d = k means d_i = k_i for all i. The variable k iterates over all m + 1 possible outputs of ν(f(x) ⊕ f̂(x)). Similar to Theorem 1.9, the numbers of satisfying assignments for each possible output are accumulated. To reduce #SAT(g) to abf we use the derivation

    #SAT(g) = sc(g) = 2^n · ac(g, 0) = 2^n · abf(g, 0) ,

where the equalities follow from Theorem 1.7, Corollary 1.4, and Corollary 1.5, respectively.

This shows that abf ⇔ #SAT and, therefore, concludes the proof that the average bit-flip error is #P-complete.

Theorem 1.11. Computing wbf(f, f̂) is FP^NP-complete.

Proof. We will show the FP^NP-completeness by proving that computing wbf(f, f̂) is equivalent to LEXSAT.


We re-use the circuit from Theorem 1.10 to create the CNF D. Then, the idea of Theorem 1.8 is employed. This means that solving the problem LEXSAT(D) and analyzing the variables d_{n−1}, ..., d_0 solves wbf(f, f̂). The other direction is a little trickier. We want to solve LEXSAT(f) for a given Boolean function f : B^n → B using wbf. For this purpose we introduce polarity variables p_1, ..., p_n that are computed using the recurrence

    p_i = wbf( f(x) ∧ x_i ∧ ∏_{1≤j<i} x_j^{p_j}, 0 ) ,

where x^1 := x and x^0 := x̄.

[...]

The algebraic degree of bent functions of n variables is at most n/2 for n > 2 [273]. For example, for a bent function of four variables f(x_1, x_2, x_3, x_4),


Figure 2.9. Example of a data-flow graph of the fast Reed-Muller transform algorithm of the Cooley-Tukey class (connecting f(0), ..., f(7) and S_{f,RM}(0), ..., S_{f,RM}(7)).

given by F = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1]^T, the PPRM spectrum is S_{f,RM} = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]^T, from which the PPRM form is f(x_1, x_2, x_3, x_4) = 1 ⊕ x_1 x_2 ⊕ x_3 x_4, and the algebraic degree is 2, which is less than or equal to 4/2 = 2.
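The butterfly of Figure 2.9 generalizes to any n; a short Python sketch reproducing the four-variable example above (function name is ours). Over GF(2), the Reed-Muller transform is its own inverse, so the same routine converts truth vectors to PPRM spectra and back:

```python
def reed_muller(vec):
    # Fast Reed-Muller transform of the Cooley-Tukey class:
    # an in-place XOR butterfly over GF(2).
    a = list(vec)
    step = 1
    while step < len(a):
        for i in range(len(a)):
            if i & step:
                a[i] ^= a[i ^ step]
        step <<= 1
    return a

F = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1]
S = reed_muller(F)
# Algebraic degree: the largest number of 1-bits in the index
# of any non-zero PPRM coefficient.
degree = max(bin(i).count("1") for i, c in enumerate(S) if c)
```

Here S comes out as [1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0], i.e., the monomials 1, x_3 x_4, and x_1 x_2, and degree is 2, in agreement with the example.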

2.2.3. Random Generation of Bent Functions in the Reed-Muller Domain

Since the algebraic degree of bent functions of n variables is at most n/2 for n > 2, the number of non-zero PPRM coefficients is limited and their positions in the PPRM spectrum are restricted. The number of 1s in the binary representation of a non-zero PPRM coefficient's index in the PPRM spectrum vector is less than or equal to n/2. For example, for a bent function of four variables f(x_1, x_2, x_3, x_4), given by F = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1]^T, the PPRM spectrum is S_{f,RM} = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]^T, and the binary representations of the non-zero spectrum indices are 0000, 0011, 1100, where the number of 1s is less than or equal to 4/2. Therefore, the maximal number of non-zero PPRM coefficients of bent functions is:

    Σ_{i=0}^{n/2} C(n, i) .    (2.14)

For example, Table 2.6 gives the limitation of the number of non-zero PPRM coefficients in relation to the total number of coefficients of bent functions for numbers of variables ranging from 8 to 14.

Table 2.6. Limitation of the number of non-zero PPRM coefficients of bent functions

    variables | limitation | total PPRM coefficients
    ----------+------------+------------------------
         8    |       163  |       256
        10    |       638  |     1,024
        12    |     2,510  |     4,096
        14    |     9,908  |    16,384

Positions of the non-zero PPRM coefficients in the spectrum S_{f,RM} of a bent function are also related to the order of the coefficients. For example, the bent functions of four variables can have non-zero PPRM coefficients of order zero, one, and two, and the number of 1s in the binary representation of their vector index is less than or equal to two. Therefore, the PPRM spectrum of each bent function of four variables can be represented by

    S_{f,RM} = [−, −, −, −, −, −, −, ×, −, −, −, ×, −, ×, ×, ×]^T

where the possible positions of the non-zero coefficients are denoted by dashes and the restricted positions are denoted by ×.

The algorithm for the generation of bent functions in the Reed-Muller domain takes as its input the number of function variables and the minimum and maximum number of non-zero Reed-Muller coefficients of any order that is allowed [131]. These restrictions make the generation feasible, since they certainly reduce the possible search space for random generation in the Reed-Muller domain. An outline of the algorithm for the generation of bent functions in the Reed-Muller domain is given as Algorithm 1.

Algorithm 1. Random generation of bent functions
 1: Set the number of function variables n and the minimum and maximum number of non-zero PPRM coefficients (the maximum number of coefficients should be less than the maximal number of non-zero PPRM coefficients);
 2: Randomly generate the number of non-zero coefficients in the PPRM spectrum (between minimum and maximum);
 3: Randomly generate the positions of the non-zero coefficients in the PPRM spectrum (the number of 1s in the binary representation of the spectrum vector index of each coefficient is less than or equal to n/2);
 4: Compute the truth vector of the Boolean function using the fast Reed-Muller transform algorithm of the Cooley-Tukey class;
 5: Apply the (1, −1) encoding to the Boolean function;
 6: Test the first Walsh coefficient for bentness: continue if it has the absolute value 2^{n/2}, otherwise go to step 2;
 7: Test the second Walsh coefficient for bentness, otherwise go to step 2;
 8: Test the middle Walsh coefficient for bentness, otherwise go to step 2;
 9: Test all Walsh coefficients for bentness; the Walsh spectrum is computed using the fast Walsh transform algorithm of the Cooley-Tukey class; if any coefficient fails, go to step 2;
10: Output the random bent Boolean function.

It is possible to fix the number of non-zero PPRM coefficients for some orders and have a varying number of coefficients for others. Due to the above limitation, this algorithm does not generate all possible bent functions with equal probability [131].
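For illustration, a simplified sequential Python sketch of Algorithm 1 for n = 4; each GPU thread later performs exactly this kind of independent trial. All names are ours, and the early-exit tests on individual Walsh coefficients (steps 6 to 8) are folded into a single full-spectrum check for brevity:

```python
import random

def reed_muller(vec):
    # Fast Reed-Muller transform (XOR butterfly); self-inverse over
    # GF(2), so it maps PPRM spectra to truth vectors and back (step 4).
    a = list(vec)
    step = 1
    while step < len(a):
        for i in range(len(a)):
            if i & step:
                a[i] ^= a[i ^ step]
        step <<= 1
    return a

def walsh(truth):
    # Fast Walsh transform of the (1, -1)-encoded function (step 5).
    a = [(-1) ** t for t in truth]
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

def random_bent(n, cmin, cmax, rng):
    # Step 3's restriction: only indices with at most n/2 ones.
    allowed = [m for m in range(2 ** n) if bin(m).count("1") <= n // 2]
    while True:
        count = rng.randint(cmin, cmax)           # step 2
        pos = set(rng.sample(allowed, count))     # step 3
        spectrum = [int(m in pos) for m in range(2 ** n)]
        truth = reed_muller(spectrum)             # step 4
        # Bent iff every Walsh coefficient has absolute value 2^(n/2).
        if all(abs(w) == 2 ** (n // 2) for w in walsh(truth)):
            return truth                          # steps 6-10

f = random_bent(4, 1, 5, random.Random(7))
```

The returned truth vector satisfies the bentness criterion, and its weight is 2^{n−1} ± 2^{n/2−1}, i.e., 6 or 10 for n = 4.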

2.2.4. Implementation of Random Generation of Bent Functions on a GPU Platform

GPUs are an attractive target for computations because of their high performance and low cost. Recent generations of GPUs have become programmable, enabling their use for general-purpose computations [388]. Figure 2.10 presents the high-level hardware architecture of a modern GPU, including the processors and the memory hierarchy. A modern GPU consists of a large number of scalar processor cores (denoted by the letter 'C' in Figure 2.10) that can execute the same program in parallel, using many threads. Core processors are grouped together into multiprocessors. Each multiprocessor has several thread contexts, and at any given moment, a group of threads, called a block, executes on the multiprocessor. When a group of threads becomes idle, memory latencies and pipeline stalls are avoided primarily by switching to another group of threads. Each multiprocessor also has a small amount of shared memory that can be used for communication between threads. The GPU memory hierarchy is designed for high bandwidth to the global GPU memory that is accessible to all multiprocessors [388].

At a high level, computations on the GPU proceed as follows: the user allocates memory on the GPU, copies the data to it, specifies a program that executes on the multiprocessors, and, after execution, copies the data back to the host. The thread execution manager assigns threads to operate on the blocks and write the output to the global memory [388].

CUDA is a parallel computing framework and programming model created by NVIDIA and implemented on the GPUs that they produce. CUDA gives program developers direct access to the virtual instruction set and memory of the parallel computational elements. CUDA C/C++ is based on the ISO C99 standard for the C programming language, with certain restrictions (e.g., recursion is not allowed) and special extensions for writing massively parallel programs. Each CUDA program consists of two main parts [388]:

• a host program; and
• a device program.

The host program, which is executed on the CPU, implements sequential tasks, e.g., creating the context for the execution of kernels and controlling kernel processing. The host code is typically written in C/C++. The device program, which is typically processed on the GPU, implements the parallel parts of an algorithm as data-parallel functions called kernels.

The algorithm for random generation of bent functions in the Reed-Muller domain has a large degree of parallelism. Steps 2 to 9 of


Figure 2.10. High-level architecture diagram of the GPU: processors organized into multiprocessors and memory hierarchy.

Algorithm 1 are computationally independent and can therefore be executed simultaneously, which makes the algorithm convenient to implement on the GPU platform. In this mapping, each thread performs an independent trial of the random generation of a bent function in the Reed-Muller domain. It should be noticed that the random generation of bent functions can be very time-consuming, since the computations of the fast Reed-Muller transform (step 4 of Algorithm 1) are exponential in the number of variables of the function. Figure 2.11 shows a listing of the host program that launches the proposed algorithm on the GPU. The host program performs the sequential tasks, while the parallel GPU


    hfound = 0;
    cudaMalloc(&dfound, sizeof(int));
    while (!hfound) {
        randombent<<<gridDim, blockDim>>>(dfound, n, size, d1, d2);
        cudaStatus = cudaDeviceSynchronize();
        cudaMemcpy(&hfound, dfound, sizeof(int),
                   cudaMemcpyDeviceToHost);
    }
    cudaFree(dfound);

Figure 2.11. Host program to launch the GPU program: randombent.

computations are implemented using the randombent function call. The repetition of trials of the random generation of bent functions is implemented using a while loop, as described in Figure 2.11. The variables hfound (host variable) and dfound (device variable) indicate whether a bent function has been discovered. Note that gridDim and blockDim are special objects provided by CUDA for defining the geometry of the GPU threads. The variable n stores the number of variables of the function, size is the length of the PPRM spectrum, and d1, d2 are the minimum and maximum numbers of non-zero PPRM coefficients.

The efficient implementation of the random generation of bent functions on the GPU platform for functions with more than 10 variables differs substantially from the implementation for functions with 10 or fewer variables. In the considered mapping of the sequential program, the randomly generated spectrum for functions with 10 or fewer variables can be stored in the high-speed shared GPU memory local to the processor cores (this is a limitation of the GPU memory hierarchy). The generated PPRM spectrum for functions with more than 10 variables must be stored in the global GPU memory. An additional step in the parallelization of this algorithm on the GPU platform is the mapping to processor cores of the arrays representing the PPRM spectra that are used for the random generation of functions as well as for bentness detection. Threads with consecutive global identifiers access consecutive locations in the global GPU memory. Thus, the total number of concurrent threads is limited by the size of the global GPU memory. The GPU implementation uses shared GPU memory for the data transfer. Note that a GPU implementation using the


global GPU memory has not been realized because the comparison of random generation of bent functions between the CPU and GPU for functions with more than 10 variables has not been performed.

2.2.5. Comparison of Random Generation of Bent Functions on CPU and GPU Platforms

CUDA gives program developers direct access to the virtual instruction set and memory of the parallel computational elements in NVidia GPUs. For the purposes of comparison, a single-core CPU C++ implementation and a CUDA implementation on a GPU platform of the algorithm for the random generation of bent functions were developed. The single-core CPU implementation runs on an Intel i7 CPU at 3.66 GHz with 12 GB of RAM. The CUDA implementations run on two GPUs: a low-cost NVidia GeForce GT420 at 700 MHz with 2 GB DDR3 900 MHz RAM (4 multiprocessors and 48 cores) and a more expensive NVidia GeForce GTX560Ti at 900 MHz with 1 GB DDR5 4 GHz RAM (8 multiprocessors and 384 cores). The generation of random numbers is different for each of the threads, using the CUDA function curand_uniform.

Table 2.7 shows the performance of the algorithm for the random generation of one bent function using the single-core CPU C++ implementation and the CUDA implementation on the two GPU platforms. The presented computation times are average values over ten executions for each number of function variables and each minimum and maximum number of non-zero PPRM coefficients. From the data in Table 2.7, it can be seen that, on these two GPU platforms, for all the computations, the CUDA implementation of the algorithm significantly reduces computation times (by about 100 up to 1,000 times) in comparison to the CPU implementation. Note that for large Boolean functions the experiments are not shown, due to the CPU implementation computation time limit of 30 minutes. It should be noticed that randomly discovering a bent function can be a computationally intensive task and the processing time is often the limiting factor for practical applications. However, computing power can be substantially increased through the exploitation of the


Table 2.7. Performance of the algorithm for random generation of one bent function on a CPU and two GPUs

    number of | non-zero RM coefficients |      average computation time [sec]
    variables | [min - max]              | CPU       | GT420 GPU | GTX560 GPU
    ----------+--------------------------+-----------+-----------+-----------
        8     | 1 - 100                  | 1.443     | 0.036     | 0.0016
        8     | 1 - 163                  | 1.792     | 0.038     | 0.0016
       10     | 1 - 100                  | 57.201    | 0.045     | 0.0018
       10     | 1 - 200                  | 32.103    | 0.043     | 0.0017
       10     | 1 - 300                  | 27.883    | 0.041     | 0.0017
       10     | 1 - 400                  | 39.751    | 0.044     | 0.0017
       12     | 1 - 100                  | > 30 min. | > 30 min. | > 30 min.

parallelism by mapping the algorithm to a GPU platform. The CUDA implementation of this algorithm is convenient since the algorithm has a large degree of parallelism. The experimental results confirm that the application of the proposed implementation on the GPU platform leads to significant computational speedups. The implementation on the GPU platform proved to be especially efficient for the generation of bent functions with 10 or fewer variables due to the high-speed local GPU memory usage. From these experimental results, it is evident that the GPU platform can be efficiently used in the parallel implementation of the algorithm for the random generation of bent functions in the Reed-Muller domain. It is also shown that this implementation is able to generate bent functions of 10 variables within a few milliseconds, although the number of these functions is still unknown. Therefore, the proposed technique could widen the area of practical applications of randomly generated bent functions. Future work can be focused on the extension of the proposed technique and algorithm to various other classes of Boolean functions.


2.3. Multi-GPU Approximation for Silent Data Corruption of AN Codes

Matthias Werner, Till Kolditz, Tomas Karnagel, Dirk Habich, and Wolfgang Lehner

2.3.1. Error Detection and Correction

Error Detection Codes (EDCs) and Error Correction Codes (ECCs) have been widely studied in theory and applied in practice [203, 219, 249]. Linear codes such as Hamming codes are easy to implement and offer low coding overheads and constant decoding times. They have well-defined, but limited, detection and correction capabilities due to their algebraically determined structure [203, 249]. Efficient decoding algorithms have been implemented in hardware, and single-error-correcting, double-error-detecting Hamming codes are nowadays used in server-grade main memory, called ECC Dynamic Random Access Memory (DRAM). AN coding [120], being one representative of non-linear codes, only offers efficient error detection capabilities. AN codes are arithmetic codes, where the letters "AN" refer to the integer values A and N. The integer constant A encodes the data words N, which are represented as integers. Since AN codes are non-systematic, error correction would have to be done in a brute-force manner, and more efficient correction is an open problem. However, some non-linear codes provide better reliability than any linear code with the same parameters [247]. While linear codes are typically used in communication systems, AN codes have been used to detect errors in hardware itself [148, 249, 281], and current research advocates using them in main-memory database systems for bit flip detection as well [180]. A code is defined by C = Im(φ) ⊆ B^n with the injective function φ : B^k → B^n, n > k > 1, over the Boolean set B = {0, 1} [132]. A given data word x ∈ B^k is mapped to a u ∈ C, which can be decoded back to the original data word as illustrated in Figure 2.12. The reliability


Figure 2.12. Encoding of data words and decoding of code words.

of a code depends on how many bit flips can be detected or even corrected after an error-prone transmission of a code word. The r-bit sphere of a code word vector u ∈ C contains all vectors v ∈ B^n with a Hamming distance (see Definition 2.23) d_H(u, v) ≤ r [203, p. 11]. The r-bit spheres do not overlap if the minimum distance d between all different code words is d > 2r (for the case d = 2r see [203, p. 10]). If at most r bit flips occur during a transmission, an ECC is able to correct such an invalid code word. For instance, Hamming [141], as a linear code with d = 3, is able to either detect double bit flips or correct single bit flips. Extended Hamming allows both at the same time (d = 4) by using an additional parity bit. When more than r bits flip, this results either in an uncorrectable invalid code word or, even worse, in a code word within the bit sphere of another code word. The latter results in a decoding error, where the decoder returns a data word without being aware of the error, which is referred to as Silent Data Corruption (SDC). Recent studies suggest that the single bit flip error model will become obsolete, while multi-bit flips already lead to uncorrectable data corruption [51, 108, 152, 290]. It is assumed that both the rate and the number of bits flipped will increase with future transistor technologies [108].

In the context of the database domain, where users typically expect accurate results, SDC can have a severe impact on all tasks of database systems. For query processing as an example, joins may be incomplete when tuples' join attributes are altered, or filtered scans may have missing or even additional tuples when the filtered values contain bit flips. In short, SDC leads to false positives and false negatives in query (intermediate) results.

In contrast to typical channel-based considerations of EDCs or ECCs, we try to understand the robustness of codes being used for storing data in a computer system's main memory within the scope of in-memory database systems. There, all important business data are stored in the main memory (DRAM). The problem here is that the probability of errors depends at least on the hardware (DRAM) technology:

• each memory cell's susceptibility to external influences, such as cosmic rays, heat, voltage fluctuations, electrical crosstalk, etc.;
• each cell's degradation due to aging; as well as
• the time between successive writes and reads.

Furthermore, it is not yet clear whether bit flips will be independent or not (burst errors, ...). Like Forin [120], we argue that, currently, it is not feasible to assume a specific error model. Earlier, we proposed AN codes as a software coding approach for in-memory database systems [180], showing that they come at little or no cost in terms of throughput. AN codes are envisioned as complementary, or even an alternative, to hardware ECC with Hamming codes in the area of database systems, to better detect multi-bit flips in main memory. They can be an alternative because database systems inherently store all data redundantly anyway (materialized views and indexes, recovery log, ...).

AN codes map data to code words by multiplying with the factor A. To find good values of A, the probabilities of SDC have to be taken into account. The biggest problem is that, for each data width, each A behaves differently. Here, brute-force attempts were made for data widths of up to 16 bits [148, 281], where basically all code words were checked against all possible error patterns. These previous approaches are insufficient: on the one hand, in-memory database systems operate on larger native data widths of currently up to 64 bits. On the other hand, the computational effort can be reduced if only the closest pairs of code words are examined.
For AN codes the complexity of brute force is O(2^{2k}) [132]. An instance with k = 32 would take 213 days assuming a modern multi-GPU node with 10^12 operations/s. For such instances, sampling methods like Monte Carlo and lattice points are investigated, aiming for complexity reduction at a reasonable cost of accuracy.
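The 213-day figure follows directly from these numbers; a quick Python check:

```python
# Back-of-the-envelope check of the brute-force estimate for k = 32:
# O(2^(2k)) code word pair checks at 10^12 operations per second.
pairs = 2 ** (2 * 32)
seconds = pairs / 10 ** 12
days = seconds / 86400   # 86400 seconds per day
```

This evaluates to roughly 213 days, matching the estimate in the text.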


Since this task is highly parallelizable, Graphics Processing Units (GPUs) are used for acceleration. Value ranges of A and their probabilities of SDC for data widths up to 24 bits are examined. Runtimes and the computation of optimal values of A are shown for data widths up to 32 bits, using approximation and exact algorithms where feasible. Regardless of a specific error model, a methodology for determining the probability of SDC for coding schemes in general is provided. For describing and evaluating the algorithms, the following definitions are introduced:

Definition 2.23. The bit width of data words is assumed to be k ≥ 2:

• let B = {0, 1} be the Boolean set; B^k is the k-fold Cartesian product of B;
• for x ∈ B^k the function wt(x) returns the number of bits with the value 1 (the weight of x);
• d_H(x, y) = wt(x ⊕ y) is the Hamming distance with the bitwise XOR operator ⊕;
• δ_b(x, y) defines the indicator function for 0 ≤ b ≤ n:

      δ_b(x, y) = 1 if d_H(x, y) = b, and 0 if d_H(x, y) ≠ b ;

• given a positive integer value A and h = n − k = log2(A), the AN code is defined by

  C_A = {A · x ∈ B^n : x ∈ B^k} ,

where n is the width of the code words, k is the width of the data words of C_A, and x is used as an integer in the multiplication with A.

The value A is chosen to be an odd number, because even values of A are merely left-shifted odd ones [281, p. 94]. The AN code C_A preserves the code words with respect to addition, but not with respect to multiplication, due to the inequality (A · x1) · (A · x2) ≠ A · (x1 · x2) [249, p. 103]. Therefore, AN codes are non-linear. They are also non-systematic, which means that the data word is not literally embedded in the code word. A code word pair (u, v) with d_H(u, v) = b describes an undetectable b-bit flip, since u, v ∈ C_A. For computing the SDC probability, the Hamming distances between all code word pairs have to be enumerated, yielding the distance distribution [132].

Definition 2.24. c^A_b denotes the distance distribution of the code C_A:

  c^A_b = |{(u, v) ∈ C_A² : d_H(u, v) = b}| ,

where |S| denotes the cardinality of the set S.

For a code word of length n there can be (n choose b) different b-bit flips, because the order in which the bits flip is unimportant. Each c^A_b is related to the number 2^k · (n choose b), which represents all possible b-bit patterns over all code words in C_A, yielding the SDC probability p^A_b for b-bit flips:

  p^A_b = c^A_b / (2^k · (n choose b)) .   (2.15)

If A = 1, or no encoding takes place, then n = k, c_b = 2^k · (k choose b), and p_b = 1 are obtained.
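Definition 2.24 and Equation (2.15) can be checked directly by a small brute-force computation (a sketch; the function names are ours, and the code word width n is taken as the bit length of the largest code word):

```python
from math import comb

def distance_distribution(A: int, k: int):
    """Enumerate c_b over all ordered code word pairs of C_A."""
    code = [A * x for x in range(2 ** k)]
    n = max(code).bit_length()          # width of the longest code word
    c = [0] * (n + 1)
    for u in code:
        for v in code:
            c[bin(u ^ v).count("1")] += 1
    return n, c

def sdc_probabilities(A: int, k: int):
    """p_b = c_b / (2^k * (n choose b)) for b = 0, ..., n (Eq. 2.15)."""
    n, c = distance_distribution(A, k)
    return [c[b] / (2 ** k * comb(n, b)) for b in range(n + 1)]

# With A = 1 (no encoding) every b-bit flip maps a code word onto another
# code word, so p_b = 1 for every b, as stated below Equation (2.15).
assert all(p == 1.0 for p in sdc_probabilities(1, 4))
# With the odd factor A = 3, no single-bit flip can be masked: p_1 = 0.
assert sdc_probabilities(3, 4)[1] == 0.0
```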

2.3.2. Computing Distance Distribution of AN Codes

By using Definition 2.23 the distance distribution can be written as:

  c^A_b = Σ_{x ∈ B^k} Σ_{y ∈ B^k} δ_b(A · x, A · y) .   (2.16)

Equation (2.16) can also be considered as a histogram of the Hamming distances d_H with b = 0, . . . , n, where δ_b is the indicator function only counting Hamming distances of value b; see Definition 2.23. It should be noted that Equation (2.16) also contains the distance distributions of lower code word widths. AN codes are non-linear, i.e., (A · x) ⊕ (A · y) ≠ A · (x ⊕ y). The two nested sums cannot simply be reduced to a single sum, unlike for Hamming codes, where the distance distribution can be computed directly from the weight enumerator [203, 125ff.]. Hence, the complexity of (2.16) remains O(4^k). In addition, the algorithm runs thousands of times with different values of A, striving for a profound parameter analysis. Therefore, larger dimensions are tackled by an approximation.

To approximate the distance distribution, only certain parts of the sum iterations are computed. The number of parts is referred to as the number of iterations M. The Monte-Carlo method achieves this by using random samples over a domain Ω to estimate the definite integral of a function f:

  (V / M) · Σ_{i=1}^{M} f(x_i) ≈ ∫_Ω f(x) dx ,   V = ∫_Ω dx .

The sample points x_i cover Ω as the number of iterations M grows:

  ĉ^A_b = (2^k · 2^k) / (N · M) · Σ_{r=1}^{N} Σ_{s=1}^{M} δ_b(A · σ1(r), A · σ2(s)) ≈ c^A_b .   (2.17)

Equation (2.17) estimates c^A_b by using sample streams σ1 and σ2. The samples are generated either by σ_pseudo of pseudo-random numbers or by σ_quasi of quasi-random numbers. Pseudo-random numbers are prone to clustering, while quasi-random numbers fill the space more uniformly. The probabilistic error of Monte-Carlo is known to be O(1/√M), and for quasi-Monte-Carlo it is O((log M)^q / M), with q being the number of dimensions [189]. Another approach for approximation is given by grid point sampling, which is defined by σ_grid(r) = 2^k · r / M. It will turn out that M should be an odd value for better convergence. If M = 2^k then the grid sampling yields the correct result, while the random numbers still miss the solution due to collisions and gaps.

Algorithm 2 shows the generic procedure for enumerating the Hamming distances of all code word pairs. The distance distribution, or histogram, c^A consists of n integers holding the number of occurrences of the corresponding Hamming distances d_H given in (2.16). Line 1 and line 2 correspond to the nested sums. Then line 3 computes the Hamming distance to get the histogram position b for the increment in line 4. For 1D sampling, line 2 is replaced by y = σ(r), r = 0, . . . , M. For 2D sampling², line 1 is then replaced by x = σ(s), s = 0, . . . , M.

² For 2D sampling only σ_grid has been implemented.
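The claim that grid sampling with M = 2^k reproduces the exact distribution can be verified with a small sketch (our own naive rendering of Equation (2.17) with σ_grid used in both dimensions):

```python
def exact_histogram(A, k, n):
    """Full enumeration of the distance histogram, complexity O(4^k)."""
    c = [0] * (n + 1)
    for x in range(2 ** k):
        for y in range(2 ** k):
            c[bin((A * x) ^ (A * y)).count("1")] += 1
    return c

def grid_histogram(A, k, n, M):
    """Thin both sums to M grid points sigma_grid(r) = 2^k * r / M and
    rescale by (2^k * 2^k) / (M * M), in the spirit of Eq. (2.17)."""
    sigma = [2 ** k * r // M for r in range(M)]
    c = [0] * (n + 1)
    for x in sigma:
        for y in sigma:
            c[bin((A * x) ^ (A * y)).count("1")] += 1
    scale = (2 ** k * 2 ** k) / (M * M)
    return [scale * v for v in c]

A, k, n = 3, 6, 8
# M = 2^k makes the grid enumerate every data word, so the estimate is exact.
assert grid_histogram(A, k, n, 2 ** k) == [float(v) for v in exact_histogram(A, k, n)]
```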


Algorithm 2. Computing distance distribution of AN codes
  Input: k ≥ 2, A > 0, n = k + h + 1, h = log2(A)
  Input: Initial distance distribution c^A_b = 0, b = 0, . . . , n − 1
  Output: Distance distribution c^A of code C_A
  1: for x = 0, . . . , (2^k − 1) do            ⊳ loop is parallelized on GPU(s)
  2:   for y = 0, . . . , (2^k − 1) do          ⊳ loop is processed by each thread
  3:     b ← d_H(A · x, A · y)
  4:     c^A_b ← c^A_b + 1
  5:   end for
  6: end for
  7: return c^A

For simplification, the exact algorithm exploits the symmetry of the code word pairs and starts at y = x + 1 in line 2 (then line 4 is an increment of two and c^A_0 is directly computed by c^A_0 = 2^k). The outer loop is mapped in equal workloads onto the GPUs. If the symmetry of the code word pairs is applied, then the inner loop iterations decrease linearly in each step. To distribute equal workloads across all GPUs, the workload size of GPU i is computed by

  w_i = ⌊2^k · ω_{i+1}⌋ − ⌊2^k · ω_i⌋ ,   ω_i = 1 − √(1 − i/N) ,   0 ≤ i < N ,

where N is the number of GPUs. The SDC probabilities are then obtained from the c_b with b > 0, where b = 0 is omitted due to c^A_0 = 2^k.
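The square-root workload split can be checked numerically: with ω_i = 1 − √(1 − i/N), GPU i processes the outer-loop rows from ⌊2^k ω_i⌋ up to (but excluding) ⌊2^k ω_{i+1}⌋ of the triangular, symmetry-exploiting variant, and every GPU receives almost exactly 1/N of the inner-loop iterations (a sketch under this interpretation of the formula):

```python
from math import floor, sqrt

def row_bounds(k: int, N: int):
    """Outer-loop split points derived from omega_i = 1 - sqrt(1 - i/N)."""
    return [floor(2 ** k * (1 - sqrt(1 - i / N))) for i in range(N + 1)]

def inner_iterations(lo: int, hi: int, k: int) -> int:
    """Triangular work of rows lo..hi-1: row x runs y = x+1, ..., 2^k - 1."""
    return sum(2 ** k - 1 - x for x in range(lo, hi))

k, N = 16, 4
b = row_bounds(k, N)
shares = [inner_iterations(b[i], b[i + 1], k) for i in range(N)]
total = 2 ** k * (2 ** k - 1) // 2
assert sum(shares) == total                   # nothing is lost at the seams
ideal = total / N
assert all(abs(s - ideal) / ideal < 0.01 for s in shares)   # balanced to < 1 %
```

The square root compensates for the linearly shrinking inner loop: early rows carry more work, so early GPUs receive fewer rows.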

Algorithm 2 can be parallelized on the GPU, as the Hamming distances of two code words can be computed independently. The Compute Unified Device Architecture (CUDA) framework is used for programming.

    // CUDA kernel of Algorithm 2, symmetric variant. The tail of this listing
    // was not preserved in the source; the inner-loop body and the final
    // histogram merge are reconstructed from the description of Algorithm 2.
    template <typename UINT, int HIST_SIZE>
    __global__ void ancoding(
        UINT A,            // value A of AN code
        uint64* hist,      // histogram buffer
        UINT offset,       // where outer loop starts
        UINT end,          // where outer loop ends
        UINT Aend)         // A*2^k
    {
        UINT hist_local[HIST_SIZE] = { 0 };
        UINT v, w, i;
        // grid-striding loop of data words
        for (i = blockIdx.x*blockDim.x + threadIdx.x + offset;
             i < end;
             i += blockDim.x*gridDim.x)
        {
            // encode data word i to code word w
            w = A*i;
            // loop succeeding data words
            for (v = w + A; v < Aend; v += A)
                hist_local[__popcll(v ^ w)] += 2;   // pairs (w,v) and (v,w)
        }
        // merge the thread-local histogram into global memory
        for (i = 0; i < HIST_SIZE; ++i)
            atomicAdd(&hist[i], (uint64) hist_local[i]);
    }

  · · · z_i .   (4.8)

The analysis of the number of x's that mask Equation (4.8) is carried out for each of the possible cases.

Case I: e_x = 0. If e_u = 0, no error has occurred. However, if e_u ≠ 0, Equation (4.8) becomes u ⊕ e_u = u and is never satisfied; therefore the error is always detected.

Case II: e_x ≠ 0. The error masking probability depends on the values of e_y and e_z as follows. If e_y = 0 and e_z = 0, Equation (4.8) becomes the error masking equation of the QS code; that is, Q_TS(e) ≤ q^(−r). However, in the case where e_y = 0 and e_z ≠ 0, the error masking probability is determined by the number of non-zero e_{z,i}'s:


• if there is a single 1 ≤ i1 ≤ 3 with e_{z,i1} ≠ 0, Equation (4.8) becomes

  z_{i2} z_{i3} e_{z,i1} = e_u .   (4.9)

If e_u = 0, Equation (4.9) is satisfied if z_{i2} or z_{i3} or both equal 0. As a result,

  Q_TS(e) = (q^(r+1) − 1) q^(k−2r) / q^k = q^(−r+1) − q^(−2r) .

However, if e_u ≠ 0, for each z_{i2} ≠ 0 there is a single z_{i3} that satisfies Equation (4.9). As a result,

  Q_TS(e) = (q^r − 1) q^(k−2r) / q^k = q^(−r) − q^(−2r) ;

• if there are exactly two indices 2s−1 ≤ i1, i2 ≤ 2s+1 with e_{z,i1}, e_{z,i2} ≠ 0, Equation (4.8) becomes

  z_{i2} z_{i3} e_{z,i1} ⊕ z_{i1} z_{i3} e_{z,i2} ⊕ z_{i3} e_{z,i1} e_{z,i2} = e_u .   (4.10)

Using the variables w1 = z_{i1} e_{z,i2} e_{z,i3}, w2 = z_{i2} e_{z,i1} e_{z,i3}, w3 = z_{i3} e_{z,i1} e_{z,i2}, and t = e_{z,i1}^(−1) e_{z,i2}^(−1) e_{z,i3}^(−1), Equation (4.10) becomes

  w3 (t · w1 ⊕ t · w2 ⊕ 1) = e_u .   (4.11)

Here again, if e_u = 0, Equation (4.11) is satisfied if w3 = 0 or if t · w1 ⊕ t · w2 ⊕ 1 = 0. As a result,

  Q_TS(e) = (q^r (q^r − 1) + q^(2r)) q^(k−3r) / q^k = q^(−r+1) − q^(−2r) .

However, if e_u ≠ 0, for each w3 ≠ 0 and for each w1 there exists a single w2 such that Equation (4.11) is satisfied. As a result,

  Q_TS(e) = (q^r − 1) q^r q^(k−3r) / q^k = q^(−r) − q^(−2r) ;

Low Complexity High Rate Robust Codes


• if for all 2s−1 ≤ i1 ≠ i2 ≠ i3 ≤ 2s+1 we have e_{z,i1}, e_{z,i2}, e_{z,i3} ≠ 0, Equation (4.8) becomes

  w1 (t (w2 ⊕ w3) ⊕ 1) ⊕ t w2 w3 ⊕ w2 ⊕ w3 = e_u · t^(−1) .   (4.12)

For a given pair of w2 and w3, if (t (w2 ⊕ w3) ⊕ 1) ≠ 0, there is a single w1 that satisfies Equation (4.12). If (t (w2 ⊕ w3) ⊕ 1) = 0, the value of w1 has no impact on satisfying the equation; hence,

  Q_TS(e) ≤ (q^r (q^r − 1) + q^(2r)) q^(k−3r) / q^k = q^(−r+1) − q^(−2r) .

Finally, if both e_y ≠ 0 and e_z ≠ 0, then for each value of y Equation (4.8) becomes Equation (4.12). As a result, Q_TS(e) ≤ q^(−r+1) − q^(−2r).

In conclusion, the error masking probability Q_TS(e) of the TS code for any non-zero error e is smaller than q^(−r+1) − q^(−2r).

Table 4.1 summarizes the properties of the robust binary codes. Characteristics of the known Quadratic-Sum (QS) code [167] as well as the Punctured-Cubic (PC) code [224] are compared with the new Shortened Quadratic-Sum (SQS) code and Triple-Sum (TS) code explored in this section.

Table 4.1. Comparison of binary high rate robust separable codes

  code      | n           | k         | r                                 | QC                  | computation optimum | field
  QS [167]  | (2s+1)r     | 2sr       | k/(2s)                            | 2^(−r)              | yes                 | GF(q^r)
  SQS [*]   | (2s+1)r − l | 2sr − l   | (k+l)/(2s), and r > 1             | 2^(−r+l/(2s))       | no                  | GF(q^r)
  TS [*]    | (2s+2)r     | (2s+1)r   | k/(2s+1)                          | 2^(−r+1) − 2^(−2r)  | no                  | GF(q^r)
  PC [224]  | k+r         | k         | 1 ≤ r ≤ k if k is odd, and r > 1  | 2^(−r+1)            | no                  | GF(q^k)

  [*] means that these characteristics are derived for the first time in this section

Table 4.2 compares the same characteristics for non-binary high rate robust separable codes. The known Quadratic-Sum (QS) code [167] and Punctured-Square (PS) code [4] are compared with the new Shortened Quadratic-Sum (SQS) code and Triple-Quadratic-Sum (TQS) code explored in this section.

Table 4.2. Comparison of non-binary high rate robust separable codes

  code      | n           | k         | r                      | QC              | computation optimum | field
  QS [167]  | (2s+1)r     | 2sr       | k/(2s)                 | q^(−r)          | yes                 | GF(q^r)
  SQS [*]   | (2s+1)r − l | 2sr − l   | (k+l)/(2s), and r > 1  | 2^(−r+l/(2s))   | no                  | GF(q^r)
  TQS [*]   | (2s+2)r     | (2s+1)r   | k/(2s+1)               | q^(−r)          | yes                 | GF(q^r)
  PS [4]    | k+r         | k         | 1 ≤ r ≤ k              | q^(−r)          | yes                 | GF(q^k)

  [*] means that these characteristics are derived for the first time in this section


4.2. Synthesis Techniques for Reliability Using Bi-Decompositions

Bernd Steinbach, Emanuel Popovici, Daniel Rodas Bautista, Bo Yang

4.2.1. Methods to Improve the Reliability of Circuits

Logic synthesis for reliability, and particularly for low power consumption, is a widely studied subject, with many tools being developed throughout industry and academia. While the industry tools favor heuristics, in academia the focus is more on generic approaches for finding optimal solutions. Fault-tolerant techniques for improving the reliability of digital circuitry have been of interest for a long time. Von Neumann [379] first introduced a classification for this error type and proposed solutions based on multiplexing techniques as early as 1956. Important work to cross the field of circuit design with the knowledge of error correction theory has been done by Taylor [359, 360]; this used Low Density Parity Check (LDPC) codes, faulty memories, and logic to build fault-tolerant architectures for reliable systems. Taylor's approach has been the basis for numerous other works [140, 183, 376], Check Symbols Generation [218, 365], and the Parity Prediction Function [177, 208, 308], where circuitry is added to a combinatorial network to generate an extra bit to ensure parity.

State-of-the-art synthesis flows aim at optimizing the performance, area, or power of a given logic function. During the synthesis process, the redundancy which is inherently captured in its truth table is reduced or indeed removed, hence reducing the reliability of the circuit. One can conclude that the most reliable function is represented by the sum of its constituent min-terms. However, this non-optimized function leads to a very large area/delay/power overhead, given its exponential number of min-terms. From a gate level perspective, there are two methodologies to improve the reliability of a given circuit while maintaining a minimal overhead.


The first method, as already presented in [129], is reliability optimization using combinatorial methods and graph manipulations. It was shown that these lead to relatively modest reliability improvements, but with a minimal area/delay/power overhead [129]. While the reliability is improved, the circuit is still not fault-tolerant. In this methodology, the number of outputs (or function associated truth table size) is not increased. Instead, the graph is transformed through a sequence of applications of a number of rules in order to increase its reliability. A second method of adding redundancy is by modifying the size of the function associated truth table. In [130, 241], a new methodology called Code Prediction Encoding (CPE) was introduced with the view of extending the logic network with a parity network, hence increasing the number of outputs of the circuit. The focus of this approach is not on changing the combinational logic but on augmenting it. It enables the retrieval of the correct output even if errors have occurred. The process is reminiscent of the process of adding redundancy to the transmitted signal in a communication system. It exploits the afferent topology of the original circuit, which is expanded by embedding an Error Correction Code (ECC) into its logical functionality. The embedded ECC has a limit to the number of errors it can correct, referred to as error correction capacity. Hence, if there are more errors than the number of errors that can be corrected we get an increase in the error rate. Any code could be used, hence the generic nature of the technique. This technique was successfully applied to improve the reliability of XOR-only logic networks (or linear circuits in our concept) in the context of LDPC encoding [100]. However, the majority of circuits in practice have a non-linear nature as they are built with arbitrary logic gates and hence they are less amenable for a CPE technique as remarked on in [241]. 
Following the rationale of the CPE approach, this contribution summarizes relevant knowledge regarding the separation of a combinatorial circuit into a linear and a non-linear part. The linear part can then be used in context of CPE. The application of recently published ideas of the vectorial bi-decomposition [326, 347] extends the known approaches of strong and weak bi-decompositions [44, 217, 243, 328, 329, 334, 342] and leads to a partition of circuit parts that contain

Figure 4.4. Preferred architecture for circuits with a high reliability: the inputs x feed a non-linear part, followed by a linear part that produces the output f.

either only non-linear AND gates or only linear XOR gates. This is particularly interesting as it is known that an AND-XOR network results in a much better realization and requires fewer product terms than a more classical AND-OR realization. We show that, using our decomposition methods, one can derive the Reed-Muller form of a circuit, which is known to be a hard problem. Another contribution of this work is the quantification of the efficiency of linearization by introducing the notion of the degree of linearity. Adders are some of the most used non-linear logic functions in practice, and numerous architectures were proposed to implement them. We show that even in the case of the symmetric carry function, for which no strong bi-decomposition exists [44, 217], a decomposition into a linear output part and a non-linear input part could be designed. We apply the described bi-decomposition at bit level, block level, and unit level on an adder architecture and analyze the results in terms of area, delay, and power on two technologies, namely Application-Specific Integrated Circuits (ASICs) and Field Programmable Gate Arrays (FPGAs).
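For small functions, the Reed-Muller (AND-XOR) form mentioned above can be obtained with the fast Möbius transform; the sketch below, our own illustration rather than the chapter's tool flow, derives the positive-polarity Reed-Muller form of the adder carry:

```python
def anf(truth):
    """Positive-polarity Reed-Muller coefficients of a Boolean function,
    computed with the fast Moebius (XOR butterfly) transform."""
    a = list(truth)
    n = len(a).bit_length() - 1
    for j in range(n):
        for i in range(len(a)):
            if i & (1 << j):
                a[i] ^= a[i ^ (1 << j)]
    return a

# carry = maj(x, y, c), truth table indexed by m = 4c + 2y + x
carry = [(x & y) | (x & c) | (y & c)
         for c in (0, 1) for y in (0, 1) for x in (0, 1)]
# the AND-XOR (Reed-Muller) form is xy XOR xc XOR yc: coefficient 1 exactly
# at the monomials {x,y}, {x,c}, {y,c}
assert anf(carry) == [0, 0, 0, 1, 0, 1, 1, 0]
```

The three product terms of the result match the AND-XOR realization of the carry discussed later in this section.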

4.2.2. Synthesis for Reliability

Preferred Architecture. The application of the new CPE approach requires that the circuit structure be split into a linear and a non-linear part. Therefore, we explore synthesis methods that find such a separated structure. Figure 4.4 shows the needed architecture of the circuit.


Linearity with Regard to a Variable. Linearity is a property of a Boolean function which is defined with regard to each variable.

Definition 4.37 (Linearity). A Boolean function f(x_i, x0) is linear with regard to the variable x_i if and only if

  ∂f(x_i, x0) / ∂x_i = 1 .   (4.13)

If the Boolean function f(x_i, x0) satisfies (4.13), this function can be expressed by

  f(x_i, x0) = x_i ⊕ g(x0) .   (4.14)

The function g(x0) can be calculated by

  g(x0) = max_{x_i} (x_i ⊕ f(x_i, x0)) ,   (4.15)

using the derivative operation single maximum of the Boolean Differential Calculus (BDC) [243, 332–334]. The function g(x0) is independent of the variable x_i. The independence of a variable x_i is also a property of a Boolean function.

Definition 4.38 (Independence). A Boolean function f(x_i, x0) is independent of the variable x_i if and only if

  ∂f(x_i, x0) / ∂x_i = 0 .   (4.16)

Example 4.31 (Linear separation of x_i). The sum function f_si(x_i, y_i, c_{i−1}) of the adder cell i is defined by

  f_si(x_i, y_i, c_{i−1}) = x̄_i ȳ_i c_{i−1} ∨ x̄_i y_i c̄_{i−1} ∨ x_i ȳ_i c̄_{i−1} ∨ x_i y_i c_{i−1} .   (4.17)

This function satisfies the condition (4.13) of linearity with regard to the variable x_i,

  ∂f_si(x_i, y_i, c_{i−1}) / ∂x_i = 1 ,   (4.18)

which follows from the definition of the derivative

  ∂f(x_i, x0) / ∂x_i = f(x_i = 0, x0) ⊕ f(x_i = 1, x0)   (4.19)


Figure 4.5. Separation of a variable using XOR gates: (a) with regard to xi , (b) consecutively with regard to yi and ci−1 .

applied to (4.17):

  ∂f_si(x_i, y_i, c_{i−1}) / ∂x_i = (ȳ_i c_{i−1} ∨ y_i c̄_{i−1}) ⊕ (ȳ_i c̄_{i−1} ∨ y_i c_{i−1})
                                  = ȳ_i c_{i−1} ⊕ y_i c̄_{i−1} ⊕ ȳ_i c̄_{i−1} ⊕ y_i c_{i−1}
                                  = 1 .

Using the definition of the maximum with regard to x_i,

  max_{x_i} f(x_i, x0) = f(x_i = 0, x0) ∨ f(x_i = 1, x0) ,   (4.20)

the function g(y_i, c_{i−1}) can be calculated based on (4.15):

  g_si(y_i, c_{i−1}) = max_{x_i} (x_i ⊕ f_si(x_i, y_i, c_{i−1}))
                     = (0 ⊕ (ȳ_i c_{i−1} ∨ y_i c̄_{i−1})) ∨ (1 ⊕ (ȳ_i c̄_{i−1} ∨ y_i c_{i−1}))
                     = ȳ_i c_{i−1} ∨ y_i c̄_{i−1} .

Figure 4.5 (a) shows the circuit structure created by this linear separation of the variable x_i. The function g(y_i, c_{i−1}) is linear with regard to both the variables c_{i−1} and y_i. Hence,

  g(y_i, c_{i−1}) = y_i ⊕ c_{i−1}


and the function f_si(x_i, y_i, c_{i−1}) is completely linear:

  f_si(x_i, y_i, c_{i−1}) = x_i ⊕ y_i ⊕ c_{i−1} .

Figure 4.5 (b) shows the circuit structure created by two linear separations of a single variable using two XOR gates.
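The whole example can be replayed mechanically; the sketch below encodes f_si over bit-packed arguments (the helper names are ours) and confirms both the linearity condition (4.13) and the separated function g = y_i ⊕ c_{i−1}:

```python
def derivative(f, i, n):
    """Truth vector of the single derivative of f with regard to variable i."""
    return [f(m) ^ f(m ^ (1 << i)) for m in range(2 ** n)]

# sum function of an adder cell: x, y, c packed into bits 0, 1, 2 of m
fs = lambda m: ((m >> 0) ^ (m >> 1) ^ (m >> 2)) & 1

# condition (4.13): the derivative with regard to x is the constant 1
assert all(derivative(fs, 0, 3))

# single maximum (4.15): g = (0 xor f(x=0, ...)) or (1 xor f(x=1, ...))
g = lambda m: fs(m & ~1) | (1 ^ fs(m | 1))
assert all(g(m) == ((m >> 1) ^ (m >> 2)) & 1 for m in range(8))  # g = y xor c
```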

Degree of Linearity. The derivative with regard to the variable x_i is a Boolean function for which the extreme values specify either linearity by the value 1 or independence by the value 0. The single derivative itself is independent of x_i:

  ∂/∂x_i (∂f(x_i, x0) / ∂x_i) = 0 .   (4.21)

The number of function values 1 of the derivative with regard to the variable x_i is a measure for the degree of the linearity:

Definition 4.39 (Degree of Linearity). A Boolean function f(x) of n variables x = (x_i, x0) has a degree of linearity with regard to the variable x_i in the range 0, . . . , 1 defined by

  degree^lin_{x_i} f(x_i, x0) = (1 / 2^(n−1)) · wt(∂f(x_i, x0) / ∂x_i) ,   (4.22)

where wt(h) is the number of values 1 of the evaluated function h.

Each variable x_i can be separated from a function f(x_i, x0) using an XOR gate:

  f(x_i, x0) = x_i ⊕ g(x_i, x0) .   (4.23)

Each such linear separation satisfies the rule:

  degree^lin_{x_i} f(x_i, x0) = 1 − degree^lin_{x_i} g(x_i, x0) .   (4.24)

Hence, the linear separation of a variable x_i can also be useful in the case where degree^lin_{x_i} f(x_i, x0) is close to the value 1, because in this case the degree of linearity of the function g(x_i, x0) is much smaller in comparison to that of f(x_i, x0).


Table 4.3. Degree of linearity of adder functions s0, . . . , s8
(the input pairs a_i and b_i have identical degrees and share one row)

              output functions
  inputs  |   s0     s1     s2     s3     s4     s5     s6     s7     s8
  a0, b0  | 1.000  0.500  0.250  0.125  0.063  0.031  0.016  0.008  0.004
  a1, b1  |               0.500  0.250  0.125  0.063  0.031  0.016  0.008
  a2, b2  |                      0.500  0.250  0.125  0.063  0.031  0.016
  a3, b3  |                             0.500  0.250  0.125  0.063  0.031
  a4, b4  |                                    0.500  0.250  0.125  0.063
  a5, b5  |                                           0.500  0.250  0.125
  a6, b6  |                                                  0.500  0.250
  a7, b7  |                                                         0.500

We calculated the degree of linearity for all output functions of an eight-bit adder. Table 4.3 enumerates these results. It can be seen that the degree of linearity decreases exponentially with the distance between the index of the output and the index of the input. The low degree of linearity of the low-order inputs increases the needed number of gates in the non-linear part for higher-order sum functions.
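The tabulated values can be spot-checked with a direct implementation of Definition 4.39 (a sketch; a 4-bit adder is used to keep the enumeration short, and the bit-packing of the inputs is our choice):

```python
def degree_lin(f, i, nvars):
    """Definition 4.39: wt(df/dx_i) / 2^(nvars - 1)."""
    wt = sum(f(m) ^ f(m ^ (1 << i)) for m in range(2 ** nvars))
    return wt / 2 ** nvars   # the loop visits every derivative value twice

def sum_output(j, nbits):
    """Output s_j of an nbits-adder; inputs packed as a | (b << nbits)."""
    mask = (1 << nbits) - 1
    return lambda m: ((m & mask) + (m >> nbits)) >> j & 1

n = 4
# the degree of s_j with regard to a_i is 2^-(j-i) for i < j, as in Table 4.3
for j, i, expected in [(1, 0, 0.500), (2, 0, 0.250), (2, 1, 0.500), (3, 0, 0.125)]:
    assert degree_lin(sum_output(j, n), i, 2 * n) == expected
```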

Bi-Decompositions. From Table 4.3 it can be concluded that the linear separation of a variable is only possible for the function s0 . Alternatively, the wanted XOR gates for the linear part can be found by means of the strong XOR bi-decomposition (see Figure 4.6 (c) on the left). The decomposition functions g(xa , xc ) and h(xb , xc ) are simpler than the given function f (xa , xb , xc ) because the function g does not depend on the variables xb nor the function h on xa , respectively. For recursive bi-decomposition in the non-linear part the strong OR bi-decomposition and the strong AND bi-decomposition can also be used (see Figure 4.6 (a) and (b) on the left).


Figure 4.6. Strong bi-decompositions on the left in comparison to the vectorial bi-decomposition on the right using: (a) an OR gate; (b) an AND gate; (c) an XOR gate.

Unfortunately, there are functions for which no strong bi-decomposition exists. The completeness of the bi-decomposition is provided by the weak OR bi-decomposition and the weak AND bi-decomposition [186, 243]. A weak XOR bi-decomposition exists for each function, but its decomposition functions can be even more complex than the given function; hence, we avoid this type of bi-decomposition. A vectorial bi-decomposition can exist even if there is no strong bi-decomposition for a given function. Vectorial bi-decompositions were suggested in [326] for the first time. As can be seen in Figure 4.6 on the right, vectorial bi-decompositions exist for an OR gate, an AND gate, and also for an XOR gate. The simplifications of the decomposition functions follow from the independence of the simultaneous change of xb (4.26) and xa (4.27) required in Definition 4.40.


Definition 4.40 (Vectorial Bi-Decomposition). A Boolean function f(xa, xb, xc) is OR, AND, or XOR bi-decomposable with regard to the subsets of variables xa and xb if:

1. f(x) can be expressed by

     f(xa, xb, xc) = g(xa, xb, xc) ◦ h(xa, xb, xc) ,   (4.25)

   where ◦ ∈ {∨, ∧, ⊕}; and

2. the decomposition functions g(xa, xb, xc) and h(xa, xb, xc) satisfy

     ∂g(xa, xb, xc) / ∂xb = 0 ,   (4.26)
     ∂h(xa, xb, xc) / ∂xa = 0 .   (4.27)

Example 4.32 (Vectorial bi-decomposition of a symmetric function). The carry function

  f_ci(x_i, y_i, c_{i−1}) = x_i y_i ∨ x_i c_{i−1} ∨ y_i c_{i−1}   (4.28)

of an adder is a symmetric function. It is known that there is no strong bi-decomposition for this symmetric function. As shown in [325], for each decomposition function of a vectorial bi-decomposition that is independent of the simultaneous change of exactly two variables, a strong XOR bi-decomposition with regard to these two variables exists. Hence, without evaluating an additional condition, an XOR bi-decomposition of the decomposition function h1(x_i, y_i, c_{i−1}) into the consecutive decomposition functions g2(x_i, c_{i−1}) and h2(y_i, c_{i−1}) can be realized. Figure 4.7 shows the result of the vectorial XOR bi-decomposition of the carry function (4.28) with regard to xa = (y_i, c_{i−1}) and xb = (c_{i−1}), as well as the consecutive strong XOR bi-decomposition of h1(x_i, y_i, c_{i−1}) with regard to xa = (x_i) and xb = (y_i).

4.2.3. Experimental Results

Two Synthesis Approaches. Due to the results of Subsection 4.2.2 we compare two synthesis approaches. In both architectures we assume that there is no carry-in and no carry-out.

Figure 4.7. Vectorial and strong XOR bi-decomposition of the carry function f_ci(x_i, y_i, c_{i−1}) of an adder.

The first one uses vectorial and strong bi-decompositions as shown in Figure 4.7 for adders with a restricted number of bits and generates larger adders in an architecture similar to a Ripple Carry Adder (RCA). This adder has a minimal area, and thus delay and power consumption, but is not optimal for the final scope, i.e., the CPE method, since the linear and non-linear parts are interconnected instead of being completely independent. The second one uses the architecture shown in Figure 4.4 for complete adders of different numbers of bits (similar to a Carry Look-Ahead Adder (CLA)). This architecture is perfect for the CPE method, yet it comes with a large area, power, and timing cost.

The first architecture was described using the Verilog Hardware Description Language (HDL) in a parametrizable fashion; from now on it will be referred to as the locally decomposed adder, since the linear/non-linear decomposition is only done at local granularity. The second architecture was instead generated by means of a Perl script, taking as its input the desired size of the architecture. This adder will be referred to as the globally decomposed adder, since the decomposition is total. In the following paragraphs the designed hardware description of the two architectures will be discussed.

Locally Decomposed Adder. To facilitate the description and analysis of the adder, it was divided into two modules, the lin_part module and the nonlin_part module. The two modules are interconnected; thus they are not independent of one another. In the end, it is a direct description of the structure shown in Figure 4.7, where the linear part is extended by the tree of two XOR gates of the sum function f_si as shown in Figure 4.5 (b). This structure of a non-linear part followed by a linear part is cascaded one after the other, just like a Ripple Carry Adder (RCA). Since it is purely a combinational circuit, only Verilog assign statements were used. A Verilog generate statement was used to make the description parametrizable. This allowed for the quick synthesis of differently sized adders by means of a Perl script. To understand the way the generic adder is constructed, it is important to know which values are sent from the linear part to the non-linear part and vice versa. The non-linear part is made up entirely of the AND statements needed for the carry function. The keen reader will notice that there are three AND gates for every adder cell, i.e., for every bit. Yet the first and last adder cells do not need a carry input and carry output, respectively, so the actual number of non-linear outputs is 3n − 5 (where n is the number of bits of the adder under consideration).
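The 3n − 5 count follows directly from the cell-by-cell argument and can be tallied programmatically (a sketch of the counting argument only, not of the Verilog description itself):

```python
def and_gates(nbits: int) -> int:
    """AND gates of the locally decomposed adder: every middle cell keeps the
    three carry products x*y, x*c, y*c of Eq. (4.28); the first cell has no
    carry-in (a single product x0*y0) and the last cell, which needs no
    carry-out, contributes none."""
    first, middle, last = 1, 3, 0
    return first + middle * (nbits - 2) + last

assert all(and_gates(n) == 3 * n - 5 for n in range(4, 17))
```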

Globally Decomposed Adder. The previously described adder architecture did not satisfy the condition for perfect CPE integration, yet it was used as a starting point for a possibly better structure. Simply put, the problem was to completely separate the AND gates from the XOR gates, both in the description of the adder and in the final synthesis. Since the interconnection between the linear and non-linear parts was due to the propagation of the carry function, it was enough to generate all the carries in parallel. Obviously, this ended up being a structure very much resembling a Carry Look-Ahead Adder. Two approaches were tried for the description in Verilog. Initially, an approach similar to the previous adder was used, where a loop would generate all the needed values. This approach worked but gave problems when synthesized. In particular, bigger adders would not synthesize due to limitations on the maximum loop iterations in the compilers (both Synopsys Design Compiler and Xilinx Vivado Design Suite). After several attempts, the best solution was to generate the Verilog description using a Perl script. An advantage of this was that it was more flexible and easily maintained; more importantly, the synthesis tools were then able to compile the generated netlists.

Figure 4.8. Gate equivalents for the decomposed adder (number of gates of the linear and non-linear parts, logarithmic scale, for 4 to 16 adder bits).

ASIC Synthesis. The adders were synthesized in 65 nm CMOS technology using the Design Compiler from Synopsys. The global decomposition results in a very large area of implementation, as noted in Figure 4.8.

Figure 4.9. Power for the decomposed adder (total power and internal power in W, logarithmic scale, for 4 to 16 adder bits).


Table 4.4. Areas and maximal time delay for the two implementations of the decomposed adders

  bits |  Area (in μm²)       |  Time Delay (in ns)
       |  Global      Local   |  Global      Local
    4  |       70        49   |     0.62      0.70
    5  |      150        68   |     1.00      0.97
    6  |      314        87   |     1.69      1.24
    7  |      643       105   |     3.24      1.50
    8  |    1,302       124   |     5.97      1.77
    9  |    2,623       143   |    11.38      2.03
   10  |    5,268       162   |    22.25      2.30
   11  |   10,558       180   |    43.75      2.56
   12  |   21,141       199   |    86.75      2.83
   13  |   42,310       218   |   173.17      3.10
   14  |   84,649       237   |   345.37      3.36
   15  |  169,329       255   |   689.94      3.63
   16  |  338,690       274   |  1379.43      3.89

Figure 4.9 shows the power associated with the implementations of the globally decomposed adder on the Application-Specific Integrated Circuit (ASIC) for different numbers of bits. As can be noted from Figure 4.8, the gate count doubles with every bit increase. This is due to the process of internal carry generation, which grows exponentially. Table 4.4 compares both the area and the maximum delay for the two different implementations of the decomposed adder. We can see that the local decomposition results in much smaller areas. Moreover, the maximum delay in the local decomposition is significantly shorter. For these reasons, the Field Programmable Gate Array (FPGA) synthesis was then done for the locally decomposed adder only.

FPGA Synthesis. Figures 4.10 (a) and (b) show the number of LUTs used and the power associated with the implementations of the decomposed adder on the FPGA for different numbers of bits, and compare them to the equivalent values for a standard adder. Synthesis and routing were done for a Zynq FPGA using the Vivado design suite.


Figure 4.10. FPGA synthesis results: (a) number of LUTs, (b) power consumption.

While for the locally decomposed adders the number of gates and associated delay is drastically improved, one needs to consider further optimization for the circuits such that the presented architectures become competitive. For the locally decomposed adders a particular emphasis will be on implementations on FPGA and power optimization. Further study into using academic tools for logic synthesis and optimization such as Espresso, SIS, ABC, etc., will also be performed.

Synthesis for Reliability Using Bi-Decompositions

4.2.4. Reliability: Challenges and Approaches

With continuing logic technology scaling, design for test, reliability, and power is becoming a major concern. In this context, a significant effort across recent years has been invested in the efficient decomposition of logic functions into AND-XOR or OR-XOR networks. The strong XOR bi-decomposition is an alternative method to split a given Boolean function into simpler decomposition functions combined by an XOR gate. These sub-functions are simpler than the given function because they do not depend on the variables of one of the dedicated sets of the decomposition. The vectorial bi-decomposition extends the possibilities for decomposition. We have shown that even in the case of the symmetric carry function, for which no strong bi-decomposition exists, a decomposition into a linear output part and a non-linear input part can be designed. A Reed-Muller form for the adder can be achieved. Also, some quantification in terms of the degree of linearity is given. Using the proposed decomposition, we implemented the resulting decomposed adders in ASIC and FPGA technologies and gave some preliminary results. The results show that further development of custom optimization tools is required. This is part of future work, together with the integration of code prediction encoding, informed by the degree of linearity, for improved reliability.
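The idea of a strong XOR bi-decomposition can be illustrated by a small brute-force check; the four-variable function f and the variable split below are hypothetical examples, not taken from the chapter:

```python
from itertools import product

# Brute-force check for a strong XOR bi-decomposition
# f(x1, x2, x3, x4) = g(x1, x2) XOR h(x3, x4) with disjoint variable sets.

def f(x1, x2, x3, x4):
    return (x1 & x2) ^ x3 ^ x4

points = list(product((0, 1), repeat=4))

def xor_bidecomposable(f):
    # Enumerate all 16 x 16 truth-table pairs for g and h.
    for g_tt in product((0, 1), repeat=4):
        for h_tt in product((0, 1), repeat=4):
            if all(f(*p) == g_tt[2 * p[0] + p[1]] ^ h_tt[2 * p[2] + p[3]]
                   for p in points):
                return g_tt, h_tt
    return None

result = xor_bidecomposable(f)
print(result is not None)  # True: f = (x1 AND x2) XOR (x3 XOR x4)
```

The two returned truth tables are the decomposition functions; each depends only on the variables of its own dedicated set.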

4.3. On Linearization of Partially Defined Boolean Functions and Applications to Linear Codes

Jaakko Astola, Pekka Astola, Radomir S. Stanković, and Ioan Tabus

4.3.1. Partially Defined Boolean Functions

The complexity of digital devices continues to grow, and the amount of interconnections between semi-autonomous units is rapidly increasing. One aspect of this development is that digital devices responsible for controlling all or part of the behavior of the units often have a very large number of inputs, which in turn implies that only a very small fraction of the possible combinations of input values can ever appear in practice. Therefore, for these inputs we do not need to define the output value of the corresponding Boolean function; it could be either 0 or 1. We call such outputs "don't cares" and the corresponding functions partially or incompletely defined functions. Both of the terms partially defined (or specified) and incompletely defined (or specified) are commonly used. Some authors use the term incompletely defined to mean a function with a relatively low number of "don't care" values that are mainly used in the minimization of the implementation [153]. Formally, a partially defined m-output Boolean function F of n variables is defined as

F : S → {0, 1}^m   (4.29)

where S ⊆ {0, 1}^n, i.e., the function value is specified only on a certain (usually proper) subset S (the "care" set) of the domain of a corresponding completely defined Boolean function. We denote the cardinality |S| of S by k, which is also called the weight of the function F [265].

Linearization of Partially Defined Boolean Functions

Partially defined Boolean functions with a large number of variables are encountered in many modern application areas, such as the design of on-line real-time control systems, the design of built-in self-test equipment for Very-Large-Scale Integration (VLSI) circuits, artificial intelligence, Internet technology, software engineering, etc. These problems are mostly characterized by a relatively small number of input patterns for which the output value is determined (care states). On the other hand, the number of don’t care states will then be huge, and the quality of a minimization method is thus determined by its ability to take advantage of their existence without enumerating them. Many strategies have been developed to efficiently implement partially defined Boolean functions with a large don’t care set, see e.g., [110, 111, 153]. One strategy is to identify the binary spaces with linear spaces over the finite field of two elements F2 and to find a linear transform from the original n-dimensional space to a lower dimensional (say m-dimensional) space such that it is injective on the “care set”. The linear transform can be computed quickly with XOR circuits and the remaining non-linear part with m variables can be implemented, e.g., with a look-up-table. This decomposition is called linear decomposition and was first proposed by Nechiporuk [223]. The theory was later enhanced by Lechner [187], and a method using autocorrelation for finding the linear transformation was presented by Karpovsky [165], Varma and Trachtenberg [375]. In [265, 266], Sasao uses linear decompositions to derive efficient realizations for index generation functions and code converters. There has also been much interest in logic based methods in pattern recognition and data analysis. In these applications the number of variables is potentially very large and efficient methods for simplification and fast implementation are needed [49, 398, 399, 405, 406]. 
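The linear decomposition strategy can be illustrated end to end on a toy care set; the set S, the function values, and the two parity functions below are hypothetical choices for illustration only:

```python
# End-to-end linear decomposition sketch: phi computed by XOR (parity)
# circuits, the remaining non-linear part h by a look-up table.

S = {(0, 0, 0, 1): 1, (0, 1, 1, 0): 0, (1, 0, 1, 1): 1, (1, 1, 0, 0): 0}

def phi(x):
    # linear transform F_2^4 -> F_2^2, realizable by a few XOR gates
    return (x[0] ^ x[2], x[1] ^ x[2] ^ x[3])

assert len({phi(x) for x in S}) == len(S)     # phi is injective on the care set
h = {phi(x): v for x, v in S.items()}         # 2-variable look-up table
print(all(h[phi(x)] == v for x, v in S.items()))  # True
```

Four care points on four variables are compressed to a two-variable look-up table; the don't-care inputs are never enumerated.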
In this section we consider a recently developed method [15] to find the above described linear transform in a form that can be efficiently implemented as a Galois field deconvolution. It turns out that asymptotically similar results can also be obtained using the ring of integers and well known results of number theory. We also discuss the case of a general linear transform, i.e., without the restriction of being implementable by a convolution type operation. Both of these methods have close connections to the theory of linear binary codes, especially

to bounds on code performance. This contribution is largely based on papers [14–16].

4.3.2. The Linearization Method

In this subsection, we briefly present the linear reduction method for partially defined Boolean functions based on certain properties of polynomials over F2 [15]. To be able to work with linear spaces, we identify {0, 1} with the finite field F2. For a Boolean function F of the form (4.29), the reduction of the number of variables by a linear transform means to find a linear transform φ : F_2^n → F_2^m, where m < n, such that φ is injective on S. Obviously, in that case there is a (possibly non-linear) function h : F_2^m → F_2 such that if x = (x0, x1, . . . , xn−1) ∈ S, then h(φ(x)) = f(x). We now point out some basic definitions and fix the notation that will be used. For more information on finite fields, we refer to [191, 203, 373]. Consider the finite field F2 = ({0, 1}, +, ·), where the operations +, · are addition and multiplication modulo 2. The set of polynomials

F2[t] = {a(t) = a0 + a1 t + · · · | ai ∈ F2, only finitely many ai ≠ 0} ,

with addition and multiplication defined in the usual way, is a ring with unit element e(t) = 1. The degree of a(t) is the highest power of t among the terms with non-zero ai. The division algorithm holds in F2[t]:

Division algorithm. Let f(t), g(t) ∈ F2[t], g(t) ≠ 0. Then there are q(t), r(t) ∈ F2[t] such that f(t) = q(t)g(t) + r(t), where the degree of r(t) is less than the degree of g(t). Notice that F2[t] is commutative, and in this section we consider only commutative rings.

Linearization of Partially Defined Boolean Functions

337

Example 4.33. Consider the polynomials f(t) = t + t^2 + t^3 + t^4 and g(t) = 1 + t + t^3. With the usual long division, we get

t + t^2 + t^3 + t^4 = (t + 1)(1 + t + t^3) + t + 1 .

Note that because 1 + 1 = 0 in F2, we have, for instance,

(1 + t) + (t + t^2) = 1 + t^2 ,   (1 + t)^2 = 1 + t^2 ,

etc.
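Example 4.33 can be checked mechanically; the sketch below represents a polynomial as an integer bitmask (bit i = coefficient of t^i), an assumed encoding, not one used in the chapter:

```python
def gf2_divmod(f, g):
    # Long division of binary polynomials over GF(2);
    # bit i of an int holds the coefficient of t^i.
    q = 0
    dg = g.bit_length() - 1
    while f.bit_length() - 1 >= dg:
        shift = (f.bit_length() - 1) - dg
        q ^= 1 << shift
        f ^= g << shift
    return q, f

# Example 4.33: f(t) = t + t^2 + t^3 + t^4, g(t) = 1 + t + t^3
q, r = gf2_divmod(0b11110, 0b1011)
print(bin(q), bin(r))  # 0b11 0b11, i.e. quotient t + 1 and remainder t + 1
```

Addition and subtraction in F2[t] are both XOR, which is why a single `^=` suffices in the division loop.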

The polynomials of degree less than or equal to n − 1 form an n-dimensional vector space over F2, and all polynomials form an infinite-dimensional vector space over F2. The ring F2[t] is a unique factorization domain. Thus, any f(t) ∈ F2[t] has a unique (up to the order of the pi) factorization

f(t) = p1(t)p2(t) · · · ps(t)

where the (not necessarily distinct) p1(t), . . . , ps(t) are irreducible. As usual, a polynomial is irreducible if it cannot be written as a product of two polynomials, each of positive degree.

4.3.3. An Efficient Method of Finding a Linear Transform Injective on S

Assume that we are given a subset S of F_2^n that has cardinality k. Typically, in applications, k > n but k ≪ 2^n. Denote by X the matrix whose rows are the elements of S, i.e.,

X = [ x1,0   x1,1   · · ·   x1,n−1
      x2,0   x2,1   · · ·   x2,n−1
       .       .     . .      .
      xk,0   xk,1   · · ·   xk,n−1 ] .   (4.30)

Our goal is to find a linear transform of the form φ : F_2^n → F_2^m such that it is injective on S and m is as small as possible.

338

Reliability and Linearity of Circuits

Now, represent the rows of X as polynomials over F2:

X = [ x1,0 + x1,1 t + · · · + x1,n−1 t^{n−1}     [ x1(t)
      x2,0 + x2,1 t + · · · + x2,n−1 t^{n−1}   =   x2(t)
                       .                             .
      xk,0 + xk,1 t + · · · + xk,n−1 t^{n−1} ]     xk(t) ] .

The rows xi(t) of X belong to F2[t]. Let g(t) ∈ F2[t] with degree m smaller than n − 1. We define φ : F_2^n → F_2^m as follows: let x(t) ∈ F2[t]; then φ(x(t)) = r(t), where

x(t) = q(t)g(t) + r(t)   and   deg r(t) < deg g(t) .

Because g(t) is a fixed polynomial, the transform φ : r(t) = φ(x(t)) is obviously linear: it just sends any polynomial to its remainder when divided by g(t). For xi(t) ∈ S write

xi(t) → ri(t) = φ(xi(t)) .   (4.31)
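A minimal sketch of this remainder map, with bit i of an integer holding the coefficient of t^i (the modulus and operands are illustrative):

```python
def gf2_mod(x, g):
    # Remainder of the binary polynomial x modulo g over GF(2);
    # bit i of an int holds the coefficient of t^i.
    dg = g.bit_length() - 1
    while x.bit_length() - 1 >= dg:
        x ^= g << ((x.bit_length() - 1) - dg)
    return x

g = 0b1011               # g(t) = 1 + t + t^3, a sample modulus
x, y = 0b11110, 0b10010  # two sample polynomials
# phi is linear: the remainder of a sum (XOR in F_2[t]) equals
# the XOR of the remainders.
print(gf2_mod(x ^ y, g) == gf2_mod(x, g) ^ gf2_mod(y, g))  # True
```

In hardware this linearity is exactly what makes φ realizable by a fixed XOR network.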

Now, ri(t) ≠ rj(t) if and only if g(t) does not divide (xi(t) − xj(t)). Thus, the task of finding a linear decomposition is reduced to finding a polynomial g(t) of lowest degree, say m, such that

xi(t) − xj(t) ≢ 0 mod g(t),   1 ≤ i < j ≤ k .   (4.34)

Denote the L = k(k − 1)/2 differences xi(t) − xj(t), 1 ≤ i < j ≤ k, by z1(t), z2(t), . . . , zL(t). Each zl(t) has degree at most n − 1 and a unique factorization into irreducible polynomials

zl(t) = pl,1(t)pl,2(t) · · · pl,sl(t) .   (4.33)

Let p1(t), p2(t), . . . , pN(t) be all irreducible polynomials of degree at most m, where n > m. If there is a pi0(t), 1 ≤ i0 ≤ N, such that it does not appear in any of the factorizations (4.33), then, by writing g(t) = pi0(t)s(t), where deg(pi0(t)) + deg(s(t)) = m, we have a polynomial g(t) such that zi(t), i = 1, 2, . . . , L, is not divisible by g(t). Thus,

xi(t) − xj(t) ≢ 0 mod g(t),   1 ≤ i < j ≤ k ,

and we can find an injective linear transform (4.35).

Denote by N(s) the number of irreducible polynomials of degree s over F2. If every irreducible polynomial of degree at most m appeared in the factorizations (4.33), the total degree of these factorizations would be at least the sum of the degrees of all such polynomials, i.e., at least ∑_{s=1}^{m} sN(s); on the other hand, the total degree of the factorizations is at most L(n − 1) = (k(k − 1)/2)(n − 1). Hence, if

∑_{s=1}^{m} sN(s) > (k(k − 1)/2)(n − 1)

there is a g(t) satisfying (4.34) and we can find an injective linear transform (4.35). Now, there is an explicit formula for the number of irreducible polynomials that was already known by Gauss [191, 255]:

2^m = ∑_{d|m} dN(d)

where d|m denotes that m is divisible by d. Thus,

∑_{s=1}^{m} sN(s) ≥ ∑_{d|m} dN(d) = 2^m ,

implying that if 2^m > (k(k − 1)/2)(n − 1), or equivalently

m > log2(k(k − 1)/2) + log2(n − 1) ,   (4.36)

we can construct the required linear transform.

Remark 4.1. The ring that we use in the construction does not have to be a polynomial ring; we can use the ring of integers ℤ. For simplicity we illustrate the situation in the binary case. Express an element

in F_2^n as

v = (v0, . . . , vn−1) ↔ v0 + v1·2 + . . . + vn−1·2^{n−1} .

Let a be the smallest positive number that does not divide any of the differences of the integers corresponding to elements of S. Then the remainders of these integers when divided by a are distinct and serve as unique indices of the v ∈ S.

Example 4.35. Consider again the vectors (1, 0, 0, 0, 0), . . . , (0, 0, 0, 0, 1), but now as the elements 16, 8, 4, 2, 1 of the ordinary integers. We see that 9 does not divide any of the differences, and so the remainders (mod 9) provide the required linear function. Notice that though this operation is a ring homomorphism in ℤ, it is not linear on F_2^5. Notice also that the structures of the rings are very different, especially the additive structures; for instance, f(t) + f(t) = 0 for any f(t) ∈ F2[t]. The multiplicative structures are more similar but far from identical. For instance, t^2 + 1 = (t + 1)^2 ∈ F2[t] corresponds to 5 in ℤ, and the irreducible polynomial 1 + t^3 + t^4 ∈ F2[t] corresponds to 25 in ℤ.

Remark 4.2. It is interesting that the same counting argument produces asymptotically equivalent results when applied in the integer ring. Assume that the differences, as integers, are

y1 = v1,0 + v1,1·2 + . . . + v1,n−1·2^{n−1}
 .
 .
yl = vl,0 + vl,1·2 + . . . + vl,n−1·2^{n−1} .

Then

∏_{i=1}^{l} yi < 2^{nl} .

On the other hand, it is well known (and equivalent to the Prime Number Theorem [145]) that asymptotically

ϑ(x) = ∑_{p ≤ x} ln(p) ≈ x ,

i.e., the product of all primes up to x is approximately e^x. Since ∏_{i=1}^{l} yi < 2^{nl} = e^{nl·ln(2)}, we can argue, as in the corresponding proof for polynomials, that the differences can consume at most all primes up to nl·ln(2), so there must be a prime of size at most about nl·ln(2) that gives the required φ; the number of bits when elements of S are divided by this prime is

m ≈ log2(nl·ln(2)) ≈ 2 log2(k) + log2(n) .

From the above considerations, we have the following procedure for the reduction of variables of a sparse, incompletely specified binary function by a linear transform. Consider an incompletely specified function f(x0, x1, . . . , xn−1) that is specified for k ≪ 2^n values of (x0, x1, . . . , xn−1). If

m > log2(k(k − 1)/2) + log2(n − 1) ,

it can be expressed as fr(y0, y1, . . . , ym−1) where

y0 = c0,0 x0 + · · · + c0,n−1 xn−1
 .
 .
ym−1 = cm−1,0 x0 + · · · + cm−1,n−1 xn−1   (4.37)

where the coefficients ci,j ∈ F2. In matrix form, the application of the transform (4.37) to all rows of X gives

Y = XC   (4.38)

where Y is a (k × m) matrix, X is a (k × n) matrix, and C is the (n × m) transpose of the coefficient matrix of the linear transform (4.37). Now, 1, t, t^2, . . . , t^{n−1} form a basis for the space of polynomials of degree at most n − 1. The matrix C is found at once by checking where the transform (4.35) sends the basis vectors 1, t, t^2, . . . , t^{n−1}. Consider the remainders

t^i = hi(t)g(t) + ci(t) ,   deg(ci(t)) < deg(g(t)),   i = 0, 1, . . . , n − 1 .
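For a concrete illustrative choice (g(t) = 1 + t + t^3 with n = 5, an assumption for demonstration), the rows of C are obtained by reducing the basis monomials:

```python
def gf2_mod(x, g):
    # Remainder of binary polynomial x modulo g over GF(2); bit i = coeff of t^i.
    dg = g.bit_length() - 1
    while x.bit_length() - 1 >= dg:
        x ^= g << ((x.bit_length() - 1) - dg)
    return x

n, g = 5, 0b1011                 # illustrative: g(t) = 1 + t + t^3, so m = 3
m = g.bit_length() - 1
rows = [gf2_mod(1 << i, g) for i in range(n)]   # c_i(t) = t^i mod g(t)
for r in rows:
    print(format(r, f"0{m}b")[::-1])  # row i of C: c_{i,0} c_{i,1} ... c_{i,m-1}
# prints 100, 010, 001, 110, 011: the first m rows form the identity,
# and row m carries the coefficients g0 ... g_{m-1} of g(t).
```

This makes the special structure of remainder-based transforms, used in Remark 4.3 below the original derivation, directly visible.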

The polynomials ci(t), i = 0, 1, . . . , n − 1, are the results when the transform (4.35) is applied to t^i, i = 0, 1, . . . , n − 1. In matrix form, this is

[ 1 0 0 · · · 0        [  c0,0     c0,1     c0,2    · · ·  c0,m−1
  0 1 0 · · · 0          c1,0     c1,1     c1,2    · · ·  c1,m−1
  .  .  .  . .   C  =      .        .        .      . .      .        (4.39)
  0 0 0 · · · 1 ]       cn−1,0   cn−1,1   cn−1,2   · · · cn−1,m−1 ] ,

implying that

C = [  c0,0     c0,1     c0,2    · · ·  c0,m−1
       c1,0     c1,1     c1,2    · · ·  c1,m−1
         .        .        .      . .      .
      cn−1,0   cn−1,1   cn−1,2   · · · cn−1,m−1 ] .

Remark 4.3. Notice that if we have an arbitrary C, it is, in general, not possible to express it in the form (4.31). This expected fact becomes obvious if we assume that such a g(t) exists. Then, C can be written as

C = [   1        0        0      · · ·     0
        0        1        0      · · ·     0
        .        .        .       . .      .
        0        0        0      · · ·     1
       cm,0     cm,1     cm,2    · · ·  cm,m−1       (4.40)
      cm+1,0   cm+1,1   cm+1,2   · · · cm+1,m−1
        .        .        .       . .      .
      cn−1,0   cn−1,1   cn−1,2   · · · cn−1,m−1 ] ,

because the m-th row corresponds to the division of t^m by

g(t) = g0 + g1 t + · · · + gm−1 t^{m−1} + t^m

which immediately implies that

[g0, g1, · · · , gm−1] = [cm,0, cm,1, · · · , cm,m−1] .

From the results presented above we know that if m satisfies (4.36), then we can find a polynomial g(t) that satisfies (4.35). For a brute-force search we need to generate all the 2^m polynomials

g(t) = g0 + g1 t + · · · + gm−1 t^{m−1} + t^m .

Note that there is a large reduction in complexity compared to searching for the linear transform in the matrix form (4.38): from the expression (4.40), we see that an exhaustive search would require 2^{m(n−m)} matrices to be checked.
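The brute-force search over the 2^m monic candidate polynomials can be sketched as follows (the care set of unit vectors is an illustrative assumption, echoing Example 4.35):

```python
from itertools import combinations

def gf2_mod(x, g):
    # Remainder of binary polynomial x modulo g over GF(2); bit i = coeff of t^i.
    dg = g.bit_length() - 1
    while x.bit_length() - 1 >= dg:
        x ^= g << ((x.bit_length() - 1) - dg)
    return x

def find_g(S, m):
    # Try all 2^m monic polynomials of degree m; g(t) works when it divides
    # no difference x_i(t) - x_j(t) (XOR in F_2[t]) of care-set elements.
    diffs = [a ^ b for a, b in combinations(S, 2)]
    for low in range(2 ** m):
        g = (1 << m) | low
        if all(gf2_mod(z, g) != 0 for z in diffs):
            return g
    return None

S = [1 << i for i in range(5)]    # illustrative care set: unit vectors of F_2^5
for m in range(1, 6):
    g = find_g(S, m)
    if g is not None:
        print(m, bin(g))          # 3 0b1011, i.e. g(t) = 1 + t + t^3
        break
```

Five variables are reduced to the three remainder bits, well within the bound (4.36).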

4.3.4. Existence of an Injective Linear Transformation

We have shown above that if we have an incompletely defined Boolean function that is defined on a subset S of F_2^n, and |S| = k satisfies the condition n < k ≤ 2^n, we can find a linear transform that is implementable as a polynomial division and reduces the number of variables to ≈ 2 log2(k) + log2(n). In this subsection we study this reduction problem for general linear transforms φ : F_2^n → F_2^m. More precisely, we study the existence of a linear transformation φ for the variables in X defined in (4.30), with m as small as possible, under the requirement that

φ : F_2^n → F_2^m   (4.41)

is injective on S. Thus, we investigate for what range of m such a linear transformation can exist. In matrix notation, the problem can be expressed as the search for an (n × m) matrix Q such that the rows of the matrix Y = XQ are distinct. Obviously, for φ to be injective we must have 2^m ≥ k, and because the identity transform is injective, such a transform trivially exists for some m ≤ n.

An upper bound for m in terms of k and n can be derived using an argument similar to the one used for proving the so-called Warshamov-Gilbert bound in coding theory. Denote by u1, u2, . . . , uL, where L = k(k − 1)/2, the difference vectors xj − xi, 1 ≤ i < j ≤ k. Let A be an (n − m)-dimensional subspace of F_2^n generated by the columns of the matrix

A = [ a1,1   a1,2   · · ·  a1,n−m
      a2,1   a2,2   · · ·  a2,n−m
       .       .     . .     .
      an,1   an,2   · · ·  an,n−m ] .

Let B be the orthogonal complement A⊥ = {v ∈ F_2^n | va = 0 for all a ∈ A} of A, generated by the rows of a matrix

B = [ b1,1   b1,2   · · ·   b1,n
      b2,1   b2,2   · · ·   b2,n
       .       .     . .     .
      bm,1   bm,2   · · ·   bm,n ] .

Then, because (A⊥)⊥ = A, we have u ∈ A if and only if Bu = 0. If there is a subspace A0 of F_2^n of dimension (n − m) such that

u1 ∉ A0, u2 ∉ A0, . . . , uL ∉ A0 ,

then we have an (m × n) matrix B0 such that

B0 u1 ≠ 0, B0 u2 ≠ 0, . . . , B0 uL ≠ 0 ,

as required. For given n, k, and m, we will study when such a matrix B0 exists. The number of (n − m)-dimensional subspaces of F_2^n is the Gaussian binomial coefficient

[n; n−m]_2 = ((2^n − 1)(2^n − 2) · · · (2^n − 2^{n−m−1})) / ((2^{n−m} − 1)(2^{n−m} − 2) · · · (2^{n−m} − 2^{n−m−1})) ,   (4.42)

and the number of (n − m)-dimensional subspaces containing a fixed non-zero vector is

[n−1; n−m−1]_2 .

It is also easy to see that

[n; n−m]_2 = [n−1; n−m−1]_2 · (2^n − 1)/(2^{n−m} − 1) .

Now, there are at most

L · [n−1; n−m−1]_2

subspaces containing any of u1, u2, . . . , uL. So, if

L · [n−1; n−m−1]_2 < [n; n−m]_2 = [n−1; n−m−1]_2 · (2^n − 1)/(2^{n−m} − 1) ,   (4.43)

there is an (n − m)-dimensional subspace A0 avoiding all of u1, u2, . . . , uL. The condition (4.43) is equivalent to

L < (2^n − 1)/(2^{n−m} − 1) ,

and since (2^n − 1)/(2^{n−m} − 1) > 2^m, it suffices that L = k(k − 1)/2 ≤ 2^m, i.e., that

m ≥ log2(k(k − 1)/2) = log2(k(k − 1)) − 1 ,

or m > 2 log2(k). We have proved the following:

Theorem 4.23. Let S ⊆ F_2^n, |S| = k, and n < k < 2^n. If m > 2 log2(k), there is a linear transform φ : F_2^n → F_2^m that is injective on S.

4.3.5. Connections to Linear Error-Correcting Codes

In this subsection we explore the connections between the linear reduction of an incompletely defined Boolean function and linear binary error-correcting codes. We show that determining the linear transform of a suitably chosen incompletely defined Boolean function is equivalent to designing an e-error-correcting code. In particular, this means that Theorem 4.23 implies the well-known Gilbert-Warshamov bound. We recall the necessary concepts of the theory of linear (block) codes; more information can be found, e.g., in [203, 373]. A linear binary [n, k]-code of length n is a k-dimensional subspace of F_2^n. The Hamming distance d(x, y) between vectors x and y of F_2^n is equal to the number of component positions where x and y differ, and the Hamming weight w(x) of x is d(x, 0). Thus, an e-error-correcting linear code is simply a subspace of F_2^n that contains no non-zero vector of weight ≤ 2e. The connection to the linear reduction is the following: by choosing the domain S as the set of all vectors of weight ≤ e, the method produces a linear transform φ : F_2^n → F_2^m, where m ≤ n. Consider the subspace C = {x ∈ F_2^n | φ(x) = 0}. This code has a minimum distance of at least 2e + 1. To see this, assume that there is a non-zero code vector with Hamming weight ≤ 2e, i.e., c ∈ C, c ≠ 0, and w(c) ≤ 2e. If

0 < w(c) ≤ e, then φ(c) ≠ 0 by the injectivity of φ on the set of vectors of weight ≤ e, since φ(0) = 0. If e < w(c) ≤ 2e, we can find a, b such that

c = a + b,   w(a) ≤ e,   w(b) ≤ e,   a ≠ b .

Because φ is linear and injective for {x | w(x) ≤ e}, we have φ(c) = φ(a) + φ(b) ≠ 0, contradicting the assumption that c is a code vector. Notice that we can (more effectively) produce the required linear transform φ by searching for the polynomial of lowest degree m such that it does not divide any of the polynomials that correspond to non-zero vectors of F_2^n of weight ≤ 2e. This way of producing codes is rather obvious. What makes it interesting is the inequality (4.36), which guarantees that such polynomials exist for interesting values of m.

Example 4.36. Consider the well known (7, 4) Hamming code. It can be described either as the row space of its generator matrix G,

G = [ 1 0 0 0 1 1 1
      0 1 0 0 1 1 0
      0 0 1 0 1 0 1
      0 0 0 1 0 1 1 ] = [I4, A] ,

or as the set (subspace) of vectors orthogonal to the rows of its parity check matrix H,

H = [ 1 1 1 0 1 0 0
      1 1 0 1 0 1 0
      1 0 1 1 0 0 1 ] = [Aᵀ, I3] .

Now we interpret the vectors of F_2^7 as polynomials, e.g., [1 0 0 0 1 1 1] = 1 + t^4 + t^5 + t^6, and search for a polynomial g(t) that does not divide any of the polynomials corresponding to the non-zero binary vectors of weight ≤ 2:

[1, 0, 0, . . . , 0], [0, 1, 0, 0, . . . , 0], . . . , [0, . . . , 0, 0, 1] ,
[1, 1, 0, . . . , 0], [1, 0, 1, 0, . . . , 0], . . . , [0, . . . , 0, 1, 1] .

It can easily be seen that one such polynomial is g(t) = 1 + t^2 + t^3.
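As a machine sanity check of this construction, one can verify that a degree-3 polynomial dividing no weight-≤2 vector yields a distance-3 code of dimension 4; the sketch below uses g(t) = 1 + t^2 + t^3, one valid choice:

```python
def gf2_mod(x, g):
    # Remainder of binary polynomial x modulo g over GF(2); bit i = coeff of t^i.
    dg = g.bit_length() - 1
    while x.bit_length() - 1 >= dg:
        x ^= g << ((x.bit_length() - 1) - dg)
    return x

g = 0b1101  # g(t) = 1 + t^2 + t^3

# g divides no non-zero vector of F_2^7 of weight <= 2 ...
low_weight = [v for v in range(2 ** 7) if 0 < bin(v).count("1") <= 2]
print(all(gf2_mod(v, g) != 0 for v in low_weight))  # True

# ... so the kernel of the remainder map is a distance-3 code of dimension 4.
code = [c for c in range(2 ** 7) if gf2_mod(c, g) == 0]
print(len(code), min(bin(c).count("1") for c in code if c))  # 16 3
```

Sixteen codewords of length 7 with minimum weight 3: exactly the parameters of the (7, 4) Hamming code.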

The code corresponds to the subspace of polynomials of degree at most 6 that are divisible by g(t) (i.e., those whose remainder is zero when divided by g(t)). Obviously, this space is generated by

g(t), t·g(t), t^2·g(t), t^3·g(t) .

Writing them as rows of a matrix, we get the matrix

[ 1 0 1 1 0 0 0
  0 1 0 1 1 0 0
  0 0 1 0 1 1 0
  0 0 0 1 0 1 1 ]

that generates a code equivalent to the (7, 4) Hamming code. Repeating the above for lengths n = 2^l − 1, 4 ≤ l ≤ 8, again produces codes equivalent to the (2^l − 1, 2^l − 1 − l) Hamming codes. Similarly, taking n = 23 and all vectors of weight at most 6, we get the well known binary Golay code. This is in fact obvious, because a cyclic code exists for the above parameters and the process necessarily finds it. On the other hand, good cyclic codes only exist for particular choices of parameter values. In the following we discuss the interpretation of Theorem 4.23 in a coding context. Consider the Hamming spheres S(r) of radius r around the 0 vector. Obviously, S(2r) consists of all distinct differences of the vectors in S(r). Repeating the argument in the proof of Theorem 4.23, we see that if

|S(2r)| < (2^n − 1)/(2^{n−m} − 1)

then we have a linear transform φ : F_2^n → F_2^m such that φ(x) ≠ 0 for all x ∈ S(2r)\{0}. This implies that the kernel of φ is a linear code with minimum Hamming distance ≥ 2r + 1. We can summarize this as:

Corollary. A linear code C ⊆ F_2^n of dimension l and minimum distance 2r + 1 exists if

n − l > log2 |S(2r)| .   (4.44)

This is the same as the Gilbert-Warshamov bound. The linear reduction method is an efficient search method that is very much faster than an exhaustive search, but it still cannot be considered a constructive algebraic method to design block codes such as the Reed-Muller, Reed-Solomon, BCH, Goppa, etc., constructions. It is, however, interesting that the class of linear codes produced by the linear reduction method contains asymptotically very good linear codes, namely codes that approach the Gilbert-Warshamov lower bound. Below we briefly outline the justification for this fact. Noticing again that the differences of vectors in S(r) form S(2r) and repeating the argument leading to inequality (4.36), we see that if

n − l > log2 |S(2r)| + log2(n)   (4.45)

the polynomial search produces a linear r-error-correcting code of dimension l. The asymptotic bounds in coding theory are usually expressed for a fixed rate R = l/n. For large n, (4.44) corresponds to

R = l/n ≈ 1 − (1/n) log2 |S(2r)|

and (4.45) corresponds to

R = l/n ≈ 1 − (1/n) log2 |S(2r)| − (1/n) log2(n) .

Thus, asymptotically they are equal. As it is not known whether there are any binary codes that are asymptotically better (larger) than those guaranteed by the Gilbert-Warshamov bound, it seems rather difficult to improve the bound of Theorem 4.23. The connection also works in the other direction. As there are better asymptotic upper bounds on the minimum distance of a binary linear code than the Hamming bound, we can infer that the minimal dimension m of the range space of the injective linear transform is correspondingly larger than the trivial bound m ≥ log2(k) that corresponds to the Hamming bound.
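The two rate expressions corresponding to (4.44) and (4.45) can be compared numerically; the parameters n and r below are hypothetical samples:

```python
from math import comb, log2

n, r = 1000, 50
s2r = sum(comb(n, i) for i in range(2 * r + 1))   # |S(2r)|
R1 = 1 - log2(s2r) / n          # rate corresponding to (4.44)
R2 = R1 - log2(n) / n           # rate corresponding to (4.45)
print(round(R1, 3), round(R2, 3), round(R1 - R2, 3))
# the gap log2(n)/n vanishes as n grows, so the two rates agree asymptotically
```
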

The linear reduction method (when successful) produces a unique syndrome for each of the vectors in S. Thus, it can be used to produce a linear code that can be useful when designing codes for purposes other than typical communication applications, where the Hamming metric is natural. For instance, the error model is completely different for codes correcting errors that appear in memories, especially in non-binary cases.

Towards Future Technologies

5. Reversible and Quantum Logic

5.1. Procedure for FDD-Based Reversible Synthesis by Levels

Suzana Stojković, Milena Stanković, Claudio Moraga, and Radomir S. Stanković

5.1.1. Methods to Synthesize Reversible Circuits

A reversible circuit is a cycle-free, fan-out-free cascade of reversible gates, which behave as bijections. A reversible circuit realizes a reversible function. An irreversible function can be embedded into a reversible one by properly extending the dimension of the input/output vectors [226]. In practice, this means the introduction of additional input and output lines. It has become established to speak of ancilla lines when referring to these additional auxiliary inputs. A class of design algorithms for reversible circuits is based on decision diagrams as a data structure to represent the function to be realized. The main idea is the same as in the circuit design for arbitrary functions from decision diagrams, i.e., the nodes of the diagram are replaced by the corresponding circuit modules. In the case of the synthesis of reversible circuits, the nodes are replaced by modules consisting of reversible gates. Therefore, the complexity of the circuit is proportional to the number of nodes in the diagram.

Another issue is the compatibility between the modules replacing the nodes and the decomposition rules used in the construction of the corresponding decision diagrams. Papers [385] and [386] present methods for reversible synthesis based on Binary Decision Diagrams (BDDs). The modules replacing the nodes can be viewed as reversible realizations of the Shannon decomposition rule, i.e., as 2×1 multiplexers. The obtained circuits are characterized by a low number of required reversible gates, but a high number of ancilla lines. In [304] this method has been improved in both the number of gates and the number of ancilla lines by using Kronecker Functional Decision Diagrams (KFDDs), consisting of nodes determined by both Shannon and Davio expansions [274]. Determining the exact minimum KFDD is time consuming, since it requires checking many combinations. Therefore, some heuristics are needed in the assignment of decomposition rules to the nodes per level in the decision diagrams. As in the case of a BDD, the obtained circuits have a small number of gates and a somewhat smaller number of ancilla lines. In the case of Functional Decision Diagrams (FDDs), nodes are defined with respect to the rules of the positive and negative Davio expansions. The functional expressions for these rules resemble the functional expressions describing Toffoli gates. This resemblance is stronger than in the case of the decomposition rules in other decision diagrams having Shannon or Shannon-like nodes. Due to that, as shown in [231, 232, 350, 358], reversible circuits designed from an FDD require, on average, fewer ancilla lines than circuits derived from a KFDD. The number of ancilla lines in the reversible circuit designed from an FDD can be further reduced by traversing the FDD through levels and changing the order of implementation of the nodes at the same level.

Instead of the post-order traversal, the diagram is traversed level by level, starting from the leftmost node at the bottom level, with nodes visited in the order of their appearance at the level. Not all visited nodes are implemented in the first visit; instead, the nodes with a potentially simplified realization are moved to the end of the level list and are implemented when they have been visited for the second time.

5.1.2. Post-Order FDD-Based Reversible Synthesis

The realization of a Boolean function by a reversible circuit defined by any type of decision diagram maps the diagram into a cascade of reversible gates. The methods proposed in the literature for reversible synthesis based on different types of decision diagrams can be uniformly interpreted as the so-called post-order traversal of the diagram and the replacement of each non-terminal node by a reversible module consisting of Toffoli gates. A Toffoli gate transforms the set of input signals (x, y1, y2, . . . , yn) into the set of output signals (x ⊕ y1y2 . . . yn, y1, y2, . . . , yn). As can be seen, the Toffoli gate contains n + 1 input and output lines. Only the signal from one input line is transformed, and this line is the target line of the gate. The signals from the other lines are transferred unchanged to the corresponding output lines; these input lines are called control lines. Special cases of the Toffoli gate are:

• the Toffoli gate without control lines: this gate generates the complement of the input signal and is known as the NOT gate; and

• the Toffoli gate with one control line: this gate realizes the XOR operation and is known as the CNOT gate.

Graphical symbols of the Toffoli gates with different numbers of control lines are shown in Figure 5.1.
Figure 5.1. Reversible gates: (a) NOT gate, (b) CNOT gate, and (c) general Toffoli gate.
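The behavior of these gates can be sketched as a simple tuple transformation; this is a minimal simulation for illustration, not tied to any particular synthesis toolkit:

```python
def toffoli(bits, controls, target):
    # Flip bits[target] iff all control bits are 1; all other lines pass through.
    bits = list(bits)
    if all(bits[c] for c in controls):
        bits[target] ^= 1
    return tuple(bits)

# NOT gate: no controls; CNOT: one control; Toffoli: two controls.
print(toffoli((0, 1, 1), [], 0))      # (1, 1, 1)
print(toffoli((0, 1, 1), [1], 0))     # (1, 1, 1)
print(toffoli((0, 1, 0), [1, 2], 0))  # (0, 1, 0): control line 2 carries 0
```

Since applying the same gate twice restores the input, each gate, and hence any cascade of them, is its own kind of bijection on the line values.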

In the realizations of the decision diagram nodes, NOT, CNOT, and Toffoli gates with two control lines are used. The measures of complexity of the generated network are:

• the total number of lines in the network; and

• the Quantum Cost (QC), computed as the sum of the QC of all gates in the network. The QC of both a NOT and a CNOT gate is equal to 1, and the QC of a Toffoli gate with two control lines is equal to 5.

In general, the realization of each node from a decision diagram introduces an additional line that is used as the target line for the gates of the module realizing the node. On that new line, the output signal of the node is generated. It follows that the number of ancilla lines is proportional to the number of nodes in the diagram. In the method presented in this section, Functional Decision Diagrams (FDDs) are used for the representation of the Boolean functions to be realized. The reason is that the QC of the positive and negative Davio nodes used in an FDD is smaller than the QC of the Shannon nodes used in a BDD and, together with Davio nodes, in a KFDD. Due to that, circuits produced from an FDD have, on average, a smaller cost than circuits produced from a BDD or a KFDD, although the number of nodes is not necessarily smaller [350]. Further improvements can be achieved when the number of nodes in an FDD is reduced by selecting between positive and negative Davio nodes, i.e., by using Fixed-Polarity Functional Decision Diagrams (FPFDDs). Recall that finding an exact minimum FPFDD requires 2^n checks, compared to 3^n checks for a KFDD. The expansion rules assigned to the nodes of each level of the FPFDD are specified by a polarity vector: the i-th element of this vector, 0 or 1, specifies that the positive or the negative Davio rule, respectively, is assigned to the nodes of the i-th level in the diagram. Table 5.1 shows the decomposition rules used in BDDs and FPFDDs and the corresponding reversible modules of Toffoli gates.
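The positive Davio rule f = f0 ⊕ xk f2 underlying the pD nodes can be checked exhaustively; the three-variable function f below is a hypothetical example:

```python
from itertools import product

# Exhaustive check of the positive Davio expansion f = f0 XOR (xk AND f2),
# with f0 = f(xk=0), f1 = f(xk=1), and f2 = f0 XOR f1, expanding in x1.

def f(x1, x2, x3):
    return (x1 & x2) ^ x3

ok = True
for x1, x2, x3 in product((0, 1), repeat=3):
    f0 = f(0, x2, x3)
    f2 = f(0, x2, x3) ^ f(1, x2, x3)   # Boolean difference w.r.t. x1
    ok &= f(x1, x2, x3) == (f0 ^ (x1 & f2))
print(ok)  # True
```

The expansion uses only XOR and AND, which is why a pD node maps so directly onto a Toffoli-gate module.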

Table 5.1. Realization of BDD and FDD nodes by Toffoli gates
(the circuit diagrams of the node modules are omitted; x̄k denotes the complement of xk)

  diagram   node                  decomposition rule      QC
  BDD       Shannon (S)           f = x̄k f0 ⊕ xk f1      11
  FDD       positive Davio (pD)   f = f0 ⊕ xk f2           6
  FDD       negative Davio (nD)   f = f1 ⊕ x̄k f2           6

The fact that a node of a decision diagram is always realized with an additional line is the main disadvantage of the design from decision diagrams. This additional line is introduced to forward an input to the next level of the circuit; otherwise the input would be transformed and therefore lost. However, in the cases where a co-factor or an input variable is not used later, this additional line can be avoided. Tables 5.2 and 5.3 summarize the cases which allow a simplified realization of positive and negative Davio nodes. Note that it is possible to transform the line of the input variable xk if the associated edge points to the constant 1 and begins at the last node of the level. Similarly, when several nodes share the same sub-tree, it is possible to transform the line of this co-factor if the node is the right-most one and its 1-edge points to the shared sub-tree.

The synthesis of an n-variable multi-output function using Toffoli gates, based on the post-order traversal of the FDD, is described in

360

Reversible and Quantum Logic

Table 5.2. Realization of positive Davio nodes by Toffoli gates
(The circuit diagrams of this table are omitted. For each special form of a positive Davio node, i.e., a node whose edges point to the constants 0 or 1 or to the co-factors f0 and f2, the table gives the general realization by Toffoli gates, the simplified realization without the additional line, and the condition under which the simplification is applicable: the current module must be the last usage of the line xk or of the line f0, respectively.)


Table 5.3. Realization of negative Davio nodes by Toffoli gates
(The circuit diagrams of this table are omitted. For each special form of a negative Davio node, i.e., a node whose edges point to the constants 0 or 1 or to the co-factors f1 and f2, the table gives the general realization by Toffoli gates, the simplified realization without the additional line, and the condition under which the simplification is applicable: the current module must be the last usage of the line xk or of the line f1, respectively.)


Algorithm 10. CreateToffoliNetworkPOS(n, FDD, polarity)
Post-order procedure for the FDD-based Toffoli network synthesis
Input: n: number of variables
Input: FDD: description of the function to be synthesized
Input: polarity: vector of the node polarities per level in the FDD
Output: realization of the given function by a Toffoli network
 1: foreach i in [1, n] do                          ▷ initialization
 2:     lines[i] = CreateLine()
 3:     if polarity[i] = 1 then
 4:         AddNotGate(lines[i])
 5:     end if
 6: end foreach
 7: foreach node in FDD.rootNodes do                ▷ node implementation
 8:     CreateNodeRealization(node, polarity)
 9: end foreach

Algorithm 10, CreateToffoliNetworkPOS. The function to be realized is specified by a shared fixed-polarity FDD; the parameter polarity is the polarity vector of this FDD. Algorithm 10 uses the auxiliary Algorithm 11, CreateNodeRealization, which recursively realizes the sub-tree starting from a given root node.

Algorithm 11. CreateNodeRealization(root, polarity)
Input: root: root node of the sub-tree to be realized
Input: polarity: vector of the node polarities per level in the FDD
Output: realized network
 1: if root.left is not terminal and root.left is not implemented then
 2:     CreateNodeRealization(root.left, polarity)
 3: end if
 4: if root.right is not terminal and root.right is not implemented then
 5:     CreateNodeRealization(root.right, polarity)
 6: end if
 7: if an optimal implementation of root exists and its condition is satisfied then
 8:     CreateOptimalImplementation(root, polarity)
 9: else
10:     CreateGeneralImplementation(root, polarity)
11: end if
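The recursion of Algorithm 11 is an ordinary post-order walk over the shared diagram. A Python sketch (the node structure and names are illustrative assumptions) shows how a shared successor is realized only once:

```python
class Node:
    """Minimal FDD node: `left`/`right` are child nodes or None (terminal)."""
    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right
        self.implemented = False

def create_node_realization(root, order):
    """Post-order traversal: realize both successors before the parent.
    `order` collects the labels in the order the nodes are implemented."""
    for child in (root.left, root.right):
        if child is not None and not child.implemented:
            create_node_realization(child, order)
    root.implemented = True
    order.append(root.label)

# Shared successor: node C is a child of both root nodes A and B,
# so it is implemented exactly once, before its parents.
C = Node("C")
A = Node("A", left=C)
B = Node("B", right=C)
order = []
for root in (A, B):          # a shared FDD with two root nodes
    create_node_realization(root, order)
```
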



Figure 5.2. Realization of the 4mod5 benchmark: (a) zero-polarity FDD, (b) associated reversible circuit using Toffoli gates.

Example 5.37. Figure 5.2 shows a zero-polarity FDD of the benchmark function 4mod5, whose truth-vector is F = [1000 0100 0010 0001]^T, and the reversible circuit of Toffoli gates realizing it. Figure 5.3 shows a fixed-polarity FDD of the same function that yields the minimal number of lines in the generated reversible circuit, together with the corresponding reversible circuit.
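The co-factors entering the Davio rules of Table 5.1 can be read off a truth vector directly. A sketch under our own indexing assumption (x1 is the most significant index bit), applied to the 4mod5 truth vector of Example 5.37:

```python
def davio_cofactors(f, n, k):
    """Split the truth vector f (length 2**n) into the co-factors
    f0 = f(x_k = 0), f1 = f(x_k = 1), and f2 = f0 XOR f1, so that
    f = f0 XOR x_k f2 (positive Davio) and f = f1 XOR NOT(x_k) f2
    (negative Davio). Variable x_k is taken as the k-th most
    significant index bit (an assumption about the encoding)."""
    f0, f1 = [], []
    for i, value in enumerate(f):
        bit = (i >> (n - k)) & 1
        (f1 if bit else f0).append(value)
    f2 = [a ^ b for a, b in zip(f0, f1)]
    return f0, f1, f2

# Truth vector of the 4mod5 benchmark from Example 5.37:
F = [1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1]
f0, f1, f2 = davio_cofactors(F, n=4, k=1)
```
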

5.1.3. FDD-Based Reversible Synthesis by Levels

Algorithm 10 assumes the post-order traversal of the FDD. This determines the order of the implemented nodes by the requirement that a parent node has to be implemented after its successors. It can be observed that the order of the implemented nodes strongly affects the complexity of the produced network. This property is illustrated by the following example:

Example 5.38. Figure 5.4 (a) shows a part of an FDD which should be implemented using Toffoli gates. The post-order traversal of the



Figure 5.3. Optimal realization of the 4mod5 benchmark: (a) optimal FPFDD, (b) associated reversible circuit using Toffoli gates.

diagram requires that node A is visited before node B. If the nodes are implemented in this order, node A must be implemented by the general method, because the output line of node C is still required for the implementation of node B. The produced network is shown in Figure 5.4 (b). If node B is implemented before node A, the line fC is no longer required; therefore, node A can be realized by the simplified module. The network synthesized in this way is shown in Figure 5.4 (c). It can be seen that the circuit of Figure 5.4 (c) requires one line and one


Figure 5.4. Utilization of a shared successor: (a) part of an FDD containing a shared successor, (b) direct realization, and (c) optimized realization.


gate less than the basic circuit of Figure 5.4 (b).

From Example 5.38 it is obvious that the complexity of a reversible circuit derived from an FDD depends on the order in which the nodes are implemented. An experimental analysis of the structure of the FDDs of a set of benchmark functions has shown that most edges in an FDD have a length of 1; such edges connect nodes at successive levels. For example, in the zero-polarity FDDs of the benchmark functions used in the experiments of this section, 74.18% of the edges have a length of 1. Consequently, if the nodes are implemented by levels, starting from level n (nearest to the terminal level) up to level 1 (the root node), the complexity of the generated network can be reduced by reordering the implementation of the nodes belonging to the same level. It remains an open task to find an algorithm that determines an optimal implementation order for the nodes of a level.

An FDD-based algorithm for the synthesis of a reversible network of Toffoli gates by levels needs a linked list of the nodes of each level. Our new procedure for traversing the FDD is based on the following rule: if a simplified realization exists for a node but its condition is not yet satisfied, the node is moved to the end of the linked list of its level and visited again later. The implementation of the current node is thus postponed until all other nodes of the level have been visited; in this way, the probability that the condition for the simplified implementation will be satisfied is increased. Our procedure visits the nodes stored in the linked list of a level, and a node is implemented if:

1. there is no simplified implementation for the corresponding node type; or
2. the condition for its simplified implementation is satisfied; or
3. it is the second visit of the node.

Otherwise, the node is moved to the end of the list and visited again later.
The procedure CreateToffoliNetworkSBL realizes the synthesis of a reversible network of Toffoli gates by levels; this procedure is formally described in Algorithm 12.


Algorithm 12. CreateToffoliNetworkSBL(n, FDD, polarity)
FDD-based Toffoli network synthesis by levels
Input: n: number of variables
Input: FDD: description of the function to be synthesized
Input: polarity: vector of the node polarities per level in the FDD
Output: realized Toffoli network
 1: foreach i in [1, n] do                          ▷ initialization
 2:     lines[i] = CreateLine()
 3:     if polarity[i] = 1 then
 4:         AddNotGate(lines[i])
 5:     end if
 6: end foreach
 7: foreach level in FDD.levels from the bottom to the top do
 8:     foreach node in level.list do
 9:         if a simplified implementation of the node exists and its condition is satisfied then
10:             CreateOptimalImplementation(node, polarity)
11:         else
12:             if no simplified implementation of the node exists or it is the second visit of the node then
13:                 CreateGeneralImplementation(node, polarity)
14:             else
15:                 move the node to the end of level.list
16:             end if
17:         end if
18:     end foreach
19: end foreach
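The postponement rule of Algorithm 12 amounts to a per-level queue in which a node is deferred at most once. A Python sketch with illustrative callbacks (the names and data structures are our own, not from the chapter):

```python
from collections import deque

def schedule_level(nodes, has_simplified, condition_met):
    """Visit the nodes of one FDD level, postponing a node at most once
    if its simplified realization exists but its condition does not hold
    yet. Returns the order in which the nodes are implemented."""
    queue = deque(nodes)
    visits = {node: 0 for node in nodes}
    done = []
    while queue:
        node = queue.popleft()
        visits[node] += 1
        if (not has_simplified(node) or condition_met(node, done)
                or visits[node] >= 2):
            done.append(node)       # general or simplified implementation
        else:
            queue.append(node)      # move the node to the end of the list
    return done

# Level 2 of the rd53 walk-through (Example 5.39): the condition of
# node 11 only holds once node 13 has been implemented.
order = schedule_level(
    [5, 8, 11, 13],
    has_simplified=lambda node: node == 11,
    condition_met=lambda node, done: node == 11 and 13 in done)
```
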

Example 5.39. Consider the realizations of the benchmark function rd53 by the post-order synthesis algorithm and by the synthesis algorithm by levels. The zero-polarity FDD of this function is shown in Figure 5.5, where the output function f3 is arranged before the function f2 for clarity; in the post-order algorithm, however, function f2 is realized before function f3. When the post-order algorithm is used, the nodes are implemented in the order in which they are created: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, and 14. The generated network contains 5 ancilla lines. The total quantum cost of the network is 46: it contains 8 gates on 3 lines with a QC of 5 each, and 6 gates on 2 lines with a QC of 1 each. That realization is shown in Figure 5.6.

Figure 5.5. Zero-polarity FDD for the benchmark function rd53.

Figure 5.6. Circuit rd53 synthesized using the post-order algorithm.

If the algorithm for the FDD-based synthesis by levels is used, the implementation of the nodes is done level by level starting from the


last level:

Level 5: Node 2 is implemented.

Level 4: Nodes 3 and 6 are implemented (there is no optimal implementation for node 3, but node 6 is implemented in the optimal way).

Level 3: There are 3 nodes, labeled 4, 7, and 10. For node 4 an optimal implementation does not exist, but for nodes 7 and 10 the conditions for the optimal implementation are already satisfied in the first visits of these nodes.

Level 2: The list of nodes of level 2 contains the nodes 5, 8, 11, and 13. On the first visit, node 11 is not implemented, because its optimal implementation would transform the output signal of node 10, but that signal is still needed for the implementation of node 13. Therefore, node 11 is moved to the end of the list; after the implementation of node 13, it is implemented in the optimal manner.

Level 1: The initial order of the nodes of level 1 is 9, 12, and 14. Node 12 cannot be implemented in the optimal manner on the first visit. Hence, this node is moved to the end of the list, and on the second visit it is implemented without an ancilla line.

If the algorithm by levels is used, the generated network contains only 3 ancilla lines, and the quantum cost of the network is 44; it contains 2 lines and 2 gates less than the network produced by the post-order algorithm. The optimal realization of the benchmark circuit rd53 by this algorithm is shown in Figure 5.7.

Figure 5.7. Circuit rd53 synthesized by mapping of levels.

5.1.4. Experimental Results

We performed experiments with the proposed FDD-based synthesis by levels (Algorithm 12) and compared the complexity of the generated networks with that of the networks generated by the procedure using the post-order implementation of the nodes (Algorithm 10). The experiments were done for zero-polarity FDDs (FDDs in which the positive Davio decomposition is used on all levels), whose results are shown in Table 5.4, and for FPFDDs with optimal polarity (the polarity that yields the minimal number of lines in the implementation), whose results are shown in Table 5.5. The optimal polarity was found by an exhaustive search (determining the implementations for all polarities and choosing the best one); for this reason we could not find optimal implementations for functions with more than 13 variables.

We compared the number of lines (L) of the generated networks and their Quantum Cost (QC). The columns dL and dQC contain the reduction ratios of the number of lines and of the quantum cost in percent, respectively.

Table 5.4 shows that 13 (50%) of the 26 explored benchmark functions lead to the best results regarding both the number of lines and the QC when the zero-polarity FDD is realized by Algorithm 12. There are eight functions (30.77%) with identical realizations by both algorithms, three functions (11.54%) with the best realizations by Algorithm 10 (smaller number of lines as well as QC), and two functions (7.69%) for which Algorithm 10 found a smaller number of lines. For four of the five functions for which the number of lines was worse when Algorithm 12 was used, Algorithm 12 produced only one line more. Only for the function apex2 did Algorithm 12 produce more lines (eight, from 6116 to 6124, see Table 5.4), which is, however, only 0.13% more. On average, the proposed procedure produced 5.37% fewer lines than Algorithm 10.
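The dL and dQC columns are plain percentage reductions relative to Algorithm 10; for the rd53 row of Table 5.4:

```python
def reduction(before, after):
    """Reduction ratio in percent: positive values mean that
    Algorithm 12 improved on the Algorithm 10 result."""
    return round(100.0 * (before - after) / before, 2)

# rd53 in Table 5.4: lines 10 -> 8, quantum cost 46 -> 44.
dL = reduction(10, 8)    # 20.0
dQC = reduction(46, 44)  # 4.35
```
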


Table 5.4. Reversible networks using zero-polarity FDDs

                               Algorithm 10     Algorithm 12       reduction
  function         in/out        L       QC       L       QC      dL      dQC
  4mod5             4/1          5       19       5       19      0        0
  alu               5/1          9       28       9       28      0        0
  apex2            39/3       6116    49266    6124    49274     -0.13    -0.02
  bw                5/28        87      718      85      716      2.3      0.28
  cordic           23/2         49      432      45      428      8.16     0.93
  cycle10_2        12/12       107      562      97      552      9.35     1.78
  decod24           2/4          7       23       7       23      0        0
  e64              65/65      1504     8495    1504     8495      0        0
  ex5p              8/63       244     1860     237     1853      2.87     0.38
  ham15            15/15        42      188      43      188     -2.38     0
  ham7              7/7         18       74      19       74     -5.56     0
  hwb5              5/5         30      236      29      235      3.33     0.42
  hwb6              6/6         55      476      53      474      3.64     0.42
  hwb7              7/7        101      969     101      969      0        0
  hwb8              8/8        169     1646     170     1647     -0.59    -0.06
  mini-alu          4/2         10       45       8       43     20        4.44
  mod5-adder        6/6         25      174      22      171     12        1.72
  plus127-mod8192  13/13        24       98      24       98      0        0
  plus63-mod4096   12/12        22       86      22       86      0        0
  rd53              5/3         10       46       8       44     20        4.35
  rd73              7/3         14       80      10       76     28.57     5
  rd84              8/4         28      125      16      113     42.86     9.6
  seq              31/45      1375    12309    1357    12291      1.31     0.15
  spla             16/16       762     5996     732     5966      3.94     0.5
  sym6              6/1         10       69      11       70    -10       -1.45

  average                                                         5.37     1.09

In the case of the optimal FPFDDs, Algorithm 12 produced even better results than in the case of the zero-polarity FDDs. Here, 20 functions were studied, and for 10 of them (50%) Algorithm 12 achieved the best number of lines and the best QC. For five functions (25%) both algorithms produced identical implementations; for one function Algorithm 12 produced a smaller number of lines but a larger QC, for one function a larger number of lines but a smaller QC, for one function a smaller number of lines and the same QC, and for two functions an equal number of lines and a larger QC. Notice that in the performed experiments our primary goal was to reduce the number of lines. If we


Table 5.5. Reversible networks using optimal FPFDDs

                               Algorithm 10     Algorithm 12       reduction
  function         in/out        L       QC       L       QC      dL      dQC
  4mod5             4/1          5       18       5       18      0        0
  alu               5/1          8       29       8       30      0       -3.45
  bw                5/28        82      627      79      595      3.66     5.1
  cycle10_2        12/12       107      562      97      552      9.35     1.78
  decod24           2/4          7       23       7       23      0        0
  ex5p              8/63       242     1840     234     1811      3.31     1.58
  ham7              7/7         17      103      16       85      5.88    17.48
  hwb5              5/5         26      203      21      202     19.23     0.49
  hwb6              6/6         41      385      40      408      2.44    -5.97
  hwb7              7/7         66      690      68      693     -3.03    -0.43
  hwb8              8/8        101     1128      93     1095      7.92     2.93
  mini-alu          4/2          9       46       8       43     11.11     6.52
  mod5-adder        6/6         23      152      22      152      4.35     0
  plus127-mod8192  13/13        24       73      24       73      0        0
  plus63-mod4096   12/12        22       66      22       66      0        0
  rd53              5/3         10       46       8       44     20        4.35
  rd73              7/3         14       80      10       76     28.57     5
  rd84              8/4         28      125      16      113     42.86     9.6
  sym6              6/1         10       69      10       74      0       -7.25
  sym9              9/1         12      106      12      106      0        0

  average                                                         7.78     1.89

analyze only that parameter, Algorithm 12 produces in 12 of the 20 cases a smaller number of lines; in seven cases both algorithms produce the same number of lines, and only in one case does Algorithm 10 produce a smaller number of lines. On average, Algorithm 12 reduces the number of lines by 7.78%.

The above experimental results confirm that the complexity of a reversible circuit derived from a decision diagram, in terms of both the quantum cost and the number of lines, depends on the number of nodes and on the complexity of the reversible modules realizing the nodes of the diagram. Functional decision diagrams offer the possibility of simplifying the modules for the realization of Davio nodes; due to that, the overall complexity of the produced reversible circuits is smaller.


5.2. Distributed Evolutionary Design of Reversible Circuits

Fatima Zohra Hadjam and Claudio Moraga

5.2.1. Extending RIMEP2 to DRIMEP2

A distributed hierarchical evolutionary system, named DRIMEP2, for the design of reversible circuits was successfully introduced earlier. In this contribution we extend the concept of the distributed evolutionary design algorithm, enlarging DRIMEP2 to a family of distributed systems including the hierarchical model, the islands model, and two hybrid architectures:

• one comprising a hierarchical model with islands at the lower level; and
• another consisting of islands of hierarchical models.

A set of 17 selected 4-bit reversible benchmarks has been evolved under similar parameter environments for the four studied systems. For each benchmark, 100 independent runs were performed, and statistics such as the average quantum cost, the number of successful runs, and the total execution time were considered in the comparison. The results show that in most cases the straight hierarchical model and the hierarchical model with islands of workers are the best in terms of Quantum Cost (QC), although all four distributed DRIMEP2 systems achieved close performance.

5.2.2. Background

Bennett showed in [25] that in order to limit power dissipation, it is necessary that all computations are performed in a reversible way. This may be considered as the beginning of new research towards what


today is called Reversible Computing. The design of reversible circuits is only one step towards the construction of a reversible computer or of reversible dedicated special hardware.

A reversible circuit consists of a cascade of reversible gates. Both a reversible circuit and a reversible gate have the following feature: the vectors of inputs and outputs are mapped one-to-one; thus, the vector of input states can always be reconstructed from the vector of output states. Reversible circuits realize bijections. They must be cycle-free (i.e., no feedback loops are allowed) and fan-out-free (i.e., every gate output can be used only once, either as a circuit output or as an input to another single gate). Comprehensive reviews of reversible circuits as well as of synthesis approaches are presented in [260, 387]; some recent optimization results may be found in [99].

RIMEP2 is a linear genetic programming-based system [138]. It consists of a population of individuals, each representing a potential solution of the problem being evolved, which in the present study is a reversible circuit. Each individual is structured as a cascade of reversible gates. Usually, parameters such as the size of the population, the length of the chromosome, and the number of generations are quite complex to fix, since they are problem dependent. Several approaches to tune these kinds of parameters can be found in the literature; one can cite trial-and-error [380], parameter tuning [301], and parameter control [102]. A systematic tuning approach for RIMEP to solve reversible design problems, based on the Design Of Experiments (DOE) method, was reported in [137].

When evolving a given circuit, besides finding a correct solution (matching the specification of the circuit), RIMEP2 aims to minimize the QC and the gate count of the circuit. Increasing the population size may certainly increase the probability of finding an optimal solution, but on the negative side it may lead to a blind search.
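The cascade structure and the bijection property can be checked mechanically. A minimal simulator sketch (the gate encoding is our own assumption, not from this chapter):

```python
def apply_cascade(gates, state):
    """Apply a cascade of reversible gates to an integer bit-state.
    Each gate is (controls, target): the target bit is flipped iff all
    control bits are 1 (NOT: no controls, CNOT: one, Toffoli: two)."""
    for controls, target in gates:
        if all((state >> c) & 1 for c in controls):
            state ^= 1 << target
    return state

# A small cascade on 4 lines: Toffoli(3,2 -> 1) followed by CNOT(0 -> 3).
cascade = [((3, 2), 1), ((0,), 3)]
table = [apply_cascade(cascade, x) for x in range(16)]
# A cascade of reversible gates always realizes a bijection:
assert sorted(table) == list(range(16))
```
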
A strong increase of the number of generations to allow a long search slows the system down, whereas limiting the population size and the number of generations may lead to premature convergence, so that the algorithm becomes trapped in a plateau. The various constraints imposed by reversibility, as mentioned above, and the very large problem space make the design of such circuits very difficult. RIMEP2 as an evolutionary approach has already obtained quite satisfying results by evolving reversible circuits from


scratch, without relying on databases or on a hash-table of partial solutions, working with a library of gates comprising NOT, CNOT, and mixed-polarity controlled Toffoli and Peres gates. However, to enlarge the potential of RIMEP2 when exploring the search space, we developed a distributed version of RIMEP2, called DRIMEP2.

5.2.3. The Hierarchical Model Based on RIMEP2

The RIMEP2 algorithm is reproduced in one main unit and multiple worker units; herein we speak of a two-level hierarchy. If the number of levels is higher, then every worker unit may in turn be the sub-main unit of another hierarchy: the nodes at the second level are workers for the upper level and sub-main units for the lower level. The words hierarchical and hierarchy refer to the structure in which these units are assembled and to the way they communicate. The proposed architecture differs from the one often called master-slave that is widely found in the literature (see, e.g., [68]), where the master is an evolutionary algorithm which stores the population, executes the genetic operations, and distributes the fitness calculation of the individuals to multiple slaves, which in turn send the results back to the master. The novelty of the proposed architecture may be summarized as follows:

• every unit is an instance of RIMEP2; the parameter setting may, however, differ;

• the communication is carried out one way, from the worker units to the main unit; it is not bi-directional;

• the workers evolve isolated from each other (i.e., independently and without information exchange). Their evolution periods on given areas of the problem search space are short before they move on to other randomly selected areas. This is realized by a total or partial re-initialization of their populations using different randomly generated seeds. In this section we call this event a jump; the goal of these jumps is to let the workers play the role of explorers;

• the parameter setting may differ from one population to another,


such as using only crossover, only mutation, or both, using different rates of crossover and mutation, or using different types of selection. Different libraries can be assigned to different populations with different random generator seeds to ensure maximum diversity. A library consists of the different gates used as building blocks during the evolution of a reversible circuit. The chromosome encoding is kept fixed in all units in order to avoid pre- and post-processing during the migration event (see below);

• in addition to the RIMEP2 parameters, we mention four important parameters:
  – Who is migrating?
  – Who is/are replaced?
  – How many times should the migration occur?
  – How many migrants should migrate (the migration rate)?

These parameters are set experimentally. More details about the whole set of parameters are given in [139].

5.2.4. The Islands Model Based on RIMEP2

The islands model is also called the coarse-grained model or the multi-deme model [162]. Islands are regarded as sub-populations that together make up the population of the whole islands model. In the present case each island runs a RIMEP2 process. Islands evolve independently for a period of time (called an epoch), after which they exchange individuals (solutions) in a process called migration. If the epoch is constant, the migration is said to be synchronous [5, 68]. The behavior of the islands model follows a migration policy, which comprises:

• the migration topology, which is represented by a directed graph. Each node symbolizes an island, and each directed edge connects


two islands. Among the most famous topologies one may cite the ring and torus topologies; in the present case, a ring of RIMEP2 islands is considered;

• the migration interval, which is the time interval between two successive migrations; its reciprocal is called the migration frequency. Frequent migrations may lead to stagnation due to an increasing number of similar individuals, whereas rare migrations yield isolated islands;

• the migration rate, also known as the migration size, which determines how many migrants move between neighboring islands. It is proportional to the population size of each island;

• who is migrating and who is replaced? The candidates to migrate and the individuals to be replaced may be selected based on fitness, randomly, or using the same selection operator used within the island. In addition, migrants can be recombined with individuals present in the island before the selection.
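One synchronous migration step on a unidirectional ring of islands can be sketched as follows (the list encoding and parameter names are our own; individuals are represented by their fitness value, lower is better):

```python
def ring_migrate(islands, rate, key):
    """One synchronous migration step on a unidirectional ring: each
    island sends copies of its `rate` best individuals (by `key`) to
    the next island, where they replace the worst individuals."""
    migrants = [sorted(pop, key=key)[:rate] for pop in islands]
    for i, pop in enumerate(islands):
        incoming = migrants[(i - 1) % len(islands)]  # from the predecessor
        pop.sort(key=key)                            # best first, worst last
        pop[-rate:] = incoming                       # replace the worst
    return islands

# Three islands whose individuals are fitness values (lower is better):
islands = [[3, 1, 2], [9, 7, 8], [5, 6, 4]]
ring_migrate(islands, rate=1, key=lambda x: x)
```
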

5.2.5. Hybrid Models

The first model, called hybrid 1, consists of a hierarchy of separated islands. The structure of hybrid 1 is shown in Figure 5.8 (a), where 'R' stands for the representative of each group of islands. The workers are grouped in a set of mini islands models with a unidirectional ring topology. The groups work independently of each other; no exchange is permitted among the groups. Two kinds of migrations may occur:

• inside each group, following the migration policy of the islands model described above;

• from the representative of each group (which must be one of the members of the group) to the main unit.

If the number of levels of the hierarchy exceeds two, then the groups are formed from the workers of the lower level. Recall that the role of the



Figure 5.8. Topologies used to design reversible circuits: (a) hybrid 1, (b) hybrid 2.

workers is to explore; by this type of hybridization one aims to accelerate the exploitation inside each group, taking advantage of the cooperation inside the islands model. The second model, called hybrid 2, consists of multiple parallel hierarchies whose main units constitute an islands model; see Figure 5.8 (b), where an islands model at the upper level utilizes hierarchical models at the lower level. Each hierarchical model works in isolation from the others. The main units exchange individuals in a migration process after every predefined period of time.

5.2.6. Experiments and Interpretation of the Results

Parameter Setting. In order to make a fair comparison among the four different DRIMEP2 architectures, the following main boundary conditions were fixed:

• a common number of processing units (with an eventual tolerance of 1); and

• a common number of individuals for every processing unit, in order to achieve similar processing times per unit.


The transmission time required for the migration of individuals from one unit to another is irrelevant in this context. Besides the parameters related to the sequential RIMEP2, which were tuned following previous experiments [137], all processing units use the same encoding and length of the chromosome. The number of individuals, i.e., the total size of the populations, is kept close to 480. Additional parameters for the proposed hierarchical models and islands models are considered, keeping an equivalent parameter environment for all of them; for example, the amount and the time interval of the migrants transferred from one population to another were preserved. The number of transferred migrants was fixed to 11 or 12, depending on the used topology. The migrant candidates are selected according to their fitness (the best) and replace first the duplicated and then the worst individuals at the receiving unit; this applies to all four models. All used hierarchies are built on two levels (i.e., one main unit and n workers), and all used islands models follow a unidirectional ring. Except for the RIMEP2 islands model, the partial re-initialization called jump, which was considered key for exploring multiple areas of the search space in parallel, was fixed to 50%, i.e., 50% of the individuals of a population are replaced by newly generated ones using different seeds of the digital random number generator.

Benchmarks. To test and compare the four models, 17 relevant 4-bit reversible benchmarks have been newly selected from [357, 394]. Their compact reversible specifications are given in Table 5.6. The 4-bit binary strings of the output truth tables are encoded in decimal; for example, 13 denotes (1101). The input strings are in lexicographic order. It should be recalled that the design of 4-bit reversible circuits (without ancilla lines) is presently considered a very complex problem, since the problem space is (2^4)! ≈ 20.9 · 10^12 functions large. See Section 4.1 of [260] for further details.
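Since each specification must be a bijection, its validity is a one-line check, and the quoted size of the design space follows directly (a sketch; the helper name is our own):

```python
from math import factorial

def is_reversible(spec):
    """A compact specification in the style of Table 5.6 describes a
    reversible function iff its output truth table is a permutation
    of the input values 0..len(spec)-1."""
    return sorted(spec) == list(range(len(spec)))

# The 4_49 benchmark from Table 5.6:
spec_4_49 = [15, 1, 12, 3, 5, 6, 8, 7, 0, 10, 13, 9, 2, 4, 14, 11]
ok = is_reversible(spec_4_49)

# Size of the 4-bit reversible design space: (2**4)! = 16!
space = factorial(2 ** 4)   # about 20.9 * 10**12
```
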

Experiments and Results. All benchmarks were evolved under the same parameter setting. The experiments ran on a cluster of computers (the LiDOng cluster of the TU Dortmund University, Germany); see [192] for more details about the total number of nodes, the CPU


Table 5.6. Benchmark specifications from [357, 394]

  name          specification
  4b15g_1       [1,5,0,8,9,11,2,15,3,12,4,6,10,14,13,7]
  4b15g_2       [1,9,0,4,10,8,2,11,3,15,5,12,7,14,13,6]
  4b15g_3       [3,1,7,13,11,0,8,15,2,5,10,6,9,14,12,4]
  4b15g_4       [3,1,11,7,8,0,9,5,2,6,15,13,14,4,10,12]
  4b15g_5       [3,5,11,1,8,0,9,7,2,6,14,13,10,4,12,15]
  App2.2        [7,14,9,6,11,0,13,2,5,15,10,12,1,4,3,8]
  App2.3        [10,15,0,7,14,9,6,1,13,12,5,3,11,8,4,2]
  App2.8        [12,15,5,8,3,2,1,10,7,14,13,6,11,0,9,4]
  App2.11       [8,9,10,2,4,7,6,5,0,15,13,3,12,14,1,11]
  nth_prime_4   [0,2,3,5,7,11,13,1,4,6,8,9,10,12,14,15]
  msaee         [11,3,9,2,7,13,15,14,8,1,4,10,0,12,6,5]
  4_49_hwb4     [15,2,3,12,5,9,1,11,0,10,14,6,4,8,7,13]
  4_49          [15,1,12,3,5,6,8,7,0,10,13,9,2,4,14,11]
  oc6           [9,0,2,15,11,6,7,8,14,3,4,13,5,1,12,10]
  oc8           [11,3,9,2,7,13,15,14,8,1,4,10,0,12,6,5]
  f1            [0,1,2,3,15,10,11,13,9,12,5,4,14,8,6,7]
  f2            [0,1,2,3,15,10,11,14,8,7,4,13,6,9,5,12]

clock rate, and other features. Each processing unit is a processor of the cluster. In order to compare the four models, some statistics were collected during the runs. We distinguish:

• the number of successful runs: 100 independent runs have been performed for each benchmark. A run is said to be successful if the evolved reversible circuit totally matches the specification of the target reversible circuit (a perfect fit with 0 errors);

• the best quantum cost among the 100 independent runs: as mentioned before, the Quantum Cost (QC) is explained in detail in [138]. The QC is the number of elementary reversible gates constituting the whole reversible circuit; each elementary gate has a QC of 1, see [22, 212];

• the average quantum cost: the average is calculated based on the successful runs only (of the 100 runs, we consider only the cases with 0 errors);

• the execution time over the 100 runs, calculated in seconds: the algorithm ends when the maximum number of generations is reached, which has been experimentally set to 100,000 (the time


Table 5.7. Average quantum cost within 100 runs

  benchmark     sequential  hierarchical  islands  hybrid 2  hybrid 1
  4b15g_1            54.19         52.64    53.12     54.27     51.64
  4b15g_2            40.62         40.79    38.59     40.38     39.18
  4b15g_3            39.25         36.25    36.25     35.10     36.27
  4b15g_4            47.04         49.09    47.05     50.66     50.27
  4b15g_5            39.50         38.77    37.06     37.70     37.13
  App2.2             50.04         50.66    54.80     51.11     49.41
  App2.3             28.76         25.69    26.90     25.94     26.41
  App2.8             64.00         55.38    55.25     54.93     54.08
  App2.11            25.98         26.26    25.56     26.03     25.81
  nth_prime_4        37.38         33.37    33.30     33.28     32.70
  msaee              54.39         51.89    47.33     48.90     43.69
  4_49_hwb4          37.40         32.13    30.08     31.55     31.70
  4_49               36.36         33.25    32.70     32.82     34.88
  oc6                54.27         55.05    56.10     53.66     54.39
  oc8                44.79         50.29    45.95     51.91     50.31
  f1                 25.29         24.10    24.75     24.27     23.58
  f2                 38.65         28.00    28.33     32.00     30.00
  average            42.23         40.21    39.60     40.27     39.50

needed for DRIMEP2 to find at least one solution for each benchmark). The results are given in Tables 5.7 and 5.8.

The parameters of [137] allowed the sequential RIMEP2 to find optimal solutions (even the best ones, e.g., for the benchmark 4b15g_4, see Table 5.7) and to avoid premature convergence, thanks to the large population size, fixed at 480. The drawback was an excessive execution time (about a factor of 60) compared to the other models. The hierarchical and islands models were therefore examined next. According to the last row of Table 5.7, in terms of the overall quality of the solutions (average quantum cost over 100 independent runs) the islands model was the winner, even if the difference is small, probably because the parameter setting was strong enough to permit DRIMEP2 (under the different models) to find optimal solutions.

To benefit from the strengths of both the hierarchical and the islands models, we combined them in two hybrid models, hybrid 1 and hybrid 2, described in Figures 5.8 (a) and (b), respectively:

• hybrid 1: In order to accelerate the process of exploitation for


Table 5.8. The total execution time (in seconds) for 100 independent runs

  benchmark     sequential  hierarchical  islands  hybrid 2  hybrid 1
  4b15g_1            43360           689      712       686       711
  4b15g_2            44180           687      699       681       707
  4b15g_3            43820           774      775       773       780
  4b15g_4            42210           610      609       603       593
  4b15g_5            44620           777      824       787       833
  App2.2             43070           707      716       700       715
  App2.3             43550           696      740       711       703
  App2.8             43780           771      780       767       769
  App2.11            43490           727      775       745       726
  nth_prime_4        43090           710      750       718       719
  msaee              43730           742      769       753       779
  4_49_hwb4          40791           625      612       626       629
  4_49               43629           766      791       769       778
  oc6                42131           694      699       694       692
  oc8                43361           704      732       701       706
  f1                 42792           685      733       708       703
  f2                 40006           423      420       422       426
  average         43035.88        693.35   713.88    696.71    704.06

the explorers (workers), we grouped the workers attached to the same main unit into rings of islands (4 rings of 3 islands each). We noticed a slight improvement in terms of the average QC (see the last row of Table 5.7 in the column corresponding to hybrid 1). At the same time, values equivalent to those of the hierarchical model were found for the best QC and the number of successful runs, but with a slight increase in the execution time, probably due to the communication overhead. Nevertheless, on average it was still lower than the execution time of the islands model (see Table 5.8);

• hybrid 2: although the execution time decreased, we did not notice a significant improvement in performance. We concluded that this is probably due to the small number (equal to 3) of workers attached to each main unit. Recall that this topology was set according to the restriction that all the models should have the same number of processing units (with a tolerance of one).

Another measure of the quality of the distributed evolutionary design


Table 5.9. Successful runs over 100 runs

  benchmark     sequential  hierarchical  islands  hybrid 2  hybrid 1
  4b15g_1               84            92       85        95        97
  4b15g_2               39            39       32        32        34
  4b15g_3                8             8        8        10        11
  4b15g_4               48            45       55        59        64
  4b15g_5                6            30       34        27        32
  App2.2                91           100       96       100        98
  App2.3                58            88       59        87        82
  App2.8                94            90       85        89        93
  App2.11               95            76       80        74        80
  nth_prime_4           48            38       20        32        27
  msaee                 31            45       24        29        39
  4_49_hwb4             30            15       13        11        20
  4_49                  25            20       23        17        16
  oc6                   97            97       88        98        94
  oc8                   28            66       37        65        51
  f1                   100           100      100       100       100
  f2                    23             2        3         2         4
  average            53.24         55.94    49.53     54.53     55.41

(over the 17 benchmarks) is shown in Table 5.9, which summarizes the number of successful runs out of 100. A run is considered successful when at least one evolved circuit satisfies the specification, i.e., it is error free. It can also be seen here that the performance is very problem dependent: for example, for the benchmark f1 all runs were successful, while for the benchmark f2 only a few were. We have additionally compared the best QC of the 17 benchmark reversible circuits evolved by DRIMEP2 using the four proposed models with the best QC of recently published results (as of December 2014) found in [357, 394]. The maximum improvement was 51.16%, with an average of 9.22%, see Table 5.10.
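The improvement column of Table 5.10 is the relative reduction of the best quantum cost; a minimal sketch (the helper name is ours, the values are taken from Table 5.10):

```python
def improvement_percent(qc_ref: int, qc_new: int) -> float:
    """Relative quantum-cost reduction of qc_new with respect to qc_ref, in %."""
    return round(100.0 * (qc_ref - qc_new) / qc_ref, 2)

# Best QC from [357, 394] versus DRIMEP2's best QC (Table 5.10):
assert improvement_percent(39, 35) == 10.26   # 4b15g_1
assert improvement_percent(43, 21) == 51.16   # App2.3, the maximum improvement
assert improvement_percent(26, 27) == -3.85   # nth_prime_4, a worsening
```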

Table 5.10. Performance of DRIMEP2 versus [357, 394]

                    best QC               improvement
  benchmark     [357, 394]  DRIMEP2          in %
  4b15g_1               39       35         10.26
  4b15g_2               31       30          3.23
  4b15g_3               33       30          9.09
  4b15g_4               35       33          5.71
  4b15g_5               31       30          3.23
  App2.2                35       35          0.00
  App2.3                43       21         51.16
  App2.8                53       34         35.85
  App2.11               26       23         11.54
  nth_prime_4           26       27         -3.85
  msaee                 34       34          0.00
  4_49_hwb4             26       25          3.85
  4_49                  28       26          7.14
  oc6                   37       34          8.11
  oc8                   35       34          2.86
  f1                    22       21          4.55
  f2                    25       24          4.00
  average                                    9.22

5.2.7. Relevant Features

In this section we have investigated four different distributed models of RIMEP2 applied to a set of 17 reversible circuit benchmarks chosen from [357, 394]. The results show that in most cases the hierarchical model is the best in terms of the best quantum cost and the execution time over 100 runs. Possibly due to the constraint of homogeneous resources, all four distributed models have closely related average performances. The performance of all four models is highly benchmark dependent. The results also show that the sequential RIMEP2, with a large population, was able to evolve optimal results, but at the cost of a much higher execution time.


5.3. Towards Classification of Reversible Functions

Paweł Kerntopf
Krzysztof Podlaski
Claudio Moraga
Radomir S. Stanković

5.3.1. Reversible Boolean Functions

A Boolean function is called reversible if it is a bijective mapping from {0,1}^n onto {0,1}^n, i.e., a permutation on {0,1}^n. A reversible function is by definition a multi-output function and can therefore also be treated as a vector of standard Boolean functions called component functions. Similarly, a gate (circuit) is called reversible if it realizes a one-to-one correspondence between its inputs and outputs.

Reversible functions are considered in many fields of computer science and engineering. Research on reversible circuit synthesis is motivated mainly by advances in quantum computing, nanotechnology, and low-power design [89, 260], and it has been studied intensively in recent years. However, although during the last 15 years the field of reversible circuit synthesis has attracted many researchers, very few publications have been devoted to the classification of reversible functions.

We consider whether all component functions of a reversible function of any number of variables can either belong to the same equivalence class in some classification or have the same property in the sense of classical switching theory. This problem has a direct relationship to studying different aspects of the classification of reversible functions. In traditional logic synthesis, different classifications of non-reversible Boolean functions have found many applications [88, 318, 366]. However, only a few papers have attempted to deal with classifications of reversible functions.

Paper [195] deals with an approach to the enumeration of equivalence classes of reversible functions where the equivalence relation was defined as follows: denote by G and H the groups of permutations acting on the inputs and outputs


of reversible Boolean functions, respectively. For an input n-tuple x, two functions f1(x) and f2(x) are equivalent if there is a g ∈ G and an h ∈ H such that f1(x) = h(f2(g(x))). The paper includes a list of all NPNP-equivalence classes of 3-variable reversible functions and, additionally, a classification based on properties of the inverses of the classes' representatives. However, this paper had not been known to the community of researchers in reversible logic, and so it was cited only by researchers interested in combinatorial mathematics and cryptography.

The task of constructing classifications for reversible functions in the context of reversible circuit synthesis was formulated in [253], but no solutions were proposed. Let us cite from [253]: "There seems to be little work extending NPN classes to multiple-output functions, let alone to reversible logic functions. This is interesting, and begs the question as to why this is so. Certainly further investigation into the literature is warranted, and consideration as to how NPN classification, as used in traditional Boolean function logic synthesis, could be extended to reversible logic synthesis would also be an interesting avenue to pursue".

The proposed approaches for solving the classification problem are formulated from the logic designer's point of view. Here is a brief summary:

1. are there classes of reversible functions which are easier to synthesize?

2. prepare a classification of non-reversible Boolean functions to see which can be realized most easily using reversible gates;

3. if one is using Decision Diagrams (DDs) to store transformed functions, then it would be useful to know which reversible functions are likely to be reached;

4. formal definitions of unateness, duality, and symmetry for reversible (and hence multi-output) functions would be useful to examine, and could lead to synthesis techniques for these special cases.
Recently, some aspects of classifications for reversible functions have again been considered in the literature. In the context of studying


the complexity of reversible circuits [302], the same list of all Negation and Permutation of variables, Negation and Permutation of component functions (NPNP)-equivalence classes of 3-variable reversible functions as in [195] was presented. However, the representatives of the equivalence classes were expressed in the form of permutations, and are thus not convenient for considering individual component functions. The need for a classification of reversible functions has been expressed in [303], however only as an open problem to be considered in the future. A new type of classification of reversible Boolean functions was introduced in [93], based on the minimal number of nonlinear gates needed in implementations of these functions. A structure of closed classes of reversible functions has been described in [1]. New enumeration results for the number of equivalence classes of reversible functions under the action of permutations of the inputs and outputs on the domain and range are presented in [69]. However, in the last three cited papers, properties of component functions of reversible Boolean functions were not considered.

If some functions belong to the same equivalence class in some classification, or have the same property in the sense of classical logic synthesis, we call them homogeneous. We considered the following problem: does there exist an extension of a Boolean function f: {0,1}^n → {0,1} to a reversible function F: {0,1}^n → {0,1}^n such that all output component functions fi: {0,1}^n → {0,1}, 1 ≤ i ≤ n, of F have a homogeneous property? The reason for initiating such research was that if the answer is positive, then new classes of reversible functions may be obtained.

An attempt to generalize the classical paper [366] on the classification of Boolean functions is presented here (among other properties, nontrivial self-duality, self-complementarity, and linearity with respect to some variables were considered as component functions of


reversible functions). Namely, we studied reversible functions whose component functions have the same known property or belong to the same equivalence class in some classification. Our results show that for n ≥ 3 there exist reversible Boolean functions whose component functions are all non-degenerate and:

1. P-equivalent;
2. NP-equivalent;
3. NPN-equivalent;
4. unate;
5. nonlinear;
6. self-dual;
7. self-complementary.

Similarly, our results show that for n ≥ 3 there are no reversible Boolean functions whose component functions are all non-degenerate and:

1. symmetric;
2. linear;
3. affine;
4. monotone increasing;
5. monotone decreasing;
6. majority;
7. threshold.

We expect that these theoretical results will be helpful in further investigations of the problems mentioned above and will lead to new classification results which would be interesting from the point of view of reversible circuit synthesis.


5.3.2. Preliminaries

Below, the basic definitions and known results related to the subject of this section are provided for the convenience of readers. Let us first briefly recall the fundamental notions related to standard and reversible Boolean functions.

Any Boolean function f: {0,1}^n → {0,1} can be described by an Exclusive-OR Sum Of Products (ESOP) expression. In ESOPs each variable may appear in both uncomplemented and complemented form. The Positive Polarity Reed-Muller (PPRM) expression is an ESOP expression which uses only uncomplemented variables. It is a canonical expression and can easily be generated from a truth table or any other representation.

Definition 5.41 (Non-Degenerate Function). A Boolean function depending essentially on all its variables is called non-degenerate, otherwise it is called degenerate.

Definition 5.42 (Balanced Function). A Boolean function f: {0,1}^n → {0,1} is called balanced if it takes the value 1 the same number of times as the value 0.

Definition 5.43 (Linear and Affine Function). A Boolean function f is linear with respect to a variable xi if it can be expressed in the form f = xi ⊕ g, where ⊕ denotes the XOR operation and g is a function independent of xi (the variable xi is then called linear in f). A function has the property Linear Variable (LV) if it contains at least one linear variable. A function f is called affine if and only if each variable xi is either linear in f or f does not depend on xi, i.e.,

f(x1, x2, ..., xn) = a0 ⊕ a1 x1 ⊕ a2 x2 ⊕ · · · ⊕ an xn ,

where a0, a1, a2, ..., an ∈ {0, 1}. If a0 = 0 then f is called linear. Any affine function which is not linear can be obtained by negating an appropriate linear function. A Boolean function which is not affine is called nonlinear.

Definition 5.44 (Totally Symmetric Function). A Boolean function is (totally) symmetric if no permutation of its variables changes the function.
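A PPRM expression can be generated from a truth table with the standard Reed-Muller (Möbius) butterfly transform; the sketch below is ours, not from the book:

```python
def pprm_coefficients(tt):
    """Positive-polarity Reed-Muller spectrum of a truth table of length 2^n.

    tt[i] is f evaluated at the binary expansion of i (x1 = most significant
    bit); the result c[m] is the coefficient of the product of the variables
    selected by the bits of m.
    """
    c = list(tt)
    n = len(c).bit_length() - 1
    for k in range(n):
        step = 1 << k
        for i in range(len(c)):
            if i & step:
                c[i] ^= c[i ^ step]  # XOR-accumulate the lower half
    return c

# x1 XOR x2 (inputs ordered 00, 01, 10, 11):
print(pprm_coefficients([0, 1, 1, 0]))  # -> [0, 1, 1, 0]: f = x2 XOR x1
# OR(x1, x2) = x1 XOR x2 XOR x1 x2, the well-known identity used later on:
print(pprm_coefficients([0, 1, 1, 1]))  # -> [0, 1, 1, 1]
```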


There are 2^{n+1} totally symmetric Boolean functions of n variables.

Definition 5.45 (Partially Symmetric Function). If any permutation of a proper subset S of the variables, of cardinality at least 2, does not change the function f, then f is called a partially symmetric function. If S = {xi, xj}, then f is symmetric with respect to xi and xj.

Example 5.40. There are 70 balanced Boolean functions on three arguments, including degenerate ones. Only four of them are totally symmetric, namely the parity and majority functions and their negations:

parity:    x ⊕ y ⊕ z       and   1 ⊕ x ⊕ y ⊕ z ,
majority:  xy ⊕ xz ⊕ yz    and   1 ⊕ xy ⊕ xz ⊕ yz .

Eight of the balanced functions, including degenerate ones, are partially symmetric with respect to each variable pair; e.g., the functions

x ⊕ y ,                      1 ⊕ x ⊕ y ,
xy ⊕ z ,                     1 ⊕ xy ⊕ z ,
x ⊕ y ⊕ xy ⊕ z ,             1 ⊕ x ⊕ y ⊕ xy ⊕ z ,
x ⊕ y ⊕ xy ⊕ xz ⊕ yz ,       1 ⊕ x ⊕ y ⊕ xy ⊕ xz ⊕ yz ,
are partially symmetric with respect to x and y.

Let us define an order relation on the set {0, 1} in the usual way, 0 < 1, and a partial order relation on the set {0,1}^n: for any two vectors a = (a1, a2, ..., an) and b = (b1, b2, ..., bn) in {0,1}^n, a ≤ b if and only if ai ≤ bi for 1 ≤ i ≤ n.

Definition 5.46 (Monotone Increasing or Decreasing Function). A Boolean function f is monotone increasing if and only if a ≤ b implies f(a) ≤ f(b); such a function will simply be called a monotone function. By reversing the inequalities we obtain the definition of a monotone decreasing function.

The following subset of monotone functions plays an important role in designing digital devices and error-correcting codes.

Definition 5.47 (Majority Function). A Boolean function f on an odd number of arguments is called a majority function if and only if f = 1 exactly when more than half of the arguments have the value 1.
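Total symmetry (Definition 5.44), the 2^{n+1} count, and the functions of Example 5.40 can all be checked exhaustively; the sketch below is ours:

```python
from itertools import permutations, product

def truth_table(f, n):
    return tuple(f(x) for x in product((0, 1), repeat=n))

def is_totally_symmetric(f, n):
    """Definition 5.44: no permutation of the variables changes the function."""
    tt = truth_table(f, n)
    return all(truth_table(lambda x, p=p: f(tuple(x[i] for i in p)), n) == tt
               for p in permutations(range(n)))

parity = lambda x: x[0] ^ x[1] ^ x[2]
majority = lambda x: int(sum(x) > 1)
mixed = lambda x: (x[0] & x[1]) ^ x[2]  # xy XOR z: only partially symmetric

assert is_totally_symmetric(parity, 3)
assert is_totally_symmetric(majority, 3)
assert not is_totally_symmetric(mixed, 3)

# There are 2^(n+1) totally symmetric functions of n variables: 8 for n = 2.
assert sum(is_totally_symmetric(lambda x, t=t: t[2 * x[0] + x[1]], 2)
           for t in product((0, 1), repeat=4)) == 8
```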


Definition 5.48 (Unate Function). A Boolean function f(x) is called unate (or mixed monotone) if and only if it is a constant or there exists an SOP representation of it which uses, for each variable, either only uncomplemented or only complemented literals.

Definition 5.49 (Threshold Function). A Boolean function f(x) of n variables xi is called a threshold (or linearly separable) function if and only if there exist real numbers a1, a2, ..., an, and b such that

f(x1, x2, ..., xn) = 1   if a1 x1 + a2 x2 + · · · + an xn ≥ b ,
f(x1, x2, ..., xn) = 0   if a1 x1 + a2 x2 + · · · + an xn < b .

It is well known (e.g., see [271]) that the following result holds:

Lemma 5.1. Every threshold function is monotone increasing or monotone decreasing.

Definition 5.50 (P-, NP-, or NPN-Equivalent Functions). Two Boolean functions are:

1. P-equivalent if they can be transformed into each other by a permutation of variables;

2. NP-equivalent if they can be transformed into each other by a negation and/or permutation of variables;

3. NPN-equivalent if they can be transformed into each other by one or more of the following transformations: negation of variables, permutation of variables, and negation of the function.

Definition 5.51 (Self-Complementary Function). A Boolean function f is self-complementary if f and its negation ¬f are NP-equivalent.

Definition 5.52 (Self-Dual Function). A Boolean function f is self-dual if

f(x1, x2, ..., xn) = ¬f(¬x1, ¬x2, ..., ¬xn) .

The following three results are well known (e.g., see [366]):
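Definitions 5.46–5.49 and Lemma 5.1 lend themselves to brute-force checks on small n; the sketch below is ours (the helper names are not from the book):

```python
from itertools import product

def is_monotone(f, n):
    """Definition 5.46: a <= b implies f(a) <= f(b) over all pairs in {0,1}^n."""
    pts = list(product((0, 1), repeat=n))
    return all(f(a) <= f(b)
               for a in pts for b in pts
               if all(x <= y for x, y in zip(a, b)))

def threshold(weights, b):
    """Threshold function with real weights a_i and threshold b (Def. 5.49)."""
    return lambda x: int(sum(w * v for w, v in zip(weights, x)) >= b)

# The 3-argument majority function is the threshold function with a_i = 1, b = 2.
maj = threshold([1, 1, 1], 2)
assert [maj(x) for x in product((0, 1), repeat=3)] == [0, 0, 0, 1, 0, 1, 1, 1]
# Lemma 5.1: it is monotone; XOR is neither monotone nor a threshold function.
assert is_monotone(maj, 3)
assert not is_monotone(lambda x: x[0] ^ x[1] ^ x[2], 3)
```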


Lemma 5.2.

1. All self-complementary functions are balanced;
2. All self-dual functions are self-complementary;
3. All functions that have the property LV are self-complementary.

Definition 5.53 (Reversible or Irreversible Function). A mapping F: {0,1}^n → {0,1}^n is called an n × n reversible function if it is bijective.

An n × n reversible function F can also be considered as a vector of component functions fi: {0,1}^n → {0,1}, 1 ≤ i ≤ n, which are defined at every x ∈ {0,1}^n by F(x) = (f1(x), ..., fn(x)). In the truth table of a reversible n × n Boolean function there are n input columns and n output columns. The output rows of such a truth table form a permutation of the input rows. From the bijectivity of reversible functions it follows that all component functions have to be balanced Boolean functions. Functions which are not reversible are called irreversible.

In the literature on reversible logic there is no commonly accepted term with the same meaning as component functions, which we prefer. In [306] the term components is used in a similar way, while in the literature on cryptography (see, e.g., [70]) the term coordinate functions is used.

Definition 5.54 (NPNP-Equivalent Function). Two reversible Boolean functions are NPNP-equivalent if they can be transformed into each other by one or more of the following transformations: negation of variables, permutation of variables, negation of component functions, and permutation of component functions.
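Definition 5.53 connects directly to the permutation specifications used for benchmarks such as those in Table 5.6: a truth vector of length 2^n is reversible iff it is a permutation of {0, ..., 2^n − 1}, and its component functions are the output bit columns. A sketch (ours):

```python
def is_reversible(spec):
    """A truth vector of length 2^n is reversible iff it is a permutation."""
    return sorted(spec) == list(range(len(spec)))

def component_functions(spec, n):
    """Column i (1-based, f1 = most significant bit) of the output truth table."""
    return [[(y >> (n - i)) & 1 for y in spec] for i in range(1, n + 1)]

# Benchmark 4b15g_1 from Table 5.6:
spec = [1, 5, 0, 8, 9, 11, 2, 15, 3, 12, 4, 6, 10, 14, 13, 7]
assert is_reversible(spec)
# Every component function of a reversible function is balanced (eight 1s here).
assert all(sum(col) == 8 for col in component_functions(spec, 4))
```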

5.3.3. Homogeneous Component Functions

We begin with the following general result.


Theorem 5.24. If F(x1, x2, ..., xn) = (f1, f2, ..., fn) is an n × n reversible (irreversible) Boolean function, then the function obtained from F by any of the following transformations: negation of variables, permutation of variables, negation of a component function, permutation of component functions, is also reversible (irreversible).

Proof. It is sufficient to notice that any of the above transformations corresponds to a permutation of rows in the truth table, i.e., it preserves the bijectivity of reversible functions.

Corollary 5.7. The n × n functions belonging to an NPNP-equivalence class are either all reversible or none of them are.

There are constraints on using totally and partially symmetric functions as component functions of an n × n reversible function F(x1, x2, ..., xn) = (f1, f2, ..., fn). Let S1, S2, ..., Sn denote subsets of variables whose permutation does not change the values of the component functions f1, f2, ..., fn, respectively (see Definition 5.45).

Theorem 5.25. A necessary condition for an n × n function F(x1, x2, ..., xn) = (f1, f2, ..., fn) to be reversible is the following: the intersection of all the sets S1, S2, ..., Sn has to be equal to the partition of the set {x1, x2, ..., xn} into 1-element subsets, i.e., no two variables may belong to all of the sets Si simultaneously.

Proof. Assume that two variables xi and xj belong to the intersection of the sets S1, S2, ..., Sn, i.e., to all of these sets. This is equivalent to the equation

F(x1, ..., xi−1, 0, xi+1, ..., xj−1, 1, xj+1, ..., xn) = F(x1, ..., xi−1, 1, xi+1, ..., xj−1, 0, xj+1, ..., xn) .


However, because any reversible function F is a bijective mapping,

F(x1, ..., xi−1, 0, xi+1, ..., xj−1, 1, xj+1, ..., xn)

differs from

F(x1, ..., xi−1, 1, xi+1, ..., xj−1, 0, xj+1, ..., xn) .

Thus, Theorem 5.25 holds.

Corollary 5.8. For n > 1, n × n reversible Boolean functions with all symmetric component functions being non-degenerate do not exist.

On the other hand, component functions of a reversible function can be totally or partially symmetric if at least two of them are partially symmetric.

Example 5.41. It is easy to show that the following 3 × 3 function is reversible:

f1 = x1 ⊕ x2 ⊕ x3 ,   f2 = x1 ⊕ x2 ,   f3 = x1 ⊕ x3 ,

where f1 is a totally symmetric function and both f2 and f3 are partially symmetric degenerate functions. A simple generalization of the above reversible function to the case of any n can be defined as follows:

f1 = x1 ⊕ x2 ⊕ · · · ⊕ xn ,   fk = x1 ⊕ · · · ⊕ xk−1 ⊕ xk+1 ⊕ · · · ⊕ xn for k ∈ {2, . . . , n} ,

i.e., fk is the XOR of all variables except xk.

In some papers algorithms for the synthesis of reversible circuits for (totally) symmetric functions are considered. However, symmetric functions in these papers are first embedded in reversible specifications with (usually many) additional inputs and/or outputs. Now let us consider linear and affine component functions. For any n there is only one non-degenerate linear Boolean function: x1 ⊕ x2 ⊕ · · · ⊕ xn . Therefore by Theorem 5.24 the following result holds:


Theorem 5.26. For n > 1, n × n reversible Boolean functions with all linear or affine component functions being non-degenerate do not exist.

However, reversible Boolean functions that have, as component functions, one non-degenerate linear/affine function together with functions depending essentially on k < n variables do exist, as shown in Example 5.41.

Now let us consider the following property of monotone Boolean functions.

Lemma 5.3. Every balanced monotone Boolean function, except the functions which are P-equivalent to a projection function, cannot take the value 1 on an assignment with weight 1, i.e., with only one non-zero entry.

Proof. Assume that the lemma is not true. Then there exists a balanced monotone Boolean function f, which is not a projection function, and an assignment a = (a1, a2, ..., an) with weight 1 for which f(a) is not equal to zero. Without loss of generality let a = (1, 0, 0, ..., 0), i.e., a1 = 1 and ai = 0 for 2 ≤ i ≤ n. Because f is monotone, f(b) = 1 for all assignments b = (b1, b2, ..., bn) with b1 = 1. The number of such assignments is 2^{n−1}. As the number of all binary assignments is 2^n, the number of assignments c = (c1, c2, ..., cn) not compatible with the assignments b, i.e., having c1 = 0, is equal to 2^n − 2^{n−1} = 2^{n−1}. A balanced Boolean function takes the values 0 and 1 the same number of times, so f = 0 for all those assignments with c1 = 0. Thus

f(a) = 0   for all a = (0, a2, ..., an−1, an) ,
f(a) = 1   for all a = (1, a2, ..., an−1, an) ,

i.e., f(a) = a1, so f is the projection function, which contradicts the initial assumption.

Theorem 5.27. An n × n reversible Boolean function, n ≥ 3, with all component functions being non-degenerate monotone does not exist.


Proof. Lemma 5.3 states that f(a) = 0 for any balanced monotone Boolean function f that is not P-equivalent to a projection (projections are degenerate for n ≥ 2 and hence excluded by the theorem's assumptions) and any input assignment a with weight 1. Therefore, for any n × n Boolean function F, n ≥ 3, with all component functions being balanced monotone, every input assignment a with weight 1 yields

F(1, 0, ..., 0, 0, 0) = (0, 0, ..., 0, 0, 0) ,
F(0, 1, ..., 0, 0, 0) = (0, 0, ..., 0, 0, 0) ,
...
F(0, 0, ..., 0, 0, 1) = (0, 0, ..., 0, 0, 0) ,

which contradicts the reversibility constraint, as F takes the value (0, 0, ..., 0, 0, 0) more than once. Thus any n × n Boolean function, n ≥ 3, with all component functions being monotone, is not reversible.

By Definitions 5.47 and 5.49, Lemma 5.1, and Theorems 5.24 and 5.27, the following result holds:

Corollary 5.9. An n × n reversible Boolean function, n ≥ 3, with all component functions being non-degenerate and (1) majority, (2) threshold, does not exist.

The next result follows from the properties of one of the most frequently used reversible benchmark functions. The well-known Boolean function Hidden Weighted Bit (HWB) was introduced in [59]. An HWB function of n variables is denoted by HWBn. The author has shown that the size of Binary Decision Diagrams (BDDs) representing this function grows exponentially in n for all variable orderings (see also [48]). The function is defined as follows:

Definition 5.55 (HWB Function). For n ≥ 3,

HWBn(x1, x2, ..., xn) = x_sum ,


where sum = x1 + x2 + · · · + xn (+ denotes arithmetic addition) and HWBn(x1, x2, ..., xn) = 0 if sum = 0. Thus, if the number of ones in the variable assignment (a1, a2, ..., an), a1, a2, ..., an ∈ {0, 1}, is greater than zero, then the function HWBn takes the value of the input bit ai whose index i equals the number of ones in the assignment; if the number of ones is equal to 0, then the function takes the value 0.

The concept of a reversible analogue of the conventional HWBn function was introduced by I. L. Markov to be used as a difficult benchmark function for reversible circuit synthesis algorithms.

Definition 5.56 (Reversible HWB Function). The reversible function HWBn×n(x1, x2, ..., xn), n ≥ 3, is defined in such a manner that HWBn×n(a1, a2, ..., an) is the result of cyclically shifting the assignment (a1, a2, ..., an) left by its weight, i.e., by the number of ones in the assignment.

Lemma 5.4. Each function HWBn×n is reversible.

Proof. Suppose the values of the function were identical for two different assignments. A cyclic shift preserves the weight of an assignment, so both assignments would have been shifted by the same recoverable amount; since a cyclic shift by a fixed amount is invertible, the two assignments would be identical, which is a contradiction.

The component functions of HWBn×n are interrelated. For example, in the case of HWB3×3(x1, x2, x3) they are as follows:

f1 = HWB3(x3, x2, x1) ,
f2 = HWB3(x1, x3, x2) ,
f3 = HWB3(x2, x1, x3) .

Notice that the vector of arguments above is gradually shifted cyclically right by one element. It is easy to show the same property for any n. Thus the following result holds [158]:


Lemma 5.5. Each component function of the HWBn×n function can be obtained from the conventional HWBn function as a result of a permutation of its variables.

From the above lemma the following two corollaries follow directly:

Corollary 5.10. All component functions of HWBn×n, n ≥ 3, belong to the same P-equivalence class.

Corollary 5.11. For any n ≥ 3 there exist n × n reversible functions whose component functions are all P-equivalent (hence they are also NP-equivalent and NPN-equivalent).

Theorem 5.28. All component functions of HWBn×n, n ≥ 3, are nonlinear.

Proof. Let us assume that the function f3 is affine, i.e.,

f3(xn, xn−1, ..., x1) = a0 ⊕ a1 x1 ⊕ a2 x2 ⊕ · · · ⊕ an xn .

Let xn = xn−1 = · · · = x2 = x1 = 0. Then HWBn×n(0, 0, ..., 0, 0, 0) = (0, 0, ..., 0, 0, 0). Hence f3(0, 0, ..., 0, 0, 0) = 0 and a0 = 0.

Let xn = xn−1 = · · · = x2 = 0 and x1 = 1. Then HWBn×n(0, 0, ..., 0, 0, 1) = (0, 0, ..., 0, 1, 0), due to cyclically shifting the input assignment left by one position. Hence f3(0, 0, ..., 0, 0, 1) = 0 and a1 = 0.

In the same manner it can be shown that ai = 0 for i ≥ 3. For example, if x3 = 1, x2 = 0, x1 = 0, and xi = 0 for all i > 3, then HWBn×n(0, 0, ..., 0, 1, 0, 0) = (0, 0, ..., 1, 0, 0, 0). Hence f3(0, 0, ..., 0, 1, 0, 0) = 0 and a3 = 0.

Similarly, if xn = xn−1 = · · · = x3 = 0, x2 = 1, and x1 = 0, then HWBn×n(0, 0, ..., 0, 1, 0) = (0, 0, ..., 1, 0, 0) .


Hence f3(0, 0, ..., 0, 1, 0) = 1 and a2 = 1. Thus the PPRM expression reduces to f3(xn, xn−1, ..., x1) = x2. Now consider the assignment xn = xn−1 = · · · = 0, x3 = 1, x2 = 1, and x1 = 0. Then we have three cases:

n = 3:  HWB3×3(1, 1, 0) = (0, 1, 1) ,
n = 4:  HWB4×4(0, 1, 1, 0) = (1, 0, 0, 1) , and
n > 4:  HWBn×n(0, 0, ..., 1, 1, 0) = (0, 0, ..., 1, 1, 0, 0, 0) ,

due to cyclically shifting the input assignment left by two positions. Thus in all cases we have f3 = 0, but

f3(0, 0, ..., 0, 1, 1, 0) = x2 = 1 .

Therefore f3 is nonlinear. Hence, by Lemma 5.5, all other component functions of HWBn×n are nonlinear for any n ≥ 3.
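Definition 5.56, Lemma 5.4, and Theorem 5.28 can be cross-checked by brute force; the sketch below is ours (it re-implements HWBn×n as a cyclic shift and tests all affine functions for small n):

```python
from itertools import product

def hwb_rev(x):
    """Reversible HWBnxn (Definition 5.56): cyclic left shift of x by its weight."""
    s = sum(x) % len(x)
    return tuple(x[s:] + x[:s])

def affine_tables(n):
    """Truth vectors of all 2^(n+1) affine functions a0 + a1 x1 + ... + an xn."""
    pts = list(product((0, 1), repeat=n))
    for a0, *a in product((0, 1), repeat=n + 1):
        yield tuple(a0 ^ (sum(ai & xi for ai, xi in zip(a, x)) % 2) for x in pts)

# The shifts used in the proof of Theorem 5.28:
assert hwb_rev((1, 1, 0)) == (0, 1, 1)
assert hwb_rev((0, 1, 1, 0)) == (1, 0, 0, 1)

for n in (3, 4, 5):
    pts = list(product((0, 1), repeat=n))
    # Lemma 5.4: HWBnxn is a bijection.
    assert len({hwb_rev(x) for x in pts}) == 2 ** n
    # Theorem 5.28: no component function is affine.
    affine = set(affine_tables(n))
    for i in range(n):
        assert tuple(hwb_rev(x)[i] for x in pts) not in affine
```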

Let us introduce simple notions related to Positive Polarity Reed-Muller expressions for Boolean functions. The number of literals in a term will be called its rank. Denote by Ti,j the exclusive-or sum of all terms having a rank not smaller than i and not greater than j (Ti,i will denote all terms with rank i). We define the following n × n reversible function, which will be called Negation with Preservation of Constants (NPC) and labeled, together with the size of the reversible function, NPCn×n for short:

f1   = x1 ⊕ x2 ⊕ x3 ⊕ · · · ⊕ xn−2 ⊕ xn−1       ⊕ T2,n−1 ,
f2   = x1 ⊕ x2 ⊕ x3 ⊕ · · · ⊕ xn−2       ⊕ xn  ⊕ T2,n−1 ,
...
fn−1 = x1      ⊕ x3 ⊕ · · · ⊕ xn−2 ⊕ xn−1 ⊕ xn  ⊕ T2,n−1 ,
fn   =      x2 ⊕ x3 ⊕ · · · ⊕ xn−2 ⊕ xn−1 ⊕ xn  ⊕ T2,n−1 ,

i.e., fi is the XOR of all variables except xn−i+1, plus T2,n−1. The component functions of NPCn×n will be called NPCn. The above


Table 5.11. Establishing the values of the 3-variable function f1 in three steps

  x1  x2  x3 | f1^(1)  f1^(2)  f1
   0   0   0 |   0       0      0
   0   0   1 |   1       1      1
   0   1   0 |   1       1      1
   0   1   1 |   1       1      1
   1   0   0 |   1       0      0
   1   0   1 |   1       0      0
   1   1   0 |   1       0      0
   1   1   1 |   1       0      1

formulas can be transformed taking into account that:

1. T1,n = x1 ⊕ x2 ⊕ x3 ⊕ · · · ⊕ xn−2 ⊕ xn−1 ⊕ xn ⊕ T2,n = x1 ∨ x2 ∨ x3 ∨ · · · ∨ xn−1 ∨ xn; this transformation can easily be proved by induction starting from the well-known formula x1 ∨ x2 = x1 ⊕ x2 ⊕ x1 x2; and

2. fi = T1,n ⊕ xn−i+1 ⊕ x1 x2 . . . xn = (x1 ∨ x2 ∨ x3 ∨ · · · ∨ xn−1 ∨ xn) ⊕ xn−i+1 ⊕ x1 x2 . . . xn .

Using the above formula we will show by example how the values of the function fi can be calculated. Without loss of generality we will show this for f1 (see also Table 5.11): (1)

step 1: calculate f1

= x1 ∨ x2 ∨ x3 ∨ · · · ∨ xn−1 ∨ xn ;

(2)

step 2: calculate f1 = (x1 ∨ x2 ∨ x3 ∨ · · · ∨ xn−1 ∨ xn ) ⊕ x1 (by negating the lower half of the truth table obtained in step 1); step 3: calculate f1 = (x1 ∨ x2 ∨ x3 ∨ · · · ∨ xn ) ⊕ x1 ⊕ x1 x2 . . . xn (by negating the output value in the last row of the truth table obtained in step 2). Thus f1 has the well-known property of preserving constants: f1 (0, 0, 0) = 0

and

f1 (1, 1, 1) = 1
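The three construction steps can be replayed mechanically. A small sketch (the helper name npc_f1 is ours) rebuilds the last column of Table 5.11:

```python
from itertools import product

def npc_f1(bits):
    """f1 of NPCnxn built in the three steps above: OR of all variables,
    XOR with x1, then XOR with the product x1 x2 ... xn."""
    step1 = int(any(bits))          # step 1: x1 v x2 v ... v xn
    step2 = step1 ^ bits[0]         # step 2: negate the lower half (x1 = 1)
    return step2 ^ int(all(bits))   # step 3: flip the last row (all ones)

n = 3
column = [npc_f1(x) for x in product((0, 1), repeat=n)]
assert column == [0, 1, 1, 1, 0, 0, 0, 1]      # last column of Table 5.11

# constants are preserved, and x1 is negated for every other input vector
assert npc_f1((0,) * n) == 0 and npc_f1((1,) * n) == 1
assert all(npc_f1(x) == 1 - x[0]
           for x in product((0, 1), repeat=n) if 0 < sum(x) < n)
```

The same three steps with x1 replaced by xn−i+1 yield every other component fi.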


Table 5.12. Truth table for the function NPC3×3

x1 x2 x3 | f1 f2 f3
 0  0  0 |  0  0  0
 0  0  1 |  1  1  0
 0  1  0 |  1  0  1
 0  1  1 |  1  0  0
 1  0  0 |  0  1  1
 1  0  1 |  0  1  0
 1  1  0 |  0  0  1
 1  1  1 |  1  1  1

as well as negating the input x1 for all other vectors of input values. Similarly (see Table 5.12), the reversible function NPC3×3 preserves the constants:

NPC3×3(0, 0, 0) = (0, 0, 0)   and   NPC3×3(1, 1, 1) = (1, 1, 1)

as well as negating all the other input vectors. This is why we gave this reversible function the name Negation with Preservation of Constants. By analogy with the above example it is easy to show that the following two results hold for any n:

Lemma 5.6. Each function NPCn×n is reversible.

Lemma 5.7. Each component function of NPCn×n can be obtained from the Boolean function NPCn as a result of a permutation of its variables.

Theorem 5.29. All component functions of NPCn×n, n ≥ 3, are:
1. nonlinear;
2. self-complementary;
3. self-dual;
4. P-equivalent; and
5. unate.


Proof.

1. The function NPCn is nonlinear because its PPRM contains terms of rank 2 for any n ≥ 3;

2. Without loss of generality we write

NPCn(x1, x2, ..., xn) = (x1 ∨ x2 ∨ · · · ∨ xn) ⊕ x1 ⊕ x1 x2 . . . xn .

On the other hand, by De Morgan's laws the following two formulas hold:

x̄1 ∨ x̄2 ∨ · · · ∨ x̄n = 1 ⊕ x1 x2 . . . xn ,
x̄1 x̄2 . . . x̄n = 1 ⊕ (x1 ∨ x2 ∨ · · · ∨ xn) .

Thus

NPCn(x̄1, x̄2, ..., x̄n)
= (x̄1 ∨ x̄2 ∨ · · · ∨ x̄n) ⊕ x̄1 ⊕ x̄1 x̄2 . . . x̄n
= (1 ⊕ x1 x2 . . . xn) ⊕ (1 ⊕ x1) ⊕ (1 ⊕ (x1 ∨ x2 ∨ · · · ∨ xn))
= 1 ⊕ [(x1 ∨ x2 ∨ · · · ∨ xn) ⊕ x1 ⊕ x1 x2 . . . xn]
= 1 ⊕ NPCn(x1, x2, ..., xn)

and by Definition 5.52 any NPCn is self-dual;

3. From Lemma 5.2 it follows that it is self-complementary;

4. The P-equivalence follows from Lemma 5.7;

5. Once again, without loss of generality we write

NPCn(x1, x2, ..., xn) = [(x1 ∨ x2 ∨ · · · ∨ xn) ⊕ x1] ⊕ x1 x2 . . . xn

and transform it using the well-known formulas a ⊕ b = āb ∨ ab̄, a ∧ ā = 0, a = a ∨ ab, and De Morgan's laws:

NPCn(x1, x2, ..., xn)
= [(x1 ∨ x2 ∨ · · · ∨ xn) x̄1 ∨ (x̄1 x̄2 . . . x̄n) x1] ⊕ x1 x2 . . . xn
= [x̄1 x2 ∨ x̄1 x3 ∨ · · · ∨ x̄1 xn] ⊕ x1 x2 . . . xn
= (x̄1 x2 ∨ x̄1 x3 ∨ · · · ∨ x̄1 xn)(x̄1 ∨ x̄2 ∨ · · · ∨ x̄n) ∨ (x1 ∨ x̄2)(x1 ∨ x̄3) . . . (x1 ∨ x̄n) x1 x2 . . . xn
= x̄1 x2 ∨ x̄1 x3 ∨ · · · ∨ x̄1 xn ∨ x1 x2 . . . xn
= x̄1 x2 ∨ x̄1 x3 ∨ · · · ∨ x̄1 xn ∨ x2 x3 . . . xn .

Thus in the reduced Sum Of Products (SOP) for NPCn the variable x1 appears only complemented and all other variables appear only uncomplemented, i.e., NPCn is unate.
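Both the self-duality and the reduced SOP of the proof can be confirmed by exhaustive evaluation. In the sketch below (helper names are ours), npc is the component function NPCn and unate_sop is the final expression x̄1x2 ∨ · · · ∨ x̄1xn ∨ x2 . . . xn:

```python
from itertools import product

def npc(bits):
    """NPCn(x) = (x1 v ... v xn) xor x1 xor x1 x2 ... xn."""
    return int(any(bits)) ^ bits[0] ^ int(all(bits))

def unate_sop(bits):
    """Reduced SOP from the proof: x1 only complemented, the rest uncomplemented."""
    x1, rest = bits[0], bits[1:]
    return int((not x1 and any(rest)) or all(rest))

for n in (3, 4, 5):
    pts = list(product((0, 1), repeat=n))
    # self-duality: NPCn on the complemented input equals the complemented output
    assert all(npc(tuple(1 - b for b in x)) == 1 - npc(x) for x in pts)
    # the unate SOP agrees with NPCn everywhere
    assert all(npc(x) == unate_sop(x) for x in pts)
```

Since unate_sop never reads x1 positively and never reads the other variables negatively, the unateness claim of item 5 is visible directly in the code.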

By Lemmas 5.4 and 5.6, Corollary 5.11, and Theorems 5.28 and 5.29 the following result holds:

Corollary 5.12. For any n ≥ 3 there exist reversible functions that have all component functions being:
1. nonlinear;
2. self-complementary;
3. self-dual;
4. P-equivalent; and
5. unate.


Table 5.13. Construction of a 3 × 3 reversible function

x1 x2 x3 | f1 f2 f3 | f3(1) f3(2) f3(3)
 0  0  0 |  0  0  a |   0     1     0
 0  0  1 |  0  1  b |   1     1     0
 0  1  0 |  1  1  c |   1     0     0
 0  1  1 |  1  0  d |   0     0     1
 1  0  0 |  1  0  d̄ |   1     1     0
 1  0  1 |  1  1  c̄ |   0     1     1
 1  1  0 |  0  1  b̄ |   0     0     1
 1  1  1 |  0  0  ā |   1     0     1

5.3.4. Motivation for Future Work

We start with an example.

Example 5.42. Consider three Boolean functions f1, f2, f3 of three variables x1, x2, x3. If f1 and f2 are chosen to be specified as

f1 = x1 ⊕ x2   and   f2 = x2 ⊕ x3

then f3 must have the “a b c d d̄ c̄ b̄ ā” structure shown in the column of f3 in Table 5.13 to obtain the bijection required by a reversible function. The functions f1, f2, and f3 must be balanced. It is easy to notice that there are 16 possible choices for f3, divided into eight complementary pairs. Some of them are shown in the right part of Table 5.13:

f3(1) = x1 ⊕ x2 ⊕ x3 is a linear function,
f3(2) = x̄2 is a projection-negation function,
f3(3) = x1 x2 ⊕ x1 x3 ⊕ x2 x3 is a majority function.

If f1 and f2 are chosen to be independent component functions, each pair of values of f1 and f2 (i.e., 00, 01, 10, and 11) must not appear more than twice, since otherwise the inputs could not be distinguished by f3. In this particular example, the distinguishing component function f3 will belong to one of the three classes illustrated above or their complements. Also, some arguments may be complemented. For instance,

f3(4) = x̄1 x2 ⊕ x̄1 x3 ⊕ x2 x3 = maj(x̄1, x2, x3) .
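The counting argument of Example 5.42 can be replayed by brute force. The sketch below (names are ours) enumerates all 256 candidate truth vectors for f3 and keeps those that complete f1 = x1 ⊕ x2 and f2 = x2 ⊕ x3 to a bijection:

```python
from itertools import product

rows = list(product((0, 1), repeat=3))
f12 = [(x[0] ^ x[1], x[1] ^ x[2]) for x in rows]      # f1 = x1+x2, f2 = x2+x3

def completes_bijection(f3):
    return len({pair + (v,) for pair, v in zip(f12, f3)}) == 8

valid = [f3 for f3 in product((0, 1), repeat=8) if completes_bijection(f3)]

assert len(valid) == 16                               # 16 possible choices for f3
assert all(tuple(1 - v for v in f3) in valid for f3 in valid)  # 8 complementary pairs
assert all(sum(f3) == 4 for f3 in valid)              # every valid f3 is balanced

# the three representatives shown in Table 5.13 are all valid
xor3   = tuple(x[0] ^ x[1] ^ x[2] for x in rows)      # linear
not_x2 = tuple(1 - x[1] for x in rows)                # projection-negation
maj    = tuple(int(sum(x) >= 2) for x in rows)        # majority
assert xor3 in valid and not_x2 in valid and maj in valid
```

The balance of every valid f3 is not imposed; it follows automatically from the distinguishing requirement, which forces complementary values on each of the four colliding input pairs.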

Thus, in future work we plan to study a generalization of the problem considered in this section. Namely, we would like to investigate the possibilities of constructing reversible functions whose component functions have different established properties of Boolean functions, or belong to different equivalence classes under the same equivalence relation. In the case of reversible functions with a larger number of variables there may be component functions that satisfy such a distinguishing structure but do not represent an established property of a Boolean function.


5.4. The Cn F Logic Functions Derived from Cn NOT Gates Martin Lukac

Claudio Moraga

Michitaka Kameyama

5.4.1. Reconfigurable Reversible Processors

The design of reversible and quantum circuits is still an open research topic. Several technologies have been proposed for implementation [119, 188, 200, 226, 362] but, even theoretically, many issues of scalability, architecture, and minimization are still to be solved. Similar to the design of quantum circuits, the design of programmable and reconfigurable devices in reversible and quantum technology is in development and still an open research topic [173, 199, 211, 222, 385]. However, a reconfigurable device implemented in reversible and/or quantum logic requires fast switching and a potentially very cheap method to switch between various functions. Each cell in a programmable device is also limited in the number of logic gates that it can contain. Consequently, an efficient method of fast reprogramming with limited resources is desirable in order to ensure the most efficient processing. In general, the approach to reconfigurable reversible computing is limited in technology and resources. In particular this means that technologically the reconfiguration is limited to a set of logic elements such as C2 NOT gates, CSWAP, or other more complex and Turing-universal sets of gates. From the resource point of view, the limitations one has to consider are how many logic gates and qubits one has available for the possible reconfiguration. Consequently, it is necessary both to find an appropriate technology that would allow us to implement reversible and quantum reconfigurable devices efficiently and to determine a fast and efficient reconfiguration with very limited resources. In this section we closely study some of the properties related to fast switching of functions using one of the possible realizations of

406

Reversible and Quantum Logic

Figure 5.9. The C2 NOT gate implemented in the CNOT/CV/CV† gate set.

the Cn NOT quantum gates. We analyze in detail how the Cn NOT gates can be transformed into different functions and what is the fastest and least expensive change possible.

5.4.2. Background

The C2 NOT gate was originally proposed in [22] as implemented by the CNOT/CV/CV† universal gate set. The realization of this gate is shown in Figure 5.9. The gate can be analyzed by looking more closely at two of its components. Table 5.14 shows the two components constituting the C2 NOT gate: the quantum component and the symmetric function of the controlled part. Consequently, the C2 NOT gate can be expressed as a product of two functions

C2 NOT = fg (V) · fs (V†) .

Table 5.14. Two components of the C2 NOT gate: the quantum part fg and the symmetric function fs of the controlled part

a b c | fg   fs
0 0 0 | I    I
0 0 1 | I    I
0 1 0 | V    V†
0 1 1 | V    V†
1 0 0 | V    V†
1 0 1 | V    V†
1 1 0 | NOT  I
1 1 1 | NOT  I


with fg (·) being a function that contains only a set of single-qubit controlled operators that in the Quantum Operator Form (QOF) can be described as follows:

fg (·) = x0 V ◦ · · · ◦ xk V     (5.1)

where ◦ is the operation resulting from multiplying the n = 0, . . . , k qubit-controlled V/V† gates. The fs (·) function is one V† gate controlled by a symmetric function on all the control variables, which can again be written in the Quantum Operator Form (QOF) as follows:

fs (·) = (x0 ⊕ · · · ⊕ xk) V† .     (5.2)

The notation used is the QOF introduced originally in [198]. It allows us to represent a circuit built using CNOT/CV/CV† in a unified Exclusive-OR Sum Of Products (ESOP) like form. The QOFs use two types of operators that allow us to express the commutative (using ⊕) and non-commutative (using ◦) properties of the quantum circuits. For more details, the complete set of QOF rules is introduced in [198]. Notice that the C2 NOT gate uses all possible symmetric functions available on two qubits given all possible non-quantum and non-superposed control signals. This means that the available control signals are either single variables (possibly negated) or CNOT gates. For the purpose of this section we will use the name Complete Symmetric Function (CSF) for a set of functions spanning all possible symmetric functions defined on a set of qubits.

Definition 5.57 (Complete Symmetric Function (CSF)). A CSF is a function that is represented by all possible symmetric functions realized on all possible combinations of control variables without negation.

Recall that

V† ◦ V† = V ◦ V = NOT ,
NOT† = NOT ,
W = √V ,
W† = √(V†) ,
W ◦ W = V ,
W† ◦ W† = V† .
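The recalled operator identities are easy to confirm numerically. A minimal sketch, using the standard matrix forms (NOT is the Pauli-X matrix; its roots are taken via the eigendecomposition NOT = H diag(1, −1) H):

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)           # Hadamard, H @ H = I
NOT = np.array([[0, 1], [1, 0]], dtype=complex)

def not_root(m):
    """Principal m-th root of NOT via NOT = H diag(1, -1) H."""
    return H @ np.diag([1, np.exp(1j * np.pi / m)]) @ H

V, W = not_root(2), not_root(4)                        # V = sqrt(NOT), W = sqrt(V)
dag = lambda U: U.conj().T

assert np.allclose(V @ V, NOT)                         # V o V = NOT
assert np.allclose(dag(V) @ dag(V), NOT)               # V† o V† = NOT
assert np.allclose(dag(NOT), NOT)                      # NOT† = NOT
assert np.allclose(W @ W, V)                           # W o W = V
assert np.allclose(dag(W) @ dag(W), dag(V))            # W† o W† = V†
```

The same not_root construction produces the deeper roots needed for larger Cn NOT gates later in the section.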


In a quantum circuit the quantum gates are applied to the input states. When dealing with logic states such as 0 and 1, the application of a unitary transform (a quantum gate) to them will result either in the logical states 0 or 1 or in some quantum states. Moreover, a quantum circuit operates on quantum states such as |0⟩ or |1⟩, but the desired logic function is expressed in the logic values 0 or 1. The relation is given by the postulate of measurement in quantum mechanics: the logical state 0 is represented by |φ⟩ = |0⟩ = α|0⟩ + β|1⟩ with α = 1 and β = 0, and the logical state 1 is represented by |φ⟩ = |1⟩.
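As a concrete check of the decomposition behind Figure 5.9, one can multiply out the five two-qubit unitaries and compare the result with the Toffoli permutation. The sketch below assumes the standard Barenco ordering CV(b→c), CNOT(a→b), CV†(b→c), CNOT(a→b), CV(a→c), with qubit 0 = a as the most significant bit; the helper controlled is ours:

```python
import numpy as np

NOT = np.array([[0, 1], [1, 0]], dtype=complex)
V = np.array([[1 + 1j, 1 - 1j], [1 - 1j, 1 + 1j]]) / 2   # sqrt(NOT)

def controlled(u, control, target, n=3):
    """n-qubit unitary applying u to `target` when qubit `control` is 1."""
    dim = 2 ** n
    U = np.zeros((dim, dim), dtype=complex)
    for i in range(dim):
        bits = [(i >> (n - 1 - q)) & 1 for q in range(n)]
        if bits[control] == 0:
            U[i, i] = 1
        else:
            for t in (0, 1):
                out = bits.copy()
                out[target] = t
                j = int("".join(map(str, out)), 2)
                U[j, i] = u[t, bits[target]]
    return U

# gates act in time order from right to left in the matrix product
circuit = (controlled(V, 0, 2) @ controlled(NOT, 0, 1) @
           controlled(V.conj().T, 1, 2) @ controlled(NOT, 0, 1) @
           controlled(V, 1, 2))

toffoli = np.eye(8, dtype=complex)
toffoli[[6, 7]] = toffoli[[7, 6]]                        # swap |110> and |111>
assert np.allclose(circuit, toffoli)
```

On the target line the circuit accumulates V^b · V†^(a⊕b) · V^a = V^(2ab) = NOT^(ab), which is exactly the fg · fs factorization of Table 5.14.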

5.4.3. C3 NOT Gate and C3 F Functions

Using the algorithm proposed in [22] an arbitrary Cn NOT gate can be built given that at each additional control qubit ci the function applied is root-squared one more time. For instance, the C3 NOT gate is shown in Figure 5.10, with the function applied to the target qubit being W = √V (W† = √(V†)). The truth table of the C3 NOT gate is shown in Table 5.15, showing the functions of the quantum part fg, the symmetric part fs, and the whole function f. Notice that again the C3 NOT gate is using the CSF for its quantum components: a, b, c, a ⊕ b, a ⊕ c, b ⊕ c, and a ⊕ b ⊕ c. Additionally, the fact that the quantum operator must be

Figure 5.10. C3 NOT using the Barenco model (controlled operators aW ◦ bW ◦ cW ◦ (a ⊕ b)W† ◦ (b ⊕ c)W† ◦ (a ⊕ c)W† ◦ (a ⊕ b ⊕ c)W on the target qubit).


Table 5.15. The truth table of the C3 NOT function broken into its quantum part fg, symmetric part fs, and the final function f

a b c d | fg  fs  f
0 0 0 0 | I   I   I
0 0 0 1 | I   I   I
0 0 1 0 | W   W†  I
0 0 1 1 | W   W†  I
0 1 0 0 | W   W†  I
0 1 0 1 | W   W†  I
0 1 1 0 | V   V†  I
0 1 1 1 | V   V†  I
1 0 0 0 | W   W†  I
1 0 0 1 | W   W†  I
1 0 1 0 | V   V†  I
1 0 1 1 | V   V†  I
1 1 0 0 | V   V†  I
1 1 0 1 | V   V†  I
1 1 1 0 | VW  W   NOT
1 1 1 1 | VW  W   NOT

root-squared at each additional control level means that several components of fast reconfiguration can be explored.
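Table 5.15 can be reproduced with simple modular arithmetic: writing every operator as an integer power of W = NOT^(1/4) (I = 0, W = 1, V = 2, VW = 3, NOT = 4, V† = −2, W† = −1, all mod 8), the gate pattern of Figure 5.10 gives the exponents below. A small sketch with function names of our choosing:

```python
def fg_exp(a, b, c):
    """Quantum part of Figure 5.10: aW o bW o cW, as a power of W."""
    return (a + b + c) % 8

def fs_exp(a, b, c):
    """Symmetric part: (a+b)W† o (b+c)W† o (a+c)W† o (a+b+c)W (XOR controls)."""
    return (-(a ^ b) - (b ^ c) - (a ^ c) + (a ^ b ^ c)) % 8

names = {0: "I", 1: "W", 2: "V", 3: "VW", 4: "NOT", 6: "V†", 7: "W†"}

for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            total = (fg_exp(a, b, c) + fs_exp(a, b, c)) % 8
            # the whole C3NOT acts as NOT exactly when a = b = c = 1
            assert (total == 4) == (a == b == c == 1)

# spot-check two rows of Table 5.15
assert names[fg_exp(1, 1, 1)] == "VW" and names[fs_exp(1, 1, 1)] == "W"
assert names[fg_exp(0, 1, 1)] == "V" and names[fs_exp(0, 1, 1)] == "V†"
```

Because fg and fs always cancel except on the all-ones assignment, the table's f column is I everywhere except its last two rows.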

C3 F Functions. In [197] it was shown that modified functions can be derived simply by permuting the quantum operators V/V† on their control signals. Thus, from the initial function ab ⊕ c other functions such as āb ⊕ c or ab̄ ⊕ c can be created. So, starting from the C3 NOT gates one can ask what happens if other functions are used instead of the W/W†. In particular, the rule for creating Cn NOT gates assumes the usage of the 2^(n−2)-th root of V. For instance, the C2 NOT gate becomes the identity when CV/CV† is replaced by CNOT. To analyze this possibility we represent the gate in the QOF. Starting from C3 NOT we replace CW/CW† by CV/CV† and get:

C3 NOT = aV ◦ bV ◦ cV ◦ (a ⊕ b)V† ◦ (b ⊕ c)V† ◦ (a ⊕ c)V† ◦ (a ⊕ b ⊕ c)V .     (5.3)

Expanding each controlled operator of (5.3) into the minterms of a, b, and c, every input assignment collects as many V as V† operators: an assignment with a single 1 collects V (from its variable), V† ◦ V† (from the two XOR pairs covering it), and V (from a ⊕ b ⊕ c); an assignment with two 1s collects V ◦ V (from its variables) and V† ◦ V† (from the two activated XOR pairs); and the all-ones assignment collects V ◦ V ◦ V ◦ V = NOT ◦ NOT. Hence

C3 NOT = I .     (5.4)

However, the reduction to identity is only inevitable in the C2-controlled gate. In C3-controlled gates a different function can be obtained (by the principle from [197]) that is not the identity I. For instance, changing the polarity of the operator controlled by b ⊕ c gives

C3 NOT = aV ◦ bV ◦ cV ◦ (a ⊕ b)V† ◦ (b ⊕ c)V ◦ (a ⊕ c)V† ◦ (a ⊕ b ⊕ c)V
       = āb̄c ◦ ābc̄ ◦ ab̄c ◦ abc̄ ,     (5.5)

i.e., a NOT applied exactly on the minterms where b ⊕ c = 1.

Thus it can be seen that reducing the control operators from W/W† to V/V† results either in the identity or in the functions a ⊕ c, a ⊕ b, or b ⊕ c. A natural extension of the generated alternatives to the C2 NOT and C3 NOT gates is a set of reversible functions (gates) called the C2 F and C3 F gates. Interestingly, if we look at the pattern of application of the V/V† or W/W† quantum gates in the C2 NOT and C3 NOT gates respectively, it represents a symmetric function. Thus the question that can be asked is: is it possible to obtain, for instance, a majority function? Consequently, the main difference with the C2 F and C3 F is that these functions are not constructed using a CSF but rather only a subset of the available symmetric functions. Notice that the Cn NOT gate contains the full set of symmetric functions realized on the control qubits. One can analyze the contribution of these different symmetrically controlled functions to the final function. For instance, Equation (5.6) shows the function realized by aV ◦ bV ◦ cV ◦ (a ⊕ b ⊕ c)V†:

C3 F = aV ◦ bV ◦ cV ◦ (a ⊕ b ⊕ c)V†
     = ābc ◦ ab̄c ◦ abc̄ ◦ abc .     (5.6)

Function (5.6) is a majority-controlled NOT gate. The truth table of this function is shown in Table 5.16. Note that the function realized by this gate is not maj(a, b, c) but rather

C3 F = c̄(ab ⊕ d) ⊕ c(āb̄ ⊕ d̄) ,

i.e., the target output d ⊕ maj(a, b, c). This is very interesting, as the general meaning is that by using the majority to control the quantum gates, two Cn−1 NOT gates are created. Naturally, the question is whether this is a scalable observation, and how such an approach works for circuits with more than three control qubits.

Table 5.16. Truth table of the majority-controlled NOT gate

a b c d | f
0 0 0 0 | I
0 0 0 1 | I
0 0 1 0 | I
0 0 1 1 | I
0 1 0 0 | I
0 1 0 1 | I
0 1 1 0 | NOT
0 1 1 1 | NOT
1 0 0 0 | I
1 0 0 1 | I
1 0 1 0 | NOT
1 0 1 1 | NOT
1 1 0 0 | NOT
1 1 0 1 | NOT
1 1 1 0 | NOT
1 1 1 1 | NOT
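Both claims, that (5.6) is controlled by the majority and that the realized function on the target is d ⊕ maj(a, b, c) in its mixed-polarity form, can be verified exhaustively. In the sketch below the net V-exponent is taken mod 4 (V = 1, NOT = 2, V† = −1), and the mixed-polarity formula is our reconstruction of the expression in the text with the complement bars restored:

```python
def c3f_exp(a, b, c):
    """Net V-exponent of aV o bV o cV o (a+b+c)V† (XOR control), mod 4."""
    return (a + b + c - (a ^ b ^ c)) % 4

def maj(a, b, c):
    return int(a + b + c >= 2)

for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            # NOT (exponent 2) is applied exactly on the majority minterms
            assert (c3f_exp(a, b, c) == 2) == (maj(a, b, c) == 1)
            for d in (0, 1):
                # realized target function: c-bar(ab + d) xor c(a-bar b-bar + d-bar)
                mixed = ((1 - c) & ((a & b) ^ d)) ^ \
                        (c & (((1 - a) & (1 - b)) ^ (1 - d)))
                assert mixed == d ^ maj(a, b, c)
```

The mixed form makes the "two Cn−1 NOT gates" reading explicit: for c = 0 the target computes ab ⊕ d (a Toffoli on d), and for c = 1 it computes āb̄ ⊕ d̄, another Toffoli up to polarity.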


5.4.4. Analysis of C4 NOT and C4 F

To determine whether the pattern of majority-controlled NOT gates repeats for more than four qubits, Figure 5.11 shows the five-qubit C4 F gate defined by

aV ◦ bV ◦ cV ◦ dV ◦ (a ⊕ b ⊕ c ⊕ d)V† .

Figure 5.11. Karnaugh-map of the function C4 F built using simplified symmetric function controls.

The function can be derived using the QOF notation as shown in Equation (5.7):

C4 F = aV ◦ bV ◦ cV ◦ dV ◦ (a ⊕ b ⊕ c ⊕ d)V†
     = āb̄cd ◦ ābc̄d ◦ ābcd̄ ◦ ab̄c̄d ◦ ab̄cd̄ ◦ abc̄d̄ ◦ ābcd ◦ ab̄cd ◦ abc̄d ◦ abcd̄ ,     (5.7)

i.e., the NOT is applied exactly on the input assignments of weight two or three.

Notice that the function obtained in this case is ā maj(b, c, d) ⊕ a (bc̄ ⊕ cd̄ ⊕ b̄d). Note that the function introduced in (5.7) is only the control function of the NOT gate and not the resulting logic function.
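The same exponent arithmetic confirms the C4 F claim: the NOT lands exactly on the assignments of weight two or three, and this set coincides with the mixed-polarity expression above (again our bar-restored reading of the garbled formula). A short sketch:

```python
from itertools import product

def c4f_exp(a, b, c, d):
    """Net V-exponent of aV o bV o cV o dV o (a+b+c+d)V† (XOR control), mod 4."""
    return (a + b + c + d - (a ^ b ^ c ^ d)) % 4

for a, b, c, d in product((0, 1), repeat=4):
    applies_not = c4f_exp(a, b, c, d) == 2
    assert applies_not == (a + b + c + d in (2, 3))       # weight 2 or 3
    maj_bcd = int(b + c + d >= 2)
    mix = (b & (1 - c)) ^ (c & (1 - d)) ^ ((1 - b) & d)   # b c-bar + c d-bar + b-bar d
    assert applies_not == (((1 - a) & maj_bcd) ^ (a & mix))
```

For a = 0 the control reduces to maj(b, c, d) (weight of bcd at least 2), and for a = 1 to the "at least one but not all" band of bcd, together covering exactly the weight-2 and weight-3 assignments.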


5.4.5. Generalizations and Remarks

The pattern of the simplest category of the Boolean function

Cn F = x0 V ◦ x1 V ◦ · · · ◦ xn−2 V ◦ (x0 ⊕ x1 ⊕ · · · ⊕ xn−2) V†

can now be generalized by looking at the various components created by the different control functions. From the analysis of C2 F, C3 F, and C4 F, and from the knowledge related to the properties of the V/V† quantum gates, there are four distinctive control functions that can be used to determine the type of functions for Cn F with n > 4.

Let us express an integer encoded on the k control variables as i = c0 2^0 + c1 2^1 + · · · + ck−1 2^(k−1) with cj ∈ {0, 1}. For each integer let |Ci| be the number of cj = 1 for 0 ≤ j ≤ k − 1. The studied functions in this section are composed of two component functions fg (·) and fs (·) (Subsection 5.4.2). Using the notions introduced above, all the components of the function fg (·) can be written using the four control functions shown in Table 5.17.

Table 5.17. Significant control functions of the fg (·) component functions in the simplest Cn F circuits (C(m, j) denotes the binomial coefficient)

      | V                     | NOT                   | V†        | I
k = 1 | C(|Ci|,1)             | -                     | -         | -
k = 2 | C(|Ci|,1)             | C(|Ci|,2)             | -         | -
k = 3 | C(|Ci|,1)             | C(|Ci|,2)             | C(|Ci|,3) | -
k = 4 | C(|Ci|,1)             | C(|Ci|,2)             | C(|Ci|,3) | C(|Ci|,4)
k = 5 | C(|Ci|,1) + C(|Ci|,5) | C(|Ci|,2)             | C(|Ci|,3) | C(|Ci|,4)
k = 6 | C(|Ci|,1) + C(|Ci|,5) | C(|Ci|,2) + C(|Ci|,6) | C(|Ci|,3) | C(|Ci|,4)

Table 5.17 shows the number of terms given to a set of control variables as a result of interference between the various applied quantum gates. Each row represents how many of the total number of control variables

414

Reversible and Quantum Logic

|Ci| are used to generate the function fg (·), indicated by the parameter k. For instance, for k = 2 (the case of the C2 F function), there will only be two types of terms, V and V ◦ V = NOT. This case is illustrated in Table 5.14. The V terms appear as a result of a single positive V variable control, thus (ab̄)V and (āb)V, and the NOT terms are the result of the interference (ab)V ◦ (ab)V = (ab)NOT. Table 5.17 only shows the functions that need to be considered with up to three V/V† gates because any control function with more than three control variables can be decomposed into a function with 1, 2, or 3 control variables. Because the application of the V/V† operators has a result from {V, NOT, V†, I} determined by |Ci| mod 4, the possibilities of obtaining a particular term can be effectively expressed using combinations. This is shown in Table 5.17 with up to k = 6 used control variables.

Table 5.17 can be generalized by observing the fact that each column represents a particular combination of control variables mod 4. For instance, the column V contains the terms C(|Ci|, 1) + C(|Ci|, 5) and, as can be expected, it will also include terms such as C(|Ci|, 9), C(|Ci|, 13), and so on; similarly for the other columns NOT, V†, and I. In particular it can be observed that the V column contains terms C(n, m) given by m ≡ 1 (mod 4), the column NOT terms given by m ≡ 2 (mod 4), the column V† terms given by m ≡ 3 (mod 4), and finally the column I terms given by m ≡ 0 (mod 4).
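The column rule can be stated directly with binomial coefficients grouped by j mod 4. The sketch below (helper names are ours) reproduces the k = 5 row of Table 5.17 and the term counts used in the worked example (5.10)–(5.13):

```python
from math import comb

def term_counts(k):
    """Group C(k, j), 1 <= j <= k, by j mod 4: 1 -> V, 2 -> NOT, 3 -> V†, 0 -> I."""
    residues = {"V": 1, "NOT": 2, "V†": 3, "I": 0}
    return {name: sum(comb(k, j) for j in range(1, k + 1) if j % 4 == r)
            for name, r in residues.items()}

counts = term_counts(5)
assert counts["V"] == comb(5, 1) + comb(5, 5) == 6
assert counts["NOT"] == comb(5, 2) == 10
assert counts["V†"] == comb(5, 3) == 10
assert counts["I"] == comb(5, 4) == 5

# fs collects one V† per odd-size combination
fs_terms = sum(comb(5, j) for j in (1, 3, 5))
assert fs_terms == 16

# after the cancellations of (5.11)-(5.12), NOT remains on
# C(5,2) + C(5,3) = 20 terms: exactly the 20 minterms of (5.13)
assert counts["NOT"] + counts["V†"] == 20
```

The V column absorbs the fs V† factors on the odd-size combinations, which is why only the NOT counts survive the pairing step.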


Thus, let j4l denote j mod 4 = l for j = 0, . . . , |Ci| and l = 0, 1, 2, 3. From Table 5.17 the general expression for the functions obtained by the Ci control variables can be summarized as follows:

fg = V (Σ_{j41} C(|Ci|, j)) + NOT (Σ_{j42} C(|Ci|, j)) + V† (Σ_{j43} C(|Ci|, j)) + I (Σ_{j44} C(|Ci|, j)) .     (5.8)

The fs (·) components of the Cn F functions can be similarly expressed by a simple equation because there is no interference. Equation (5.9) shows the component functions of the fs (·) function:

fs = Σ_{j odd, j ≤ |Ci|} C(|Ci|, j) V† ,     (5.9)

where j = 1, 3, 5, . . . ranges over all odd numbers less than or equal to |Ci|. Equation (5.9) shows in which terms the V† gates are applied in the circuit. Combining the functions fg and fs one can, for a given number of control lines, determine which control function is created. Notice that, because the control functions are obtained by single controls of variables and by XORs of these variables, only symmetric control functions can appear. Having both the component functions fs (·) and fg (·) in this notation, we need to obtain the final function as f = fg (·) · fs (·). However, the presented notation only allows us to reason about the number of terms of a particular type as a function of the number of controlled variables. Both the functions fg and fs are symmetric and are specified by combinations of variables as shown in (5.8) and (5.9). Consequently, any unitary operators controlled by the same combination can be combined together. This results in a pairing of the fs (·) and fg (·) functions in a systematic manner that allows us to predict the function. For instance, let |Ci| = 5, which results in

fg (5) = V (C(5,1) + C(5,5)) + NOT C(5,2) + V† C(5,3) + I C(5,4)

and

fs (5) = V† C(5,1) + V† C(5,3) + V† C(5,5) .

Multiplying these together we obtain the result shown in (5.10):

fg (5) · fs (5) = [V (C(5,1) + C(5,5)) + NOT C(5,2) + V† C(5,3) + I C(5,4)] · [V† C(5,1) + V† C(5,3) + V† C(5,5)] .     (5.10)

Next combine the unitary operators using the same control functions from both functions fg (5) and fs (5), as shown in (5.11):

fg (5) · fs (5) = V (C(5,1) + C(5,5)) · V† (C(5,1) + C(5,5)) + NOT C(5,2) + V† C(5,3) · V† C(5,3) + I C(5,4) .     (5.11)

Notice now that several of the unitary operators match each other and thus can be combined according to the rules introduced in Subsection 5.4.2. So, for instance,

V (C(5,1) + C(5,5)) · V† (C(5,1) + C(5,5)) = V · V† (C(5,1)) + V · V† (C(5,5)) = I ,

while V† · V† = NOT on the C(5,3) terms. The result is shown in (5.12):

fg (5) · fs (5) = NOT C(5,2) + NOT C(5,3) .     (5.12)

Finally we can expand the minimized terms as shown in (5.13):

fg (5) · fs (5) = ā(bcdē ∨ bcd̄e ∨ bc̄de ∨ b̄cde) ∨
ā(bcd̄ē ∨ bc̄dē ∨ bc̄d̄e ∨ b̄cdē ∨ b̄cd̄e ∨ b̄c̄de) ∨
a(bcd̄ē ∨ bc̄dē ∨ bc̄d̄e ∨ b̄cdē ∨ b̄cd̄e ∨ b̄c̄de) ∨
a(bc̄d̄ē ∨ b̄cd̄ē ∨ b̄c̄dē ∨ b̄c̄d̄e) ,     (5.13)

i.e., the 20 minterms of a, b, c, d, e of weight two or three.

Figure 5.12. Function obtained from Equation (5.14).

Moreover, because the target variable in (5.13) is e, we can merge over e to obtain the final smaller form shown in (5.14):

fg (5) · fs (5) = ā(bcd̄ ∨ bc̄d ∨ b̄cd) ∨ a(bc̄d̄ ∨ b̄cd̄ ∨ b̄c̄d) ∨
ā(bc̄d̄ ∨ b̄cd̄ ∨ b̄c̄d)e ∨ ab̄c̄d̄e ∨ ābcdē ∨ a(bcd̄ ∨ bc̄d ∨ b̄cd)ē .     (5.14)

The function obtained is shown in Figure 5.12. By extending this formalism it is possible to express a reversible or quantum function simply in terms of symmetric control functions. The specification using combinations is very useful, as it allows us to determine directly and exactly the number of terms where the different unitary operators have to be applied. Notice also that the presented work uses gates with the V/V† unitary operator; this selected operator determines, for instance, the width of Table 5.17. In this section we presented a study of the possibilities of generating functions from the template of the Cn NOT gates by removing and replacing some of their components. That is, by carefully selecting the symmetric control functions and using particular unitary operators, novel functions can be created from already existing circuit structures. We showed that this approach allows us to generate several reversible functions and that these functions are in general cheaper than the


original Cn NOT gates. Moreover, we showed a simple method of how to generate these functions in a formal way using the combinations applied on the control qubits.

Bibliography

[1] S. Aaronson, D. Grier, and L. Schaeffer. “The Classification of Reversible Bit Operations”. In: ArXiv e-prints (Apr. 2015). arXiv: 1504.05155 [quant-ph]. url: https://arxiv.org/abs/1504.05155.

[2] Z. J. Acs and D. B. Audretsch. “Innovation in Large and Small Firms: An Empirical Analysis”. In: The American Economic Review 78.4 (Sept. 1988), pp. 678–690. url: http://www.jstor.org/stable/1811167.

[3] C. Adams and S. Tavares. “Constructing Symmetric Ciphers Using the CAST Design Procedure”. In: Designs, Codes, and Cryptography 12.3 (Nov. 1997), pp. 283–316. issn: 0925-1022. doi: 10.1023/A:1008229029587. url: http://dx.doi.org/10.1023/A:1008229029587.

[4] N. Admaty, S. Litsyn, and O. Keren. “Puncturing, Expurgating and Expanding the q-ary BCH Based Robust Codes”. In: 2012 IEEE 27th Convention of Electrical & Electronics Engineers in Israel. IEEEI 27. Nov. 2012, pp. 1–5. isbn: 978-1-4673-4682-5. doi: 10.1109/EEEI.2012.6376995.

[5] E. Alba and J. M. Troya. “A Survey of Parallel Distributed Genetic Algorithms”. In: Complexity 4.4 (Mar. 1999), pp. 31–52. doi: 10.1002/(SICI)1099-0526(199903/04)4:43.0.CO;2-4.

[6] C. Albrecht. “IWLS 2005 Benchmarks”. In: Proceedings of the 2005 International Workshop on Logic Synthesis. May 2005. url: http://iwls.org/iwls2005/benchmarks.html.

[7] P. Alexandrov. “Discrete Spaces”. In: Matematicheskij Sbornik NS 2 (1937). In Russian, pp. 501–518.

[8] Altera. Advanced Synthesis Cookbook. San Jose, CA, USA, July 2011. url: https://www.altera.com/content/dam/altera-www/global/en_US/pdfs/literature/manual/stx_cookbook.pdf.

[9] Altera. Benchmark Designs for the Quartus University Interface Program (QUIP), Version 1.0. San Jose, CA, USA, June 2005. url: http://www.ecs.umass.edu/ece/labs/vlsicad/ece667/links/quip_benchmarks.pdf.

[10] Altera. Synthesis Design Flows Using the Quartus University Interface Program (QUIP). San Jose, CA, USA, Nov. 2009. url: http://xiaoleicestustc.blogspot.de/2009/11/synthesis-design-flows-using-quartus.html.

[11] R. Anderson and M. Kuhn. “Low Cost Attacks on Tamper Resistant Devices”. In: Security Protocols: 5th International Workshop, Paris, France, April 7–9, 1997, Proceedings. Ed. by B. Christianson et al. Berlin, Heidelberg: Springer, 1997, pp. 125–136. isbn: 978-3-540-69688-9. doi: 10.1007/BFb0028165. url: https://doi.org/10.1007/BFb0028165.

[12] ANSI/EIA-548-1988, Electronic Design Interchange Format, Version 2.0.0. Standard: EDIF. 1988. url: http://www.freestd.us/soft4/3477683.htm.

[13] R. L. Ashenhurst. “The Decomposition of Switching Functions”. In: Proceedings of the International Symposium on Theory of Switching. Annals of the Computation Laboratory of Harvard University 29, Part 1. Cambridge, Massachusetts, USA: Harvard University, Apr. 1957, pp. 74–116.

[14] H. Astola, R. S. Stankovic, and J. T. Astola. “Index Generation Functions Based on Linear and Polynomial Transformations”. In: 2016 IEEE 46th International Symposium on Multiple-Valued Logic. ISMVL. Los Alamitos, CA, USA: IEEE Computer Society, May 2016, pp. 102–106. isbn: 978-1-4673-9490-1. doi: 10.1109/ISMVL.2016.20.

[15] J. T. Astola et al. “An Algebraic Approach to Reducing the Number of Variables of Incompletely Defined Discrete Functions”. In: 2016 IEEE 46th International Symposium on Multiple-Valued Logic. ISMVL. Los Alamitos, CA, USA: IEEE Computer Society, May 2016, pp. 107–112. isbn: 978-1-4673-9490-1. doi: 10.1109/ISMVL.2016.18.

[16] J. T. Astola et al. “On Linearization of Partially Defined Boolean Functions and its Applications”. In: Boolean Problems - Proceedings of the 12th International Workshop. Ed. by B. Steinbach. IWSBP 12. Freiberg, Germany: Freiberg University of Mining and Technology, Sept. 2016, pp. 193–202. isbn: 978-3-86012-540-3.

[17] J. Bachrach et al. “Chisel: Constructing Hardware in a Scala Embedded Language”. In: Proceedings of the 49th Annual Design Automation Conference. DAC. San Francisco, CA, USA: ACM, 2012, pp. 1216–1225. isbn: 978-1-4503-1199-1. doi: 10.1145/2228360.2228584. url: http://doi.acm.org/10.1145/2228360.2228584.

[18] A. A. Baduln. Automated Synthesis of Digital Devices. In Russian. Moscow, Russia: Radio i Svyaz, 1981.

[19] M. Balzer, O. Deussen, and C. Lewerentz. “Voronoi Treemaps for the Visualization of Software Metrics”. In: Proceedings of the 2005 ACM Symposium on Software Visualization. SoftVis. St. Louis, Missouri, USA: ACM, May 2005, pp. 165–172. isbn: 1-59593-073-6. doi: 10.1145/1056018.1056041. url: http://doi.acm.org/10.1145/1056018.1056041.

[20] D. Bañeres, J. Cortadella, and M. Kishinevsky. “A Recursive Paradigm to Solve Boolean Relations”. In: IEEE Transactions on Computers 58.4 (Apr. 2009), pp. 512–527. issn: 0018-9340. doi: 10.1109/TC.2008.165. url: http://dx.doi.org/10.1109/TC.2008.165.

[21] H. Bar-El et al. “The Sorcerer’s Apprentice Guide to Fault Attacks”. In: Proceedings of the IEEE 94.2 (Feb. 2006), pp. 370–382. issn: 0018-9219. doi: 10.1109/JPROC.2005.862424.

[22] A. Barenco et al. “Elementary Gates for Quantum Computation”. In: Physical Review A 52.5 (Nov. 1995), pp. 3457–3467. doi: 10.1103/PhysRevA.52.3457.

[23] A. Barenghi et al. “Fault Injection Attacks on Cryptographic Devices: Theory, Practice, and Countermeasures”. In: Proceedings of the IEEE 100.11 (Apr. 2012), pp. 3056–3076. issn: 0018-9219. doi: 10.1109/JPROC.2012.2188769.

[24] L. Benini and G. De Micheli. “State Assignment for Low Power Dissipation”. In: IEEE Journal of Solid-State Circuits 30.3 (June 1995), pp. 258–268. issn: 0018-9200. doi: 10.1109/4.364440.

[25] C. H. Bennett. “Logical Reversibility of Computation”. In: IBM Journal of Research and Development 17.6 (Nov. 1973), pp. 525–532. issn: 0018-8646. doi: 10.1147/rd.176.0525. url: http://dx.doi.org/10.1147/rd.176.0525.

[26] R. Berghammer. “Column-Wise Extendible Vector Expressions and the Relational Computation of Sets of Sets”. In: Mathematics of Program Construction: 12th International Conference, MPC 2015. Ed. by R. Hinze and J. Voigtländer. Vol. 9129. LNCS. Königswinter, Germany: Springer International Publishing, June 2015, pp. 238–256. isbn: 978-3-319-19797-5. doi: 10.1007/978-3-319-19797-5_12.

422

Bibliography

[27]

R. Berghammer. “On the Use of Relation Algebra and the BDDbased Tool RelView in Formal Algorithm Development”. In: Boolean Problems - Proceedings of the 8th International Workshop. Ed. by B. Steinbach. IWSBP 8. Freiberg, Germany: Freiberg University of Mining and Technology, Sept. 2008, pp. 153–162. isbn: 978-386012-346-1.

[28]

R. Berghammer and S. Bolus. “On the Use of BDDs for Solving Problems on Simple Games”. In: Boolean Problems - Proceedings of the 9th International Workshop. Ed. by B. Steinbach. IWSBP 9. Freiberg, Germany: Freiberg University of Mining and Technology, Sept. 2010, pp. 113–124. isbn: 978-3-86012-404-8.

[29]

R. Berghammer and S. Bolus. “ROBDD-Based Computation of Sets of Minimal and Maximal Sets with Applications”. In: Boolean Problems - Proceedings of the 11th International Workshop. Ed. by B. Steinbach. IWSBP 11. Freiberg, Germany: Freiberg University of Mining and Technology, Sept. 2014, pp. 55–66. isbn: 978-3-86012-488-8.

[30]

R. Berghammer and F. Neumann. “RelView – An OBDD-Based Computer Algebra System for Relations”. In: Computer Algebra in Scientific Computing: 8th International Workshop, CASC 2005. Ed. by V. G. Ganzha, E. W. Mayr, and E. V. Vorozhtsov. Berlin, Heidelberg: Springer, 2005, pp. 40–51. isbn: 978-3-540-32070-8. doi: 10.1007/11555964_4.

[31]

R. Berghammer, B. Leoniuk, and U. Milanese. “Implementation of Relational Algebra Using Binary Decision Diagrams”. In: Relational Methods in Computer Science: 6th International Conference, RelMiCS 2001 and 1st Workshop of COST Action. Ed. by H. C. M. de Swart. Berlin, Heidelberg: Springer, Oct. 2002, pp. 241–257. isbn: 978-3-540-36280-7. doi: 10.1007/3-540-36280-0_17.

[32]

Berkeley Logic Synthesis and Verification Group. ABC: A System for Sequential Synthesis and Verification. Sept. 2013. url: http://www.eecs.berkeley.edu/~alanmi/abc/.

[33]

Berkeley Logic Synthesis and Verification Group. Berkeley Logic Interchange Format (BLIF). University of California in Berkeley, 1993. url: http://vlsi.colorado.edu/~vis/blif.ps.

[34]

A. Bernasconi et al. “Bi-Decomposition Using Boolean Relations”. In: 2015 Euromicro Conference on Digital System Design. DSD. Funchal, Madeira, Portugal: IEEE, Oct. 2015, pp. 72–78. isbn: 978-1-4673-8035-5. doi: 10.1109/DSD.2015.48.

[35]

A. Bernasconi et al. “Logic Minimization and Testability of 2-SPP Networks”. In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 27.7 (July 2008), pp. 1190–1202. doi: 10.1109/TCAD.2008.923072.

[36]

M. Bertol. “Efficient Normal-Form Algorithms for Substitution Systems on Free Partially Commutative Monoids”. In German: Effiziente Normalform-Algorithmen für Ersetzungssysteme über frei partiell kommutativen Monoiden. PhD thesis. Universität Stuttgart, 1996.

[37]

P. Bibilo et al. “Automatization of the Logic Synthesis of CMOS Circuits with Low Power Consumption”. In: Programnaia Ingeniria 8 (2013). In Russian, pp. 35–41.

[38]

A. Biere. Lingeling, Plingeling, PicoSAT and PrecoSAT at SAT Race 2010. 10/1. Linz, Austria: Institute for Formal Models and Verification, Johannes Kepler University, Aug. 2010. url: http://baldur.iti.uka.de/sat-race-2010/descriptions/solver_1+2+3+6.pdf.

[39]

A. Biere et al., eds. Handbook of Satisfiability. Vol. 185. Frontiers in Artificial Intelligence and Applications. IOS Press, Feb. 2009. isbn: 978-1-58603-929-5.

[40]

E. Biham and A. Shamir. “Differential Fault Analysis of Secret Key Cryptosystems”. In: Advances in Cryptology–CRYPTO ’97: 17th Annual International Cryptology Conference. Ed. by B. S. Kaliski. Santa Barbara, California, USA: Springer, 1997, pp. 513–525. isbn: 978-3-540-69528-8. doi: 10.1007/BFb0052259.

[41]

R. E. Blahut. Cryptography and Secure Communication. Cambridge University Press, 2014. isbn: 978-1-10701-427-5.

[42]

R. E. Blahut. Theory and Practice of Error Control Codes. AddisonWesley, May 1983. isbn: 978-0-20110-102-7.

[43]

N. Blanc, D. Kroening, and N. Sharygina. “Scoot: A Tool for the Analysis of SystemC Models”. In: Tools and Algorithms for the Construction and Analysis of Systems, 14th International Conference, TACAS 2008. Ed. by C. R. Ramakrishnan and J. Rehof. Berlin, Heidelberg: Springer, 2008, pp. 467–470. isbn: 978-3-540-78800-3. doi: 10.1007/978-3-540-78800-3_36.

[44]

D. Bochmann, F. Dresig, and B. Steinbach. “A New Decomposition Method for Multilevel Circuit Design”. In: Proceedings of the Conference on European Design Automation. EDAC ’91. Amsterdam, The Netherlands: IEEE Computer Society, 1991, pp. 374–377. isbn: 0-8186-2130-3. doi: 10.1109/EDAC.1991.206428.

[45]

D. Bochmann and C. Posthoff. Binary Dynamic System. In German: “Binäre dynamische Systeme”. Munich (Germany), Vienna (Austria): Oldenbourg, 1981. isbn: 978-3-48625-071-8.

[46]

D. Bochmann and B. Steinbach. Logic Design with XBOOLE. In German: Logikentwurf mit XBOOLE. Berlin: Verlag Technik, 1991. isbn: 3-341-01006-8.

[47]

P. Bölau. “A Decomposition Strategy for the Circuit Design Utilizing the Properties of the Functions”. In German: “Eine Dekompositionsstrategie für den Logikentwurf auf der Basis funktionstypischer Eigenschaften”. PhD thesis. Technical University Karl-Marx-Stadt, 1987.

[48]

B. Bollig et al. “On The Complexity of The Hidden Weighted Bit Function for Various BDD Models”. In: RAIRO - Theoretical Informatics and Applications 33.2 (Mar. 1999), pp. 103–115. issn: 0988-3754. doi: 10.1051/ita:1999108.

[49]

E. Boros, P. L. Hammer, and T. Ibaraki. “Logical Analysis of Data”. In: Encyclopedia of Data Warehousing and Mining. Ed. by J. Wang. 2005. isbn: 978-1-60566-010-3. doi: 10.4018/978-1-59140-557-3.ch131.

[50]

G. Borowik and T. Łuba. “Fast Algorithm of Attribute Reduction Based on the Complementation of Boolean Function”. In: Advanced Methods and Applications in Computational Intelligence. Ed. by R. Klempous et al. Heidelberg: Springer International Publishing, 2014, pp. 25–41. isbn: 978-3-319-01436-4. doi: 10.1007/978-3-319-01436-4_2.

[51]

L. Borucki, G. Schindlbeck, and C. Slayman. “Comparison of Accelerated DRAM Soft Error Rates Measured at Component and System Level”. In: Reliability Physics Symposium, 2008. IRPS 2008. IEEE International. Phoenix, AZ, USA: IEEE, Apr. 2008, pp. 482–487. isbn: 978-1-4244-2049-0. doi: 10.1109/RELPHY.2008.4558933.

[52]

N. Bourbaki. Elements of Mathematics: General Topology. Massachusetts, Palo Alto, London, Don Mills, Ontario: Addison-Wesley, 1966.

[53]

R. K. Brayton et al. Logic Minimization Algorithms for VLSI Synthesis. Norwell, MA, USA: Kluwer Academic Publishers, 1984. isbn: 0-89838-164-9.

[54]

M. A. Breuer. “Hardware That Produces Bounded Rather Than Exact Results”. In: Proceedings of the 47th Design Automation Conference. DAC. Anaheim, California, USA: ACM, June 2010, pp. 871–876. isbn: 978-1-4503-0002-5. doi: 10.1145/1837274.1837493.

[55]

F. Brglez, D. Bryan, and K. Kozminski. “Combinational Profiles of Sequential Benchmark Circuits”. In: IEEE International Symposium on Circuits and Systems. ISCAS. IEEE, May 1989, pp. 1929–1934. doi: 10.1109/ISCAS.1989.100747.

[56]

F. Brglez and H. Fujiwara. “A Neutral Netlist of 10 Combinational Benchmark Circuits and a Target Translator in Fortran”. In: Proceedings of the International Symposium on Circuits and Systems. Vol. 3. ISCAS. IEEE, 1985, pp. 663–698.

[57]

H. Broeders and R. van Leuken. “Extracting Behavior and Dynamically Generated Hierarchy from SystemC Models”. In: 2011 48th ACM/EDAC/IEEE Design Automation Conference. DAC. ACM, June 2011, pp. 357–362.

[58]

D. Bryant. The ISCAS’85 Benchmark Circuits and Netlist Format. 1988. url: http://www.facweb.iitkgp.ernet.in/~isg/TESTING/bench/iscas85.ps.

[59]

R. E. Bryant. “On the Complexity of VLSI Implementations and Graph Representations of Boolean Functions with Application to Integer Multiplication”. In: IEEE Transactions on Computers 40.2 (Feb. 1991), pp. 205–213. issn: 0018-9340. doi: 10.1109/12.73590.

[60]

J. T. Butler. “Bent Function Discovery by Reconfigurable Computers”. In: Proceedings of the 9th International Workshop on Boolean Problems. Ed. by B. Steinbach. IWSBP 9. Freiberg, Germany: Freiberg University of Mining and Technology, Sept. 2010, pp. 1–12. isbn: 978-3-86012-404-8.

[61]

P. L. Butzer and H. J. Wagner. “Approximation of Walsh Polynomials and the Concept of a Derivative”. In: Proceedings of the 1972 Symposium on Applications of Walsh Functions. Washington, D.C., USA, 1972, pp. 388–392.

[62]

P. L. Butzer and H. J. Wagner. “Early Contributions from the Aachen School to Dyadic Walsh Analysis with Applications to Dyadic PDEs and Approximation Theory. A Monograph Based on Articles of the Founding Authors, Reproduced in Full”. In: Dyadic Walsh Analysis from 1924 Onwards, Walsh-Gibbs-Butzer Dyadic Differentiation in Science Volume 1, Foundations. Ed. by R. S. Stanković et al. Atlantis Studies in Mathematics for Engineering and Science. Paris: Atlantis Press, 2015, pp. 161–208. isbn: 978-94-6239-159-8. doi: 10.2991/978-94-6239-160-4_4.

[63]

CAD Group. ITC’99 Benchmarks (2nd Release). 2009. url: https://github.com/squillero/itc99-poli.

[64]

L. Cai and D. Gajski. “Transaction Level Modeling: An Overview”. In: First IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis. New York, NY, USA: IEEE, Oct. 2003, pp. 19–24. isbn: 1-58113-742-7. doi: 10.1109/CODESS.2003.1275250.

[65]

Y. Can. “New Operative Methods and Equations to Realize Orthogonal Representations”. In German: Neue Boolesche Orthogonalisierende Operative Methoden und Gleichungen. PhD thesis. Erlangen, Germany: FAU University Press, Aug. 2016. isbn: 978-3-944057-69-9.

[66]

Y. Can and G. Fischer. “Boolean Orthogonalizing Combination Methods”. In: Fifth International Conference on Computational Science, Engineering and Information Technology (CCSEIT 2015). Vienna, Austria, 2015, pp. 9–22. isbn: 978-1-921987-39-7. doi: 10.5121/csit.2015.51102.

[67]

Y. Can and G. Fischer. “Orthogonalizing Boolean Subtraction of Minterms or Ternary-Vectors”. In: Acta Physica Polonica A. Special Issue of the International Conference on Computational and Experimental Science and Engineering (ICCESEN 2014) 128.2-B (2015), pp. 388–391. doi: 10.12693/APhysPolA.128.B-388.

[68]

E. Cantú-Paz. “A Survey of Parallel Genetic Algorithms”. In: Calculateurs Parallèles, Réseaux et Systèmes Répartis 10.2 (1998), pp. 141–171.

[69]

M. Carić and M. Živković. “On the Number of Equivalence Classes of Invertible Boolean Functions under Action of Permutation of Variables on Domain and Range”. In: Publications de l’Institut Mathématique 100.114 (2016), pp. 95–99. issn: 0350-1302. doi: 10.2298/PIM1614095C.

[70]

C. Carlet. “Vectorial Boolean Functions for Cryptography”. In: Boolean Models and Methods in Mathematics, Computer Science, and Engineering. Ed. by Y. Crama and P. L. Hammer. Encyclopedia of Mathematics and Its Applications. Cambridge University Press, 2010, pp. 398–472. isbn: 978-0-5218-4752-0.

[71]

S. T. Chakradhar and A. Raghunathan. “Best-effort Computing: Re-thinking Parallel Software and Hardware”. In: Proceedings of the 47th Design Automation Conference. DAC. Anaheim, CA, USA: ACM, June 2010, pp. 865–870. isbn: 978-1-4503-0002-5. doi: 10.11 45/1837274.1837492.

[72]

D. Cheng and X. Xu. “Bi-Decomposition of Logical Mappings via Semi-Tensor Product of Matrices”. In: Automatica 49.7 (2013), pp. 1979–1985. issn: 0005-1098. doi: 10.1016/j.automatica.2013.03.013.

[73]

J. Cheng, M. Grossman, and T. McKercher. Professional CUDA C Programming. John Wiley & Sons, 2014. isbn: 978-1-118-73932-7.

[74]

V. Chickermane, J. Lee, and J. H. Patel. “A Comparative Study of Design for Testability Methods Using High-Level and Gate-Level Descriptions”. In: 1992 IEEE/ACM International Conference on Computer-Aided Design. ICCAD. Santa Clara, CA, USA: IEEE, Nov. 1992, pp. 620–624. doi: 10.1109/ICCAD.1992.279302.

[75]

M. Choudhury and K. Mohanram. “Bi-Decomposition of Large Boolean Functions Using Blocking Edge Graphs”. In: 2010 IEEE/ACM International Conference on Computer-Aided Design. ICCAD. San Jose, CA, USA: IEEE Press, 2010, pp. 586–591. isbn: 978-1-4244-8192-7. doi: 10.1109/ICCAD.2010.5654210.

[76]

V. Ciriani. “Synthesis of SPP Three-Level Logic Networks Using Affine Spaces”. In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 22.10 (Oct. 2003), pp. 1310–1323. issn: 0278-0070. doi: 10.1109/TCAD.2003.818121.

[77]

V. Ciriani and A. Bernasconi. “2-SPP: a Practical Trade-Off between SP and SPP Synthesis”. In: 5th International Workshop on Boolean Problems. Ed. by B. Steinbach. IWSBP 5. Freiberg, Germany: Freiberg University of Mining and Technology, Sept. 2002. isbn: 3-86012-180-4.

[78]

J. Cong and K. Minkovich. “Optimality Study of Logic Synthesis for LUT-Based FPGAs”. In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 26.2 (Feb. 2007), pp. 230–239. issn: 0278-0070. doi: 10.1109/TCAD.2006.887922.

[79]

S. A. Cook. “The Complexity of Theorem-Proving Procedures”. In: Proceedings of the Third Annual ACM Symposium on Theory of Computing. STOC. Shaker Heights, Ohio, USA: ACM, 1971, pp. 151–158. doi: 10.1145/800157.805047.

[80]

J. W. Cooley, P. A. W. Lewis, and P. D. Welch. “Historical Notes on the Fast Fourier Transform”. In: IEEE Transactions on Audio and Electroacoustics 15.2 (June 1967), pp. 76–79. issn: 0018-9278. doi: 10.1109/TAU.1967.1161903.

[81]

J. W. Cooley, P. A. W. Lewis, and P. D. Welch. “Historical Notes on the Fast Fourier Transform”. In: Proceedings of the IEEE 55.10 (Oct. 1967), pp. 1675–1677. issn: 0018-9219. doi: 10.1109/PROC.1 967.5959.

[82]

J. W. Cooley and J. W. Tukey. “An Algorithm for the Machine Calculation of Complex Fourier Series”. In: Mathematics of Computation 19.90 (Apr. 1965), pp. 297–301. doi: 10.2307/2003354.

[83]

J. W. Cooley and J. W. Tukey. “On the Origin and Publication of the FFT Paper”. In: Current Contents by ISI, Physical Chemical & Earth Sciences 33.51–52 (Dec. 1993), pp. 8–9.

[84]

F. Corno, M. S. Reorda, and G. Squillero. “RT-Level ITC’99 Benchmarks and First ATPG Results”. In: IEEE Design & Test of Computers 17.3 (July 2000), pp. 44–53. issn: 0740-7475. doi: 10.1109 /54.867894.

[85]

R. Cramer et al. “Detection of Algebraic Manipulation with Applications to Robust Secret Sharing and Fuzzy Extractors”. In: Advances in Cryptology–EUROCRYPT 2008. LNCS 4965. Istanbul, Turkey: Springer, 2008, pp. 471–488. isbn: 978-3-540-78967-3. doi: 10.1007 /978-3-540-78967-3_27.

[86]

H. A. Curtis. A New Approach to the Design of Switching Circuits. Princeton, NJ, USA: D. Van Nostrand Company, 1962. isbn: 978-0-44201-794-1.

[87]

T. W. Cusick and P. Stănică. Cryptographic Boolean Functions and Applications. San Diego, CA, USA: Academic Press, 2009. isbn: 978-0-12374-890-4.

[88]

S. Dautovic and L. Novak. “A Comment on: Boolean Functions Classification via Fixed Polarity Reed-Muller Form”. In: IEEE Transactions on Computers 55.8 (Aug. 2006), pp. 1067–1069. issn: 0018-9340. doi: 10.1109/TC.2006.114.

[89]

A. De Vos. Reversible Computing: Fundamentals, Quantum Computing, and Applications. Weinheim: Wiley-VCH, 2010. isbn: 978-3-527-40992-1. doi: 10.1002/9783527633999.

[90]

D. Debnath and T. Sasao. “Minimization of AND-OR-EXOR ThreeLevel Networks with AND Gate Sharing”. In: IEICE Transactions on Information and Systems E80-D.10 (1997), pp. 1001–1008.

[91]

D. Debnath and T. Sasao. “Multiple-Valued Minimization to Optimize PLAs with Output EXOR Gates”. In: Proceedings of the 29th IEEE International Symposium on Multiple-Valued Logic. ISMVL. Freiburg, Germany: IEEE, May 1999, pp. 99–104. isbn: 0-7695-0161-3. doi: 10.1109/ISMVL.1999.779702.

[92]

D. Debnath and Z. Vranesic. “A Fast Algorithm for OR-AND-OR Synthesis”. In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 22.9 (Sept. 2003), pp. 1166–1176. issn: 0278-0070. doi: 10.1109/TCAD.2003.816216.

[93]

T. G. Draper. “Nonlinear Complexity of Boolean Permutations”. PhD thesis. University of Maryland, Department of Mathematics, College Park, Maryland, USA, 2009. url: http://drum.lib.umd.edu/handle/1903/9449.

[94]

R. Drechsler and M. Soeken. “Hardware-Software Co-Visualization: Developing Systems in the Holodeck”. In: 2013 IEEE 16th International Symposium on Design and Diagnostics of Electronic Circuits & Systems. DDECS 16. Karlovy Vary, Czech Republic: IEEE, Apr. 2013, pp. 1–4. isbn: 978-1-4673-6135-4. doi: 10.1109/DDECS.2013.6549775.

[95]

R. Drechsler and J. Stoppe. “Hardware/Software Co-Visualization on the Electronic System Level Using SystemC”. In: 29th International Conference on VLSI Design and 15th International Conference Embedded Systems. VLSID. IEEE, Jan. 2016, pp. 44–49. doi: 10.1109/VLSID.2016.45.

[96]

F. Dresig et al. Programming with XBOOLE. In German: Programmieren mit XBOOLE. TU Chemnitz, 1992.

[97]

E. Dubrova, D. Miller, and J. Muzio. “AOXMIN: A Three-Level Heuristic AND-OR-XOR Minimizer for Boolean Functions”. In: 3rd International Workshop on the Applications of the Reed-Muller Expansion in Circuit Design. RM 3. Oxford, UK, May 1997, pp. 209–218.

[98]

E. Dubrova and P. Ellervee. “A Fast Algorithm for Three-Level Logic Optimization”. In: Proceedings of the International Workshop on Logic Synthesis (IWLS). Lake Tahoe, CA, USA, Nov. 1999, pp. 251–254.

[99]

G. W. Dueck. “Challenges and Advances in Toffoli Network Optimisation”. In: IET Computers & Digital Techniques 8.4 (May 2014), pp. 172–177. doi: 10.1049/iet-cdt.2013.0055.

[100]

E. Dupraz et al. “Practical LDPC Encoders Robust to Hardware Errors”. In: ICC. IEEE, May 2016, pp. 1–6. doi: 10.1109/ICC.2016.7511552.

[101]

N. Dutt. HLSW 92 Benchmarks. University of California, Irvine, 1992.

[102]

A. E. Eiben, R. Hinterding, and Z. Michalewicz. “Parameter Control in Evolutionary Algorithms”. In: IEEE Transactions on Evolutionary Computation 3.2 (July 1999), pp. 124–141. issn: 1089-778X. doi: 10.1109/4235.771166.

[103]

S. Engelberg and O. Keren. “A Comment on the Karpovsky-Taubin Code”. In: IEEE Transactions on Information Theory 57.12 (Dec. 2011), pp. 8007–8010. issn: 0018-9448. doi: 10.1109/TIT.2011.2162718.

[104]

M. Erné. “Structure- and Counting Formula for Topologies on Finite Sets”. In: Manuscripta Mathematica 11.3 (Sept. 1974). In German: Struktur- und Anzahlformeln für Topologien auf endlichen Mengen, pp. 221–259. issn: 1432-1785. doi: 10.1007/BF01173716.

[105]

H. Esmaeilzadeh et al. “Architecture Support for Disciplined Approximate Programming”. In: Proceedings of the Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems. ASPLOS. London, UK: ACM, 2012, pp. 301–312. isbn: 978-1-4503-0759-8. doi: 10.1145/2248487.2151008.

[106]

Espresso. Website for download. url: https://embedded.eecs.berkeley.edu/pubs/downloads/espresso/index.htm.

[107]

G. Fey et al. “ParSyC: an Efficient SystemC Parser”. In: Workshop on Synthesis And System Integration of Mixed Information technologies. SASIMI. 2004, pp. 148–154. url: www.informatik.uni-bremen.de/agra/doc/work/04sasimi_parsyc.pdf.

[108]

D. Fiala et al. “Detection and Correction of Silent Data Corruption for Large-scale High-performance Computing”. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. SC. Salt Lake City, Utah: IEEE Computer Society Press, 2012, pp. 1–12. isbn: 978-1-4673-0804-5.

[109]

P. Fišer. Collection of Digital Design Benchmarks. 2016. url: http://ddd.fit.cvut.cz/prj/Bench_Examples/.

[110]

P. Fišer and J. Hlavička. “BOOM – A Heuristic Boolean Minimizer”. In: Computing and Informatics 22.1 (2003), pp. 19–51. url: https://cai.type.sk/content/2003/1/boom-a-heuristic-boolean-minimizer/1238.pdf.

[111]

P. Fišer and J. Hlavička. “Efficient Minimization Method for Incompletely Defined Boolean Functions”. In: Boolean Problems - Proceedings of the 4th International Workshop. Ed. by B. Steinbach. IWSBP 4. Freiberg, Germany: Freiberg University of Mining and Technology, Sept. 2000, pp. 91–98. isbn: 3-86012-124-3.

[112]

P. Fišer and J. Schmidt. “A Difficult Example Or a Badly Represented One?” In: Boolean Problems - Proceedings of the 10th International Workshop. Ed. by B. Steinbach. IWSBP 10. Freiberg, Germany: Freiberg University of Mining and Technology, Sept. 2012, pp. 115–122. isbn: 978-3-86012-438-3.

[113]

P. Fišer and J. Schmidt. “How Much Randomness Makes a Tool Randomized?” In: Proceedings of the 20th International Workshop on Logic and Synthesis. IWLS 20. San Diego, CA, USA, 2011, pp. 136–143.

[114]

P. Fišer and J. Schmidt. “Improving the Iterative Power of Resynthesis”. In: 2012 IEEE 15th International Symposium on Design and Diagnostics of Electronic Circuits Systems. DDECS 15. Tallinn, Estonia: IEEE, Apr. 2012, pp. 30–33. isbn: 978-1-4673-1187-8. doi: 10.1109/DDECS.2012.6219019.

[115]

P. Fišer and J. Schmidt. “Permuting Variables to Improve Iterative Re-Synthesis”. In: Recent Progress in the Boolean Domain. Ed. by B. Steinbach. Vol. 1. Boolean Domain. Newcastle upon Tyne, UK: Cambridge Scholars Publishing, Apr. 2014, pp. 213–230. isbn: 978-1-4438-5638-6.

[116]

P. Fišer and J. Schmidt. “Small but Nasty Logic Synthesis Examples”. In: Boolean Problems - Proceedings of the 8th International Workshop. Ed. by B. Steinbach. IWSBP 8. Freiberg, Germany: Freiberg University of Mining and Technology, Sept. 2008, pp. 183– 190. isbn: 978-3-86012-346-1.

[117]

P. Fišer and J. Schmidt. “The Observed Role of Structure in Logic Synthesis Examples”. In: Proceedings of the International Workshop on Logic and Synthesis 2009. Berkeley, CA, USA, 2009, pp. 210– 213.

[118]

P. Fišer, J. Schmidt, and J. Balcárek. “Sources of Bias in EDA Tools and Its Influence”. In: Proceedings of 17th IEEE Symposium on Design and Diagnostics of Electronic Systems. DDECS 17. Warsaw, Poland, Apr. 2014, pp. 258–261. isbn: 978-1-4799-4560-3. doi: 10.1 109/DDECS.2014.6868803.

[119]

J. Fiurášek. “Linear Optical Fredkin Gate Based on Partial-SWAP Gate”. In: Physical Review A 78 (2008). doi: 10.1103/PhysRevA.78.032317.

[120]

P. Forin. “Vital Coded Microprocessor: Principles and Applications for Various Transit Systems”. In: IFAC Proceedings Volumes 23.2 (1990), pp. 79–84. doi: 10.1016/S1474-6670(17)52653-1.

[121]

D. Foty. “The Future of “Moore’s Law” - Does it have one?” In: International Symposium on Signals, Circuits and Systems. July 2013, pp. 1–6. doi: 10.1109/ISSCS.2013.6651261.

[122]

M. Fujita et al. “Multi-Level Logic Minimization of Large Combinational Circuits by Partitioning”. In: Logic Synthesis and Optimization. Ed. by T. Sasao. The Springer International Series in Engineering and Computer Science 212. Springer Science & Business Media, 1993, pp. 109–126. isbn: 978-0-7923-9308-5. doi: 10.1007/978-1-4615-3154-8.

[123]

M. Gao et al. “MVSIS”. In: Proceedings of the IWLS 2001. Tahoe City, CA, USA, 2001, pp. 138–144. url: https://embedded.eecs .berkeley.edu/mvsis/doc/2001/mvsis_iwls01.ps.

[124]

P. Gaudry, A. Kruppa, and P. Zimmermann. “A GMP-based Implementation of Schönhage-Strassen’s Large Integer Multiplication Algorithm”. In: Proceedings of the 2007 International Symposium on Symbolic and Algebraic Computation. ISSAC. Waterloo, Ontario, Canada: ACM, 2007, pp. 167–174. isbn: 978-1-59593-743-8. doi: 10 .1145/1277548.1277572.

[125]

C. Genz and R. Drechsler. “Overcoming Limitations of the SystemC Data Introspection”. In: Proceedings of the Conference on Design, Automation and Test in Europe. DATE. Nice, France: European Design and Automation Association, 2009, pp. 590–593. isbn: 9783-9810801-5-5.

[126]

J. E. Gibbs. Walsh Spectrometry: A Form of Spectral Analysis Well Suited to Binary Digital Computation. Teddington, Middlesex, UK, 1967. url: https://books.google.de/books?id=3RprNQAACAAJ.

[127]

J. E. Gibbs and H. A. Gebbie. “Application of Walsh Functions to Transform Spectroscopy”. In: Nature 224.5223 (Dec. 1969), pp. 1012–1013. doi: 10.1038/2241012a0.

[128]

J. E. Gibbs and R. S. Stanković. “Why IWGD - 89? A Look at the Bibliography of Gibbs Derivatives”. In: Theory and Applications of Gibbs Derivatives. Ed. by P. L. Butzer and R. S. Stanković. Belgrade, Serbia: Matematički Institut, 1990, pp. xi–xxiv. isbn: 978-8680-59305-0.

[129]

S. K. Grandhi et al. “ROST-C: Reliability Driven Optimization and Synthesis Techniques for Combinational Circuits”. In: 2015 33rd IEEE International Conference on Computer Design. ICCD 33 (Oct. 2015), pp. 431–434. doi: 10.1109/ICCD.2015.7357141.

[130]

S. Grandhi et al. “CPE: Codeword Prediction Encoder”. In: 2016 21st IEEE European Test Symposium. ETS 21. IEEE, May 2016, pp. 1–2. isbn: 978-1-4673-9660-8. doi: 10.1109/ETS.2016.7519283.

[131]

A. Grocholewska-Czurylo and J. Stoklosa. “Generating Bent Functions”. In: 2001 Proceedings of the Eighth International Conference on Advanced Computer Systems. Ed. by J. Sołdek and J. Pejaś. ACS 8. Mielno, Poland: Springer, Oct. 2002, pp. 361–370. isbn: 978-1-4613-4635-7. doi: 10.1007/978-1-4419-8530-9_29.

[132]

E. Guerrini, E. Orsini, and M. Sala. “Computing the Distance Distribution of Systematic Non-linear Codes”. In: Journal of Algebra and Its Applications 9.2 (Apr. 2010), pp. 241–256. issn: 0219-4988. doi: 10.1142/S0219498810003884.

[133]

V. Gupta et al. “Low-Power Digital Signal Processing Using Approximate Adders”. In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 32.1 (Jan. 2013), pp. 124–137. doi: 10.1109/TCAD.2012.2217962.

[134]

G. Uygur and S. Sattler. “Parallel Decomposition for Safety-Critical Systems”. In: 2013 3rd International Electric Drives Production Conference. EDPC 3. Nuremberg, Germany: IEEE, Oct. 2013, pp. 1–8. doi: 10.1109/EDPC.2013.6689764.

[135]

F. K. Gurkaynak et al. “A Functional Test Methodology for Globally-Asynchronous Locally-Synchronous Systems”. In: Proceedings of the Eighth International Symposium on Asynchronous Circuits and Systems. IEEE, Apr. 2002, pp. 181–189. isbn: 0-7695-1540-1. doi: 10.1109/ASYNC.2002.1000308.

[136]

A. Haar. “About the Theory on Orthogonal Function Systems”. In: Mathematische Annalen 69 (1910). In German: Zur Theorie der orthogonalen Funktionssysteme, pp. 331–371. issn: 0025-5831.

[137]

F. Z. Hadjam. “Tuning of Parameters of a Soft Computing System for the Synthesis of Reversible Circuits”. In: Journal of Multiple-Valued Logic & Soft Computing 24.1–4 (2015), pp. 341–386.

[138]

F. Z. Hadjam and C. Moraga. “Introduction to RIMEP2: A Multi-Expression Programming System for the Design of Reversible Digital Circuits”. In: arXiv preprint arXiv:1405.2226 (2014). url: https://arxiv.org/ftp/arxiv/papers/1405/1405.2226.pdf.

[139]

F. Z. Hadjam and C. Moraga. Synthesis of 4-bit Reversible Circuits Using Hierarchical Distributed Reversible Multi-expression Genetic Programming. Tech. rep. Research Report 853. Available from the authors upon request. Faculty of Computer Science, TU Dortmund University, 2016.

[140]

C. N. Hadjicostis and G. C. Verghese. “Coding Approaches to Fault Tolerance in Linear Dynamic Systems”. In: IEEE Transactions on Information Theory 51.1 (Jan. 2005), pp. 210–228. doi: 10.1109/T IT.2004.839491.

[141]

R. W. Hamming. “Error Detecting and Error Correcting Codes”. In: The Bell System Technical Journal 29.2 (Apr. 1950), pp. 147–160. doi: 10.1002/j.1538-7305.1950.tb00463.x.

[142]

A. Hamou-Lhadj and T. Lethbridge. “Summarizing the Content of Large Traces to Facilitate the Understanding of the Behavior of a Software System”. In: 14th IEEE International Conference on Program Comprehension (ICPC’06). IEEE. 2006, pp. 181–190.

[143]

J. Han and M. Orshansky. “Approximate Computing: An Emerging Paradigm for Energy-Efficient Design”. In: 2013 18th IEEE European Test Symposium. ETS 18. May 2013, pp. 1–6. doi: 10.1109/E TS.2013.6569370.

[144]

M. C. Hansen, H. Yalcin, and J. P. Hayes. “Unveiling the ISCAS-85 Benchmarks: A Case Study in Reverse Engineering”. In: IEEE Design & Test of Computers 16.3 (1999), pp. 72–80. issn: 0740-7475. doi: 10.1109/54.785838.

[145]

G. H. Hardy and E. M. Wright. An Introduction to the Theory of Numbers. Oxford Science Publications. Oxford Clarendon Press, May 1980. isbn: 978-0-1985-3171-5.

[146]

M. T. Heideman, D. H. Johnson, and C. S. Burrus. “Gauss and the History of the Fast Fourier Transform”. In: IEEE ASSP Magazine 1.4 (Oct. 1984), pp. 14–21. issn: 0740-7467. doi: 10.1109/MASSP.1 984.1162257.

[147]

T. Helleseth and A. Kholosha. “Bent Functions and Their Connections to Combinatorics”. In: Surveys in Combinatorics 2013. Ed. by S. Blackburn, S. Gerke, and M. Wildon. London Mathematical Society Lecture Notes Series 409. Cambridge University Press, 2013, pp. 91–126. isbn: 978-1-10727-281-1.

[148]

M. Hoffmann et al. “A Practitioner’s Guide to Software-Based Soft-Error Mitigation Using AN Codes”. In: Proceedings of the 15th IEEE International Symposium on High Assurance Systems Engineering. HASE 15. Miami, FL, USA, Jan. 2014, pp. 33–40. isbn: 978-1-4799-3465-2. doi: 10.1109/HASE.2014.14. url: https://www4.cs.fau.de/Publications/2014/hoffmann_14_hase.pdf.

[149]

H. H. Hoos and T. Stützle. “Empirical Analysis of Randomized Algorithms”. In: Handbook of Approximation Algorithms and Metaheuristics. Ed. by T. F. Gonzalez. Boca Raton, FL, USA: Chapman & Hall, May 2007, pp. 14.1–14.17. isbn: 978-1-58488-550-4. doi: 10.1201/9781420010749.ch14.

[150]

J. Hopf, G. S. Itzstein, and D. Kearney. “Hardware Join Java: a High Level Language for Reconfigurable Hardware Development”. In: Proceedings of the 2002 IEEE International Conference on FieldProgrammable Technology (FPT). FPT. Hong Kong, China: IEEE, Dec. 2002, pp. 344–347. doi: 10.1109/FPT.2002.1188707.

[151]

K. Huang et al. “Scalably Distributed SystemC Simulation for Embedded Applications”. In: International Symposium on Industrial Embedded Systems, 2008. SIES. Le Grande Motte, France: IEEE, June 2008, pp. 271–274. doi: 10.1109/SIES.2008.4577715.

[152]

A. A. Hwang, I. A. Stefanovici, and B. Schroeder. “Cosmic Rays Don’t Strike Twice: Understanding the Nature of DRAM Errors and the Implications for System Design”. In: ACM SIGARCH Computer Architecture News - ASPLOS ’12 40.1 (2012). issn: 0163-5964. doi: 10.1145/2189750.2150989.

[153]

T. Ibaraki. “Partially Defined Boolean Functions”. In: Boolean Functions - Theory, Algorithms, and Applications. Ed. by Y. Crama and P. L. Hammer. Cambridge University Press, 2011. isbn: 978-0-52184-751-3.

[154]

Integrated Systems Laboratory (LSI). The EPFL Combinational Benchmark Suite. 2016. url: http://lsi.epfl.ch/benchmarks.

[155]

International Technology Roadmap for Semiconductors 2005 Edition, Design. 2005. url: https://www.semiconductors.org/clientuploads/Research_Technology/ITRS/2005/Design.pdf.

[156]

L. Janin and D. Edwards. “Software Visualization Techniques Adapted and Extended for Asynchronous Hardware Design”. In: Ninth International Conference on Information Visualization. IV. London, UK: IEEE, July 2005, pp. 347–356. isbn: 0-7695-2397-8. doi: 10.1109/IV.2005.119.

[157]

C. Jankowski et al. “Discretization of Data Using Boolean Transformations and Information Theory Based Evaluation Criteria”. In: Bulletin of the Polish Academy of Sciences 63.4 (2015), pp. 923–932. issn: 2300-1917. doi: 10.1515/bpasts-2015-0105.

[158]

J. Jegier, P. Kerntopf, and M. Szyprowski. “An Approach to Constructing Reversible Multi-Qubit Benchmarks with Provably Minimal Implementations”. In: Proceedings of the 13th IEEE International Conference on Nanotechnology. NANO 13. Beijing, China: IEEE, Aug. 2013, pp. 99–104. isbn: 978-1-4799-0676-5. doi: 10.1109/NANO.2013.6721041.

[159]

C. D. Johnson. “The Circular Pipeline: Achieving Higher Throughput in the Search for Bent Functions”. MA thesis. Monterey, CA: Naval Postgraduate School, Sept. 2010.

[160]

D. B. Johnson. “Finding All the Elementary Circuits of a Directed Graph”. In: SIAM Journal on Computing 4.1 (1975), pp. 77–84. issn: 0097-5397. doi: 10.1137/0204007.

[161]

L. Jóźwiak. “Information Relationships and Measures: An Analysis Apparatus for Efficient Information System Synthesis”. In: Proceedings of the 23rd EUROMICRO Conference: New Frontiers of Information Technology. EUROMICRO. Budapest, Hungary: IEEE, Sept. 1997, pp. 13–23. isbn: 0-8186-8129-2. doi: 10.1109/EURMIC.1997.617209.

Bibliography

437

[162]

A. Kandil and K. El-Rayes. “Multi-Deme Parallel Computing Model for Optimizing Large-Scale Construction Projects”. In: Joint International Conference on Computing and Decision Making in Civil and Building Engineering. Montreal, Canada, June 2008, pp. 1497–1506.

[163]

D. Karaklajić, J.-M. Schmidt, and I. Verbauwhede. “Hardware Designer’s Guide to Fault Attacks”. In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems 21.12 (2013), pp. 2295–2306. doi: 10.1109/TVLSI.2012.2231707.

[164]

A. Karatsuba and Y. Ofman. “Multiplication of Many-Digital Numbers by Automatic Computers”. In: Proceedings of the USSR Academy of Science. Vol. 145. 2. In Russian. USSR, 1962, pp. 293–294.

[165]

M. G. Karpovsky. Finite Orthogonal Series in the Design of Digital Devices: Analysis, Synthesis, and Optimization. John Wiley, 1976. isbn: 978-0-47015-015-3.

[166]

M. G. Karpovsky, R. S. Stanković, and J. T. Astola. Spectral Logic and Its Applications for the Design of Digital Devices. Wiley, 2008. isbn: 978-0-471-73188-7.

[167]

M. G. Karpovsky, K. J. Kulikowski, and Z. Wang. “Robust Error Detection in Communication and Computational Channels”. In: Proceedings of the 2007 International Workshop on Spectral Methods and Multirate Signal Processing. SMMSP. 2007. url: http://mark.bu.edu/papers/201.pdf.

[168]

M. G. Karpovsky and A. Taubin. “New Class of Nonlinear Systematic Error Detecting Codes”. In: IEEE Transactions on Information Theory 50.8 (Aug. 2004), pp. 1818–1819. issn: 0018-9448. doi: 10.1109/TIT.2004.831844.

[169]

S. Kavut, S. Maitra, and M. D. Yücel. “Search for Boolean Functions with Excellent Profiles in the Rotation Symmetric Class”. In: IEEE Transactions on Information Theory IT-53.5 (May 2007), pp. 1743–1751. doi: 10.1109/TIT.2007.894696.

[170]

S. Kavut and M. D. Yücel. “9-variable Boolean Functions with Nonlinearity 242 in the Generalized Rotation Class”. In: Information and Computation 208.4 (2010), pp. 341–350. doi: 10.1016/j.ic.2009.12.002.

[171]

J. L. Kelley. General Topology. Springer Verlag, Aug. 1975. isbn: 978-3-540-90125-9.

[172]

O. Keren and M. G. Karpovsky. “Relations Between the Entropy of a Source and the Error Masking Probability for Security-Oriented Codes”. In: IEEE Transactions on Communications 63.1 (Jan. 2015), pp. 206–214. issn: 0090-6778. doi: 10.1109/TCOMM.2014.2377151.

[173]

A. Khlopotine, M. Perkowski, and P. Kerntopf. “Reversible Logic Synthesis by Gate Composition”. In: Proceedings of International Workshop on Logic Synthesis. IWLS. 2002, pp. 261–266.

[174]

V. Khomenko et al. “STG Decomposition Strategies in Combination with Unfolding”. In: Acta Informatica 46.6 (Oct. 2009), pp. 433–474. issn: 1432-0525. doi: 10.1007/s00236-009-0102-y.

[175]

P. Kidwelly, ed. Synthesis of Electronic Computing and Control Circuits. Vol. 27. The Annals of the Computation Laboratory. Cambridge, Massachusetts, USA: Harvard University, 1951.

[176]

D. Knuth. The Art of Computer Programming. Vol. 4. Fascicle 6: Satisfiability. Addison Wesley, 2015. isbn: 978-0-13-439760-3.

[177]

S.-B. Ko, T. Xia, and J.-C. Lo. “Efficient Parity Prediction in FPGA”. In: Proceedings 2001 IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems. DFT. Washington, DC, USA: IEEE Computer Society, Oct. 2001, pp. 176–181. isbn: 0-7695-1203-8. doi: 10.1109/DFTVS.2001.966767.

[178]

J. V. Kohl et al. “Human Pheromones: Integrating Neuroendocrinology and Ethology”. In: Neuroendocrinology Letters 22.5 (2001), pp. 309–321. issn: 0172-780X. url: http://www.nel.edu/22_5/NEL220501R01_Review.htm.

[179]

T. Kolditz and M. Werner. Coding Reliability: Computation of Silent Data Corruption. 2017. url: https://github.com/tuddbresilience/coding_reliability.

[180]

T. Kolditz et al. “Needles in the Haystack—Tackling Bit Flips in Lightweight Compressed Data”. In: 4th International Conference on Data Management Technologies and Applications. DATA 4. Cham, Switzerland: Springer, July 2015, pp. 135–153. isbn: 978-3-319-30162-4. doi: 10.1007/978-3-319-30162-4_9.

[181]

M. W. Krentel. “The Complexity of Optimization Problems”. In: Journal of Computer and System Sciences 36.3 (June 1988), pp. 490–509. issn: 0022-0000. doi: 10.1016/0022-0000(88)90039-6.

[182]

G. Kurepa. “Sets-logics-machines”. In: Proceedings of the International Symposium on Theory of Switching. Annals of the Computation Laboratory of Harvard University 29. Part 1. Cambridge, Massachusetts, USA: Harvard University, Apr. 1957, pp. 137–146.

[183]

A. Kuznetsov. “Information Storage in a Memory Assembled from Unreliable Components”. In: Problems of Information Transmission 9 (1973), pp. 254–264. issn: 0032-9460.

[184]

D. Lampret et al. OpenRISC 1000 Architecture Manual. Jan. 2003. url: https://www.isy.liu.se/en/edu/kurs/TSEA44/OpenRISC/openrisc_arch3.pdf.

[185]

J. Lapalme et al. “ESys.Net: A New Solution for Embedded Systems Modeling and Simulation”. In: ACM SIGPLAN Notices. Vol. 39. LCTES 7. New York, NY, USA: ACM, July 2004, pp. 107–114. doi: 10.1145/997163.997179.

[186]

T. Le. “Testability of Combinational Circuits - Theory and Design”. In German: “Testbarkeit kombinatorischer Schaltungen - Theorie und Entwurf”. PhD thesis. Technical University of Karl-Marx-Stadt, Germany, 1989.

[187]

R. J. Lechner. “Harmonic Analysis of Switching Functions”. In: Recent Developments in Switching Theory. Ed. by A. Mukhopadhyay. New York, NY, USA: Academic Press, 1971, pp. 121–228. isbn: 978-0-12-509850-2. doi: 10.1016/B978-0-12-509850-2.50010-5. url: http://www.sciencedirect.com/science/article/pii/B9780125098502500105.

[188]

D. Leibfried et al. “Quantum Dynamics of Single Trapped Ions”. In: Reviews of Modern Physics 75.1 (Mar. 2003), pp. 281–324. doi: 10.1103/RevModPhys.75.281. url: http://cua.mit.edu/8.422_S05/PHYSICS-leibfried-blatt-monroe-wineland-quantum-dynamics-of-single-trapped-ions-rmp75-p281-2003.pdf.

[189]

C. Lemieux. “Monte Carlo and Quasi-Monte Carlo Sampling”. In: Springer Series in Statistics. New York, NY, USA: Springer, Nov. 2009. isbn: 978-0-387-78164-8. doi: 10.1007/978-0-387-78165-5.

[190]

L. A. Levin. “Universal Sequential Search Problems”. In: Problems of Information Transmission 9.3 (1973), pp. 265–266.

[191]

R. Lidl and H. Niederreiter. Introduction to Finite Fields and Their Applications. Cambridge University Press, 1994. isbn: 978-0-521-46094-1. url: https://books.google.fi/books?id=AvY3PHlle3wC.

[192]

LiDO: Linux-HPC-Cluster at the TU Dortmund. Dortmund University. 2017. url: https://www.itmc.tu-dortmund.de/cms/de/dienste/hochleistungsrechnen/lido/index.html.

[193]

O. A. Logachev, A. A. Salnikov, and V. V. Yashchenko. Boolean Functions in Coding Theory and Cryptography. Vol. 241. Translations of Mathematical Monographs. American Mathematical Society, 2012. isbn: 978-0-8218-4680-3.

[194]

M. Lohrey. “The Problem of Confluence for Trace Replacement Systems”. In German: Das Konfluenzproblem für Spurersetzungssysteme. PhD thesis. University of Stuttgart, 1999.

[195]

C. S. Lorens. “Invertible Boolean Functions”. In: IEEE Transactions on Electronic Computers EC-13.5 (Oct. 1964), pp. 529–541. issn: 0367-7508. doi: 10.1109/PGEC.1964.263724.

[196]

F. L. Luccio and L. Pagli. “On a New Boolean Function with Applications”. In: IEEE Transactions on Computers 48.3 (Mar. 1999), pp. 296–310. doi: 10.1109/12.754996.

[197]

M. Lukac and M. Perkowski. “Using an Exhaustive Search for the Discovery of a New Family of Optimum Universal Permutative Binary Quantum Gates”. In: Proceedings of International Workshop on Logic & Synthesis. IWLS. Lake Arrowhead, CA, USA, June 2005.

[198]

M. Lukac et al. “Minimization of Quantum Circuits using Quantum Operator Forms”. In: Proceedings of the 21st International Workshop on Post-Binary ULSI Systems. 2012. url: http://pdxscholar.library.pdx.edu/cgi/viewcontent.cgi?article=1370&context=ece_fac.

[199]

M. Lukac et al. “Minimizing Reversible Circuits in the 2n Scheme Using Two and Three Bits Patterns”. In: 2014 17th Euromicro Conference on Digital System Design. Verona, Italy: IEEE, Aug. 2014, pp. 708–711. isbn: 978-1-4799-5793-4. doi: 10.1109/DSD.2014.106.

[200]

M. Lukac et al. “Reversible, Information-Preserving Logic and its Application”. In: Journal of Multiple-Valued Logic and Soft Computing 23.3-4 (2014), pp. 379–406. issn: 1542-3980.

[201]

J. Lunze. Discrete Event Systems. In German: Ereignisdiskrete Systeme. Oldenbourg, Oct. 2012. isbn: 978-3-48671-885-0.

[202]

E. Macii, M. Pedram, and F. Somenzi. “High-Level Power Modeling, Estimation, and Optimization”. In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 17.11 (Nov. 1998), pp. 1061–1079. issn: 0278-0070. doi: 10.1109/43.736181.

[203]

F. J. MacWilliams and N. J. A. Sloane. The Theory of Error-Correcting Codes. North-Holland Mathematical Library 16. North-Holland, 1977. isbn: 978-0-444-85193-2.

[204]

A. A. Malik, D. Harrison, and R. K. Brayton. “Three-Level Decomposition with Application to PLDs”. In: Proceedings of the 1991 IEEE International Conference on Computer Design on VLSI in Computers & Processors. ICCD. Cambridge, MA, USA: IEEE Computer Society, Oct. 1991, pp. 628–633. isbn: 0-8186-2270-9. doi: 10.1109/ICCD.1991.139989.

[205]

V. D. Malyugin. Parallel Calculations by Means of Arithmetic Polynomials. In Russian. Moscow, Russia: Physical and Mathematical Publishing Company, Russian Academy of Sciences, 1997.

[206]

V. D. Malyugin, M. Stanković, and R. S. Stanković. “Systolic Realization of the Discrete Haar Transform”. In: Avtomatika i Telemekhanika 9 (1997). In Russian, pp. 138–145.

[207]

V. D. Malyugin, R. S. Stanković, and M. S. Stanković. “Calculation of the Coefficients of Polynomial Representations of Switching Functions Through Binary Decision Diagrams”. In: Proceedings of the International Conference on Preventive Engineering, PIT-94. Niš, Yugoslavia, 1994, pp. 10-1–10-4.

[208]

S. Manich, M. Nicolaidis, and J. Figueras. “Enhancing Realistic Fault Secureness in Parity Prediction Array Arithmetic Operators by IDDQ Monitoring”. In: Proceedings of 14th IEEE VLSI Test Symposium. VTS. Washington, DC, USA: IEEE Computer Society, Apr. 1996, pp. 124–129. isbn: 0-8186-7304-4. doi: 10.1109/VTEST.1996.510846.

[209]

E. J. Marinissen, V. Iyengar, and K. Chakrabarty. “A Set of Benchmarks for Modular Testing of SOCs”. In: Proceedings of IEEE International Test Conference. ITC. Baltimore, MD, USA: IEEE, Oct. 2002, pp. 519–528. isbn: 0-7803-7542-4. doi: 10.1109/TEST.2002.1041802.

[210]

K. Marquet and M. Moy. “PinaVM: A SystemC Front-End Based on an Executable Intermediate Representation”. In: Proceedings of the Tenth ACM International Conference on Embedded Software. EMSOFT. Scottsdale, Arizona, USA: ACM, Oct. 2010, pp. 79–88. isbn: 978-1-60558-904-6. doi: 10.1145/1879021.1879032.

[211]

D. Maslov, G. W. Dueck, and D. M. Miller. “Techniques for the Synthesis of Reversible Toffoli Networks”. In: ACM Transactions on Design Automation of Electronic Systems 12.4 (Sept. 2007), pp. 42.1–42.28. issn: 1084-4309. doi: 10.1145/1278349.1278355.

[212]

D. Maslov. Reversible Logic Synthesis Benchmarks Page. 2015. url: http://webhome.cs.uvic.ca/~dmaslov/.

[213]

M. Matsui. “The First Experimental Cryptanalysis of the Data Encryption Standard”. In: Advances in Cryptology - CRYPTO ’94. Vol. 839. Lecture Notes in Computer Science (LNCS). Berlin, Heidelberg: Springer, 1994, pp. 1–11. isbn: 978-3-540-58333-2. doi: 10.1007/3-540-48658-5_1. url: https://link.springer.com/content/pdf/10.1007%2F3-540-48658-5_1.pdf.

[214]

J. P. May. Finite Topological Spaces - Notes for REU. 2003. url: http://www.math.uchicago.edu/~may/MISC/FiniteSpaces.pdf.

[215]

K. McElvain. IWLS’93 Benchmark Set: Version 4.0. May 1993. url: http://ddd.fit.cvut.cz/prj/Benchmarks/IWLS93.pdf.

[216]

R. E. Miller. “Review of Proceedings of an International Symposium on the Theory of Switching, Parts 1 and 2”. In: The Journal of Symbolic Logic 33.1 (1968), pp. 150–160. Also in SIAM Review 3.1 (1961), pp. 81–84. doi: 10.1137/1003018.

[217]

A. Mishchenko, B. Steinbach, and M. Perkowski. “An Algorithm for Bi-Decomposition of Logic Functions”. In: Proceedings of the 38th Annual Design Automation Conference. DAC. Las Vegas, NV, USA: ACM, June 2001, pp. 103–108. isbn: 1-58113-297-2. doi: 10.1145/378239.378353.

[218]

K. Mohanram et al. “Synthesis of Low-Cost Parity-Based Partially Self-Checking Circuits”. In: Proceedings of the 9th IEEE On-Line Testing Symposium. IOLTS. IEEE, July 2003, pp. 35–40. isbn: 0-7695-1968-7. doi: 10.1109/OLT.2003.1214364. url: http://www.pitt.edu/~kmram/publications/iolts03.pdf.

[219]

R. H. Morelos-Zaragoza. The Art of Error Correcting Coding. John Wiley & Sons, Aug. 2006. isbn: 978-0-47001-558-2.

[220]

B. M. E. Moret. “Towards a Discipline of Experimental Algorithmics”. In: Data Structures, Near Neighbor Searches, and Methodology: Fifth and Sixth DIMACS Implementation Challenges. Ed. by M. H. Goldwasser, D. S. Johnson, and C. C. McGeoch. AMS, 2002, pp. 197–213. url: http://studylib.net/doc/8269056/towards-a-discipline-of-experimental-algorithmics.

[221]

S. Muroga. VLSI System Design: When and How to Design Very-Large-Scale Integrated Circuits. New York, NY, USA: John Wiley & Sons, 1982. isbn: 978-0-47186-090-7.

[222]

N. M. Nayeem and J. E. Rice. “A Shared-Cube Approach to ESOP-Based Synthesis of Reversible Logic”. In: Facta Universitatis. Electronics and Energetics 24.3 (Dec. 2011), pp. 385–402. doi: 10.2298/FUEE1103385N. url: http://www.doiserbia.nb.rs/img/doi/0353-3670/2011/0353-36701103385N.pdf.

[223]

E. I. Nechiporuk. “Network Synthesis by Using Linear Transformation of Variables”. In: Doklady Akademii Nauk SSSR 123.4 (1958), pp. 610–612.

[224]

Y. Neumeier and O. Keren. “Robust Generalized Punctured Cubic Codes”. In: IEEE Transactions on Information Theory 60.5 (May 2014), pp. 2813–2822. doi: 10.1109/TIT.2014.2310464.

[225]

Y. Neumeier, Y. Pesso, and O. Keren. “Efficient Implementation of Punctured Parallel Finite Field Multipliers”. In: IEEE Transactions on Circuits and Systems I: Regular Papers 62.9 (Sept. 2015), pp. 2260–2267. doi: 10.1109/TCSI.2015.2451914.

[226]

M. A. Nielsen and I. L. Chuang. Quantum Computation and Quantum Information. Cambridge, UK: Cambridge University Press, 2000. isbn: 978-0-521-63503-5.

[227]

S. M. Nowick and D. L. Dill. “Exact Two-Level Minimization of Hazard-Free Logic with Multiple-Input Changes”. In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 14.8 (Aug. 1995), pp. 986–997. issn: 0278-0070. doi: 10.1109/43.402498.

[228]

M. Ogawa and K.-L. Ma. “code_swarm: A Design Study in Organic Software Visualization”. In: IEEE Transactions on Visualization and Computer Graphics 15.6 (Nov. 2009), pp. 1097–1104. doi: 10.1109/TVCG.2009.123.

[229]

OpenCores.org. OpenCores. url: http://www.opencores.org.

[230]

C. Paar. Implementation of Cryptographic Schemes. Bochum, Germany: Ruhr-University Bochum, Apr. 2015. url: https://www.emsec.rub.de/media/attachments/files/2015/09/IKV-1_2015-04-28.pdf.

[231]

Y. Pang et al. “A Novel Method of Synthesizing Reversible Logic”. In: Proceedings of the 2011 IEEE International Symposium on Circuits and Systems. ISCAS. Rio de Janeiro, Brazil: IEEE, May 2011, pp. 2857–2860. doi: 10.1109/ISCAS.2011.5938201.

[232]

Y. Pang et al. “Positive Davio Based Synthesis Algorithm for Reversible Logic”. In: Proceedings of the 2011 IEEE 29th International Conference on Computer Design. ICCD 29. Amherst, MA, USA: IEEE, Oct. 2011, pp. 212–218. doi: 10.1109/ICCD.2011.6081399.

[233]

M. Pedram. “Power Minimization in IC Design: Principles and Applications”. In: ACM Transactions on Design Automation of Electronic Systems 1 (Jan. 1996), pp. 3–56. doi: 10.1145/225871.225877.

[234]

M. Perkowski and S. Grygiel. A Survey of Literature on Function Decomposition – Version IV. Tech. rep. Portland State University (PSU), Electrical Engineering Department, 1995.

[235]

A. Petkovska et al. “Fast Generation of Lexicographic Satisfiable Assignments: Enabling Canonicity in SAT-based Applications”. In: Proceedings of the 35th International Conference on Computer-Aided Design. ICCAD. Austin, Texas, USA: ACM, Nov. 2016, 4:1–4:8. isbn: 978-1-4503-4466-1. doi: 10.1145/2966986.2967040.

[236]

F. Pichler. Some Aspects of a Theory of Correlation with Respect to Walsh Harmonic Analysis. Lecture presented at the Workshop on Applications of Walsh Functions. College Park, Maryland 20742, Aug. 1970.

[237]

F. Pichler. Walsh Functions and Linear System Theory. Lecture presented at the Workshop on Applications of Walsh Functions. College Park, Maryland 20742, 1970.

[238]

J. P. Pieprzyk and C. X. Qu. “Fast Hashing and Rotation-Symmetric Functions”. In: Journal of Universal Computer Science 5 (Jan. 1999), pp. 20–31. doi: 10.3217/jucs-005-01-0020.

[239]

G. Piret and J.-J. Quisquater. “A Differential Fault Attack Technique against SPN Structures, with Application to the AES and KHAZAD”. In: Cryptographic Hardware and Embedded Systems - CHES 2003. Vol. 2779. Lecture Notes in Computer Science (LNCS). Berlin, Heidelberg: Springer, 2003, pp. 77–88. isbn: 978-3-540-40833-8. doi: 10.1007/978-3-540-45238-6_7.

[240]

T. Polzer and A. Steininger. “A General Approach for Comparing Metastable Behavior of Digital CMOS Gates”. In: 2016 IEEE 19th International Symposium on Design and Diagnostics of Electronic Circuits & Systems. DDECS. Kosice, Slovakia: IEEE, Apr. 2016, pp. 1–6. doi: 10.1109/DDECS.2016.7482456.

[241]

E. Popovici et al. Report on Fault Tolerant Synthesis through Error Correcting Codes Driven Graph Augmentation. FP7-ICT/FET-OPEN i-RISC project, Deliverable 5.2. Feb. 2015. url: http://www.i-risc.eu/home/liblocal/docs/iRISC_Deliverables/i-RISC_D5.2.pdf.

[242]

C. Posthoff and B. Steinbach. Boolean Equations - Algorithms and Programs. Scientific Series (in German: Wissenschaftliche Schriftenreihe) 1/1997. In German: Binäre Gleichungen - Algorithmen und Programme. TH Karl-Marx-Stadt, 1979.

[243]

C. Posthoff and B. Steinbach. Logic Functions and Equations – Binary Models for Computer Science. Dordrecht, The Netherlands: Springer, 2004. isbn: 978-1-4020-2937-0.

[244]

C. Posthoff and B. Steinbach. “Solving Combinatorial Problems Using Boolean Equations”. In: Boolean Problems - Proceedings of the 11th International Workshop. Ed. by B. Steinbach. IWSBP 11. Freiberg, Germany: Freiberg University of Mining and Technology, Sept. 2014, pp. 23–32. isbn: 978-3-86012-488-8.

[245]

C. Posthoff and B. Steinbach. “The Solution of Discrete Constraint Problems Using Boolean Models – The Use of Ternary Vectors for Parallel SAT-Solving”. In: Proceedings of the 2nd International Conference on Agents and Artificial Intelligence. Ed. by J. Filipe, A. Fred, and B. Sharp. ICAART 2. Valencia, Spain: INSTICC – Institute for Systems, Technologies of Information, Control, and Communication, Jan. 2010, pp. 487–493. isbn: 978-989-674-021-4.

[246]

G. N. Povarov. “On Functional Separability of Boolean Functions”. In: Doklady Akademii Nauk SSSR 94.5 (1954). In Russian, pp. 801–803.

[247]

F. P. Preparata. “A Class of Optimum Nonlinear Double-Error-Correcting Codes”. In: Information and Control 13.4 (Oct. 1968), pp. 378–400. doi: 10.1016/S0019-9958(68)90874-7.

[248]

A. Puggelli et al. “Are Logic Synthesis Tools Robust?” In: Proceedings of the 2011 48th ACM/EDAC/IEEE Design Automation Conference. DAC. New York, NY, USA: IEEE, June 2011, pp. 633–638. isbn: 978-1-4503-0636-2.

[249]

P. Raab, S. Krämer, and J. Mottok. “Reliability of Data Processing and Fault Compensation in Unreliable Arithmetic Processors”. In: Microprocessors and Microsystems 40.C (Feb. 2016), pp. 102–112. issn: 0141-9331. doi: 10.1016/j.micpro.2015.07.014. url: http://www.sciencedirect.com/science/article/pii/S0141933115001131.

[250]

M. Radmanović. “Efficient Random Generation of Bent Functions Using a GPU Platform”. In: Boolean Problems - Proceedings of the 12th International Workshop. Ed. by B. Steinbach. IWSBP 12. Freiberg, Germany: Freiberg University of Mining and Technology, Sept. 2016, pp. 167–173. isbn: 978-3-86012-540-3.

[251]

M. Radmanović and R. S. Stanković. “Random Generation of Bent Functions on a Multicore CPU Platform”. In: Proceedings of 51st International Scientific Conference on Information, Communication and Energy Systems and Technologies. ICEST. Ohrid, Macedonia: Faculty of Technical Sciences, June 2016, pp. 9–12.

[252]

F. P. Ramsey. “On a Problem of Formal Logic”. In: Proceedings of the London Mathematical Society 30.1 (1930), pp. 264–286. doi: 10.1112/plms/s2-30.1.264.

[253]

J. E. Rice. Technical Report TR-CSJR2-2007: Considerations for Determining a Classification Scheme for Reversible Boolean Functions. Tech. rep. Lethbridge, Alberta, Canada: University of Lethbridge, Department of Mathematics and Computer Science, 2007, pp. 1–26. url: http://www.academia.edu/2672870/Technical_Report_Considerations_for_Determining_a_Classification_Scheme_for_Reversible_Boolean_Functions.

[254]

J. Rilling and S. Mudur. “3D Visualization Techniques to Support Slicing-based Program Comprehension”. In: Computers & Graphics 29.3 (June 2005), pp. 311–329. issn: 0097-8493. doi: 10.1016/j.cag.2005.03.007. url: http://www.sciencedirect.com/science/article/pii/S0097849305000476.

[255]

M. Rosen. Number Theory in Function Fields. Vol. 210. Graduate Texts in Mathematics. New York, NY, USA: Springer, Jan. 2002. isbn: 978-0-387-95335-9. doi: 10.1007/978-1-4757-6046-0.

[256]

T. D. Ross et al. Pattern Theory: An Engineering Paradigm for Algorithm Design. Final Report ADA243214. July 1991.

[257]

O. S. Rothaus. “On ‘Bent’ Functions”. In: Journal of Combinatorial Theory, Series A 20.3 (May 1976), pp. 300–305. issn: 0097-3165. doi: 10.1016/0097-3165(76)90024-8.

[258]

R. L. Rudell and A. Sangiovanni-Vincentelli. “Multiple Valued Minimization for PLA Optimization”. In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 6.5 (June 1987), pp. 727–750. issn: 0278-0070. doi: 10.1109/TCAD.1987.1270318.

[259]

R. L. Rudell. Multiple-Valued Logic Minimization for PLA Synthesis. Memorandum UCB/ERL M86/65. Berkeley, CA, USA: University of California, Electronics Research Laboratory, June 1986. url: https://www2.eecs.berkeley.edu/Pubs/TechRpts/1986/ERL-86-65.pdf.

[260]

M. Saeedi and I. Markov. “Synthesis and Optimization of Reversible Circuits—A Survey”. In: ACM Computing Surveys 45.2 (Feb. 2013), 21:1–21:34. issn: 0360-0300. doi: 10.1145/2431211.2431220.

[261]

Y. L. Sagalovich. “Correctness Measures of Boolean Functions”. In: Problemy Peredachi Informatsii 8 (1961). In Russian, pp. 88–108.

[262]

A. Sangiovanni-Vincentelli. “The Tides of EDA”. In: IEEE Design & Test of Computers 20.6 (Nov. 2003), pp. 59–75. issn: 0740-7475. doi: 10.1109/MDT.2003.1246165.

[263]

T. Sasao. An Algorithm to Find a Reduced Set of Input Variables for Index Generation Functions. Tech. rep. Meiji University, 2016.

[264]

T. Sasao. “Index Generation Functions: Recent Developments”. In: Proceedings of the 2011 IEEE 41st International Symposium on Multiple-Valued Logic. ISMVL. Tuusula, Finland: IEEE, May 2011, pp. 1–9. isbn: 978-0-7695-4405-2. doi: 10.1109/ISMVL.2011.17.

[265]

T. Sasao. “Index Generation Functions: Tutorial”. In: Journal of Multiple-Valued Logic and Soft Computing 23.3 (Jan. 2014), pp. 235–263. issn: 1542-3980. doi: 10.1007/978-1-4419-8104-2_9. url: http://www.lsi-cad.com/sasao/Papers/files/MVLJ2014_sasao_a.pdf.

[266]

T. Sasao. “Linear Decomposition of Index Generation Functions”. In: 2012 17th Asia and South Pacific Design Automation Conference. ASP-DAC. Sydney, NSW, Australia: IEEE, Jan. 2012, pp. 781–788. isbn: 978-1-4673-0770-3. doi: 10.1109/ASPDAC.2012.6165060.

[267]

T. Sasao. “Linear Transformations for Variable Reduction”. In: Proceedings of the Reed-Muller 2011 Workshop. Tuusula, Finland, May 2011.

[268]

T. Sasao. Memory-Based Logic Synthesis. New York, NY, USA: Springer, Mar. 2011. isbn: 978-1-4419-8103-5. doi: 10.1007/978-1-4419-8104-2.

[269]

T. Sasao. “On the Complexity of Three-level Logic Circuits”. In: IEEE/ACM International Workshop on Logic & Synthesis (IWLS). 1989, pp. 101–116.

[270]

T. Sasao. “OR-AND-OR Three-Level Networks”. In: Representations of Discrete Functions. Ed. by T. Sasao and M. Fujita. Norwell, MA, USA: Kluwer Academic Publishers, 1996, pp. 1–28. isbn: 978-1-4613-1385-4. doi: 10.1007/978-1-4613-1385-4.

[271]

T. Sasao. Switching Theory for Logic Synthesis. New York, NY, USA: Springer US, 1999. isbn: 978-0-7923-8456-4. doi: 10.1007/978-1-4615-5139-3.

[272]

T. Sasao and J. T. Butler. “On Bi-Decompositions of Logic Functions”. In: 6th International Workshop on Logic Synthesis. IWLS. Granlibakken Resort - Tahoe City, CA, USA, 1997, pp. 1–6.

[273]

T. Sasao and J. T. Butler. Progress in Applications of Boolean Functions. Synthesis Lectures on Digital Circuits and Systems 26. Morgan and Claypool Publishers, 2010. doi: 10.2200/S00243ED1V01Y200912DCS026.

[274]

T. Sasao and M. Fujita, eds. Representations of Discrete Functions. New York, NY, USA: Springer, Apr. 1996. isbn: 978-0-7923-9720-5. doi: 10.1007/978-1-4613-1385-4.

[275]

T. Sasao, K. Matsuura, and Y. Iguchi. “An Algorithm to Find Optimum Support-Reducing Decompositions for Index Generation Functions”. In: Proceedings of the 2017 IEEE Design, Automation & Test in Europe Conference & Exhibition. DATE. Lausanne, Switzerland: IEEE, Mar. 2017, pp. 96–101. isbn: 978-1-5090-5826-6. doi: 10.23919/DATE.2017.7927100. url: http://www.lsi-cad.com/sasao/Papers/files/DATE2017.pdf.

[276]

T. Sasao, Y. Urano, and Y. Iguchi. “A Lower Bound on the Number of Variables to Represent Incompletely Specified Index Generation Functions”. In: Proceedings of the 2014 IEEE 44th International Symposium on Multiple-Valued Logic. ISMVL. Bremen, Germany: IEEE, May 2014, pp. 7–12. isbn: 978-1-4799-3535-2. doi: 10.1109 /ISMVL.2014.10.

[277]

T. Sasao, Y. Urano, and Y. Iguchi. “A Method to Find Linear Decompositions for Incompletely Specified Index Generation Functions Using Difference Matrix”. In: IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences E97-A.12 (Dec. 2014), pp. 2427–2433. issn: 1745-1337. doi: 10.1587/transf un.E97.A.2427.

[278]

T. D. Sasso et al. “Blended, Not Stirred: Multi-Concern Visualization of Large Software Systems”. In: 2015 IEEE 3rd Working Conference on Software Visualization. VISSOFT. Bremen, Germany: IEEE, Sept. 2015, pp. 106–115. doi: 10.1109/VISSOFT.2015.7332420.

[279]

M. Schaefer. “Advanced STG Decomposition”. PhD thesis. Augsburg, Germany: University of Augsburg, Feb. 2008. url: https://www.deutsche-digitale-bibliothek.de/binary/AUC7VDXSNX6VHGBLGS6PVYEJAOH5KLY3/full/1.pdf.

[280]

N. B. Schafer. “Characteristics of the Binary Decision Diagrams of Bent Functions”. MA thesis. Monterey, CA, USA: Naval Postgraduate School, Sept. 2009. url: http://www.dtic.mil/cgi-bin/GetTRDoc?Location=U2&doc=GetTRDoc.pdf&AD=ADA509259.

[281]

U. Schiffel. “Hardware Error Detection Using AN Codes”. PhD thesis. TU Dresden, May 2011. url: http://www.qucosa.de/fileadmin/data/qucosa/documents/6987/diss.pdf.

[282]

G. Schmidt and T. Ströhlein. Relations and Graphs. Berlin, Heidelberg: Springer, 1993. isbn: 978-3-642-77970-1. doi: 10.1007/978-3 -642-77968-8.

[283]

G. Schmidt and M. Winter. Relational Topology. Number 3. University of the German Federal Armed Forces Munich, Faculty of Computer Science, 2014. url: https://www.unibw.de/inf/ful/berichte/technische_berichte/technischer_bericht%202014%2C3/at_download/file.

[284]

G. Schmidt. “A Point-Free Relation-Algebraic Approach to General Topology”. In: Proceedings of the 14th International Conference on Relational and Algebraic Methods in Computer Science. Ed. by P. Höfner, P. Jipsen, W. Kahl, and M. E. Müller. Vol. 8428. Marienstatt, Germany: Springer, Apr. 2014, pp. 226–241. isbn: 978-3-319-06250-1. doi: 10.1007/978-3-319-06251-8.

[285]

G. Schmidt and R. Berghammer. “Contact, Closure, Topology, and the Linking of Rows and Column Types”. In: Journal of Logic and Algebraic Programming 80.6 (Aug. 2011), pp. 339–361. doi: 10.1016/j.jlap.2011.04.007.

[286]

J. Schmidt and P. Fišer. “A Prudent Approach to Benchmark Collection”. In: Boolean Problems - Proceedings of the 12th International Workshop. Ed. by B. Steinbach. IWSBP 12. Freiberg, Germany: Freiberg University of Mining and Technology, Sept. 2016, pp. 129–136. isbn: 978-3-86012-540-3.

[287]

S. W. Schneider. “Finding Bent Functions Using Genetic Algorithms”. MA thesis. Monterey, CA, USA: Naval Postgraduate School, Sept. 2009. url: http://www.dtic.mil/cgi-bin/GetTRDoc?Location=U2&doc=GetTRDoc.pdf&AD=ADA509151.

[288]

C. Scholl. Functional Decomposition with Applications to FPGA Synthesis. New York, NY, USA: Springer, 2001. isbn: 978-0-7923-7585-2. doi: 10.1007/978-1-4757-3393-8.

[289]

A. Schönhage and V. Strassen. “Fast Multiplication of Large Numbers”. In: Computing 7.3 (1971). In German: Schnelle Multiplikation großer Zahlen, pp. 281–292. issn: 0010-485X. doi: 10.1007/BF02242355.

[290]

B. Schroeder and G. Gibson. “A Large-Scale Study of Failures in High-Performance Computing Systems”. In: IEEE Transactions on Dependable and Secure Computing 7.4 (Oct. 2010). issn: 1545-5971. doi: 10.1109/TDSC.2009.4.

[291]

T. Schuster et al. “SoCRocket - A Virtual Platform for the European Space Agency’s SoC Development”. In: 2014 9th International Symposium on Reconfigurable and Communication-Centric Systems-on-Chip. ReCoSoC 9. IEEE, May 2014, pp. 1–7. isbn: 978-1-4799-5811-5. doi: 10.1109/ReCoSoC.2014.6860690.

[292]

J. Seberry and X.-M. Zhang. “Constructions of Bent Functions from Two Known Bent Functions”. In: Australasian Journal of Combinatorics 9 (1994), pp. 21–34. url: http://ro.uow.edu.au/cgi/viewcontent.cgi?article=2109&context=infopapers.

[293]

E. M. Sentovich et al. SIS: A System for Sequential Circuit Synthesis. Memorandum UCB/ERL M92/41. Berkeley, CA, USA: University of California, Electronics Research Laboratory, May 1992. url: https://www2.eecs.berkeley.edu/Pubs/TechRpts/1992/ERL-92-41.pdf.

[294]

J. L. Shafer et al. “Enumeration of Bent Boolean Functions by a Reconfigurable Computer”. In: The 18th Annual International IEEE Symposium on Field-Programmable Custom Computing Machines. Charlotte, NC, USA: IEEE, May 2010, pp. 265–272. isbn: 978-0-7695-4056-6.

[295]

J. L. Shafer. “An Analysis of Bent Function Properties Using the Transeunt Triangle and the SRC-6 Reconfigurable Computer”. MA thesis. Monterey, CA, USA: Naval Postgraduate School, Sept. 2009.

[296]

N. R. Shanbhag et al. “Stochastic Computation”. In: Proceedings of the 47th Design Automation Conference. Anaheim, CA, USA: ACM, June 2010, pp. 859–864. isbn: 978-1-4503-0002-5. doi: 10.1145/18 37274.1837491.

[297]

C. E. Shannon. “The Synthesis of Two Terminal Switching Circuits”. In: Bell System Technical Journal 28.1 (Jan. 1949), pp. 59–98. issn: 0005-8580. doi: 10.1002/j.1538-7305.1949.tb03624.x.

[298]

B. I. Shestakov. Synthesis of Electronic Computing and Control Circuits. Translation from English to Russian of the book, published by the staff of Computation Laboratory of Harvard University in 1951, edited by B. I. Shestakov. Moscow, Russia: Izdatelstvo Inostrannoj Literatury, 1954.

[299]

P. P. Shirvani et al. “DUDES: a Fault Abstraction and Collapsing Framework for Asynchronous Circuits”. In: Proceedings of the Sixth International Symposium on Advanced Research in Asynchronous Circuits and Systems. ASYNC 6. Eilat, Israel: IEEE, 2000, pp. 73–82. isbn: 0-7695-0586-4. doi: 10.1109/ASYNC.2000.836962.

[300]

W. Shum and J. H. Anderson. “Analyzing and Predicting the Impact of CAD Algorithm Noise on FPGA Speed Performance and Power”. In: Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays. FPGA 12. Monterey, California, USA: ACM, 2012, pp. 107–110. isbn: 978-1-4503-1155-7. doi: 10.1 145/2145694.2145711.

[301]

S. K. Smit and A. E. Eiben. “Parameter Tuning of Evolutionary Algorithms: Generalist vs. Specialist”. In: European Conference on the Applications of Evolutionary Computation. Ed. by C. Di Chio et al. Berlin, Heidelberg: Springer, 2010, pp. 542–551. isbn: 978-3-642-12238-5. doi: 10.1007/978-3-642-12239-2_56.

[302]

M. Soeken, N. Abdessaied, and G. De Micheli. “Enumeration of Reversible Functions and Its Application to Circuit Complexity”. In: Proceedings of the 8th International Conference on Reversible Computation (RC 2016). Ed. by S. Devitt and I. Lanese. Vol. 9720. Lecture Notes in Computer Science (LNCS). Bologna, Italy: Springer International Publishing, 2016, pp. 255–270. isbn: 978-3-319-40577-3. doi: 10.1007/978-3-319-40578-0_19.

[303]

M. Soeken, N. Abdessaied, and R. Drechsler. “A Framework for Reversible Circuit Complexity”. In: Problems and New Solutions in the Boolean Domain. Ed. by B. Steinbach. Newcastle upon Tyne, UK: Cambridge Scholars Publishing, 2016, pp. 327–341. isbn: 978-1-4438-8947-6.


[304]

M. Soeken, R. Wille, and R. Drechsler. “Hierarchical Synthesis of Reversible Circuits Using Positive and Negative Davio Decomposition”. In: Proceedings of the 5th International Design and Test Workshop. IDT 5. Abu Dhabi, United Arab Emirates: IEEE, Dec. 2010, pp. 143–148. isbn: 978-1-61284-291-2. doi: 10.1109/IDT.2010.5724427.

[305]

M. Soeken et al. “BDD Minimization for Approximate Computing”. In: Proceedings of the 2016 21st Asia and South Pacific Design Automation Conference. ASP-DAC 21. Macau, China, Jan. 2016, pp. 474–479. isbn: 978-1-4673-9569-4. doi: 10.1109/ASPDAC.2016.7428057.

[306]

M. Soeken et al. “Embedding of Large Boolean Functions for Reversible Logic”. In: ACM Journal on Emerging Technologies in Computing Systems 12.4 (2015), 41:1–41:26. issn: 1550-4832. doi: 10.1145/2786982.

[307]

M. Soeken et al. “Heuristic NPN Classification for Large Functions Using AIGs and LEXSAT”. In: Proceedings of the 19th International Conference on Theory and Applications of Satisfiability Testing (SAT 2016). Vol. 9710. Lecture Notes in Computer Science (LNCS). Cham, Switzerland: Springer International Publishing, 2016, pp. 212–227. isbn: 978-3-319-40970-2. doi: 10.1007/978-3-319-40970-2_14.

[308]

E. S. Sogomonjan and M. Goessel. “Design of Self-Parity Combinational Circuits for Self-Testing and On-Line Detection”. In: The IEEE International Workshop on Defect and Fault Tolerance in VLSI Systems. Venice, Italy: IEEE, 1993, pp. 239–246. isbn: 0-8186-3502-9. doi: 10.1109/DFTVS.1993.595814.

[309]

F. Somenzi. CUDD: CU Decision Diagram Package, Release 3.0.0. University of Colorado at Boulder, Department of Electrical, Computer, and Energy Engineering, Dec. 2015. url: http://vlsi.colorado.edu/~fabio/CUDD/cudd.pdf.

[310]

R. S. Stanković and J. T. Astola, eds. Reprints from the Early Days of Information Sciences, Reminiscences of the Early Work in DCT, Interview with K. R. Rao. TICSP #60. Tampere, Finland: Tampere International Center for Signal Processing, Tampere University of Technology, 2012. isbn: 978-952-15-2818-7.


[311]

R. S. Stanković and J. T. Astola, eds. Reprints from the Early Days of Information Sciences, Reminiscences of the Early Work in Walsh Functions, Interviews with Franz Pichler, William R. Wade, Ferenc Schipp. TICSP #58. Tampere, Finland: Tampere International Center for Signal Processing, Tampere University of Technology, 2011. isbn: 978-952-15-2598-8.

[312]

R. S. Stanković and J. T. Astola. Spectral Interpretation of Decision Diagrams. New York, NY, USA: Springer, 2003. isbn: 978-0-387-95545-2. doi: 10.1007/b97562.

[313]

R. S. Stanković, J. T. Astola, and C. Moraga. “Design of Multiple-Valued Logic Networks with Regular Structure by Using Spectral Representations”. In: Journal of Multiple-Valued Logic and Soft Computing 19.1-3 (2012), pp. 251–269. issn: 1542-3980.

[314]

R. S. Stanković, C. Moraga, and J. T. Astola. “From Fourier Expansions to Arithmetic-Haar Expressions on Quaternion Groups”. In: Applicable Algebra in Engineering, Communication and Computing 12.3 (2001), pp. 227–253. issn: 1432-0622. doi: 10.1007/s002000100068.

[315]

R. S. Stanković et al. Dyadic Walsh Analysis from 1924 Onwards: Walsh-Gibbs-Butzer Dyadic Differentiation in Science, Volume 1, Foundations: A Monograph Based on Articles of the Founding Authors, Reproduced in Full. Vol. 12. Atlantis Studies in Mathematics for Engineering and Science. Atlantis Press, Springer, 2015. isbn: 978-94-6239-159-8. doi: 10.2991/978-94-6239-160-4.

[316]

R. S. Stanković et al. Dyadic Walsh Analysis from 1924 Onwards: Walsh-Gibbs-Butzer Dyadic Differentiation in Science, Volume 2, Extensions and Generalizations: A Monograph Based on Articles of the Founding Authors, Reproduced in Full. Vol. 13. Atlantis Studies in Mathematics for Engineering and Science. Atlantis Press, Springer, 2015. isbn: 978-94-6239-162-8. doi: 10.2991/978-94-6239-163-5.

[317]

R. S. Stanković et al., eds. Reprints from the Early Days of Information Sciences, Early Work in Switching Theory and Logic Design in USSR. Vol. 66. Tampere, Finland: Tampere International Center for Signal Processing, Tampere University of Technology, 2016. isbn: 978-952-15-3786-8.


[318]

R. Stanković, J. Astola, and B. Steinbach. “Former and Recent Work in Classification of Switching Functions”. In: Boolean Problems - Proceedings of the 8th International Workshop. Ed. by B. Steinbach. IWSBP 8. Freiberg, Germany: Freiberg University of Mining and Technology, Sept. 2008, pp. 115–126. isbn: 978-3-86012-346-1.

[319]

B. Steinbach and M. Werner. “Alternative Approaches for Fast Boolean Calculations Using the GPU”. In: Computational Intelligence and Efficiency in Engineering Systems. Ed. by G. Borowik et al. Vol. 595. Studies in Computational Intelligence. Switzerland: Springer International Publishing, 2015, pp. 17–31. isbn: 978-3-319-15719-1. doi: 10.1007/978-3-319-15720-7_2.

[320]

B. Steinbach, ed. Boolean Problems - Proceedings of the 12th International Workshop. IWSBP 12. Freiberg, Germany: Freiberg University of Mining and Technology, Sept. 2016. isbn: 978-3-86012-540-3.

[321]

B. Steinbach. “Derivative Operations for Lattices of Boolean Functions”. In: Proceedings of the Reed-Muller Workshop 2013. RM 12. Toyama, Japan, 2013, pp. 110–119. doi: 10.13140/2.1.2398.6568.

[322]

B. Steinbach. “Generalized Lattices of Boolean Functions Utilized for Derivative Operations”. In: Materiały konferencyjne KNWS’13. KNWS. Łagów, Poland, 2013, pp. 1–17. doi: 10.13140/2.1.1874.3680. url: http://www.informatik.tu-freiberg.de/prof2/publikationen/KNWS_2013_glbfudo.pdf.

[323]

B. Steinbach, ed. Problems and New Solutions in the Boolean Domain. Vol. 2. Boolean Domain. Newcastle upon Tyne, UK: Cambridge Scholars Publishing, 2016. isbn: 978-1-4438-8947-6.

[324]

B. Steinbach, ed. Recent Progress in the Boolean Domain. Vol. 1. Boolean Domain. Newcastle upon Tyne, UK: Cambridge Scholars Publishing, Apr. 2014. isbn: 978-1-4438-5638-6.

[325]

B. Steinbach. “Relationships Between Vectorial Bi-Decompositions and Strong EXOR-Bi-Decompositions”. In: Proceedings of the 25th International Workshop on Post-Binary ULSI Systems. ULSI. 2016, p. 44.

[326]

B. Steinbach. “Vectorial Bi-Decompositions of Logic Functions”. In: Proceedings of the Reed-Muller Workshop 2015. RM 13. Waterloo, Canada, 2015, 4:1–4:10.

[327]

B. Steinbach. “XBOOLE - A Toolbox for Modeling, Simulation, and Analysis of Large Digital Systems”. In: Systems Analysis and Modeling Simulation 9.4 (1992), pp. 297–312. issn: 0232-9298.


[328]

B. Steinbach and C. Lang. “Exploiting Functional Properties of Boolean Functions for Optimal Multi-Level Design by Bi-Decomposition”. In: Artificial Intelligence Review 3-4 (Dec. 2003), pp. 319–360. issn: 0269-2821. doi: 10.1023/B:AIRE.0000006606.01771.8f.

[329]

B. Steinbach and C. Lang. “Exploiting Functional Properties of Boolean Functions for Optimal Multi-Level Design by Bi-Decomposition”. In: Artificial Intelligence in Logic Design. Ed. by S. Yanushkevich. Dordrecht, The Netherlands and Norwell, MA, USA: Kluwer Academic Publishers, 2004, pp. 159–200. isbn: 1-4020-2052-X.

[330]

B. Steinbach and C. Posthoff. “A Hierarchy of Models for Lattices of Boolean Functions”. In: Computer Aided System Theory, Extended Abstracts. 16th International Conference on Computer Aided System Theory (Eurocast 2017). Ed. by A. Quesada-Arencibia et al. Las Palmas de Gran Canaria, Spain: IUCTC Universidad de Las Palmas de Gran Canaria, 2017, pp. 213–214. isbn: 978-84-617-8087-7.

[331]

B. Steinbach and C. Posthoff. “An Extended Theory of Boolean Normal Forms”. In: Proceedings of the 6th Annual Hawaii International Conference on Statistics, Mathematics and Related Fields. Honolulu, Hawaii, 2007, pp. 1124–1139. url: http://www.informatik.tu-freiberg.de/prof2/publikationen/HICSM_2007_ETBNF.pdf.

[332]

B. Steinbach and C. Posthoff. “Boolean Differential Calculus”. In: Progress in Applications of Boolean Functions. Ed. by T. Sasao and J. T. Butler. Series Lectures on Digital Circuits and Systems 26. San Rafael, CA, USA: Morgan & Claypool Publishers, 2010, pp. 55–78, 121–126. isbn: 978-1-60845-181-4. doi: 10.2200/S00243ED1V01Y200912DCS026.

[333]

B. Steinbach and C. Posthoff. Boolean Differential Calculus. Series Lectures on Digital Circuits and Systems 52. San Rafael, CA, USA: Morgan & Claypool Publishers, June 2017. isbn: 978-1-62705-922-0. doi: 10.2200/S00766ED1V01Y201704DCS052.

[334]

B. Steinbach and C. Posthoff. “Boolean Differential Calculus - Theory and Applications”. In: Journal of Computational and Theoretical Nanoscience 7.6 (2010), pp. 933–981. issn: 1546-1955.

[335]

B. Steinbach and C. Posthoff. Boolean Differential Equations. Series Lectures on Digital Circuits and Systems 42. San Rafael, CA, USA: Morgan & Claypool Publishers, June 2013. isbn: 978-1-62705-241-2. doi: 10.2200/S00511ED1V01Y201305DCS042.


[336]

B. Steinbach and C. Posthoff. “Boolean Differential Equations - a Common Model for Classes, Lattices, and Arbitrary Sets of Boolean Functions”. In: Facta Universitatis. Electronics and Energetics 28.1 (2015), pp. 51–76. issn: 0353-3670. doi: 10.2298/FUEE1501051S.

[337]

B. Steinbach and C. Posthoff. “Classes of Bent Functions Identified by Specific Normal Forms and Generated Using Boolean Differential Equations”. In: Facta Universitatis. Electronics and Energetics 24.3 (2011), pp. 357–383. doi: 10.2298/FUEE1103357S.

[338]

B. Steinbach and C. Posthoff. “Evaluation and Optimization of GPU Based Unate Covering Algorithms”. In: Computer Aided Systems Theory – EUROCAST 2015. Ed. by R. Moreno-Díaz, F. Pichler, and A. Quesada-Arencibia. Vol. 9520. LNCS. Springer International Publishing, Switzerland, 2015, pp. 617–624. isbn: 978-3-319-27339-6. doi: 10.1007/978-3-319-27340-2_76.

[339]

B. Steinbach and C. Posthoff. “Extremely Complex 4-Colored Rectangle-Free Grids: Solution of Open Multiple-Valued Problems”. In: Proceedings of the IEEE 42nd International Symposium on Multiple-Valued Logic. ISMVL. Victoria, BC, Canada, 2012, pp. 37–44. isbn: 978-0-7695-4673-5. doi: 10.1109/ISMVL.2012.12.

[340]

B. Steinbach and C. Posthoff. “Fast Calculation of Exact Minimal Unate Coverings on Both the CPU and the GPU”. In: 14th International Conference on Computer Aided Systems Theory – EUROCAST 2013 - Part II. Ed. by R. Moreno-Díaz, F. Pichler, and A. Quesada-Arencibia. Vol. 8112. Lecture Notes in Computer Science (LNCS). Springer, Berlin, Heidelberg, 2013, pp. 234–241. isbn: 978-3-642-53861-2. doi: 10.1007/978-3-642-53862-9_30.

[341]

B. Steinbach and C. Posthoff. “Improvements in Exact Minimal Waveform Coverings of Periodic Binary Signals”. In: Computer Aided System Theory, Extended Abstracts. 13th International Conference on Computer Aided System Theory (Eurocast 2011). Ed. by A. Quesada-Arencibia et al. IUCTC Universidad de Las Palmas de Gran Canaria, 2011, pp. 410–411. isbn: 978-84-693-9560-8.

[342]

B. Steinbach and C. Posthoff. Logic Functions and Equations - Examples and Exercises. The Netherlands: Springer, 2009. isbn: 978-1-4020-9594-8. doi: 10.1007/978-1-4020-9595-5.

[343]

B. Steinbach and C. Posthoff. “Parallel Solutions of Covering Problems—Super-Linear Speedup on a Small Set of Cores”. In: GSTF International Journal on Computing 1.2 (2011), pp. 113–122. issn: 2010-2283.


[344]

B. Steinbach and C. Posthoff. “Solution of the Last Open Four-Colored Rectangle-free Grid - an Extremely Complex Multiple-Valued Problem”. In: Proceedings of the IEEE 43rd International Symposium on Multiple-Valued Logic. ISMVL. Toyama, Japan, 2013, pp. 302–309. isbn: 978-0-7695-4976-7. doi: 10.1109/ISMVL.2013.51.

[345]

B. Steinbach and C. Posthoff. “Sources and Obstacles for Parallelization - a Comprehensive Exploration of the Unate Covering Problem Using Both CPU and GPU”. In: GPU Computing with Applications in Digital Logic. Ed. by J. T. Astola et al. Vol. 62. TICSP. Tampere: Tampere International Center for Signal Processing, 2012, pp. 63–96. isbn: 978-952-15-2920-7. doi: 10.13140/2.1.4266.4320.

[346]

B. Steinbach and C. Posthoff. “The Last Unsolved Four-Colored Rectangle-Free Grid: The Solution of Extremely Complex Multiple-Valued Problems”. In: Journal of Multiple-Valued Logic and Soft Computing 25.4–5 (2015), pp. 617–624. issn: 1542-3980.

[347]

B. Steinbach and C. Posthoff. “Vectorial Bi-Decomposition for Lattices of Boolean Functions”. In: Boolean Problems - Proceedings of the 12th International Workshop. IWSBP 12. Freiberg, Germany: Freiberg University of Mining and Technology, Sept. 2016, pp. 93–104. isbn: 978-3-86012-540-3.

[348]

B. Steinbach and M. Werner. “XBOOLE-CUDA - Fast Boolean Operations on the GPU”. In: Boolean Problems - Proceedings of the 11th International Workshop. Ed. by B. Steinbach. IWSBP 11. Freiberg, Germany: Freiberg University of Mining and Technology, Sept. 2014, pp. 75–84. isbn: 978-3-86012-488-8.

[349]

B. Steinbach and M. Werner. “XBOOLE-CUDA - Fast Calculations of Large Boolean Problems on the GPU”. In: Problems and New Solutions in the Boolean Domain. Ed. by B. Steinbach. Newcastle upon Tyne, UK: Cambridge Scholars Publishing, 2016, pp. 117–149. isbn: 978-1-4438-8947-6.

[350]

S. Stojković, M. Stanković, and C. Moraga. “Complexity Reduction of Toffoli Networks Based on FDD”. In: Facta Universitatis. Electronics and Energetics 28.2 (2015), pp. 251–262. issn: 0353-3670. doi: 10.2298/FUEE1502251S.

[351]

R. E. Stong. “Finite Topological Spaces”. In: Transactions of the AMS 123 (1966), pp. 325–340. url: http://www.ams.org/journals/tran/1966-123-02/S0002-9947-1966-0195042-2/S0002-9947-1966-0195042-2.pdf.


[352]

J. Stoppe and R. Drechsler. “Analyzing SystemC Designs: SystemC Analysis Approaches for Varying Applications”. In: Sensors 15.5 (2015), pp. 10399–10421. issn: 1424-8220. doi: 10.3390/s150510399. url: http://www.mdpi.com/1424-8220/15/5/10399.

[353]

J. Stoppe, R. Wille, and R. Drechsler. “Data Extraction from SystemC Designs Using Debug Symbols and the SystemC API”. In: 2013 IEEE Computer Society Annual Symposium on VLSI. ISVLSI. IEEE, 2013, pp. 26–31. isbn: 978-1-4799-1331-2. doi: 10.1109/ISVLSI.2013.6654618.

[354]

J. Stoppe et al. “Towards a Multi-Dimensional and Dynamic Visualization for ESL Designs”. In: DATE Friday Workshop on Design Automation for Understanding Hardware Designs (DUHDe). Dresden, Germany, 2014.

[355]

A. Sudnitson. “Partition Search for FSM Low Power Synthesis”. In: Fourth International Conference Computer-Aided Design of Discrete Devices. CAD DD 4. Minsk, Belarus: National Academy of Sciences of Belarus, Institute of Engineering Cybernetics, 2001, pp. 44–49.

[356]

K. Suenaga. “Toward a Theory of Industrial Development and Vertical Disintegration: The Case of the Semiconductor Industry”. In: The Josai Journal of Business Administration 4.1 (2007), pp. 49–56. url: http://sucra.saitama-u.ac.jp/modules/xoonips/download.php/JOS-KJ00005032535.pdf?file_id=3648.

[357]

M. Szyprowski and P. Kerntopf. “A Study of Optimal 4-Bit Reversible Circuit Synthesis from Mixed-Polarity Toffoli Gates”. In: 2012 12th IEEE Conference on Nanotechnology. IEEE-NANO 12. Birmingham, UK: IEEE, Aug. 2012, pp. 1–6. isbn: 978-1-4673-2198-3. doi: 10.1109/NANO.2012.6322178.

[358]

K. Takahashi and T. Hirayama. “Reversible Logic Synthesis from Positive Davio Trees of Logic Functions”. In: Proceedings of IEEE TENCON Conference. Singapore: IEEE, Jan. 2009, pp. 1–4. doi: 10.1109/TENCON.2009.5395805.

[359]

M. Taylor. “Reliable Computation in Computing Systems Designed from Unreliable Components”. In: The Bell System Technical Journal 47.12 (Dec. 1968), pp. 2339–2366. issn: 0005-8580. doi: 10.1002/j.1538-7305.1968.tb01088.x.

[360]

M. Taylor. “Reliable Information Storage in Memories Designed from Unreliable Components”. In: The Bell System Technical Journal 47.12 (Dec. 1968), pp. 2299–2337. issn: 0005-8580. doi: 10.1002/j.1538-7305.1968.tb01087.x.


[361]

Texas Instruments Inc. “Type SN74181 Arithmetic Logic Unit/Function Generator”. In: TTL Catalog Supplement. 1970, S7–1.

[362]

H. Thapliyal and H. R. Arabnia. “Reversible Programmable Logic Array (RPLA) Using Fredkin & Feynman Gates for Industrial Electronics and Applications”. In: Proceedings of the International Conference on Computer Design. 2008, pp. 70–76. url: https://arxiv.org/ftp/cs/papers/0609/0609029.pdf.

[363]

N. Tokareva. Bent Functions: Results and Applications to Cryptography. Boston, MA, USA: Academic Press, Aug. 2015. isbn: 978-0-12-802318-1. doi: 10.1016/B978-0-12-802318-1.09991-X.

[364]

V. Tomashevich et al. “Protecting Cryptographic Hardware Against Malicious Attacks by Nonlinear Robust Codes”. In: 2014 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems. DFT. Amsterdam, The Netherlands: IEEE, Oct. 2014, pp. 40–45. isbn: 978-1-4799-6156-6. doi: 10.1109 /DFT.2014.6962084.

[365]

N. A. Touba and E. J. McCluskey. “Logic Synthesis of Multilevel Circuits with Concurrent Error Detection”. In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 16.7 (July 1997), pp. 783–789. issn: 0278-0070. doi: 10.1109/43.644041.

[366]

C. C. Tsai and M. Marek-Sadowska. “Boolean Functions Classification via Fixed Polarity Reed-Muller Forms”. In: IEEE Transactions on Computers 46.2 (Feb. 1997), pp. 173–186. issn: 0018-9340. doi: 10.1109/12.565592.

[367]

G. S. Tseitin. “On the Complexity of Derivation in Propositional Calculus”. In: Studies in Constructive Mathematics and Mathematical Logic 8 (1968). Ed. by A. O. Slisenko. In Russian, pp. 234–259. doi: 10.1007/978-3-642-81955-1_28.

[368]

S. Unger. Asynchronous Sequential Switching Circuits. John Wiley & Sons Ltd., May 1970. isbn: 978-0-471-89632-6.

[369]

G. Uygur and S. Sattler. “Structure Preserving Modeling for Safety Critical Systems”. In: 2015 20th International Mixed-Signal Testing Workshop. IMSTW 20. IEEE, June 2015, pp. 1–6. isbn: 978-1-4673-6733-2. doi: 10.1109/IMS3TW.2015.7177866.

[370]

A. Vachoux, C. Grimm, and K. Einwich. “Extending SystemC to Support Mixed Discrete-Continuous System Modeling and Simulation”. In: IEEE International Symposium on Circuits and Systems. ISCAS. Kobe, Japan: IEEE, May 2005, pp. 5166–5169. isbn: 0-7803-8834-8. doi: 10.1109/ISCAS.2005.1465798.


[371]

L. G. Valiant. “The Complexity of Computing the Permanent”. In: Theoretical Computer Science 8.2 (1979), pp. 189–201. issn: 0304-3975. doi: 10.1016/0304-3975(79)90044-6.

[372]

C. H. van Berkel, M. B. Josephs, and S. M. Nowick. “Scanning the Technology: Applications of Asynchronous Circuits”. In: Proceedings of the IEEE 87.2 (Feb. 1999), pp. 223–233. url: http://www.cs.columbia.edu/~nowick/pieee99.pdf.

[373]

J. H. van Lint. Introduction to Coding Theory. Vol. 86. Graduate Texts in Mathematics. Berlin, Heidelberg, Germany: Springer, 1999. isbn: 978-3-642-63653-0. doi: 10.1007/978-3-642-58575-3.

[374]

M. Y. Vardi. “Formal Techniques for SystemC Verification; Position Paper”. In: 44th ACM/IEEE Design Automation Conference. DAC 44. San Diego, CA, USA: IEEE, June 2007, pp. 188–192. isbn: 978-1-59593-627-1. doi: 10.1109/DAC.2007.375150.

[375]

D. Varma and E. A. Trachtenberg. “Design Automation Tools for Efficient Implementation of Logic Functions by Decomposition”. In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 8.8 (Aug. 1989), pp. 901–916. issn: 0278-0070. doi: 10.1109/43.31549.

[376]

B. Vasic and S. K. Chilappagari. “An Information Theoretical Framework for Analysis and Design of Nanoscale Fault-Tolerant Memories Based on Low-Density Parity-Check Codes”. In: IEEE Transactions on Circuits and Systems I: Regular Papers 54.11 (Nov. 2007), pp. 2438–2446. issn: 1549-8328. doi: 10.1109/TCSI.2007.9 02611.

[377]

R. Venkatesan et al. “MACACO: Modeling and Analysis of Circuits for Approximate Computing”. In: 2011 IEEE/ACM International Conference on Computer-Aided Design. ICCAD. San Jose, CA, USA: IEEE, 2011, pp. 667–673. isbn: 978-1-4577-1399-6. doi: 10.1109/ICCAD.2011.6105401.

[378]

H. Vollmer. Introduction to Circuit Complexity. Berlin, Heidelberg, Germany: Springer, 1999. isbn: 978-3-662-03927-4. doi: 10.1007/978-3-662-03927-4.

[379]

J. von Neumann. “Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components”. In: Annals of Mathematics Studies 34 (1956), pp. 43–98.

[380]

H. Wade. “Trial and Error: An Organized Procedure”. In: InTech Magazine 52.5 (2005), pp. 38–42.


[381]

J. L. Walsh. “A Closed Set of Normal Orthogonal Functions”. In: American Journal of Mathematics 45.1 (1923), pp. 5–24. doi: 10.2307/2387224. url: http://www.jstor.org/stable/2387224.

[382]

Z. Wang and M. Karpovsky. “Algebraic Manipulation Detection Codes and Their Applications for Design of Secure Cryptographic Devices”. In: Proceedings of the 2011 IEEE 17th International On-Line Testing Symposium. IOLTS 17. Washington, DC, USA: IEEE Computer Society, 2011, pp. 234–239. isbn: 978-1-4577-1053-7. doi: 10.1109/IOLTS.2011.5994535. url: http://mark.bu.edu/papers/223.pdf.

[383]

Z. Wang, M. Karpovsky, and K. Kulikowski. “Replacing Linear Hamming Codes by Robust Nonlinear Codes Results in a Reliability Improvement of Memories”. In: IEEE/IFIP International Conference on Dependable Systems & Networks 2009. DSN. Lisbon, Portugal: IEEE, 2009, pp. 514–523. isbn: 978-1-4244-4422-9. doi: 10.1109/DSN.2009.5270297. url: http://mark.bu.edu/papers/207.pdf.

[384]

Z. Wang, M. Karpovsky, and K. J. Kulikowski. “Design of Memories with Concurrent Error Detection and Correction by Nonlinear SEC-DED Codes”. In: Journal of Electronic Testing 26.5 (Oct. 2010), pp. 559–580. issn: 1573-0727. doi: 10.1007/s10836-010-5168-5. url: http://mark.bu.edu/papers/214.pdf.

[385]

R. Wille and R. Drechsler. “BDD-Based Synthesis of Reversible Logic for Large Functions”. In: Proceedings of the 46th Annual Design Automation Conference. DAC 46. San Francisco, CA, USA: ACM, July 2009, pp. 270–275. isbn: 978-1-60558-497-3. doi: 10.1145/1629911.1629984. url: http://www.informatik.uni-bremen.de/agra/doc/konf/09dac_bdd_synth.pdf.

[386]

R. Wille and R. Drechsler. “Effect of BDD Optimization on Synthesis of Reversible and Quantum Logic”. In: Electronic Notes in Theoretical Computer Science. ENTCS 253.6 (Mar. 2010), pp. 57–70. doi: 10.1016/j.entcs.2010.02.006.

[387]

R. Wille and R. Drechsler. “From Truth Tables to Programming Languages: Progress in the Design of Reversible Circuits”. In: 2011 IEEE 41st International Symposium on Multiple-Valued Logic. ISMVL. Tuusula, Finland: IEEE Computer Society, May 2011, pp. 78–85. isbn: 978-1-4577-0112-2. doi: 10.1109/ISMVL.2011.40. url: http://www.informatik.uni-bremen.de/agra/doc/konf/11_ismvl_reversible_circuit_design_tutorial.pdf.


[388]

N. Wilt. The CUDA Handbook: A Comprehensive Guide to GPU Programming. Pearson Education, June 2013. isbn: 978-0-13-326150-9.

[389]

C. Wu and B. Steinbach. “Relations between Different Constructions of Bent Functions and Their Enumerations”. In: Proceedings of the 2015 Asia-Pacific Conference on Computer Aided System Engineering. APCASE. Quito, Ecuador: IEEE Computer Society, July 2015, pp. 308–313. isbn: 978-1-4799-7589-1. doi: 10.1109/APCASE .2015.61.

[390]

C. Wu and B. Steinbach. “Applications of Boolean Functions in Cryptography”. In: Boolean Problems - Proceedings of the 11th International Workshop. Ed. by B. Steinbach. IWSBP 11. Freiberg, Germany: Freiberg University of Mining and Technology, Sept. 2014, pp. 189–198. isbn: 978-3-86012-488-8.

[391]

H.-D. Wuttke and K. Henke. Switching Systems - An Automaton Based Introduction. In German: Schaltsysteme - Eine automatenorientierte Einführung. Pearson Studium, Dec. 2002. isbn: 978-3-8273-7035-8.

[392]

S. Yang. “Logic Synthesis and Optimization Benchmarks”. In: MCNC International Workshop on Logic Synthesis. MCNC, Dec. 1988.

[393]

S. Yang. Logic Synthesis and Optimization Benchmarks User Guide Version 3.0. User Guide. Research Triangle Park, NC, USA: Microelectronics Center of North Carolina, 1991.

[394]

A. Younes. “Detection and Elimination of Non-Trivial Reversible Identities”. In: International Journal of Computer Science, Engineering and Applications. IJCSEA 2.4 (Aug. 2012), pp. 49–61. url: https://de.scribd.com/document/105881634/Detection-and-Elimination-of-Non-Trivial-Reversible-Identities.

[395]

A. M. Youssef and G. Gong. “Hyper-Bent Functions”. In: Advances in Cryptology - EUROCRYPT 2001, International Conference on the Theory and Application of Cryptographic Techniques, Proceedings. Ed. by B. Pfitzmann. Vol. 2045. Lecture Notes in Computer Science (LNCS). Innsbruck, Austria: Springer, May 2001, pp. 406–419. isbn: 978-3-540-44987-4. doi: 10.1007/3-540-44987-6_25.

[396]

N. Y. Yu and G. Gong. “Constructions of Quadratic Bent Functions in Polynomial Forms”. In: IEEE Transactions on Information Theory 52.7 (July 2006), pp. 3291–3299. issn: 0018-9448. doi: 10.1109 /TIT.2006.876251.


[397]

R. Berghammer. Computer-Aided Program Development: RelView System. Home page. url: http://www.informatik.uni-kiel.de/~progsys/relview.

[398]

A. D. Zakrevskij. “A Common Logic Approach to Data Mining and Pattern Recognition”. In: Data Mining and Knowledge Discovery Approaches Based on Rule Induction Techniques. Ed. by E. Triantaphyllou and G. Felici. Boston, MA, USA: Springer, 2006, pp. 1–43. isbn: 978-0-387-34296-2. doi: 10.1007/0-387-34296-6_1. url: http://dx.doi.org/10.1007/0-387-34296-6_1.

[399]

A. D. Zakrevskij. “A Logical Approach to the Pattern Recognition Problem”. In: Proceedings of the 9th International Conference on Knowledge-Dialog-Solution. KDS 9. 2001, pp. 238–245.

[400]

A. D. Zakrevskij. “Algorithms for Low Power State Assignment of an Automaton”. In: Informatika 1.29 (2011). In Russian, pp. 68–78.

[401]

A. D. Zakrevskij. Algorithms for Synthesis of Discrete Automata. In Russian. USSR: Nauka, 1971.

[402]

A. D. Zakrevskij. “Optimization of Set Covers”. In: Logical Language for Representation of Algorithms for Synthesis of Relay Devices (1966). In Russian, pp. 136–148.

[403]

A. D. Zakrevskij. Synthesis of Asynchronous Automata in Computers. In Russian. USSR: Nauka i Tehnika, 1975.

[404]

A. D. Zakrevskij, Y. Pottosin, and L. Cheremisinova. Design of Logical Control Devices. Tallinn, Estonia: TUT Press, 2009.

[405]

Y. I. Zhuravlev. “Recognition Algorithms with Representative Sets”. In: Zhurnal Vychislitelnoi Mathematiki i Mathematitscheskoi Fiziki 42.9 (2002). In Russian, pp. 1425–1435. url: http://www.mathnet.ru/php/archive.phtml?wshow=paper&jrnid=zvmmf&paperid=1138&option_lang=eng.

[406]

Y. I. Zhuravlev, V. V. Rjazanov, and O. V. Sen’ko. Recognition - Mathematical Models, Programming Systems, Practical Applications. In Russian. Moscow: Fazis, 2006.

[407]

A. J. Zomorodian. Topology for Computing. Cambridge Monographs on Applied and Computational Mathematics 16. Cambridge, UK: Cambridge University Press, Sept. 2009. isbn: 978-0-521-83666-1. url: http://www.worldcat.org/title/topology-for-computing/oclc/730238286.

List of Authors

Jaakko Astola
Faculty of Information Technology
Tampere University of Technology
Tampere, Finland
E-Mail: [email protected]

Pekka Astola
Faculty of Information Technology
Tampere University of Technology
Tampere, Finland
E-Mail: [email protected]

Rudolf Berghammer
Department of Computer Science
University of Kiel
Kiel, Germany
E-Mail: [email protected]

Robert K. Brayton
Department of Electrical Engineering and Computer Science
University of California
Berkeley, California, USA
E-Mail: [email protected]

Anna Bernasconi
Department of Computer Science
University of Pisa
Pisa, Italy
E-Mail: [email protected]


Jon T. Butler
Department of Electrical and Computer Engineering
Naval Postgraduate School
Monterey, California, USA
E-Mail: [email protected]

Yavuz Can
Department of Electrical Engineering, Electronic, and Computer Science
Friedrich-Alexander-University of Erlangen-Nuremberg
Nuremberg, Germany
E-Mail: [email protected]

Valentina Ciriani
Department of Computer Science
University of Milan
Milan, Italy
E-Mail: [email protected]

Rolf Drechsler
Group of Computer Architecture
University of Bremen
Bremen, Germany
E-Mail: [email protected]

Petr Fišer
Faculty of Information Technology
Czech Technical University in Prague
Prague, Czech Republic
E-Mail: [email protected]

Danila Gorodecky
United Institute of Informatics Problems
National Academy of Sciences of Belarus
Minsk, Belarus
E-Mail: [email protected]


Dirk Habich
Database Research Group
TU Dresden
Dresden, Germany
E-Mail: [email protected]

Fatima Zohra Hadjam
Department of Computer Science
Djillali Liabes University
Sidi Bel Abbes, Algeria
E-Mail: [email protected]

Michitaka Kameyama
Department of Information Technology and Electronics
Ishinomaki Senshu University
Ishinomaki-shi, Japan
E-Mail: [email protected]

Tomas Karnagel
Database Research Group
TU Dresden
Dresden, Germany
E-Mail: [email protected]

Osnat Keren
Faculty of Engineering
Bar-Ilan University
Ramat-Gan, Israel
E-Mail: [email protected]

Paweł Kerntopf
Faculty of Physics and Applied Informatics
University of Łódź
Łódź, Poland
E-Mail: [email protected]


Oliver Keszocze
Group of Computer Architecture
University of Bremen
Bremen, Germany
E-Mail: [email protected]

Till Kolditz
Database Research Group
TU Dresden
Dresden, Germany
E-Mail: [email protected]

Wolfgang Lehner
Database Research Group
TU Dresden
Dresden, Germany
E-Mail: [email protected]

Martin Lukac
Department of Computer Science
Nazarbayev University
Astana, Kazakhstan
E-Mail: [email protected]

Claudio Moraga
Faculty of Computer Science
TU Dortmund University
Dortmund, Germany
E-Mail: [email protected]

Yaara Neumeier
Faculty of Engineering
Bar-Ilan University
Ramat-Gan, Israel
E-Mail: [email protected]


Krzysztof Podlaski
Faculty of Physics and Applied Informatics
University of Łódź
Łódź, Poland
E-Mail: [email protected]

Christian Posthoff
Department of Computing and Information Technology
The University of the West Indies
Trinidad & Tobago
E-Mail: [email protected]

Emanuel M. Popovici
Electrical & Electronic Engineering
University College Cork
Cork, Ireland
E-Mail: [email protected]

Yuri Pottosin
United Institute of Informatics Problems
National Academy of Sciences of Belarus
Minsk, Belarus
E-Mail: [email protected]

Hila Rabii
Faculty of Engineering
Bar-Ilan University
Ramat-Gan, Israel
E-Mail: [email protected]

Miloš Radmanović
Department of Computer Science
University of Niš
Niš, Serbia
E-Mail: [email protected]


Daniel Rodas Bautista
Electrical & Electronic Engineering
University College Cork
Cork, Ireland
E-Mail: [email protected]

Tsutomu Sasao
Department of Computer Science
Meiji University
Kawasaki, Japan
E-Mail: [email protected]

Sebastian M. Sattler
Chair of Reliable Circuits and Systems
Friedrich-Alexander-University Erlangen-Nuremberg
Erlangen, Germany
E-Mail: [email protected]

Jan Schmidt
Faculty of Information Technology
Czech Technical University in Prague
Prague, Czech Republic
E-Mail: [email protected]

Stuart W. Schneider
Department of Electrical and Computer Engineering
Naval Postgraduate School
Monterey, California, USA
E-Mail: [email protected]

Mathias Soeken
Integrated Systems Laboratory
École Polytechnique Fédérale de Lausanne
Lausanne, Switzerland
E-Mail: [email protected]



Milena Stanković
Department of Computer Science
University of Niš
Niš, Serbia
E-Mail: [email protected]

Radomir S. Stanković
Department of Computer Science
University of Niš
Niš, Serbia
E-Mail: [email protected]

Bernd Steinbach
Institute of Computer Science
Freiberg University of Mining and Technology
Freiberg, Germany
E-Mail: [email protected]

Suzana Stojković
Department of Computer Science
University of Niš
Niš, Serbia
E-Mail: [email protected]

Jannis Stoppe
Group of Computer Architecture
University of Bremen
Bremen, Germany
E-Mail: [email protected]

Ioan Tabus
Faculty of Information Technology
Tampere University of Technology
Tampere, Finland
E-Mail: [email protected]




Gabriella Trucco
Department of Computer Science
University of Milan
Milan, Italy
E-Mail: [email protected]

Gürkan Uygur
Chair of Reliable Circuits and Systems
Friedrich-Alexander-University Erlangen-Nuremberg
Erlangen, Germany
E-Mail: [email protected]

Tiziano Villa
Department of Computer Science
University of Verona
Verona, Italy
E-Mail: [email protected]

Matthias Werner
Centre for Information Services and High Performance Computing
TU Dresden
Dresden, Germany
E-Mail: [email protected]

Bo Yang
Electrical & Electronic Engineering
University College Cork
Cork, Ireland
E-Mail: [email protected]

Index of Authors

A
Astola, Jaakko . . . 54, 334
Astola, Pekka . . . 334

B
Bautista, Daniel Rodas . . . 319
Berghammer, Rudolf . . . 69
Bernasconi, Anna . . . 214
Brayton, Robert K. . . . 214
Butler, Jon T. . . . 25, 99

C
Can, Yavuz . . . 156
Ciriani, Valentina . . . 214

D
Drechsler, Rolf . . . 43, 199

F
Fišer, Petr . . . 281

G
Gorodecky, Danila . . . 240

H
Habich, Dirk . . . 136
Hadjam, Fatima Zohra . . . 372

K
Kameyama, Michitaka . . . 405
Karnagel, Tomas . . . 136
Keren, Osnat . . . 303
Kerntopf, Paweł . . . 384
Keszocze, Oliver . . . 43
Kolditz, Till . . . 136

L
Lehner, Wolfgang . . . 136
Lukac, Martin . . . 405

M
Moraga, Claudio . . . 54, 355, 372, 384, 405

N
Neumeier, Yaara . . . 303

P
Podlaski, Krzysztof . . . 384
Popovici, Emanuel . . . 319
Posthoff, Christian . . . 3, 175
Pottosin, Yuri . . . 253

R
Rabii, Hila . . . 303
Radmanović, Miloš . . . 120

S
Sasao, Tsutomu . . . 25
Sattler, Sebastian M. . . . 87, 268
Schmidt, Jan . . . 281
Schneider, Stuart W. . . . 99
Soeken, Mathias . . . 43
Stanković, Milena . . . 355
Stanković, Radomir S. . . . 54, 334, 355, 384
Steinbach, Bernd . . . 3, 156, 175, 319
Stojković, Suzana . . . 355
Stoppe, Jannis . . . 199

T
Tabus, Ioan . . . 334
Trucco, Gabriella . . . 214

U
Uygur, Gürkan . . . 87, 268

V
Villa, Tiziano . . . 214

W
Werner, Matthias . . . 136

Y
Yang, Bo . . . 319

Index

ABC, 284, 287, 295–299, 302, 332 absorption, 12, 14, 23 adder, 49, 52, 205, 244, 246–249, 251, 252, 310, 315, 321, 322, 325, 327–330, 333 complete, 328 decomposed, 330, 331, 333 globally, 328, 329, 331 locally, 328, 331, 332 large, 328 ripple-carry, 48, 49, 294 standard, 331 addition modulo, 123, 336 algebra computer, 69 relation, 69, 70, 81 algorithm approximation, 139, 154 by levels, 368 cryptographic, 101 design, 355 evolutionary, 374 exact, 139, 142, 154, 155 genetic, 121 greedy, 263 mapping, 135 post-order, 366, 367, 369

random generation, 134 sampling, 155 synthesis, 44, 366, 393, 396 analysis dyadic, 57, 62 Fourier, 58 harmonic, 54, 55 abstract, 54 spectral, 58, 59, 61 animation, 69 antivalence, 5, 6 application cryptographic, 105, 118, 120 approach algebraic, 55, 56 box-and-connection, 205 code prediction encoding, 320, 321 covering, 176 dual-rail, 93 generic, 319 mathematical, 56 optimization, 96 topological, 268 architecture, 102, 106, 112, 114, 130, 131, 147, 247, 248, 319, 321, 327–329, 332, 374, 377, 405



adder, 321 fault tolerant, 319 hybrid, 372 master-slave, 374 Mealy, 87 area small, 177 arithmetic regular, 241–244, 246–248, 250, 251 saturation, 240–242, 244–251 attack cryptanalytic, 120 differential fault analysis, 303, 307 fault injection, 306 linear, 99, 101 side channel, 303, 304 software, 304 autocorrelation, 335 automaton finite, 253, 254 asynchronous, 254, 255, 257 synchronous, 255 axiom, 72, 81, 83, 183 separation, 81, 83 bank conflict, 144 benchmark reversible, 372, 378, 395 benchmark set, 281, 286, 288–291, 293, 294, 296–298, 300 Berkeley PLA, 289, 291 current, 286 documentation, 288, 289 EPFL’16, 281, 289, 294

Illinois University, 289, 293 ISCAS’85, 281, 288, 289, 293 ISCAS’89, 288, 289, 293 ITC’99, 289, 293 IWLS’05, 289, 291 IWLS’91, 286, 289, 291, 300 IWLS’93, 286, 289, 291 LEKO/LEKU, 285, 289, 293 LGSynth’91, 291 MCNC’89, 289, 291 proposed, 297 property, 300 structure, 297 QUIP, 289, 293 bi-decomposition, 177, 189, 218, 321 approach, 197 BDC-based approach, 177 disjoint, 177 disjoint AND, 195 graph-based approach, 177 multi-level, 238 non-disjoint, 177 possibility, 179 semi-tensor product-based approach, 178 strong, 178, 179, 181, 191, 195, 197, 198, 218, 321, 326–328, 333 strong AND, 178, 195 strong OR, 178, 195 strong XOR, 178, 194, 195, 325, 327, 333 vectorial, 179–181, 189,


191, 194, 195, 197, 198, 218, 320, 326, 327, 333 vectorial AND, 189, 192, 326 vectorial EXOR, 193 vectorial OR, 189–192, 195, 196, 326 vectorial XOR, 189, 194, 326, 327 weak, 178, 179, 197, 198, 218, 320 binary equation solution, 5 Birkhoff’s Diamond, 15, 17 bit range, 240 arbitrary, 240 large, 242 small, 240, 241 wide, 240 Boolean Algebra, 56, 87, 183, 192, 270, 282 Boolean function multi-output, 219 compatible, 219 bound Gilbert-Warshamov, 347, 350 Hamming, 350 lower, 25, 32, 42, 306, 309, 312, 350 upper, 25, 34, 42, 120, 122, 293, 345, 350 boundary, 74, 76, 77, 84, 377 calculus propositional, 3 capacity error correction, 320 CD-SAT


equation, 24 formula, 13, 21 model, 16 problem, 9, 11, 17, 21, 24 CDC-SAT equation, 24 formula, 11 problem, 9, 11, 16 change direction of, 60, 184–186, 188, 189, 191, 194 simultaneous, 179, 183, 184, 188, 190, 194, 195, 326, 327 checker, 304, 310 nonlinear, 309, 310 chessboard, 9 chromosome, 373, 375, 378 encoding, 375 circuit asynchronous, 88, 89, 255 balanced, 179 benchmark, 287–291, 294, 296, 369 combinational, 87, 175, 176, 281, 283, 290, 291, 293, 298, 329 complemented, 214, 216, 217, 220–229, 234, 236–238 optimal, 220, 225 digital, 96 equivalent, 296 functionally, 297 structurally, 297 linear, 26, 27 non-reversible, 384 practical, 288 quantum, 405, 407, 408 reversible, 355–357, 363,


365, 371–373, 375, 377–379, 382, 384–387, 393, 396, 405 safety critical, 89, 97 sequential, 175, 290, 291, 293, 298 size, 50, 284, 290, 301 two-level, 294 unbalanced, 179 circuit example, 282 alu4, 287 ex1010, 288, 291 cleaned, 290, 294 generated, 282, 288, 294 loss of structure, 283 practical, 282 purpose specific, 288 separate components, 290 transformed, 283, 286 class affine, 104, 105 complexity, 3, 45, 46, 50 equivalence, 104, 297, 384–387, 392, 404 class N P, 3, 4 classification, 67, 121, 238, 319, 384–387 clause, 6, 9, 11, 14, 17 clique, 17–21 monochromatic, 19 closure, 74, 76, 77, 79, 84 reflexive-transitive, 81 CMOS inverter totally defined, 277 Co-Design Hardware/Software, 199, 200, 204–206, 209 co-factor, 359 Co-Visualization


Hardware/Software, 200, 205, 206, 210–213 code, 136 AN, 136, 138–140, 144, 147, 154 block, 350 cyclic, 349 error-correcting, 305, 320, 347, 350, 351, 389 reliability oriented, 305 Golay, 349 Hamming, 136, 138, 305, 348, 349 linear, 136, 137, 305, 335, 347, 349–351 low density parity check, 319 non-linear, 136, 137, 140 non-systematic, 140 parity bit, 305 punctured-cubic, 310, 318 punctured-square, 309, 310, 318 quadratic-sum, 310–312, 315, 317, 318 reliability oriented, 305 robust, 304, 306, 308–310, 314, 315 binary, 315 optimum, 310, 311, 313 partially, 313, 314 perfect, 309 security oriented, 304–306, 308 separable, 305, 309, 317, 318 shortened quadratic-sum, 312, 318 triple-quadratic-sum, 311, 313, 318


triple-sum, 309, 311, 315, 317, 318 code word, 136 r-bit sphere, 137 coding matrix, 257, 258, 267 coefficient PPRM, 128–130, 133, 134 Reed-Muller, 129 Walsh, 120, 124–127, 130 coloring, 15, 16, 20 k-edge, 15 edge, 15, 17, 20 graph, 15, 17 grid, 24 monochromatic, 20, 21 node, 15 vertex, 15 column balanced, 25, 38–40, 42 combinatorics, 24, 120 communication channel, 304, 307 communication model, 304 comparator, 26, 27, 111 complement, 14, 28, 45, 70, 74, 84, 100, 107, 156, 161–164, 216, 220, 221, 226, 238, 239, 272, 273, 345 orthogonal, 163 complemented circuit, 214, 216, 217, 220–229, 234, 236–238 complexity, 3, 17, 44, 49, 50, 53, 67, 89, 119, 138, 141, 154, 208, 263, 283, 293, 304, 314, 334, 344, 355, 358, 363, 365, 369, 371, 386


computational, 44, 127 exponential, 122 implementation, 293 input, 286 overall, 51 component quantum, 406, 408 component function, 384, 386, 387, 391–397, 400, 402–404, 413, 415 composition, 70, 84, 277, 280 dual-rail, 95 parallel, 94, 277 computer reconfigurable, 29, 31, 102, 118 reversible, 373 computing algorithm, 63, 67 approximate, 43, 44, 53 parallel, 131 quantum, 384 resource, 44 reversible, 373, 405 concatenation, 162, 165, 167, 168, 247, 249 module, 273, 275, 278, 279 condition matrix, 256, 257, 261, 263–265 conjunction, 5–7, 9–11, 16, 93, 156–160, 196, 220, 225, 226, 242, 272 elementary, 156, 157, 196 inner, 16 prime, 14, 175, 196 consequence, 11, 83, 286 converter code, 335 radix, 63


convolution, 63, 335 cost minimal, 220, 224, 226, 227 cover, 214–216, 258, 262 edge, 21 minimal, 21, 22 minimal, 263 weighted, 262 node, 13 set, 11 exact, 12, 13 minimal, 12 vertex, 13, 14, 22 minimal, 22 crossover, 375 cryptography, 68, 120, 385, 391 CUDA, 122, 131, 133–135, 142–144, 154, 155 data mining, 43, 199, 215 decision diagram, 69, 85, 121, 228, 284, 355–359, 362, 363, 365–367, 369–371, 385, 395 decoder, 137, 304, 305 decomposition, 195, 215, 238, 281, 331, 333, 335, 356, 358, 359 Ashenhurst, 176 bi-, 177–181, 189–198, 218, 238, 320, 321, 325–328, 333 Curtis, 176 disjoint, 176 functional, 214 global, 328, 330 linear, 328, 335, 338 local, 331


non-linear, 328 parallel, 177 positive Davio, 369 Povarov, 176 sequential, 177 Shannon, 356 weak, 218 decomposition function, 177–179, 181, 189–193, 195, 325–327, 333 decomposition gate, 177, 178 deconvolution, 335 degree algebraic, 121, 122, 124, 127, 128 compatibility, 263 function, 104 indeterminacy, 215 linearity, 324, 325 parallelism, 131, 135 simplification, 177 delay low, 177 denominator common, 260, 262 derivative, 181 classical Newton-Leibniz, 62 dyadic, 62 Gibbs, 61, 62 single, 181, 322, 324 vectorial, 180, 183, 184 derivative operation m-fold, 179 single, 181 vectorial, 180 description abstract, 199 circuit, 282, 283, 287, 291


asynchronous, 269 equivalent, 296 hierarchical, 290 multi-level, 291 two-level, 291 universal, 290 component-wise, 72, 77 function, 366 hardware, 199, 205, 328 hierarchical, 295 impractical, 283 parametrizable, 329 redundant, 287, 291 relation-algebraic, 84 size, 283 software, 205 Verilog, 252, 329 VHDL, 252 design built-in self-test, 335 circuit, 14, 180, 182, 319 distributed evolutionary, 372, 381 for low power consumption, 253, 333 for reliability, 305, 306, 333 for security, 306 for testability, 333 hardware, 199, 204, 207, 211, 213 language, 205, 210 low-power circuit, 384 paradigm, 205 practical, 281, 291 quantum circuit, 405 reversible circuit, 355, 372, 373, 378, 405 system, 208, 269


digital, 55, 60, 61, 67 device programmable, 405 reconfigurable, 405 reversible, 405 difference, 15, 162, 165–167, 169 Boolean, 62 orthogonal, 163, 164, 167, 169 directed edge, 23, 268, 375 directed graph, 24, 78, 268, 279, 375 discovery random, 121 discrete event, 88, 268, 269, 274, 279 disjunction, 5, 7–9, 156, 157 elementary, 156, 157 disjunctions, 6 distance bent, 106 Hamming, 48, 100, 110, 111, 117, 137, 140–142, 154, 155, 262, 306, 347, 349 distance distribution, 140 distinguishable topologically, 81–83 domain Boolean, 127 circuit testing, 293 database, 137, 146 factorization, 337 finite, 55 hardware, 209 original, 63, 65, 67 Reed-Muller, 121–123, 127, 129, 131, 132, 135


spectral, 54, 63 Walsh, 127 don’t care, 89, 182, 214, 215, 218, 287, 334, 335 edge directed, 23, 268, 375 eigenfunction, 62 element ternary, 160–162 encoder, 304 deterministic, 306 nonlinear, 309, 310 punctured-cubic, 310 punctured-square, 310 quadratic-sum, 310 encoding code prediction, 320 deterministic, 306 low density parity check, 320 equality, 70, 226, 309 equation Boolean, 3–6, 9, 10, 20, 23, 24, 219 Chapman-Kolmogorov, 259 error masking, 311, 314, 315 homogeneous, 5 equivalence, 4, 6, 158, 270 error, 43, 53, 85, 99, 136–138, 151, 208, 209, 304–306, 308–310, 313–315, 317, 320 additive, 304, 307 approximation, 44 average bit-flip, 44, 48, 52 average case, 44, 47, 52 best bit-flip, 48


best case, 48 detection, 305 injected, 304 relative, 142, 148, 151, 152, 155 worst bit-flip, 44, 48, 53 worst case, 44, 48, 51 error correction, 136, 319, 320 error detection, 136, 306 double, 136 error free, 382 error model, 137–139, 351 error pattern, 138 error rate, 44, 47, 50, 320 error type, 319 error vector, 307, 308, 310, 311, 314, 315 error-correction, 136 Espresso, 218, 229, 234–238, 241, 242, 291, 297, 332 evaluation experimental, 281, 282, 288, 296 event discrete, 88, 268, 269 modeling, 274, 279 execution time total, 372, 380 expansion Davio, 356 Shannon, 356 experimental evaluation, 281 expression arithmetic, 62 Haar, 63 propositional, 272, 273 Reed-Muller, 62 relation-algebraic, 69, 85


factor irreducible, 339 fan-out, 355, 373 feedback asynchronous, 88, 89, 95 field binary, 339 finite, 60, 307, 310, 315, 335, 336 filter digital, 63 finite state machine, 87 synchronous, 254 fitness, 262, 374, 376, 378 form, 157 antivalence, 157, 159 conjunctive, 3, 157, 159 disjunctive, 157, 159 equivalence, 157, 159 flattened, 290 orthogonal, 167 PPRM, 123, 124, 128 shortest implying, 257, 258 three-level, 214, 215, 218, 238 Fourier analysis, 58, 60 classical, 60, 63 series, 54 spectrum, 64 transform, 54, 59, 60, 243 classical, 58 three-dimensional, 59 function affine, 100, 101, 104, 106, 107, 109, 387, 388, 393, 394, 397 balanced, 388, 389, 391, 394, 395, 403


bent, 99–108, 110, 111, 113–115, 117, 118, 120–122, 126–135 Boolean, 5, 18, 46, 53, 55, 62, 68, 282 Haar, 60, 62 hidden weighted bit, 395–397 reversible, 396–398 incompletely defined, 334, 344, 347 incompletely specified, 179, 182, 183, 218, 238, 283, 334 independence, 184 index generation, 25–27, 29, 32, 34, 335, 339 integer, 25 irreversible, 355, 391 linear, 100, 101, 107–113, 117, 120, 341, 387, 388, 393, 394 linearly separable, 390 majority, 387, 389, 395, 403, 410 monotone, 387, 389, 395 decreasing, 387, 389, 390 increasing, 387, 389, 390 mixed, 390 most nonlinear, 100 multi-output, 216, 227, 359, 384 non-degenerate, 387, 388, 393–395 non-reversible, 385 nonlinear, 101, 336, 387, 388, 398, 400–402 output, 87, 253, 269, 324,



325, 366 single, 47, 48, 50, 51 partially defined, 87, 88, 93, 96, 272, 334–336 partially specified, 334 partially symmetric, 389, 392, 393 probability, 151, 155 quantum, 417 Rademacher, 62 reliable, 319 reversible, 355, 384–387, 391–393, 396–398, 400, 402–404, 410, 417 self-complementary, 387, 390, 391, 400–402 self-dual, 387, 390, 391, 400–402 state transition, 87, 88, 93, 269 symmetric, 106, 178, 327, 387, 388, 392, 393, 406, 407, 410, 412, 415, 417 rotation, 102, 105, 106, 111 totally defined, 87, 89, 273 totally symmetric, 389, 392, 393 transition, 87, 88, 93, 253, 256, 269 trigonometric, 65 two-output, 224, 225 unate, 387, 390, 400, 402 Walsh, 60–62 gate AND, 216, 217, 220, 226,

321, 325, 329 decomposition, 177, 178 nonlinear, 386 OR, 216, 220, 325, 326 Peres, 374 quantum, 406, 408, 410, 411, 413 qubit controlled, 407 reversible, 355–357, 373, 379, 385 Toffoli, 356–359, 361, 363, 374 two-input, 198, 215, 218, 288 XOR, 217, 218, 220, 294, 321, 323–326, 329 gate count, 373 gate set universal, 406 generation random, 69, 121, 122, 126, 127, 130–135 glitch hazardous, 88, 96 graph directed, 24, 78, 268, 279, 375 signal flow, 268 signal transition, 269 grid point sampling, 141 group, 54 Abelian, 54, 62, 67 compact, 62 compact, 54 cyclic, 64 dihedral, 64 dyadic infinite, 62 finite, 54 dyadic, 61, 62, 64


infinite dyadic, 62 non-Abelian, 63, 64, 67 compact, 62 theory, 69 Hamming distance, 48, 100, 110, 111, 117, 137, 139–142, 154, 155, 262, 306, 347, 349 heat rejection, 253 hierarchy, 183, 201, 283, 289, 290, 374, 376 memory, 130, 131, 133 parallel, 377 homomorphism, 341 hypothesis, 10, 39 implementation optimal, 219, 224, 226, 227, 362, 368, 369 implication, 5, 73, 77, 159, 256–258, 260–262, 264–266, 270, 271, 334, 344, 347, 349, 389 improvement reliability, 320 inclusion, 70, 72, 73 independence, 322, 324, 326 independence function, 184 independence matrix, 184–187 index, 27, 29, 45, 71, 111, 124, 129, 138, 167, 186, 325, 396 block, 143 chromatic, 15 thread, 143 information, 24, 54, 89, 271, 280, 304 content, 54, 271

485

flow, 282 important, 184 input, 269 logical, 269 loss, 294 secret, 303 signal, 268 source, 303, 304 state, 269 symbol, 305, 307, 309, 312 transmission, 305 vector, 312 word, 304, 312 instruction-level parallelism, 146 instrument unifying, 24 integration code prediction encoding, 329 intelligence artificial, 55, 335 interior, 74, 76, 84 intersection, 6–9, 16, 70, 72, 73, 76, 84, 161–163 intrusion detection, 304 jump, 374, 378 kernel, 131, 143, 144, 147, 149, 150, 155 key secret, 303 Kirchhoff’s law, 259 latency, 150, 309 lattice, 85, 138, 154, 178–180, 182–185, 188–198 level



abstraction, 89, 203, 208, 209, 279, 280 register transfer, 203 transistor, 268 line ancilla, 355, 356, 358, 366, 368, 378 control, 357, 358, 415 input, 357 output, 355, 357, 364 target, 357, 358 linearity, 321, 322, 324 literal, 6, 16, 93, 157, 158, 216–219, 221, 229, 234, 242, 271, 272, 286, 300, 301, 390, 398 negative, 93 positive, 93 logic combinational, 268, 269, 320 partially defined, 97 sequential, 88, 269 spectral, 54 three-level, 217, 227, 229 logic description equivalence, 297 language BENCH, 283 EDIF, 283 languages, 290 BLIF, 286 EDIF, 286 translation, 283, 297 loop unrolling, 146 lower bound, 32, 35 machine Mealy, 87, 269

Medwedew, 87 Moore, 87, 97 machine learning, 199 majority function, 387, 389, 395, 403, 410 mapping bijective, 384, 393 mark function, 179, 182, 183, 194, 197 Markov Chain, 259 master, 15, 374 matrix adjacency, 18 binary, 25 coding, 257, 258, 267 condition, 256, 257, 261, 263–265 incidence, 18 independence, 184–187 parity check, 348 transform, 67, 122, 123, 127 max-term, 156 maximum, 181 single, 181, 322, 323 vectorial, 180 memory content addressable, 26, 27 main, 26, 27, 29 random access, 26, 86, 134, 229 shared, 131, 144–146, 148–150, 155 merge unique, 186 metastability, 88, 89, 91, 96, 97 method code prediction encoding,


328 covering, 175, 176 decomposition, 175, 176, 321 Monte-Carlo, 138, 141 statistical, 282 metric cost, 219, 220, 224–226 error, 44–47, 49, 53 Hamming, 351 trade-off, 238 migration, 375–378 frequency, 376 interval, 376 kind, 376 policy, 375, 376 process, 377 rate, 376 size, 376 topology, 375 min-term, 156, 214, 319 minimization, 176, 214, 217, 218, 220, 224, 226–229, 237, 241, 242, 248, 254, 281, 288, 334, 335, 405 exact, 242 minimum, 181 single, 181 vectorial, 180 model block level, 309 coarse-grained, 375 hierarchical, 372, 377, 378, 381, 383 islands, 375–378, 380, 381 mathematical, 63, 307, 310 membership, 71, 72, 74–76, 80, 81, 83, 85

487

multi-deme, 375 vector, 71, 72, 74–76, 80, 81, 85 Monte-Carlo, 29, 30, 138, 141 multiplexer, 238, 284, 356 multiplication, 105, 123, 139, 140, 240, 241, 244–252, 336 hardware, 240, 252 Karatsuba, 240 modulo, 123, 240, 244, 245, 336 monolithic, 240, 246 regular, 241 saturation, 241 unsigned, 240 multiplier, 240–244, 250–252, 310, 315 monolithic, 241–243, 250–252 multiprocessor, 131, 134, 144, 149, 155 mutation, 375 negation, 4 netlist multi-level, 294, 297 network, 268, 358, 363, 364, 368, 369 AND-OR-AND, 217, 226 AND-OR-XOR, 217, 227 AND-XOR, 321, 333 Boolean, 283 combinatorial, 319 generated, 358, 365, 366, 368, 369 large, 286 logic, 218, 320 minimal, 217 OR-AND-OR, 217, 227

488

OR-XOR, 333 parity, 320 reversible, 365 Toffoli, 362, 365, 366 XOR-AND-OR, 218 XOR-only, 320 node Davio, 358, 359, 371 Shannon, 358 nonlinear, 309 nonlinearity, 100–103, 105, 106, 111, 120 normal form, 103, 156, 157 algebraic, 103 antivalence, 156 conjunctive, 50–53, 156 disjunctive, 93, 156, 241, 242, 281 equivalence, 156 number chromatic, 15 Ramsey, 17, 19–21, 24 occupancy, 144 OFF-set, 179, 182, 183 ON-set, 179, 182, 183 operation butterfly, 127 commutative, 166 operator differential, 60–62 Gibbs, 62 unitary, 415–417 optimization, 15, 54, 55, 67, 96, 148, 229, 237, 253, 289, 291, 293, 332, 333, 373 reliability, 320 order relation, 389 partial, 83, 389

Index

orthogonality, 156, 157, 159, 160, 168, 169 orthogonalization, 166, 172 pairing left, 71 right, 71 parallel composition, 94, 277 parameter control, 373 parameter tuning, 373 part combinational, 268 linear, 320, 321, 325, 327, 329 output, 321 non-linear, 320, 321, 325, 327–330, 335 input, 321 quantum, 406, 408 sequential, 268 symmetric, 408 path Eulerian, 23 Hamiltonian, 22, 23 performance uncorrelated, 284 permutation, 30, 384, 386, 388–392, 397, 400, 415 permutation matrix, 339 persistence, 109, 115–117 Petri net, 69, 269 pipeline circular, 107–116, 119 polarity vector, 358, 362 polynomial irreducible, 337, 339–341 population, 373–375, 378, 383 size, 373, 376, 380


power consumption, 242, 253, 255, 328, 331 low, 89, 177, 254, 333 power dissipation, 67, 372 preprocessing, 170 probability, 34, 130, 138–140, 150, 151, 153, 155, 258, 259, 262, 304–306, 308, 310, 365, 373 absolute, 259 error masking, 306, 308–313, 315, 317 zero, 259, 260 problem N P, 4 N P-complete, 9, 176 3-SAT, 7, 8 complex, 378 covering, 14, 22 unate, 14, 22 cryptographic, 24 minimization, 25, 224, 226, 227 satisfiability, 3 processing digital, 55 logical, 55 product direct, 70 property structural, 268 proposition, 270, 272, 277 quantum cost average, 372, 379, 380 qubit, 405, 407, 408, 410–412, 415, 418 quotient symmetric, 70, 72


race, 88, 89, 96, 97, 254 condition, 90, 91 critical, 254–257 free, 254, 257, 265 non-critical, 254 rank, 184, 187, 189 re-initialization, 374, 378 recognition pattern, 335 redundancy, 283, 285, 301, 304, 305, 310, 312, 319, 320 reflexive, 79, 83 relation, 104 relation algebraic, 57 binary, 70 Boolean, 215, 216, 218–220, 222–225, 227–229, 237, 238 embedding, 71 empty, 70 equivalence, 83, 104, 384, 404 identity, 70, 83 membership, 71, 72, 75, 77 pre-order, 79, 81, 85 specialization, 79, 80, 83 universal, 70 well-defined, 218 reliability, 136, 305, 306, 319–321, 333 representation compact, 9, 65–67 irreducible, 54, 62, 64 visual, 199 requirement, 10, 12, 15, 17, 24, 29, 43, 88, 160, 191, 254, 294, 344,

490

363 restriction, 8–11, 15–17, 24, 67, 129, 131, 182, 183, 335, 381 ring, 340 commutative, 336 integer, 335, 340, 341 polynomial, 340 routing, 27, 199, 252, 331 row incompatible, 263 SAT equation, 6, 9 formula, 14 problem, 3, 4 solver, 4, 9, 14 satisfiability, 3, 14, 47 Boolean, 44, 45 security, 304–307 self-complementarity, 386 self-duality, 386 separable, 81–83, 309, 318 separation linear, 323–325 set care, 334, 335 compatible, 257, 258, 260–262, 264 dense, 79 distinguishing, 29, 32, 39 irredundant, 28 minimum, 28, 29, 32, 35, 38–40, 42 don’t care, 182, 335 incompatible, 263 OFF, 179, 182, 183 ON, 179, 182, 183 operation, 57 orthonormal, 60

Index

shared memory, 144 signal, 54, 97, 203, 204, 295 audio, 58 clock, 254 constant, 295 control, 407, 409 non-quantum, 407 non-superposed, 407 curve, 90 destructive, 89 digital, 55, 60 discrete, 67 dual-rail, 95 flow, 268, 279 information, 268 input, 254–257, 259, 357 interfering, 96 logic, 54, 55 multi-valued, 60 name, 295 non-determined, 90 output, 254, 358, 368 particular, 54 processing, 55–58, 60, 61 radar, 58 radio, 63 seismic, 58 sonar, 58 speech, 58 state, 95 superimposed, 88, 90 transition, 90 transmitted, 320 undefined, 97 signal flow graph, 268, 279 signal flow plan, 269, 272, 279 signal transition graph, 269 significant bit least, 244, 246 most, 51, 186, 244, 245,

Index

249 silent data corruption, 137, 140 SIMD, 146 simulation, 29, 108, 109, 210, 237, 268, 269 SIS, 229, 237, 238, 294, 295, 297, 298, 332 slave, 374 solution optimal, 219, 220, 225–227, 319, 373, 380 space binary, 335 linear, 335, 336 topological, 72 finite, 70 vector, 337 specification relation-algebraic, 77, 82–85 structure, 270 spectrum, 64, 66, 67, 124, 133 PPRM, 123, 127–129, 133 Reed-Muller, 121 Walsh, 125, 127 state care, 335 current, 254, 255, 259 digital, 89, 91, 93 don’t care, 335 metastable, 89, 91 next, 88, 89, 253, 255, 259, 260 quantum, 408 stable, 91, 254 state assignment, 253–255, 257, 258, 262, 263, 265

491

race free, 254 state code, 254, 258, 265 length, 254 minimal length, 254 structure algebraic, 55, 67, 307 circuit, 89, 90, 97, 177, 179, 196, 197, 268, 270, 279, 283, 285, 288, 321, 323, 324, 417 data, 7, 8, 355 gate, 281 multi-level, 238, 281 inverter, 274, 276 multiplexer, 284 sequential, 88 switching fast, 405 switching activity, 253–255, 258, 263 symbol redundancy, 306, 307, 309, 312 symmetric, 83 relation, 104 Synopsys, 242, 250–252, 329, 330 syntax propositional, 270, 279 synthesis circuit reversible, 387 for low power consumption, 253, 319, 333 high-level, 289 logic, 46, 120, 215, 281, 282, 289, 291, 293,


319, 332, 384–386 reversible, 385 synthesis procedure, 281 adaptation, 285 transformation tolerance, 285 system digital, 55, 60, 61, 67, 68, 175 orthogonal, 60 reliable, 319 SystemC, 199, 205, 210–212, 269 taxonomy, 146 technique code prediction encoding, 320 relation-algebraic, 74, 79 spectral, 54, 55, 57–60, 63, 65, 67 visualization, 201, 203, 211, 212 test gate-level, 289 relation-algebraic, 72, 83 system-level, 289 theorem convolution, 63 prime number, 341 Ramsey, 19 theory coding, 120, 345, 347, 350 complexity, 3 error correction, 319 graph, 24 group, 69 lattice, 69 number, 335 social choice, 69

Index

switching, 384 system, 61 linear, 62 trace, 269 threshold function, 387, 390, 395 topology, 70, 72, 74–85, 378, 381 afferent, 320 Alexandrov, 72, 79, 81, 85 discrete, 83 finite, 70, 71, 79 Fréchet, 81 Kolmogorov, 81 random, 79, 81, 84 ring, 376 unidirectional, 376 symmetric, 81 torus, 376 transaction level modeling, 269 transeunt triangle, 102, 105, 121 transform algebraic, 55 arithmetic, 56, 62 Fourier, 59 discrete, 58, 59 fast, 58, 59 three-dimensional, 59 Laplace, 63 linear, 335–340, 342, 344, 347–349 existence, 344 injective, 350 PPRM, 122, 123 Reed-Muller, 122, 123, 126, 127, 130, 132 spectral, 54, 56, 60, 63, 67, 122, 127

Index

unitary, 408 Walsh, 122, 124–127, 130 transformation radical, 286 transistor pull-down, 276, 277 pull-up, 274, 275, 277 transistor level, 268 transitive, 79, 83 relation, 104 transposition, 70, 84 traversal level by level, 356 post-order, 356, 357, 359, 363 trial-and-error, 373 triangle transeunt, 102, 105, 121 truth table, 3, 47, 49, 100, 101, 103, 105, 110, 111, 241, 274, 276, 319, 320, 378, 388, 391, 392, 399, 408, 411 TVL, 7, 8, 91–93, 159, 163, 164, 166, 167, 169, 170, 172 algorithm, 159 npn-orthogonal, 167 orthogonal, 163, 165–170, 172 union, 11, 70, 72, 73, 159, 162, 166, 169 orthogonal, 165, 166, 168 unique merge, 186 unit main, 374, 376–378, 381 sub-main, 374 worker, 374

493

validation, 268 value logic, 408 propositional, 87 variable complemented, 388, 390, 402 control, 407, 413–415 propositional, 271 uncomplemented, 388, 390, 402 vector binary, 3, 6, 27, 159–162, 166, 167, 185, 249, 348 element, 159 matrix, 257, 258 non-registered, 27, 31 registered, 27–29, 32 ternary, 4, 6–8, 16, 93, 159–162, 165, 171, 256, 257, 262 vector list ternary, 159 verification, 54, 67, 84, 269 view block, 274, 276 digital, 89, 97 formal language, 271 graph, 204 logic, 89, 93, 271 module, 273, 274, 276 specification, 271 structural, 283 waveform, 204 visualization, 69, 77, 78, 199, 200, 204, 205, 212 approach, 205 engine, 205, 212 hardware, 199, 200,



203–206, 208, 210 layer, 205 music, 213 paradigm, 203, 205 software, 199, 200, 203–206, 208 weight, 27, 28, 64, 107, 117, 139, 140, 261–265, 334, 347–349, 394–396 balanced, 38

bent, 106, 108, 109, 115, 116 Hamming, 305, 306, 347 minimal, 262, 263, 267 total, 267 workflow, 199, 208–212 XBOOLE, 4, 9, 11, 14, 159 algorithm, 4 monitor, 8 operation, 8